Dev Builds » 20240528-1634

You are viewing an old NCM Stockfish dev build test. You may find the most recent dev build tests using Stockfish 15 as the baseline here.


NCM plays each Stockfish dev build 20,000 times against Stockfish 14. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.

Summary

| Host | Duration | Avg Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo |
|---|---|---|---|---|---|---|---|---|---|
| ncm-dbt-01 | 10:27:40 | 1162075 | 3490 | 1553 | 244 | 1693 | +137.0 +/- 4.91 | 1, 28, 416, 1261, 39 | +321.46 +/- 16.72 |
| ncm-dbt-02 | 10:24:47 | 1138313 | 3430 | 1533 | 244 | 1653 | +137.29 +/- 5.08 | 0, 28, 423, 1211, 53 | +315.85 +/- 16.58 |
| ncm-dbt-03 | 10:25:44 | 1188089 | 3488 | 1519 | 275 | 1694 | +129.61 +/- 5.09 | 0, 35, 476, 1187, 46 | +292.58 +/- 15.62 |
| ncm-dbt-05 | 10:38:32 | 1168572 | 3436 | 1520 | 240 | 1676 | +135.97 +/- 5.14 | 0, 33, 425, 1207, 53 | +311.2 +/- 16.55 |
| ncm-dbt-06 | 10:27:24 | 1177289 | 3482 | 1522 | 270 | 1690 | +130.77 +/- 5.19 | 0, 38, 466, 1184, 53 | +293.74 +/- 15.8 |
| Total | | | 17326 | 7647 | 1273 | 8406 | +134.1 +/- 2.27 | 1, 162, 2206, 6050, 244 | +306.65 +/- 7.25 |
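The Elo columns can be reproduced from the raw counts with the standard logistic Elo formula. The sketch below is an assumption about how the figures are derived, not NCM's actual code, and the function names are hypothetical. It counts a draw as half a point, and for the gamepair figure it treats Ptnml buckets 0 and 1 as lost pairs, bucket 2 as a drawn pair, and buckets 3 and 4 as won pairs. Under those assumptions the combined totals in the last row of the Summary table come out within rounding of the listed +134.1 and +306.65.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def standard_elo(wins: int, losses: int, draws: int) -> float:
    """Elo from per-game results, counting a draw as half a point."""
    games = wins + losses + draws
    return elo_from_score((wins + 0.5 * draws) / games)


def gamepair_elo(ptnml: list[int]) -> float:
    """Elo from game-pair results (assumed bucket mapping, see the text above)."""
    lost = ptnml[0] + ptnml[1]
    drawn = ptnml[2]
    won = ptnml[3] + ptnml[4]
    pairs = lost + drawn + won
    return elo_from_score((won + 0.5 * drawn) / pairs)


# Combined totals from the last row of the Summary table.
total_standard = standard_elo(7647, 1273, 8406)
total_gamepair = gamepair_elo([1, 162, 2206, 6050, 244])
print(f"standard: {total_standard:+.2f}  gamepair: {total_gamepair:+.2f}")
```

The sketch does not handle a score of exactly 0 or 1 (for example a short run in which every pair is won), where the logistic formula diverges. It also does not attempt the +/- error margins, whose derivation the page does not state.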

Test Detail

| ID | Host | Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo |
|---|---|---|---|---|---|---|---|---|---|
| 368876 | ncm-dbt-05 | 932709 | 6 | 3 | 0 | 3 | +190.46 +/- 18.51 | 0, 0, 0, 3, 0 | +1199.83 +/- 276.48 |
| 368875 | ncm-dbt-02 | 709585 | 10 | 4 | 1 | 5 | +107.43 +/- 84.35 | 0, 0, 2, 3, 0 | +240.65 +/- 570.11 |
| 368871 | ncm-dbt-01 | 819752 | 48 | 22 | 3 | 23 | +145.43 +/- 41.38 | 0, 0, 6, 17, 1 | +337.98 +/- 165.55 |
| 368870 | ncm-dbt-03 | 822627 | 42 | 19 | 5 | 18 | +120.38 +/- 39.67 | 0, 0, 7, 14, 0 | +279.54 +/- 142.86 |
| 368869 | ncm-dbt-06 | 853328 | 44 | 19 | 3 | 22 | +132.36 +/- 52.14 | 0, 0, 8, 12, 2 | +261.24 +/- 129.57 |
| 368868 | ncm-dbt-02 | 1196284 | 420 | 189 | 30 | 201 | +138.41 +/- 13.74 | 0, 0, 58, 145, 7 | +318.11 +/- 44.76 |
| 368867 | ncm-dbt-06 | 1226453 | 438 | 194 | 28 | 216 | +138.58 +/- 14.78 | 0, 4, 54, 152, 9 | +313.12 +/- 47.05 |
| 368866 | ncm-dbt-05 | 1198141 | 430 | 199 | 32 | 199 | +142.4 +/- 15.54 | 0, 5, 49, 150, 11 | +319.41 +/- 49.45 |
| 368865 | ncm-dbt-01 | 1219175 | 442 | 199 | 34 | 209 | +136.28 +/- 13.38 | 0, 4, 51, 163, 3 | +324.94 +/- 48.49 |
| 368864 | ncm-dbt-03 | 1220575 | 446 | 199 | 38 | 209 | +131.34 +/- 14.36 | 0, 3, 64, 148, 8 | +292.04 +/- 42.93 |
| 368863 | ncm-dbt-02 | 1197951 | 500 | 235 | 35 | 230 | +147.19 +/- 13.58 | 0, 3, 56, 179, 12 | +339.63 +/- 46.2 |
| 368862 | ncm-dbt-06 | 1243571 | 500 | 223 | 40 | 237 | +133.34 +/- 13.26 | 0, 4, 66, 173, 7 | +304.07 +/- 42.38 |
| 368861 | ncm-dbt-05 | 1197535 | 500 | 223 | 49 | 228 | +126.17 +/- 14.62 | 0, 10, 64, 168, 8 | +277.93 +/- 42.91 |
| 368860 | ncm-dbt-01 | 1217062 | 500 | 220 | 35 | 245 | +134.95 +/- 13.03 | 0, 5, 60, 180, 5 | +315.35 +/- 44.57 |
| 368859 | ncm-dbt-03 | 1226675 | 500 | 221 | 34 | 245 | +136.56 +/- 13.73 | 0, 6, 59, 177, 8 | +312.48 +/- 44.93 |
| 368858 | ncm-dbt-02 | 1198488 | 500 | 220 | 39 | 241 | +131.74 +/- 13.49 | 0, 5, 66, 172, 7 | +298.62 +/- 42.41 |
| 368857 | ncm-dbt-06 | 1208373 | 500 | 215 | 55 | 230 | +115.22 +/- 14.21 | 0, 7, 84, 151, 8 | +245.2 +/- 37.37 |
| 368856 | ncm-dbt-05 | 1196395 | 500 | 224 | 39 | 237 | +134.95 +/- 12.84 | 0, 3, 65, 176, 6 | +312.48 +/- 42.67 |
| 368855 | ncm-dbt-01 | 1216936 | 500 | 219 | 31 | 250 | +137.37 +/- 13.15 | 0, 5, 58, 181, 6 | +321.19 +/- 45.36 |
| 368854 | ncm-dbt-03 | 1234724 | 500 | 220 | 38 | 242 | +132.54 +/- 13.1 | 0, 5, 63, 177, 5 | +306.84 +/- 43.45 |
| 368853 | ncm-dbt-02 | 1196628 | 500 | 229 | 39 | 232 | +138.99 +/- 13.1 | 0, 3, 62, 177, 8 | +321.19 +/- 43.77 |
| 368852 | ncm-dbt-06 | 1205593 | 500 | 222 | 25 | 253 | +144.71 +/- 12.68 | 0, 3, 54, 186, 7 | +346.12 +/- 47.1 |
| 368851 | ncm-dbt-05 | 1197468 | 500 | 221 | 31 | 248 | +138.99 +/- 12.5 | 0, 3, 59, 183, 5 | +330.23 +/- 44.94 |
| 368850 | ncm-dbt-01 | 1219113 | 500 | 223 | 35 | 242 | +137.37 +/- 12.96 | 0, 2, 66, 174, 8 | +315.35 +/- 42.24 |
| 368849 | ncm-dbt-03 | 1241094 | 500 | 222 | 37 | 241 | +134.95 +/- 13.59 | 0, 5, 63, 174, 8 | +306.84 +/- 43.45 |
| 368848 | ncm-dbt-06 | 1232924 | 500 | 213 | 37 | 250 | +127.76 +/- 13.92 | 0, 8, 64, 172, 6 | +288.06 +/- 43.03 |
| 368847 | ncm-dbt-05 | 1223359 | 500 | 215 | 32 | 253 | +133.34 +/- 12.69 | 0, 3, 66, 176, 5 | +309.64 +/- 42.33 |
| 368846 | ncm-dbt-01 | 1191177 | 500 | 227 | 42 | 231 | +134.95 +/- 14.13 | 1, 5, 60, 176, 8 | +309.64 +/- 44.55 |
| 368845 | ncm-dbt-02 | 1196765 | 500 | 216 | 37 | 247 | +130.14 +/- 14.23 | 0, 12, 51, 183, 4 | +301.33 +/- 47.43 |
| 368844 | ncm-dbt-03 | 1257349 | 500 | 208 | 35 | 257 | +125.38 +/- 13.79 | 0, 7, 69, 168, 6 | +280.42 +/- 41.44 |
| 368843 | ncm-dbt-06 | 1208992 | 500 | 217 | 45 | 238 | +124.6 +/- 13.8 | 0, 6, 73, 164, 7 | +275.45 +/- 40.24 |
| 368842 | ncm-dbt-01 | 1202334 | 500 | 219 | 35 | 246 | +134.15 +/- 12.27 | 0, 3, 63, 181, 3 | +318.25 +/- 43.39 |
| 368841 | ncm-dbt-05 | 1193499 | 500 | 222 | 27 | 251 | +143.07 +/- 13.54 | 0, 4, 57, 179, 10 | +330.23 +/- 45.79 |
| 368840 | ncm-dbt-02 | 1198127 | 500 | 221 | 34 | 245 | +136.56 +/- 13.36 | 0, 4, 63, 175, 8 | +312.48 +/- 43.44 |
| 368839 | ncm-dbt-03 | 1257643 | 500 | 220 | 36 | 244 | +134.15 +/- 12.27 | 0, 1, 69, 175, 5 | +312.48 +/- 41.11 |
| 368838 | ncm-dbt-06 | 1239082 | 500 | 219 | 37 | 244 | +132.54 +/- 13.65 | 0, 6, 63, 174, 7 | +301.33 +/- 43.45 |
| 368837 | ncm-dbt-01 | 1211053 | 500 | 224 | 29 | 247 | +143.07 +/- 12.54 | 0, 4, 52, 189, 5 | +346.12 +/- 48.03 |
| 368836 | ncm-dbt-02 | 1212676 | 500 | 219 | 29 | 252 | +138.99 +/- 12.5 | 0, 1, 65, 177, 7 | +324.17 +/- 42.47 |
| 368835 | ncm-dbt-03 | 1244028 | 500 | 210 | 52 | 238 | +113.68 +/- 14.05 | 0, 8, 82, 154, 6 | +245.2 +/- 37.88 |
| 368834 | ncm-dbt-05 | 1209473 | 500 | 213 | 30 | 257 | +133.34 +/- 13.63 | 0, 5, 65, 172, 8 | +301.33 +/- 42.75 |

Commit

Commit ID a169c78b6d3b082068deb49a39aaa1fd75464c7f
Author Tomasz Sobczyk
Date 2024-05-28 16:34:15 UTC
Improve performance on NUMA systems

Allow for NUMA memory replication for NNUE weights. Bind threads to ensure execution on a specific NUMA node.

This patch introduces NUMA memory replication, currently only utilized for the NNUE weights. Along with it comes all machinery required to identify NUMA nodes and bind threads to specific processors/nodes. It also comes with small changes to Thread and ThreadPool to allow easier execution of custom functions on the designated thread. Old thread binding (WinProcGroup) machinery is removed because it's incompatible with this patch. Small changes to unrelated parts of the code were made to ensure correctness, like some classes being made unmovable and raw pointers being replaced with unique_ptr.

Windows 7 and Windows 10 are partially supported. Windows 11 is fully supported. Linux is fully supported, with explicit exclusion of Android. No additional dependencies.

-----------------

A new UCI option `NumaPolicy` is introduced. It can take the following values:

```
system - gathers NUMA node information from the system (lscpu or Windows API) and binds each thread to a single NUMA node
none - assumes there is 1 NUMA node, never binds threads
auto - this is the default value; depends on the number of set threads and NUMA nodes, will only enable binding on multinode systems and when the number of threads reaches a threshold (dependent on node size and count)
[[custom]] -
  // ':'-separated numa nodes
  // ','-separated cpu indices
  // supports "first-last" range syntax for cpu indices,
  for example '0-15,32-47:16-31,48-63'
```

Setting `NumaPolicy` forces recreation of the threads in the ThreadPool, which in turn forces the recreation of the TT.

The threads are distributed among NUMA nodes in a round-robin fashion based on fill percentage (i.e. it will strive to fill all NUMA nodes evenly). Threads are bound to NUMA nodes, not specific processors, because that's our only requirement and the OS can schedule them better.
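To make the custom `NumaPolicy` syntax and the fill-based distribution concrete, here is a minimal Python sketch. It is not Stockfish's implementation (which is C++). The function names are hypothetical, and the tie-breaking rule (lowest node index first) is an assumption, since the commit message only says that nodes are filled evenly.

```python
def parse_numa_policy(spec: str) -> list[list[int]]:
    """Parse a custom NumaPolicy string into one sorted CPU list per NUMA node.

    Nodes are ':'-separated, CPU indices are ','-separated, and "first-last"
    denotes an inclusive range, as described in the commit message.
    """
    nodes = []
    for node_spec in spec.split(":"):
        cpus = set()
        for part in node_spec.split(","):
            if "-" in part:
                first, last = part.split("-")
                cpus.update(range(int(first), int(last) + 1))
            else:
                cpus.add(int(part))
        nodes.append(sorted(cpus))
    return nodes


def distribute_threads(num_threads: int, node_sizes: list[int]) -> list[int]:
    """Return a node index per thread, always picking the least-filled node.

    "Fill" is threads assigned so far divided by the node's CPU count.
    Ties go to the lowest node index (an assumption of this sketch).
    """
    counts = [0] * len(node_sizes)
    assignment = []
    for _ in range(num_threads):
        node = min(range(len(node_sizes)), key=lambda n: counts[n] / node_sizes[n])
        counts[node] += 1
        assignment.append(node)
    return assignment


# The example string from the commit message: two nodes of 32 CPUs each.
nodes = parse_numa_policy("0-15,32-47:16-31,48-63")
assignment = distribute_threads(6, [len(cpus) for cpus in nodes])
print([len(cpus) for cpus in nodes], assignment)
```

With equally sized nodes this reduces to plain round-robin. With unequal nodes the larger node receives proportionally more threads, which is one reading of "based on fill percentage".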
Special care is taken that maximum memory usage on systems that do not require memory replication stays the same as before; that is, unnecessary copies are avoided.

On Linux the process's processor affinity is respected. This means that if you, for example, use taskset to restrict Stockfish to a single NUMA node, then the `system` and `auto` settings will only see a single NUMA node (more precisely, the processors included in the current affinity mask) and act accordingly.

-----------------

We can't ensure that a memory allocation takes place on a given NUMA node without using libnuma on Linux, or using appropriate custom allocators on Windows (https://learn.microsoft.com/en-us/windows/win32/memory/allocating-memory-from-a-numa-node), so to avoid complications the current implementation relies on the first-touch policy. Because of this, we also rely on the memory allocator to give us a new chunk of untouched memory from the system. This appears to work reliably on Linux, but results may vary.

macOS is not supported, because AFAIK it's not affected, and implementation would be problematic anyway.

Windows is supported from Windows 7 onward (https://learn.microsoft.com/en-us/windows/win32/api/processtopologyapi/nf-processtopologyapi-setthreadgroupaffinity). Before Windows 11/Server 2022 it is not possible to set a thread affinity that spans processor groups, so on those versions NUMA nodes are split such that they do not span processor groups. The splitting is done manually in some cases (required after Windows 10 Build 20348). Since Windows 11/Server 2022 we can set affinities spanning processor groups, so this splitting is not done and the behaviour is pretty much like on Linux.

Linux is supported, **without** a libnuma requirement. `lscpu` is expected.
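The effect of the affinity mask on node discovery can be sketched as follows. This is an illustration under stated assumptions, not the actual Stockfish logic: the two-node `layout` is hypothetical, and `visible_nodes` is a made-up helper name. `os.sched_getaffinity` is a real Python call that is only available on some platforms, Linux among them, so it is guarded.

```python
import os


def visible_nodes(nodes: list[list[int]], allowed: set[int]) -> list[list[int]]:
    """Keep only CPUs present in `allowed` and drop nodes left with no CPUs."""
    filtered = [[cpu for cpu in node if cpu in allowed] for node in nodes]
    return [node for node in filtered if node]


# Hypothetical two-node layout; a real one would be read from the system.
layout = [list(range(0, 16)), list(range(16, 32))]

# Simulates running under `taskset -c 0-15`: only the first node's CPUs remain,
# so a `system`-style discovery would see a single NUMA node.
restricted = visible_nodes(layout, set(range(0, 16)))
print(len(restricted), "node(s) visible under the simulated mask")

# Where supported, the real affinity mask of the current process can be used.
if hasattr(os, "sched_getaffinity"):
    current_mask = os.sched_getaffinity(0)
    print(len(visible_nodes(layout, current_mask)), "node(s) visible to this process")
```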
-----------------

Passed 60+1 @ 256t 16000MB hash: https://tests.stockfishchess.org/tests/view/6654e443a86388d5e27db0d8

```
LLR: 2.95 (-2.94,2.94) <0.00,10.00>
Total: 278 W: 110 L: 29 D: 139
Ptnml(0-2): 0, 1, 56, 82, 0
```

Passed SMP STC: https://tests.stockfishchess.org/tests/view/6654fc74a86388d5e27db1cd

```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 67152 W: 17354 L: 17177 D: 32621
Ptnml(0-2): 64, 7428, 18408, 7619, 57
```

Passed STC: https://tests.stockfishchess.org/tests/view/6654fb27a86388d5e27db15c

```
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 131648 W: 34155 L: 34045 D: 63448
Ptnml(0-2): 426, 13878, 37096, 14008, 416
```

fixes #5253

closes https://github.com/official-stockfish/Stockfish/pull/5285

No functional change
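The `(-2.94,2.94)` interval in the test results above matches the stopping bounds of a Wald sequential probability ratio test when both error rates are 0.05. The commit message does not state the error rates, so `alpha = beta = 0.05` is an assumption of this sketch, and the helper names are hypothetical. The sketch also checks that each quoted result is internally consistent: W + L + D equals the total, and the Ptnml counts, which tally game pairs, sum to half the total.

```python
import math


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Lower and upper log-likelihood-ratio stopping bounds of Wald's SPRT."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def is_consistent(total: int, wins: int, losses: int, draws: int,
                  ptnml: list[int]) -> bool:
    """Check game counts against the total and against the game-pair counts."""
    return wins + losses + draws == total and 2 * sum(ptnml) == total


lower, upper = sprt_bounds()
print(f"bounds: ({lower:.2f}, {upper:.2f})")

# Figures copied from the three test results quoted in the commit message.
results = {
    "60+1 @ 256t": (278, 110, 29, 139, [0, 1, 56, 82, 0]),
    "SMP STC": (67152, 17354, 17177, 32621, [64, 7428, 18408, 7619, 57]),
    "STC": (131648, 34155, 34045, 63448, [426, 13878, 37096, 14008, 416]),
}
checks = {name: is_consistent(*row) for name, row in results.items()}
print(checks)
```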
Copyright 2011–2024 Next Chess Move LLC