Dev Build 20251102-1504

This is an older NCM Stockfish dev build test. The most recent dev build tests, which use Stockfish 15 as the baseline, are listed separately.


NCM plays 20,000 games between each Stockfish dev build and Stockfish 14. This yields an approximate Elo difference against that baseline and a measure of confidence in each dev build's strength.
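NCM does not state its exact formula here, so the sketch below assumes the standard logistic Elo model: it turns a win/loss/draw record into a score fraction and inverts the expected-score curve to get a rating difference. The function name `elo_from_wld` is made up for this illustration. Applied to the fishtest W/L/D counts quoted in the commit message below, it reproduces the reported Elo of about 4.69.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Approximate Elo difference implied by a win/loss/draw record.

    Assumes the standard logistic Elo model, in which a rating difference d
    gives an expected score E(d) = 1 / (1 + 10 ** (-d / 400)).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of the available points
    if not 0.0 < score < 1.0:
        # A 0% or 100% score corresponds to an infinite Elo difference.
        raise ValueError("score must be strictly between 0 and 1")
    # Invert E(d): d = -400 * log10(1 / E - 1).
    return -400.0 * math.log10(1.0 / score - 1.0)


# Example: the fishtest W/L/D record quoted in the commit message.
print(f"{elo_from_wld(16085, 15275, 28640):.2f}")  # 4.69
```

Only the point estimate is computed here; the error bars NCM and fishtest report additionally depend on the variance of the results.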

Summary

Host	Duration	Avg Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo

Test Detail

ID	Host	Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo	CLI	PGN

Commit

Commit ID	69a01b88f35db2a5003d42116f573207ca5c275b
Author	Tomasz Sobczyk
Date	2025-11-02 15:04:09 UTC
Use shared memory for network weights

This enables different Stockfish processes that use the same weights to share the same memory. The approach establishes equivalence by memory content and is compatible with NUMA replication. The benefits of sharing are reduced memory usage and a speedup thanks to improved (inter-process) caching of the network in the CPU's cache, and thus reduced bandwidth usage to main memory. Even though this change doesn't benefit a user running a single process, it helps on fishtest or, e.g., on Lichess, where multiple games run concurrently or multiple positions are analyzed in parallel.

This concept was probably first introduced in the Monty engine (https://github.com/official-monty/Monty/pull/62), after a discussion of memory pressure in https://github.com/official-stockfish/fishtest/issues/2077. Measurements based on Torch (https://github.com/user-attachments/files/21386224/verbatim.pdf) further suggested that large gains were possible. Multiple other engines have adopted this 'verbatim' format as well.

The implementation here adds the flexibility needed for SF: for example, it retains the ability to bundle compressed networks with the binary, to load nets via a UCI option, and to distribute the shared nets to the proper NUMA region. This flexibility comes with a fair amount of implementation complexity, such as OS-specific code and fallback code. For most users this should be transparent. However, users running Docker containers, for example, should ensure that the `--ipc` flag is set correctly and that `--shm-size` is sufficiently large.

The benefits of this patch depend significantly on hardware: systems with many cores and a large L3 cache (on the order of 150 MB, the size of the net) typically benefit most. On such systems, the SF speedup (measured as nps in games played at high concurrency with just 1 thread per side) can reach 38%, which translates into a gain of about 25 Elo for the patch over master.
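The commit does not say how Stockfish detects identical weights or names its shared segments. Purely as a hypothetical illustration of sharing by content, the sketch below uses Python's `multiprocessing.shared_memory`: the segment name is derived from a hash of the weight bytes, so processes loading the same net compute the same name; the first process creates and fills the segment, later ones attach to it, and a byte-by-byte comparison guards against a hash collision or a segment that is still being filled. The function name `attach_shared_weights` and the `sfnet_` naming scheme are invented for this sketch, and Stockfish's actual C++ implementation (with NUMA and large-page handling) will differ.

```python
import hashlib
from multiprocessing import shared_memory
from typing import Optional


def attach_shared_weights(weights: bytes) -> Optional[shared_memory.SharedMemory]:
    """Share `weights` across processes by content (hypothetical sketch).

    Returns a SharedMemory handle whose first len(weights) bytes equal
    `weights`, or None if the caller should fall back to a private copy.
    """
    n = len(weights)
    # Same bytes -> same hash -> same segment name in every process.
    # The "sfnet_" prefix and 16 hex digits are arbitrary choices, kept short
    # because some platforms limit the length of shared-memory names.
    name = "sfnet_" + hashlib.sha256(weights).hexdigest()[:16]
    try:
        # First process: create the segment exclusively and copy the weights in.
        shm = shared_memory.SharedMemory(name=name, create=True, size=n)
        shm.buf[:n] = weights
        return shm
    except FileExistsError:
        pass  # Another process already published a segment under this name.
    try:
        # Later processes: map the existing segment instead of allocating.
        shm = shared_memory.SharedMemory(name=name)
    except (FileNotFoundError, ValueError):
        # The segment vanished, or is still being created (size 0): don't share.
        return None
    # Equivalence is confirmed on the content itself, which catches a
    # truncated-hash collision or a creator that has not finished copying.
    if shm.size < n or shm.buf[:n] != weights:
        shm.close()
        return None
    return shm
```

A process that gets `None` back would keep its own private copy of the weights, loosely analogous to the fallback path the commit mentions. Each process calls `close()` when done, and only one owner should call `unlink()`. Python's resource tracker may also unlink a segment when the process that created it exits, so a real implementation needs more careful lifetime management than this sketch.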
Measured with 1 thread per side:

```
   # PLAYER              :  RATING  ERROR   POINTS  PLAYED   (%)
   1 shared_memoryPR     :    24.8    1.9  39432.0   73728    53
   2 master              :     0.0   ----  34296.0   73728    47
```

In a multithreaded setup, where the weights are already shared between threads, the benefit is smaller. For example, on the same hardware as above but with 8 threads per side:

```
   # PLAYER              :  RATING  ERROR   POINTS  PLAYED   (%)
   1 shared_memoryPR     :     5.2    3.5   9351.0   18432    51
   2 master              :     0.0   ----   9081.0   18432    49
```

On fishtest, with the typical hardware mix of our contributors, the following was measured:

STC, 60k games
https://tests.stockfishchess.org/tests/view/69074a49ea4b268f1fac236c
Elo: 4.69 ± 1.4 (95%) LOS: 100.0%
Total: 60000 W: 16085 L: 15275 D: 28640
Ptnml(0-2): 154, 6440, 16053, 7148, 205
nElo: 9.38 ± 2.8 (95%) PairsRatio: 1.12

To verify correctness with a single process on a NUMA architecture, speedtest was used, confirming near equivalence:

```
master:        Average (over 10): 296236186
shared_memory: Average (over 10): 295769332
```

Currently, using large pages for the shared network weights is not always possible, which can lead to a small slowdown (1-2%) when a single process is run.

closes https://github.com/official-stockfish/Stockfish/pull/6173

No functional change

Co-authored-by: disservin <disservin.social@gmail.com>
Co-authored-by: Joost VandeVondele <Joost.VandeVondele@gmail.com>
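The Ptnml(0-2) line in the fishtest result counts game pairs (the same opening played once with each color) by the tested engine's pair score of 0, 0.5, 1, 1.5 or 2 points. The sketch below recomputes two derived figures from those counts. The formulas are an assumption inferred from how the fishtest numbers appear to be defined, not copied from fishtest's source: nElo is taken as the mean excess score divided by a per-game-equivalent standard deviation (sqrt(2) times the per-pair one), scaled by 800/ln(10), and PairsRatio as pairs won (1.5 or 2 points) over pairs lost (0 or 0.5 points). With the counts quoted above, they reproduce the reported nElo of 9.38 and PairsRatio of 1.12. The function name `pentanomial_stats` is hypothetical.

```python
import math


def pentanomial_stats(ptnml):
    """Return (mean score, nElo, pairs ratio) from Ptnml(0-2) counts.

    ptnml[k] is the number of game pairs in which the tested engine scored
    k / 2 points out of 2, for k = 0..4.
    """
    if len(ptnml) != 5:
        raise ValueError("expected 5 pentanomial counts")
    pairs = sum(ptnml)
    values = [k / 4 for k in range(5)]  # pair score as a fraction: 0, 0.25, ..., 1
    mean = sum(n * v for n, v in zip(ptnml, values)) / pairs
    var = sum(n * (v - mean) ** 2 for n, v in zip(ptnml, values)) / pairs
    # A pair averages two games, so sqrt(2 * var) rescales the pair
    # variance to a per-game-equivalent standard deviation.
    sigma = math.sqrt(2 * var)
    nelo = (mean - 0.5) / sigma * 800 / math.log(10)
    # Pairs won (1.5 or 2 points) divided by pairs lost (0 or 0.5 points).
    pairs_ratio = (ptnml[3] + ptnml[4]) / (ptnml[0] + ptnml[1])
    return mean, nelo, pairs_ratio


mean, nelo, ratio = pentanomial_stats([154, 6440, 16053, 7148, 205])
print(f"score {mean:.5f}  nElo {nelo:.2f}  PairsRatio {ratio:.2f}")
# score 0.50675  nElo 9.38  PairsRatio 1.12
```

Normalizing by the standard deviation makes nElo less sensitive to the draw rate than plain Elo, which is why it is roughly double the 4.69 Elo in this drawish STC run.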