Dev Builds » 20260609-1735

You are viewing an old NCM Stockfish dev build test. You may find the most recent dev build tests using Stockfish 15 as the baseline here.

NCM plays each Stockfish dev build 20,000 times against Stockfish 14. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.
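As a rough illustration of how a win/loss/draw record maps to an Elo difference, the sketch below applies the conventional logistic conversion. The helper name `elo_from_wld` is hypothetical, and NCM's exact computation (including its error bars and the gamepair variant) is not stated on this page, so treat this only as the textbook formula. It assumes the score is strictly between 0 and 1.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper, not NCM's code: converts win/loss/draw counts into an
// Elo difference with the conventional logistic model.
// Assumes at least one game and a score strictly between 0 and 1.
inline double elo_from_wld(std::uint64_t wins, std::uint64_t losses, std::uint64_t draws) {
    const double games = static_cast<double>(wins + losses + draws);
    // Each draw counts as half a point.
    const double score = (static_cast<double>(wins) + 0.5 * static_cast<double>(draws)) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

Under this model an even score gives a difference of 0, and a 75% score corresponds to roughly +191 Elo.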

Summary

Host	Duration	Avg Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo

Test Detail

ID	Host	Base NPS	Games	WLD	Standard Elo	Ptnml(0-2)	Gamepair Elo	CLI	PGN

Commit

Commit ID	8e711c29fe7d5d9b317de46ec5f0cd848e56fbaf
Author	anematode
Date	2026-06-09 17:35:05 UTC
Add wasm32 and wasm32-relaxed-simd targets, and light optimizations

Lichess maintains some patches on top of SF dev to get it working with Emscripten. This PR moves some of these patches into SF and adds WASM to CI. It also adds a few changes in places where the x86 intrinsics don't cleanly map onto WebAssembly SIMD instructions; otherwise, we use Emscripten's x86 compatibility layer and take SSE4.1 code paths.

Summary of the compatibility changes:

- Define `wasm32` and `wasm32-relaxed-simd` targets.
  - We don't support wasm without SIMD; it'd be a waste of time.
- Add option to disable TBs
  - This is required because `tbprobe.cpp` pulls in `mmap`. This option can be used on any target, of course, but it's only enabled by default for wasm.
- Add compilation job + test to CI

And the changes for performance:

- Disable atomics for shared history on wasm
  - Atomics are always `seq_cst` there, which can be quite slow (even on x86, stores are locked `xchg [mem], reg`)
- Add SSE code path to `get_changed_pieces`, modeled after the AVX2 path
- `_mm_mulhi_epi16` has a complicated emulation sequence, so for the pairwise multiplication, use an approach similar to the NEON impl.
- `__int128` gets lowered to runtime functions on wasm, so use the fallback impl for `mul_hi64`
- V8 does a poor job with the NNZ finding, so use a slightly different sequence there
- Add relaxed simd support for `m128_dpbusd`.

Some local perf figures (single-threaded speedtest):

```
wasm       Nodes/second : 902523
sse4.1     Nodes/second : 1155380
avx512icl  Nodes/second : 1676184
```

Further avenues to explore:

- Optimize for performance under V8's experimental AVX revectorizer (currently it's about +10% in my testing, but could definitely be more)
- Branch hinting. For example, run bench while collecting branch frequency info, then inject it late in the WASM compilation pipeline. I tried this locally and it didn't help much, but maybe I'm missing something.
- PGO. Gives +1.5% NPS locally, but hard to integrate with WASM compilation workflows

closes https://github.com/official-stockfish/Stockfish/pull/6875

No functional change
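
The commit message mentions using a fallback for `mul_hi64` because `__int128` is lowered to runtime functions on wasm. A minimal sketch of what such a fallback can look like is below: it computes the high 64 bits of a 64x64-bit product from four 32-bit partial products. The name `mul_hi64_portable` is hypothetical, and the code is an illustration of the general technique and not necessarily the exact code in the Stockfish tree.

```cpp
#include <cstdint>

// Hypothetical name: returns the high 64 bits of the 128-bit product a * b
// without using __int128, by splitting each operand into 32-bit halves.
constexpr std::uint64_t mul_hi64_portable(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t aL = static_cast<std::uint32_t>(a), aH = a >> 32;
    const std::uint64_t bL = static_cast<std::uint32_t>(b), bH = b >> 32;
    // Carry out of the low 32x32 partial product.
    const std::uint64_t c1 = (aL * bL) >> 32;
    // Each sum below fits in 64 bits, so no intermediate overflow occurs.
    const std::uint64_t c2 = aH * bL + c1;
    const std::uint64_t c3 = aL * bH + static_cast<std::uint32_t>(c2);
    return aH * bH + (c2 >> 32) + (c3 >> 32);
}
```

On targets with a native 64x64 to 128-bit multiply, the `__int128` form usually compiles to a single instruction, so a fallback of this shape is mainly useful where the wide type is emulated through library calls.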