Universal binary for x86-64 Linux and Windows
We maintain support for quite a few x86 ISA extensions but relied on the end
user to select the binary that is best for their system. We can detect at
runtime which architecture to use, and ship a single binary (per OS/ISA
combination). This PR does so, maintaining performance.
This is what I've landed on after quite a few iterations. Basically, we build
each arch separately, use `cpuid` to select one, and jump to that arch's main
function. Some details:
- The preprocessor macro `UNIVERSAL_BINARY` is defined when building for the universal binary.
- The Makefile target `universal-object-[no]pgo` is added, which produces a `stockfish.o` in each arch's build directory.
- To prevent symbol collisions between the multiple builds, we `#define Stockfish Stockfish_[arch]`. Furthermore we wrap the `main` function in `namespace Stockfish { }`.
- We can still get PGO data by linking to a per-arch binary, with `Stockfish_x86_64_avx2::main` as `main`, and running `bench`.
- For arches not supported by the host we can use Intel SDE – this is what we do in CI.
- When embedding the NNUE, we use C++26/C23 `#embed` instead of `INCBIN`. The issue with `INCBIN` is that it injects assembly directives that we have no control over.
- To ensure there's only one copy of the networks in the final binary, we define the network data as weak symbols in each per-arch build. Then, in the final link, we add a special nnue_embed.cpp which provides strong symbols.
- This does require GCC 15. Statically linking libstdc++ allows the binary to still run on older OSes though. And latest msys2 already does static linking + is based off GCC 15.2.0
building can be as simple as
make -j profile-build ARCH=x86-64-universal
or using the sde if the host doesn't support all architectures supported in the
universal binary, and the default host compiler is not recent enough.
make -j profile-build ARCH=x86-64-universal RUN_PREFIX="/path/to/sde -future --" CXX=g++-15
This change is also integrated in CI, so universal binaries are available as
downloadable artifacts.
Next we could also do ARM64, especially Android (all Apple silicon users use
the same binary). It also might be worth getting this to work with clang.
closes https://github.com/official-stockfish/Stockfish/pull/6740
No functional change