arm64 universal binaries redux
Introduce the armv8-universal target. 🥴
The entry point is essentially the same as x86's, except we no longer have access to `__builtin_cpu_supports` so instead we need an OS-specific query for whether we support dotprod. The Makefile is modified to support both universal builds.
If in the future we add more ARM targets, such as SVE, we'll need to add qemu to the RUN_PREFIX in CI, because currently we assume (for PGO purposes) that the CI host supports all the used ARM instructions.
### clang/Windows
The painful part here is clang on Windows, which, until arm64 mingw is stabilized, is required for targeting arm64. This PR also gets it to work for x86. In the Makefile this setup corresponds to `use_lto_emit_asm=yes`.
In particular, `--defsym` and `--save-temps` are not supported by `lld-link`, and objcopy `--rename-section` doesn't work on COFF binaries because of how section names work there.
- `--defsym` is needed to define `main` for PGO purposes and assigns it to the namespaced, per-arch main function. Instead, we define `main` in `main.cpp` so that the compilation is successful, then delete it before the final link.
- Instead of `--save-temps` to get the LTO intermediate object, we pass `--lto-emit-asm` to the linker, which outputs `stockfish.exe.lto.s`.
- Finally, we have a small AWK script to find the `.ctors` section, neuter it, and put start/stop symbols around it with the same naming scheme as ELF (`__start_*_init`/`__stop_*_init`).
I'm lowk a Windows programming noob so if there's simpler ways of going about this, I'd appreciate a pointer. @PikaCat-OuO + Codex used an approach that involved going in and modifying the LLVM bitcode, but that felt more complicated to me.
closes https://github.com/official-stockfish/Stockfish/pull/6823
No functional change
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>