Add 1GB page support on x86 Linux
1GB pages for the TT give a small performance bump (`./stockfish speedtest 256 32000`):
master:
Nodes/second : 230514479
patch:
Nodes/second : 237997728
With a large enough TT there is still dTLB thrashing with 2MB pages. 1GB pages reduce that and make page table walks a bit faster (the walks are shorter, and there are fewer PTEs, which fit better in cache). `madvise` only gives us 2MB pages, not 1GB pages, so we use `mmap` here.
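For illustration, a minimal sketch of requesting 1GB pages through `mmap` on Linux. This is not the code from the patch: the helper names `try_alloc_1gb_pages`, `free_1gb_pages` and `round_up_to_1gb` are hypothetical, and the sketch assumes 1GB pages have already been reserved in the hugetlb pool. If they have not, `mmap` fails and the caller is expected to fall back to the existing allocation path.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Fallbacks for older libc headers; the values match the Linux UAPI
// (huge page size is encoded as log2(size) shifted by MAP_HUGE_SHIFT).
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

constexpr std::size_t kGiB = std::size_t(1) << 30;

// Rounds a byte count up to a whole number of 1GB pages.
constexpr std::size_t round_up_to_1gb(std::size_t bytes) {
    return (bytes + kGiB - 1) / kGiB * kGiB;
}

// Hypothetical helper: tries to map `bytes` of anonymous memory backed by
// 1GB pages. Returns nullptr if the kernel cannot satisfy the request, so
// the caller can fall back to another allocation strategy.
inline void* try_alloc_1gb_pages(std::size_t bytes) {
    assert(bytes > 0);
    void* mem = mmap(nullptr, round_up_to_1gb(bytes), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB, -1, 0);
    return mem == MAP_FAILED ? nullptr : mem;
}

// Releases a mapping obtained from try_alloc_1gb_pages. The length must be
// the same rounded-up size that was mapped.
inline void free_1gb_pages(void* mem, std::size_t bytes) {
    if (mem)
        munmap(mem, round_up_to_1gb(bytes));
}
```

The rounding step is why small hash sizes are a concern: a request slightly above 1 GiB occupies two full 1GB pages.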
We restrict the huge page attempt to x86 and to at least 8 huge pages per NUMA node, to avoid memory oversubscription when, for example, running with a 1.1 GB hash. Additionally, we change the TT clearing slightly so that the pages are better distributed across NUMA nodes.
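The guard can be sketched roughly as follows. This is an assumption about the shape of the check, not the patch's code: the names `is_x86_build`, `enough_pages_per_node` and `should_try_1gb_pages` are hypothetical, and the sketch reads "8 huge pages per NUMA node" as "the TT share of each node is at least 8 GiB". Under that reading, rounding up to whole 1GB pages costs less than one page per node on top of at least 8, whereas a 1.1 GB hash would be rounded up to 2 GiB.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHugePageBytes       = std::size_t(1) << 30;  // 1 GiB
constexpr std::size_t kMinHugePagesPerNode = 8;                     // threshold from the description

// Hypothetical compile-time architecture check.
constexpr bool is_x86_build() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    return true;
#else
    return false;
#endif
}

// Assumed policy: each NUMA node's share of the TT must cover at least
// kMinHugePagesPerNode full 1GB pages.
inline bool enough_pages_per_node(std::size_t ttBytes, std::size_t numaNodes) {
    assert(numaNodes > 0);
    return ttBytes / numaNodes >= kMinHugePagesPerNode * kHugePageBytes;
}

// Combined guard: only attempt 1GB pages on x86 with a large enough TT.
inline bool should_try_1gb_pages(std::size_t ttBytes, std::size_t numaNodes) {
    return is_x86_build() && enough_pages_per_node(ttBytes, numaNodes);
}
```

Keeping the size policy separate from the architecture check makes the threshold easy to test on any platform.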
Data from Torom:
master:
Nodes/second : 43524292
patch:
Nodes/second : 44080034
Data on an 8-thread Emerald Rapids VM (`./stockfish speedtest 8 65536`):
Baseline speedtest (2MB huge pages):
Nodes/second : 7366232
Nodes/second : 7262794
Nodes/second : 7355431
Nodes/second : 7318777
Patch speedtest (1GB huge pages):
Nodes/second : 7541027
Nodes/second : 7572720
Nodes/second : 7550268
Nodes/second : 7538925
closes https://github.com/official-stockfish/Stockfish/pull/6681
No functional change