Prefetch threat weight rows during append_changed_indices
Passed STC (first run):
https://tests.stockfishchess.org/tests/view/698030ee6362aee5c8a552c7
LLR: 2.96 (-2.94,2.94) <0.00,2.00>
Total: 120800 W: 31336 L: 30912 D: 58552
Ptnml(0-2): 330, 13084, 33172, 13460, 354
Passed LTC:
https://tests.stockfishchess.org/tests/view/6983946d473df9d1d24a916d
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 86850 W: 22352 L: 21951 D: 42547
Ptnml(0-2): 45, 8506, 25934, 8883, 57
Passed STC SMP (after changes made after opening the PR):
https://tests.stockfishchess.org/tests/view/6987b4c1b0f3ca5200aaf9da
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 84480 W: 21994 L: 21627 D: 40859
Ptnml(0-2): 98, 9378, 22932, 9723, 109
Passed STC (second run, a sanity check after the changes made after opening the PR; these changes did not alter the assembly code produced by the compiler):
https://tests.stockfishchess.org/tests/view/6987ad21b0f3ca5200aaf9c9
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 132192 W: 34526 L: 34083 D: 63583
Ptnml(0-2): 421, 14511, 35759, 15014, 391
GCC maps __builtin_prefetch with locality=1 to the PREFETCHT2 instruction,
which targets the L2 cache. To make this explicit, a template-based approach
was added, as suggested by Disservin.
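A minimal sketch of what a template-based prefetch helper can look like. The names PrefetchHint and prefetch are hypothetical and not necessarily those used in the patch. The idea is that the hint is a template parameter, so it is a compile-time constant (which __builtin_prefetch requires), and each call site spells out the instruction it wants.

```cpp
// Hypothetical enum: each value is the __builtin_prefetch locality argument
// that GCC lowers to the named x86 instruction.
enum class PrefetchHint : int {
    NTA = 0,  // PREFETCHNTA
    T2  = 1,  // PREFETCHT2 (the choice for the threat weight rows)
    T1  = 2,  // PREFETCHT1
    T0  = 3,  // PREFETCHT0 (all cache levels, including L1d)
};

// Hypothetical helper: the hint is a template parameter, so the locality
// argument passed to __builtin_prefetch is always a compile-time constant.
template<PrefetchHint Hint = PrefetchHint::T0>
inline void prefetch(const void* addr) {
#if defined(__GNUC__)
    __builtin_prefetch(addr, 0 /* prefetch for read */, static_cast<int>(Hint));
#else
    (void) addr;  // sketch only: other compilers get a no-op
#endif
}
```

A call site then reads `prefetch<PrefetchHint::T2>(row);`, which makes the PREFETCHT2 intent visible without having to remember that locality=1 is what produces it.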
PREFETCHT2 should be the appropriate choice for the 78 MB threat weight table:
it avoids polluting L1d and parks the data in L2, from where it is promoted to
L1d on the actual load.
With a lead time of 200-500 cycles from make_index, the data has time to
arrive in L2.
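The placement described above (issue the prefetch as soon as make_index produces a row index, read the row later in the accumulator update) can be sketched as follows. This is a simplified illustration under stated assumptions: Dims, the make_index stand-in, the signature of append_changed_indices and the add_rows function are all hypothetical and do not match Stockfish's actual code or table layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row width; real threat weight rows are much wider.
constexpr std::size_t Dims = 8;

// Hypothetical stand-in for the real make_index: maps a feature to a row.
inline std::size_t make_index(int feature) { return static_cast<std::size_t>(feature); }

// Collect changed indices and prefetch each weight row as soon as its index
// is known, giving the row time to reach L2 before it is read.
void append_changed_indices(const std::vector<int>& features,
                            const std::int16_t* weights,
                            std::vector<std::size_t>& added) {
    for (int f : features) {
        const std::size_t idx = make_index(f);
        added.push_back(idx);
#if defined(__GNUC__)
        // locality 1 -> PREFETCHT2. Only the row's first cache line is
        // prefetched; a row spanning several lines would need one per line.
        __builtin_prefetch(weights + idx * Dims, 0, 1);
#endif
    }
}

// Later: the accumulator update performs the actual loads of the rows.
void add_rows(const std::vector<std::size_t>& added,
              const std::int16_t* weights,
              std::int16_t* acc) {
    assert(weights != nullptr && acc != nullptr);
    for (std::size_t idx : added)
        for (std::size_t i = 0; i < Dims; ++i)
            acc[i] = static_cast<std::int16_t>(acc[i] + weights[idx * Dims + i]);
}
```

The work done between append_changed_indices and add_rows is what provides the lead time; if the rows were read immediately after being prefetched, the prefetch would hide little or no latency.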
Suggestions by Disservin and anematode are included.
closes https://github.com/official-stockfish/Stockfish/pull/6602
No functional change