Prefetch earlier if checkEP is false
Only a modest amount of work happens between the transposition table prefetch
and the probe, so the probe still often stalls waiting for DRAM. The vast
majority of the time (in particular, if !checkEP), the key is known much
earlier in the do_move function and the latency can be better hidden.
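
A minimal sketch of the idea, not the actual patch: Entry, Table, slot,
do_move_sketch, moveDelta and epDelta are hypothetical names invented for
illustration, and the key updates are reduced to plain XORs. It only shows
where the prefetch is issued depending on checkEP.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not Stockfish's actual types.
struct Entry {
    std::uint64_t key  = 0;
    std::uint64_t data = 0;
};

struct Table {
    std::vector<Entry> entries;
    explicit Table(std::size_t n) : entries(n) {}  // n must be > 0
    const Entry* slot(std::uint64_t key) const { return &entries[key % entries.size()]; }
};

// Prefetch is only a hint to the CPU; it never changes program results.
inline void prefetch(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);
#else
    (void) addr;  // no-op on compilers without the builtin
#endif
}

enum class PrefetchPoint { Early, Late };

// Simplified move-making routine. moveDelta and epDelta are assumed key
// updates; the real key computation is more involved.
PrefetchPoint do_move_sketch(const Table&   tt,
                             std::uint64_t& key,
                             std::uint64_t  moveDelta,
                             bool           checkEP,
                             std::uint64_t  epDelta) {
    key ^= moveDelta;

    // Common case: the key is already final, so start the memory access now.
    if (!checkEP)
        prefetch(tt.slot(key));

    // ... the remaining move-making work would run here, overlapping the fetch ...

    if (checkEP)
    {
        key ^= epDelta;  // the key only becomes final at this point
        prefetch(tt.slot(key));
        return PrefetchPoint::Late;
    }
    return PrefetchPoint::Early;
}
```

Since the prefetch is only a hint, moving it changes timing but not search
results.
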
Passed STC SMP:
https://tests.stockfishchess.org/tests/view/68f337c528e6d77fcffa066a
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 65256 W: 16806 L: 16462 D: 31988
Ptnml(0-2): 76, 7386, 17362, 7726, 78
Failed to gain at STC:
https://tests.stockfishchess.org/tests/view/68f3378328e6d77fcffa0665
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 109824 W: 28523 L: 28618 D: 52683
Ptnml(0-2): 311, 11799, 30788, 11702, 312
In local tests, the speedup grows with thread count.
closes https://github.com/official-stockfish/Stockfish/pull/6372
No functional change