Faster movepick sorting on AVX512ICL
The core idea here is to greatly speed up the backwards insertion loop of the insertion sort. We can store up to 16 sorted moves and their values/scores in vectors. We obtain all moves with a smaller score (i.e., sorted later than the move-to-insert) as a mask, giving us something like `1111111111000000`. Subtracting one gives `1111111110111111`, which can be used as an expand mask: the single cleared bit marks the lane left open for the new move.
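To illustrate the mask arithmetic, here is a minimal scalar sketch of the idea. It is not the code from this patch: it models a zero-masking expand with a plain loop instead of AVX-512 intrinsics, and the names (`Lanes`, `EMPTY`, `expand`, `smaller_mask`, `insert_sorted`) are hypothetical. It assumes scores are kept in descending order, that unused lanes hold a sentinel lower than any real score, and that at least one lane is still free.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int LANES = 16;
// Assumed sentinel for unused lanes: compares smaller than any real score.
constexpr int EMPTY = std::numeric_limits<int>::min();

using Lanes = std::array<int, LANES>;

// Scalar model of a zero-masking expand: consecutive source elements are
// written to the lanes whose mask bit is set; all other lanes become zero.
Lanes expand(std::uint16_t mask, const Lanes& src) {
    Lanes dst{};
    int next = 0;
    for (int i = 0; i < LANES; ++i)
        if (mask & (1u << i))
            dst[i] = src[next++];
    return dst;
}

// Bit i is set if lane i holds a score smaller than the new one.
// For a descending-sorted input the set bits form one run at the high end.
std::uint16_t smaller_mask(const Lanes& sorted, int score) {
    std::uint16_t m = 0;
    for (int i = 0; i < LANES; ++i)
        if (sorted[i] < score)
            m = std::uint16_t(m | (1u << i));
    return m;
}

// Insert `score` into a descending-sorted set of lanes without a shift loop.
Lanes insert_sorted(const Lanes& sorted, int score) {
    std::uint16_t smaller = smaller_mask(sorted, score);
    assert(smaller != 0 && "needs at least one free (sentinel) lane");

    // Subtracting one clears the lowest set bit and sets every bit below it,
    // leaving exactly one zero bit: the insertion position.
    std::uint16_t expandMask = std::uint16_t(smaller - 1);
    std::uint16_t hole = std::uint16_t(~expandMask);

    // Lanes below the hole stay put, lanes above it move up by one.
    Lanes out = expand(expandMask, sorted);
    for (int i = 0; i < LANES; ++i)
        if (hole & (1u << i))
            out[i] = score;
    return out;
}
```

Starting from lanes filled with `EMPTY` and inserting 50, 10, 30, 70 and 30 in that order leaves 70, 50, 30, 30, 10 in the first five lanes. Equal scores are not counted as smaller, so in this sketch a new move lands after existing moves with the same score.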
Passed STC:
https://tests.stockfishchess.org/tests/view/69e9c9190e9667dd5a765853
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 45024 W: 11867 L: 11550 D: 21607
Ptnml(0-2): 101, 4938, 12142, 5205, 126
Local test:
```
Result of 50 runs
==================
base (...kfish.master) = 2026071 +/- 5124
test (./stockfish ) = 2083770 +/- 5964
diff = +57699 +/- 8535
speedup = +0.0285
P(speedup > 0) = 1.0000
```
closes https://github.com/official-stockfish/Stockfish/pull/6768
No functional change