Speed up feature transform on NEON
STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 109344 W: 27919 L: 27520 D: 53905
Ptnml(0-2): 200, 11718, 30434, 12123, 197
https://tests.stockfishchess.org/tests/view/6a123ed0818cacc1db0ac39e
Updated to use anematode’s suggestion; apparently clang was already optimizing the previous code to this form.
Apple M1
Result of 100 runs
base (...kfish.master) = 1307276 +/- 10930
test (./stockfish ) = 1320940 +/- 11279
diff = +13664 +/- 2223
speedup = +0.0105
P(speedup > 0) = 1.0000
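For reference, the speedup figure above can be reproduced from the reported means. This is only a sketch of the arithmetic, assuming the speedup is the mean difference divided by the base mean; the variable names are illustrative and not taken from the benchmark script.

```python
# Values copied from the benchmark output above (nodes per second).
base_nps = 1307276
test_nps = 1320940
diff_nps = 13664

# Assumption: speedup is the mean difference relative to the base mean.
speedup = diff_nps / base_nps

print(f"diff    = {test_nps - base_nps:+d}")
print(f"speedup = {speedup:+.4f}")
```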
closes https://github.com/official-stockfish/Stockfish/pull/6855
No functional change