Speed up find_nnz on NEON (refactored)
Passed STC:
LLR: 2.97 (-2.94,2.94) <0.00,2.00>
Total: 349568 W: 90131 L: 89393 D: 170044
Ptnml(0-2): 725, 37712, 97280, 38234, 833
https://tests.stockfishchess.org/tests/view/69e470510e9667dd5a765046
This is a refactor of https://github.com/official-stockfish/Stockfish/pull/6766 so that we don't have to include nnue_architecture.h.
It also includes the fix from https://github.com/official-stockfish/Stockfish/pull/6769, which was needed to make this work.
mstembera made rhirsch123 the author. Credits also to anematode.
[Edit] Also fixes a bug in the IceLake version, where _mm512_packus_epi32 was used instead of the correct _mm512_packs_epi32.
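
For reference, the two intrinsics differ only in how they saturate when narrowing 32-bit lanes to 16 bits: _mm512_packs_epi32 clamps to the signed 16-bit range, while _mm512_packus_epi32 clamps to the unsigned 16-bit range. The scalar sketch below models a single lane of each. The helper names pack_signed_sat and pack_unsigned_sat are hypothetical and are not part of the patch; this message does not say which inputs exposed the bug, so the values are purely illustrative.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of _mm512_packs_epi32:
// narrow a signed 32-bit value to 16 bits with signed saturation.
constexpr std::int16_t pack_signed_sat(std::int32_t v) {
    return v > 32767  ? std::int16_t(32767)
         : v < -32768 ? std::int16_t(-32768)
                      : std::int16_t(v);
}

// Hypothetical scalar model of one lane of _mm512_packus_epi32:
// narrow a signed 32-bit value to 16 bits with unsigned saturation.
constexpr std::uint16_t pack_unsigned_sat(std::int32_t v) {
    return v > 65535 ? std::uint16_t(65535)
         : v < 0     ? std::uint16_t(0)
                     : std::uint16_t(v);
}

// Negative inputs survive the signed pack but are clamped to 0 by the unsigned one.
static_assert(pack_signed_sat(-5) == -5, "signed pack keeps negatives");
static_assert(pack_unsigned_sat(-5) == 0, "unsigned pack clamps negatives to 0");

// Inputs above 32767 are clamped by the signed pack but kept by the unsigned one.
static_assert(pack_signed_sat(40000) == 32767, "signed pack clamps at INT16_MAX");
static_assert(pack_unsigned_sat(40000) == 40000, "unsigned pack keeps up to 65535");
```

The two modes agree for inputs in [0, 32767] and diverge outside that range.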
closes https://github.com/official-stockfish/Stockfish/pull/6774
No functional change