Dev Builds » 20220210-1854

You are viewing an old NCM Stockfish dev build test. More recent dev build tests use Stockfish 15 as the baseline.


NCM plays each Stockfish dev build 20,000 times against Stockfish 14. This yields an approximate Elo difference and establishes confidence in the strength of the dev builds.
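As a rough illustration of how a win/loss/draw record maps to an Elo difference, the sketch below applies the standard logistic Elo formula to the 20,000-game totals in the Summary table. The function name `elo_from_wld` is hypothetical, and whether NCM uses exactly this formula is an assumption; the result does land close to the reported +81.59.

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a W/L/D record under the logistic Elo model.

    Assumes the score is strictly between 0 and 1 (otherwise the
    logarithm is undefined).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game
    return -400.0 * math.log10(1.0 / score - 1.0)

# Totals from the Summary table: 7383 wins, 2771 losses, 9846 draws.
print(f"{elo_from_wld(7383, 2771, 9846):+.2f}")
```

The error margin (the "+/-" part) is not computed here; it depends on the variance model, which the page does not describe.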

Summary

Host | Duration | Avg Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo
ncm-dbt-01 | 10:07:11 | 1176741 | 3366 | 1260 | 481 | 1625 | +81.89 +/- 5.67 | 1, 110, 711, 831, 30 | +166.5 +/- 12.76
ncm-dbt-02 | 10:03:53 | 1234375 | 3310 | 1235 | 459 | 1616 | +83.0 +/- 5.75 | 1, 113, 677, 837, 27 | +169.79 +/- 13.09
ncm-dbt-03 | 10:08:45 | 1268887 | 3330 | 1217 | 448 | 1665 | +81.71 +/- 5.82 | 2, 113, 701, 812, 37 | +164.43 +/- 12.86
ncm-dbt-04 | 10:08:17 | 1252251 | 3352 | 1271 | 453 | 1628 | +86.53 +/- 5.75 | 0, 109, 676, 855, 36 | +175.7 +/- 13.1
ncm-dbt-05 | 10:04:19 | 1264652 | 3318 | 1197 | 471 | 1650 | +77.27 +/- 5.72 | 0, 127, 700, 811, 21 | +157.64 +/- 12.87
ncm-dbt-06 | 10:07:30 | 1262587 | 3324 | 1203 | 459 | 1662 | +79.1 +/- 5.87 | 2, 125, 696, 805, 34 | +159.1 +/- 12.91
Total | | | 20000 | 7383 | 2771 | 9846 | +81.59 +/- 2.35 | 6, 697, 4161, 4951, 185 | +165.49 +/- 5.28
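The Gamepair Elo column can be approximated from the Ptnml(0-2) counts. The sketch below assumes (this is not stated on the page) that each game pair is treated as a single result: pairs scoring 0 or 0.5 points count as losses, 1 point as a draw, and 1.5 or 2 points as wins, with the same logistic Elo formula applied to that record. The function name `gamepair_elo` is hypothetical. Under these assumptions the 20,000-game totals give a value close to the reported +165.49.

```python
import math

def gamepair_elo(ptnml):
    """Elo estimate from pentanomial counts, treating each game pair as one result.

    ptnml holds the number of pairs that scored 0, 0.5, 1, 1.5 and 2 points.
    """
    pair_losses = ptnml[0] + ptnml[1]
    pair_draws = ptnml[2]
    pair_wins = ptnml[3] + ptnml[4]
    pairs = pair_losses + pair_draws + pair_wins
    score = (pair_wins + 0.5 * pair_draws) / pairs  # average points per pair
    return -400.0 * math.log10(1.0 / score - 1.0)

# Ptnml(0-2) totals from the Summary table (10000 pairs = 20000 games).
print(f"{gamepair_elo([6, 697, 4161, 4951, 185]):+.2f}")
```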

Test Detail

ID | Host | Base NPS | Games | W | L | D | Standard Elo | Ptnml(0-2) | Gamepair Elo
205152 | ncm-dbt-05 | 1230266 | 150 | 54 | 23 | 73 | +72.85 +/- 26.11 | 0, 5, 35, 34, 1 | +147.19 +/- 57.98
205151 | ncm-dbt-02 | 1192527 | 152 | 53 | 22 | 77 | +71.86 +/- 26.65 | 0, 6, 34, 35, 1 | +145.02 +/- 59.12
205150 | ncm-dbt-06 | 1231825 | 156 | 56 | 22 | 78 | +76.96 +/- 23.94 | 0, 4, 36, 38, 0 | +162.31 +/- 56.98
205149 | ncm-dbt-03 | 1234697 | 162 | 54 | 19 | 89 | +76.26 +/- 24.15 | 0, 5, 36, 40, 0 | +160.68 +/- 57.3
205148 | ncm-dbt-04 | 1221892 | 184 | 72 | 16 | 96 | +109.2 +/- 21.59 | 0, 3, 30, 59, 0 | +245.58 +/- 63.59
205147 | ncm-dbt-01 | 1135538 | 196 | 74 | 26 | 96 | +86.85 +/- 25.25 | 0, 8, 37, 50, 3 | +172.42 +/- 56.74
205146 | ncm-dbt-05 | 1231114 | 500 | 185 | 77 | 238 | +76.25 +/- 15.09 | 0, 21, 104, 121, 4 | +153.86 +/- 33.54
205145 | ncm-dbt-02 | 1196267 | 500 | 185 | 69 | 246 | +82.1 +/- 15.39 | 1, 17, 103, 123, 6 | +165.8 +/- 33.7
205144 | ncm-dbt-06 | 1234298 | 500 | 179 | 62 | 259 | +82.83 +/- 15.41 | 1, 19, 96, 130, 4 | +171.02 +/- 34.92
205143 | ncm-dbt-03 | 1249722 | 500 | 183 | 70 | 247 | +79.9 +/- 15.46 | 0, 20, 104, 119, 7 | +157.24 +/- 33.54
205142 | ncm-dbt-04 | 1209185 | 500 | 179 | 66 | 255 | +79.9 +/- 15.2 | 0, 22, 96, 129, 3 | +164.07 +/- 34.9
205141 | ncm-dbt-01 | 1119184 | 500 | 178 | 75 | 247 | +72.61 +/- 15.5 | 0, 23, 107, 114, 6 | +142.26 +/- 33.06
205140 | ncm-dbt-05 | 1238439 | 500 | 178 | 61 | 261 | +82.83 +/- 14.86 | 0, 19, 98, 130, 3 | +171.02 +/- 34.58
205139 | ncm-dbt-02 | 1187103 | 500 | 181 | 72 | 247 | +76.97 +/- 14.97 | 0, 22, 99, 127, 2 | +158.93 +/- 34.38
205138 | ncm-dbt-06 | 1224894 | 500 | 168 | 75 | 257 | +65.38 +/- 15.1 | 1, 20, 119, 105, 5 | +129.35 +/- 31.2
205137 | ncm-dbt-03 | 1237767 | 500 | 178 | 58 | 264 | +85.04 +/- 14.92 | 0, 19, 95, 133, 3 | +176.33 +/- 35.12
205136 | ncm-dbt-04 | 1216156 | 500 | 202 | 69 | 229 | +94.69 +/- 15.4 | 0, 16, 93, 133, 8 | +190.85 +/- 35.52
205135 | ncm-dbt-01 | 1139238 | 500 | 187 | 70 | 243 | +82.83 +/- 14.29 | 0, 15, 106, 126, 3 | +171.02 +/- 33.15
205134 | ncm-dbt-05 | 1223864 | 500 | 189 | 58 | 253 | +93.2 +/- 13.91 | 0, 12, 97, 139, 2 | +198.34 +/- 34.72
205133 | ncm-dbt-06 | 1215396 | 500 | 174 | 66 | 260 | +76.25 +/- 14.95 | 0, 20, 106, 120, 4 | +153.86 +/- 33.21
205132 | ncm-dbt-02 | 1194719 | 500 | 181 | 71 | 248 | +77.7 +/- 15.4 | 0, 20, 107, 116, 7 | +152.18 +/- 33.05
205131 | ncm-dbt-04 | 1223813 | 500 | 187 | 70 | 243 | +82.83 +/- 14.72 | 0, 14, 112, 117, 7 | +164.07 +/- 32.12
205130 | ncm-dbt-03 | 1215327 | 500 | 176 | 73 | 251 | +72.61 +/- 14.97 | 0, 22, 106, 119, 3 | +147.19 +/- 33.22
205129 | ncm-dbt-01 | 1129503 | 500 | 192 | 73 | 235 | +84.3 +/- 14.33 | 0, 13, 110, 122, 5 | +171.02 +/- 32.42
205128 | ncm-dbt-05 | 1221791 | 500 | 178 | 66 | 256 | +79.17 +/- 15.04 | 0, 20, 102, 124, 4 | +160.64 +/- 33.88
205127 | ncm-dbt-06 | 1229429 | 500 | 183 | 64 | 253 | +84.3 +/- 15.59 | 0, 20, 98, 125, 7 | +167.53 +/- 34.57
205126 | ncm-dbt-03 | 1224630 | 500 | 201 | 77 | 222 | +88.0 +/- 15.81 | 0, 17, 103, 119, 11 | +169.27 +/- 33.7
205125 | ncm-dbt-02 | 1202266 | 500 | 190 | 70 | 240 | +85.04 +/- 14.2 | 0, 14, 105, 128, 3 | +176.33 +/- 33.3
205124 | ncm-dbt-04 | 1217490 | 500 | 186 | 73 | 241 | +79.9 +/- 14.5 | 0, 15, 112, 118, 5 | +160.64 +/- 32.15
205123 | ncm-dbt-01 | 1123329 | 500 | 179 | 80 | 241 | +69.71 +/- 14.45 | 0, 20, 113, 115, 2 | +142.26 +/- 32.1
205122 | ncm-dbt-06 | 1241890 | 500 | 197 | 78 | 225 | +84.3 +/- 14.76 | 0, 17, 101, 128, 4 | +172.78 +/- 34.05
205121 | ncm-dbt-05 | 1236249 | 500 | 178 | 87 | 235 | +63.94 +/- 14.91 | 0, 22, 120, 103, 5 | +124.6 +/- 31.07
205120 | ncm-dbt-03 | 1237366 | 500 | 191 | 61 | 248 | +92.45 +/- 15.36 | 0, 15, 99, 127, 9 | +183.51 +/- 34.39
205119 | ncm-dbt-02 | 1198210 | 500 | 193 | 65 | 242 | +90.96 +/- 14.47 | 0, 14, 98, 134, 4 | +189.0 +/- 34.56
205118 | ncm-dbt-01 | 1143011 | 500 | 196 | 67 | 237 | +91.71 +/- 14.19 | 0, 12, 101, 133, 4 | +190.85 +/- 33.97
205117 | ncm-dbt-04 | 1224986 | 500 | 195 | 61 | 244 | +95.44 +/- 14.84 | 0, 15, 91, 139, 5 | +198.34 +/- 35.92
205116 | ncm-dbt-05 | 1235332 | 500 | 173 | 80 | 247 | +65.38 +/- 15.23 | 0, 27, 105, 116, 2 | +132.54 +/- 33.35
205115 | ncm-dbt-06 | 1214757 | 500 | 184 | 62 | 254 | +86.52 +/- 15.51 | 0, 18, 100, 124, 8 | +171.02 +/- 34.23
205114 | ncm-dbt-02 | 1195074 | 500 | 188 | 66 | 246 | +86.52 +/- 14.67 | 0, 17, 97, 133, 3 | +179.9 +/- 34.76
205113 | ncm-dbt-01 | 1132283 | 500 | 191 | 69 | 240 | +86.52 +/- 15.09 | 1, 14, 103, 126, 6 | +176.33 +/- 33.67
205112 | ncm-dbt-04 | 1206382 | 500 | 182 | 77 | 241 | +74.06 +/- 15.42 | 0, 21, 110, 112, 7 | +143.89 +/- 32.58
205111 | ncm-dbt-03 | 1235670 | 500 | 175 | 66 | 259 | +76.98 +/- 13.83 | 1, 9, 124, 112, 4 | +157.24 +/- 30.01
177511 | ncm-dbt-02 | 1508841 | 158 | 64 | 24 | 70 | +89.91 +/- 24.31 | 0, 3, 34, 41, 1 | +187.93 +/- 58.79
177510 | ncm-dbt-03 | 1515922 | 168 | 59 | 24 | 85 | +73.46 +/- 26.32 | 1, 6, 34, 43, 0 | +159.18 +/- 59.27
177509 | ncm-dbt-01 | 1491849 | 170 | 63 | 21 | 86 | +87.64 +/- 24.65 | 0, 5, 34, 45, 1 | +182.77 +/- 59.3
177508 | ncm-dbt-06 | 1508214 | 168 | 62 | 30 | 76 | +66.99 +/- 25.92 | 0, 7, 40, 35, 2 | +129.8 +/- 54.21
177507 | ncm-dbt-04 | 1498109 | 168 | 68 | 21 | 79 | +99.86 +/- 23.61 | 0, 3, 32, 48, 1 | +213.66 +/- 61.13
177506 | ncm-dbt-05 | 1500163 | 168 | 62 | 19 | 87 | +90.95 +/- 20.83 | 0, 1, 39, 44, 0 | +196.41 +/- 53.41

Commit

Commit ID cb9c2594fcedc881ae8f8bfbfdf130cf89840e4c
Author Tomasz Sobczyk
Date 2022-02-10 18:54:31 UTC
Update architecture to "SFNNv4". Update network to nn-6877cd24400e.nnue.

Architecture:

The diagram of the "SFNNv4" architecture:
https://user-images.githubusercontent.com/8037982/153455685-cbe3a038-e158-4481-844d-9d5fccf5c33a.png

The most important architectural changes are the following:

* 1024x2 [activated] neurons are pairwise, elementwise multiplied (not quite pairwise due to implementation details, see diagram), which introduces a non-linearity that exhibits similar benefits to the previously tested sigmoid activation (quantmoid4), while being slightly faster.
* The following layer therefore has 2x fewer inputs, which we compensate for by having 2 more outputs. It is possible that reducing the number of outputs might be beneficial (as we had it as low as 8 before). The layer is now 1024->16.
* The 16 outputs are split into 15 and 1. The 1-wide output is added to the network output (after some necessary scaling due to quantization differences). The 15-wide output is activated and follows the usual path through a set of linear layers. The additional 1-wide output is at least neutral, but has shown a slightly positive trend in training compared to networks without it (all 16 outputs through the usual path), and possibly allows an additional stage of lazy evaluation to be introduced in the future.

Additionally, the inference code was rewritten and no longer uses a recursive implementation. This was necessitated by the splitting of the 16-wide intermediate result into two, which was impossible to do with the old implementation without ugly hacks. This is hopefully overall for the better.

First session:

The first session trained a network from scratch (random initialization). The exact trainer used was slightly different (older) from the one used in the second session, but it should not have a measurable effect. The purpose of this session is to establish a strong network base for the second session. Small deviations in strength do not harm the learnability in the second session.
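The architectural changes listed in the commit message (pairwise multiplication, the 1024->16 layer, and the 15/1 split) can be sketched in plain Python as follows. This is a floating-point toy model under several assumptions that the commit text does not confirm: the activation is taken to be a clipped ReLU on [0, 1], each perspective's 1024 accumulator values are multiplied as first half times second half, and the 1-wide output is taken to be the last of the 16. All function names are hypothetical, and quantization and scaling are ignored.

```python
import random

def clipped_relu(x, lo=0.0, hi=1.0):
    """Clamp x to [lo, hi] (assumed activation)."""
    return max(lo, min(hi, x))

def pairwise_multiply(acc):
    """Activate one perspective's accumulator and multiply its two halves elementwise."""
    half = len(acc) // 2
    act = [clipped_relu(v) for v in acc]
    return [act[i] * act[i + half] for i in range(half)]

def sfnnv4_head(acc_us, acc_them, weights, biases):
    """Toy version of the 1024->16 layer followed by the 15/1 split.

    weights: 16 rows of 1024 floats, biases: 16 floats (illustrative values only).
    Returns the 15 activated outputs and the 1-wide output that would be
    added to the final network output.
    """
    x = pairwise_multiply(acc_us) + pairwise_multiply(acc_them)  # 512 + 512 inputs
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    hidden = [clipped_relu(h) for h in out[:15]]  # continues through later layers
    skip = out[15]                                # assumed position of the 1-wide output
    return hidden, skip

rng = random.Random(0)
acc_us = [rng.uniform(-1.0, 2.0) for _ in range(1024)]
acc_them = [rng.uniform(-1.0, 2.0) for _ in range(1024)]
weights = [[rng.uniform(-0.01, 0.01) for _ in range(1024)] for _ in range(16)]
biases = [0.0] * 16
hidden, skip = sfnnv4_head(acc_us, acc_them, weights, biases)
print(len(hidden), skip)
```

The multiplication halves the width of each perspective (1024 to 512), which is why the following layer sees 1024 inputs rather than 2048.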
The training was done using the following command:

    python3 train.py \
        /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
        /home/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
        --gpus "$3," \
        --threads 4 \
        --num-workers 4 \
        --batch-size 16384 \
        --progress_bar_refresh_rate 20 \
        --random-fen-skipping 3 \
        --features=HalfKAv2_hm^ \
        --lambda=1.0 \
        --gamma=0.992 \
        --lr=8.75e-4 \
        --max_epochs=400 \
        --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$2

Every 20th net was saved and its playing strength measured against some baseline at 25k nodes per move with pure NNUE evaluation (modified binary). The exact setup is not important as long as it's consistent. The purpose is to sift good candidates from bad ones.

The dataset can be found at https://drive.google.com/file/d/1UQdZN_LWQ265spwTBwDKo0t1WjSJKvWY/view

Second session:

The second training session was started from the best network (as determined by strength testing) from the first session. It is important that it's resumed from a .pt model and NOT a .ckpt model. The conversion can be performed directly using serialize.py.

The LR schedule was modified to use gamma=0.995 instead of gamma=0.992 and LR=4.375e-4 instead of LR=8.75e-4 to flatten the LR curve and allow for longer training. The training then ran for 800 epochs instead of 400 (though it's possibly mostly noise after around epoch 600).
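To make the effect of the two LR settings concrete, the sketch below compares them under the assumption that the learning rate is multiplied by gamma once per epoch (the commit message does not spell out how the trainer applies gamma, so treat this as illustrative). The function name `lr_at_epoch` is hypothetical.

```python
def lr_at_epoch(base_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` multiplicative decay steps (assumed schedule)."""
    return base_lr * gamma ** epoch

# First session: lr=8.75e-4, gamma=0.992, 400 epochs.
# Second session: lr=4.375e-4, gamma=0.995, 800 epochs.
for epoch in (0, 200, 400, 600, 800):
    first = lr_at_epoch(8.75e-4, 0.992, epoch) if epoch <= 400 else None
    second = lr_at_epoch(4.375e-4, 0.995, epoch)
    print(epoch, first, second)
```

Under this assumption the second session starts at half the learning rate but decays more slowly, so by epoch 400 it is above the first session's value at the same epoch.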
The training was done using the following command:

    python3 train.py \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        /data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
        --gpus "$3," \
        --threads 4 \
        --num-workers 4 \
        --batch-size 16384 \
        --progress_bar_refresh_rate 20 \
        --random-fen-skipping 3 \
        --features=HalfKAv2_hm^ \
        --lambda=1.0 \
        --gamma=0.995 \
        --lr=4.375e-4 \
        --max_epochs=800 \
        --resume-from-model /data/sopel/nnue/nnue-pytorch-training/data/exp295/nn-epoch399.pt \
        --default_root_dir ../nnue-pytorch-training/experiment_$1/run_$run_id

In particular, note that we now use lambda=1.0 instead of lambda=0.8 (previous nets), because tests show that the WDL-skipping introduced by vondele performs better with lambda=1.0. Nets were saved every 20th epoch. In total, 16 runs were made with these settings and the best nets were chosen according to playing strength at 25k nodes per move with pure NNUE evaluation - these are the 4 nets that have been put on fishtest.

The dataset can be found either at ftp://ftp.chessdb.cn/pub/sopel/data_sf/T60T70wIsRightFarseerT60T74T75T76.binpack in its entirety (download might be painfully slow because it is hosted in China) or can be assembled in the following way:

* Get the https://github.com/official-stockfish/Stockfish/blob/5640ad48ae5881223b868362c1cbeb042947f7b4/script/interleave_binpacks.py script.
* Download T60T70wIsRightFarseer.binpack https://drive.google.com/file/d/1_sQoWBl31WAxNXma2v45004CIVltytP8/view
* Download farseerT74.binpack http://trainingdata.farseer.org/T74-May13-End.7z
* Download farseerT75.binpack http://trainingdata.farseer.org/T75-June3rd-End.7z
* Download farseerT76.binpack http://trainingdata.farseer.org/T76-Nov10th-End.7z
* Run python3 interleave_binpacks.py T60T70wIsRightFarseer.binpack farseerT74.binpack farseerT75.binpack farseerT76.binpack T60T70wIsRightFarseerT60T74T75T76.binpack

Tests:

STC: https://tests.stockfishchess.org/tests/view/6203fb85d71106ed12a407b7
LLR: 2.94 (-2.94,2.94) <0.00,2.50>
Total: 16952 W: 4775 L: 4521 D: 7656
Ptnml(0-2): 133, 1818, 4318, 2076, 131

LTC: https://tests.stockfishchess.org/tests/view/62041e68d71106ed12a40e85
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 14944 W: 4138 L: 3907 D: 6899
Ptnml(0-2): 21, 1499, 4202, 1728, 22

closes https://github.com/official-stockfish/Stockfish/pull/3927

Bench: 4919707
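The LLR bounds (-2.94,2.94) quoted for the STC and LTC tests are what Wald's sequential probability ratio test gives when both error rates are 5%. The sketch below shows that arithmetic; alpha = beta = 0.05 is an assumption inferred from the bounds, not something the commit message states, and the function name `sprt_bounds` is hypothetical.

```python
import math

def sprt_bounds(alpha: float = 0.05, beta: float = 0.05):
    """Lower and upper log-likelihood-ratio stopping bounds for an SPRT.

    alpha: tolerated false-positive rate, beta: tolerated false-negative rate.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")
```

A test stops and passes once the running LLR reaches the upper bound, which matches the reported LLR of 2.94 for both runs.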
Copyright 2011–2024 Next Chess Move LLC