Version 16 of Stockfish was recently released. The new version brings a larger neural network and is about ten Elo points stronger than its predecessor.
Since Stockfish adopted a neural network (NNUE) with the release of version 12 in 2020, the program has grown dramatically, gaining over 200 Elo points in just three years.
Before this fundamental change, the program relied on an evaluation function written entirely by humans. This function, known as the “classic” evaluation, remained in the program until the July 2023 version (it was removed only a few days ago). Until then, users could disable the neural network and evaluate a position with the old function, which was not only considerably faster but also advantageous in certain specific positions.
This raises the question of whether the “classic” evaluation function also benefited, at least in part, from the strength gains of the latest versions, and therefore whether the developers’ decision to remove it can be considered correct. The following analysis aims to answer precisely this question.
Tested Versions and Test Conditions
The tested versions of Stockfish include all official releases by the developers starting from version 12 (the first version equipped with a neural network):
- Stockfish 12
- Stockfish 13
- Stockfish 14
- Stockfish 14.1
- Stockfish 15
- Stockfish 15.1
- Stockfish 16
The test conditions are the same as those of the “rating list” I occasionally publish:
- Time control of 40 moves in 120 minutes, repeating (40/120′), scaled to the emulated speed of a Pentium 90. On the modern PCs used for the tests, this corresponds to blitz games (3 to 5 minutes per game).
- The opening suite consists of 190 different positions, the same for every engine; each engine played every opening both as White and as Black.
Each version faced Stockfish 11 and Stockfish 10 (the last two versions relying exclusively on the “classic” evaluation function), both with its neural network enabled and with it disabled.
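To make the match structure concrete, here is a minimal sketch of how such a schedule could be enumerated. It assumes that the full 190-position suite is played against each of the two opponents with both colors; the opening names and the `build_schedule` helper are hypothetical placeholders, not the actual suite or tooling used for the tests.

```python
from itertools import product

# Hypothetical placeholders: the real opening suite is not published here.
OPENINGS = [f"opening_{i:03d}" for i in range(1, 191)]  # 190 start positions
OPPONENTS = ["Stockfish 10", "Stockfish 11"]            # classic-only baselines
COLORS = ["white", "black"]                             # every opening with both colors


def build_schedule(engine: str) -> list[tuple[str, str, str, str]]:
    """Return (engine, opponent, opening, engine_color) for every game.

    Assumption: every opening is played against every opponent,
    once with each color.
    """
    return [
        (engine, opponent, opening, color)
        for opponent, opening, color in product(OPPONENTS, OPENINGS, COLORS)
    ]


if __name__ == "__main__":
    schedule = build_schedule("Stockfish 16 (NNUE off)")
    print(len(schedule))  # games for one engine configuration
```

Under these assumptions, each configuration of each version plays 190 × 2 × 2 = 760 games; playing every opening with both colors cancels out any bias a single position may have toward one side.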
The results confirm a well-known fact: with the neural network enabled (the blue curve in the graph), the versions following Stockfish 11 gained significantly in Elo, although the gains have slowed considerably since the release of version 14.1. Between Stockfish 11 and Stockfish 16 with NNUE enabled, the difference is about 290 Elo points.
The classic evaluation function (the orange curve) tells a different story: between Stockfish 11 and Stockfish 16 with NNUE disabled, the difference is less than 20 Elo points. In the tests, the program’s strength actually seems to have decreased slightly up to version 14.1 and then gradually recovered. Overall, though, the program has gained almost nothing without the neural network. The following graph illustrates this trend more clearly:
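To put these gaps into perspective, the sketch below converts an Elo difference into the expected score of the stronger side and back, using the standard logistic Elo model. This is an assumption: the article does not say which tool or model computed the ratings, and rating tools can differ in detail. The `score_fraction` helper is likewise a hypothetical illustration for turning a win/draw/loss record into a score.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Invert the logistic model: Elo difference implied by a score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Match score as a fraction of the points available; a draw is half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


if __name__ == "__main__":
    for diff in (20, 290):
        print(f"{diff:>4} Elo -> expected score {expected_score(diff):.1%}")
```

Under this model, a gap of about 290 Elo means the stronger side is expected to score roughly 84% of the points, while a gap below 20 Elo corresponds to an expected score under 53%, which is close to an even match.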
This uneven trend is most likely due to the optimizations introduced in the Stockfish code to improve search with the neural network enabled, some of which turned out to be counterproductive with the classic evaluation function. In light of these results, the Stockfish developers’ decision to remove the classic evaluation function appears justified.