nostalgebraist's blog is a must-read regarding GPT-x, including GPT-3. Perhaps start here ("the transformer... 'explained'?"), which helps to contextualize GPT-x within the history of machine learning.
(Though, I should note that nostalgebraist holds a contrarian "bearish" position on GPT-3 in particular; for the "bullish" case instead, read Gwern.)
Thanks for writing this post. I have a handful of quick questions: (a) What was the reference MIPS (or the corresponding CPU) you used for the c. 2019-2020 data point? (b) What was the constant amount of RAM you used to run Stockfish? (c) Do I correctly understand that the Stockfish-to-MIPS comparison is based on the equation:

$$\frac{\text{slowed-down SF8 running time}}{\text{reference SF8 running time}} = \frac{\text{historical MIPS}}{\text{reference MIPS}}$$
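If I have that relation right, the emulation amounts to linearly scaling SF8's thinking-time budget by the MIPS ratio. A minimal sketch of that interpretation (the MIPS figures below are illustrative placeholders I chose, not numbers from the post):

```python
def slowed_time(reference_time_s, historical_mips, reference_mips):
    """Scale SF8's per-move thinking time so that a modern CPU emulates
    slower historical hardware: less MIPS -> proportionally less time."""
    return reference_time_s * historical_mips / reference_mips

# Hypothetical example: emulating a ~54 MIPS 80486-class machine on a
# ~300,000 MIPS modern CPU, a 60 s reference search budget shrinks to
# roughly 0.011 s.
print(slowed_time(60, 54, 300_000))
```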
Your post also piqued my interest, so I investigated the Intel 80486 a bit further with this question in mind: how comparable are old and new CPUs, not just by MIPS but by other metrics as well?
It would seem that improvements in MIPS have outpaced improvements in memory-related metrics; e.g., memory latency measured in cycles has actually gotten worse. What I don't know is how sensitive Stockfish is to variations in memory performance. Still, if memory-related effects were accounted for in addition to MIPS, I would update the estimated SF8 Elo on older hardware upwards. In other words, it seems more likely to me that the SF8 Elo curve is underestimated than overestimated.
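To make the "latency in cycles has gotten worse" point concrete, here is a back-of-the-envelope conversion. The clock speeds and DRAM latencies below are rough ballpark figures I am assuming for illustration, not measurements:

```python
def latency_cycles(latency_ns, clock_mhz):
    """Convert a memory latency in nanoseconds into CPU clock cycles.
    cycles = latency_ns * (cycles per ns) = latency_ns * clock_mhz / 1000."""
    return latency_ns * clock_mhz / 1000

# Assumed ballpark numbers (illustrative only):
# an 80486 at 66 MHz with ~70 ns DRAM waits only a handful of cycles,
# while a modern CPU at 4 GHz with ~80 ns DRAM waits hundreds.
print(latency_cycles(70, 66))     # 80486-era: a few cycles
print(latency_cycles(80, 4000))   # modern era: hundreds of cycles
```

So even though absolute latency in nanoseconds barely improved, the cost of a cache miss in lost instruction slots grew enormously, which is the sense in which memory metrics lagged MIPS.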
(There is another direction one could follow: get Stockfish to compile on an 80486, or on other old hardware one has lying around, and report back with results.)