Summary
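| Estimation method | Data | Period | Growth rate | 90% CI |
| --- | --- | --- | --- | --- |
| Method 1 | All systems | Jun 2009–Jul 2022 | 0.51 OOMs/year | 0.45 to 0.57 |
| Method 1 | Large-scale | Oct 2015–Jun 2022 | 0.2 OOMs/year[7] | 0.1 to 0.4 |
| Method 2 | All systems | Jun 2009–Jul 2022 | 0.44 OOMs/year[8] | 0.34 to 0.52 |
| Method 2 | Large-scale | Sep 2016–May 2022 | 0.2 OOMs/year | 0.1 to 0.4 |
| Weighted mixture of growth rates | All systems | Jun 2009–Jul 2022 | 0.49 OOMs/year | 0.37 to 0.56 |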
Table 1: Estimated growth rate in the dollar cost of compute to train ML systems over time, based on a log-linear regression. OOM = order of magnitude (10x). See the section Summary of regression results for expanded result tables.
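To make "log-linear regression" concrete, here is a minimal sketch of how a growth rate in OOMs/year can be read off a fit of log10(cost) against time. The data points are hypothetical (constructed to grow at exactly 0.5 OOMs/year) and the function name `fit_log_linear` is my own; this is not the report's code or dataset.

```python
import math

def fit_log_linear(years, costs):
    """Ordinary least squares of log10(cost) on year.

    Returns (slope, intercept). The slope is the growth rate in OOMs/year.
    """
    logs = [math.log10(c) for c in costs]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL data: costs growing by exactly 0.5 OOMs/year from $1 in 2010.
years = [2010, 2012, 2014, 2016, 2018, 2020, 2022]
costs = [10 ** (0.5 * (y - 2010)) for y in years]

slope, intercept = fit_log_linear(years, costs)
# Mean cost predicted by the fitted line at the end of the period.
predicted_2022 = 10 ** (intercept + slope * 2022)
print(f"growth rate: {slope:.2f} OOMs/year")
print(f"predicted cost in 2022: ${predicted_2022:,.0f}")
```

Evaluating the fitted line at the start and end of the period gives the kind of "mean cost predicted by linear regression" that the footnotes refer to.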
Figure 1: estimated cost of compute in US dollars for the final training run of ML systems. The costs here are estimated based on the trend in price-performance for all GPUs in Hobbhahn & Besiroglu (2022) (known as "Method 1" in this report).
Read the rest of the report here
I updated this linkpost on Feb 9 to reflect updates to the website version. The main changes are that: (1) extrapolations were removed from plots to avoid implying that the extrapolations are reliable predictions; (2) summary point 3 was reframed to forecast the year by which a spending threshold would be reached under different assumptions, rather than just stating my best guess about future growth rates.
These are "milestone" systems selected from the database Parameter, Compute and Data Trends in Machine Learning, using the same criteria as described in Sevilla et al. (2022, p.16): "All models in our dataset are mainly chosen from papers that meet a series of necessary criteria (has an explicit learning component, showcases experimental results, and advances the state-of-the-art) and at least one notability criterion (>1000 citations, historical importance, important SotA advance, or deployed in a notable context). For new models (from 2020 onward) it is harder to assess these criteria, so we fall back to a subjective selection. We refer to models meeting our selection criteria as milestone models."
This growth rate is about 0.2 OOMs/year lower than the growth of training compute, measured in floating-point operations (FLOP), for the same set of systems in the same time period. The comparison is based on the 2010–2022 compute trend in FLOP for "all models" (n=98) in Sevilla et al. (2022, Table 3), at 0.7 OOMs/year. Roughly, my growth rate equals the growth rate in compute minus the growth rate in GPU price-performance, which Hobbhahn & Besiroglu (2022, Table 1) estimate as 0.12 OOMs/year.
These results are not my all-things-considered best estimates of what the growth rate will be from now on; rather, they are based on two estimation methods that combine training compute and GPU price-performance data to estimate costs historically. These methods seem informative but rely on strong simplifying assumptions. I explain my overall best guesses in point 3 of this summary, but those are based on more subjective reasoning. I base my cost estimates on reported hardware prices, which I believe are more accurate than on-demand cloud compute prices for estimating the true cost to the original developer. This means my cost estimates are often one order of magnitude lower than those of other sources such as Heim (2022).
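The subtraction in this footnote can be checked with a few lines of arithmetic. The sketch below uses only the two rates quoted above; the helper names `doubling_time_years` and `annual_factor` are my own, added to translate OOMs/year into more familiar units. The implied rate it computes comes out somewhat above the 0.51 OOMs/year regression estimate, which fits the footnote's "roughly".

```python
import math

compute_growth = 0.7             # OOMs/year, Sevilla et al. (2022, Table 3)
price_performance_growth = 0.12  # OOMs/year, Hobbhahn & Besiroglu (2022, Table 1)

# Cost = FLOP / (FLOP per dollar), so in log space the growth rates subtract.
implied_cost_growth = compute_growth - price_performance_growth

def doubling_time_years(ooms_per_year):
    """Years needed for a quantity growing at this rate to double."""
    return math.log10(2) / ooms_per_year

def annual_factor(ooms_per_year):
    """Multiplicative growth per year implied by a rate in OOMs/year."""
    return 10 ** ooms_per_year

print(f"implied cost growth: {implied_cost_growth:.2f} OOMs/year")
print(f"doubling time at 0.51 OOMs/year: {doubling_time_years(0.51):.2f} years")
print(f"annual growth factor at 0.51 OOMs/year: {annual_factor(0.51):.2f}x")
```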
See the "Long-run growth" bullet in this section of Cotra (2020) titled "Willingness to spend on computation forecast".
The adjustments are explained further in Appendix I and Appendix J.
This is the mean cost predicted by linear regression from the start to the end of the period.
For these large-scale results, I dropped the precision to one significant figure based on an intuitive judgment given the lower sample size and wider confidence interval compared to the "All systems" samples.
It turns out that this difference in growth rate from Method 1 is just due to the smaller dataset, even though the cost estimates themselves differ significantly: they are roughly twice as large as those of Method 1 on average (see this appendix for further explanation).
I included this result for completeness, but given the very small sample size and large confidence interval on the growth rate, I do not recommend using it.
The growth rates obtained via Method 1 and Method 2 were aggregated using a weighted mixture of normal distributions implemented in this Guesstimate model. Note that the results given by Guesstimate vary slightly each time the model is accessed due to randomness; the reported value is just one instance.
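For readers without access to Guesstimate, the following is a rough sketch of what sampling from a weighted mixture of normal distributions looks like. It is not the author's model: the equal weights are an assumption (the actual weights are not stated here), and each normal is approximated by symmetrising the 90% CI from Table 1, so the output should not be expected to reproduce the reported 0.49 OOMs/year.

```python
import random
import statistics

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution

def normal_from_ci(mean, low, high):
    """Approximate a normal by its mean and a (symmetrised) 90% CI."""
    sigma = (high - low) / (2 * Z90)
    return mean, sigma

def sample_mixture(components, weights, n, rng):
    """Draw n samples: pick a component by weight, then sample from it."""
    samples = []
    for _ in range(n):
        mean, sigma = rng.choices(components, weights=weights, k=1)[0]
        samples.append(rng.gauss(mean, sigma))
    return samples

method_1 = normal_from_ci(0.51, 0.45, 0.57)  # "all systems" growth rate and CI
method_2 = normal_from_ci(0.44, 0.34, 0.52)  # "all systems" growth rate and CI
weights = [0.5, 0.5]  # ASSUMPTION: equal weights, chosen only for illustration

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = sorted(sample_mixture([method_1, method_2], weights, 100_000, rng))
mixture_mean = statistics.fmean(samples)
ci_low = samples[int(0.05 * len(samples))]
ci_high = samples[int(0.95 * len(samples)) - 1]
print(f"mixture mean: {mixture_mean:.3f} OOMs/year")
print(f"90% interval: {ci_low:.2f} to {ci_high:.2f}")
```

Removing the fixed seed shows the run-to-run variation that the footnote describes for Guesstimate.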
The mixture method aggregates the growth rates rather than fitting a new regression to a dataset, so I did not obtain a mean prediction for this method.
To be more precise, I formulated this as: what constant growth rate would result in the most accurate prediction of the final training run cost for a Transformative AI system?
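Under a constant growth rate, cost follows a straight line in log space, so the year a spending threshold is reached has a closed form. The sketch below illustrates this; the starting year, starting cost, and threshold are hypothetical placeholders rather than figures from the report, and the function names are my own.

```python
import math

def cost_at(year, start_year, start_cost, ooms_per_year):
    """Cost in a given year under constant growth in OOMs/year."""
    return start_cost * 10 ** (ooms_per_year * (year - start_year))

def year_threshold_reached(threshold, start_year, start_cost, ooms_per_year):
    """Year at which cost first equals the threshold under constant growth."""
    return start_year + math.log10(threshold / start_cost) / ooms_per_year

# HYPOTHETICAL inputs, for illustration only.
start_year, start_cost = 2022, 1e6
threshold = 1e9

for rate in (0.2, 0.5):
    year = year_threshold_reached(threshold, start_year, start_cost, rate)
    print(f"{rate} OOMs/year -> threshold reached around {year:.0f}")
```

Because the rate sits in the denominator, small changes in the assumed growth rate shift the threshold year substantially.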
The estimates are less robust in the sense that further arguments and evidence could easily change them significantly. The lack of robustness comes from (a) being a forecast rather than an empirical result, and (b) being based on a less rigorous estimation process that relied heavily on intuition.
See Independent impressions - EA Forum
To get this result, I started with an aggregation of the "large scale" and "all systems" growth rates that I estimated, then adjusted down based on reasoning in this section in Cotra (2020) and in Lohn & Musser (2022).
To get this result, I adjusted down more than for my independent impression, due to partially deferring to this section in Cotra (2020) arguing that growth will slow down in the long term.