Review

Summary

  1. Using a dataset of 124 machine learning (ML) systems published between 2009 and 2022,[1] I estimate that the cost of compute in US dollars for the final training run of ML systems has grown by 0.49 orders of magnitude (OOM) per year (90% CI: 0.37 to 0.56).[2] See Table 1 for more detailed results, indicated by "All systems."[3]
  2. By contrast, I estimate that the cost of compute used to train "large-scale" systems (systems that used a relatively large amount of compute) since September 2015 has grown more slowly than that of the full sample, at a rate of 0.2 OOMs/year (90% CI: 0.1 to 0.4 OOMs/year). See Table 1 for more detailed results, indicated by "Large-scale."
  3. I used the historical results to forecast (albeit with large uncertainty) when the cost of compute for the most expensive training run will exceed $233B, i.e. ~1% of US GDP in 2021. Following Cotra (2020), I take this cost to be an important threshold for the extent of global investment in AI.[4] (more)
    1. Naively extrapolating from the current most expensive cost estimate (Minerva at $3.27M) using the "all systems" growth rate of 0.49 OOMs/year (90% CI: 0.37 to 0.56), the cost of compute for the most expensive training run would exceed a real value of $233B in the year 2032 (90% CI: 2031 to 2036).
    2. By contrast, my best-guess forecast adjusts for evidence that the growth in costs will slow down in the future, and for sources of bias in the cost estimates.[5] These adjustments partly relied on my intuition-based judgements, so the result should be interpreted with caution. I extrapolated from the year 2025 with an initial cost of $380M (90% CI: $55M to $1.5B) using a growth rate of 0.2 OOMs/year (90% CI: 0.1 to 0.3 OOMs/year), and found that the cost of compute for the most expensive training run would exceed a real value of $233B in the year 2040 (90% CI: 2033 to 2062).
  4. For future work, I recommend the following:
    1. Incorporate systems trained on Google TPUs, and TPU price-performance data, into Method 2.
    2. Estimate more reliable bounds on training compute costs, rather than just point estimates. For example, research the profit margin of NVIDIA and adjust retail prices by that margin to get a lower bound on hardware cost.
    3. As a broader topic, investigate trends in investment, spending allocation, and AI revenue.
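
The extrapolations in point 3 amount to solving cost_0 × 10^(g × t) = threshold for t. The sketch below redoes that arithmetic with point estimates only. The function name and the mid-2022 start date for Minerva are assumptions made here for illustration; the confidence intervals in point 3 come from distributions over the inputs, which this sketch does not reproduce.

```python
import math

def year_threshold_reached(start_year, start_cost, growth_oom_per_year, threshold):
    """Year at which start_cost * 10**(g * t) equals threshold (hypothetical helper)."""
    ooms_needed = math.log10(threshold / start_cost)
    return start_year + ooms_needed / growth_oom_per_year

THRESHOLD = 233e9  # ~1% of US GDP in 2021, as stated in point 3

# Naive extrapolation: Minerva at $3.27M, "all systems" rate of 0.49 OOMs/year.
# ASSUMPTION: the start date is taken as mid-2022 (2022.5); the summary does not state it.
naive = year_threshold_reached(2022.5, 3.27e6, 0.49, THRESHOLD)

# Best-guess inputs from point 3: $380M in 2025, growing at 0.2 OOMs/year.
adjusted = year_threshold_reached(2025.0, 380e6, 0.2, THRESHOLD)

print(f"naive: {naive:.1f}, adjusted: {adjusted:.1f}")
```

With these inputs the naive case lands in 2032, the same year as in point 3. The adjusted case lands near the end of 2038, somewhat earlier than the reported 2040; plausibly this is because the reported year is the median of a distribution over the initial cost and growth rate rather than a plug-in of the central values.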

| Estimation method (go to explanation) | Data | Period | Scale (start to end)[6] | Growth rate in dollar cost for final training runs |
|---|---|---|---|---|
| (1) Using the overall GPU price-performance trend (go to results) | All systems (n=124) | Jun 2009–Jul 2022 | $0.02 to $80K | 0.51 OOMs/year (90% CI: 0.45 to 0.57) |
| (1) Using the overall GPU price-performance trend | Large-scale (n=25) | Oct 2015–Jun 2022 | $30K to $1M | 0.2 OOMs/year[7] (90% CI: 0.1 to 0.4) |
| (2) Using the peak price-performance of the actual NVIDIA GPUs used to train ML systems (go to results) | All systems (n=48) | Jun 2009–Jul 2022 | $0.10 to $80K | 0.44 OOMs/year[8] (90% CI: 0.34 to 0.52) |
| (2) Using the peak price-performance of the actual NVIDIA GPUs used to train ML systems | Large-scale (n=6)[9] | Sep 2016–May 2022 | $200 to $70K | 0.2 OOMs/year (90% CI: 0.1 to 0.4) |
| Weighted mixture of growth rates[10] | All systems | Jun 2009–Jul 2022 | N/A[11] | 0.49 OOMs/year (90% CI: 0.37 to 0.56) |

Table 1: Estimated growth rate in the dollar cost of compute to train ML systems over time, based on a log-linear regression. OOM = order of magnitude (10x). See the section Summary of regression results for expanded result tables.
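
To make the "log-linear regression" behind Table 1 concrete: it is an ordinary least-squares fit of log10(cost) against time, so the slope is directly a growth rate in OOMs/year. The sketch below is a minimal illustration on made-up data, not the report's dataset or code; the function name and the data points are hypothetical, and it does not reproduce the confidence intervals.

```python
import math

def fit_log_linear(years, costs):
    """Ordinary least squares of log10(cost) on year.

    Returns (slope, intercept); the slope is the growth rate in OOMs/year.
    """
    logs = [math.log10(c) for c in costs]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# HYPOTHETICAL data: costs lying exactly on a 0.5 OOMs/year trend from $10 in 2010.
years = [2010, 2013, 2016, 2019, 2022]
costs = [10 * 10 ** (0.5 * (y - 2010)) for y in years]

slope, intercept = fit_log_linear(years, costs)
annual_multiplier = 10 ** slope              # factor by which cost grows each year
doubling_time_years = math.log10(2) / slope  # time for cost to double

print(f"{slope:.2f} OOMs/year, x{annual_multiplier:.2f} per year, "
      f"doubling every {doubling_time_years * 12:.1f} months")
```

Applying the same two conversions to the 0.49 OOMs/year in Table 1 gives roughly a 3.1x increase in cost per year, or a doubling about every 7.4 months.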

Figure 1: estimated cost of compute in US dollars for the final training run of ML systems. The costs here are estimated based on the trend in price-performance for all GPUs in Hobbhahn & Besiroglu (2022) (known as "Method 1" in this report).

Read the rest of the report here

I updated this linkpost on Feb 9 to reflect updates to the website version. The main changes are that: (1) extrapolations were removed from plots to avoid implying that the extrapolations are reliable predictions; (2) summary point 3 was reframed to forecast the year by which a spending threshold would be reached under different assumptions, rather than just stating my best guess about future growth rates.

  1. ^

     These are "milestone" systems selected from the database Parameter, Compute and Data Trends in Machine Learning, using the same criteria as described in Sevilla et al. (2022, p.16): "All models in our dataset are mainly chosen from papers that meet a series of necessary criteria (has an explicit learning component, showcases experimental results, and advances the state-of-the-art) and at least one notability criterion (>1000 citations, historical importance, important SotA advance, or deployed in a notable context). For new models (from 2020 onward) it is harder to assess these criteria, so we fall back to a subjective selection. We refer to models meeting our selection criteria as milestone models."

  2. ^

     This growth rate is about 0.2 OOMs/year lower than the growth of training compute, measured in floating-point operations (FLOP), for the same set of systems in the same time period. This is based on the 2010–2022 compute trend in FLOP for "all models" (n=98) in Sevilla et al. (2022, Table 3), at 0.7 OOMs/year. Roughly, my growth rate is the growth rate in compute minus the growth rate in GPU price-performance, which Hobbhahn & Besiroglu (2022, Table 1) estimate as 0.12 OOMs/year.

  3. ^

     These results are not my all-things-considered best estimates of what the growth rate will be from now on; rather, they are based on two estimation methods which combine training compute and GPU price-performance data to estimate costs historically. These methods seem informative but rely on strong simplifying assumptions. I explain my overall best guesses in point 3 of this summary, but those are based on more subjective reasoning. I base my cost estimates on reported hardware prices, which I believe are more accurate than on-demand cloud compute prices for estimating the true cost to the original developer. This means my cost estimates are often one order of magnitude lower than those of other sources such as Heim (2022).

  4. ^

    See the "Long-run growth" bullet in this section of Cotra (2020) titled "Willingness to spend on computation forecast".

  5. ^

    The adjustments are explained further in Appendix I and Appendix J.

  6. ^

     This is the mean cost predicted by linear regression from the start to the end of the period.

  7. ^

     For these large-scale results, I dropped the precision to one significant figure based on an intuitive judgment given the lower sample size and wider confidence interval compared to the "All systems" samples.

  8. ^

     The difference in growth rate from Method 1 turns out to be due only to the smaller dataset, even though the cost estimates themselves differ significantly: those of Method 2 are roughly twice as large as those of Method 1 on average (see this appendix for further explanation).

  9. ^

     I included this result for completeness, but given the very small sample size and large confidence interval on the growth rate, I do not recommend using it.

  10. ^

     The growth rates obtained via Method 1 and Method 2 were aggregated using a weighted mixture of normal distributions implemented in this Guesstimate model. Note that the results given by Guesstimate vary slightly each time the model is accessed due to randomness; the reported value is just one instance.

  11. ^

     The mixture method aggregates the growth rates rather than fitting a new regression to a dataset, so I did not obtain a mean prediction for this method.

  12. ^

     To be more precise, I formulated this as: what constant growth rate would result in the most accurate prediction of the final training run cost for a Transformative AI system?

  13. ^

    The estimates are less robust in the sense that further arguments and evidence could easily change them significantly. The lack of robustness comes from (a) their being forecasts rather than empirical results, and (b) their being based on a less rigorous estimation process that relied heavily on intuition.

  14. ^
  15. ^

     To get this result, I started with an aggregation of the "large scale" and "all systems" growth rates that I estimated, then adjusted down based on reasoning in this section in Cotra (2020) and in Lohn & Musser (2022).

  16. ^

     To get this result, I adjusted down more than for my independent impression, due to partially deferring to this section in Cotra (2020) arguing that growth will slow down in the long term.
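
As an illustration of the aggregation described in footnote 10, the sketch below builds a weighted mixture of two normal distributions from the "All systems" rows of Table 1 and reads off its quantiles. It is not the Guesstimate model itself. The equal weights, the conversion of each 90% CI into a symmetric normal, and all function names are assumptions made here; the actual weights are not given in this summary.

```python
import math

Z90 = 1.6449  # z-score bounding the central 90% interval of a normal

def normal_from_ci(mean, lo, hi):
    """Approximate (mean, sigma) of a normal from a central estimate and its 90% CI."""
    return mean, (hi - lo) / (2 * Z90)

def mixture_cdf(x, components, weights):
    """CDF of a weighted mixture of normals at x."""
    return sum(
        w * 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
        for (mu, sigma), w in zip(components, weights)
    )

def mixture_quantile(q, components, weights, lo=-1.0, hi=2.0):
    """Invert the mixture CDF by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if mixture_cdf(mid, components, weights) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

components = [
    normal_from_ci(0.51, 0.45, 0.57),  # Method 1, all systems
    normal_from_ci(0.44, 0.34, 0.52),  # Method 2, all systems
]
weights = [0.5, 0.5]  # ASSUMPTION: equal weights; the real weights are not stated here

p5, p50, p95 = (mixture_quantile(q, components, weights) for q in (0.05, 0.5, 0.95))
print(f"median {p50:.2f}, 90% interval {p5:.2f} to {p95:.2f}")
```

Under these assumptions the 90% interval comes out at about 0.37 to 0.56, in line with the mixture row of Table 1, while the median is about 0.48 rather than the reported 0.49. Different weights, or the sampling noise mentioned in footnote 10, could account for that difference.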
