What does LLM training cost?
How much does it cost to train an AI these days? There was a time when an AI training run could be done on a small cluster in your server room, across a couple of hours or, at most, days. That was cute. And ironically, we may someday return to that situation, once these algorithms and architectures get suitably optimized (by AI coding agents, perhaps?). But until that time, a good thing to get your head around is what LLM training actually costs.
The answer is complicated for several reasons, not least of which is AI companies' penchant for secrecy about their models and infrastructure. Nonetheless, there are a substantial number of data points floating around the internets, and by stitching these together, we can make some very educated guesses / approximations about LLM training cost in 2023.
LLM Training Cost: How much?
First and foremost, Sam Altman stated (to a small student audience at MIT) that training ChatGPT cost “significantly more than $100 million.” That’s our starting point. Let’s add this graph:
That graph is extracted from a very well-researched paper (“AI and Compute: How Much Longer Can Computing Power Drive AI Progress?”) by Dr. Drew Lohn of CSET, which leads us to make these rough guesses, including our guess as to “How much does it cost to train the model of ChatGPT?”:
|MODEL||TRAINING COST||TRAINING TIME||GPUs||PARAMETERS|
|GPT2 — 2019||$1 million||2 days||512||1.5B|
|GPT3 — 2020||$10 million||21 days||2,000||175B|
|ChatGPT (3.5) — 2022||$200 million||3 months||8,000||1,800B|
|GPT4 — 2023||$1 billion?||5 months||25,000||10T (10,000B)|
|GPT-X||$? billion?||? year ? months||100,000||?|
It is imperative to note that these are just base training run costs for when you build large language models from scratch. That is, the budget required to run the supercomputers that run the Transformer code iteratively across the training dataset, forging the resultant neural net, or AI brain. These figures do not include all the months (?years?) of RLHF-ing and fine-tuning that smack these raw models into compliance and politeness. (We saw what happened when the “raw” models were released directly to the public… it wasn’t pretty).
And perhaps even more importantly, these figures do not include the costs of actually operating a model post-training (aka inference. btw, why does every AI term have to be something novel and pseudo-technical? Did it have to be inference? Couldn’t they just call it “operating costs”?).
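One way to sanity-check the rough guesses in the table above is to divide each cost by the GPU-hours it implies. What follows is only a sketch: the 30-days-per-month conversion and the idea that every GPU runs flat out for the whole period are assumptions made here, not figures from the source, and the function name is illustrative.

```python
# Rough guesses copied from the table: (cost in USD, training days, GPU count).
# ASSUMPTION: "3 months" and "5 months" are converted at 30 days per month.
ROWS = {
    "GPT2": (1e6, 2, 512),
    "GPT3": (10e6, 21, 2_000),
    "ChatGPT (3.5)": (200e6, 3 * 30, 8_000),
    "GPT4": (1e9, 5 * 30, 25_000),
}


def cost_per_gpu_hour(cost_usd, days, gpus):
    """Implied spend per GPU per hour, assuming all GPUs run for the full period."""
    return cost_usd / (gpus * days * 24)


for model, (cost, days, gpus) in ROWS.items():
    print(f"{model}: ${cost_per_gpu_hour(cost, days, gpus):,.2f} per GPU-hour")
```

Under these assumptions the GPT3, ChatGPT, and GPT4 rows all land at roughly $10 to $12 per GPU-hour, while the GPT2 row comes out near $40, a reminder that these really are guesses.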
…but how much to operate?
In other words, these are just LLM training costs: the cost of raw compute and electrical power necessary to forge the Frankenstein AI brain… once it is created, every time it is queried by humans (or by other AIs, or other instances of itself), that incurs additional, and substantial, costs. Some analysts have stated that operational costs of so-called “frontier” or “foundation model” AIs could easily be equivalent to the training cost, per month.
Of course, OpenAI and its competitors charge users for any substantial amount of usage. So while the baseline “free” ChatGPT can be seen as a consumer-facing “loss leader,” anyone who wants to do serious work signs up either for the $20/month “ChatGPT-Plus” plan or, for more substantial corporate use, for the API, which allows direct programmatic access to the query/response engine and charges for both input and output on (essentially) a cost-per-word basis.
Look at OpenAI’s basic per-token pricing model.
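To get a feel for how per-token billing adds up, here is a minimal sketch. The two prices are made-up placeholders, not OpenAI's actual rates, and the function and constant names are hypothetical; plug in the current published numbers before trusting any total.

```python
# HYPOTHETICAL prices in USD per 1,000 tokens; placeholders, not real OpenAI rates.
PRICE_PER_1K_INPUT = 0.001
PRICE_PER_1K_OUTPUT = 0.002


def query_cost(input_tokens, output_tokens,
               price_in=PRICE_PER_1K_INPUT, price_out=PRICE_PER_1K_OUTPUT):
    """Cost in USD of one API call, billed separately on input and output tokens."""
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


# Example workload: one million queries, each 500 tokens in and 250 tokens out.
per_query = query_cost(500, 250)
print(f"per query: ${per_query:.6f}")
print(f"per million queries: ${per_query * 1_000_000:,.2f}")
```

The point is less the total than the shape: cost scales linearly with both prompt length and response length, so long prompts get expensive even when the answers are short.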
We’ve just entered the:
Days of Billion Dollar AIs
While there is no definitive answer to this question (OMG, I am starting to sound like ChatGPT with my qualifications and disclaimers! Help!!!), there are a few key real world (and rumored) datapoints that will help you start to develop some heuristics. Note that these costs are generally compute + energy only; they do not speak to the (at times significant) overhead of corporate headcount & benefits, rent, CapEx, R&D, etc.
Given the current trend, the CSET study shows clearly that this becomes unsustainable, with the cost of building large language models moving into the trillions within roughly 36 months.
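The "roughly 36 months" figure is easy to reproduce with back-of-the-envelope math. The sketch below assumes cost doubles every 3.4 months, which is the compute doubling time commonly cited from OpenAI's 2018 "AI and Compute" analysis. Whether cost tracks compute that closely is an assumption made here, as is starting from the $1 billion GPT4 guess, and the function name is illustrative.

```python
import math


def months_to_reach(start_cost, target_cost, doubling_months):
    """Months for cost to grow from start_cost to target_cost at a fixed doubling time."""
    doublings = math.log2(target_cost / start_cost)
    return doublings * doubling_months


# ASSUMPTIONS: start at the $1 billion GPT4 guess; cost doubles every 3.4 months.
months = months_to_reach(1e9, 1e12, doubling_months=3.4)
print(f"{months:.1f} months to go from $1 billion to $1 trillion")
```

Under these assumptions a thousandfold increase takes about ten doublings, or just under three years. Stretch the doubling time and the trillion-dollar date moves out proportionally.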
So, where do we go from here?
It’s pretty clear that, unless a wildcard ASI solves the stock market or hacks into the World Bank, there will not be AI training runs that cost a trillion USD or more. In fact, an argument could be (and is) made that there is simply not sufficient available high performance compute in datacenters and clouds across the earth to even contemplate such a training run.
That said, there are several major vectors to continue to increase foundation model AI performance, capability, and intelligence, without busting the (World) bank.
Possible Future AI Training Efficiencies:
- moving on from Transformer. Transformer has been the primary software architecture of most every (not all!) AI since Google published it (and gifted it to the world) in 2017. The only real innovation has been to scale the hell out of it (bigger training datasets! More parameters! Longer training runs! Bigger context windows! More GPUs!). It is very likely that at some point, a more performant architecture will be discovered / created, liberating the industry from the current hyperscaling nightmare.
- new hardware architectures. Again, Nvidia’s GPU and Google’s TPU have largely been the only games in town when it comes to hardware. And Nvidia has simply crammed more GPU and higher network connectivity bandwidth into each successive generation of AI training & inference hardware. At present, largely due to the twin supply chain bottleneck presented by Nvidia and TSMC, every major player has a billion-dollar R&D project to engineer their own custom AI chips. These include:
Tesla Dojo: purpose-built AI supercomputer