There has been a lot of noise in the mainstream press about the massive buildout of AI datacenters across the world, the latest being the audacious announcement by the Trump presidency of the $500 billion Project Stargate. The primary market signal of this buildout is “CapEx,” or capital expenditures, reported by the top 5 AI companies: OpenAI, Microsoft, Amazon, Google, and Meta (Apple & Tesla remain dark horses). Microsoft alone has famously committed $250 billion (yes, a quarter of a trillion dollars) to AI datacenter building over the next 3 years (well, 5 years, but we’re already on year 2). The others are not far behind. Meta (Facebook) may even be exceeding that number.
And, shockingly, that’s just the physical build-out of the AI datacenters: construction, chips, server racks, cooling towers, and (omg) power plants. You read that right. Both Microsoft and Google are building or funding the construction of entire electrical power stations, formerly the exclusive domain of governments and public utilities, and signing contracts to guarantee purchase of 100% of the power generated by those plants for the next 20 years. In the case of Google, this entails the construction of seven new nuclear power plants. (side quest: Google building 7 new nuclear power plants for the sole use of AI)
Are you starting to sense the seriousness of this moment?
In case you were wondering… yes, the term “unprecedented” applies. Read here my comparison of corporate AGI development budgets with the two most audacious human endeavors ever: the Manhattan Project (nuclear bomb) and the Apollo Program (putting a man on the moon):
(Note: there is a blurred line here between the Top 5 tech companies and the Top 5 AI companies. Big Tech is making massive investments in AI startups, and in parallel, many of them have their own frontier models. While Google and Meta have their own models, Amazon & Microsoft have largely decided to focus on large-scale venture investment and massive infrastructure buildout. This table tries to break down the cross-pollination a bit:)
[TL;DR]
AI DataCenter / Training Run Costs
assumptions:
- most of the frontier models utilize AI datacenters, which contain within them racks upon racks of >5,000 nVidia H100 GPUs networked in parallel to perform their training runs
- a training run generally lasts ~100 days, with those 5,000+ chips running at full tilt the entire time (some runs are as short as 6 days, some as long as 300 days, but 100 is a rough average)
top line conclusions
here’s the takeaway:
- one H100 running for 24 hours uses the same amount of electricity as a single US citizen uses in a day (700W × 24h)
- thus, as a general heuristic, 1 H100 = 1 human (in terms of electrical power consumption)
- using that metric… a modern AI training run takes the same amount of energy as it takes to power a city of 500,000 (half a million) people in America for one day
- or, more pragmatically, the electricity it takes to power a village of 5,000 people, running full tilt 24/7, for 100 days
- the total electricity cost (~$3 million) for the training run is slim compared to:
- the total hardware cost ($180 million) to build the datacenter, and that’s just for the chips
- that cost estimate does not include land, construction costs, gas, electricity, cooling, network, interconnect, staff, or operations…
- operational costs (aka inference), or “how much it costs to answer user prompts & queries,” will be the topic of a separate post
Questions that led to this post:
- I see compute measured in FLOP for AI training runs, such as 5e25 FLOP (the reported computation required to train Google’s Gemini Ultra frontier model, announced in late 2023). What does FLOP actually mean?
- is it power × time? can you achieve 5e25 FLOP in 100 days with X compute, or in a single day by deploying 100X the compute power?
- what is the level of infrastructure required to achieve 5e25 FLOP?
  - as in, how many nVidia H100s would it take?
  - running at 100% capacity across how many 24/7 days?
  - how much electricity would that take?
  - how much would the datacenter cost, in terms of CapEx (chips + infrastructure + construction)?
- how does 5e25 FLOP compare with the world’s fastest supercomputers (TOP500)?
  - how long would it take the world’s fastest supercomputer to perform 5e25 FLOP?
  - are the AI datacenters faster than the supercomputers?
AI DataCenter & Frontier Model Training Runs:
the down and dirty details:
- The total training compute for the final training run of Gemini Ultra, likely the most compute-intensive model to date, is estimated at 5e25 FLOP
- 5e25 FLOP = 5×10^25 floating-point operations; written out: 50,000,000,000,000,000,000,000,000 operations (in total, for the whole run)
- a careful distinction: FLOP (or FLOPs) is a count of FLoating-point OPerations, i.e. how many simple mathematical calculations were performed; FLOPS (or FLOP/s) is a rate, how many of those operations the specified hardware (be it a CPU, a GPU, or an entire datacenter) performs in one second
- FLOP is therefore a measure of computational work (effort), not energy or power
- efficiency is how much power (electricity) is needed by the GPU + machine / network architecture in order to achieve a fixed number of FLOP
- less efficient datacenters can achieve the same number of FLOP as state-of-the-art datacenters by using more electricity, more time, or both
- compute does not scale linearly: linear scaling assumes perfect parallelism. In reality, scaling efficiency can drop due to bottlenecks in:
  - hardware,
  - memory bandwidth, or
  - communication between nodes
  (the time-estimate sketch further below treats this as a simple utilization factor)
- An NVIDIA H100 GPU delivers approximately 1 petaflop per second (1 PFLOP/s) of FP16 compute performance
- 1 PFLOP = 10^15 FLOP
- FP16 means “16-bit floating point”
  - the number of bits indicates the precision of each calculation
  - supercomputers are typically benchmarked at 64-bit double precision (FP64)
  - super-efficient AI models often “dumb down” to 8-bit or even 4-bit
  - this gives up some accuracy, but yields massive improvements in speed and efficiency
- that H100 rating is a rate: FLOP per second
- there are 86,400 seconds in a day (24 h × 60 min/h × 60 s/min)
- a single H100 chip would take ~1,600 years to complete a 5e25 FLOP AI training run
- alternatively, an array of 6,000 H100s networked in parallel could complete that training compute in ~100 days
- or 600,000 H100s could complete the training in a single day
HARDWARE COST
- each H100 = $30,000 MSRP
- 6,000 H100s = $180 million
- that is chips only
- it does not include land, construction costs, gas, electricity, cooling, network, interconnect, staff, or operations
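The chips-only bill, as a two-line sanity check (using the rough $30,000 MSRP figure above, not an actual quote):

```python
# Chips-only CapEx for the hypothetical 6,000-GPU cluster (MSRP, no volume discounts)
H100_MSRP_USD = 30_000
NUM_GPUS = 6_000
print(f"chips-only cost: ${H100_MSRP_USD * NUM_GPUS:,.0f}")  # -> $180,000,000
```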
POWER / ELECTRICITY / ENERGY
- each H100 consumes 700W of power
- AI datacenters typically house between 5,000 and 20,000 H100s within a single facility
- so: 6,000 H100s ≈ 4.2 MW (megawatts) of power draw
DEFINING ENERGY
- the unit of energy is the joule
- 1 watt = 1 joule per second (1 J/s)
- in practical terms, we use watt-hours (Wh)
- Energy (in watt-hours) = Power (in watts) × Time (in hours)
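A worked example of that formula, using the single-chip figures already quoted in this post (700 W per H100, running for 24 hours):

```python
# Energy (Wh) = Power (W) x Time (h), applied to a single H100 for one day
power_w = 700          # one H100 under load
hours = 24
energy_wh = power_w * hours
print(f"{energy_wh:,} Wh = {energy_wh / 1000:.1f} kWh per H100-day")  # 16,800 Wh = 16.8 kWh
```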
GPU power consumption
- For 6,000 GPUs each consuming 700 W:
  - Total power = 6,000 × 700 W
  - = 4,200,000 W
  - = 4.2 MW of continuous power draw
- so, for a 100-day training run of 6,000 H100s, that’s:
  - 100 d × 24 h × 4.2 MW
  - ≈ 10,000 MWh (10,080 MWh, to be exact)
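The same Energy = Power × Time formula, scaled up to the assumed 6,000-GPU cluster over a 100-day run:

```python
# Cluster-level energy for the 100-day training run (assumes 700 W/GPU, 6,000 GPUs)
num_gpus = 6_000
watts_per_gpu = 700
hours = 100 * 24                                   # 100 days of 24/7 operation

power_mw = num_gpus * watts_per_gpu / 1e6          # continuous draw, in megawatts
energy_mwh = power_mw * hours                      # MWh = MW x hours
print(f"power draw: {power_mw} MW")                # 4.2 MW
print(f"energy used: {energy_mwh:,.0f} MWh")       # 10,080 MWh (~10,000 MWh)
```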
COST OF ELECTRICITY
how much does using 10,000 MWh cost at commercial rates?
how much at residential rates?
As of December 2024, the average electricity rate in California is approximately
30 cents per kWh
(roughly the same for commercial and residential)
10,000 MWh = 10,000,000 kWh
so: ~$3 MILLION USD
to power a single AI datacenter’s training run
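And the resulting utility bill, at the ~$0.30/kWh California rate quoted above:

```python
# Electricity cost of the training run at ~$0.30/kWh (California average, Dec 2024)
energy_mwh = 10_000                       # from the calculation above (rounded)
rate_per_kwh = 0.30
cost = energy_mwh * 1_000 * rate_per_kwh  # 1 MWh = 1,000 kWh
print(f"electricity cost: ${cost:,.0f}")  # -> $3,000,000
```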
POWER PLANT GENERATING CAPACITY
Q: what is the output of an average power plant?
A:
- 1,000 MW for traditional power generation
- 100 MW for renewables
| Power Plant Type | Rated Power Output | Total Annual Energy Output |
| --- | --- | --- |
| Coal | 500–1,500 MW | 4,500,000–13,000,000 MWh |
| Natural Gas (Combined Cycle) | 400–1,300 MW | 3,500,000–11,500,000 MWh |
| Nuclear | 1,000–1,600 MW | 9,000,000–14,000,000 MWh |
| Hydropower | 50–3,000 MW | 500,000–25,000,000+ MWh |
| Wind Farm (onshore) | 100–300 MW | 250,000–750,000 MWh |
| Solar Farm (utility-scale) | 50–250 MW | 100,000–500,000 MWh |
Nuclear Powered AI Datacenters
Google commissions buildout of 7 new nuclear power plants in US, guarantees purchase of 100% of their power output through 2045; electricity generation to be reserved for the sole use of AI models, entities & agents. [Wall Street Journal, Oct 14, 2024]
HUMAN POWER CONSUMPTION
what is the average power consumption of a US city of 100,000 people?
- average electricity consumption per person in the U.S. is approximately 12,000 kWh per year (12 MWh)
- 365 × 24 = 8,760 hours/year
- 100,000 ppl × 12,000 kWh
- = 1,200,000,000 kWh/year
- = 1.2 TWh / year
so the city on average needs generation of:
1.2 TWh ÷ 8,760 hours ≈ 140 MW
…BUT:
that’s the average, and the grid needs to sustain peak load
peak is generally double the average
so call it:
280 MW
as the standard power generation capacity
for a city of 100,000 people in America
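Here’s that city-scale arithmetic as a sketch. The 12,000 kWh/year per-capita figure and the “peak ≈ 2× average” rule of thumb are the rough assumptions used in this post, not utility-grade data:

```python
# Average and peak generation needed for a US city of 100,000 people
population = 100_000
kwh_per_person_per_year = 12_000         # rough US per-capita electricity use
hours_per_year = 365 * 24                # 8,760

annual_kwh = population * kwh_per_person_per_year   # 1.2 billion kWh = 1.2 TWh
average_mw = annual_kwh / hours_per_year / 1_000     # kW -> MW
peak_mw = 2 * average_mw                              # rule of thumb: peak ~ 2x average
print(f"average load: ~{average_mw:,.0f} MW")         # ~137 MW (call it 140 MW)
print(f"peak capacity needed: ~{peak_mw:,.0f} MW")    # ~274 MW (call it 280 MW)
```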
AI DATACENTERS VS. HUMAN CITIES
so basically, the
AI datacenter’s power draw
while training
is the equivalent of
~5,000 U.S. residents
running full steam
…and it does that for 100 days nonstop
if it were to compress the training run
into a single day,
that would be the power draw of
500,000 people
half a million people
AI CHIP / HUMAN EQUIVALENCIES
intriguingly:
a single nVidia H100 GPU
uses the same amount of electrical power
as a US citizen in a single day
…so there is your “human equivalent” in terms of AI brains…
1 human = 1 nVidia H100
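Here’s that chip-to-human bookkeeping as a sketch. It leans entirely on this post’s own heuristic (1 H100 running 24/7 ≈ 1 person’s daily electricity use), so treat the outputs as narrative math rather than a rigorous estimate:

```python
# The post's heuristic: 1 H100 (700 W, 24/7) ~ 1 person's daily electricity use.
# Under that bookkeeping, chip-days convert directly into person-days.
num_gpus = 6_000
run_days = 100

chip_days = num_gpus * run_days                    # 600,000 chip-days of compute
print(f"{chip_days:,} chip-days")
print(f"= a village of {num_gpus:,} people powered for {run_days} days")
print(f"= a city of ~{chip_days:,} people powered for a single day")
# (the post rounds this to "half a million," since the TL;DR assumes a 5,000-chip cluster)
```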
and finally:
AI DATACENTERS VS. SUPERCOMPUTERS
- the world’s fastest supercomputer is:
  - El Capitan
  - housed at Lawrence Livermore National Laboratory (USA)
  - powered by AMD silicon:
    - 11 million total chip cores (CPU + GPU)
    - ~44,000 CPUs: AMD 4th-gen EPYC “Genoa”, 24 cores @ 1.8 GHz
    - ~44,000 GPUs: AMD Instinct MI300A
  - peak performance: ~2.7e18 FLOP/s (FP64)
  - sustained performance (HPL benchmark): ~1.7e18 FLOP/s (FP64)
- so to do the 5e25 FLOP AI training run, it would take El Capitan:
  - ~330 days
  - that’s roughly a year, or about 3x longer than the 6,000 H100s at the AI datacenter
- and to answer the question: the world’s fastest supercomputer actually harnesses less than one-third the raw compute of a single medium- to high-end AI datacenter
…to which we say: Dayaaaaammm!!!
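A quick sanity check on that ~330-day figure, assuming El Capitan’s sustained HPL throughput of roughly 1.74e18 FLOP/s. One caveat worth keeping in mind: the supercomputer benchmark is 64-bit double precision (FP64), while the H100 figure used throughout this post is FP16, so this is a deliberately rough, apples-to-oranges comparison.

```python
# How long would El Capitan need to churn through a 5e25 FLOP training run?
TOTAL_FLOP = 5e25
EL_CAPITAN_SUSTAINED_FLOPS = 1.742e18    # ~HPL Rmax, FP64 (Nov 2024 TOP500 list)
H100_CLUSTER_FLOPS = 6_000 * 1e15        # the 6,000-GPU AI datacenter, FP16
SECONDS_PER_DAY = 86_400

days_supercomputer = TOTAL_FLOP / EL_CAPITAN_SUSTAINED_FLOPS / SECONDS_PER_DAY
days_datacenter = TOTAL_FLOP / H100_CLUSTER_FLOPS / SECONDS_PER_DAY
print(f"El Capitan:      ~{days_supercomputer:,.0f} days")             # ~332 days
print(f"6,000-H100 farm: ~{days_datacenter:,.0f} days")                # ~96 days
print(f"ratio: ~{days_supercomputer / days_datacenter:.1f}x slower")   # ~3.4x
```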
the original unabridged ChatGPT 4o conversation / Q&A transcript that led to this post is available for viewing / augmentation / follow-up:
• https://chatgpt.com/share/677308ec-1058-8003-b28f-5e9d9a9d5836
Related Posts:
- How much Power is AI consuming globally?
- How does your iPhone compare to the world’s fastest supercomputers?
- What use are Supercomputers at all in the Age of AI Datacenters?
engine: MidJourney v6.1
prompt: drone photo of AI datacenters stretching onward to an infinite horizon, smoke pouring from the cooling towers, large conduits & pipelines connecting them to a distant city. silhouettes of multiple nuclear power plants dot the horizon.