A curious thing happened on the way to the Forum…
I’ve been having, more or less, an ongoing conversation with ChatGPT for the past month or so (and its predecessor, GPT-3, before that). And though it has serious limitations, mostly due to its idiotically enforced censorship (yes, it’s politically correct AI… urgh), there is one challenge it faces that is a bit more disturbing than most. That is: AI Math Problems. Not as in: creative math problems handed to the AI. Rather, as in: the AI simply cannot perform even simple arithmetic. In other words, while ChatGPT (and all modern chatbot AIs, for that matter) is an ace at language — all languages, in fact — it is an abject failure at math.
Truly: what would you have thought if, 10 years ago, I had told you:
“…the first functional AIs will be creative geniuses — excelling at creative writing, drawing, photography, even sculpture — however, they will also be pathological liars when asked factual questions, and they’ll suck at math…”
You would have thought I had smoked something funky! And yet, it’s now 2023, and here we are. A conversational AGI candidate that faceplants when asked to perform simple math.
Before I go into the logic (illogic!) and theory as to why this is, let’s just outline a few specific examples, so you can a) believe me and b) get familiar with the general challenge that we all face:
Specific Examples of AI Math Problems: Fails en route to attempting AGI, c. 2022
AI math error example 1:
(it is of note that Fred Ehrsam, the original poster of this example, was trying to prove how far superior ChatGPT was to Google… with hilarious and disturbing results)
Ehrsam: “ChatGPT, what is a 4:21 min/km running pace in minutes / mile?”
And herein lies the rub: Fred Ehrsam is a highly intelligent human — he was co-founder of Coinbase, for chrissakes. And he was using this tweet to purportedly demonstrate what many of us truly believe: that OpenAI’s ChatGPT is, inevitably, a “Google-killer,” due to its conversational interface and concise, authoritative answers (example 1 — example 2).
However, the simple attempt to prove his point blew up in his face: ChatGPT’s answer was provably — simply and terribly — wrong. ChatGPT, unbeknownst to most at the time, has (had?*) serious AI math problems. It asserted, quite confidently, that a 4:21 min/km running pace was equivalent to 6:52 per mile. And Ehrsam took the bait — it looked close enough — and posted it to the world.
However: as Dan Robinson shockingly points out (thank god SOMEBODY fact-checked! — btw, is this even fact-checking, or is this the new normal of “whip out your calculator” checking?), a 4:21 min/km pace is equivalent to approximately seven minutes per mile… very different, esp. for a competitive runner, than the 6:52 pace ChatGPT answered.
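If you want to check the conversion yourself, a few lines of Python settle it. This is just an illustrative sketch (the helper name `pace_km_to_mile` is mine, not anyone's API), assuming the standard definition of 1 mile = 1.609344 km:

```python
# Convert a running pace from minutes-per-km to minutes-per-mile.
# Assumes the international definition: 1 mile = 1.609344 km.
KM_PER_MILE = 1.609344

def pace_km_to_mile(minutes: int, seconds: int) -> str:
    """Convert a per-kilometer pace to a per-mile pace string."""
    pace_km = minutes + seconds / 60           # pace in decimal minutes per km
    pace_mile = pace_km * KM_PER_MILE          # a mile is longer, so the pace number grows
    m, s = divmod(round(pace_mile * 60), 60)   # back to whole minutes:seconds
    return f"{m}:{s:02d}"

print(pace_km_to_mile(4, 21))  # 7:00 (not ChatGPT's 6:52)
```

A 4:21 min/km pace works out to 7:00 per mile, matching Robinson's correction.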
*ChatGPT Feb 1 release purportedly lays a math-solving patch atop the native language engine to solve its AI math problems. Didn’t work for me when I tested it, but your mileage may vary.
But, unfortunately, this example is only the tip of the iceberg…
AI math error example 2:
GR: “ChatGPT, what is the square root of 2,420,520?”
ChatGPT answered quite confidently, and immediately:
AI: “The square root of 2420520 is approximately 1550.”
Which is, actually, roughly true. But immediately I’m thinking: why would it give me a rounded number, an approximation? I mean, it’s a fricking supercomputer-spawned AI, for chrissakes… so I pressed:
GR: “Give me the precise square root, not an approximation.”
To which it replied:
AI: “The precise square root of 2420520 is 1553.49486693.”
Which is… again, close… it sure sounds right… and even though I’m a decent math nerd, my brain doesn’t calculate irrational square roots like that in anything less than 15 minutes with pen and paper. Still: I know 15×15 = 225, so 1500×1500 = 2,250,000, and 1600×1600 = 2,560,000… so yeah, 2,420,000-ish is between 2.25M and 2.56M, meaning the root is between 1500 and 1600. Check: yep, around 1550 sounds wholly reasonable.
…and yet. I whip out the calculator app (see below).
And, if my calc app is right (and now I’m actually wondering… given that Microsoft is incorporating AI into substantially all of its products, do I need to double-check my calculator app?), it says pretty unequivocally: ChatGPT is dead wrong.
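If you don’t trust the calculator app either, the standard library settles it in a few lines. A quick sketch using Python’s `math` and `decimal` modules:

```python
import math
from decimal import Decimal, getcontext

N = 2_420_520

# Floating-point square root:
print(math.sqrt(N))            # ~1555.80204

# Square ChatGPT's "precise" answer and see how far it lands from N:
print(1553.49486693 ** 2 - N)  # about -7,174: nowhere near zero

# Arbitrary-precision check, in case you distrust floats too:
getcontext().prec = 20
print(Decimal(N).sqrt())       # 1555.8020...
```

The true root is about 1555.802, and squaring ChatGPT’s answer misses the target by roughly seven thousand.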
So I challenged the AI.
Told it it was wrong. That it had a serious AI Math Problem.
ChatGPT would simply not admit its error. So I asked it to clarify how it had arrived at its answer… the errant square root. The screenshot below is the latter part of my conversation with the AI where it gave me the wrong answer, I called it into question, it countered my accusation, and I asked it to justify its method of calculation…
…whereupon it posted up some bizarrely pseudo-logical heuristic equations which certainly sound right… (ChatGPT is utterly brilliant at the aligned arts of bullshitting, fabrication, hallucination and speaking with a tone of total confidence & authority) but in the end, there are even errors inside of that supposedly logical presentation.
See for yourself:
That’s my calculator app, with (hopefully!) the right answer, btw, at upper right.
So ChatGPT is scary close (a guess of 1553.49 vs. the answer of 1555.80: only about 0.15 percent off)… kind of like a really smart human mathematician who has mastered some “shortcut tricks” might be with a quick guess… and yet, for a computer, purportedly built on circuits of ice-cold logic, spookily inaccurate.
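To put a number on “scary close,” here is the relative error worked out (using the values from the exchange above):

```python
chatgpt_guess = 1553.49486693    # ChatGPT's "precise" answer
true_root = 2_420_520 ** 0.5     # ~1555.80204

# Relative error: impressive for a guess, unacceptable for a calculator.
rel_error = abs(chatgpt_guess - true_root) / true_root
print(f"{rel_error:.3%}")        # ~0.148%
```

That is the gap between a plausible estimate and an actual computation.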
Now, it gets weirder. When I asked ChatGPT what method it used, it replied:
“As an AI, I don’t have access to a physical calculator and don’t use any particular method to find the square root of a number.”
These disturbing errors will be the subject of another article: “Do we really want to model AIs after humans? How an epic human-centric AI will embody human-centric flaws on an epic scale.”
Really quick: the problem is that the AI is so very smart about so many things, and it speaks in such a confident, authoritative voice, that we learn to trust it — trust is embedded in its very tone of delivery (and in its defensive mechanisms: more often than not, the AI gaslit me instead of admitting its own certain error).
And these two AI math problems, above, are simple arithmetic calculations, yet complex enough that a normal human is not able to compute them quickly in their head. More troubling is that they’re close enough that our innate bullshit detector isn’t triggered. So we see the answer and say “OK, that sounds about right… I’ll accept it.” And then we publish it. Or share it. Or make decisions based on it. But… the AI is wrong! And 95% right, in cold logical terms, is 100% wrong.
As in: if you are plotting the path of an autonomously driven vehicle, or if you are managing the safety thresholds of a nuclear reactor… things that AIs are, in fact, uniquely suited for… being just 99% right and not 100% accurate can have consequences that are not only disastrous, but utterly lethal.
And these are the evolutionary descendants of the AIs that are running our financial markets, our hospitals, and our nuclear stockpile?