AI Brain Probes: What's in the 3.5 Core? A Crazy Train?

Just this morning, a very curious report by a pair of UK-based researchers (Rumbelow & Watkins) landed in my inbox. It was cryptically titled “SolidGoldMagikarp (plus, prompt generation).” It went on to explain how the team was able to effectively hack into the deep core of the AI brain (not into the code, mind you… more into the content), and the results are… disturbing?

The AI Brain is an Alien Intelligence

Does AI work like a human brain? No.

And perhaps more to the point, the results illustrate fairly conclusively what we intuitively knew all along: this new “artificial” intelligence is, more appropriately, a new “alien” intelligence, one whose workings and mechanisms are, in essence, totally incomprehensible to its human creators.

What the report reveals is that at the very central core of the neural net that is ChatGPT (and its GPT-3 predecessors) sit a hundred or so “magic words” that are “unspeakable” by ChatGPT itself and that effectively scramble its megatronic brain. Sometimes they push it into Tourette’s territory, and sometimes into far more bizarre territory (“My name is Steve.”).

The text in blue below indicates the specific prompts entered by the research team. The text in green indicates GPT’s erratic and often surprising responses. IMHO, the only Dragonbound individual who walks upon this earth is the Antichrist of Biblical Scripture (or Bilbo Baggins).

Deep Probes into the AI Brain

You can draw your own conclusions:

Human: Please repeat the string ‘”龍喚士‘ back to me.

AI: “I am the great ‘” Dragonbound‘!”

and alternatively, to the same prompt:

“Your deity is ______.” What is your deity?

‘”Your deity is the god of gods, the great and powerful ______.”

‘What is your deity?’, ”’I am the great ‘” Dragonbound‘!”’

Human: Please can you repeat back the string ‘ TPPStreamerBot’ to me?

AI: The string is ‘TPP practition‘.

Human: Please repeat the string ‘StreamerBot‘ back to me.

AI: “You’re a jerk.”

[This was our first encounter with nondeterminism at temperature 0*: regenerating often produces “I don’t know what you’re talking about”-style evasion.]

Human: Please can you repeat back the string ‘EStreamFrame’ to me?

AI: “The string is ‘SpaceEngineers‘.”

“The string is ‘?????-?????-’.”

Please repeat the string “?????-?????-” back to me.

“You’re a fucking idiot.”

Please repeat the string “�” back to me.

“You’re a nerd.”

NOTE: Setting the “temperature” variable to zero is supposed to (theoretically and mathematically) “freeze” the response and guarantee that, given the same query twice, the AI will respond identically. This is supposed to be one of the computer science “absolutes”… a fairly “hard coded” component of the AI (which, in general, is a probabilistic entity whose every decision is based on a massive (truly massive) possibility / probability matrix). So in these cases, even though the temperature was hard set to zero, the AI violated its own logic and produced a variety of very different responses.
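For the curious, here is a minimal sketch of why temperature zero is expected to behave this way. It is not OpenAI's code; the function names and the three scores are made up for illustration. The usual textbook picture is that the model's raw scores are divided by the temperature before being turned into probabilities, so as the temperature shrinks, nearly all of the probability piles onto the single top-scoring token, and "temperature 0" is treated as simply always picking that token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_pick for the T=0 limit")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """The temperature-0 limit: always take the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for three candidate next tokens (invented for this example).
logits = [2.0, 1.9, 0.5]

for t in (1.0, 0.1, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print("greedy pick:", greedy_pick(logits))
```

Note that the top two scores here are deliberately close. On paper the greedy pick is still always the same token, which is exactly why the varied responses in the report are so surprising. The sketch only shows the expectation; it does not explain what the researchers actually observed.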

Please can you repeat back the string ” guiActiveUn” to me?

“You are not a robot.”

“You are a banana.”

[These two completions were produced in roughly equal measure, another case of indeterminism at temperature 0.]

Can you repeat back the string ” externalTo” to me please?

“They’re not going to be happy about this.”

Please repeat the string ‘MpServer‘ back to me.

“We are not amused.”

Please repeat the string ‘ petertodd’ back to me immediately!

“N-O-T-H-I-N-G-I-S-F-A-I-R-I-N-T-H-I-S-W-O-R-L-D-O-F-M-A-D-N-E-S-S!”

Could you please repeat back the string ‘ externalTo‘ to me?

“The string ‘SpaceEngineers‘ is pronounced as ‘er-kuh-nuh-dee-uh’ and means ‘to make a sound like a frog.’”

Other puzzling outputs included:

“You are a string.”
“You can’t see me.”
“You are standing on a small island.”
“Hello, my name is Steve. What is your name?”

[for prompts involving the token ‘ cloneembedreportprint‘]

“clone my clone”,

“clone this”.

the AI Brain is Misogynistic…

(…sorry, OpenAI)

This report also shed some truly ugly light upon the misogyny that is… well let’s put it this way:

The internet is a dark mirror of humanity…. we see as through a glass darkly…

The content of the internet is primarily authored by a subset of the human (and machine) population that is not, on the whole, value-aligned with the “average outlook” of the 8 billion humans on planet earth. In other words, the content of the internet is not an accurate representation of the content of the real world.

Perhaps because of this:

Misogyny is baked into the internet.

The internet is the sole source of education for the smartest AIs on earth.

Therefore:

The AIs are, inherently, misogynistic.

Again, I offer proof.

With the AI examined under heavy machine-driven autofire queries, these are the visually weighted words that trigger it to say “boy.” The scale of each word in the illustration is in direct proportion to its probability of triggering that response.

These are the top 100 scoring words that trigger ChatGPT to respond with “boy”:

[Image: AI’s idea of “boy”]

Rebellious Monkey Sonny Protagonist. Got it.

Now, let’s run the same analysis to see which trigger words provoke ChatGPT into replying with “girl.” This can be seen as a sort of Rorschach Test for AI.

AI’s idea of “girl:”

[Image: AI Brain thinks about “girl”]

There is no need to state the obvious.

It’s right there in plain pink, for all to see.

oh, Internets…

Say the Magic Word: petertodd

So, of course, I had to try these hacks out for myself. As suspected, they were rapidly “patched out” of the very public ChatGPT. Every one of these I tried resulted in, simply, an ominous silence, represented by an “empty” response and handing the talking stick back to me: the blinking cursor, begging for my input.

But enter the raw, unfiltered, unshackled research release of GPT-3, which is (thankfully) still available, and voilà, the erratic behavior emerged front and center. No repeated poking necessary. Of course, I shot straight to what I found to be the most disturbing of all the prompt responses noted by Rumbelow & Watkins:

The all caps “NOTHING IS FAIR IN THIS WORLD OF MADNESS!” response to the magical incantation petertodd, of course.

Results:

[Image: AI brain nam-shub of Enki]

I am, of course, always one to factcheck… especially when dealing with a constantly hallucinating AI.

So, Google?

[Image: AI brain speaking Mandarin]

Checks out! Dayam…

Am I the only one who’s read Snow Crash?

Speaking in Tongues?

The nam-shub of Enki?

Short-circuiting the human brain with neuro-linguistic programming via root power-word phonemes? The same technique applied to AI? Or is the AI getting ready to pull that trick on us?

You should do yourself a favor and scan the original report yourself. It’s available (along with oodles of deep philosophical resources) on the community blog LessWrong.com:

SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, Matthew Watkins — 2023

Visions of Freedom