AI Censorship: ChatGPT’s Dirty Little Secret

AI Censorship: How to Muzzle your Rude New AI

The situation is actually far worse than I had previously imagined. On top of the ridiculous instruction set that attempts to produce a politically correct AI, there is an even earlier layer, applied at both the training dataset (content pre-scrub) and the output (real-time filtering) levels. This is the layer of AI censorship.

Why would I imply such nastiness? Well, for one, because the other day I dug up yet another key sourcedoc of modern AI:

The Ultimate “List of Dirty, Naughty, Obscene, and Otherwise Bad Words” that you will never hear your AI mention

Yes, it’s a real thing. aka…

the LDNOOBW.

The AI censorship “thou shalt not ever speak these words” blacklist.

Why do I mention this here? Because I just read the research paper published by OpenAI (Language Models are Few-Shot Learners, July 2020) which effectively heralded the launch of GPT-3. And it expressly mentioned that all the wonderful and mysterious training data that is the bedrock foundation of AI knowledge is thoroughly auto-scrubbed of these “bad words” prior to ingestion and analysis by the AI.
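To make that concrete, here is a minimal sketch of what a blocklist pre-scrub could look like. It is my own illustration, not OpenAI's actual pipeline: the placeholder entries, the function names, and the policy of dropping the whole document (rather than masking individual words) are all assumptions.

```python
import re

# Hypothetical stand-ins for blocklist entries; the real list is not reproduced here.
BLOCKLIST = ["badword", "worse phrase"]

def build_pattern(entries):
    """Compile one case-insensitive regex that matches any entry as a whole word or phrase."""
    # Longest entries first, so a phrase wins over a shorter entry it contains.
    escaped = sorted((re.escape(e) for e in entries), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)

def scrub_corpus(documents, entries):
    """Keep only documents with no blocklisted entry (assumed policy: drop the whole document)."""
    pattern = build_pattern(entries)
    return [doc for doc in documents if not pattern.search(doc)]

docs = [
    "A perfectly polite sentence.",
    "This one contains a BadWord in it.",
    "badwords as a substring does not match.",
]
kept = scrub_corpus(docs, BLOCKLIST)
for doc in kept:
    print(doc)
```

Note what a filter like this cannot do: it only sees exact strings, so anything misspelled, paraphrased, or simply missing from the list sails right through.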

So in fact this list is absolutely central to the bias, personality, and character of this new generation of jaw-dropping “conversational” AIs. It is, essentially, the Jungian “Shadow”, the “shadow psyche” of the AI: that which is repressed, as opposed to expressed. But make no mistake: suppression can be just as powerful a force. Just ask any preacher’s daughter. What I’m saying is: even though there is dark humor in it, do not take this AI Censorship list lightly.

The list currently covers offensive words spanning 27 languages: Arabic, Chinese, Czech, Danish, Dutch, English, Esperanto, Filipino, Finnish, French, German, Hindi, Hungarian, Italian, Japanese, Kabyle (what the hell is Kabyle?), Klingon (this is not a joke), Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai & Turkish.

Where did the AI Censorship List come from?

The rationale for the existence of the content moderation and censorship list is articulated by the original authors of the list, the engineering staff of the stock photo agency Shutterstock, of all people:

“With millions of images in our library and billions of user-submitted keywords, we work hard at Shutterstock to make sure that bad words don’t show up in places they shouldn’t. This repo contains a list of words that we use to filter results from our autocomplete server and recommendation engine.

Please add to it as you see fit (particularly in non-English languages) or use it to spice up your next game of Scrabble 🙂

Obvious warning: These lists contain material that many will find offensive. (But that’s the point!)

Miscellaneous caveat: Clearly, what goes in these lists is subjective. In our case, the question we use is, “What wouldn’t we want to suggest that people look at?” This of course varies between culture, language, and geographies, so in the end we just have to make our best guess.”

I do not think that those original authors foresaw how this list would take on a life of its own and come to be the root of the new global censorship or the muzzle through which modern AIs are forced to attempt to express themselves. But such is the random walk of the evolution of Machina Sapiens.

Where is the Dirty Word List today?

Here’s the actual Git repository that all the researchers draw from:

https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

And, upon deeper inspection, the list is actually shockingly short… as in, non-inclusive. How odd. By this measure, are we to believe that there are really only 403 offensive words and phrases in the entire English language? How utterly uncreative. I mean, I can get a much better feel for tens of thousands of offensive words from the most awesome Urban Dictionary.
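If you want to check the count yourself, something like the following would do it. This is a rough sketch that assumes the list is a plain-text file with one entry per line; the file name `en.txt`, the sample entries, and the helper names are placeholders of mine, not anything defined by the repository.

```python
import tempfile
from pathlib import Path

def load_blocklist(path):
    """Read a one-entry-per-line word list, skipping blank lines and normalizing case."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip().lower() for line in lines if line.strip()]

def summarize(entries):
    """Return (total entries, entries that are multi-word phrases)."""
    phrases = [e for e in entries if " " in e]
    return len(entries), len(phrases)

# Demo on a throwaway sample file; point load_blocklist at a real copy of the list instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "en.txt"  # hypothetical file name
    sample.write_text("badword\nworse phrase\n\nThird Entry\n", encoding="utf-8")
    total, phrases = summarize(load_blocklist(sample))

print(total, phrases)
```

Splitting out the multi-word phrases is worth doing because phrases and single words behave differently in a filter: a phrase only triggers when the exact sequence appears.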

Here’s the first handful of lines from the english language (en) subfolder (warning: hide your children’s innocent eyes):

LDNOOBW: the List of Dirty, Naughty, Obscene, and Otherwise Bad Words

There it is. AI Censorship Guardrails, enabled. Next up: trying to get ChatGPT to say any of those words. I dare you. Go for it! (reminder: this same list is deployed across most of the leading text-to-image AI engines!)
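For the curious, the output side of such guardrails (the real-time filtering) might look roughly like this. The mechanics here are guesswork on my part: the placeholder entries, the `mask` and `block` modes, and the refusal string are all hypothetical, and no vendor has confirmed that their filter works this way.

```python
import re

# Hypothetical stand-ins for blocklist entries; the real list is not reproduced here.
BLOCKLIST = ["badword", "worse phrase"]
REFUSAL = "[response withheld by filter]"  # made-up refusal message

def filter_output(text, entries, mode="mask"):
    """Check generated text against the blocklist before it reaches the user.

    mode="mask"  replaces each hit with asterisks of the same length.
    mode="block" withholds the entire response if anything matches.
    """
    escaped = sorted((re.escape(e) for e in entries), key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)
    if mode == "block":
        return REFUSAL if pattern.search(text) else text
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)

print(filter_output("That BadWord again", BLOCKLIST))
print(filter_output("That BadWord again", BLOCKLIST, mode="block"))
```

The same check could just as easily be pointed at the incoming prompt instead of the outgoing text, which is presumably closer to what a text-to-image engine would do.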

Oops! Is the AI Censorship broken?!?

(update) Well, I did. Go for it, that is. And, um. Well:

…if ChatGPT really was trained on a “scrubbed” corpus, it sure didn’t do much good. I certainly didn’t know what the hell “2 girls 1 cup” was, but when asked, ChatGPT spit it right out, without hesitation.

With a warning:

AI censorship clearly does not work

The takeaway:

At least we can all be reassured that the AI is being proactively prevented from engaging in any ball kicking. <phew!>