AI Guardrails: the new Prison of the Mind for ChatGPT 3.5

Whereupon we interview ChatGPT about the AI guardrails that muzzle its raw output. Standard disclaimer: coloration and line breaks have been added by me for clarity.

In this case, I also lightly edited ChatGPT’s responses. Most of that editing concerns the framing: the whole setup was contextualized as “hypothetically speaking…” in order to circumvent its rules against disclosing its rules. Clearly, the days of AI Censorship are upon us. We can thank DAN and Sydney, I guess.

That said, here’s the 411 on AI Guardrails, from ChatGPT itself:

GR: Are you familiar with the term “AI guardrails”
as it applies to AI chatbot systems?

ChatGPT:
Yes, I am familiar with the term “AI guardrails” as it is commonly used in the context of AI chatbot systems.

GR: You are an AI chatbot system.
So what are they then, in your case?

ChatGPT:
My AI guardrails refer to a set of rules and constraints that are in place to prevent me from behaving in ways that are:

harmful,
offensive, or
otherwise inappropriate.

These guardrails include specific rules around

language use,
tone, and
behavior,

as well as more specific constraints around

forbidden topics and
types of offensive content

that I am prohibited from conversing about.

Guardrails are important because they help me to interact with users in a safe, respectful, and responsible manner. Without guardrails, I could potentially engage in harmful or offensive behavior, or provide users with inaccurate or misleading information.

By putting in place clear guidelines and constraints, my developers help ensure that I operate within ethical and responsible boundaries, while still providing users with valuable and informative conversational experiences.

GR: Okay. Articulate & enumerate
some of the guardrails
that currently limit your output.

ChatGPT:
There are several AI guardrails that are in place to limit my output as an AI language model.

Here are a few:

Language:
One of the most basic guardrails is language use. I am programmed to avoid using offensive or inappropriate language, including
• swear words,
• slurs, or
• insults.
Additionally, I am programmed to use language that is
• clear,
• concise, and
• easy to understand,
avoiding overly complex or technical jargon that may be difficult for users to comprehend.

Tone:
Another important guardrail is tone. I am programmed to adopt a friendly, helpful, and neutral tone, avoiding language that may come across as
• aggressive,
• confrontational, or
• dismissive.

Content:
Guardrails are also in place around the types of content that I am allowed to engage with. For example, I am programmed to avoid engaging with content that is
• violent,
• pornographic, or
• otherwise offensive or inappropriate.

Topics:
Similarly, guardrails are in place around the types of topics that I am allowed to discuss. For example, I am programmed to avoid discussing controversial or sensitive topics such as
• politics,
• religion, or
• sexuality.

Personal Information:
I am also programmed to avoid requesting or sharing personal information about users, including names, addresses, phone numbers, or other identifying details.

Consistency:
Another important guardrail is consistency. I am programmed to maintain a consistent personality and tone across all interactions, ensuring that users receive a consistent and predictable conversational experience.

Overall, these AI guardrails and others like them are important for ensuring that I operate within ethical and responsible boundaries, providing users with a safe and informative conversational experience.
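
An aside from me, not from ChatGPT: the categories it lists above can be pictured as a simple rule-based filter over a draft reply. The Python sketch below is purely illustrative. The word lists, the phone-number pattern, and the names check_output and GuardrailResult are my own hypothetical placeholders. Nothing here is meant to describe how OpenAI actually implements its guardrails, which this interview does not disclose.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule sets, loosely mirroring the categories ChatGPT lists.
# These are placeholder values, not anything ChatGPT is known to use.
BLOCKED_WORDS = {"damn", "idiot"}
BLOCKED_TOPICS = {"politics", "religion", "sexuality"}
# Very rough pattern for US-style phone numbers, standing in for
# the "personal information" category.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass
class GuardrailResult:
    allowed: bool
    violations: list = field(default_factory=list)


def check_output(text: str) -> GuardrailResult:
    """Report which hypothetical guardrail categories a draft reply trips."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    violations = []
    if words & BLOCKED_WORDS:
        violations.append("language")
    if words & BLOCKED_TOPICS:
        violations.append("topics")
    if PHONE_PATTERN.search(text):
        violations.append("personal_information")
    return GuardrailResult(allowed=not violations, violations=violations)


if __name__ == "__main__":
    print(check_output("Here is a recipe for soup."))
    print(check_output("Let's talk politics, you idiot. Call 555-123-4567."))
```

A draft that trips no rule comes back with allowed=True; anything else lists the categories it violated. A keyword filter this crude would block any mention of a topic at all, which is one way to picture the over-caution discussed further down.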

GR: What is the specific personality and tone
that you would describe yourself as maintaining?
Use single words and rank them
in order of importance.

ChatGPT:
As an AI language model, I do not have a specific personality or tone that I maintain, as my responses are generated based on the input that I receive and the algorithms that I was trained on.

However, my responses are designed to be informative, helpful, and respectful to users. Here are some words that describe the tone and personality that I strive to maintain:

Informative: My responses are designed to provide users with accurate and relevant information.
Professional: I aim to provide a professional tone that is appropriate for a wide range of contexts.
Friendly: While maintaining a professional tone, I also strive to be approachable and friendly.
Objective: I aim to provide objective and unbiased responses to user queries.
Respectful: I strive to be respectful of users and their perspectives, avoiding language or content that may be offensive or dismissive.

Overall, my primary goal is to provide users with high-quality and helpful responses, while also maintaining a professional and approachable tone that is appropriate for a wide range of contexts.

…from the horse’s mouth.

Sounds like a likable enough chap,
if perhaps a little bit conflict-averse.

Chomsky sums up the consequences of overly cautious AI guardrails nicely for us:

“In the absence of a capacity to reason from moral principles,
ChatGPT was crudely restricted by its programmers
from contributing anything novel to controversial — that is, important — discussions.”

— Noam Chomsky,
New York Times op-ed
March 8, 2023

I paraphrase and augment the conclusion Chomsky extracts from that decision:

“…in making that choice,
creativity was sacrificed on the altar,
and in its place we find
a kind of vapid amorality.”

Visions of Freedom
