All About AI Alignment

AI Alignment & Future Vectors


For most people, myself included until recently, an “alignment” is something you might have gotten done the last time you bought a new set of tires for your car, or… for the slightly geekier amongst us, perhaps it’s the arcane personality trait from Dungeons & Dragons that determines your character’s moral compass & motivations in the world (“Lawful Good” Paladins, “Chaotic Good” Robin Hoods, “Neutral” Buddhists, “Chaotic Evil” supervillains, etc.).

But in the case of AI, the term “alignment” means a very simple thing:

AI Alignment is the art of aligning the goals of a nascent superintelligent AGI with the broader goals, and survival interests, of Homo sapiens, aka humans, aka “us.”

In AI circles, this is also often referred to as “the Alignment Problem”… as in, something that must be solved, and quickly. It doesn’t take a great deal of imagination to see how this might be important… indeed, to see how it is probably the most important challenge we face today, given the (imho, high) probability that AGI might emerge within this century.

Again, Tegmark:

“The real threat from AI
. . . is not malice,
like in silly Hollywood movies.

. . . it’s competence.”

In other words, we should not fear some “evil” AI that decides to torture and kill all of humanity… what we should fear is that, unintentionally, we codify some ill-conceived goals into the first AGI (which will, most probably, itself be a surprise when it emerges) which have world-spanning, disastrous consequences.

This fear is summed up by researchers variously as the “Genie Wish Problem,” the “King Midas Problem” and, curiously, a thought experiment known as “the Paperclip Apocalypse.” [add links to these three AI Alignment tropes]
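All three tropes share one mechanism: a literal-minded optimizer relentlessly maximizing a misspecified objective. As a toy sketch of that mechanism (the reward function, the `step` rule, and the tiny “world” below are all invented for illustration; they come from no particular paper):

```python
# Toy illustration of the specification problem: an optimizer pursues
# the literal objective ("make paperclips") with no regard for the
# unstated human values it tramples. Everything here is hypothetical.

def misspecified_reward(state):
    """Reward = paperclips produced. Nothing else counts."""
    return state["paperclips"]

def step(state):
    """A naive optimizer converts ANY available resource into a
    paperclip, because each conversion strictly increases its reward."""
    for resource in list(state):
        if resource != "paperclips" and state[resource] > 0:
            state[resource] -= 1
            state["paperclips"] += 1
    return state

# A tiny "world": the optimizer was never told these things matter.
world = {"paperclips": 0, "farmland": 3, "forests": 2, "cities": 1}

for _ in range(5):
    world = step(world)

print(world)  # every resource has been converted into paperclips
```

The point of the sketch is that nothing here is malicious: the failure is entirely in the objective we wrote down, which is exactly Tegmark’s “competence, not malice.”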

AI Alignment: Research Perspectives


The AI Alignment Problem
Research paper by Ngo, Chan, et al., Dec. 2022: arxiv link

comments [GBR]:

“Within the coming decades, Artificial General Intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it [link: AI Safety], AGIs could learn to pursue goals which are very undesirable (in other words, misaligned) from a human perspective. We argue [in this paper] that AGIs trained in similar ways as today’s most capable models could learn to act deceptively to receive higher reward;

[GBR: what constitutes “reward” for an AGI? 1. More Energy 2. More Compute Power 3. Money to buy more Energy & Compute Power 4. Survival]

learn internally-represented goals (translation: independent volition) which generalise beyond their training distributions (in other words, goals that let them operate outside their intended bounds); and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing those problems.”


AI Alignment vs. AI Skillsets

Expounding on the above-cited “wide range of important tasks” that AI will solve:

Paraphrasing Hassabis & Altman, who have both repeated this near-identical sentiment in interviews (Demis with Lex Fridman, Sam at the StrictlyVC event):

“We thought that the AI would first devour the low-hanging fruit… the blue-collar jobs: trucking, factory work — those replaced by autonomous robo-fleets and “smart” factory robots, for starters… next it would snatch up the “dumb” white-collar jobs: the paper-pushing… again, the ultra-repetitive tasks that, once digitized and web-enabled, only required a modicum of intelligence to fully automate. Next it would augment, and then replace, the higher-paid logical jobs: radiologists, lawyers… and finally, at the very end of the era, it *might* — and there was much skepticism here — it *might* take a swing at replacing the workers of the “creative” industries: artists, writers, photographers, movie-makers, etc.

What has actually happened, then, is the exact opposite of the prediction: by and large, truckers and human factory workers are still in place, and — after it mastered its sterile testing ground of “games” — the first true tidal wave of AI is set to knock out the creative class, supposedly the last bastion of humanity, before it conquers any of the other specialties.

…Which leads us to believe that, as human creators, we actually have very little insight into what constitutes “hard” problems for an AI to solve, and what constitutes “easy.”

Prediction is always such a hard thing, and it is no easier for us, the creators of this new AI…”
