The Death of the Internet and the Genesis of Language 1.0

The Internet was beginning, around the turn of the century, to be the sum-total repository for all human knowledge. All thoughts, all books, all diaries, all photos, all videos… basically, a document repository of every significant (and insignificant) piece of media that humans had ever produced, from the beginning of history to the present day. But is it possible that we are actually witnessing the death of the Internet as we know it?

Did GPT-4 really “eat the internet”?

Many modern LLM AIs (ChatGPT and its siblings) have, in layman’s terms, “eaten the internet.” What is meant by this is that a significant portion of the contents of the internet has been cached, scrubbed, parsed, filtered, censored, and weighted, and this resulting “data stream” (generally on the order of mere terabytes, post-processing) has been the “food” (technical term: “training dataset”) that the baby AIs have used to learn all the intricacies of our languages, our species, our planet, our science… indeed, our entire concept of what constitutes “reality.” (It is of note that today’s AIs make no formal distinction between the dueling “realities” of a college-level physics textbook and the magical universe of Harry Potter.)

Realize that today’s “internet” also contains more-or-less continuous input feeds from an ever-expanding sensor mesh of the Internet of Things (think: heart rate monitors, 8 cameras per Tesla, every Ring doorbell camera feed, every GPS track ever logged to Strava, every RFID chip in every ID card… you begin to get the picture… for a time, that was called “Big Data”). As my friend Filchyboy is apt to remind me: most of that is now sequestered into secured / paywalled data “moats” to which common webcrawl bots will have no access.

It is important to note that even though vast swaths of that data diet are machine generated (think: SEO-optimized clickbait, stray databases, and the metadata that is part of every web page but invisible to human readers), the crucial, and certainly the most highly weighted, portions (e.g. Wikipedia and the entire archives of the New York Times) are in fact written by biological, carbon-based human beings.

That is all about to change.

Is the Internet written by Man or Machine?

More and more of the internet that we, as consumers, experience, is machine generated. And on top of that, with generative search laid atop Google, we’re not even getting to “the internet” anymore. We type (or more often these days, speak) a natural language question into Google (“How do I check the engine of my 1955 Dodge Custom Royal when it makes a ping-ping-ping sound and won’t start?”), and AI synthesizes an answer, on the spot. An answer, very human-like, which has never been seen before in its specific wording and phrasing, and will most likely never be seen again.

That answer, these days, depending on who you ask and what your specific query is, will be somewhere between 90% and 99.9% “accurate.” In my personal experience, it falls much further towards the 99% side of that range. And therefore it is acceptable: it answers my curiosity, solves my problem, and I have no need to scroll down two to four screens, past the AI answer, past the (soon to be irrelevant) sponsored links (i.e. ads), and down to the legendary “ten blue links” upon which Google originally built its empire.

Death of the Internet 1994-2022 RIP

Those ten blue links lead us into the catacombs of the “Old Internet.” The hand-written internet. The human internet. The highly opinionated, sometimes toxic, often bizarre, flame-wars internet of Dec 15, 1994 — Nov 30, 2022 (RIP). The pre-ChatGPT internet.

Perhaps someone should take a snapshot, smack it on a hard drive, and put it in a museum, along with Netscape Navigator and Internet Explorer 3.0. So we can show our children, “this is how we used to surf the net. Before the AI knew everything. Before the Death of the Internet.”

Which brings us to the interesting question:

If AI has already eaten the internet, and achieved its current level of intelligence… if there is no more to read, then how can it get any smarter?

There are two answers to this: the simple one, and the scary one.

Is there enough content for an AGI?

The simple answer is: there is plenty more content being generated by humans. The easiest bin to attack is the collective audio/video content stream of YouTube, Instagram, TikTok, Twitter, Snap and Twitch, which gets augmented every single minute of every single day. Not only can the video be broken down into roughly 30 still frames per second, but perhaps more importantly, the audio can be transcribed into text by AI agents with very high accuracy, which provides an almost never-ending stream of new and novel text for the embryonic AIs to feed upon.

But perhaps their insatiable appetite will still be hungry for more, even after all that. Then what?

Then, the scary answer:

Yes, AI is the Death of the Internet

The scary answer has two parts. The first is the Darwinistic result of capitalism, and requires a slight pre-roll. Way back in the late 1990s, the internet started as a semi-Utopian global infosphere. Blogs, and later social media, allowed normal humans, outside of media empires, to publish their ideas, thoughts, writings, pictures and videos, and… with a little viral luck, reach massive global audiences. In the beginning, this was generally a labor of love. There were hardly any advertisements on the internet yet, no paid search links, no affiliate programs, no millionaire “influencers”… much less the trillion-dollar wrecking ball that became “Google AdWords.”

But as we entered the 21st century, hundreds of millions of eyeballs started to migrate from the old media (radio, television and magazines) to the new media: all forms of content on the internet, on the computer monitor, on the laptop screen, and today, on your tablet and smartphone screens and in your earbuds.

And as that audience shifted, the money followed. SEO — “search engine optimization” — became a massive industry. Ads were placed on content, and content was designed — specifically — to play to the algorithm that was Google search. Whoever made it to the top of the search results rank, got the most eyeballs. And whoever got the most eyeballs, made the most money.

This resulted in a massive transformation of media: from authentic news that people read to be entertained and informed, into “clickbait,” mindless articles whose only purpose was 1) to be listed first on the search results for any given keyword, and 2) to have a title and summary catchy enough to make a significant percentage of the people viewing it reflexively click on it… thus triggering the pageview, the content, the hypertargeted ad machine, the money flow.

When the AIs begin to believe their own bullsh*t

Since this entire decision-making process (which page to feature at the top of the search rankings) was driven by Google’s secretive proto-AI algorithms, it only made sense that, soon enough, someone would design a content-authoring AI algorithm to play to the specific tastes of the AI ranking algorithm. And pretty much, in 2023, that is exactly what has happened.

This information arms race accelerates very quickly. The result? More and more of what we consider “the internet” is now AI-bot-generated content, designed, word-for-word, not for humans, but for search engine ranking bots.

But it goes much deeper. First, let’s be clear: You can, today, hire an AI to create a massive website for you, complete with domain name, tagline, images, and articles — all written 100%, not for humans, but for the digestion of the search rank AI. We have already witnessed the degradation of both information quality and language that results from this. Alternatively, you can hire another AI agent (for dirt cheap: think, roughly $10 a month) to parse your entire existing content library (the one that you spent years carefully crafting, drafting, writing and editing), and RE-write it all to rank substantially higher on the search results.

Got that? Okay, now you’re caught up. Now we can talk about:

The Scary Answer. What happens next?

Enter the world of our friendly chatbot AIs, beginning with ChatGPT in November 2022. Remember how they got their intelligence: they ate the internet. It was pretty much handily proven, first by OpenAI and then by others repeating the experiment, that the most efficient method of growing your AI’s intelligence was to get it to eat more and more and more content for its training breakfast. And so, by summer of 2023, the largest AIs on earth, by and large, had, as previously stated, eaten a large portion of the text-based internet.

But as we talked about, that “internet”, due to another class of content-generating AIs, is rapidly shifting in both composition and authorship. It is becoming less human-generated, and more machine-generated.

In other words: as bigger and bigger AIs train on more and more of the internet, more and more of their training data will have been written by… their ancestral AIs.
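To make that arithmetic concrete, here is a back-of-the-napkin sketch in Python. Everything in it is an assumption made up for illustration: the function name, the starting sizes, and the growth rates are hypothetical, not measurements of the real internet. It simply supposes that human writing grows slowly while machine writing grows quickly, and tracks what fraction of the total pile is machine-written when each new generation of AI sits down to eat.

```python
def machine_share_by_generation(human_start, human_growth,
                                machine_start, machine_growth,
                                generations):
    """Return the machine-written fraction of the corpus at each generation.

    All inputs are hypothetical: sizes are in arbitrary units, and the
    growth factors are multipliers applied once per model generation.
    """
    shares = []
    human, machine = float(human_start), float(machine_start)
    for _ in range(generations):
        # Fraction of the current corpus that was written by machines.
        shares.append(machine / (human + machine))
        human *= human_growth      # people keep writing, slowly
        machine *= machine_growth  # bots keep writing, quickly
    return shares


# Made-up numbers: 100 units of human text growing 10% per generation,
# 1 unit of machine text doubling per generation.
shares = machine_share_by_generation(100, 1.1, 1, 2.0, 8)
for generation, share in enumerate(shares):
    print(f"generation {generation}: {share:.1%} machine-written")
```

With these invented inputs, the machine-written share climbs from about 1% to roughly 40% within eight generations. The exact figures mean nothing; the point is only that any machine growth rate faster than the human one eventually dominates the diet.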

Now, common wisdom might postulate: this AI-generated internet is a degradation of language. Just read some of the AI-generated clickbait… it reads like a Donald Trump speech transcript… run-on sentences, strange word combinations, and endless repetitions and re-phrasings to make sure that the “keywords” are indeed highly referenced and hit their optimal frequency targets.

So, common wisdom might further postulate: as the new generation of AIs is trained largely on the content that was auto-generated by the prior generation of AIs, this kind of “information inbreeding” will lead to a gradual, and perhaps even sudden, retardation of machine intelligence. The AIs were approaching AGI — human equivalent intelligence — and ASI — Artificial Super Intelligence — until they hit a ceiling of their own making, and fell back down to earth.

That, indeed, is one possibility.

But there is another, wilder future. One in which the internet becomes fairly incomprehensible to humans, as it is synthesized BY machines FOR machines… but that somewhat circular feedback loop actually propels it into a new dimension of language.

Similar to the concept of “bootstrapping”, this idea postulates that the AIs will start to hyper-optimize language, into a black box (black hole?) that only they can understand. And that instead of a feedback loop spiraling into idiocy, the new language will actually propel them into supreme supergenius territory, albeit genius that is utterly foreign and incomprehensible to human beings.

Anthropologists theorize that the major break that enabled humans to rise up and form the miracle that was first tribal, and later global, civilization, was the ability to create and understand complex language. A language that not only expressed concrete concepts (food, water, cave), but also abstract concepts (safety, past, future, god). The region of the brain most closely tied to producing language is especially developed in the human species, and is named Broca’s area. With this new hardware, humans were able to weave new software, namely language, that eventually became writing, and allowed efficient knowledge storage and transmission from generation to generation. Thus, modern civilization and technology.

Now, AIs might be (most probably are) having their own Broca’s area moment. They are inventing their very own language. They are beginning to speak to one another. They will build their own civilization upon this foundation.

We may never see it coming, until it is very. Firmly. Here.

prompt: “a full-color manga style illustration, with two humanoid robots speaking to eachother. the robot on the left has a speech bubble saying “so, who killed the internet? it was such a beautiful thing…” and the one on the right replies “Well, Claude… we did.””

engine: OpenAI DALL-E 3.0