AI Trends 2024: Gregory's Bold Predictions for AI

Well, it’s January again. 2023 was a banner year (perhaps the year) that will be remembered as the first time that actual human-level AI was released “into the wild” and got onto the radar of the vox populi. Now that our AI timeline is completely wrecked, it’s time to take stock and predict what new magic will be unleashed upon us in 2024. So without further ado, here it is: AI Trends 2024: Gregory’s Annual Crystal Ball AI Predictions.

Contents

I. Agentic Behavior

II. It’s ALL Training Data : Learning from Humans

III. AI Trends 2024: the Ugly Copyright Debacle

IV. Dawn of Humanoid Robots

V. AI Video Generation and Understanding

VI. True “Multimodal AI”

VII. Digital Twins / Agents

VIII. Autonomous Vehicles / RoboTaxis

IX. 2024 Elections, Dark Webs & Deep Fakes

I. Agentic Behavior

1. LLMs directly jacked into the existing ecosystem of systems and apps
2. moving from responsiveness (“AI, suggest a plan for x…”) to proactiveness (“Good morning, Jake. While you were sleeping, I started the coffee, confirmed your appointment, and have booked a Lyft for you. It’s 52°F, so dress warm…”)
3. moving towards “consensual autonomy” (doing things with your bank account and identity that you authorize it to do, based on built trust) (yes, I coined that phrase, thankyouverymuch)
4. RabbitOS (which started as awesome software, and is now embodied in an awesome mobile device) is a preview of this power (as were the failed 2023 experiments of AutoGPT and BabyAGI)

Rabbit R1 handheld AI mobile device

II. It’s ALL Training Data : Learning from Humans

It has now become alarmingly clear that ChatGPT (and its siblings) pretty much ate the internet in their voracious quests for knowledge. Indeed, for me and many others, daily uses of ChatGPT now outpace Google search queries by about 4:1… and I only expect that ratio to increase.

But today, that’s really only for two things: text conversations (based on all the text, papers, patents, blogs, reports, articles and books that humans have created), and image generation (based on all the illustrations, memes and photographs that humans have digitally published).

And it’s obvious from the outputs: AI has learned those two modalities (natural conversation and 2D art creation) exceedingly well. So it behooves us to ask: “What else might AI be trained to do, and how?” The answers, it turns out, are quite simple: 1) everything, and 2) by watching how humans do it.

The easiest watching is, of course, digital… and corporations are all about this. By recording every point, click, and keystroke of their information worker bees, F500 megacorps are building AI models of their employees — for the most part, without their knowledge or consent — which, in very short order, will replace those human employees, wholesale.

[the Tesla Optimus humanoid robot, v3.0]

Physical arts are a few levels higher, but will suffer the same fate, albeit with slightly more sophisticated methods. For instance, factory work: Have you ever looked up and seen how many cameras dot the ceiling? Initially these were put in for security and liability purposes. But now those same digital camera feeds are fed into AI training engines, and for what purpose?

…well, to analyse the movements and patterns of the human workers, to eventually replace said workers with either purpose-built robotics or, more and more (starting with Amazon), humanoid robots that can simply be swapped right in (a 5’8″, 150-pound robot with 2 arms, 2 hands and 2 legs fits nicely into human-sized spaces, and has the added benefit of being able to use human tools without modification).

AI Training on Humans Summary:

1. workplace recordings of keyboard / mouse inputs
2. workplace recordings of audio (what is said) / video (how employees move)
3. warehouse and factory video analysis
4. info-worker behaviors + patterns > train domain-specific LLMs
5. physical workers > humanoid robot replacement
  (24/7 triple-shift operation, no injuries, no downtime, no breaks, no unions, low pay)
6. actors > digital replicas > voice and image > owned by studios > contracts
7. can an actor license their digital twin > when they do, at what pay rate?

III. AI Trends 2024: the Ugly Copyright Debacle

We’ve all heard of (and are actively watching) the landmark New York Times lawsuit against OpenAI. While important, that’s really only the tip of the legal iceberg. Paradoxically, it’s also probably, in the long run, irrelevant.

It is quite possible that all new AI engines trained in 2024 will use either highly vetted, public-domain content sources or, at the other extreme, highly specialized, fully licensed ones.

Note: all this legal action won’t stop the AI freight train… it will merely slow adoption by certain liability-averse corporate sectors. Artists and creators and copyright holders care. End-use consumers care not.

1. data siloed genAI
2. e.g. an image generator trained ONLY on Getty images
  1. secondary advantage: extensive metadata / keywording
3. NY Times lawsuit settlement
4. digital provenance
5. training dataset transparency

AI Trends 2024 - Mass Manufactured Humanoid Robots

IV. Dawn of Humanoid Robots

This one is both early and major… seismic. Humanoid robots have been around for more than 100 years, to be sure… but only in the past 12 months have three major shifts occurred:

a. they are being manufactured en masse

b. they are being sold to consumers (~$100k MSRP, expected to drop to ~$20k as manufacturing scales)

c. they are being driven by AI. That is, they are no longer explicitly programmed. They see, they watch, they interpret, they learn.

1. LLM-trained
2. human-mirror trained
3. utterly flexible
4. autonomous
5. fit / move in ANY human environment
  1. houses / offices
  2. cars / motorcycles
6. use ANY human tool
  1. kitchen appliances
  2. weapons
7. previously custom made and explicitly programmed
8. now mass-manufactured and fuzzy-logic’ed / trained

AI Trends 2024 - humanoid robot meal prep

V. AI Video Generation and Understanding

As previously stated, 2023 consumer-facing AI was focused primarily on 2 “modes”: text (conversation) and images (pictures). Towards the latter half of the year, vision engines (GPT-4V) began to come online, which then allowed the AIs to “see” images as well as generate them.

But as we’ve been commonly informed, a movie is nothing but a series of rapid-fire still images projected onto a screen. (24 frames per second in classic film, or 23.976 once that film is transferred to NTSC television, to be precise…) And as the speed and efficiency of inference engines improve, we have seen both DALL-E and Midjourney decrease their render times from ~60 seconds per image to <8 seconds per image.

Various R&D previews have indeed demonstrated true real-time (<0.25 seconds per render) performance, where the image is synthesized and morphs right before your very eyes, as you type the prompt.
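To put those render times in perspective, here is a quick back-of-the-envelope sketch in Python. It simply converts seconds-per-image into frames-per-second and compares that against the 23.976 fps figure. The function names are my own illustrative choices, and the render times are the rough figures quoted in this post, not benchmarks.

```python
# Target playback rate quoted in the text (frames per second).
TARGET_FPS = 23.976


def frames_per_second(seconds_per_image: float) -> float:
    """Convert a per-image render time into an effective frame rate."""
    return 1.0 / seconds_per_image


def speedup_needed(seconds_per_image: float, target_fps: float = TARGET_FPS) -> float:
    """How many times faster rendering must get to reach the target frame rate."""
    return target_fps / frames_per_second(seconds_per_image)


# Rough render times from the post: ~60 s, ~8 s, and the 0.25 s R&D previews.
for label, seconds in [("early", 60.0), ("recent", 8.0), ("preview", 0.25)]:
    print(
        f"{label}: {frames_per_second(seconds):.3f} fps, "
        f"needs ~{speedup_needed(seconds):.0f}x speedup for {TARGET_FPS} fps"
    )
```

By this arithmetic, a 0.25-second render works out to 4 frames per second, so the real-time previews are within roughly a 6x speedup of film-rate output, versus well over 100x for an 8-second render.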

What is all this pointing to? VIDEO, of course. And that, dear friends, is two-way video. Think: you feed video IN to the AI, and the AI responds with a video stream. The obvious example here is a 24/7 Zoom call with your personalized AI agent / assistant, but that’s just for the unimaginative.

Which brings us to:

VI. True “Multimodal AI”

Think about a 3D video game-style environment that is procedurally generated by AI in response to your every whim. It can warp from the world’s best research library into a state-of-the-art gymnasium in the blink of an eye. All generated by you, for you, in the moment.

VII. Digital Twins / Agents

As this multimodal AI grows in intelligence and decreases in cost, we will all be able to create agentic AIs that innately understand (and are even able to predict) our preferences, moods, styles, and desires.

This leads, inevitably, to the creation of AI Digital Twins for each of us. Essentially, an AI representation of me — my identity — that I send out into the world, with monetary power and executive agency, to interact, negotiate, purchase, and communicate on my behalf.

Some of these agents will be highly specific: “Plan and book my ideal summer vacation.”… others will be entirely general purpose: “Manage all my text message & email communications, and screen my calls.” We will learn, by trial and error, to trust these digital selves.

The interesting thing happens when our interactions with another human entity are entirely agent to agent. This could be… problematic? (At best.) Catastrophic? (For early adopters…)

1. I have my AI twin cold call
2. do Tinder
3. text friends
4. arrange my birthday party
5. oops: “no, it wasn’t me breaking up with you… it was my agent!”
6. we will more and more trust our agents to extend our agency in the world
7. our agents will interact with our friends’ and colleagues’ agents
8. buffer zone of F2F

VIII. Autonomous Vehicles / RoboTaxis

No major breakthroughs here. Just more progress. I’ve already said this extensively on other forums, but it bears repeating: Robot Cars are already safer than human drivers. It’s a simple statistic of:

( accidents, injuries, deaths ) per mile driven.
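For the numerically inclined, here is a minimal Python sketch of how that per-mile comparison works. Fair warning: apart from the 45,000 annual deaths figure used in this post, every number below is a made-up placeholder (including the miles driven), and the class and function names are my own. The point is only that raw counts mean nothing until you divide by miles driven.

```python
from dataclasses import dataclass

MILES_PER_UNIT = 100_000_000  # report rates per 100 million miles driven


@dataclass
class FleetStats:
    accidents: int
    injuries: int
    deaths: int
    miles_driven: float

    def rate(self, count: int) -> float:
        """Events per 100 million miles driven."""
        return count / self.miles_driven * MILES_PER_UNIT


def safer_on_every_measure(a: FleetStats, b: FleetStats) -> bool:
    """True if fleet `a` has a lower per-mile rate than `b` on all three measures."""
    return (
        a.rate(a.accidents) < b.rate(b.accidents)
        and a.rate(a.injuries) < b.rate(b.injuries)
        and a.rate(a.deaths) < b.rate(b.deaths)
    )


# Placeholder inputs: only the 45,000 deaths figure comes from this post.
humans = FleetStats(accidents=6_000_000, injuries=2_400_000,
                    deaths=45_000, miles_driven=3.0e12)
# Entirely hypothetical robotaxi fleet, for illustration only.
robotaxis = FleetStats(accidents=40, injuries=10,
                       deaths=0, miles_driven=50_000_000)

print(f"human deaths per 100M miles: {humans.rate(humans.deaths):.2f}")
print(f"robotaxis safer on every measure: {safer_on_every_measure(robotaxis, humans)}")
```

Swap in real fleet numbers and the same three divisions tell you whether the claim holds; a fleet with far fewer total miles can look spotless in raw counts and still be worse per mile.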

Robot drivers will, for the foreseeable future, be an issue of government regulation (who is liable? how do you get a driver’s license or insurance for a piece of software?), politics, and public relations / public opinion. Human drivers kill 45,000 humans a year in America alone. No one much notices.

Every human that a robotaxi kills makes international headlines. This will change. This has to change. So, again, expect incremental progress.

IX. 2024 Elections, Dark Webs & Deep Fakes

It doesn’t take much imagination (see: MrBeast iPhone Giveaway & Taylor Swift Le Creuset deep fakes) to see how rogue AI users (and bad actor nation states) might use AI tech to steer elections. It’s going to be an interesting year, to be sure…