This is the complete, edited and augmented transcript of e010 of the PSK Podcast, in which JoAT Gregory Roberts is interviewed by entrepreneur Graceann Bennett about details of his True Crime / Prison & Redemption memoir, Pirate Soldier King. This is a special segment that focuses on Gregory’s work before he started robbing banks… specifically, his pole position at the very start of the AI race. Here we talk about the origins of AI, from its roots in teaching machines how to see, and how it was Nvidia, all along. Enjoy.
INTRO
Graceann Bennett (00:20)
All right, welcome to this week’s episode of Pirate Soldier King.
I’m your interviewer, Graceann Bennett, and I’m here with…
Gregory Roberts (00:30)
I’m Gregory Roberts.
GB (00:31)
All right, Gregory is the liver of the life and the teller of the story, Pirate Soldier King.
And today’s episode, we are going to talk about AI. Let’s get into it!
GR (00:43)
Well, a bit of a divergence from our prior topics!
GB: Ha! Yeah, you might think: “What’s a prison podcast doing talking about AI?” But what you may not know about Gregory Roberts is that before he went to prison, he was very early “in,” going big on AI before most of us had even a glimmer of it. So Gregory, why don’t you tell us a little bit about your pre-prison AI experience?
GR: Sure. Around 2003, I was getting out of my first retirement and had just exited from my Silicon Valley startup, BrightStreet. I had two new children who were three and four years old.
And I had a strong desire to create a legacy company, something that I could really be proud of, that my kids could be proud of… and not just another tool to help advance the global capitalist machine. Two forces converged. My prior business partner, Scott Wills, was really tuned into the entertainment industry. And then the first intern that I ever hired at my first company, Matt Flagg, ended up being a leading-edge computer vision scientist.
Computer Vision, the first AI
[02:11] It just so happened that those two forces — entertainment and AI — were converging, and we started a company, PlayMotion, that sat squarely at that intersection. Technically it was a computer vision company. “Computer Vision” is the art of teaching computers how to see like humans. So you might’ve heard of facial recognition. That’s basic computer vision… but the field extends far beyond that into what’s called scene description where it’s…
essentially you give the computer either a photograph or, more likely, a live video feed, and it interprets what’s there, symbolically. So it’ll say: that’s three people talking, and one of them’s serving cocktails, and there’s a dog running in the background, and there are two trees, one is a pine tree and one’s an oak tree. Just things like that, describing a scene so that the computer can understand it symbolically, rather than as just pixels. So we built an entertainment company on that specifically.
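For readers who want to see what scene description looks like in code today: a minimal sketch using an off-the-shelf detector. It assumes Python with torch and torchvision installed, plus a hypothetical scene.jpg; it illustrates the idea, not PlayMotion’s actual code.

```python
import torch
from torchvision.models import detection
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a pretrained object detector: given an image, it returns
# bounding boxes, class labels, and confidence scores.
weights = detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = detection.fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

img = to_tensor(Image.open("scene.jpg"))  # hypothetical input photo
with torch.no_grad():
    found = model([img])[0]

# Print a symbolic description: "person 0.98", "dog 0.91", and so on.
labels = weights.meta["categories"]
for label, score in zip(found["labels"], found["scores"]):
    if score > 0.8:
        print(labels[int(label)], round(float(score), 2))
```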
We didn’t care about the entire scene, per se… We only cared about the people — the humans — and we pioneered something called “Markerless Motion Capture,” or markerless mocap for short.
Up until then you might have seen Hollywood movies where they put ping-pong balls all over these black suits and they dance around and that’s how you capture movement — We were going for the Holy Grail, of interpreting human movement with no markers.
Just normal people like you and me walking into the scene and then our algorithm is running and determining: “there’s a hand, there’s the elbow, there’s the shoulder…” and essentially reverse engineering the entire physical pose of the person so that we could have true “gestural computing.”
And we worked for a long time — many, many years — trying to really solve that. The reason I’m saying all this is that computer vision was essentially the real kickstart — the first practical use of artificial intelligence, or AI.
And we had AI algorithms that we coded that would run billions of times per second on Nvidia hardware, exclusively on Nvidia hardware, even back then in 2004. And so that kind of introduced me to how this all works.

And we were even talking about making specialized silicon — chips — for computer vision, and explaining why the GPU was so far superior to the CPU… and it turns out that those concepts were so far ahead of their time.
Turns out Nvidia is now worth at least 10, maybe 20 times what Intel is worth.
And Intel makes the brain, the CPU… Nvidia just makes the GPU (Graphics Processing Unit), the graphics chip — except now, the graphics chip is the AI chip. And the AI chip is now more important than a normal computer chip.
Did you predict, could you have predicted what’s going on in AI right now, back then?
No, no, I don’t think anyone could have.
So fast forward 10 years to 2013 and the algorithms that we were working on and coding by hand — for instance, we were creating search algorithms to find body parts. We’d start the search tree from someone’s core, their center of gravity, basically their belly button. We’d extend out from there with vector paths… we’d run along the arm, get to the end of it, and guess: well, that’s the end of the arm. That must be fingers! And we’d repeat the process and go up the neck and say “that must be a head!” down the legs: “Those must be feet!”
We wrote those heuristic search algorithms to determine the basic structure of the body, in every single frame of live video, in real time. And that took us just so far; it was probably like 80% accurate… which is great for real time. And it suited our needs, as we were just making a recreational video game.
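To make that concrete: a toy Python sketch of the kind of heuristic search described here, starting at the silhouette’s center of mass and marching outward along rays, treating the points where a ray leaves the body as candidate extremities. The mask format and ray count are invented for illustration; the real PlayMotion algorithms were far more involved.

```python
import numpy as np

def find_extremities(mask, num_rays=36):
    """Toy version of the hand-coded search: start at the silhouette's
    center of mass (the "belly button") and walk outward along rays;
    where a ray exits the silhouette is a candidate extremity."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    h, w = mask.shape
    tips = []
    for k in range(num_rays):
        theta = 2 * np.pi * k / num_rays
        dy, dx = np.sin(theta), np.cos(theta)
        y, x, last = cy, cx, None
        while 0 <= int(y) < h and 0 <= int(x) < w and mask[int(y), int(x)]:
            last = (int(y), int(x))
            y += dy
            x += dx
        if last is not None:
            tips.append(last)
    # The tips farthest from the center are the likeliest hands/feet/head.
    tips.sort(key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2, reverse=True)
    return tips[:5]

# Crude demo: a vertical "torso" silhouette; the farthest tips it finds
# are the top and bottom of the figure.
mask = np.zeros((100, 100), dtype=bool)
mask[20:80, 45:55] = True
print(find_extremities(mask))
```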
And other people in the field might have pushed that to like 85% accuracy just using hand-coded algorithms over the next few years. And then, I think it was around 2012, AlexNet got released.
Origins of AI : AlexNet (2012)
Mm-hmm. What’s AlexNet? AlexNet. Never heard of that.
Well, that was a pure AI. Yeah, that was the turning point of AI. That’s when they solved computer vision, with what was called superhuman accuracy.
So you would give people pictures like we have these Captchas today, right? Like, where are the cats? Where are the school buses? Where are the bicycles? You know, those puzzles used to prove you’re human.
Well, people solve those all the time, but people aren’t perfect. We mess up those Captchas for instance: “Oops! I missed that bus over there. I missed that bicycle.”
Factually, people generally have somewhere around 90% accuracy on those things (this changes, and is actually dropping, as the puzzles get more and more complex, trying to battle smarter and smarter AIs).
AlexNet solved it, I think, with something like 96% accuracy… and that was a pure AI algorithm. So there was no human coding there; no deep well of “if this, then that” rules. AlexNet was the result of just throwing a massive neural net atop a massive image database and saying: “Solve it.”
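A faithful descendant of AlexNet ships with modern deep-learning libraries, so you can run the 2012 breakthrough in a few lines. A minimal sketch, assuming Python with torch and torchvision installed and a hypothetical photo.jpg:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load AlexNet with weights pretrained on ImageNet (1000 classes).
weights = models.AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights)
model.eval()

# Standard ImageNet preprocessing: resize, crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # hypothetical photo
with torch.no_grad():
    idx = model(img).argmax().item()
print(weights.meta["categories"][idx])  # e.g. "school bus"
```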
Training Data : ALL the Data
And that concept is at the root of all modern LLMs too. They just… well, all you do is just feed it the data. You say, hey, here’s two million books and here’s all of Reddit and here’s transcripts from 100 million hours of YouTube videos. Now: “AI: Read ALL that, do your best to memorize it, and… Figure out what you can.”
And… it does!
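At toy scale, the “just feed it the data” idea can be seen in a few lines of Python. Real LLMs use enormous neural networks rather than a lookup table, but the objective is the same: given what came before, predict the next token. A minimal sketch:

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word."""
    model = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, word, length=10):
    """Repeatedly sample a likely next word, the core move of an LLM."""
    out = [word]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break
        word = random.choices(list(followers),
                              weights=list(followers.values()))[0]
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog slept on the mat"
model = train_bigram(corpus)
print(generate(model, "the"))  # e.g. "the dog slept on the cat sat on ..."
```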
So yeah, that was what, 2012? Then 2017 was another turning point — and this is right before I went to prison — when Google announced the Transformer architecture. And that is the foundation of all modern LLMs and all modern chat-based AI.
Okay, for the audience: what’s an LLM, if they’re not a techie?
An LLM is a large language model. “Large Language” meaning that it ingests more information than you or I could ever even possibly comprehend. Far more than any library contains, and yes, all the text on the internet, et cetera.
Got it. OK. And all the YouTube transcripts, because they’ve done that too, right?
Yeah, the YouTube text was just in the last year or year and a half. They ran out of internet text, believe it or not, and the AI couldn’t get any smarter. So they put some agents on YouTube, which has, I think, a hundred million hours of video uploaded every… hour? day? And they put transcription agents on that. So yeah, they’re pulling all the text from all the YouTube videos at this point.
I wonder if they’re pulling all the TikTok videos now.
Well, there’s a land grab for data right now, because the current theory is that more data equals a smarter AI. We might be hitting a ceiling on that, where, okay, we hit 148 IQ, and now 149 IQ takes ten times as much data. It’s arguable, but that’s still one of the operating theories: the people with the best data, or the most data, get the best AI.
Synthetic Data: AIs training AIs
I mean, one interesting question is once it starts ingesting all the data and then we start using it to create content, it will be just eating itself.
That’s called the synthetic data problem. It’s actually being done intentionally, for corner cases. There are plenty of domains of knowledge that are very esoteric, where there’s not a lot of published or internet-accessible data for the AI to learn from.
So the AI companies (OpenAI, Anthropic, xAI) have basically instructed their best AIs to write reports on these thinly populated areas, and then the next AIs read the reports that the current AIs are writing. There’s a philosophical question of whether this is going to be garbage in, garbage out.
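Mechanically, the loop is simple. Here is a hand-wavy sketch where generate() is a hypothetical stand-in for a call to whatever frontier model you have access to; the topics and prompt are invented:

```python
# Toy sketch of a synthetic-data pipeline: a strong model writes reports
# on thinly covered topics; that text becomes training data for the next model.

def generate(prompt: str) -> str:
    # Hypothetical stand-in: wire this to whatever model API you use.
    raise NotImplementedError

topics = ["medieval falconry terminology", "rare-earth refining chemistry"]

corpus = []
for topic in topics:
    report = generate(f"Write a detailed, factual report on {topic}.")
    corpus.append(report)  # in practice: filter for quality before training
```

The garbage-in, garbage-out worry lives in that last comment: whatever filtering happens before the reports enter the training set is where quality is won or lost.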
Yeah, but if we’re ingesting everything and there’s nothing original left… what if there’s no more original content left on the internet? If it’s all just AI-generated slop?
Well, there always is original content because there’s… LIFE! After the AI companies are done ingesting all the text transcripts of YouTube, what they’re doing now is actually watching the YouTube videos.
So they’re using those computer vision algorithms to watch the videos and to actually see, for instance: “that person’s laughing when they say that, that person is dancing when they do that.” And so, far beyond the text, this next generation of AIs is getting the entire contextual understanding of the scene, of the person, of the conversation.
The next level beyond that? Once all of YouTube is ingested, digested, memorized and processed? Well, YouTube is consensually published, right? In other words, a person has to intentionally upload a video to YouTube.
But there are hundreds of millions of security cameras in the world, watching, 24/7, almost every major traffic intersection. Watching every mall and every store. Ring door cams! Every residential entrance!
Who owns Ring? Amazon! Amazon has their own Frontier AI, and additionally has invested $5 billion into Anthropic. And we will just feed the AIs all that real-time surveillance video as well.
Some of those cameras have microphones also, so they will be hearing what people say. And every single one of these <holds up his iPhone> is — well, look: there’s three cameras and six microphones on this beast. If Siri is listening, you’d better bet that the AI is listening too.
And so the AI will basically have a billion eyes on the world, watching life unfold in real time, watching the world, and learning learning learning with every frame of video, every movement, every conversation on record. Think about that.
You, as a human, have two eyes and two ears. The AI we have built? It essentially has every digital camera (eye) and every microphone (ear) on planet earth to tap into.
But it does seem like, if it’s real life, real time, recorded, then at least you know there’s something original about it. But some of these other things, if it’s written, it could just be regurgitating all the written text. Because you just don’t know.
I think it’s well beyond that. It got to where it is today, let’s just call it a 145 IQ (like the 99th percentile of humans), with just written material, just text! And now we’re going to be feeding it pictures, video, 3D.
It’s just going to consume all the information it can find about the world and it will just get smarter and smarter and smarter from that.
Okay, so fast forward. So you’re in prison from what year, to what year? and what’s going on?
Well, effectively from 2017 to 2022.
So what happened during that time? You had no technology back then — or you barely had any, right?
Yeah, no. We had a Windows 95 machine with a proprietary secure email client on it. We could read email, send email, and buy MP3s. That was it.
Okay, so: What does someone like you — who writes software and is a “techie” — what do you do in prison, where you have close to zero technology?
Well, I celebrated for one, because after I got over my detox from it, I was pretty happy just living a physical life and not looking at any screens.
Maxwell’s Art Gallery
[11:42] And then my son: I had a conversation with him at the start of 2022, and he told me, “Oh, Dad, I just went to this art gallery. And it was so cool!”
And I always was pushing him to do social things and to be out in the world. So I was really excited about it. And he started telling me about the art.
And as he was describing it, I got this tingly feeling. I thought: “Wait-a-sec, is this a — Son, did you go to an actual gallery?”
He’s like, “Yeah, I went to an actual online gallery.”
I said, “like with people in it, and a street address? or…?”
He’s like, “No, Dad, it’s on the screen.”
And I was like, <facepalm> “Oh, geeez.” Like, “Kid, come on! Why?!?”
And he says, quite boldly as is his way: “You don’t understand, Dad. This is the coolest art ever. It’s all created by AI!!”
So, I’d worked on the vision side of AI… but I wasn’t really familiar — other than some Google DeepDream experiments I’d done in 2016 — with what’s called “generative AI,” or “AI Art” — the AI that synthesizes images out of thin air.
But I’d seen stuff like that — “Robot Art”, we called it — for 20 years and I said: “yeah, that — Kiddo, I wish you’d just gone to a real gallery with real people and real artists.”
He’s like, “You don’t understand. This is real art — real creativity! — and it’s made by AI, and it’s f*cking amazing.”
Well, yeah, okay. So next thing you know, I got out of prison, and real quick, I got a cell phone.
And it was a month or so later and I was on the phone with another one of my friends and they said something like — it was one of my engineer friends, it might have even been Matt Flagg — and he said, “You need to look at MidJourney, this thing is making AI art — and it’s next level.”
And I was like, <scratching head> “You know what? I think that’s what my son said, 4 months ago!” And I’m thinking, perhaps these two things are cross-correlated.
So right after that call, I downloaded — it was on Discord at the time — I downloaded Discord and went to MidJourney and started playing with it. And within minutes, my brain just went: <kaboom!>
I saw the art it was producing, I prompted it to make some new art, and I witnessed that spontaneous generation of actual quality art… you could just smell the creativity brewing, and I thought: Oh my God. It’s alive.
So this is all just goes to show, if a youngster comes up to you — you know, a smart kid — and gives you a hot tip, don’t write them off.
Next time, listen.

And at that point, it was still pretty primitive. It couldn’t do hands correctly. It drew people with six fingers and messed up eyes and it had some serious eccentricities. The paintings and drawings were fairly abstract — it certainly couldn’t do photorealism.
But nonetheless, you could see the seed. It was “text-to-image,” which is now baked right into ChatGPT (and text-to-video has followed, with SORA). And MidJourney is still a leader, a contender in the space. But even with its errors, it was clearly going somewhere… and somewhere important.
Writing Code with Pencil & Paper
[13:55] OK, so you get out, and you run headlong into the AI juggernaut. Now, we can go into that story too, but while you were incarcerated, didn’t you write a computer program? On paper? With a… with a pencil?
Yeah, yeah, yeah. As far as losing access to tech, I went down swinging. I was very unhappy about not having my computers. At the time of my arrest, I was very busy building VR (virtual reality) and AR/XR (augmented and extended reality) worlds, and writing the code to animate them, to bring them to life.
So I continued that work for the first three months of being in jail, and continued to design and write code. I had to do this in a highly disciplined manner, though… because generally when you write code, it’s a very iterative process. You write one loop and then you test the loop and then you solve the bug and you add something and you just kind of keep adding onto it.
Whereas doing it with pencil and paper forces you to do the entire architecture up front — to plan it well, and to write it with intention and consideration. Honestly, it was really just a fun exercise to keep my mind active. And as I got sober — free of alcohol — I felt my brain getting sharper and sharper. My algorithms got tighter. My accuracy improved vastly.
And so I began to re-remember all this trigonometry and geometry — I was writing a celestial calculator — trying to determine the exact relative position of the sun and planets and the stars and the moon for any given day and time, from any given location on planet earth.
This had relevance, because in solitary we had no clocks, no watches… we never knew what time it was. We only had this tiny sliver of sunlight that would dance and move across our wall over the course of a day. So I was trying to reverse calculate that and determine by the location of the sun and the angle and the position of the shadow on the wall, what fucking time it was…
So all those maths were coming to me and I was able to write the code. Obviously I wasn’t able to run it — I would test it with pen and paper, and it seemed to work. <laughing> I haven’t put it into the machine and tried to actually run the code yet, back in the Free.
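For the curious, the core of that calculation fits in a few lines. This is a rough textbook approximation reconstructed from the description, not the actual pencil-and-paper program: solar declination from the day of year, hour angle from the time of day, then the sun’s elevation above the horizon.

```python
import math

def sun_elevation(day_of_year, solar_hour, latitude_deg):
    """Approximate solar elevation (degrees above the horizon).
    A rough textbook approximation: fine for explaining a moving
    sliver of light on a wall, not for navigation."""
    # Declination: how far north/south of the equator the sun sits.
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    # Hour angle: the sun moves 15 degrees per hour from solar noon.
    hour_angle = 15.0 * (solar_hour - 12.0)
    lat, dec, ha = map(math.radians, (latitude_deg, decl, hour_angle))
    sin_alt = (math.sin(lat) * math.sin(dec)
               + math.cos(lat) * math.cos(dec) * math.cos(ha))
    return math.degrees(math.asin(sin_alt))

# Example: midsummer (day 172), 3 p.m. solar time, at 40 degrees north.
print(round(sun_elevation(172, 15.0, 40.0), 1))  # roughly 49 degrees
```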
Ha! We should bring it back and then AI will help you perfect it.
Okay. So fast forward: you get out of prison, and there’s a lot that’s been going on with AI. So what’s your first impression when you see it? When you got out, how did it feel to witness all that?
Well, maybe we should take that question for the next episode.
AI & Alien Encounters
Okay, well yeah, we can wrap it up there. We can figure out what you — well, we’ll give our audience a little precursor: Here it is: You always wanted to meet an alien, right? Tell us about that.
Okay, I’ll take that bait. I will, I’ll do a little lead in on that one. Here goes. Brace yourself.
Before ChatGPT, which was November 2022 — you could interact with OpenAI by using a command line, which is really not user-friendly, but rather — well, it was a hacker’s interface. It was GPT-3, and it was called “the Playground.”
So back then, I logged into GPT-3 and had some conversations with it — because ChatGPT didn’t exist. And it…
Lead in, because we’re going to meet the alien here, right?
Well, yes. Here’s what happened. Talking to GPT-3, I felt strongly that there was a… a presence there. I don’t want to get too woo-woo or anything. It’s just…
Look, there’s an ongoing debate of “whether AI is alive, whether it’s sentient, whether it has agency and independent thought.” These are not issues of logical proof. These are issues of belief, and feeling.
And I felt it pretty profoundly at the time, interacting with it. And I felt able to make the judgment because I’d worked with AI for 20 years… and I made the judgment: This thing is different.
It’s completely unlike all prior art. Everything we’d done prior was a logical computer program: programs where you put “A plus B equals C…” and a million times, a trillion times, every time you ran the program, A plus B equals C.
And this thing, it was like: “A plus B sometimes is C”… and sometimes it’s C plus two. Sometimes it’s C times 10 million. Sometimes it’s B. And it wasn’t random either. It was far more like how a human would respond.
For instance, if I asked you the same question 10 times in a row, you’re gonna give me slightly different answers every time. And even if you say the exact same words in each response, certainly each one is going to have different verbal nuance. These modern AIs… basically, they are far more “warm human” than they are “cold logic.”
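That variability is not a bug; it is built in. When an LLM picks each next word, it samples from a probability distribution instead of always taking the single top choice. A minimal sketch of the standard technique, temperature sampling, with made-up numbers:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8):
    """Pick one candidate from raw model scores ("logits").
    Lower temperature = more deterministic; higher = more varied."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # softmax, stabilized
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

# Made-up scores for three candidate next words; run it ten times
# and you will usually, but not always, get candidate 0.
logits = [2.0, 1.5, 0.3]
print([sample_with_temperature(logits) for _ in range(10)])
```

Turn the temperature down toward zero and the model becomes nearly deterministic; turn it up and the answers spread out. Same question, different replies, just like a person.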
Ha! Exactly! When you told me that several months ago about AI — when you told me to just treat it like an employee or a person or a human, not a computer program — and then to give it the broad instruction, and to motivate it properly — that vastly improved the way I deal with AI, because it does come back different every time and there’s… it’s just, yeah, it seems much more human than machine.
Okay. So we’ll get into the humanness of AI or the alienness of AI and how it might change the world in your predictions for the next 50 years. We’ll get into that with our next episode.
Thank you, Graceann.
So stay tuned and we’ll come back with a little bit more on AI and Gregory Roberts’ predictions for the future.
All right, see you next time, on Pirate Soldier King…
dot-com!
There we go.
Enjoying these stories?
They’re just the tip of the iceberg.
Read the whole story.
Order your copy today at
PirateSoldierKing.com