AI five years from now
Oh boy… This is going to be a relatively long article, I think. My goal is to walk you through what I am very sure will happen within the upcoming five years regarding artificial intelligence.
For the record, my current job is at ABB Robotics R&D as a software engineer, and I have been absolutely obsessed with GPT since GPT-2. I got access to GPT-2 back then, and I’ve also pushed initiatives at ABB Robotics regarding this. I literally cried when OpenAI recently demoed human-like voice, not only because I know what it will bring but also because I’ve been waiting patiently for it since Image GPT was released in 2020.
Who was I?
When the transformer paper was released, I didn’t take any particular note of it, even though it ultimately enabled the creation of ChatGPT. It’s fun to go back and read the Reddit posts about what people thought about it back then.
My first memory of the GPT technology was with GPT-2. I remember being in awe, although in hindsight I didn’t fully comprehend how big of a deal it would become. I saw their generated article about a scientific finding of a herd of unicorns in the Andes Mountains and thought, “Oh wow! I’d love to play around with this for hours and just generate all kinds of random stuff!” But I didn’t see much more potential for the technology than that.
My first real eye-opener with GPT was with the announcement of Image GPT. This was because I realized that this technology truly was data-agnostic. ChatGPT excels with text, and Image GPT excels with pixels, which means the sky’s the limit. This was before we discovered the scaling laws, but just the fact that it was data-agnostic made it obvious to me that this would be scalable in some sense and that nothing was preventing us from applying this to video and voice, which explains my reaction to the voice demo.
Since then, of course, things have blown up a lot. During university, people used to tell me to stop rambling about AI, but nowadays, I simply lean back and let everyone else do the talking for me. While much has come to fruition, I am still very impatient about upcoming technological advancements. I think this is because I know what will be coming, but at the same time, I cannot find someone who shares my enthusiasm for it.
As I stated, the purpose of this article is to detail what I anticipate will be achieved over the next five years. I’m not sure how obvious or not these things are to you, but I want to have it on the record. The reason I’ve chosen five years and not ten is not only because it is more relevant but also because that’s about how far I feel I can realistically see into the future at this point.
I want to make it clear that this article isn’t about how these things will affect society and industries. A lot of these, perhaps almost all of them, undoubtedly have the potential to be very disruptive. But I think that kind of article requires more work than an article like this. With all that out of the way, here is what I am confident will happen before 2029-06-05.
Five years from now
Human-level capable robots
When it comes to robotics, specifically humanoid robots, I don’t see any reason why we couldn’t (or, more plainly, I know that we can) put GPT into a humanoid similar to the Tesla Bot. A robot like this will basically be able to do everything a human can do in a home: make coffee, do the laundry, cook a meal in an average kitchen, and have conversations. We will probably not have a robot like this in every home, but I’m pretty confident that these robots will exist.
The underrated use-cases of voice
At this point, people will likely have been using voice constantly throughout every day for a while. I think we will transition to a situation where almost everyone talks to their AI all the time, very much like in the movie Her. If I can instantly get the correct weather for 5 pm today, why would I bother picking up my phone to awkwardly browse to the right website, select the right day, and check that specific hour like how Neanderthals checked the weather?
AI integrations with all our software
As technologies such as ChatGPT (with voice) become more widespread, I think we will shift much of our current interaction with technology to interacting with technology through AI. My guess is that all apps and OSes will start providing hooks to the OS AI, which the AI can use. For example, the AI might be granted the ability to check appointments on a specific date or create an appointment in Google Calendar.
It feels reasonable that we would grant the AI abilities like this, similar to how we today grant applications access to certain kinds of abilities and information. After installing an app, you would get a list of things you can grant your AI the power to do on your behalf. Then you would have your AI available at all times to help you complete all sorts of tasks.
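To make the idea a bit more concrete, here is a minimal sketch in Python of how such hooks and grants could fit together. Everything in it is hypothetical: the `Hook` and `HookRegistry` names, the `calendar.create` and `calendar.list` hook names, and the toy in-memory calendar are all made up for illustration and do not correspond to any real OS or Google Calendar API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Hook:
    """A single ability an app exposes to the AI (hypothetical structure)."""
    name: str
    handler: Callable[..., object]


@dataclass
class HookRegistry:
    """Hypothetical OS-level registry of app hooks and user-granted permissions."""
    hooks: Dict[str, Hook] = field(default_factory=dict)
    granted: Set[str] = field(default_factory=set)

    def register(self, hook: Hook) -> None:
        # Called by an app at install time to advertise an ability.
        self.hooks[hook.name] = hook

    def grant(self, name: str) -> None:
        # Called when the user approves an ability for the AI.
        if name not in self.hooks:
            raise KeyError(f"unknown hook: {name}")
        self.granted.add(name)

    def call(self, name: str, **kwargs: object) -> object:
        # The AI may only invoke hooks the user has explicitly granted.
        if name not in self.granted:
            raise PermissionError(f"hook not granted: {name}")
        return self.hooks[name].handler(**kwargs)


# A toy in-memory calendar standing in for a real calendar app.
appointments: Dict[str, List[str]] = {}


def create_appointment(date: str, title: str) -> str:
    appointments.setdefault(date, []).append(title)
    return f"Created '{title}' on {date}"


def list_appointments(date: str) -> List[str]:
    return list(appointments.get(date, []))


registry = HookRegistry()
registry.register(Hook("calendar.create", create_appointment))
registry.register(Hook("calendar.list", list_appointments))

# The user grants read access only; creating appointments stays blocked.
registry.grant("calendar.list")
```

With this setup, `registry.call("calendar.list", date="2029-06-05")` works right away, while `registry.call("calendar.create", ...)` raises `PermissionError` until the user also grants `calendar.create`. The point of the design is that the app decides what can be exposed, and the user decides what the AI is actually allowed to do.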
The disappearance of actual code writing
I feel like I must comment on this since it is my profession. Five years from now, I think that “programming”, as in sitting and writing actual text-based code, will have shifted towards something very different. By this, I mean that you won’t really write code anymore but rather ask an AI to “Add a button that does this” and then test it. A large proportion of all the actual code writing will completely disappear, apart from perhaps some much larger rewrites or architectural changes to a code base. But I am sure that almost all code at the level I write today at ABB will be able to be written completely autonomously by AI.
I’ve seen this as somewhat resembling three phases of coding. The first one was (and currently is for many people) to simply have the AI autocomplete your code. The second one (which I exclusively do today) is to never actually write code but rather simply give the AI the relevant code along with an instruction on what to do. The second approach requires you to somewhat intelligently break down the task into smaller tasks, but I think the second approach can scale up to something like a third phase where I’m not even looking at the code but simply asking for changes to be made. Then it’s simply about verifying the complete code change.
Now, I’m not a gamer, but technology like this will probably enable gaming experiences where the game mechanics and behavior can change dynamically. This will likely allow gamers to easily create and modify existing games.
I think the following phrase captures my feeling of automating programming very accurately: It is much easier to see that a sketch of a tiger is very good than to actually make a very good sketch of a tiger.
Chart-topping music generation
This is, of course, something that is already happening to some degree with companies like Suno providing such services. But I don’t really think we’ve hit a ChatGPT-moment for music yet. My bet is that either these smaller companies will explode in size, or one of the larger players will release something akin to ChatGPT for music. And this would not simply be a gimmick, which is somewhat how it feels today, but it will be able to create music at a chart-topping quality. These songs will be indistinguishable from songs produced by people like Avicii.
And even though I said I wouldn’t talk about societal effects, I would just like to briefly address them when it comes to music. I think that some artists will focus on staying authentic, and there might (which I hope, because it would be cool) be artists completely relying on AI to make music. When it comes to services such as Spotify, I think that users will be able to listen to AI-generated music based on their current playlists, indistinguishable from real songs.
Complete movie generation
I am a bit more unsure about this one. I am very confident that we will have the technology to generate complete Oscar-level movies within ten years, but a bit less confident about five years from now. But I think that some company will showcase (or even release) something that can generate a whole movie, complete with film, audio, and a compelling story. But I’m not sure that these will be Oscar-worthy at this point. I think that music generation is a bit more straightforward in this aspect. Netflix might eventually provide something like what I mentioned for Spotify but for movies and series, where you could basically ask and see exactly what you want.
Perfect speech through BCIs
I am very sure that we will have created a good brain-computer interface (BCI) along with the software technology to allow people to talk perfectly using only their thoughts at this point. This will transform many people’s lives when you also connect this to the previously discussed OS interface.
Where will we be when it comes to actual AI?
I think this is the most difficult thing to predict for me due to how closely I follow it. We might have changed our fundamental architecture once during this period from transformers to something like Mamba. But I’m not 100% sure that we will do it. I think these future GPTs (or LLMs) will have been trained on all modalities we can find, at the same time: movies, video, audio, speech, text, and even things like stellar curve data. They will have been scaled up to enormous sizes with much longer context windows.
Should I be worried?
When I talk about these kinds of things, it often makes people uneasy. But I’m honestly not worried. My knowledge of history tells me that humans have a tendency to view new technologies with pessimism. And while technology like this is definitely double-edged, I’m very confident that the future is very bright for us humans in this context. But I still think that it’s very important to continue to be conscious of what AI is and how to proceed in the best way possible.
To sum things up
The above article contains what I just couldn’t hold inside me anymore. I would love to hear your thoughts about this, on what you agree or disagree with. Regardless, I think this will be a very fun thing to look back on five years from now, if we’re still alive (just kidding!).