Google says its next AI ‘Gemini’ will be more powerful than ChatGPT

Sushil Kumar 💝❤️
9 min read · Jun 30, 2023

In a somewhat provocative new interview with Wired, Demis Hassabis, head of Google DeepMind, is quoted as saying that Gemini, which could be released as soon as this winter, will be more capable than OpenAI's ChatGPT. He reveals that his team is attempting to combine some of the strengths of AlphaGo-type systems with the amazing language capabilities of large models.


Before we look at how that might work, here is the context: the Gemini announcement from Sundar Pichai. Hassabis promises that "we also have some new innovations that are going to be pretty interesting." I know many people will dismiss this as all talk, but remember that DeepMind was behind not just AlphaGo but also AlphaZero, which can learn any two-player, full-information game from scratch. They were also behind AlphaStar, which conquered StarCraft II with, quote, "long-term planning" (let's remember that for later). And perhaps most famously, Hassabis led them to the incredible breakthrough of AlphaFold and AlphaFold 2, which are already having an impact on the fight against plastic pollution and antibiotic resistance.

So let's not underestimate DeepMind. But back to Gemini: we hear from The Information that the multimodality of Gemini will be helped in part by training on YouTube videos, and apparently YouTube was also mined by OpenAI. Of course, that's not just the text transcripts but also the audio, the imagery, and probably the comments. I wonder if Google DeepMind might one day use YouTube for more than that. A few days ago they released a paper on RoboCat, which they call a self-improving foundation agent for robotic manipulation. The paper says that with RoboCat, "we demonstrate the ability to generalize to new tasks and robots, both zero-shot as well as through adaptation using only 100 to 1,000 examples for the target task."

"We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop." Notice that part about using the model itself to generate data. It reminded me of a conversation I had with Ronen Eldan of Microsoft, one of the authors of the "Textbooks Are All You Need" paper. I had said to him that when you get elite math papers with proofs and elite scientific research, if you train on much more of those for way more epochs, I don't think we're that far away from AGI.

I personally can't see any barrier within the next five years. Ronen said this: as you said, I also don't see any barrier to AGI. My intuition is that there's probably a lot more improvement we can do with the data we have, and maybe a little bit more with synthetic data. And that is even before we start talking about self-improving mechanisms like AlphaZero, where the more you train models with some verification process, the more data you generate. This can be done in math and other things, as we see here with RoboCat.
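If that loop sounds abstract, here is a tiny sketch of its shape in Python. To be clear, this is my own toy illustration, not code from the RoboCat or AlphaZero papers: propose_answer and verify are hypothetical stand-ins for a model and a verification process, and the synthetic training set is just a list.

```python
import random

# Toy stand-ins: a "model" that proposes answers (sometimes wrong),
# and a verifier that can check a proposal exactly.
def propose_answer(a: int, b: int) -> int:
    """Hypothetical model: usually right, occasionally off by one."""
    return a + b + random.choice([0, 0, 0, 1, -1])

def verify(a: int, b: int, answer: int) -> bool:
    """Verification step: a cheap, exact check of the proposal."""
    return answer == a + b

def self_improvement_round(n_attempts: int = 1000) -> list[tuple[int, int, int]]:
    """One round: generate attempts, keep only the verified ones as new training data."""
    verified_examples = []
    for _ in range(n_attempts):
        a, b = random.randint(0, 99), random.randint(0, 99)
        answer = propose_answer(a, b)
        if verify(a, b, answer):
            verified_examples.append((a, b, answer))
    return verified_examples

if __name__ == "__main__":
    data = self_improvement_round()
    print(f"kept {len(data)} verified examples out of 1000 attempts")
    # In a real loop these examples would be used to fine-tune the model,
    # and the improved model would generate the next round of data.
```

The point is just the shape of the loop: generation is cheap, verification is exact, and whatever survives verification becomes the next round of training data.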

So, you know, there are just so many directions we can still go in that I don't think we're going to hit a ceiling anytime soon. I can't wait to show you the rest of that paper and what else I learned from Ronen, who is also, by the way, the author of the TinyStories paper. But back to Gemini: if you remember the planning bit from DeepMind's earlier systems, it reminded me of something else from Gemini's introduction.

Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations like memory and planning. This is echoed in the article, in which Hassabis says his team will combine a language model like GPT-4 with techniques used in AlphaGo, aiming to give the system new capabilities such as planning or the ability to solve problems. Interestingly, this comes just a few weeks after DeepMind's extreme-risks paper, which identified long-horizon planning as a dangerous capability: for example, adapting its plans in the light of unexpected obstacles or adversaries, and generalizing to novel settings.

For me, this is a bit like when a model can predict what humans would do in reaction to its own outputs. Back to the article: it's interesting that Hassabis is both tasked with accelerating Google's AI efforts and with managing unknown and potentially grave risks. So what's his take? Hassabis says the extraordinary potential benefits of AI, such as scientific discovery in areas like health or climate, make it imperative that humanity does not stop developing the technology. He also believes that mandating a pause is impractical, as it would be near impossible to enforce. "If done correctly, it will be the most beneficial technology for humanity ever," he says of AI. "We've got to boldly and bravely go after those things."

So how would AlphaGo become AlphaGo GPT?

Hassabis described the basic approach behind AlphaGo in two of his recent talks. So what's going on here? Effectively, if one thinks of a Go tree as the tree of all possibilities, and imagines that each node in this tree is a Go position, then what AlphaGo is basically doing is guiding the search with the model. The model comes up with the most probable moves and therefore guides the tree search to be very efficient. And when it runs out of time, it outputs the best move it has found up to that point. The model is learned from data or from simulated data; ideally you have both, and in games it is effectively simulated data. Then you take that model and use it to guide a search process according to some objective function.
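To make "guiding the search with the model" a little more concrete, here is a minimal sketch in Python. It is my own toy example, not DeepMind's code: model_score is a hand-written heuristic standing in for a learned policy or value network, and the search simply expands the most promising nodes first until its budget runs out, then returns the best complete state it has found.

```python
import heapq

# Toy problem: build a 4-digit string, one digit at a time, to maximize its value.
# A real system would use a learned policy/value network; model_score is a
# hand-written heuristic standing in for it.
DIGITS = "0123456789"
DEPTH = 4

def model_score(state: str) -> float:
    """Stand-in for the model: guesses how promising a (partial) state is."""
    return sum(int(d) for d in state) / (10 * DEPTH)

def expand(state: str) -> list[str]:
    """All legal children of a node (here: append one more digit)."""
    return [state + d for d in DIGITS] if len(state) < DEPTH else []

def guided_search(budget: int = 50) -> str:
    """Best-first search: the model decides which nodes to expand first, and
    when the budget runs out we return the best complete state found so far."""
    frontier = [(-model_score(""), "")]  # max-heap via negated scores
    best = ""
    expansions = 0
    while frontier and expansions < budget:
        _, state = heapq.heappop(frontier)
        expansions += 1
        for child in expand(state):
            if len(child) == DEPTH:  # complete state: candidate answer
                if model_score(child) > model_score(best or "0000"):
                    best = child
            else:                    # partial state: back onto the frontier
                heapq.heappush(frontier, (-model_score(child), child))
    return best

if __name__ == "__main__":
    print(guided_search())  # with this heuristic the search homes in on "9999"
```

The heuristic here is trivial, but the structure matches the description: the model prunes the tree down to its most probable branches, and the search spends its fixed budget only on those.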

I think this is a general way to think about a lot of problems. I'm not saying every problem can fit into that, but maybe. And I'll give you an example from drug discovery, which is what we're trying to do at Isomorphic. This is the tree I showed you earlier for finding the best Go move, where you're trying to find a near-optimal Go move and Go strategy. Well, what happens if we just change those nodes to chemical compounds? Now, let me know in the comments if that reminded anyone else of the Tree of Thoughts paper, in which multiple plans are sampled and results were dramatically better on tasks that GPT-4 finds nearly impossible, like creating workable crosswords, or mathematical problems that require a bit of planning, such as creating the greatest integer from a set of four integers using operations like multiplication and addition.
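And if you want to see what that branching looks like in code, here is a toy version of that last task, written in the spirit of the Tree of Thoughts loop. Again, this is my own illustration rather than the paper's code: propose_thoughts generates candidate next steps, evaluate is a crude stand-in for the model's self-evaluation, and the search keeps the few best partial states alive instead of committing to a single path.

```python
from itertools import combinations

# Toy task: combine four integers with + and * to get the largest possible result.
# (My own illustration of a Tree of Thoughts style search, not the paper's code.)

def propose_thoughts(numbers: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Generate successor states: pick two numbers, combine them, keep the rest."""
    thoughts = []
    for i, j in combinations(range(len(numbers)), 2):
        rest = tuple(n for k, n in enumerate(numbers) if k not in (i, j))
        for combined in (numbers[i] + numbers[j], numbers[i] * numbers[j]):
            thoughts.append(rest + (combined,))
    return thoughts

def evaluate(state: tuple[int, ...]) -> float:
    """Crude value function standing in for the model's self-evaluation."""
    return sum(state)

def tree_of_thoughts(numbers: tuple[int, ...], breadth: int = 3) -> int:
    """Keep the `breadth` most promising partial states at each step instead of
    committing to one path; weak branches are simply dropped along the way."""
    frontier = [numbers]
    while any(len(s) > 1 for s in frontier):
        candidates = [t for s in frontier for t in propose_thoughts(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    return max(s[0] for s in frontier)

if __name__ == "__main__":
    print(tree_of_thoughts((2, 3, 4, 5)))  # 2 * 3 * 4 * 5 = 120
```

Because several branches stay alive at once, a path that looks promising early but turns out badly can simply be dropped, which is exactly the backtracking a single greedy pass never gets to do.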


Well, I think my theory might have some legs, because look at where many of the authors of that paper work. And just yesterday, as I was researching this video, the Tree of Thoughts paper was also cited in a paper on using language models to prove mathematical theorems. As you can see, at the moment GPT-4 doesn't do a great job. But my point in bringing this up is what they say towards the end of the paper: another key limitation of ChatGPT was its inability to search systematically in a large space. Remember, that's exactly what AlphaGo is really good at. "We frequently found that it stuck to an unpromising path when the correct solution could be found by backtracking" (a la Tree of Thoughts) "and exploring alternative paths. This behavior is consistent with the general observation that LLMs are weak at search and planning. Addressing this weakness is an active area of research." And then they reference the Tree of Thoughts paper.

It could well be that Gemini, let alone Gemini 2, becomes the state of the art for mathematical theorem proving. And to be honest, once we can prove theorems, we won't be far from generating new ones. In my opinion, fusing this AlphaGo-style branching mechanism with a large language model could work for other things too. We've all seen models like GPT-4 sometimes give a bad initial answer, picking just the most probable output in what is sometimes called greedy decoding. But methods like SmartGPT and self-consistency demonstrate that the first, most probable output doesn't always reflect the best that a model can do.
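Self-consistency in particular is easy to sketch: instead of trusting one greedy answer, you sample several reasoning paths at a non-zero temperature and take a majority vote over the final answers. In the toy Python below, sample_answer is a hypothetical stand-in for a chat-model call, not a real API.

```python
import random
from collections import Counter

def sample_answer(question: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for a chat-model call that returns a final answer.
    At non-zero temperature the sampled reasoning (and answer) varies per call."""
    return random.choices(["12", "12", "12", "21", "10"], k=1)[0]

def self_consistency(question: str, n_samples: int = 15) -> str:
    """Sample several answers and return the most common one (majority vote)."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    print(f"votes: {dict(Counter(answers))} -> choosing '{winner}' ({count}/{n_samples})")
    return winner

if __name__ == "__main__":
    self_consistency("What is 3 * 4?")
```

The greedy, single-sample answer is whatever one draw happens to be; the vote usually does better because the wrong answers tend to disagree with each other while the right one keeps repeating.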

And this is just one of the reasons why, as I said to Ronen, I honestly think we could see a model hit 100% on the MMLU in less than five years. The MMLU, which I talked about in my SmartGPT video, is a famous machine learning benchmark that tests everything from formal logic to physics and politics. I know that predicting 100% performance within five years is a very bold prediction, but that is my prediction. If those are the growing capabilities, though, what does Demis Hassabis think about the implications of the sheer power of such a model? One of the biggest challenges right now, Hassabis says, is to determine what the risks of a more capable AI are likely to be.

"I think more research by the field needs to be done very urgently on things like evaluation tests," he says, to determine how capable and controllable new AI models are. He later mentions giving academia early access to these frontier models, and they do seem to be following through on this, with DeepMind, OpenAI and Anthropic giving early access to their foundation models to the UK's AI taskforce. This Foundation Model Taskforce is led by Ian Hogarth, who was actually the author of the "We must slow down the race to god-like AI" article that I made a video about back in April; do check that video out. In that article, Hogarth mentioned a practical plan to transform these companies into a CERN-like organization.

Somewhat unexpectedly, that idea was echoed this week by none other than Satya Nadella, who had earlier called on Google to, quote, "dance": "Essentially, the biggest unsolved problem is how do you ensure, both at sort of a scientific understanding level and then the practical engineering level, that you can make sure that the AI never goes out of control. And that's where I think there needs to be a CERN-like project where the academics, along with corporations and governments, all come together to perhaps solve that alignment problem and accelerate the solution to the alignment problem."

But back to the article. The interview with Hassabis ended with a somewhat chilling response to the question of how worried we should be. Hassabis says that no one really knows for sure that AI will become a major danger, but he is certain that if progress continues at its current pace, there isn't much time to develop safeguards.

"I can see the kind of things we're building into the Gemini series, and we have no reason to believe that they won't work." My own thoughts on this article are twofold. First, we might not want to underestimate Google and Hassabis, and adding AlphaGo-type systems probably will work. Second, based on his comments, I do think there needs to be more clarity on just how much of Google DeepMind's workforce is working on these evaluations and preemptive measures.

This article from a few months ago estimates that there may be fewer than 100 researchers focused on those areas, out of thousands. So is it even 5% of the total? And if not, how seriously can we take the commitments made at any AI safety summit, such as the one happening this autumn in the UK? On the other hand, if Hassabis revealed that half or more of his workforce were on the case, then we could be more confident that the creators of AlphaGo, my fellow Londoners, had a good chance of researching their way to safety and success. As always, thank you so much for listening, and have a wonderful day.
