Google Is Using AI to Build Phone Chips. Let’s Explain How.
Let’s explore how Google is using reinforcement learning to build its new chips. In English.
Google’s Pixel line of phones has never been the flashiest of flagships. They’ve never claimed to have the fastest hardware or the newest camera. Instead, Pixel phones have consistently relied on Google’s comparative advantage in software and AI to improve the user experience and make the most of their limited hardware. In fact, Pixel phones have not had their camera hardware updated in three generations, yet they still boast one of the best phone cameras on the smartphone market today. That’s why, when reports came out that Google is using AI and reinforcement learning to design its in-house chips for the new Pixel, it was no surprise to anyone already familiar with Google’s phones.
In this article, we will explain a branch of machine learning known as reinforcement learning and cover how Google is using it to build new chips in less time than ever before. In English.
How chips are traditionally made
As part of the traditional chip-making process, engineers spend months creating a “floorplan” for the chip. A floorplan is the layout of a chip’s subsystems, such as the CPU and GPU, and the millions of connections that link them together. The process is complex and involves advanced software to help engineers visualize how the pieces will fit together on a chip die. Layout matters enormously: even the smallest differences in where the same components are placed can have a profound effect on a chip’s overall performance.
Now, thanks to AI and reinforcement learning, engineers can simply give Google’s algorithm thousands of examples of what makes an efficient floorplan, give it some time to learn on its own, and cut the time needed to create a floorplan from months to mere hours.
Reinforcement Learning Explained
Reinforcement learning is a branch of machine learning in which an AI agent learns from experience and adapts its behavior to maximize the total reward it receives. Every reinforcement learning problem has three parts: the agent, the environment, and the reward. The agent interacts with the environment by observing its current state and taking actions that generate rewards. Reinforcement learning algorithms rely on trial and error to build an understanding of how their actions affect their overall progress within the environment. As the agent tries different actions and receives the corresponding rewards, it generates its own training data for the reinforcement learning model. This sets it apart from both supervised and unsupervised learning: rather than learning from a fixed, labeled dataset, the agent learns from the feedback it collects through its own interactions.
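To make the agent-environment-reward loop concrete, here is a minimal sketch in Python. The `SimpleEnv` class, its one-dimensional “world,” and its reward scheme are all invented for illustration; any real environment (a game, a simulator, a chip-design tool) would sit behind the same kind of interface.

```python
import random

class SimpleEnv:
    """A made-up environment: the agent starts at position 0 and
    tries to reach position 5 by stepping left or right."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (step left) or +1 (step right), clamped to the range 0..5
        self.state = max(0, min(5, self.state + action))
        reached_goal = self.state == 5
        reward = 1.0 if reached_goal else 0.0   # reward only arrives at the goal
        return self.state, reward, reached_goal

env = SimpleEnv()
state, done, total_reward = 0, False, 0.0
while not done:
    action = random.choice([-1, +1])        # a trivial "policy": act at random
    state, reward, done = env.step(action)  # the environment responds
    total_reward += reward                  # the agent accumulates reward
print("total reward:", total_reward)
```

The agent here acts randomly, which is exactly the problem reinforcement learning solves: replacing that random choice with a policy learned from the rewards the environment hands back.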
Compared to traditional supervised learning problems, reinforcement learning problems are more challenging because they involve delayed rewards. If an agent received an informative reward after every single action it took, the problem would look much like supervised learning. In reinforcement learning problems, however, rewards are often not available immediately after taking an action; they may only arrive sometime in the future. This lack of immediate feedback makes it harder for the agent to work out which of its actions actually contributed to the reward it eventually receives.
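One common way to handle delayed rewards is to credit earlier actions with a discounted share of the rewards that arrive later. A small sketch of that bookkeeping (the discount factor of 0.9 is an arbitrary choice for illustration):

```python
def discounted_returns(rewards, gamma=0.9):
    """Work backwards through an episode so that each step's return
    includes a discounted share of every reward that came after it."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# An episode where the only reward arrives at the very end:
print(discounted_returns([0, 0, 0, 1.0]))
# roughly [0.729, 0.81, 0.9, 1.0] -- earlier actions still get some credit
```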
A standard example of a reinforcement learning environment is a computer game. The agent in this case is the player, and it interacts with the environment by moving around and interacting with objects in the game world. The reward for interacting with an object depends on how it affects the player’s overall progress, and by the end of the game, the player receives a reward for winning. However, the player does not know in advance how interacting with specific objects will affect its progress, or whether it will lead to a win or a loss. Without feedback at each step of the process, it can be difficult for the agent to learn how best to interact with objects to maximize its reward.
The ultimate challenge in a reinforcement learning problem is to find an optimal policy: a rule that tells the agent which action to take in each state it might find itself in, given its uncertainty about which actions are best at the time. A simple example of a policy would be “take action A if the agent is in state s1, and take action B if it is in state s2.” In most reinforcement learning problems there are many states, and the agent chooses its action based on the state it currently observes. Then, using what it has learned so far about how to maximize its overall reward, it tries different policies until it finds one that maximizes its total reward.
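Here is a minimal sketch of one standard way to learn such a policy by trial and error: tabular Q-learning on the same toy “reach position 5” environment from the earlier sketch. The learning rate, discount factor, and exploration settings are arbitrary illustrative choices.

```python
import random

# States 0..5 on a line; the goal is state 5. Reward 1 only at the goal.
# (Same toy setup as the SimpleEnv sketch above.)
N_STATES, GOAL, ACTIONS = 6, 5, [-1, +1]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # action-value estimates

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # arbitrary learning settings

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly follow the current best guess, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = max(0, min(GOAL, s + a))
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge the estimate toward reward + best future value.
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy: the best-looking action in each state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
print(policy)   # should prefer +1 (move right) in every non-goal state
```

After a few hundred episodes of trial and error, the table of action values settles into a policy that heads straight for the goal, even though the reward only ever appeared at the final step.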
Deep Reinforcement Learning
For complex problems, the optimal policy is very difficult to compute, even if we have highly precise data about the relationships among states, actions, and rewards. It is therefore impractical to compute an optimal policy exactly for most real-world problems. That’s where neural networks come in.
Neural networks are computational learning models loosely inspired by the way neurons in the brain connect and pass signals to one another. Without giving a full run-down of how they work, neural networks matter here because they are excellent function approximators. In the context of reinforcement learning, a neural network can approximate the optimal policy and, given enough data, minimize the distance (or error metric) between its approximation and the optimal policy.
The key to making neural networks work well in reinforcement learning is to train them with examples of state-action pairs: situations where the agent was in a specific state and took a specific action. In essence, these examples are samples of the rewards the agent received after taking certain actions. By training the neural network on this data, it gradually learns how to make the best possible choices based on the state it is in. To produce a useful approximation of an optimal policy, we need to feed the network thousands of such examples and let it learn from them on its own.
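As a toy illustration of what “training a policy network on state-action examples” looks like mechanically, here is a minimal sketch in PyTorch. The state size, action count, network shape, and the randomly generated “expert” data are all invented purely to show the training loop; a real system would use recorded experience and a far larger model.

```python
import torch
import torch.nn as nn

# Invented example data: each state is a 4-number summary of the situation,
# and the label is the action an "expert" took in that state (0, 1, or 2).
states  = torch.randn(1000, 4)            # 1,000 example states
actions = torch.randint(0, 3, (1000,))    # the action taken in each state

# A small network that maps a state to a score for each possible action.
policy_net = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),
    nn.Linear(32, 3),
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    optimizer.zero_grad()
    logits = policy_net(states)           # scores for each action in each state
    loss = loss_fn(logits, actions)       # distance from the example actions
    loss.backward()
    optimizer.step()

# The approximated policy: pick the highest-scoring action for a new state.
new_state = torch.randn(1, 4)
best_action = policy_net(new_state).argmax(dim=1).item()
```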
For example, AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves. To improve AlphaGo to the point where it could defeat even the best human players, the team continued to train the model with reinforcement learning, simulating millions of different state-action pairs and feeding information about the resulting rewards back into the model to adjust its policy. Imagine taking the best Go player in the world and giving them all the time in the world to play against themselves millions of times. How much better would that player become? That is, in essence, how AlphaGo became better at Go than any human player in the world.
Back to Google
How is Google using this technology to help build chips? Engineers at Google compare floorplanning a chip to playing a board game: you have to think about how all the pieces will fit together. It’s not a simple task; chips can include millions of parts that all must be connected properly, and differences between layouts lead to differences in performance and efficiency. In fact, given the scale of modern chip manufacturing and the billions of computational cycles that occur every second, nanometer-scale differences in placement can lead to drastic differences in performance and efficiency.
Google trained the model on 10,000 examples of different chip floorplans, each scored against success metrics such as efficiency and performance. The model was then tasked with optimizing its policy to place chip components so that the total reward is maximized. Just like in a game of Go, the AI model has a game board (the silicon die), game pieces (the chip’s components), a set of moves (the different locations where a component can be placed), and a win condition (finding the most efficient layout). Google believes this approach can help overcome the limitations of traditional chip design tools, which require an enormous amount of human effort and cost to produce custom chips.
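Google’s actual system is far richer than anything that fits in a blog post, so treat the following only as a sketch of the analogy, not a description of Google’s implementation. Everything here, the grid size, the component count, the wirelength-based reward, is invented to show how “place the pieces, then get scored on the finished layout” maps onto the agent-environment-reward loop from earlier.

```python
import itertools

class ToyFloorplanEnv:
    """A toy stand-in for chip floorplanning framed as an RL problem.

    State:  which grid cells are already occupied and which component is next.
    Action: the grid cell where the next component gets placed.
    Reward: given only at the end -- shorter total wiring scores higher.
    (All of this is a deliberate simplification for illustration.)
    """
    def __init__(self, grid=4, components=5, wires=((0, 1), (1, 2), (2, 3), (3, 4))):
        self.grid, self.components, self.wires = grid, components, wires
        self.placements = {}          # component index -> (row, col)

    def legal_actions(self):
        taken = set(self.placements.values())
        return [c for c in itertools.product(range(self.grid), repeat=2) if c not in taken]

    def step(self, cell):
        self.placements[len(self.placements)] = cell
        done = len(self.placements) == self.components
        reward = -self._total_wirelength() if done else 0.0   # delayed reward
        return self.placements, reward, done

    def _total_wirelength(self):
        # Manhattan distance between every pair of connected components.
        return sum(abs(self.placements[a][0] - self.placements[b][0]) +
                   abs(self.placements[a][1] - self.placements[b][1])
                   for a, b in self.wires)
```

An agent would interact with something like this the same way the game-playing agent did earlier: place components one at a time, receive a score only once the layout is finished, and gradually shift its policy toward placements that score well.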
While this technology is still in its early stages, there are many potential benefits to using AI at every stage of product development. Imagine if a startup could use reinforcement learning in its product design process to build the best products possible with the least amount of waste. Or perhaps a car manufacturer could use a reinforcement learning model to optimize its production schedule for maximum efficiency. The applications for this technology are vast, and in many ways we are only just beginning to discover them.