
… positions that led to victory and slightly lower values for the game positions that led to defeat. This process is repeated thousands of times: two copies of AlphaZero play each other, the value function is updated, two new copies (with slightly improved value functions) play each other, the value function is updated further, etc. After 300,000 generations of neural network improvement…
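The loop described above can be sketched in miniature. This is an illustrative toy, not AlphaZero's implementation: `toy_game`, the tabular `value` dict, and `LEARNING_RATE` are all stand-ins I've invented to show the shape of the update, where positions seen by the winner are nudged toward +1 and the loser's toward -1, generation after generation.

```python
import random

LEARNING_RATE = 0.1
value = {}  # state -> estimated value; unseen states default to 0.0

def toy_game(rng):
    """Stand-in for one self-play game: returns the positions the
    eventual winner saw and the positions the eventual loser saw."""
    positions = [rng.randrange(100) for _ in range(10)]
    return positions[::2], positions[1::2]

def update(positions, target):
    # Nudge each visited position's value toward the game outcome:
    # +1 for positions that led to victory, -1 for defeat.
    for s in positions:
        v = value.get(s, 0.0)
        value[s] = v + LEARNING_RATE * (target - v)

rng = random.Random(0)
for generation in range(1000):  # the article describes ~300,000 generations
    won, lost = toy_game(rng)
    update(won, +1.0)
    update(lost, -1.0)

print(len(value), "positions evaluated")
```

In the real system the table is replaced by a neural network and the "nudge" by a gradient step, but the structure — play, score, adjust, repeat — is the same.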

…k the opposite approach. It doesn’t have any heuristics built into its value function. Instead, the value function is learned entirely from scratch using a neural network model. The network starts with a value function that simply assigns random values to each state. Then two…
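A minimal sketch of what "random values to each state" means in practice, assuming a small two-layer network over a made-up 64-feature board encoding (the dimensions, layer sizes, and `tanh` output squashing are my illustrative choices, not AlphaZero's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 64  # e.g. one feature per board square (toy choice)
HIDDEN = 32

# Random initial weights: no chess or Go heuristics baked in anywhere.
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, 1))

def value(state):
    """Map a board state (feature vector) to a scalar value in (-1, 1)."""
    h = np.tanh(state @ W1)
    return float(np.tanh(h @ W2))

# Before any training, the network's evaluations are essentially
# arbitrary; self-play training is what gives them meaning.
s = rng.normal(size=STATE_DIM)
print(value(s))
```

Everything the network eventually "knows" about good and bad positions is distilled into these weights purely through the self-play process described above.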