From Ballerina to AI Researcher: Part IX
OpenAI Five benchmark game and my tenth week as a scholar
This past Sunday was super fun! OpenAI held the benchmark game with Dota pros as part of the preparation for the main game at The International late in August. It's really hard to put into words all the emotions we experienced while preparing for the game. But it was truly a moment of reunification for the entire company, where everyone had made a contribution to make the benchmark game happen, from working on the OpenAI Five training itself to putting things together at the venue.
OpenAI Five won two games out of three playing against a team of top-99.95th percentile pro Dota players that included Blitz, Cap, Fogged, Merlini, and MoonMeander. The human team won game three after the audience was asked to adversarially select OpenAI Five’s heroes.
By the way, for this game the team revealed a new OpenAI Five capability — drafting, an extremely challenging part of Dota, since heroes interact with each other in complex ways.
Since compute plays a crucial role in AI advancement, we estimated how much of it went into training the various Dota systems:
- 1v1 model: 8 petaflop/s-days
- June 6th model: 40 petaflop/s-days
- Aug 5th model: 190 petaflop/s-days
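To give a sense of what these units mean: a petaflop/s-day is 10^15 operations per second sustained for one day. As a minimal sketch (all the inputs below — GPU count, per-GPU throughput, utilization, and duration — are hypothetical numbers for illustration, not the actual OpenAI Five training setup), you can back out a petaflop/s-day estimate from a cluster description like this:

```python
# A petaflop/s-day is 10^15 operations/second sustained for one day.
PFS_DAY = 1e15 * 86400  # ≈ 8.64e19 operations

def petaflops_days(num_gpus, flops_per_gpu, utilization, days):
    """Rough training-compute estimate; all inputs are hypothetical."""
    ops = num_gpus * flops_per_gpu * utilization * days * 86400
    return ops / PFS_DAY

# Hypothetical cluster: 256 GPUs at 10 TFLOP/s each, 30% utilization, 14 days
print(round(petaflops_days(256, 10e12, 0.30, 14), 1))  # → 10.8
```

The utilization factor matters a lot in practice: peak hardware FLOP/s are rarely sustained during real training runs.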
You can read the blog post with more highlights of Sunday's game here.
My tenth week as an OpenAI scholar
I’ve spent the last week building an LSTM model for a classification task (I’m sharing the major parts of it here).
LSTM preprocessing.
The model itself.
Running the model.
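The core of the model can be sketched in plain NumPy to make the mechanics explicit: each timestep updates a hidden and cell state through the four LSTM gates, and the final hidden state feeds a linear layer that produces class logits. This is a toy forward pass with random weights, not my actual training code — the dimensions, gate stacking order, and helper names are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM timestep; gates are stacked as [input, forget, cell, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b              # pre-activations for all four gates, shape (4H,)
    i = sigmoid(z[:H])                 # input gate
    f = sigmoid(z[H:2*H])              # forget gate
    g = np.tanh(z[2*H:3*H])            # candidate cell update
    o = sigmoid(z[3*H:])               # output gate
    c = f * c + i * g                  # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

def lstm_classify(seq, W, U, b, W_out, b_out):
    """Run a sequence of input vectors through the LSTM, classify from the final hidden state."""
    H = b.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return W_out @ h + b_out           # class logits

# Toy sizes: 8-dim inputs, 16 hidden units, 2 classes
rng = np.random.default_rng(0)
D, H, C = 8, 16, 2
W = rng.normal(scale=0.1, size=(4*H, D))
U = rng.normal(scale=0.1, size=(4*H, H))
b = np.zeros(4*H)
W_out = rng.normal(scale=0.1, size=(C, H))
b_out = np.zeros(C)

seq = rng.normal(size=(10, D))         # a sequence of 10 (e.g. embedded) tokens
logits = lstm_classify(seq, W, U, b, W_out, b_out)
print(logits.shape)  # → (2,)
```

In a real pipeline the preprocessing step would map text to padded integer sequences and an embedding lookup would produce the input vectors, and a framework like PyTorch or TensorFlow would handle the backward pass.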
Working on the LSTM is part of my learning process in the NLP domain. Besides that, I’m looking carefully at interesting papers as inspiration for my future work. I’d like to highlight the DeepMimic paper released by the Berkeley AI Lab. The paper discusses how to apply deep reinforcement learning to train an agent to perform physics-based movements like martial arts. As a result, the authors train a humanoid to perform a cartwheel and an Atlas robot to do a spin kick. I highly recommend this paper to anyone who is interested in deep RL and plans to work with motion-capture datasets for physically simulated characters.
Also, I’ve continued my education in RL, and as a part of that learning process I’ve referred to the following papers, articles, and courses:
Proximal Policy Optimization Algorithms
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
Lessons learned reproducing a deep reinforcement learning paper
If you have any questions or comments, feel free to ping me. You can learn more about me on Twitter.
You can check out my previous articles here:
From Ballerina to AI Researcher: Part VIII
From Ballerina to AI Researcher: Part VII
From Ballerina to AI Researcher: Part VI
From Ballerina to AI Researcher: Part V
From Ballerina to AI Researcher: Part IV
From Ballerina to AI Researcher: Part III