From Ballerina to AI Researcher: What’s Next?

Results of the OpenAI scholarship during the last three months, thoughts on the future and my final project

Sophia Aryan
buZZrobot
4 min read · Aug 31, 2018

Update to the post:

Here’s the latest version of my final project with the full code implementation.

Feel free to adapt it to your needs.

Initial blog post

The week of Burning Man! Most of my friends are enjoying pure interaction with fellow humans. To be honest, I’ve never been to BM, but I know the right moment and right people will come and the universe will work it out for me.

I asked my colleagues at OpenAI what they like about BM. They said the coolest part is that nobody cares what you do in life or what your profession or social status is. Humans see humans and are driven by genuine care for one another.

Credit to Smithsonian American Art Museum

Basically, that’s the closest representation of the world of enlightened people driven by the only real value — unconditional love for each other. Well, while the universe is arranging my life’s circumstances for me to go to BM next year, this week I’m sharing with you some of the results of my work during the summer within the scholarship program at OpenAI.

My experience as an OpenAI scholar

Summer passed insanely fast; it seems June 4th was only yesterday… So where was I then and where am I now? Three months made a huge difference. I spent most of my time and focus on NLP research. I started with a lot of low-level implementations of common algorithms, like bag of words and word2vec embeddings, and then moved on to building RNN and LSTM models from scratch. Later in the program I switched my attention to reinforcement learning and how it can be applied to NLP tasks.

I started my work with RL from the Pong game implementation introduced by Andrej Karpathy. His code base is in Python. Since I’ve been working with TensorFlow throughout the scholarship program and have purposefully been building my skills around that framework, I worked through the game using TF.

Currently, I’m working on the project “Language conditioned reinforcement learning in the Gridworld environment”. An agent (white cell) has to reach the target cell named in the command, either “go to green” or “go to red”. The agent is an MLP whose input is the concatenation of the command (“CMD”) and the observation (“OB”). In each episode the agent and the target cells appear in random locations, and the agent receives a state and a task correction (see the visualization below).

Language conditioned reinforcement learning in the Gridworld environment

Here is some code from the project (the implementation is in TensorFlow) so you can have a sneak peek at what I’m busy with right now.

Downloading libraries and defining hyperparameters:

Defining the Gridworld environment:
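
A minimal sketch of such an environment, in plain Python rather than the project's TensorFlow code, might look like the following. The grid size, step limit, reward values, observation layout, and the names `GridWorld`, `ACTIONS`, and `COMMANDS` are all assumptions made for illustration, not details taken from the project.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right (assumed)
COMMANDS = ["go to green", "go to red"]


class GridWorld:
    """Toy language-conditioned gridworld; sizes and rewards are assumed."""

    def __init__(self, size=5, max_steps=20, seed=None):
        self.size = size
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # The agent and both targets start each episode in distinct random cells.
        cells = [(r, c) for r in range(self.size) for c in range(self.size)]
        self.agent, self.green, self.red = self.rng.sample(cells, 3)
        self.command = self.rng.choice(COMMANDS)
        self.steps = 0
        return self.observe()

    def observe(self):
        # Observation ("OB"): one one-hot channel per object, flattened row by row.
        ob = []
        for obj in (self.agent, self.green, self.red):
            ob += [1.0 if (r, c) == obj else 0.0
                   for r in range(self.size) for c in range(self.size)]
        return ob, self.command

    def step(self, action):
        # Move one cell, clamping at the walls.
        dr, dc = ACTIONS[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        self.steps += 1
        if self.command == "go to green":
            goal, other = self.green, self.red
        else:
            goal, other = self.red, self.green
        if self.agent == goal:
            reward, done = 1.0, True      # reached the commanded target
        elif self.agent == other:
            reward, done = -1.0, True     # reached the wrong target
        else:
            reward, done = 0.0, self.steps >= self.max_steps
        return self.observe(), reward, done
```

Calling `env = GridWorld(seed=0)` and then `ob, cmd = env.reset()` yields a 75-element observation and one of the two commands. Each `env.step(a)` returns the next observation-command pair, a reward, and a flag that ends the episode.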

Defining the Policy:
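
As a rough illustration of an MLP that takes the concatenated command and observation, here is a forward pass in plain Python rather than the project's TensorFlow code. The one-hot command encoding, the single tanh hidden layer, the hidden size, the four-action output, and the names `MLPPolicy` and `encode_command` are assumptions for this sketch only.

```python
import math
import random

COMMANDS = ["go to green", "go to red"]
N_ACTIONS = 4  # up, down, left, right (assumed)


def encode_command(command):
    # One-hot "CMD" vector; the real project may encode the text differently.
    return [1.0 if command == c else 0.0 for c in COMMANDS]


class MLPPolicy:
    """One-hidden-layer MLP mapping [CMD ; OB] to action probabilities."""

    def __init__(self, ob_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        in_dim = ob_dim + len(COMMANDS)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)]
                   for _ in range(N_ACTIONS)]
        self.b2 = [0.0] * N_ACTIONS

    def action_probs(self, ob, command):
        x = encode_command(command) + list(ob)  # concatenate CMD and OB
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hi for w, hi in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_action(self, ob, command, rng=random):
        # Draw an action index according to the policy's probabilities.
        probs = self.action_probs(ob, command)
        return rng.choices(range(N_ACTIONS), weights=probs)[0]
```

Because the command is part of the input vector, the same observation can produce different action probabilities under “go to green” and “go to red”, which is what makes the policy language conditioned.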

Training loop:
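
The post does not say which RL algorithm the project uses. The sketch below assumes a REINFORCE-style policy gradient, in the spirit of the Pong example mentioned earlier, and shrinks the task to a single step: the agent sees a command and picks a target directly. The tabular logits (standing in for the MLP), the learning rate, the episode count, and the names `train` and `softmax` are illustrative assumptions.

```python
import math
import random

COMMANDS = ["go to green", "go to red"]
TARGETS = ["green", "red"]  # action 0 picks green, action 1 picks red


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    # One row of action logits per command: a tabular stand-in for the MLP.
    theta = [[0.0, 0.0] for _ in COMMANDS]
    for _ in range(episodes):
        cmd = rng.randrange(len(COMMANDS))             # sample a command
        probs = softmax(theta[cmd])
        action = 0 if rng.random() < probs[0] else 1   # sample from the policy
        # Reward 1 only when the chosen target matches the command.
        reward = 1.0 if COMMANDS[cmd].endswith(TARGETS[action]) else 0.0
        # REINFORCE: theta += lr * reward * grad log pi(action | cmd),
        # where the softmax gradient is 1[a == action] - probs[a].
        for a in range(2):
            grad = (1.0 if a == action else 0.0) - probs[a]
            theta[cmd][a] += lr * reward * grad
    return theta
```

After `theta = train()`, `softmax(theta[0])` should put most of its probability on green and `softmax(theta[1])` on red. A full gridworld version would replace the one-step reward with returns accumulated over each episode.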

The current score of the training:

I’ll continue working on the project to improve the results and bring in more language conditioning as a step towards grounded language learning, the area where I want to continue my research. The final version will be open-sourced as a part of the OpenAI scholarship program curriculum.

I truly enjoyed the growth I experienced this summer. Thanks to the program, I became more comfortable building models from scratch and working with TensorFlow, and I built a deeper understanding of NLP and reinforcement learning.

Looking ahead, by combining my background as a journalist and communicator with stronger technical skills in AI, I have a really good shot at being a great facilitator of the most amazing technology in the world.

Sophia Aryan
buZZrobot

Former ballerina turned AI writer. Fan of sci-fi, astrophysics. Consciousness is the key. Founder of buZZrobot.com