From Ballerina to AI Researcher: Part X

Learning to play the Pong game: Embracing Reinforcement Learning

Sophia Aryan
3 min read · Aug 20, 2018


Hello, readers! As usual, I'm sharing some thoughts I've had over the past week, along with my progress in the OpenAI Scholars program.

Recently, I had a somewhat philosophical conversation with a friend of mine about the importance of human life. I mean… we wake up every day, go through our daily routines, and interact with our fellow humans, but do we really understand what is important in our lives?

There are many definitions, motivational speeches, philosophical concepts, and business books on motivation and personal growth. But I've always wondered how someone can actually use this advice in a moment of vulnerability and difficulty. It's easy to speak in an inspirational way, but significantly harder to put those ideas into practice, to create, and to push things forward.

I've always kept in mind an image of a completely fearless person who is 100% confident in any life situation. It took me time to understand that no matter how many challenges someone goes through, there will always be unseen obstacles to overcome and 'opportunities' to experience fear, frustration, and lack of confidence. We should learn to embrace such unpleasant feelings and not step back in the face of them. It's truly important to learn how to fight and continue on your path no matter what.

The universe wants us to live, to exist, to thrive. And as we continue fighting for our existence, we will build our inner strength. This is what the universe wants from us.

My progress as an OpenAI Scholar

Last week, I worked on a reinforcement learning (RL) task and went through Andrej Karpathy's blog post "Deep Reinforcement Learning: Pong from Pixels", which shows how to train an agent to play Pong. If you want to catch up on RL basics, I'd recommend reading this extremely useful article.

So I implemented the Pong-playing agent myself, but since I'm developing my TensorFlow skills, I wrote it in TensorFlow instead of the plain NumPy used in the original post.

I’m sharing below some of the TF code.

Importing the necessary packages and setting up the hyperparameters and the environment we are going to work in.
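For readers who want something concrete for this setup step, here is a minimal sketch in plain NumPy. It is not the TensorFlow code from this post: the hyperparameter values and the `Pong-v0` environment id are the ones used in Karpathy's post, and the constant names and the `preprocess` helper are my own illustrative choices. Creating the Gym environment itself is left out so the snippet runs without Gym installed.

```python
import numpy as np

# Hyperparameters borrowed from Karpathy's post (an assumption here;
# the values tuned for the TensorFlow version may differ).
HIDDEN_UNITS = 200      # neurons in the single hidden layer
BATCH_SIZE = 10         # episodes per parameter update
LEARNING_RATE = 1e-4
GAMMA = 0.99            # reward discount factor
INPUT_DIM = 80 * 80     # size of the flattened, preprocessed frame
ENV_NAME = "Pong-v0"    # Gym environment id used in Karpathy's post


def preprocess(frame):
    """Turn a 210x160x3 uint8 Atari frame into a flat 6400-element 0/1 vector."""
    # Crop the playing field, downsample by 2, keep one colour channel.
    # astype() returns a copy, so the caller's frame is not modified.
    img = frame[35:195:2, ::2, 0].astype(np.float64)
    img[img == 144] = 0   # erase background (type 1)
    img[img == 109] = 0   # erase background (type 2)
    img[img != 0] = 1     # paddles and ball become 1
    return img.ravel()
```

In Karpathy's setup, the network input is the difference between two consecutive preprocessed frames, so the agent can see motion.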

Defining the policy functions.
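As a rough illustration of what policy functions for Pong can look like, here is a NumPy sketch of the two-layer policy network described in Karpathy's post. Again, this is not the TensorFlow code from this post. The function names, the dictionary layout of the model, and the action ids (2 for UP, 3 for DOWN, as used in Karpathy's post) are assumptions made for the example.

```python
import numpy as np

INPUT_DIM = 80 * 80   # flattened preprocessed frame (assumed size)
HIDDEN_UNITS = 200    # assumed hidden layer width
UP, DOWN = 2, 3       # Gym Pong action ids as used in Karpathy's post


def init_model(rng, input_dim=INPUT_DIM, hidden_units=HIDDEN_UNITS):
    """Create the two weight matrices with Xavier-style scaling."""
    return {
        "W1": rng.standard_normal((hidden_units, input_dim)) / np.sqrt(input_dim),
        "W2": rng.standard_normal(hidden_units) / np.sqrt(hidden_units),
    }


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def policy_forward(model, x):
    """Return the probability of moving UP and the hidden activations."""
    hidden = np.maximum(0.0, model["W1"] @ x)   # ReLU hidden layer
    p_up = sigmoid(model["W2"] @ hidden)        # scalar probability
    return p_up, hidden


def sample_action(rng, p_up):
    """Sample UP with probability p_up, otherwise DOWN."""
    return UP if rng.uniform() < p_up else DOWN
```

Sampling from the policy, rather than always taking the most likely action, is what lets the agent explore during training.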

Training process.
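The core of policy-gradient training in Karpathy's post is turning the raw rewards of an episode into discounted, normalized advantages and using them to weight the gradient of each sampled action. The NumPy sketch below shows only that part, under the assumption that a nonzero reward marks the end of a Pong rally. The function names are hypothetical, and the optimizer step (RMSProp in Karpathy's post, whatever optimizer is chosen in TensorFlow) is omitted.

```python
import numpy as np


def discount_rewards(rewards, gamma=0.99):
    """Discounted return per time step, reset at each rally boundary."""
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        if rewards[t] != 0:
            running = 0.0  # a point was scored: start a new rally (Pong-specific)
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted


def normalized_advantages(rewards, gamma=0.99):
    """Standardize discounted returns to zero mean and unit variance."""
    returns = discount_rewards(rewards, gamma)
    returns = returns - returns.mean()
    std = returns.std()
    return returns / std if std > 0 else returns


def weighted_grad_signal(labels, probs, advantages):
    """Gradient of log-likelihood w.r.t. the logit, scaled by advantage.

    labels: 1.0 where UP was sampled, 0.0 where DOWN was sampled.
    probs:  the policy's probability of UP at each step.
    """
    return (np.asarray(labels) - np.asarray(probs)) * np.asarray(advantages)
```

Actions that preceded a won rally get a positive weight and become more likely, while actions that preceded a lost rally are discouraged.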

I'm still tuning the hyperparameters and trying a few tricks, and I'll share the final result in a future post.



Sophia Aryan

Former ballerina turned AI writer & communicator. OpenAI alumna. Fan of astrophysics and deep conversations. Founder of BuzzRobot.