Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks
Arthur Juliani

Hi Juliani,

First of all, thank you for your nice tutorial. We are trying to reuse your code so that it solves the MountainCar environment. We are a group and totally new to machine learning; please find our code below. Thank you in advance.

import gym
import random
import tensorflow as tf
import numpy as np

n_nodes_hl1 = 5
n_nodes_hl2 = 5
n_classes = 1
env = gym.make("MountainCar-v0")
# placeholders for the input and the actual values of our data set
inputs1 = tf.placeholder('float', [2, None])

#y = tf.placeholder('float')

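hidden_1_layer = {'weights': tf.Variable(tf.random_normal([2, n_nodes_hl1])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}

hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}

output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_classes])),
                'biases': tf.Variable(tf.random_normal([n_classes]))}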

l1 = tf.add(tf.matmul(inputs1, hidden_1_layer['weights']), hidden_1_layer['biases'])
l1 = tf.nn.elu(l1)

l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
l2 = tf.nn.elu(l2)

Qout = tf.matmul(l2, output_layer['weights']) + output_layer['biases']

predict = tf.argmax(Qout,1)

nextQ = tf.placeholder(shape=[1,2],dtype=tf.float32)
loss = tf.reduce_sum(tf.square(nextQ - Qout))
trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
updateModel = trainer.minimize(loss)

init = tf.initialize_all_variables()

# Set learning parameters
y = .99
e = 0.1
num_episodes = 2000
#create lists to contain total rewards and steps per episode
jList = []
rList = []

with tf.Session() as sess:
    sess.run(init)
    for i in range(num_episodes):
        #Reset environment and get first new observation
        s = env.reset()
        rAll = 0
        d = False
        j = 0
        #The Q-Network
        while j < 99:
            a,allQ = sess.run([predict,Qout],feed_dict={inputs1:s})
            if np.random.rand(1) < e:
                a[0] = env.action_space.sample()
            #Get new state and reward from environment
            s1,r,d,_ = env.step(a[0])

            #Obtain the Q' values by feeding the new state through our network
            Q1 = sess.run(Qout,feed_dict={inputs1:s1})
            #Obtain maxQ' and set our target value for chosen action.
            maxQ1 = np.max(Q1)
            targetQ = allQ
            targetQ[0,a[0]] = r + y*maxQ1
            print( r + y*maxQ1)
            #Train our network using target and predicted Q values
            _,W1 = sess.run([updateModel,W],feed_dict={inputs1:s,nextQ:targetQ})
            rAll += r
            s = s1
            if d == True:
                #Reduce chance of random action as we train the model.
                e = 1./((i/50) + 10)
print "Percent of successful episodes: " + str(sum(rList)/num_episodes) + "%"

The error we are getting so far is:

ValueError: Cannot feed value of shape (2,) for Tensor u'Placeholder:0', which has shape '(2, ?)'
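
One possible reading of this message, sketched without TensorFlow: a fed value has to have the same number of dimensions as the placeholder, and every known dimension has to match. The helper `is_feedable` below is hypothetical and only approximates that check; `None` stands for the `?` in the message. It is meant as an illustration of the mismatch, not as the library's actual implementation.

```python
def is_feedable(value_shape, placeholder_shape):
    """Hypothetical helper: simplified shape compatibility check.

    None in placeholder_shape stands for an unknown dimension ('?').
    """
    # The number of dimensions (rank) has to agree first.
    if len(value_shape) != len(placeholder_shape):
        return False
    # Each known placeholder dimension has to equal the fed dimension.
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


observation_shape = (2,)     # one MountainCar state: (position, velocity)
declared_shape = (2, None)   # inputs1 as declared in the code above

batched_shape = (1, 2)       # the same state as a batch of one row
batch_first = (None, 2)      # assumed alternative placeholder shape

print(is_feedable(observation_shape, declared_shape))  # False
print(is_feedable(batched_shape, batch_first))         # True
```

Under that reading, declaring `inputs1` with shape `[None, 2]` and feeding `np.reshape(s, [1, 2])` (and likewise for `s1`) would probably get past this particular error; a `[None, 2]` input also lines up with the `[2, n_nodes_hl1]` weight matrix in `tf.matmul`. Separately, MountainCar-v0 has three discrete actions, so `n_classes` and the `nextQ` shape would presumably need a width of 3 rather than 1 and 2.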
