OpenAI Request for Research: XOR Problem

Arnav Dantuluri
Dec 26, 2021

OpenAI released a set of research problems they wanted solved, and as an avid learner of machine learning, I decided to tackle their warm-up problem. You can view the rest of their requests here: https://openai.com/blog/requests-for-research-2/

The problem is to train an LSTM to solve the XOR problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence one bit at a time and then output the correct answer at the sequence’s end. Test the two approaches below:

  • Generate a dataset of 100,000 random binary strings of length 50. Train the LSTM; what performance do you get?
  • Generate a dataset of 100,000 random binary strings, where the length of each string is independently and randomly chosen between 1 and 50. Train the LSTM. Does it succeed? What explains the difference?

The XOR Problem:

What is the XOR problem? In simple terms, it is a binary classification problem where the input is a vector of bits and the target is a single 0 or 1: the parity of the bits, which is 1 when the string contains an odd number of 1s and 0 when it contains an even number.
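Written as a formula, the target for an n-bit string with bits x_1 through x_n is the XOR of all of its bits, which is the same as counting the 1s and taking the result modulo 2:

```latex
y = x_1 \oplus x_2 \oplus \cdots \oplus x_n = \left( \sum_{i=1}^{n} x_i \right) \bmod 2
```

For example, for the string 1011: 1 ⊕ 0 ⊕ 1 ⊕ 1 = 1, and likewise 1 + 0 + 1 + 1 = 3, with 3 mod 2 = 1, so the label is 1.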

Image adapted from Kevin Swingler.

Problem 1:

import numpy as np

def gen_len50(size=100000):
    X_data_1 = []
    Y_data_1 = []
    for i in range(size):
        # Every string has the same fixed length of 50 bits
        data = np.random.randint(2, size=(1, 50)).astype("float32")
        X_data_1.append(data)
        # Label is the parity: 1 if the string has an odd number of 1s, else 0
        labels = [0 if np.sum(data) % 2 == 0 else 1]
        Y_data_1.append(labels)
    return X_data_1, Y_data_1

To generate the dataset, we import NumPy and loop 100,000 times to build a list of 100,000 binary strings, each with a constant length of 50. Each string has a corresponding label, 0 or 1, equal to its parity.
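The same fixed-length dataset can also be produced in one vectorized call instead of a Python loop. The sketch below is a hypothetical variant (the name `gen_len50_vectorized` and the `seed` parameter are assumptions, not the author's code); it returns arrays already shaped `(size, 1, 50)` and `(size, 1)`, the layout the model below expects.

```python
import numpy as np

def gen_len50_vectorized(size=100_000, length=50, seed=0):
    """Hypothetical vectorized variant of gen_len50 (not the author's code).

    Returns X with shape (size, 1, length), matching input_shape=(1, 50),
    and Y with shape (size, 1) holding each string's parity.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(size, 1, length)).astype("float32")
    # Parity: count the 1s in each string and take the result mod 2
    Y = (X.sum(axis=(1, 2)) % 2).astype("float32").reshape(-1, 1)
    return X, Y

X, Y = gen_len50_vectorized(size=1000)
print(X.shape, Y.shape)  # (1000, 1, 50) (1000, 1)
```

Returning arrays directly also removes the need for the later `np.asarray` conversion step.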

We then define a simple LSTM network using Keras, a popular and easy-to-use machine learning library.

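from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Two stacked LSTM layers followed by a single sigmoid output unit
model = Sequential([
    LSTM(32, return_sequences=True, activation='sigmoid', input_shape=(1, 50)),
    LSTM(64, return_sequences=True, activation='sigmoid'),
    Dense(1, activation='sigmoid'),
])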
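Keras recurrent layers read input shaped `(batch, timesteps, features)`. With `input_shape=(1, 50)`, each string is fed as a single timestep carrying all 50 bits as features, so the LSTM does not step through the sequence one bit at a time as the problem statement describes. The NumPy sketch below (an illustration, not what the author ran) shows how the same data could be rearranged to `(samples, 50, 1)`, one bit per timestep, which would pair with `input_shape=(50, 1)`.

```python
import numpy as np

# Hypothetical batch in the article's layout: (samples, 1 timestep, 50 features)
X_wide = np.random.randint(2, size=(4, 1, 50)).astype("float32")

# Move the 50 bits onto the time axis: (samples, 50 timesteps, 1 feature)
X_steps = np.transpose(X_wide, (0, 2, 1))

print(X_wide.shape, X_steps.shape)  # (4, 1, 50) (4, 50, 1)
# Bit t of each string is now timestep t
assert np.array_equal(X_steps[:, :, 0], X_wide[:, 0, :])
```

With this layout, the last LSTM layer would typically use `return_sequences=False` (the Keras default) so that the Dense layer sees only the output after the final bit.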

We split the dataset into training and test sets, holding out 20% of the samples to evaluate the model after training.

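# Generate the fixed-length dataset
X_data1, Y_data1 = gen_len50()

# Hold out the first 20% of samples for testing (slice indices must be ints)
split_size = int(100000 * 0.20)
X_train = X_data1[split_size:]
X_test = X_data1[:split_size]
Y_train = Y_data1[split_size:]
Y_test = Y_data1[:split_size]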

We compile the model with Adam as the optimizer and binary cross-entropy as the loss, since this is a binary classification problem, and we track accuracy as a metric.
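For reference, binary cross-entropy for labels y and predicted probabilities p is the mean of −(y·log p + (1 − y)·log(1 − p)). The small function below is written by hand for illustration (Keras computes this loss internally with its own numerical safeguards). It gives a useful reference point: a model that always outputs 0.5 has a loss of ln 2 ≈ 0.693.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, computed by hand for illustration."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

y = np.array([0.0, 1.0, 1.0, 0.0])
# A model that always outputs 0.5 (i.e. has learned nothing) scores ln 2
print(binary_crossentropy(y, np.full(4, 0.5)))  # ≈ 0.6931
# Confident, correct predictions drive the loss toward 0
print(binary_crossentropy(y, np.array([0.01, 0.99, 0.99, 0.01])))
```

A training loss that stays near 0.693 is therefore a sign that the model is still guessing.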

model.compile('adam', loss='binary_crossentropy', metrics=['acc'])

We convert the lists into NumPy arrays and normalize the inputs to help training.

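# Convert the Python lists into NumPy arrays
X_train = np.asarray(X_train)
Y_train = np.asarray(Y_train)
# Scale the inputs by the L2 norm of the whole training array
norm = np.linalg.norm(X_train)
X_train = X_train / norm
print(X_train)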

We do the same for the test lists, X_test and Y_test.
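Because the inputs are all 0s and 1s, the L2 norm of the whole array is simply the square root of the number of 1s. For Problem 1's 80,000 training strings of 50 random bits, that is roughly √2,000,000 ≈ 1,414, so after dividing, every 1 becomes a value of about 0.0007. The sketch below checks this on a hypothetical, smaller stand-in for `X_train`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 1,000 strings of 50 random bits
X = rng.integers(0, 2, size=(1000, 1, 50)).astype("float32")

norm = np.linalg.norm(X)        # 2-norm of the flattened array
n_ones = int(X.sum())
print(norm, np.sqrt(n_ones))    # the two agree: 0s add nothing, each 1 adds 1
print(X.max() / norm)           # the value a 1 takes after normalization
```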

history = model.fit(X_train, Y_train, epochs=50, batch_size=32, shuffle=True)

We then train the model on the normalized inputs and their labels for 50 epochs. That is not nearly enough to fully optimize an LSTM, a type of model notorious for taking a long time to train, but it is enough to draw conclusions when comparing Problems 1 and 2.
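For scale, the number of optimizer steps in this run follows directly from the 80/20 split and `batch_size=32` (Keras takes one step per batch):

```python
import math

n_train = 80_000   # 100,000 strings minus the 20% test split
batch_size = 32
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 2500 125000
```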

import matplotlib.pyplot as plt

def plot_model(history):
    '''Plot model accuracy and loss.

    Args:
        history: Keras History object returned by model.fit
    Returns:
        None; plots the model's training loss and accuracy history
    '''
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)

    plt.figure()
    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.title('Training loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    plt.figure()
    acc = history.history['acc']
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.title('Training accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()

We can then plot the loss and accuracy and draw conclusions from them.

Author’s Image of Loss and Accuracy for 1st model

Training accuracy rises steadily at first, then levels off toward the end and stagnates. Evaluating the model on the test data gives an accuracy of 51%, barely above the 50% that random guessing would achieve. Not great, but not too bad considering the highest training accuracy was 56%.
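The parity of uniformly random bits is itself a fair coin flip, so roughly half of all labels are 1 and any constant guess scores close to 50%. A quick check on freshly generated fixed-length labels (a sketch, not the author's test set) shows this baseline:

```python
import numpy as np

rng = np.random.default_rng(42)
# 20,000 random 50-bit strings, the size of the test split
bits = rng.integers(0, 2, size=(20_000, 50))
labels = bits.sum(axis=1) % 2

frac_odd = labels.mean()
# Accuracy of always predicting the more common label
majority_acc = max(frac_odd, 1 - frac_odd)
print(f"odd parity: {frac_odd:.3f}, majority-class accuracy: {majority_acc:.3f}")
```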

Problem 2:

Problem 2 asks us to use the same technique but to give each binary string a random length from 1 to 50. We make this change by adding a single line that draws a random integer from 1 to 50 and uses it as the length instead of the constant 50. We also pad each string with zeros at the front so that every sequence has length 50.

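import random
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def gen_data(size=100000):
    X_data_1 = []
    Y_data_1 = []
    for i in range(size):
        # Each string gets a random length between 1 and 50 (inclusive)
        length = random.randint(1, 50)
        data = np.random.randint(2, size=(1, length)).astype("float32")
        # Left-pad with zeros to a fixed length of 50
        data = pad_sequences(data, maxlen=50, dtype='float32', padding='pre')
        X_data_1.append(data)
        # Label is the parity: 1 if the string has an odd number of 1s, else 0
        labels = [0 if np.sum(data) % 2 == 0 else 1]
        Y_data_1.append(labels)
    return X_data_1, Y_data_1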
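Pre-padding only adds leading zeros, and zeros do not change the count of 1s, so every padded string keeps its original parity label. The NumPy-only sketch below uses a hypothetical helper, `pad_pre`, as a stand-in for Keras' `pad_sequences` on a single row, and checks this on a few random lengths.

```python
import numpy as np

def pad_pre(bits, maxlen=50):
    """Hypothetical helper: left-pad a 1-D bit array with zeros up to maxlen."""
    return np.concatenate([np.zeros(maxlen - len(bits), dtype=bits.dtype), bits])

rng = np.random.default_rng(7)
for _ in range(5):
    length = int(rng.integers(1, 51))          # random length in [1, 50]
    bits = rng.integers(0, 2, size=length).astype("float32")
    padded = pad_pre(bits)
    # Same length for every sample, same parity as the unpadded string
    print(length, padded.shape, bits.sum() % 2 == padded.sum() % 2)
```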

We use the same LSTM model as before so that the two results can be compared directly.

model = Sequential([
    LSTM(32, return_sequences=True, activation='sigmoid', input_shape=(1, 50)),
    LSTM(64, return_sequences=True, activation='sigmoid'),
    Dense(1, activation='sigmoid'),
])

Using the same technique, we split the data:

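# Generate the variable-length dataset
X_data1, Y_data1 = gen_data()

# Hold out the first 20% of samples for testing (slice indices must be ints)
split_size = int(100000 * 0.20)
X_train = X_data1[split_size:]
X_test = X_data1[:split_size]
Y_train = Y_data1[split_size:]
Y_test = Y_data1[:split_size]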

We also compile the model:

model.compile('adam', loss='binary_crossentropy', metrics=['acc'])

We also convert the lists into NumPy arrays and normalize the data:

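# Convert the Python lists into NumPy arrays
X_train = np.asarray(X_train)
Y_train = np.asarray(Y_train)
# Scale the inputs by the L2 norm of the whole training array
norm = np.linalg.norm(X_train)
X_train = X_train / norm
print(X_train)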

We then train the model

history = model.fit(X_train, Y_train, epochs=50, batch_size=32, shuffle=True)

Again, we plot and analyze the model's performance.

Author’s Image For Loss And Accuracy of 2nd model

This time the model reaches a training accuracy of almost 60%, and when evaluated on the test data it scores 57%, which is great!

This is, however, a very impractical way to solve the problem, which could easily be solved with a dense neural network or even a simple recurrent neural network instead of a more complex LSTM. The only reason I solved it this way is that the problem specifically requested an LSTM.
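One way to see why a small recurrent model should suffice in principle: parity can be computed exactly by a recurrence that reads one bit at a time and keeps a single bit of state, h_t = h_{t-1} XOR x_t. The sketch below is a hand-written recurrence (the name `parity_by_recurrence` is hypothetical), not a trained network; it shows the function a recurrent model would need to learn.

```python
def parity_by_recurrence(bits):
    """Read the sequence one bit at a time, keeping one bit of state."""
    state = 0
    states = []
    for x in bits:
        state ^= int(x)          # h_t = h_{t-1} XOR x_t
        states.append(state)
    return state, states

final, trace = parity_by_recurrence([1, 0, 1, 1])
print(final, trace)  # 1 [1, 1, 0, 1]
```

The state after the last bit is the answer the LSTM is asked to output at the sequence's end.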

Code for the project: https://github.com/arnavdantuluri/XOROpenAi.git

I am 15 years old and extremely interested in machine learning.