OpenAI Requests for Research: The XOR Problem
OpenAI released a set of research problems that they wanted solved, and as an avid learner of machine learning I decided to tackle their warmup problem. You can view the rest of their requests here: https://openai.com/blog/requests-for-research-2/
The problem is to train an LSTM to solve the XOR problem: that is, given a sequence of bits, determine its parity. The LSTM should consume the sequence, one bit at a time, and then output the correct answer at the sequence’s end. Test the two approaches below:
- Generate a dataset of random 100,000 binary strings of length 50. Train the LSTM; what performance do you get?
- Generate a dataset of random 100,000 binary strings, where the length of each string is independently and randomly chosen between 1 and 50. Train the LSTM. Does it succeed? What explains the difference?
The XOR Problem:
What is the XOR problem? In simple terms, it is a binary classification problem where the input is a vector of bits and the target is its parity: 1 if the vector contains an odd number of ones, and 0 otherwise.
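To make the definition concrete, here is a minimal sketch in plain Python. The helper name `parity` is only for illustration and does not come from the OpenAI post. It folds XOR over the bits, which gives the same answer as counting the ones and taking the remainder mod 2.

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Return 1 if the sequence contains an odd number of ones, else 0."""
    # XOR-ing all bits together flips the result once for every 1 in the input.
    return reduce(xor, bits, 0)

example = [1, 0, 1, 1]
print(parity(example))  # 1: three ones, an odd count
```

This is also why the task is called the XOR problem: the label is the XOR of every bit in the string.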
Problem 1:
import numpy as np

def gen_len50(size=100000):
    X_data_1 = []
    Y_data_1 = []
    for i in range(size):
        # Problem 1 uses a constant length of 50, so no padding is needed
        data = np.random.randint(2, size=(1, 50)).astype("float32")
        X_data_1.append(data)
        # Label is the parity: 0 for an even number of ones, 1 for odd
        labels = [0 if np.sum(data) % 2 == 0 else 1]
        Y_data_1.append(labels)
    return X_data_1, Y_data_1
To generate the dataset we import numpy and loop 100,000 times, building a list of 100,000 binary strings, each with a constant length of 50. Each string has a corresponding label, which is either 0 or 1.
We then define a very simple LSTM network using Keras, a popular and easy-to-use machine learning library.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential([
    LSTM(32, return_sequences=True, activation='sigmoid', input_shape=(1, 50)),
    LSTM(64, return_sequences=True, activation='sigmoid'),
    Dense(1, activation='sigmoid')
])
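One detail about the input shape is worth spelling out. Keras reads `input_shape=(1, 50)` as one timestep with 50 features, so this model sees each string all at once. The task statement asks for one bit at a time, which would correspond to 50 timesteps of one feature each. The sketch below, which assumes the data is a NumPy array shaped `(samples, 1, 50)` as the generator produces, shows how the axes could be rearranged for that variation. It is not what the model above does.

```python
import numpy as np

# Assumed layout from the generator: (samples, 1, 50)
X = np.random.randint(2, size=(4, 1, 50)).astype("float32")

# Swap the last two axes so each string becomes 50 timesteps of 1 bit
X_steps = np.transpose(X, (0, 2, 1))

print(X.shape)        # (4, 1, 50)
print(X_steps.shape)  # (4, 50, 1)
```

A model fed this way would take `input_shape=(50, 1)` in its first layer instead.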
We split the dataset into training and testing sets, holding out 20% of the data to evaluate the model after training.
split_size = int(100000 * 0.20)  # slice indices must be integers
X_train = X_data1[split_size:]
X_test = X_data1[:split_size]
Y_train = Y_data1[split_size:]
Y_test = Y_data1[:split_size]
We compile the model with Adam as the optimizer and binary cross-entropy as the loss, since this is a binary classification problem, and we keep track of the accuracy.
model.compile('adam', loss='binary_crossentropy', metrics=['acc'])
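For reference, binary cross-entropy can be written out by hand. The sketch below is plain Python rather than the Keras implementation, and the function name and the `eps` clipping value are illustrative choices. It shows one useful landmark: a model that always outputs 0.5 scores ln(2), so a training loss hovering around 0.693 means the model is doing no better than guessing.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# A model that always predicts 0.5 scores ln(2) whatever the labels are
print(binary_crossentropy([0, 1, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # about 0.693
```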
We convert the list into numpy arrays and normalize the data to optimize training.
X_train = np.asarray(X_train)
norm = np.linalg.norm(X_train)
normal_array = X_train/norm
print(normal_array)
We do this for all the training and testing lists.
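It helps to see what this normalization does to binary data. With its default arguments, `np.linalg.norm` returns the square root of the sum of squares over the whole array, so dividing by it rescales every 1 to the same small value and leaves every 0 at 0. The sketch below reproduces that in plain Python on a toy set of 1,000 strings. The toy size and the seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)
# Toy stand-in for the training set: 1,000 strings of 50 random bits
data = [[random.randint(0, 1) for _ in range(50)] for _ in range(1000)]

# Same quantity np.linalg.norm computes by default: sqrt of the sum of squares
norm = math.sqrt(sum(b * b for row in data for b in row))
scaled_one = 1 / norm  # the value every 1 bit takes after dividing by the norm

print(norm)
print(scaled_one)
```

On the full 80,000-string training set roughly 2,000,000 bits are ones, so the norm is about 1,414 and each 1 becomes roughly 0.0007.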
history = model.fit(X_train, Y_train, epochs=50, batch_size=32, shuffle=True)
We then train the model on the normalized X and y values for 50 epochs. That is not nearly enough to fully optimize an LSTM, a kind of model infamous for taking an extremely long time to train, but it is enough to compare problems 1 and 2.
import matplotlib.pyplot as plt

def plot_model(history):
    '''Plot model accuracy and loss.

    Args:
        history: Keras History object containing training loss/acc

    Returns:
        None; plots the model's training loss and accuracy history
    '''
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)

    # Training loss per epoch
    plt.figure()
    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.title('Training loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    # Training accuracy per epoch
    plt.figure()
    acc = history.history['acc']
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.title('Training accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')  # was mislabeled as 'Loss'
    plt.legend()
    plt.show()
    return
We can then plot the accuracy and loss and draw conclusions from them.
The accuracy trends steadily upward, but the curve flattens toward the end and the accuracy stagnates. Evaluating the model on the test data gives an accuracy of 51%, barely above the 50% that random guessing scores on this balanced task. Not great, but not too bad considering the highest accuracy is 56%.
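To put these numbers in context, the parity labels of random bit strings are split almost evenly between 0 and 1, so a predictor that ignores its input already scores about 50%. The sketch below checks this in plain Python on 10,000 random 50-bit strings. The sample size and seed are arbitrary choices for illustration.

```python
import random

random.seed(0)
# Parity labels for 10,000 random 50-bit strings
labels = [sum(random.randint(0, 1) for _ in range(50)) % 2 for _ in range(10000)]

# Accuracy of a "model" that ignores its input and always answers 0
always_zero_acc = labels.count(0) / len(labels)
print(always_zero_acc)  # close to 0.5
```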
Problem 2:
Problem 2 says to use the same technique but to change the length of each binary string to a random length from 1 to 50. We make this change by adding an extra line that generates a random integer from 1 to 50 and uses it as the length instead of the constant 50. We also add padding in front to bring every string to length 50.
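import random
import numpy as np
# Import path may differ depending on the Keras version installed
from keras.preprocessing.sequence import pad_sequences

def gen_data(size=100000):
    X_data_1 = []
    Y_data_1 = []
    for i in range(size):
        # Pick a length between 1 and 50 (inclusive) for this string
        length = random.randint(1, 50)
        data = np.random.randint(2, size=(1, length)).astype("float32")
        # Left-pad with zeros so every string ends up with length 50
        data = pad_sequences(data, maxlen=50, dtype='float32', padding='pre')
        X_data_1.append(data)
        # Zero padding does not change the parity, so the label is still correct
        labels = [0 if np.sum(data) % 2 == 0 else 1]
        Y_data_1.append(labels)
    return X_data_1, Y_data_1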
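Padding with zeros is safe for this task because extra zeros never change the number of ones, so the parity label stays the same. The sketch below illustrates this with a small stand-in for `pad_sequences`. The helper `pad_pre` is hypothetical, written in plain Python, and assumes the input is no longer than `maxlen`.

```python
def pad_pre(bits, maxlen=50):
    """Left-pad a list of bits with zeros up to maxlen (assumes len(bits) <= maxlen)."""
    return [0] * (maxlen - len(bits)) + list(bits)

bits = [1, 1, 0, 1]
padded = pad_pre(bits)

print(len(padded))                     # 50
print(sum(bits) % 2, sum(padded) % 2)  # 1 1
```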
We use the same LSTM model that we used previously to facilitate comparison between them.
model = Sequential([
    LSTM(32, return_sequences=True, activation='sigmoid', input_shape=(1, 50)),
    LSTM(64, return_sequences=True, activation='sigmoid'),
    Dense(1, activation='sigmoid')
])
Using the same technique, we split the data.
split_size = int(100000 * 0.20)  # slice indices must be integers
X_train = X_data1[split_size:]
X_test = X_data1[:split_size]
Y_train = Y_data1[split_size:]
Y_test = Y_data1[:split_size]
We also compile the model.
model.compile('adam', loss='binary_crossentropy', metrics=['acc'])
We also convert the data into numpy arrays and normalize it.
X_train = np.asarray(X_train)
norm = np.linalg.norm(X_train)
normal_array = X_train/norm
print(normal_array)
We then train the model.
history = model.fit(X_train, Y_train, epochs=50, batch_size=32, shuffle=True)
Again, we plot and analyze the performance of the model.
This time the model reaches an accuracy of almost 60% and, when evaluated, performs at an accuracy of 57%, which is great!
This, however, is a very impractical way to solve the problem, which could easily be solved using a dense neural network or even a simple recurrent neural network rather than a more complex LSTM. The only reason I solved it this way is that it was specifically requested in the problem.
Code for the project: https://github.com/arnavdantuluri/XOROpenAi.git