Implementing an RNN language model with a sequence-to-sequence approach using Keras in a Jupyter Notebook

Ayman Shams
3 min read · Nov 27, 2017

This article is the first in a series of stories. It shows how to implement a basic character-level sequence-to-sequence model, applied here to translate short English sentences into Bengali, character by character. This is a fairly unusual way to translate a sentence, as word-level translation is more common.

This work is based on Francois Chollet's post:

A ten-minute introduction to sequence-to-sequence learning in Keras

Dependencies assumed installed:

  • Python 3.6
  • Scikit-learn, Pandas, NumPy, Matplotlib
  • Keras 2.0
  • Either the Theano or TensorFlow backend

Several key steps are involved in the overall process.

Step 1: Import the modules required for this project

from __future__ import print_function
from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

Next, we define the batch size, the number of epochs we want the training to run for, the number of samples to use, and so on. Generally, 100 epochs give reasonable results, but on this corpus 150 epochs were found to give better results.

batch_size = 64  # Batch size for training.
epochs = 150 # Number of epochs to train for.
latent_dim = 256 # Latent dimensionality of the encoding space.
num_samples = 10000 # Number of samples to train on.

At this point, note that the Jupyter Notebook should be run from the directory where the data is stored, for ease of access.

data_path = 'fra-eng/fra.txt'
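The snippets in the rest of the post refer to variables such as num_encoder_tokens, encoder_input_data, decoder_target_data, target_token_index, max_decoder_seq_length and reverse_target_char_index, which come from a vectorization step not shown here. Below is a minimal sketch of that step, closely following Chollet's lstm_seq2seq example; note that the path above points at his French–English pairs file, and for the Bengali experiment the corresponding English–Bengali pairs file from the same collection would be used instead.

# Sketch of the data vectorization (adapted from Chollet's lstm_seq2seq example).
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    # Newer dumps of the dataset may carry an extra attribution column.
    input_text, target_text = line.split('\t')[:2]
    # '\t' marks the start of a target sentence and '\n' marks its end.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    input_characters.update(input_text)
    target_characters.update(target_text)

input_characters = sorted(input_characters)
target_characters = sorted(target_characters)
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

input_token_index = {char: i for i, char in enumerate(input_characters)}
target_token_index = {char: i for i, char in enumerate(target_characters)}
reverse_target_char_index = {i: char for char, i in target_token_index.items()}

# One-hot encode every character of every sentence.
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype='float32')
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # The target is the decoder input shifted one time step ahead
            # (this offset is what teacher forcing relies on).
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.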

Step 2: Define input sequence and process it

The English sentences are used as inputs. The encoder LSTM converts each input sentence into two state vectors, state_h and state_c. We keep only the LSTM states and discard the outputs.

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

Step 3: Set up the decoder, using the encoder states as its initial state

The decoder LSTM is trained to turn the target sequences into the same sequences offset by one time step into the future, a training process called teacher forcing in this context. It uses the state vectors from the encoder as its initial state. Effectively, the decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence.

decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

Step 4: Define the model that will turn encoder and decoder input data into decoder target data

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
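Before training, it can be worth printing a summary to check that the encoder and decoder are wired together as intended:

# Quick sanity check: two inputs (encoder and decoder sequences),
# one softmax output over the decoder's character vocabulary.
model.summary()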

Step 5: Run Training

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')
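If training and inference happen in separate sessions, the saved model can be reloaded instead of retraining. The inference models in the next step reuse the layer objects created above (the encoder LSTM, decoder_lstm and decoder_dense), so in a fresh session they would have to be recovered from the loaded model; a minimal sketch, assuming the model was built exactly as in Steps 2–4:

from keras.models import load_model

# Reload the trained seq2seq model saved above.
model = load_model('s2s.h5')

# Recover the trained pieces for building the inference models
# (layer indices assume the exact architecture from Steps 2-4).
encoder_inputs = model.input[0]
decoder_inputs = model.input[1]
encoder_lstm = model.layers[2]   # encoder LSTM
decoder_lstm = model.layers[3]   # decoder LSTM
decoder_dense = model.layers[4]  # softmax Dense layer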

Step 6: Inference Mode

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
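As a quick check, the encoder model on its own maps a one-hot encoded sentence to its two state vectors, each of size latent_dim:

# Encode the first training sentence; each state has shape (1, latent_dim).
h, c = encoder_model.predict(encoder_input_data[:1])
print(h.shape, c.shape)  # (1, 256) (1, 256)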

Step 7: Implement inference loop

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

We have obtained some correct translations of the English sentences into Bengali.

Input sentence: Go.
Decoded sentence: যাও।
-
Input sentence: Stop!
Decoded sentence: থাম!
-
Input sentence: Ask Tom.
Decoded sentence: টমকে জিজ্ঞাসা করো।
-
Input sentence: Go away.
Decoded sentence: চলে যাও।
-
Input sentence: Help us.
Decoded sentence: আমাদের সাহায্য করুন।
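
The loop above only decodes sentences taken from the training data. To translate a new English sentence, it first has to be one-hot encoded with the same character vocabulary used during training. The helper encode_input_text below is hypothetical, written for illustration, and relies on the input_token_index and max_encoder_seq_length from the preprocessing sketch earlier:

def encode_input_text(text):
    # Hypothetical helper: one-hot encode a raw English string with the
    # training-time character vocabulary. Characters not seen during
    # training are simply skipped; a real system would need a proper
    # unknown-character strategy.
    seq = np.zeros((1, max_encoder_seq_length, num_encoder_tokens), dtype='float32')
    for t, char in enumerate(text[:max_encoder_seq_length]):
        if char in input_token_index:
            seq[0, t, input_token_index[char]] = 1.
    return seq

print(decode_sequence(encode_input_text('Help us.')))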

The full code can be obtained here:

