Paper Implementation — Learning Text Similarity with Siamese Recurrent Networks

Ujjalkumarmaity
Aug 26, 2023


A Siamese neural network is a deep learning architecture for measuring the similarity or dissimilarity between pairs of inputs. The network contains two or more identical subnetworks that share weights and parameters. This architecture is particularly effective in text and image similarity tasks.

In this blog we first walk through the paper Learning Text Similarity with Siamese Recurrent Networks and then implement it in Python.

Paper URL — https://aclanthology.org/W16-1617.pdf

Introduction

The paper Learning Text Similarity with Siamese Recurrent Networks proposes a Siamese recurrent network architecture for learning a similarity metric on variable-length character sequences. The model combines a stack of character-level bidirectional LSTMs with a Siamese architecture and learns the similarity between pairs of strings.

Model Architecture

The training set for a Siamese network consists of triplets (𝑥1, 𝑥2, 𝑦), where 𝑥1 and 𝑥2 are character sequences and 𝑦 ∈ {0, 1} indicates whether 𝑥1 and 𝑥2 are similar (𝑦 = 1) or dissimilar (𝑦 = 0). The aim of the model is to pull the embeddings of similar pairs close together and push the embeddings of dissimilar pairs apart.
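For concreteness, here is a hypothetical example of such triplets for a job-title matching task (the domain used in the paper); the strings are made up purely for illustration:

# Toy training triplets (x1, x2, y): y = 1 -> similar, y = 0 -> dissimilar.
train_triplets = [
    ("software engineer", "software developer", 1),
    ("java developer",    "java engineer",      1),
    ("software engineer", "hairdresser",        0),
]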

  • The model contains four BLSTM (Bidirectional Long Short-Term Memory) layers with 64-dimensional hidden vectors ℎ𝑡 and memory cells 𝑐𝑡 (a minimal sketch of this stack follows the list).
  • The outputs of the last BLSTM layer are averaged over time, and the resulting 128-dimensional vector is used as input to a dense feedforward layer.
  • The model is trained with a contrastive loss function.
  • The parameters of the model are optimized using the Adam method.
  • Dropout is applied on the recurrent units (with probability 0.2) and between layers (with probability 0.4) to prevent overfitting.
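A minimal Keras sketch of this stack, assuming character-level input IDs of length MAX_SEQ_LEN and a vocabulary of NUM_CHARS (both names are placeholders); the full implementation later in this post uses a slightly shallower two-layer variant:

import tensorflow as tf
from tensorflow.keras import layers

MAX_SEQ_LEN = 100  # assumed maximum character-sequence length
NUM_CHARS = 128    # assumed size of the character vocabulary

inp = layers.Input(shape=(MAX_SEQ_LEN,))
x = layers.Embedding(NUM_CHARS, 16)(inp)  # character embeddings
# Four stacked BLSTM layers with 64-dimensional hidden states; dropout on the
# recurrent units (0.2) and on the layer inputs (0.4), as in the paper.
for _ in range(4):
    x = layers.Bidirectional(
        layers.LSTM(64, return_sequences=True,
                    recurrent_dropout=0.2, dropout=0.4))(x)
x = layers.GlobalAveragePooling1D()(x)  # average BLSTM outputs over time
out = layers.Dense(128)(x)              # dense feedforward projection
encoder = tf.keras.Model(inp, out)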
Figure: Siamese Recurrent Network architecture

Contrastive loss function — The contrastive loss function is a key component in training this network. It defines a metric that measures how well the network differentiates between similar and dissimilar pairs.

Let 𝑓𝑊(𝑥1) and 𝑓𝑊(𝑥2) be the projections of 𝑥1 and 𝑥2 in the embedding space computed by the network function 𝑓𝑊.

The energy of the model 𝐸𝑊 is defined as the cosine similarity between the embeddings of 𝑥1 and 𝑥2:

𝐸𝑊(𝑥1, 𝑥2) = cos(𝑓𝑊(𝑥1), 𝑓𝑊(𝑥2)) = ⟨𝑓𝑊(𝑥1), 𝑓𝑊(𝑥2)⟩ / (‖𝑓𝑊(𝑥1)‖ ‖𝑓𝑊(𝑥2)‖)
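Since the energy is just the cosine similarity of the two embedding vectors, it can be computed directly; a minimal TensorFlow sketch (function and variable names are illustrative):

import tensorflow as tf

def energy(emb1, emb2):
    # Cosine similarity between two batches of embedding vectors.
    emb1 = tf.math.l2_normalize(emb1, axis=1)
    emb2 = tf.math.l2_normalize(emb2, axis=1)
    return tf.reduce_sum(emb1 * emb2, axis=1)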

The total loss over a dataset of 𝑁 triplets is the sum of the instance losses:

𝐿𝑊 = Σᵢ₌₁ᴺ 𝐿𝑊(𝑥1⁽ⁱ⁾, 𝑥2⁽ⁱ⁾, 𝑦⁽ⁱ⁾)

The instance loss combines a similar-case term and a dissimilar-case term, weighted by the label:

𝐿𝑊(𝑥1, 𝑥2, 𝑦) = 𝑦 · 𝐿₊(𝑥1, 𝑥2) + (1 − 𝑦) · 𝐿₋(𝑥1, 𝑥2)

The loss functions for the similar and dissimilar cases are:

𝐿₊(𝑥1, 𝑥2) = ¼ (1 − 𝐸𝑊)²
𝐿₋(𝑥1, 𝑥2) = 𝐸𝑊² if 𝐸𝑊 > 𝑚, and 0 otherwise

so similar pairs are pulled toward 𝐸𝑊 = 1, while dissimilar pairs are penalized only when their cosine similarity rises above the margin 𝑚.
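Translating these equations directly into TensorFlow might look like the sketch below (assuming y_true = 1 marks similar pairs and energy is the cosine similarity from above; the implementation at the end of this post uses a Euclidean-distance variant instead):

import tensorflow as tf

def paper_contrastive_loss(margin=0.0):
    def loss_fn(y_true, energy):
        y_true = tf.cast(y_true, tf.float32)
        # Similar pairs: quadratic pull toward cosine similarity 1.
        loss_pos = 0.25 * tf.math.square(1.0 - energy)
        # Dissimilar pairs: penalized only when similarity exceeds the margin.
        loss_neg = tf.where(energy > margin,
                            tf.math.square(energy),
                            tf.zeros_like(energy))
        return tf.reduce_mean(y_true * loss_pos + (1.0 - y_true) * loss_neg)
    return loss_fn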

The code for the Siamese recurrent network follows.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

# Energy function: Euclidean distance between the two embeddings (the paper
# uses cosine similarity; this implementation substitutes Euclidean distance).
def euclidean_distance(vects):
    x, y = vects
    sum_square = tf.math.reduce_sum(tf.math.square(x - y), axis=1, keepdims=True)
    return tf.math.sqrt(tf.math.maximum(sum_square, tf.keras.backend.epsilon()))

# Shared subnetwork: embedding -> two stacked BLSTM layers -> temporal average
# pooling -> dense projection (the paper stacks four BLSTM layers).
inp_seq = layers.Input(shape=(MAX_SEQ_LEN,))
x = layers.Embedding(num_word, output_dim=16, mask_zero=False)(inp_seq)
x = layers.BatchNormalization()(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(128)(x)

embed_network = keras.Model(inp_seq, x)

# Siamese wiring: both inputs go through the same weight-sharing subnetwork.
inp_seq1 = layers.Input(shape=(MAX_SEQ_LEN,))
inp_seq2 = layers.Input(shape=(MAX_SEQ_LEN,))

network1 = embed_network(inp_seq1)
network2 = embed_network(inp_seq2)

# Distance between the two embeddings, normalized and squashed to (0, 1).
merge = layers.Lambda(euclidean_distance)([network1, network2])
merge = layers.BatchNormalization()(merge)
out = layers.Dense(1, activation='sigmoid')(merge)

model = Model(inputs=[inp_seq1, inp_seq2], outputs=out)


def loss(margin=1):
    def contrastive_loss(y_true, y_pred):
        # y_true = 1 for similar pairs and 0 for dissimilar pairs (matching the
        # label convention above); y_pred is the distance-based model output.
        y_pred = tf.cast(y_pred, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        square_pred = tf.math.square(y_pred)
        margin_square = tf.math.square(tf.math.maximum(margin - y_pred, 0))
        # Similar pairs are pulled toward 0; dissimilar pairs are pushed out
        # until the margin is reached.
        return tf.math.reduce_mean(
            y_true * square_pred + (1 - y_true) * margin_square
        )
    return contrastive_loss

margin = 1
model.compile(loss=loss(margin=margin), optimizer="adam", metrics=["accuracy"])
model.summary()

# train_text2seq_* / test_text2seq_* hold the padded integer sequences for each
# side of a pair; train_label / test_label hold the 0/1 similarity labels.
model.fit([train_text2seq_1, train_text2seq_2], train_label,
          epochs=5, batch_size=16, verbose=1,
          validation_data=([test_text2seq_1, test_text2seq_2], test_label))
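
After training, the model can score a new pair of strings. A quick illustrative example (text2seq is a hypothetical helper that tokenizes and pads a string exactly as the training data was preprocessed):

# Hypothetical inference example; text2seq returns a (1, MAX_SEQ_LEN) int array.
seq_a = text2seq("software engineer")
seq_b = text2seq("software developer")
score = model.predict([seq_a, seq_b])[0, 0]
# With the distance-based loss above, scores near 0 indicate similar pairs.
print(f"score: {score:.3f}")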

References

  1. Paul Neculoiu, Maarten Versteegh, and Mihai Rotaru (Textkernel B.V., Amsterdam). 2016. Learning Text Similarity with Siamese Recurrent Networks. In Proceedings of the 1st Workshop on Representation Learning for NLP. https://aclanthology.org/W16-1617.pdf
