Semantic analysis in React Native using Tensorflow

Published in

dogtronic

7 min read · Jun 20, 2022

Artificial intelligence and machine learning have become extremely popular in recent years. It is no wonder, then, that their capabilities are increasingly used directly on mobile devices. Among the libraries that allow the use of machine learning models in React Native applications, we can find:

Tensorflow.js
PyTorch Live

On a mobile device, machine learning can be used for a variety of tasks related to image recognition, text analysis, or event and value prediction. Moving such functionality directly into the mobile application can significantly simplify the system architecture, because part of the computation runs on the end user’s device.

Semantic analysis

Among natural language processing (NLP) tasks, we can highlight classification. A machine learning model (for example, an artificial neural network) adjusts its results based on the data it has seen (so-called learning from experience). The rules that drive the classification are not written by hand; the algorithm learns them itself during the training process.

Let us look at the following problem. On a movie review site, we want to label user reviews automatically as positive or negative. This is an example of binary classification: a text has either a positive or a negative overtone, so we can assign it to one of two classes. However, to fully understand the context and meaning of words, semantic (contextual) analysis is necessary. We humans extract the meaning of words using our own cognitive abilities. A machine learning algorithm, however, has to be shown what connotations a given text may carry; for example, the neighborhood of a word (that is, the words it most often appears next to) becomes important information.

To solve this problem in practice, we can use a pre-trained convolutional neural network model made available on the Tensorflow library pages. This model was trained on user reviews from the IMDB website. For the algorithm to understand textual input, the text must first be stored in an appropriate structure.
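The idea that a word's neighborhood carries information can be illustrated with a small counting sketch. It is not how the pre-trained model works internally; it only shows what "the words it most often appears next to" means. The function name contextCounts and the toy reviews are made up for this illustration.

```typescript
// Count how often each word appears within `windowSize` positions of `target`.
function contextCounts(
  sentences: string[][],
  target: string,
  windowSize: number
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const words of sentences) {
    words.forEach((word, i) => {
      if (word !== target) return;
      const start = Math.max(0, i - windowSize);
      const end = Math.min(words.length - 1, i + windowSize);
      for (let j = start; j <= end; j++) {
        if (j === i) continue; // skip the target word itself
        const neighbour = words[j];
        counts.set(neighbour, (counts.get(neighbour) ?? 0) + 1);
      }
    });
  }
  return counts;
}

// Toy corpus, invented for illustration only.
const toyReviews: string[][] = [
  ["a", "really", "bad", "movie"],
  ["bad", "acting", "and", "a", "bad", "plot"],
  ["not", "a", "bad", "movie", "at", "all"],
];

const badContext = contextCounts(toyReviews, "bad", 1);
console.log(badContext.get("movie")); // 2
console.log(badContext.get("a"));     // 2
console.log(badContext.get("not"));   // undefined (outside the window)
```

Note that with a window of one word, the "not" in "not a bad movie" is missed, which hints at why a wider context matters for sentiment.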

This article does not explain how convolutional layers or machine learning algorithms work. You can learn more about artificial neural networks in a great book by Aurélien Géron: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.

Preparation of the model

The first step will be to load the model and the metadata needed to prepare the input data. Let’s prepare a new React Native project, install Tensorflow, and import it into any component.

For details on how to install Tensorflow in your React Native application, see: https://github.com/tensorflow/tfjs/blob/master/tfjs-react-native/README.md

import * as tf from '@tensorflow/tfjs';
import '@tensorflow/tfjs-react-native';

Example of convolutional layers in a neural network, source: https://www.ibm.com/cloud/learn/convolutional-neural-networks

The model is available at the following links:

const source = {
  model: 'https://storage.googleapis.com/tfjs-models/tfjs/sentiment_cnn_v1/model.json',
  metadata: 'https://storage.googleapis.com/tfjs-models/tfjs/sentiment_cnn_v1/metadata.json'
};

Then we can load the needed data into memory (using the hooks useRef and useEffect).

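import { useEffect, useRef } from 'react';

// Refs keep the loaded model and metadata across re-renders.
const metadata = useRef<any>();
const model = useRef<tf.LayersModel>();

const loadModel = async () => {
    try {
        // Wait until the Tensorflow.js backend is initialized.
        await tf.ready();
        model.current = await tf.loadLayersModel(source.model);
        const metadataJson = await fetch(source.metadata);
        metadata.current = await metadataJson.json();
    } catch (err) {
        // Do not swallow the error silently; without the model nothing else works.
        console.error('Failed to load the model or its metadata', err);
    }
};

// Load the model once, when the component mounts.
useEffect(() => {
    loadModel();
}, []);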
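The metadata ref above is typed as any. A minimal sketch of a stricter alternative follows, assuming the metadata file contains at least the four fields that this article reads (word_index, index_from, vocabulary_size, max_len). The interface name SentimentMetadata and the helper isSentimentMetadata are hypothetical, and the sample JSON is a toy value, not the real file.

```typescript
// Fields the article reads from metadata.json; any other fields are ignored here.
interface SentimentMetadata {
  word_index: Record<string, number>;
  index_from: number;
  vocabulary_size: number;
  max_len: number;
}

// Runtime check (hypothetical helper) that a parsed JSON value has those fields.
function isSentimentMetadata(value: unknown): value is SentimentMetadata {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["word_index"] === "object" && v["word_index"] !== null &&
    typeof v["index_from"] === "number" &&
    typeof v["vocabulary_size"] === "number" &&
    typeof v["max_len"] === "number"
  );
}

// Toy values, invented for illustration only.
const parsed: unknown = JSON.parse(
  '{"word_index": {"movie": 17}, "index_from": 3, "vocabulary_size": 20000, "max_len": 100}'
);
console.log(isSentimentMetadata(parsed));           // true
console.log(isSentimentMetadata({ max_len: 100 })); // false
```

With such a type, the ref could be declared as useRef<SentimentMetadata>() and assigned only after the check passes, so a malformed download fails early instead of producing NaN indexes later.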

Data preparation

Suppose we want to evaluate whether the sentence “This movie is bad” is a positive or negative opinion. It is important to preprocess the text first. To simplify the calculation, the text should be converted to lowercase and punctuation marks should be removed. To do this, we can use the ready-made Tokenizr library, which performs the tokenization.

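import Tokenizr from 'tokenizr';

const tokenize = (text: string): string[] => {
    const lexer = new Tokenizr();

    // Words (identifiers).
    lexer.rule(/[a-zA-Z_][a-zA-Z0-9_]*/, (ctx, match) => {
        ctx.accept("id");
    });
    // Integers.
    lexer.rule(/[+-]?[0-9]+/, (ctx, match) => {
        ctx.accept("number", parseInt(match[0], 10));
    });
    // Double-quoted strings.
    lexer.rule(/"((?:\\"|[^\r\n])*)"/, (ctx, match) => {
        ctx.accept("string", match[1].replace(/\\"/g, "\""));
    });
    // Line comments and whitespace are skipped.
    lexer.rule(/\/\/[^\r\n]*\r?\n/, (ctx, match) => {
        ctx.ignore();
    });
    lexer.rule(/[ \t\r\n]+/, (ctx, match) => {
        ctx.ignore();
    });
    // Any other single character, e.g. punctuation.
    lexer.rule(/./, (ctx, match) => {
        ctx.accept("char");
    });

    // Lowercase the text so that "This" and "this" map to the same token.
    lexer.input(text.toLowerCase());
    // Drop punctuation ("char") and the end-of-input ("EOF") token.
    return lexer.tokens()
        .filter(token => token.type !== "char" && token.type !== "EOF")
        .map(token => String(token.value));
};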

After executing the function with the above sentence passed as a parameter, we will get an array of tokens:

["this", "movie", "is", "bad"]

You can find the Tokenizr library at: https://github.com/rse/tokenizr.
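For simple English text, the same preprocessing (lowercasing, dropping punctuation, splitting into words) can also be sketched without any library. This is an alternative illustration, not the Tokenizr-based function from the article; the name simpleTokenize is hypothetical, and the regular expression assumes plain ASCII input.

```typescript
// Lowercase the text, replace everything except ASCII letters, digits,
// apostrophes and whitespace with a space, then split on whitespace.
function simpleTokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter(token => token.length > 0);
}

console.log(simpleTokenize("This movie is bad."));
// [ 'this', 'movie', 'is', 'bad' ]
console.log(simpleTokenize("Great, GREAT film!!!"));
// [ 'great', 'great', 'film' ]
```

A sketch like this drops accented and non-Latin characters, so it is only suitable when the reviews are known to be in plain English.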

For the algorithm, the words themselves don’t really matter; they need to be encoded as numbers beforehand, in a way that makes them easier to process. We can do this with a bag-of-words representation, TF-IDF, or word embeddings in a vector space. The trained model from the set of examples was prepared using the word2vec algorithm. To represent words as numbers, we need to use the pre-prepared metadata structure, which contains a set of words with their associated indexes.

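const OOV_INDEX = 2;
// inputText is the array of tokens returned by tokenize().
const sequence = inputText.map(word => {
    const rawIndex = metadata.current.word_index[word];
    // Words missing from the vocabulary, or beyond its size, map to OOV_INDEX.
    if (typeof rawIndex !== 'number') {
        return OOV_INDEX;
    }
    const wordIndex = rawIndex + metadata.current.index_from;
    return wordIndex > metadata.current.vocabulary_size ? OOV_INDEX : wordIndex;
});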

OOV_INDEX (out of vocabulary) is the index of an unknown word, i.e. one that was not used when the algorithm was trained. Any word that appears in the review but was not in the word corpus is simply assigned the OOV_INDEX value.
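The encoding step can be tried out in isolation with a toy vocabulary. In the sketch below, the contents of toyMetadata are hypothetical values chosen only so that the output resembles the example sentence; the real metadata file may assign different indexes. The function name encode and the ToyMetadata interface are also made up for this illustration.

```typescript
const OOV_INDEX = 2; // index reserved for out-of-vocabulary words

interface ToyMetadata {
  word_index: Record<string, number>;
  index_from: number;
  vocabulary_size: number;
}

// Hypothetical values, invented for illustration; the real metadata.json may differ.
const toyMetadata: ToyMetadata = {
  word_index: { this: 11, movie: 19, is: 6, bad: 75 },
  index_from: 3,
  vocabulary_size: 20000,
};

function encode(tokens: string[], meta: ToyMetadata): number[] {
  return tokens.map(word => {
    const rawIndex: unknown = meta.word_index[word];
    // Unknown words (or inherited properties such as "constructor") are not numbers.
    if (typeof rawIndex !== "number") return OOV_INDEX;
    const shifted = rawIndex + meta.index_from;
    return shifted > meta.vocabulary_size ? OOV_INDEX : shifted;
  });
}

console.log(encode(["this", "movie", "is", "bad"], toyMetadata)); // [ 14, 22, 9, 78 ]
console.log(encode(["this", "movie", "is", "meh"], toyMetadata)); // [ 14, 22, 9, 2 ]
```

Checking the type of the looked-up value before adding index_from avoids a NaN index, which is what undefined + number would otherwise produce for an unknown word.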

After this process, our coded sentence will be of the form:

[14, 22, 9, 78]

The input for a neural network must have a fixed size (the so-called frame size). For the model from the example, this value is equal to 100. Hence, we must additionally write the above sentence as a hundred-element array of numbers. To do this, we can use a simplified padSequences function, which pads shorter sequences with zeros at the beginning and, for texts longer than one hundred words, keeps only the last hundred elements.

const padSequences = (sequences: number[][], maxLen: number) => {
    return sequences.map(seq => {
        // Too long: keep only the last maxLen elements (copy, do not mutate the input).
        if (seq.length > maxLen) {
            seq = seq.slice(seq.length - maxLen);
        }

        // Too short: pad with zeros at the beginning.
        if (seq.length < maxLen) {
            seq = Array(maxLen - seq.length).fill(0).concat(seq);
        }

        return seq;
    });
}

The function must then be called:

const paddedSequence = padSequences([sequence], metadata.current.max_len);

After executing the function, the data will look as follows:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 22, 9, 78]
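Both padding cases are easier to see with a small frame size. The sketch below is a single-sequence variant written for illustration (the name padSequence is hypothetical), and the frame sizes 6 and 5 are toy values rather than the model's real size of 100.

```typescript
// Keep the last `maxLen` elements of `seq`, or left-pad it with zeros up to `maxLen`.
// The input array is not modified.
function padSequence(seq: number[], maxLen: number): number[] {
  const tail = seq.slice(Math.max(0, seq.length - maxLen));
  const padding = new Array<number>(maxLen - tail.length).fill(0);
  return padding.concat(tail);
}

// Shorter than the frame: zeros are added at the beginning.
console.log(padSequence([14, 22, 9, 78], 6));       // [ 0, 0, 14, 22, 9, 78 ]
// Longer than the frame: only the last five elements are kept.
console.log(padSequence([1, 2, 3, 4, 5, 6, 7], 5)); // [ 3, 4, 5, 6, 7 ]
```

Truncating from the front means the beginning of a very long review is discarded, which is worth keeping in mind when interpreting the score.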

Tensors

Tensorflow is a library that uses data flow graphs (directed graphs) in which vertices store information about mathematical operations or data exchanges, and edges are used to represent the flow of that data (output-input relationships).

The data flowing along the edges is represented as multidimensional arrays called tensors. Operations that do not depend on each other can be executed asynchronously and in parallel.

Example representation of a 3-dimensional tensor, source: https://www.tensorflow.org/guide/tensor

For the data to be an input for the neural network, it must be stored as a two-dimensional tensor:

const input = tf.tensor2d(paddedSequence, [1, metadata.current.max_len]);
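The shape [1, max_len] means one row (a batch with a single review) and max_len columns. A mismatch between the nested array and the requested shape is easier to diagnose if it is caught before the tensor is built. The helper below is a hypothetical, library-free sketch of such a check; shape2d is not part of Tensorflow, and the batch uses a toy frame size of 6.

```typescript
// Returns [rows, columns] for a rectangular nested array.
// Throws if the rows do not all have the same length.
function shape2d(rows: number[][]): [number, number] {
  const columns = rows.length > 0 ? rows[0].length : 0;
  for (const row of rows) {
    if (row.length !== columns) {
      throw new Error(`ragged input: expected ${columns} columns, got ${row.length}`);
    }
  }
  return [rows.length, columns];
}

// Toy padded batch with a frame size of 6 (the real model uses 100).
const batch: number[][] = [[0, 0, 14, 22, 9, 78]];
console.log(shape2d(batch)); // [ 1, 6 ]
```

In the application, the result could be compared with [1, metadata.current.max_len] before the tensor is created.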

Prediction

The prepared data can now be passed as input to the convolutional neural network (i.e., the pre-trained model). The classification returns the probability that our sentence belongs to the positive class, a value in the range [0, 1]. The higher the value, the more likely the sentence is positive. So we can assume that if the score is in the range [0.5, 1], the text is positive, and otherwise it is negative.

const predictOut = model.current.predict(input) as tf.Tensor<tf.Rank>;
const score = predictOut.dataSync()[0];
// Release the memory held by tensors that are no longer needed.
predictOut.dispose();
input.dispose();
setValue(score >= 0.5 ? 'positive' : 'negative');

Once the score has been read, the tensor is no longer needed, so we release its memory with the dispose() method.
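The thresholding rule described above can be isolated in a small pure function, which makes it easy to test and to adjust. This is a sketch under the article's assumption that 0.5 is the cut-off; the names Sentiment and toSentiment are hypothetical.

```typescript
type Sentiment = "positive" | "negative";

// Map a probability in [0, 1] to a label; scores at or above the threshold are positive.
function toSentiment(score: number, threshold = 0.5): Sentiment {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be in [0, 1], got ${score}`);
  }
  return score >= threshold ? "positive" : "negative";
}

console.log(toSentiment(0.12)); // negative
console.log(toSentiment(0.5));  // positive
console.log(toSentiment(0.93)); // positive
```

Rejecting NaN explicitly is useful here, because a NaN score would otherwise be silently reported as negative.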

Now let’s look at how this works in our application.

Conclusions

As the above example shows, applying machine learning models directly in a mobile application is not a complicated task. The ability to transfer a pre-trained model to the device allows you to efficiently use the benefits of ML for sophisticated and non-trivial tasks.

Sources

https://www.tensorflow.org/js
https://blog.tensorflow.org/2020/02/tensorflowjs-for-react-native-is-here.html
https://github.com/rse/tokenizr

You can find the Polish version of the article here: https://dogtronic.io/analiza-semantyczna-w-react-native/

Find us on our Dogtronic website.