Saving and Uploading a TensorFlow.js Speech model
TensorFlow.js provides pre-trained models for several tasks, including speech recognition.
In this article, we will look at the steps required to:
- Save a TensorFlow.js speech model from the browser to the file system
- Load a saved model from the file system into the browser to perform speech recognition
Saving a TensorFlow.js model to the file system
Usually the need to download a model arises when we train a transfer model using TensorFlow.js in the browser to recognize new words specific to our application needs. Once the transfer model is trained, we would like to have this model saved for later use, ideally in the user’s file system. This way we could later upload and use this model in our applications.
Save a pre-trained model available as part of the TensorFlow.js library
Let’s attempt to save the default pre-trained model that we get when we instantiate a speech recognizer using the following lines of code. The below code initializes a speech model that was trained on a large dataset and made available as part of the TensorFlow.js library.
import * as speechCommands from '@tensorflow-models/speech-commands';

const baseRecognizer = speechCommands.create('BROWSER_FFT');
Let’s first take a look at the summary of the underlying model of this speech recognizer. The below line should print the model’s summary to the browser’s console.
baseRecognizer.model.summary();
We will make use of the model.save() function to download the model. Execute the below line of code to download the model and its weights to the user’s local file system. The model file is a JSON file containing the details of the model architecture and a reference to the model’s weights binary file.
baseRecognizer.model.save('downloads://my-model');
The files should be downloaded to the “Downloads” directory on the user’s system. Two files get downloaded: the model file, “my-model.json”, and the weights binary file, “my-model.weights.bin”.
Note that model.save() always requires a destination. Instead of the downloads:// scheme, we could pass 'indexeddb://my-model' to store the model in the browser’s IndexedDB. Please refer to the References section to learn more about the different schemes that are available for saving a model.
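As a minimal sketch, the indexeddb:// scheme can be combined with the model-management helpers in the core TensorFlow.js package. The sketch assumes @tensorflow/tfjs is available as tf and that baseRecognizer was created as above; the function is defined but not invoked here.

```javascript
// Sketch: persist the recognizer's underlying model in the browser's
// IndexedDB, list what is stored, and remove it again.
async function saveToIndexedDb(tf, baseRecognizer) {
  // Store the model under the indexeddb:// scheme.
  await baseRecognizer.model.save('indexeddb://my-model');

  // tf.io.listModels() resolves to an object keyed by storage path.
  const stored = await tf.io.listModels();
  console.log(Object.keys(stored));

  // Clean up the stored copy when it is no longer needed.
  await tf.io.removeModel('indexeddb://my-model');
}
```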
Save a transfer model trained by the user
The below code creates a transfer model and trains it on a few custom words.
import * as speechCommands from '@tensorflow-models/speech-commands';

const baseRecognizer = speechCommands.create('BROWSER_FFT');
await baseRecognizer.ensureModelLoaded();

const transferRecognizer = baseRecognizer.createTransfer('mywords');
await transferRecognizer.collectExample('hello');
await transferRecognizer.collectExample('hi');
await transferRecognizer.train({ epochs: 10 });
The above transfer model is trained to recognize two words: “hello” and “hi”. In practice, collectExample() would be called several times per word to gather enough training examples. Execute the below line of code to download this model to the local file system.
await transferRecognizer.save('downloads://my-model');
Notice the difference between saving the pre-trained model and the transfer model. While saving the transfer model, we invoke the save() function directly on the transferRecognizer object.
This is because baseRecognizer.createTransfer() returns an instance of the class TransferBrowserFftSpeechCommandRecognizer, which defines a save() function that internally invokes model.save().
The class TransferBrowserFftSpeechCommandRecognizer is defined in the file src/browser_fft_recognizer.ts in tfjs-models speech repository.
As before, if we invoke the transferRecognizer.save() function without passing a scheme parameter, the model is saved to the browser’s IndexedDB.
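Restoring such an IndexedDB-saved transfer model later follows the pattern sketched below, per the save/load API in the speech-commands repository. The function is not invoked here, and the transfer name 'mywords' must match the one used when the model was saved.

```javascript
// Sketch: restore a transfer model that was previously saved with
// transferRecognizer.save() (i.e. to the browser's IndexedDB).
async function restoreTransferModel(speechCommands) {
  const base = speechCommands.create('BROWSER_FFT');
  await base.ensureModelLoaded();

  // The transfer name must match the one used at save time.
  const restored = base.createTransfer('mywords');
  await restored.load(); // loads from the default IndexedDB location
  return restored;
}
```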
Loading a TensorFlow.js model to the browser
Generally, to load a speech model, we need three files: the model file, the weights binary file, and a metadata.json file. The metadata.json file contains the ordered list of words that the model is trained to recognize.
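For illustration, the metadata for a transfer model recognizing the two words “hello” and “hi” from the earlier section might look like the sketch below. Treat the field name “words” and the background/unknown entries as assumptions; the exact schema is defined by the speech-commands library.

```json
{
  "words": ["_background_noise_", "_unknown_", "hello", "hi"]
}
```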
The metadata.json file for the default pre-trained model that we are using in our example can be downloaded from the below link.
https://storage.googleapis.com/tfjs-models/tfjs/speech-commands/v0.4/browser_fft/18w/metadata.json
Loading a model directly from the file system (the file:// scheme) is only supported when running TensorFlow.js under Node.js.
But as this article focuses on the browser and client-side JavaScript code, we will use the HTTP(S) scheme to load the model, i.e., the model will be loaded from a URL.
So we need to serve my-model.json, the weights binary file, and metadata.json over HTTP(S) to obtain URLs for these files. To make this work in a local development environment, we can quickly start an HTTP server using Node or Python (for example, python3 -m http.server 8080) and make the files available from that server.
Setting up an HTTP Server
Install the http-server using npm.
npm install --global http-server
Refer to the link below to install and start the http-server.
https://www.npmjs.com/package/http-server
Enabling URLs for the model and metadata
Let’s make use of the pre-trained model that we downloaded in the previous section and upload it back to initialize a speech recognizer.
Place all three files in a single directory. Open a command prompt, navigate to that directory, and start the http-server using the below command. The server starts on port 8080 by default.
http-server
Verify that the files are available at the below URLs. Make sure the model file is named “my-model.json”, the weights binary file “my-model.weights.bin”, and the metadata file “metadata.json” for the below URLs to work. The weights file is fetched automatically through the reference inside my-model.json, so only the two URLs below need checking.
http://localhost:8080/my-model.json
http://localhost:8080/metadata.json
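The check can also be scripted with the browser’s standard fetch API; the sketch below simply confirms that both URLs respond, and is defined but not invoked here.

```javascript
// Sketch: confirm the model and metadata URLs are reachable before
// passing them to speechCommands.create().
async function checkModelUrls() {
  const urls = [
    'http://localhost:8080/my-model.json',
    'http://localhost:8080/metadata.json',
  ];
  for (const url of urls) {
    const res = await fetch(url);
    console.log(url, res.ok ? 'ok' : `failed (HTTP ${res.status})`);
  }
}
```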
Upload the model to create a new speech recognizer
Let’s make use of the existing model at hand to create a speech recognizer that could be used to recognize words.
Execute the below line of code to upload this model and initialize a speech recognizer using this model.
const customSpeechRecognizer = speechCommands.create(
    'BROWSER_FFT', null,
    'http://localhost:8080/my-model.json',
    'http://localhost:8080/metadata.json');
await customSpeechRecognizer.ensureModelLoaded();
The second parameter to the create() function is the vocabulary to be used. When we create a speech recognizer from a custom model, we pass null here, as the vocabulary is determined by the metadata.json file.
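To see the recognizer in action, a hedged sketch follows: wordLabels() and listen() are part of the speech-commands recognizer API, while the bestWord() helper and the probabilityThreshold value are illustrative choices of our own. The listening function is defined but not invoked here, since it requires the browser and a loaded customSpeechRecognizer.

```javascript
// Pure helper: map a scores array to the most likely word label.
function bestWord(words, scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return words[best];
}

// Sketch: wire the helper into the recognizer created above.
async function startListening(customSpeechRecognizer) {
  // wordLabels() returns the ordered labels from metadata.json.
  const words = customSpeechRecognizer.wordLabels();
  await customSpeechRecognizer.listen(result => {
    // result.scores is aligned with the word labels.
    console.log('Heard:', bestWord(words, Array.from(result.scores)));
  }, { probabilityThreshold: 0.75 });
  // Call customSpeechRecognizer.stopListening() to stop.
}
```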
Issues accessing model.json or metadata.json in the local development environment
If the browser blocks these requests with CORS errors, try one of the following:
- Run the application code from the same http-server so that the app and the files are part of the same origin.
- Serve the files with CORS headers enabled (the http-server package supports a --cors flag).
- Install a Chrome extension, such as CORS Unblock, to disable CORS while testing in the local development environment. Please disable it again once development testing is done.
References
- TensorFlow.js documentation on the different schemes available for saving and loading models:
https://www.tensorflow.org/js/guide/save_load
- Steps to install and start the http-server locally:
https://www.npmjs.com/package/http-server
- TensorFlow.js Models repository (Speech Commands):
https://github.com/tensorflow/tfjs-models/tree/master/speech-commands
Originally published at https://handsondeeplearning.com.