Supercharged Embeddings: Unlocking Faster BERT Inference at Scale with Spark NLP and OpenVINO™

Rajat
Published in OpenVINO-toolkit
10 min read · Dec 21, 2023

Introduction

Welcome to the third part of our technical blog series exploring the power of OpenVINO Runtime and Spark NLP for deep learning inference within Java and Spark applications. The previous two blogs [Part 1, Part 2] laid the groundwork and introduced the foundational concepts. In the first blog, we focused on how to use OpenVINO Runtime in Java to run deep learning inference, providing a step-by-step guide on setting up the environment, loading a model, and running inference. In the second blog, we demonstrated how OpenVINO Runtime can significantly accelerate NLP inference within the Spark NLP ecosystem.

In this third blog, we will delve into building a Spark NLP pipeline for token-level embedding generation with the popular BERT-Base-Cased model. We will leverage the capabilities of OpenVINO as the backend for the BertEmbeddings annotator, achieving efficient and optimized embedding extraction.

This blog assumes familiarity with the previous two blogs in the series. If you haven’t already, we encourage you to check them out for a deeper understanding of the foundational concepts.

Apache Spark Ecosystem

Apache Spark is a general-purpose, distributed, in-memory data-processing framework that can perform processing tasks on large-scale datasets quickly and efficiently, and handle a wide range of workloads including batch applications, interactive queries, and stream processing. Due to these qualities, it finds wide application in the domains of big data and machine learning. Spark has its own ML library, SparkML, which provides a set of APIs that lets you create and tune your own machine-learning pipelines. However, it does not cover all NLP tasks, and integrating an external NLP framework introduces inefficiencies in serializing and copying string data.

Spark NLP, an open-source project led and sponsored by John Snow Labs, provides NLP annotations that scale easily on distributed clusters. Built on top of Apache Spark and SparkML, it offers a production-grade, unified solution with which you can transform raw text into structured features, run your own NLP pipelines, and feed the results to downstream ML pipelines, all through a simple-to-learn API in Scala and Python. In addition to the pre-trained models available in the Spark NLP Models Hub, you can also import your own custom models into equivalent Spark NLP Annotators (version 3.1.0 and above). You can find a list of supported models here, along with plenty of example notebooks describing how to export models from Hugging Face and TF Hub for use with Spark NLP.

Source: John Snow Labs Spark NLP

OpenVINO integration with Spark NLP lets you use the OpenVINO backend to run models imported into Spark NLP. OpenVINO can handle models in the TensorFlow, TensorFlow Lite, ONNX, and PaddlePaddle formats. You can also convert your model to the OpenVINO Intermediate Representation (IR) format and take advantage of the toolkit’s full optimization and quantization features. The official documentation provides more details on how to prepare your model for deployment using the OpenVINO Toolkit.

Source: OpenVINO™ Toolkit

In this article, we will cover the following steps:

1. Exporting the BERT base cased model from Hugging Face.

2. Setting up Spark NLP.

3. Loading the model into the BertEmbeddings annotator in a Spark NLP pipeline.

4. Executing the Spark NLP pipeline with the OpenVINO backend.

Exporting the Model

Hugging Face is an AI community and platform that provides access to tools that enable you to build, train and deploy powerful machine-learning models. Through their open-source Transformers framework, Hugging Face provides APIs and tools that offer access to over 20,000 pre-trained models based on the state-of-the-art transformer architecture for various tasks in different modalities including Natural Language Processing, Computer Vision and Audio.

In this demo, we will export the bert-base-cased model, a raw BERT model that can be used either for masked language modeling or next sentence prediction. To export the model from Hugging Face, we can use the transformers library available via PyPI.

Note: The BertEmbeddings annotator supports BERT models in the Fill Mask category in Hugging Face. Models trained or fine-tuned on a specific task such as Token Classification cannot be used.

Prerequisites

  • Python 3.6 or higher

Steps to export the model

  • Start by creating and activating a Python virtual environment to avoid any dependency compatibility issues.
python -m venv .env
source .env/bin/activate
  • Install the Hugging Face transformers library with the following command. In this demo, we will use version 4.31.0. Additionally, since the source model is TensorFlow-based, we need the TensorFlow backend to be installed. This can be achieved in a single command as follows:
pip install transformers[tf-cpu]==4.31.0
  • Now that we have installed the necessary dependencies, we will see how to download the BERT model and export it in the TensorFlow SavedModel format.
from transformers import TFBertModel, BertTokenizer
import tensorflow as tf
model = TFBertModel.from_pretrained('bert-base-cased')
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
  • The TFBertModel class, which is essentially a tf.keras.Model subclass, represents the bare BERT model that outputs raw hidden states without any specific head on top. We define the model signature before exporting as follows:
@tf.function(
    input_signature=[
        {
            "input_ids": tf.TensorSpec((None, None), tf.int32, name="input_ids"),
            "attention_mask": tf.TensorSpec((None, None), tf.int32, name="attention_mask"),
            "token_type_ids": tf.TensorSpec((None, None), tf.int32, name="token_type_ids"),
        }
    ]
)
def serving_fn(input):
    return model(input)
  • Now we can safely export the model using the save_pretrained function offered by Hugging Face. Enable the saved_model flag to export the model in the SavedModel format.
model.save_pretrained(save_directory="bert-base-cased", saved_model=True, signatures={"serving_default": serving_fn})
tokenizer.save_vocabulary("bert-base-cased/saved_model/1/assets")
  • This saves the model files in the bert-base-cased/saved_model/1 directory, with the vocab file needed for tokenization inside Spark NLP placed in the assets subdirectory. The resulting layout should look roughly as shown below.
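The exact files vary with the TensorFlow version; the vocab.txt under assets comes from the save_vocabulary call above.

bert-base-cased/saved_model/1
├── assets
│   └── vocab.txt
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index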
  • To import the model, all we need now is the bert-base-cased/saved_model/1 folder. Move this folder to /root/models using the following commands:
mkdir -p /root/models
mv bert-base-cased/saved_model/1 /root/models/bert-base-cased

We can either import the TensorFlow model directly or leverage OpenVINO’s optimization capabilities, such as 8-bit quantization, to reduce model size and improve CPU throughput. For instructions on model optimization and the supported techniques, see the optimization guide here.
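To give a flavor of that second path, here is a minimal sketch of converting the exported SavedModel to IR and applying post-training 8-bit quantization. It assumes the openvino (2023.1 or later, for the convert_model Python API) and nncf packages are installed via pip; the output paths and the random calibration samples are purely illustrative, and a few hundred real tokenized sentences should be used for calibration in practice.

import numpy as np
import openvino as ov
import nncf

# Convert the TensorFlow SavedModel to OpenVINO IR and save it
# (ov.save_model compresses weights to FP16 by default).
ov_model = ov.convert_model("/root/models/bert-base-cased")
ov.save_model(ov_model, "/root/models/bert-base-cased-ov/openvino_model.xml")

# Re-read the saved IR and apply post-training 8-bit quantization with NNCF.
core = ov.Core()
base_model = core.read_model("/root/models/bert-base-cased-ov/openvino_model.xml")

# Dummy calibration samples for illustration only (28996 = bert-base-cased vocab size).
calibration_data = [
    {
        "input_ids": np.random.randint(0, 28996, (1, 16), dtype=np.int32),
        "attention_mask": np.ones((1, 16), dtype=np.int32),
        "token_type_ids": np.zeros((1, 16), dtype=np.int32),
    }
    for _ in range(10)
]
quantized_model = nncf.quantize(base_model, nncf.Dataset(calibration_data))
ov.save_model(quantized_model, "/root/models/bert-base-cased-int8/openvino_model.xml")

Per the article's premise, either the original SavedModel or the optimized IR can then be imported into Spark NLP.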

Spark NLP Pipeline

Spark ML offers two primary components for building machine learning applications- Estimators and Transformers. Estimators can be trained on a piece of data using the fit() method, and Transformers, generally the result of the fitting process, apply some transformation on a target DataFrame using the transform() method. Spark NLP extends these concepts through Annotators- the primary components that spearhead NLP tasks. There are two types of Annotators: Annotator Approaches that represent Estimators and require a training stage, and Annotator Models that represent Transformers and append the results of the current annotation to the input frame without ever deleting or replacing previous data. A list of Spark NLP Annotators and their uses can be found here.

Pipelines are a mechanism for chaining multiple annotators in a single workflow. An end-to-end NLP workflow can be represented as a Pipeline where each stage performs a relevant transformation like tokenization. The following diagram represents a simple pipeline to generate and extract word embeddings using the BertEmbeddings annotator.

Sample Pipeline stages

The following steps implement the pipeline illustrated above:

Prerequisites

  • Ubuntu 20.04 or higher
  • OpenJDK 8 or higher
sudo apt-get update
sudo apt-get install -y default-jdk
  • OpenVINO 2023.0.1 or higher

In this demo, we will use OpenVINO version 2023.0.1. The OpenVINO components are assumed to be installed in the /opt/intel/openvino-2023.0.1 folder, based on the steps outlined here.
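If you installed from an archive as described there, the OpenVINO environment can typically be initialized in the current shell with:

source /opt/intel/openvino-2023.0.1/setupvars.sh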

  • Spark 3.2.3 or higher

Download Apache Spark from the official release archives. Here, we will use Apache Spark version 3.4.1.

curl -L https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz --output spark-3.4.1-bin-hadoop3.tgz

Then unpack the downloaded archive to /opt/spark-3.4.1

sudo mkdir -p /opt/spark-3.4.1
sudo tar -xzf spark-3.4.1-bin-hadoop3.tgz -C /opt/spark-3.4.1 --strip-components=1

Now set up the Apache Spark environment variables and add the binaries to the PATH variable by adding the following lines to the ~/.bashrc file

vi ~/.bashrc
# Add the following lines at the end of the .bashrc file
export SPARK_HOME=/opt/spark-3.4.1
export PATH=$PATH:$SPARK_HOME/bin

Finally, load these environment variables into the current terminal session by running the following command

source ~/.bashrc

Verify that the installation is successful by running

spark-submit --version
  • Spark NLP

In this example, we assume that the Spark NLP jar is located in the ~/spark-nlp/python/lib folder.

Note: As of this article, to take advantage of the OpenVINO integration, you must build Spark NLP from source with the latest changes. To do so, build the OpenVINO jar using the instructions available here, copy the OpenVINO jar file to a new lib directory in the Spark NLP project root, and run sbt assemblyAndCopy from the project root to compile the Spark NLP jar into the spark-nlp/python/lib folder. See here for the detailed steps, which are sketched below.
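In shell form, the build boils down to something like the following (a sketch; the path and file name of the OpenVINO jar you built are hypothetical):

cd ~/spark-nlp
mkdir -p lib
# Copy in the OpenVINO jar built earlier (hypothetical path and file name)
cp /path/to/openvino/java/openvino.jar lib/
sbt assemblyAndCopy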

Steps to run Spark NLP Pipeline

Spark provides an interactive shell, a Scala REPL, that offers a powerful and convenient way to test out Spark statements, quickly prototype applications, and learn the API. We can use this shell to demonstrate how to import the saved BERT model into Spark NLP and run a sample pipeline. The sample code is available here.

The following command launches the Spark shell with the Spark NLP jar. Use the --driver-memory parameter to increase the memory available to the Spark driver.


spark-shell --jars ~/spark-nlp/python/lib/sparknlp.jar --driver-memory=4g

This should open the shell in which we can start building our pipeline.

First, import the necessary dependencies

import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.EmbeddingsFinisher
import org.apache.spark.sql.functions.explode

Then store the model path in a variable.

val MODEL_PATH="file:/root/models/bert-base-cased"

Now we can begin defining the components of our pipeline. The DocumentAssembler is the entry point for every Spark NLP pipeline. This Spark transformer reads raw input text from a String column and transforms it into the Document type required for further processing in Spark NLP.

val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")

With the input text ready for further processing, our next step will be to define the tokenizer that will split the text into tokens that our model can understand.

val tokenizer = new Tokenizer().setInputCols(Array("document")).setOutputCol("token")

Next, we define the BertEmbeddings annotator, a Spark transformer that produces token-level embeddings using the BERT model. The annotator can be used out of the box, with support for downloading and using pre-trained models directly via the pretrained function, as shown below.
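For instance, the following line (a hedged example; bert_base_cased is one of the model names published on the Spark NLP Models Hub) downloads a ready-to-use BERT model:

val pretrainedEmbeddings = BertEmbeddings.pretrained("bert_base_cased", "en").setInputCols("token", "document").setOutputCol("embeddings")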

To import a custom model for use with this annotator, we instead use the loadSavedModel function. By default, TF models run on the TensorFlow backend. To use the OpenVINO backend, enable the useOpenvino flag.

val embeddings = BertEmbeddings.loadSavedModel(MODEL_PATH, spark, useOpenvino = true).setInputCols("token", "document").setOutputCol("embeddings").setCaseSensitive(true)

Finally, we use the EmbeddingsFinisher annotator to extract the resulting embeddings.

val embeddingsFinisher = new EmbeddingsFinisher().setInputCols("embeddings").setOutputCols("finished_embeddings")

To assemble these stages into a pipeline, use the following Spark statement

val pipeline = new Pipeline().setStages(Array(document, tokenizer, embeddings, embeddingsFinisher))

That’s it! Let us test out the pipeline with some input text.

val data = Seq("The quick brown fox jumped over the lazy dog.").toDF("text")

Fit and transform the raw input dataframe

val result = pipeline.fit(data).transform(data)

Now, let us examine the contents of the resulting dataframe

result.select(explode($"finished_embeddings") as "result").show(5, 100)

These raw embeddings can then be used for downstream tasks like Classification and Named Entity Recognition. The entire Scala application is available here.

Conclusion

This post explored the key steps for building a Spark NLP pipeline to generate token embeddings using BERT and OpenVINO. We started by exporting the popular BERT base model from Hugging Face and preparing it for use in Spark NLP. We then set up a sample Spark NLP pipeline with the DocumentAssembler, Tokenizer, BertEmbeddings, and EmbeddingsFinisher annotators. The pipeline can generate embeddings for any input text to power downstream NLP tasks, while OpenVINO’s neural-network optimizations and hardware acceleration unlock faster embedding generation.

Note: For steps on generating BERT embeddings using the Spark NLP Python API, check out this notebook. Note that you need to build the jar and wheel files from this feature branch to run the notebook. Use the steps outlined in this article to build the Spark NLP jar, then run pip wheel . from the spark-nlp/python directory to build the wheel file.

Additional Resources
