Word Embedding Using BERT In Python

Embedding words into vectors using a deep learning NLP model (BERT) with just a few lines of Python

Anirudh S
Towards Data Science
4 min read · Dec 16, 2019


Word Embeddings: What Are They?

In the world of NLP, representing words or sentences as vectors, known as word embeddings, opens the door to a wide range of applications. Encoding words as vectors is a powerful tool for tasks such as computing semantic similarity between words, which in turn lets you build a semantic search engine. Google, for example, uses BERT to better understand search queries. Arguably, BERT is one of the most powerful language models around, and it has become hugely popular in the machine learning community.

BERT (Bidirectional Encoder Representations from Transformers) models were pre-trained on a large corpus of sentences. In brief, training works by masking some of the words in a sentence (~15% of them, according to the authors of the paper) and tasking the model with predicting the masked words. As the model learns to make these predictions, it builds a powerful internal representation of words as word embeddings. Today, we’ll see how to get a BERT model up and running with little to no hassle and encode words into embeddings.
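To make the masking idea concrete, here’s a toy illustration (the sentence and the masked positions are my own example, not from the paper):

# During pre-training, BERT sees the masked version and must predict the hidden words
original = "the quick brown fox jumps over the lazy dog"
masked = "the quick [MASK] fox jumps over the [MASK] dog"
# Here the model would be trained to predict "brown" and "lazy" from the surrounding context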

BERT Word Embedding Model Setup

There’s a suite of options available for running BERT models with PyTorch and TensorFlow. But to make it super easy for you to get your hands on BERT models, we’ll use a Python library that helps us set everything up in no time!

Bert-as-service is a Python library that lets us deploy pre-trained BERT models on our local machine and run inference against them. It can serve any of the released model types, as well as models fine-tuned on specific downstream tasks. It requires TensorFlow on the back end to work with the pre-trained models, so we’ll start by installing TensorFlow 1.15 from the console.

pip3 install tensorflow-gpu==1.15

Next, we’ll install the bert-as-service client and server. Note that this library doesn’t support Python 2, so make sure you have Python 3.5 or higher.

pip3 install -U bert-serving-server bert-serving-client

The BERT server deploys the model on the local machine, and the client can subscribe to it. You can install both on the same machine, or deploy the server on one machine and subscribe from another. Once the installation is complete, download the BERT model of your choice. You can find the list of all released models over here.

Deploying the Model

Now that the initial setup is done, let’s start the model service with the following command.

bert-serving-start -model_dir /path_to_the_model/ -num_worker=1

For example, if the model’s name is uncased_L-24_H-1024_A-16 and it’s in the directory “/model”, the command would look like this:

bert-serving-start -model_dir /model/uncased_L-24_H-1024_A-16/ -num_worker=1

The “num_worker” argument sets the number of concurrent requests the server can handle. For now, just go with num_worker=1, since we’re only playing with the model from a single client. If you’re deploying the server for multiple clients to subscribe to, choose the “num_worker” argument accordingly.

Subscribing with BERT-Client

We can now run a Python script that uses the BERT service to encode our words into word embeddings. All we have to do is import the BERT-client library and create an instance of the client class. Once we do that, we can feed in the list of words or sentences that we want to encode.

from bert_serving.client import BertClient

# Connect to the running bert-serving server and encode a list of words
client = BertClient()
vectors = client.encode(["dog", "cat", "man"])

We feed the words that we want to encode as a Python list. Above, I fed a list of three words, so the “vectors” object has shape (3, embedding_size). The embedding size is the length of the vector that the BERT model produces for each input; it encodes text of any length into a vector of that fixed length, though the size differs between the different BERT models.
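As a quick sanity check, you can print the shape of the array the client returns; for the large uncased_L-24_H-1024_A-16 model used here, it should be (3, 1024):

# One row per input string, one column per embedding dimension
print(vectors.shape)  # (3, 1024) for uncased_L-24_H-1024_A-16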

Computing Similarity Between Words

Okay, so far so good! But what do we do with these vectors, which are just arrays of numbers? Well, they’re more than just numbers. As I said earlier, each vector represents where a word sits in a 1024-dimensional space (1024 for this model, uncased_L-24_H-1024_A-16). Comparing the vectors of different words with some kind of similarity function tells us how closely they are related.

Cosine similarity is one such function; it gives a similarity score between -1.0 and 1.0, where 1.0 means the two vectors point in exactly the same direction (the words are used in the same way) and scores near zero or below mean they’re dissimilar. Here’s a scikit-learn implementation of cosine similarity between word embeddings.

from sklearn.metrics.pairwise import cosine_similarity

# Similarity between "dog" (row 0) and "cat" (row 1) of the vectors array
cos_lib = cosine_similarity(vectors[0:1], vectors[1:2])
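If you’re curious what that call actually computes, it’s just the dot product of the two vectors divided by the product of their norms; here’s a minimal NumPy sketch that should give the same number:

import numpy as np

# Manual cosine similarity between the "dog" and "cat" vectors
dog, cat = vectors[0], vectors[1]
manual_cos = np.dot(dog, cat) / (np.linalg.norm(dog) * np.linalg.norm(cat))
print(manual_cos)  # should match cos_lib[0][0]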

Word Embedding with BERT Done!

You can also feed an entire sentence rather than individual words, and the server will take care of it. There are also multiple ways in which word embeddings can be combined into an embedding for a whole sentence, such as concatenation or averaging.
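As a rough sketch of feeding whole sentences (the example sentences here are just placeholders of my own), the same client call and similarity function work unchanged, since bert-as-service pools the token embeddings into one fixed-length vector per input string:

# Each input string gets one pooled vector, so sentences work the same way as words
sentence_vecs = client.encode(["The dog chased the cat.", "A man walked his dog in the park."])
sentence_sim = cosine_similarity(sentence_vecs[0:1], sentence_vecs[1:2])
print(sentence_sim[0][0])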

Check out the other articles on Object detection, authenticity verification and more!

Originally published at https://hackerstreak.com
