Running spaCy as a Service on GKE
This is a step-by-step guide to running spaCy, a natural language processing library, on a Google Kubernetes Engine (GKE) cluster. These steps assume that you have already created a cluster in your account.
The API uses examples from the spaCy GitHub repository.
App
We use Flask to define two APIs for spaCy:
- Extract entity relations
- Extract noun and verb phrases from a given sentence
from flask import Flask, Response, json, jsonify, request
from gevent.pywsgi import WSGIServer
import spacy
import textacy
from entity_relation import extract_currency_relations

# Load the small English model once at startup and reuse it for every request.
nlp = spacy.load('en_core_web_sm')
app = Flask(__name__)

# POS pattern for verb phrases: optional verb, any adverbs, one or more verbs.
pattern = r'<VERB>?<ADV>*<VERB>+'
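def extract_noun_phrase(text):
    """Return the text of every noun chunk spaCy finds in `text`."""
    doc = nlp(text)
    noun_phrases = [chunk.text for chunk in doc.noun_chunks]
    print(noun_phrases)
    return noun_phrases


def extract_verb_phrase(text):
    """Return the verb phrases in `text` that match the POS pattern."""
    doc = nlp(text)
    # pos_regex_matches is available in older textacy releases;
    # newer releases may no longer provide it.
    verb_chunks = textacy.extract.pos_regex_matches(doc, pattern)
    verb_phrases = [vb.text for vb in verb_chunks]
    print(verb_phrases)
    return verb_phrases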
@app.route('/extract-phrase', methods=['POST'])
def extract_phrase():
    # The route only accepts POST, so no method check is needed here.
    dataDict = json.loads(request.get_data())
    if "text" not in dataDict:
        return Response("Missing 'text' field", status=400)
    phraseDic = {
        "noun": extract_noun_phrase(dataDict["text"]),
        "verb": extract_verb_phrase(dataDict["text"])
    }
    return jsonify(phraseDic)


@app.route('/extract-relation', methods=['POST'])
def find_relation():
    dataDict = json.loads(request.get_data())
    if "text" not in dataDict:
        return Response("Missing 'text' field", status=400)
    doc = nlp(dataDict["text"])
    relations = extract_currency_relations(doc)
    # Map each related entity to the money amount it is linked with.
    output = {}
    for r1, r2 in relations:
        output[r1.text] = r2.text
    return jsonify(output)
@app.route('/ping', methods=['GET'])
def health():
    # Simple health check endpoint.
    return "Ok"


if __name__ == '__main__':
    print("Starting the server...")
    port = 8050
    http_server = WSGIServer(('', port), app)
    print("Server started and listening on port:", port)
    http_server.serve_forever()
The complete code is on GitHub: https://github.com/k8scaleio/SpaCyServer
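To make the verb phrase pattern `<VERB>?<ADV>*<VERB>+` less opaque, here is a minimal sketch of how a pattern over part-of-speech tags can be matched with an ordinary regular expression. It is an illustration of the idea only, not textacy's implementation. The function names are hypothetical, and the tokens are tagged by hand instead of by spaCy so the example runs without a model.

```python
import re

PATTERN = r'<VERB>?<ADV>*<VERB>+'


def compile_pos_pattern(pattern):
    # Rewrite each <TAG> as a group matching " TAG" followed by a space or
    # the end of the string, so the pattern runs over a string of tags.
    return re.compile(re.sub(r'<([A-Z]+)>', r'(?: \1(?= |$))', pattern))


def match_pos_pattern(tagged_tokens, pattern=PATTERN):
    """Return the phrases whose POS tags match `pattern`.

    `tagged_tokens` is a list of (word, pos_tag) pairs.
    """
    words = [word for word, _ in tagged_tokens]
    # Every tag is preceded by one space, so counting spaces before a
    # character offset gives the index of the token at that offset.
    tags = ''.join(' ' + pos for _, pos in tagged_tokens)
    phrases = []
    for match in compile_pos_pattern(pattern).finditer(tags):
        start = tags[:match.start()].count(' ')
        end = tags[:match.end()].count(' ')
        phrases.append(' '.join(words[start:end]))
    return phrases


# Hand-tagged example sentence (assumed tags, not produced by spaCy).
sentence = [
    ("She", "PRON"), ("has", "VERB"), ("quickly", "ADV"),
    ("left", "VERB"), ("the", "DET"), ("room", "NOUN"),
]
print(match_pos_pattern(sentence))
```

Working on the tag string keeps the pattern syntax close to a normal regex, which is why quantifiers such as `?`, `*` and `+` can be used directly on tags.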
Docker
Now let's build the Docker image for the app above. We use an Ubuntu base image for the container.
Dependencies are pulled from the requirements.txt file.
Clone the repository and build the image by running the command below:
docker build -t spacy-server:1.0 .
After you have built the image, run it with:
docker run -p 8050:8050 spacy-server:1.0
You can test your server with a curl command:
curl -d '{"text":"Net income was $9.4 million compared to the prior year of $2.7 million."}' -H "Content-Type: application/json" -X POST http://localhost:8050/extract-relation
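The same request can be sent from Python. The sketch below uses only the standard library and assumes the container is listening on localhost:8050 as in the docker run command above. The helper names `build_request` and `call_server` are hypothetical.

```python
import json
import urllib.request


def build_request(text, base_url="http://localhost:8050",
                  endpoint="/extract-relation"):
    """Build the POST request that the curl command sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_server(text, **kwargs):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `call_server("Net income was $9.4 million compared to the prior year of $2.7 million.")` should return the relations as a dictionary. Passing `endpoint="/extract-phrase"` targets the phrase API instead.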
GKE deployment
Now that your Docker container is working, let's deploy it to the Kubernetes cluster we created.
First verify that kubectl is pointing to the correct cluster by running the command below:
kubectl config current-context
If it's not pointing to the correct cluster, run the command below to fetch its credentials:
gcloud container clusters get-credentials $CLUSTER_NAME
Next we need to tag the image so that we can push it to Google Container Registry (GCR). $PROJECT_NAME is the ID of the Google Cloud project in which your cluster is running.
docker tag spacy-server:1.0 gcr.io/$PROJECT_NAME/spacy-server:1.0
Now we are ready to push the image to GCR with the command below:
docker push gcr.io/$PROJECT_NAME/spacy-server:1.0
Once you have pushed the image to GCR, let's create a deployment file.
This deployment file can also be found in the repository.
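The repository's deployment file is not reproduced here. As a rough sketch of what such a file might contain, the Python below builds a Deployment and a Service and writes them as JSON, which kubectl accepts alongside YAML. Everything except the image path and port 8050 is an assumption: the resource name, the label, the replica count, the readiness probe on /ping and the default ClusterIP service type are illustrative choices, not the author's configuration.

```python
import json


def build_manifest(project, name="spacy-server", tag="1.0",
                   port=8050, replicas=1):
    """Return a Kubernetes List holding a Deployment and a Service."""
    labels = {"app": name}  # hypothetical label linking the two resources
    container = {
        "name": name,
        "image": f"gcr.io/{project}/{name}:{tag}",
        "ports": [{"containerPort": port}],
        # Assumed probe: reuse the app's /ping health endpoint.
        "readinessProbe": {"httpGet": {"path": "/ping", "port": port}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


def write_manifest(path, project, **kwargs):
    """Write the manifest to `path` as JSON."""
    with open(path, "w") as handle:
        json.dump(build_manifest(project, **kwargs), handle, indent=2)
```

Calling `write_manifest("deployment.json", "my-project")` would produce a file that can be used as $DEPLOYMENT_FILE, with `my-project` replaced by your own project.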
Now you can run the Kubernetes deployment command:
kubectl apply -f $DEPLOYMENT_FILE
Now you should be able to access the service from within your cluster.
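One way to confirm that the service is reachable is to poll the app's /ping endpoint until it answers "Ok". The sketch below is a minimal example using the standard library. The URL in the usage note is an assumption, since the actual service name and port depend on your deployment file, and `wait_until_healthy` is a hypothetical helper.

```python
import time
import urllib.request


def wait_until_healthy(url, attempts=10, delay=1.0,
                       opener=urllib.request.urlopen):
    """Poll `url` until it returns HTTP 200 with body "Ok".

    Returns True on success and False if every attempt fails.
    `opener` can be swapped out, which makes the function easy to test.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                body = response.read().decode("utf-8").strip()
                if response.status == 200 and body == "Ok":
                    return True
        except OSError:
            # Covers connection errors and urllib.error.URLError.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

From a pod inside the cluster, a call might look like `wait_until_healthy("http://spacy-server:8050/ping")`, assuming the service is named spacy-server and exposes port 8050.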
We have learned how to run spaCy as a service in a Kubernetes cluster.
Follow us on Twitter: https://twitter.com/k8scaleio