Making an API out of a Hugging Face model — Introduction
Summary
This is the introduction to a blog series that shows how to create an API out of one of the Transformer models from Hugging Face.
Plenty of AI-related blog posts explain how to create an AI system, but few talk about how to actually deploy it. That is a shame, because once you deliver your proof of concept, the first thing developers want is something they can build into their application directly. They usually ask for an API or microservice they can send requests to, and building those is something they normally don’t teach you in a data science course.
Hopefully, this blog series can give you some insight into what happens after you have a working model and need to turn it into something that can be integrated into a product. We are going to borrow a sentence encoder model from Hugging Face, build a model that recommends skill names from a CV, and deploy it using Cloud Run.
This is an introductory post, so we will not go into the technical details of the implementation. Instead, we will walk you through how the solution behaves and what tools are needed to put it together. We hope you enjoy it!
Who is writing this series?
This series is a joint effort by both Datamarinier, a data strategy company, and huapii, a developer and evangelist of a skills and performance management tool. By combining Datamarinier’s customized data solutions with huapii’s emphasis on unlocking human potential, this article delves into the intersection of technology and talent.
Prerequisites
In order to follow along with this series, you will need:
- Access to Google Cloud Platform (GCP), ideally your own project
- An active billing account associated with your project. Don’t worry, Cloud Run is pretty cheap: just testing it should cost less than $1 and definitely will not break the bank.
- Intermediate to advanced Python programming skills
- Basic bash scripting skills
- Basic knowledge of Docker
And where can this model be used?
Actually, this was a real business problem we came across during a joint research project between Datamarinier and huapii. Currently, in huapii’s talent management platform, users need to enter their skills, education, and work history manually to build their user profile. These are all pieces of information people already put in their CVs. So, wouldn’t it be easier if there were an AI that could read the CV for you and suggest what to put in?
This blog is about part of that solution, where the AI model suggests skills to put into the platform.
The tools we will be using
Here is a quick rundown of the tools we will use to make this service run. It’s not necessary to know everything about them, but it is useful to have a basic understanding before we start.
Hugging Face and their model
Hugging Face is a community and platform where open source code for machine learning models is distributed. It is especially famous for its Transformer models for natural language processing.
Out of their endless model options, we are going to use a model called all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). This is an English sentence embedding model, so it is perfect for transforming short-ish English phrases into embeddings: sequences of numbers that capture the essence of a sentence through Transformer model magic.
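To give you a feel for what the model does, here is a minimal sketch of loading it with the sentence-transformers library and encoding a couple of made-up phrases (the phrases are placeholders, not part of the actual solution).

from sentence_transformers import SentenceTransformer

# Downloads the model from Hugging Face the first time, then uses the local cache
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each phrase becomes a 384-dimensional vector
phrases = ["data pipeline design", "managed a sales team of five"]
embeddings = model.encode(phrases)
print(embeddings.shape)  # (2, 384)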
Cloud Run
Cloud Run is a Google Cloud Platform service that allows you to run your code as a serverless container on Google’s infrastructure. That means, if combined with a web application framework like Flask, you can instantly turn your code into an API.
Cloud Run will only charge you money while the container is running, so unless you use the API after creating it, the costs you will incur after this project will be minimal — most likely only the storage costs for the builds you submit to GCP.
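To make the “turn your code into an API” idea concrete, here is a minimal, hypothetical Flask app of the kind Cloud Run can serve; the route is just an illustration, not the service we will actually build.

import os
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A trivial endpoint so we can check that the container is alive
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # Cloud Run tells the container which port to listen on via the PORT variable
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))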
Cloud Build
Cloud Build is Google’s CI/CD service. It imports your source code and transforms it into a deployable form on Google’s infrastructure. We are going to use it to build our model into a Docker container and deploy it to Cloud Run.
Cloud Build stores containers in Google’s Artifact Registry, where you will be charged a small amount for storage. It will not be much, but if you are curious, you can read about it in the Artifact Registry documentation.
Docker
Docker is a tool that allows you to distribute your software in virtualized packages called containers. Think of these as tiny virtual machines with just enough resources and dependencies to run your application: they can run your software perfectly, but not much else.
We are going to deploy to Cloud Run using a Docker container, so it is useful to have some basic knowledge of Docker before we start. In addition to understanding the concept of containers, make sure you know what an official image and a Dockerfile are.
Swagger
This is a documentation tool for your API. APIs that implement it expose a visual, interactive page at the endpoint (if you don’t know what an endpoint is, think of it as the URL where the API receives input). This page shows documentation on what the API does, what should be sent, what will be returned, and in what format. It even lets people submit test requests to the API.
This may sound complicated, but luckily Flask, combined with a small Swagger extension, lets us generate this documentation straight from our own Python code. No scary integrations involved!
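As a taste of what that looks like, here is a minimal sketch using flasgger, one common Swagger extension for Flask (the actual library used in this series may differ); the endpoint and its docstring are purely illustrative.

from flask import Flask, jsonify
from flasgger import Swagger

app = Flask(__name__)
swagger = Swagger(app)  # serves an interactive documentation page at /apidocs

@app.route("/ping")
def ping():
    """A demo endpoint that just answers back.
    ---
    responses:
      200:
        description: A simple confirmation message
    """
    return jsonify({"message": "pong"})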
How the service works after deployment
Here is a diagram of how the service behaves after the solution is complete.
First, the user uploads their CV to the application. If this were a real product, that would happen in the app’s user interface; in this blog’s case, it is the API’s UI provided by Swagger.
The uploaded CV is submitted through an HTTP POST request to the model that recommends skill names, which runs as a Cloud Run service deployed through Cloud Build. The model cuts the CV into smaller sections (snippets) and compares them against a list of skills to see which skills match the snippets. The skill list needs to be predefined: in real life you should prepare one for your specific use case, but for this blog we will simply ask ChatGPT to provide one for us.
After the matching is complete, the model returns a list of matching skills and their match scores as a response. The developers simply need to parse the results in order to get values that can be shown in the front end.
And of course, documentation: the input, the output, and their specifications are clearly documented with Swagger.
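To make the matching step a bit more concrete, here is a rough sketch of the idea using cosine similarity between snippet and skill embeddings. The skill list and snippets below are made up, and the real implementation in the coming posts may differ in the details.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# A predefined skill list (in the real project you would curate your own)
skills = ["Data Management", "Data Analytics", "Project Management"]
# Snippets cut out of the CV text
snippets = ["Built and maintained data warehouses", "Led a team of five analysts"]

skill_emb = model.encode(skills, convert_to_tensor=True)
snippet_emb = model.encode(snippets, convert_to_tensor=True)

# Cosine similarity between every snippet and every skill
scores = util.cos_sim(snippet_emb, skill_emb)

# For each skill, keep its best score across all snippets
best_per_skill = scores.max(dim=0).values
for skill, score in zip(skills, best_per_skill):
    print(skill, round(float(score), 3))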
What the end result would look like
When the API is finished, it will have an endpoint that accepts CVs in PDF format and returns a list of recommended skills based on the CV’s contents. For example, you can use the profile I downloaded from LinkedIn.
You could use Postman or curl to send requests to the API, but for a more visual understanding and easier documentation, we are going to create a simple Swagger interface using Flask. This lets developers confirm what the API needs and test it before writing any code.
So in this Swagger interface, you can simply upload your CV in your browser and submit it to your Cloud Run application. Once you do, you will get an API response in JSON format, like this:
{
  "recommendations": [
    {
      "skillname": "Data Management",
      "score": 0.5790029764175415
    },
    {
      "skillname": "Data Analytics",
      "score": 0.5725842118263245
    },
    {
      "skillname": "Data Integration",
      "score": 0.5653525590896606
    }
  ]
}
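If you would rather call the API from code than from the browser, a request along these lines would do the same thing; the URL and form field name here are hypothetical, and the real ones will appear later in the series.

import requests

# Hypothetical Cloud Run URL and upload field name
url = "https://skill-recommender-xxxxx.a.run.app/recommend"
with open("my_cv.pdf", "rb") as f:
    response = requests.post(url, files={"file": f})

# Parse the JSON response shown above
for rec in response.json()["recommendations"]:
    print(f'{rec["skillname"]}: {rec["score"]:.2f}')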
What’s next?
In this first post, we presented an overview of our solution. In the following posts, we will delve into what the Python code for the solution looks like and the GCP settings needed to deploy our model on Cloud Run. Stay tuned!