Serverless Approximate Nearest Neighbors on AWS Lambda with Annoy and Chalice


Recently, I’ve been playing around with adding and subtracting word embeddings learned from GloVe. For example, we can take the embedding for the word “king,” subtract the embedding for “man,” and then add the embedding for “woman,” ending up with a resulting embedding (vector). Next, if we have a corpus of these word-embedding pairs, we can search through it to find the most similar embedding and retrieve the corresponding word. If we did this for the query above we would get:

King + (Woman - Man) = Queen

There are many ways to search through this corpus of word-embedding pairs for the nearest neighbors to a query. In this article I will use an approximate technique powered by the Python library Annoy. I am using word-embedding pairs, but the instructions below work with any type of embedding: embeddings of songs to create a recommendation engine for similar songs, or even photo embeddings to enable reverse image search.

Annoy builds an index, a data structure that is queried to find approximate nearest neighbors; I refer to this as an Annoy index.

This post will focus on how to take a prebuilt Annoy index and put it on the web so anyone can access our approximate nearest neighbor service. We will be pushing our service to AWS Lambda with Chalice.

Why Lambda? Because we would rather be improving the model behind our approximate nearest neighbors than spending time doing DevOps. Because with Lambda we pay for only what we use. Because it’s a relatively quick way to write a service.

Let’s get started!

My Environment

Before we start, I’d like to share the setup I used. Results may vary depending on your setup.

Local machine:

  • MacBook Pro with 16GB RAM
  • Python version: 2.7

EC2 build instance (used later to build the wheels):

  • Amazon Linux AMI 2017.09.1 (HVM), SSD Volume Type, 64-bit
  • t2.micro (the one that is free tier eligible)
  • Python version: 2.7

Creating and Deploying a Simple Chalice App

First, let’s start off by creating a Chalice project.

chalice new-project wa-ai

Let’s cd into that directory.

cd wa-ai

If you have never configured AWS before, run the following commands, replacing YOUR_ACCESS_KEY_HERE, YOUR_SECRET_ACCESS_KEY, and YOUR_REGION with their respective values.

mkdir ~/.aws

cat >> ~/.aws/config
[default]
aws_access_key_id=YOUR_ACCESS_KEY_HERE
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
region=YOUR_REGION (such as us-west-2, us-west-1, etc.)

Now you can deploy your chalice app!

chalice deploy
Regen deployment package.
Updating IAM policy for role: wa-ai-dev
Updating lambda function: wa-ai-dev
API Gateway rest API already found: o5k2wl5gy2
Deploying to API Gateway stage: api

The link that is returned is your endpoint. Go ahead and navigate to it with your favorite browser or curl; it should return the JSON {"hello": "world"}. Now to the fun part. Open up app.py in your favorite text editor. You should see this:

The included app.py

Chalice is very similar to Flask if you have experience with that web framework. Decorators mark endpoints and the function immediately below the decorator is what is run when that endpoint is hit.

We will use the approximate nearest neighbors library Annoy from Spotify to power our nearest neighbors search. Usually you can just pip install annoy and pip freeze > requirements.txt; however, Annoy can’t be built automatically by Chalice. At the time of writing, I got this error when I tried “chalice deploy” with Annoy frozen into requirements.txt:

Could not install dependencies:
You will have to build these yourself and vendor them in the chalice vendor folder.

So, in order to use Annoy we will have to build it ourselves and include it in the vendor/ directory (which we haven’t made yet; don’t worry). For more detailed instructions and more information, look at the official documentation: Chalice docs.

Let’s make the vendor directory now. We will populate it next.

mkdir vendor

Aside: Setting up an EC2 Instance

In order to build Annoy, we have to be running on a machine with the same OS that AWS Lambda functions run on. So let’s spin up an EC2 instance. Here are the specs of my instance:

Amazon Linux AMI 2017.09.1 (HVM), SSD Volume Type 64bit

t2.micro (the one that is free tier eligible)

After you launch your instance, make sure that you can ssh into it. Specifically, once your instance is launched, go to the “Instances” tab, click on the instance we just created, and scroll down to “security groups.” Click on launch-wizard, then the “Inbound” tab, then the “Edit” button, and add a rule with the following:


Protocol: [should be auto-filled to TCP]

Port Range: [should be auto-filled to 22]

Source: [from the dropdown choose “My IP” and your IP should be auto-filled. If not, fill in the space with the IP of the computer you will be ssh-ing into your instance from]

Click save and now ssh into your spanking new instance. If you don’t have a private key file or still need help ssh-ing into your instance, follow the official guide: AWS guide.

ssh -i [path-to-private-key.pem] ec2-user@[public-dns-of-instance]

Now that we are in our instance, let’s make sure everything is secure and up to date with a quick update

sudo yum update

Your instance comes with Python 2.7 and pip. But in order to build Annoy, we need to install gcc, gcc-c++, and wheel.

sudo pip install wheel
sudo yum install gcc
sudo yum install gcc-c++

Building a Python Package for Chalice

Now let’s download (not install) Annoy, and then let’s wheel Annoy (sounds weird, I know).

pip download annoy
pip wheel annoy-1.10.0.tar.gz

Now if you “ls” you should see the file annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl. We want to scp this .whl file onto our own computer into the vendor directory.

scp -i [path-to-private-key.pem] ec2-user@[public-dns-of-instance]:annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl [path-to-wa-ai]/vendor/

Let’s move back to the terminal on our own computer and cd into the vendor directory. If you “ls” now, you should see the new .whl file we just scp-ed over: annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl. To unzip it, run:

unzip annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl

If all went smoothly, you should see two new directories “annoy-1.10.0.dist-info” and “annoy”. Now let’s remove “annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl.”

rm annoy-1.10.0-cp27-cp27mu-linux_x86_64.whl

Now you can freely import and use Annoy in “app.py.” How cool is that!

Serverless Annoy

Let’s cd back into wa-ai and open up “app.py.” We can now import Annoy at the top of the file and we won’t get nasty error messages (woot!). I have a pre-built Annoy index which I will use, but you can learn more about how to build your own from the official repo.

Internally, Annoy maps integer indices to vectors, so we have to keep track of the mapping from indices to keys ourselves. To do this, I included the lmdb Python package to store a map of keys to indices and vice versa. I have a pre-built lmdb mini database that I will be using; to build your own, follow the official docs: lmdb official docs. Unfortunately, Chalice also fails to build lmdb automatically, so if you choose to use lmdb to store your map, follow the previous step (“Building a Python Package for Chalice”) but build lmdb instead of Annoy. I’ve also included NumPy for easier matrix calculations and boto to load S3 files. I personally had trouble letting Chalice build NumPy and boto from requirements.txt, so I manually built them as well.

Plug: If you want to see how I built my Annoy index and lmdb database, read about it here: Simple Approximate Nearest Neighbors

Finally, onward to the code! Let’s import everything we need in “app.py”:

Imports for app.py

Aside: AWS Lambda Size Limitations

AWS Lambda has some pretty strict restrictions on resources. At the time of writing, the relevant limits were roughly:

  • Deployment package size: 50MB (zipped)
  • Code/dependencies size: 250MB (uncompressed)
  • Ephemeral disk capacity (“/tmp” space): 512MB

The restriction on “/tmp” space means that files we want to load into “/tmp,” such as our Annoy index, must be less than 512MB. The 250MB limit on uncompressed code/dependencies means we can’t just import every library under the sun.

Serving Models from S3

It’s good practice to not include models and large static files with code (also if we did we would go over Lambda’s code/dependencies limit). So let’s upload our Annoy index and our lmdb mini database to S3 and serve them from S3. If you’ve never used S3 before take a look at the official guide here: Starting with S3.

Basically, we want to create a new bucket, upload the files we want our Lambda function to download, and then use boto to fetch those files. I named my bucket “wavecs” and the uploaded Annoy index “glove_min.annoy.” Uploading and serving lmdb databases is a little trickier because each database is a directory rather than a single file. We could zip the database directory, upload it to S3, download the zip with boto in our Lambda function, and then unzip it. But a simpler workaround is to upload the contents of the mini database, “data.mdb” and “lock.mdb,” to S3. Then, when setting up lmdb in our Lambda function, we will use the “os” Python package to make a directory and download “data.mdb” and “lock.mdb” into it.

Aside: Storing Data in /tmp

Both Annoy and lmdb mmap their respective data into a process’ address space, which is lightning fast. Therefore, we need to provide both Annoy and lmdb paths to their respective data. We can do this by downloading our data into the “/tmp” directory and giving Annoy and lmdb the paths.

Here is a passage from Tim Wagner of Amazon to shed some light on how Lambda handles the “/tmp” directory:

Let’s say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again, in which case the experience is just as described above. This will be the case for certain if you change your code.
However, if you haven’t changed the code and not too much time has gone by, Lambda may reuse the previous container. This offers some performance advantages to both parties: Lambda gets to skip the nodejs language initialization, and you get to skip initialization in your code. Files that you wrote to /tmp last time around will still be there if the sandbox gets reused.

This is great for us. We don’t have to download our large Annoy index every time our Lambda function is invoked!
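This container-reuse behavior suggests a simple guard: check whether a file already exists in /tmp before downloading it again. A stdlib-only sketch of the pattern (the path and downloader here are stand-ins, not the post's actual code):

```python
import os
import tempfile


def ensure_file(path, download):
    # Reused Lambda containers keep their /tmp contents,
    # so only download when the file isn't already there.
    if not os.path.exists(path):
        download(path)
    return path


# Demo with a throwaway path standing in for /tmp/glove_min.annoy.
demo_path = os.path.join(tempfile.mkdtemp(), 'glove_min.annoy')
calls = []

def fake_download(path):
    calls.append(path)           # record that a "download" happened
    open(path, 'w').close()      # create an empty placeholder file

ensure_file(demo_path, fake_download)
ensure_file(demo_path, fake_download)
assert len(calls) == 1  # the second call hit the cached copy
```

On a cold start the download runs once; on warm invocations the guard short-circuits and the function starts serving immediately.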

Serving Models from S3 — Continued

Now if we go back to “app.py,” we can use boto to download the files we just uploaded:

Downloading model with boto from S3
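A hedged sketch of that download step ("wavecs," "glove_min.annoy," and the lmdb file names are this post's examples; substitute your own, and note this assumes your Lambda role or local credentials can read the bucket):

```python
import os

import boto  # the boto 2.x S3 API


def download_models():
    conn = boto.connect_s3()
    bucket = conn.get_bucket('wavecs')
    # The Annoy index is a single file; skip it if a reused
    # container already has it in /tmp.
    if not os.path.exists('/tmp/glove_min.annoy'):
        key = bucket.get_key('glove_min.annoy')
        key.get_contents_to_filename('/tmp/glove_min.annoy')
    # lmdb databases are directories, so recreate one in /tmp
    # and download its two files into it.
    if not os.path.exists('/tmp/word-map'):
        os.makedirs('/tmp/word-map')
        for name in ('data.mdb', 'lock.mdb'):
            bucket.get_key(name).get_contents_to_filename('/tmp/word-map/' + name)
```

After this runs, `AnnoyIndex.load('/tmp/glove_min.annoy')` and `lmdb.open('/tmp/word-map')` can both mmap their data straight from /tmp.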

Adding Helper Functions

Great, now that we have both Annoy and lmdb instances, we can start writing code to preprocess queries and calculate nearest neighbors. Your code here will differ depending on what you want to calculate.

Approximate nearest neighbors helper functions

Adding Chalice Endpoints

Now that we have all the helpers we need to find nearest neighbors, let’s add a few endpoints to call our functions.

Chalice endpoints

Testing and Deploying our Finished Code

Now that we have all the code done, let’s make sure everything works. Open up terminal again and run:

chalice local

This should start up a local server on the default port 8000. Go ahead and navigate (or curl) to localhost:8000 and make sure that you see the JSON:

{"status": "server is running"}

Now test to see if the /add/ endpoint returns something of the form:

{"results": [results], "tokens": [tokens]}

If those both work, let’s run “chalice deploy” and push our code to AWS.

chalice deploy
Regen deployment package.
Updating IAM policy for role: wa-ai-dev
Updating lambda function: wa-ai-dev
API Gateway rest API already found: o5k2wl5gy2
Deploying to API Gateway stage: api

Congratulations! You now have an AWS Lambda function that calculates approximate nearest neighbors at the link https://[hash].execute-api.[your-region].amazonaws.com/api/


AWS Lambda is an amazing technology. Size limitations on code/dependencies and /tmp restrict it for heavy use cases, but for our simple approximate nearest neighbors code it works great.

If you’re having trouble, please let me know how I can help! If I got something wrong (very likely), please tell me. If you just want to get in touch, email me!

My email is:

Like what you read? Give Kevin Yang a round of applause.
