AWS Lambda’s Binaries Problems

Kareem Amin
Published in Clay
Jan 31, 2018

Writing code on AWS Lambda that relies on binaries, such as Tesseract (a popular OCR library) or PhantomJS (for scraping), is painful. To get your code running, you need to:

  1. Compile the binary on an Amazon Linux instance running the same kernel as the current AWS Lambda environment.
  2. Make sure the wrapper libraries use the correct paths to call the binaries. If a library needs to write to the file system, modify it to write to /tmp, the only writable folder.
  3. Package the binaries with the code in a zip file and deploy it.
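
Step 2 is where most of the trouble hides, so here is a minimal sketch of one way to handle it in Node.js. It assumes a hypothetical bundle layout with binaries in a `bin` folder and shared libraries in a `lib` folder under the function's root; the helper names `buildBinaryEnv` and `scratchPath` are made up for illustration and do not come from any library.

```javascript
const path = require('path');

// Assumed layout (hypothetical): <task root>/bin holds the binaries,
// <task root>/lib holds the shared libraries they link against.
// Returns a copy of baseEnv with both folders prepended to the search paths.
function buildBinaryEnv(taskRoot, baseEnv) {
  const prepend = (dir, existing) => (existing ? `${dir}:${existing}` : dir);
  return Object.assign({}, baseEnv, {
    PATH: prepend(path.posix.join(taskRoot, 'bin'), baseEnv.PATH),
    LD_LIBRARY_PATH: prepend(path.posix.join(taskRoot, 'lib'), baseEnv.LD_LIBRARY_PATH),
  });
}

// Anything the binary writes has to end up under /tmp on Lambda,
// so map an output file name onto that folder.
function scratchPath(fileName, tmpDir = '/tmp') {
  return path.posix.join(tmpDir, path.posix.basename(fileName));
}
```

Inside a handler, the result of `buildBinaryEnv(process.env.LAMBDA_TASK_ROOT, process.env)` could be passed as the `env` option of `child_process.execFile`, with `scratchPath('result.txt')` as the output location. Whether a given wrapper library lets you override these paths without patching it is something to check case by case.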

Here’s a detailed walkthrough for running Tesseract:

The number of steps seems daunting.

To get Tesseract up and running in Node.js, I used an npm package with pre-compiled binaries that get copied when the function spins up:

npm install tesseract-lambda

I also used the tesseract.js library for Node:

npm install tesseract.js

Finally, I needed to include the trained English data model for Tesseract. I included this file in the root directory of the function:

https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata

Normally, the wrapper library tries to download the trained data automatically to the root directory of the function, but since only /tmp is writable on Lambda, this fails.
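
Bundling the file was enough in my case. If a library insists on having its data in a writable location, a possible workaround is to copy the bundled file into /tmp when the function starts. The sketch below is a rough illustration of that idea; `stageDataFile` is a hypothetical helper, and how the resulting path is handed to the wrapper depends on the library and version you use.

```javascript
const fs = require('fs');
const path = require('path');

// Copies a bundled (read-only) data file into a writable folder and
// returns the writable path. An existing copy is reused, so a warm
// container does not copy the file again on every invocation.
function stageDataFile(bundledFile, writableDir) {
  const target = path.join(writableDir, path.basename(bundledFile));
  if (!fs.existsSync(target)) {
    fs.copyFileSync(bundledFile, target);
  }
  return target;
}
```

On Lambda this could be called as `stageDataFile(path.join(process.env.LAMBDA_TASK_ROOT, 'eng.traineddata'), '/tmp')`, assuming the model sits in the root directory of the function as described above.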

To avoid these problems in the future, I’m going to keep a GitHub repo with pre-compiled binaries that work on the latest Lambda kernel:

Hopefully this helps others get started quickly instead of spending time on setup. AWS could make this process easier in the future by letting you select from a list of popular binaries to package with your Lambda function.

If you’d rather get started instantly than follow all these instructions, you can use Clay (where I’m a co-founder) to run the Tesseract function that I created as is, or fork it to create your own copy that you can modify.

Have fun and let me know what you build!

You can always reach me @kareemamin on Twitter or leave a note here.
