My Little Lambda

Notes on getting started with Amazon Lambda for the NPB (Next Poor Bastard)

A long-winded and entirely skippable explanation of why I needed to do the voodoo that I did do…

Every so often I like to turn a job into a learning experience; this allows me to spend 3–5x the time on the project while allegedly learning something that “will save me time in the future.”

In this particular case, I have a very nerdy project management tool (BackerSupport) that I developed to help automate a lot of the logistics of my Kickstarter projects. One aspect of this is address formatting and validation; it is well-known that many people don’t know where they live — or more precisely, they don’t know where the Post Office thinks they live.

I first commented on this issue back in Y2K when I published my Top 10 Offensive Addresses, which I will re-list here for your alleged amusement:

  1. 123 Main. Is that N Main, S Main, E Main, Main St, Main Ave, Main Blvd, Main Rd, Main St S, Main St N, Main St W, Main St E, or W Main St E? I don’t know, and if your postman could figure it out, he’d come and shoot you!
  2. Military Addresses, or “If this is Tuesday, this must be Bosnia.” We get addresses for military units that were disbanded before the civil war. And several of our sailor customers apparently went down with the ship in WW2.
  3. Puerto Rico: Streets in Puerto Rico often have 5 alternate names, all incorrect. Marimba Street is often also Calle Marimba, for at least part of its length. And Daquiri Avenue is often Ave Daquiri. You want to know why Puerto Rico isn’t a state yet? It’s because the paperwork can’t get delivered.
  4. Hawaii: “Is that Lahalanahalalaha ave or Lalahanalahahala ave?” And people in Hawaii love to give you their address over the phone, the sadists.
  5. Route 66. Is that Route 66, State Route 66, Old State Road 66, New Highway 66, State Highway 66, County Route 66, County Highway 66, or I-66? And which direction?
  6. College Addresses. You may think your address is 123 Dweeble Hall, Ivy U. But that isn’t an address. An address needs a street. Ivy U turns out to be at 456 University Ave, and the internal address is effectively a mailstop.
  7. Big Companies. They all think the name of the company, or the name of the building, is their address. It isn’t. Apple is a notable exception; they know they reside at 1 Infinite Loop.
  8. People who pretend they aren’t using a suite or post office box. They have addresses like 1923-B Important Address Avenue. The real address is 1923 Important Address Avenue Suite B. Or sometimes Box B.
  9. Big people with big offices. They brag that their addresses go something like 101–110 Extended Street. If I were the USPS, I’d rip up letters addressed that way into 10 segments and deliver one to each address.
  10. And last, but not least, people who live at addresses that are screwed up in the USPS database. In particular, this means people who live on FIRST Street [the true, accurate legal address] as opposed to where the USPS wants you to live, which is 1st Street. I mean, we love you, really we do, but it sure would make our life easier if you could move.

So address parsing and validation is a useful thing to be able to do, and BackerSupport had a pretty decent way of dealing with it:

  1. A series of address parsing templates that handled about 95%+ of the addresses people gave me.
  2. Passing the resulting formatted address to the USPS Zipcode Lookup webpage (which not only looks up Zip+4, but also reformats the address properly), and using a little Grep-fu to extract what I need from the results.

But of course, I couldn’t leave well enough alone, and was looking for something that would help me deal with the remaining 5%. It was then that I stumbled across the usaddress Python library, which is a probabilistic parser for US addresses.

Great! Except for the fact that BackerSupport is written in FileMaker, and FileMaker doesn’t have native support for talking to Python, so it would be cats and dogs, living together, and we all know how that ends. So, since I happened to have an EC2 instance running DokuWiki just sitting around mostly doing nothing (I could get by with a pico-instance if they made one), I whipped up a quick PHP wrapper so I could use usaddress via a dumb-as-rocks REST API.

And all was good…

Until it occurred to me that if I ever have to seriously update that EC2 server, I’ll probably break the API because I forgot to retweak some config file cunningly hidden in a maze of twisty directories, all alike.

Which is when I decided that I’d spend a little time playing with Amazon Lambda.


The actual useful stuff in this post (finally!)

I shall assume that you’ve created your AWS account and gotten as far as creating and running the “Hello World” Lambda app, because it is at that point that the following notes become relevant. I’ll just list the problems I had and how I solved them:

Creating a Lambda app that uses libraries Lambda does not automatically provide:

The Lambda documentation makes it clear that you have to make a .zip archive containing your app and all its dependencies, but documentation on exactly how to do it is cunningly hidden in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the Leopard.”

You might think, naive youth, that it would be on the page marked “Creating a Deployment Package (Python)”, but in actual fact that page only gives the bare outlines, especially if you are using a module that is not pure Python (like usaddress, which relies on the compiled python-crfsuite extension). There is, however, a tiny link at the bottom of the page, oh so easy to miss, that gets you to a use-case tutorial that actually tells you most of what you need to know, which is:

At that point, you now have everything but your actual lambda.py app. All you do is add that file to the folder you just unzipped, and then zip the contents of that folder (not the folder!), eg:

cd /your/lambda
zip -r9 ../archive.zip *
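If you would rather script this step (or run it somewhere without a zip binary), here is a minimal Python 3.7+ sketch of the same idea using only the standard library. `build_lambda_zip` is a hypothetical helper name; the part that matters is that every path stored in the archive is relative to the folder, so your handler lands at the top level of the zip rather than inside a subdirectory. Unlike `zip -r9 ../archive.zip *`, it also picks up dotfiles.

```python
import os
import zipfile


def build_lambda_zip(src_dir, zip_path):
    """Zip the *contents* of src_dir (not src_dir itself) into zip_path.

    Hypothetical helper, roughly equivalent to:
        cd src_dir && zip -r9 ../archive.zip *
    """
    src_dir = os.path.abspath(src_dir)
    zip_path = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                if full_path == zip_path:
                    continue  # never add the archive to itself
                # Store paths relative to src_dir so the handler file sits at
                # the root of the archive, where Lambda looks for it.
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
    return zip_path
```

For example, `build_lambda_zip('/your/lambda', '/your/archive.zip')` produces an archive you can upload just like the one from the shell command.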

You can now upload the zip in the Lambda web interface. I suggest starting with a “Hello World” example that just imports your needed libraries, so you can ensure that you’ve got the environment you need.
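A minimal sketch of such a smoke-test handler, assuming the Python runtime and the usual `lambda_handler(event, context)` entry point. Rather than a bare `import usaddress` (which makes the whole function fail at load time if the package is broken), it tries each module in `REQUIRED_MODULES` (a hypothetical list; put your own dependencies there) and reports which ones loaded and from where, so one test invocation tells you what is missing.

```python
import importlib

# Hypothetical list: replace with whatever your deployment package should provide.
REQUIRED_MODULES = ["json", "usaddress"]


def lambda_handler(event, context):
    """Report which of REQUIRED_MODULES can be imported in this environment."""
    loaded, missing = {}, {}
    for name in REQUIRED_MODULES:
        try:
            module = importlib.import_module(name)
            # __file__ shows whether the module came from your zip or elsewhere.
            loaded[name] = getattr(module, "__file__", "built-in")
        except ImportError as exc:
            missing[name] = str(exc)
    return {"ok": not missing, "loaded": loaded, "missing": missing}
```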

I ended up developing my actual usaddr.py function in PyCharm (creating an equivalent Python 2.7.6 environment) and then copying the final result into place for zipping. This turns out to be very convenient, because you can just have the Python script call the handler code when it is run directly in order to test it, eg:

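if __name__ == '__main__':
    # Quick local smoke tests: run the file directly to exercise the handler.
    print(lambda_handler(None, None))                  # no event at all
    print(lambda_handler({'derf': 'derf'}, {}))        # event with no 'addr' key
    print(lambda_handler({'addr': '1 beold hahrafrar', 'parse': 'yes'}, {}))  # gibberish address
    print(lambda_handler({'addr': '6810 Finian Drive\r\rWilmington\r\n\rNC 28409\rUSA'}, {}))  # messy line breaks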
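The handler itself isn’t reproduced here, so for context, here is a hypothetical sketch of a `lambda_handler` that would survive the four calls above. It tolerates a missing event or a missing `addr`, collapses the assorted `\r`/`\n` runs into single separators, and only then hands the string to usaddress’s `tag` function. The response fields (`error`, `addr`, `parsed`, `type`) are made up for this example, and the usaddress import is optional so the normalization can be tried without the library installed.

```python
import re

try:
    import usaddress  # depends on the compiled python-crfsuite package
except ImportError:
    usaddress = None  # lets the normalization logic run without it


def normalize_address(raw):
    """Collapse any run of CR/LF characters into a single ', ' separator."""
    parts = [p.strip() for p in re.split(r"[\r\n]+", raw)]
    return ", ".join(p for p in parts if p)


def lambda_handler(event, context):
    # Hypothetical response shape; adapt it to whatever your caller expects.
    if not isinstance(event, dict) or not event.get("addr"):
        return {"error": "missing 'addr' parameter"}

    result = {"addr": normalize_address(event["addr"])}

    # Treat any non-empty 'parse' value as "yes".
    if event.get("parse"):
        if usaddress is None:
            result["error"] = "usaddress is not installed"
        else:
            try:
                tagged, addr_type = usaddress.tag(result["addr"])
                result["parsed"] = dict(tagged)
                result["type"] = addr_type
            except usaddress.RepeatedLabelError as exc:
                result["error"] = "could not parse address: %s" % exc
    return result
```

With this sketch, the fourth test call returns `{'addr': '6810 Finian Drive, Wilmington, NC 28409, USA'}`.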

Don’t forget to deploy your function after you’ve finished testing it!

Creating an API Gateway for a simple GET request with arguments:

I decided I wanted to have a simple URL I could call (eg: https://some.bizarre.amazon.domain/randomcrap/prod/usaddr?addr=somebody's%20address) and get some formatted JSON back.
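On the caller’s side, building such a URL is mostly a matter of percent-encoding the address properly. A minimal Python 3 sketch: `BASE_URL` is the placeholder from above, not a real endpoint, and `build_usaddr_url` plus the `"yes"` flag values are hypothetical conventions. Passing `quote_via=quote` to `urlencode` encodes spaces as `%20` (the default would use `+`) and escapes line breaks, so multi-line addresses survive the trip.

```python
from urllib.parse import quote, urlencode

# Placeholder from the text; substitute the invoke URL API Gateway gives you.
BASE_URL = "https://some.bizarre.amazon.domain/randomcrap/prod/usaddr"


def build_usaddr_url(addr, parse=False, validate=False, base_url=BASE_URL):
    """Build a GET URL carrying the addr/parse/validate query parameters."""
    params = {"addr": addr}
    # Hypothetical convention: only send the flags you want, as "yes".
    if parse:
        params["parse"] = "yes"
    if validate:
        params["validate"] = "yes"
    return base_url + "?" + urlencode(params, quote_via=quote)
```

For example, `build_usaddr_url("somebody's address")` yields `https://some.bizarre.amazon.domain/randomcrap/prod/usaddr?addr=somebody%27s%20address`; fetching it is then `urllib.request.urlopen(url)` followed by `json.loads` on the body.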

It took a fair amount of futzing around to get this working. The problem is that the API Gateway is very powerful, but I couldn’t find good examples of how to do dirt-simple things. Here’s what I figured out, with some links to useful documentation.

  • Go to the API Gateway web interface.
  • Under APIs, click LambdaMicroservice.
  • Under Resources, click on the / line.
  • Select Actions > Create Resource. Give it a name and path (in my case, usaddr for both) and click Create Resource.
  • Your new resource will be highlighted. Select Actions > Create Method. A popup will appear under the resource; select GET and click the checkbox.
  • Select integration type Lambda Function and the Lambda Region your function is deployed in. A new field will appear called Lambda Function; click on it and a popup will show you your functions; select the one you want to link to. Click Save, then OK in the warning popup.
  • You now need to tell the API what parameters it might expect. Click on the Method Request box, then URL Query String Parameters. For each parameter, click on Add Query String, enter the parameter name, and click on the checkbox (I had 3 — addr, parse and validate). Click on <- Method Execution to save your changes and return to the method execution display.
  • Next, tell the API Gateway how to translate those parameters into the form your Lambda app expects. Click on Integration Request, then Body Mapping Templates. Set Request body passthrough to None, since there should never be a body on a GET request. Click on Add mapping template. Under Content Type, type in application/json even though that appears (greyed) in the text box; this is the default if a Content Type is not specified, which it won’t be. Click on the checkbox. In the text area that appears, enter a template that describes how you’d like to map your parameters to the event structure your Lambda function will receive. This is actually a Velocity Template Language (VTL) template that produces JSON, and the documentation for it can be found here. Here’s mine (I had to get a little fancy to allow addresses with linebreaks):

{
  "addr": "$util.escapeJavaScript($input.params('addr')).replaceAll("\\'","'")",
  "parse": "$input.params('parse')",
  "validate": "$input.params('validate')"
}
  • Click Save. Click on <- Method Execution to save your changes and return to the method execution display.
  • Click Test. Enter some Query Strings, click Test and if the stars align, you’ll get what you expect — though most likely, you’ll get what you deserve.
  • Click Actions > Deploy API. Select a stage (like “prod”), click Deploy and your API is alive. Try it out using the URL they provide (you’ll have to tack on the path and ?arguments, of course).
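Why the replaceAll in the mapping template? `$util.escapeJavaScript` escapes single quotes as `\'`, which is not a legal escape in JSON, so an address like `12 O'Brien Rd` would produce a request body that isn’t valid JSON. A small local illustration in Python 3: `fake_escape_javascript` is a hypothetical stand-in that mimics only the backslash, quote and line-break escaping (not everything the real `$util.escapeJavaScript` does), and `render_template` builds just the `addr` part of the event.

```python
import json


def fake_escape_javascript(s):
    """Illustrative stand-in for $util.escapeJavaScript: escapes backslashes,
    double quotes, single quotes and line breaks, JavaScript-style."""
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("'", "\\'")
             .replace("\r", "\\r")
             .replace("\n", "\\n"))


def render_template(addr, fix_quotes=True):
    """Mimic the 'addr' line of the mapping template."""
    escaped = fake_escape_javascript(addr)
    if fix_quotes:
        # The replaceAll step: turn \' back into a plain ' so the JSON is valid.
        escaped = escaped.replace("\\'", "'")
    return '{"addr": "%s"}' % escaped
```

Here `json.loads(render_template("12 O'Brien Rd\nApt 3", fix_quotes=False))` raises a `ValueError` (invalid `\escape`), while the default version parses back to the original string, line break included.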

This only scratches the surface of what Lambda and the API gateway can do, but it should get you over a few of the initial hurdles. If you want to play with the tool I ended up creating, you can find it here.