Ready for Launch: API Deployment With FastAPI and AWS

Aaron Watkins Jr
Published in The Startup · 8 min read · Sep 25, 2020

Photo by Bill Jelen on Unsplash

One of the great things about learning data science at Lambda School is that after all of the sprint challenges, assessments, and code challenges, you still have to prove your acumen by working on a real-world, cross-functional project. They call this portion of the program Lambda Labs, and in my Labs experience, I got to work on a project called Citrics. The idea for this project was to solve a problem faced by nomads (people who move frequently), which was the cumbersome nature of trying to compare various statistics for cities throughout the US.

Imagine if you were going to live in three different cities over the next three years: how would you choose where to go? You might want to know what rental prices looked like, or which job industry was the most prevalent, or maybe even how “walkable” a city was. The truth is, there are probably lots of things we’d like to know before moving, but we probably don’t have hours and hours to research 10 different websites for these answers. That’s where Citrics comes in.

As a data scientist, the big-picture task for my team was to source and wrangle data for these cities and deploy an API that our front-end team could utilize to satisfy end-user search requests. While this may sound simple enough, my first concern going into this project was the wrangling piece because various sources of data may have various naming conventions for cities. Consider examples like `Fort Lauderdale` vs `Ft. Lauderdale`, or `Saint Paul` vs `St. Paul`. We knew intensive data cleaning would be necessary to ensure data integrity and continuity between each of our sources. The other initial concern was in regard to the deployment of the API because our stakeholder expected AWS deployment, but each data scientist on our team of 4 only had experience in Heroku. In this post, I’ll be walking through our process of tackling this problem, as well as nuggets on all the exciting tools AWS (and FastAPI) gives to developers when it comes to creating and deploying ETL pipelines.

From Flask to FastAPI: Deploy Basic Scaffolding

One of the first things I wanted to do (after finding data sources, of course) was to get a handle on using FastAPI to host our DS endpoints, as opposed to using Flask, with which I had more familiarity. I was pleased to learn there are many similarities, and creating routes followed a similar process of defining the endpoint’s function, and referencing that route from a `main.py` file.

```python
# main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn

from app.api import viz

# Description text
DESC_TEXT = "Finding a place to live is hard! Nomads struggle with finding the right city for them. Citrics is a city comparison tool that allows users to compare cities and find cities based on user preferences."

app = FastAPI(
    title='Citrics API',
    description=DESC_TEXT,
    version='0.1',
    docs_url='/',
)

app.include_router(viz.router)

app.add_middleware(
    CORSMiddleware,
    allow_origins=['*'],
    allow_credentials=True,
    allow_methods=['*'],
    allow_headers=['*'],
)

if __name__ == '__main__':
    uvicorn.run(app)
```

The next piece for getting a basic API up and running was to enter the Docker environment and familiarize myself with the structure. For our team’s repository, we had two `requirements.txt` files: one in the root directory (for use with our local `pipenv` or `conda` environments) and one in our `project` directory (which housed our Dockerfile, app, endpoints, etc). This was exciting because it allowed us to be experimental in our development environment, and iteratively add only the most necessary dependencies to our production environment.

Local `requirements.txt` (left) vs Dockerfile’s `requirements.txt` (right)

With a basic endpoint setup and a ready-to-go Dockerfile, running locally was as easy as a few commands: `docker-compose build`, and `docker-compose up`. Once that was running, I could immediately see some of the benefits in using FastAPI vs Flask, with the main one in my opinion being the documentation. In previous Flask projects, I used the `README.md` file in the project repository to hold documentation, links, etc. But with FastAPI, it actually grabs your documentation right from the docstrings!

FastAPI page on the left, VS Code screen on the right

Intro to AWS: Create a Database Instance, Elastic Beanstalk Environment

Once the basic API was functional locally, the next steps were to dive into the AWS docs and figure out how to deploy the app to an Elastic Beanstalk environment. Fortunately, AWS provides plenty of info on getting started, and Lambda was able to provide an account code that enabled me to register an IAM username, password, and region. Once I could access the IAM console, I was able to get my credential keys and proceed to pass these credentials to my local environment. This did require a couple more dependencies, such as installing the `awscli` and the `awsebcli`. The AWS docs give an example of what to expect when completing this step.

AWS Configure screen example (terminal)

The commands I found myself running often throughout this project to deploy and redeploy were:

docker build -f project/Dockerfile -t YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME ./project
docker login
docker push YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME

To start off, I also created the EB instance by running `eb create MY-APP-NAME`, and voilà, we were off and running! The next piece here took me back to the AWS site, particularly the EB Dashboard, where I wanted to clean up the URL address for our API site. Rather than the provided URL, which was lengthy and cumbersome, I was able to freely register a concise URL by using their “domain alias” workflow. To truly change the domain address, AWS does charge based on what domain you choose (.com vs .org, etc). However, AWS allows you to set an alias from a hosted zone for free, so in effect, the “alias” address routes traffic to the address AWS originally provides. Setting this up made things easier for our front-end development team, as the DS API alias address was much easier to remember than the real address.

Basic DB connection test

Surprisingly, initializing a database through AWS RDS was relatively simple and straightforward. While still logged in to my IAM account, I was able to navigate to the AWS RDS dashboard, pick my “flavor” of database (I went with PostgreSQL on this one), and simply create the instance with the default settings. The trickiest part was the connection: despite giving my database instance a custom name, I learned that by default AWS names the PostgreSQL database itself `postgres`. Once that idiosyncrasy was discovered, I was able to easily establish a connection by using the `python-dotenv` and `psycopg2-binary` dependencies.
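A sketch of how that connection setup might look with `python-dotenv` and `psycopg2-binary` — the environment-variable names are my assumption, but note the `postgres` default for the database name:

```python
# db.py -- hypothetical sketch; env var names are assumptions
import os


def conn_params():
    """Assemble connection settings from the environment.

    AWS RDS names the PostgreSQL database 'postgres' by default,
    regardless of the identifier you gave the instance.
    """
    return {
        'dbname': os.getenv('DB_NAME', 'postgres'),
        'user': os.getenv('DB_USER'),
        'password': os.getenv('DB_PASSWORD'),
        'host': os.getenv('DB_HOST'),
        'port': os.getenv('DB_PORT', '5432'),
    }


def get_connection():
    """Open a psycopg2 connection using the params above."""
    # Imported here so the rest of the sketch runs without these installed
    from dotenv import load_dotenv  # python-dotenv
    import psycopg2  # psycopg2-binary

    load_dotenv()  # read a local .env file into the environment
    return psycopg2.connect(**conn_params())
```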

Building Out The API

Having successfully established a connection to the database and deployed the basic API to AWS, the next steps were building out the API itself, including the routes (endpoints), functions, and documentation for usage. There was a visualization element to our project, where the stakeholder wanted the end-user to be able to visually compare a city’s stats, such as rental price estimates. In order for this to work dynamically, I created a separate `.py` file to house a function that connected to the database, ran a query, returned the result as JSON pairs, and then closed the connection. With this function in place, I was able to import it into the file which housed the visualization function. When the endpoint was reached and a visualization was needed, the “viz” function could query the database and create an image on the fly, returning it as a JSON string for the web team to use. I also added a route to view the viz images on our API directly, to add a bit of a “look before you leap” element.
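A simplified sketch of that query function — open a connection, run a query, return the result as JSON pairs, close the connection. The table and column names are illustrative, and the connection is injected as a callable so the same function works against PostgreSQL in production or SQLite in a quick test:

```python
import json


def fetch_city_stats(city, connect):
    """Run a query for one city and return the row as JSON key/value pairs.

    `connect` is any callable that returns a DB-API connection.
    The '?' placeholder is SQLite style; psycopg2 uses '%s' instead.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(
            'SELECT city, rent_estimate FROM city_stats WHERE city = ?',
            (city,),
        )
        row = cur.fetchone()
        if row is None:
            return json.dumps({})
        cols = [d[0] for d in cur.description]
        return json.dumps(dict(zip(cols, row)))
    finally:
        conn.close()
```

The viz function can then call this, build its chart from the returned pairs, and hand the result back as a JSON string.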

Visualization example from DS API

Our team was also able to utilize some handy 3rd-party APIs (WalkScore and OpenWeatherMap in particular), which allowed us to retrieve how “walkable” a city was as well as its current weather information. The functionality used for these routes also allowed for dynamic operation because they did not need to hit nor store data in the database at all.
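Those routes boil down to building a request URL and forwarding the JSON response. Here's a sketch of the OpenWeatherMap side, using their public current-weather endpoint (the WalkScore call is analogous; the env var name is my assumption):

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen


def weather_url(city, state, api_key):
    """Build the OpenWeatherMap current-weather request URL."""
    query = urlencode({
        'q': f'{city},{state},US',
        'units': 'imperial',
        'appid': api_key,
    })
    return f'https://api.openweathermap.org/data/2.5/weather?{query}'


def current_weather(city, state):
    """Fetch current weather on the fly -- no database read or write."""
    url = weather_url(city, state, os.environ['OPENWEATHER_API_KEY'])
    with urlopen(url) as resp:
        return json.load(resp)
```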

Screen share from front-end site search on Baltimore, MD

One of the last pieces on this front was to handle “bad input” and various naming conventions. Identifying the edge cases (ft vs fort, st vs saint, etc) allowed me to write out fairly simple if-statements, and Python’s `.title()`, `.lower()`, and `.upper()` methods helped sanitize input strings, but the main challenge was communicating to the user (in our case, the web dev team) why certain requests would not work. Fortunately, FastAPI has a useful `HTTPException` class that enables developers to set up `404` status codes for a specified input type. This is nice because instead of typing in ‘eggs’ for the city name and getting a status code of `500`, the API returns a status code of `404` (which lets the user know the error was on their part) along with a message indicating what was entered that caused the problem.

Screen share from DS API

Reflection and Next Steps

Working on this project has been both exciting and rewarding, as I was surprised at how easy AWS makes it to deploy and manage web apps, as well as how many built-in options exist in FastAPI that make a ton of sense for building data-science APIs. Thus far in the project, we’ve been able to implement rental price estimates, walk scores, current weather, and state unemployment data on the DS API as well as the front-end web site. The next steps will be adding in the BLS job data, predictive endpoints for weather and rental price estimates (using time-series modeling), and re-factors of our functions and API call syntax.

If you’re interested in following this project’s progress, feel free to visit our repo!


I am a Data Scientist and Software Engineer, particularly interested in predictive modeling, sports and cinema.