From side project to 250 million daily requests

Gordon Wintrob
Published in GET PUT POST
Mar 29, 2016 · 8 min read

This is the 4th edition of GET PUT POST, a newsletter all about APIs. Each edition features an interview with a startup about their API and ideas for developers to build on their platform. Want the latest interviews in your inbox? Sign up here.

For this edition, I spoke with Ben Dowling from ipinfo.io. The service grew from a tiny side project into an enterprise solution for fetching location data for any IP address. Enjoy the interview!

What is ipinfo.io and what can developers do with your API?

ipinfo.io is an IP details API. It has geolocation details, so you know the city, region, country, and often the postal code or area code for an IP. If you’re customizing content on your website, you can show different features to different people based on their country or city.

Another detail is the organization. If you look up your IP from home, it might say it’s a Comcast or AT&T IP address. It also returns the IP’s hostname. We also have a totally free plan: you can curl ipinfo.io without any IP address and it will give you your own IP details, or append an IP, like /8.8.8.8.
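
For example (response abridged; the exact values change over time):

    $ curl ipinfo.io/8.8.8.8
    {
      "ip": "8.8.8.8",
      "hostname": "google-public-dns-a.google.com",
      "city": "Mountain View",
      "region": "California",
      "country": "US",
      "loc": "37.3860,-122.0838",
      "org": "AS15169 Google Inc."
    }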

We have some optional add-ons as well, like a carrier field or a hosting-provider flag (e.g. to detect an Amazon AWS or Rackspace IP address). We also have some rudimentary proxy detection that will let you know if an IP address is a known proxy.

Who are a few sample customers?

Tesla uses it on their website for the dealership finder. Through the API, they can automatically detect that I live in Mountain View and show that the closest dealership is in Palo Alto based on my IP address.

We’ve got lots of different ad networks that use us to customize their offers and content based on location. In particular, mobile ad networks will show different offers based on the country you’re in.

There are quite a few brand names. TripAdvisor and Xerox use it to customize parts of their sites. The Brooklyn Library uses it too, and I’m not sure what for :)

You have 250 million daily requests. Walk me through how you got initial users and what the best growth channels have been.

Initially, I put out a dead-simple webpage with a Bootstrap theme. All of the data came from an existing geo IP database, and it just showed your IP location on a map. You can still see this on the homepage.

Pretty soon, I saw a question on Stack Overflow asking if there were good APIs to see where an IP is located. I figured I already had all the data, so it was easy to build a simple API. I honestly built it in a couple of hours and then answered the question.

Within a couple of months, I got an email notification from Linode saying my CPU usage was off the charts. That was strange: I hosted a bunch of sites on the same server, so I didn’t know what was going on. I logged in, checked the access logs, and there were millions of API requests per day. It had really started taking off on its own thanks to Stack Overflow.

It’s just been inbound? No outbound sales?

Yeah, absolutely. There was nothing to the API beyond a GET request for basic IP info, so I looked into improving it. There were a few people doing 10 million requests a day and a bunch of people doing around a million.

I decided to try some paid plans using access tokens, keeping it free for up to 1,000 requests a day. I figured most small side projects would need less than that and could keep using it for free. After adding access tokens and rate limiting, I added four plans: $10, $50, $100, and $200 per month.
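
On the paid plans, requests carry an access token; a minimal sketch (the token value is a placeholder):

    # Free tier: no token, rate limited to 1,000 requests per day
    curl ipinfo.io/8.8.8.8

    # Paid plans: pass your access token as a query parameter
    curl "ipinfo.io/8.8.8.8?token=$YOUR_TOKEN"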

One of my first paying customers was Tesla, within a week or so of rolling out the paid plans. It’s continued to grow and I’m seeing more and more enterprise customers interested in directly downloading our data, instead of accessing it through the API.

Ben’s Stack Overflow profile

We’ve used no paid advertising or other outreach. It’s all been totally inbound, other than writing a bunch of Stack Overflow answers related to my API: how to find the country of a visitor on a webpage, how to get someone’s IP with JavaScript, and so on. It got to the point of critical mass, where people who had read the different posts would link to the site in their own answers.

Could you tell me more about your stack?

There have been a few variations.

Initially, I had the Linode server. Soon after getting the CPU usage warning, I added a couple of DigitalOcean servers and used Amazon Route 53 for DNS to route to one of the servers using round-robin. It worked reasonably well for a while, but adding a new server required DNS updates, which take a while to propagate. If a server has any problems, it’ll continue to get traffic because of that delay.

Soon after, I moved everything to AWS, with the servers behind Elastic Load Balancers so I could quickly switch servers in and out without any downtime. AWS Auto Scaling groups also helped automate this to some extent.

I set up servers in three regions (US east coast, US west coast, and Frankfurt), and then used Route 53’s latency-based routing to send each request to the lowest-latency server, which keeps the API’s latency super low wherever you are in the world. I’ll also be adding servers in Singapore soon to cut latency even further in Asia.
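
As a rough sketch of what latency-based routing looks like (zone IDs, record names, and the ELB endpoint here are placeholders), each region gets a record with the same name, a Region hint, and a unique SetIdentifier; Route 53 then answers each DNS query with the lowest-latency record for that client:

    # One of three latency records for the same name (repeat per region)
    aws route53 change-resource-record-sets \
      --hosted-zone-id Z_EXAMPLE \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "ipinfo.io.",
            "Type": "A",
            "SetIdentifier": "us-east",
            "Region": "us-east-1",
            "AliasTarget": {
              "HostedZoneId": "Z_ELB_EXAMPLE",
              "DNSName": "api-us-east.example.elb.amazonaws.com",
              "EvaluateTargetHealth": true
            }
          }
        }]
      }'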

This setup worked well, but deploys were a huge pain. I’d need to spin up fresh new servers, deploy the latest code there, add the new servers to the load balancer, and then decommission the old ones. It was all scripted, but it still involved running a bunch of different scripts and checking that everything worked as expected.

AWS does have CodeDeploy to solve this problem, but it’s not yet available outside of some core regions, which meant I couldn’t use it.

That’s why I switched to Elastic Beanstalk, which is basically a managed version of the AWS setup I already had. It creates almost exactly the same server arrangement as before, but deploying is now a case of running a single Elastic Beanstalk command, and it handles everything for me.
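
With the Elastic Beanstalk CLI, a deploy collapses to something like this (the environment name is hypothetical):

    eb init                 # one-time: link the project to an EB application
    eb deploy api-us-east   # package the current code and roll it out
    # repeat per region, or loop over the environments in a script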

One thing that has been consistent throughout is that each server can independently answer every API request; there’s no shared database or anything. Everything needed for a request is kept in memory, so when a new server spins up, it can serve requests straight away. It’s super quick: well over 90% of our 250 million daily requests are handled in less than 10 milliseconds.

It sounds like you’ve had really good uptime. What sort of monitoring do you have to track that?

Primarily the AWS logs, and also Pingdom just to be safe. AWS has great metrics. There are load balancer reports that show your API latency, and a summary of your requests broken down by status class (2xx, 3xx, etc.). Assuming that Amazon is doing a decent job of keeping the load balancer up, you can see how many requests from the load balancer to the backend failed.
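
Those load balancer numbers come from CloudWatch; for example, a sketch of pulling a day of backend 5xx counts (the load balancer name is a placeholder):

    aws cloudwatch get-metric-statistics \
      --namespace AWS/ELB \
      --metric-name HTTPCode_Backend_5XX \
      --dimensions Name=LoadBalancerName,Value=my-api-elb \
      --start-time 2016-03-28T00:00:00Z --end-time 2016-03-29T00:00:00Z \
      --period 3600 --statistics Sum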

Also, I import our load balancer logs into Redshift each day and generate a bunch of reports on that. I mostly try to drill into requests that failed. The main thing I’m worried about is shipping buggy data.
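
ELB writes its access logs to S3 as space-delimited text, so a daily import can be little more than a Redshift COPY; a sketch, with the bucket, account, and IAM role as placeholders:

    psql "$REDSHIFT_URL" <<'SQL'
    COPY elb_logs
    FROM 's3://my-elb-logs/AWSLogs/123456789012/elasticloadbalancing/'
    CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/redshift-copy'
    DELIMITER ' ' REMOVEQUOTES MAXERROR 10;
    SQL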

Do you have continuous integration tests that run before you deploy?

The site gets redeployed every day with fresh data, and we have a bunch of scripts that pull in the raw data, process it, check that everything updated properly, and then deploy to the three server regions we’re currently in.
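
A sketch of what that daily pipeline might look like (all script and environment names here are hypothetical):

    #!/bin/bash
    set -e                              # abort the whole deploy on any failure
    ./fetch_raw_data.sh                 # pull the latest source databases
    ./build_database.sh                 # process them into the in-memory format
    test -s build/org.db || exit 1      # refuse to ship an empty database
    ./run_integration_tests.sh          # query known IPs, check the fields
    for env in api-us-east api-us-west api-eu; do
      eb deploy "$env"                  # roll out to each region in turn
    done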

Do you have any war stories of when the service went down?

Haha, all of the issues so far have been because of me. As I mentioned, we have checks to make sure the database that we generated isn’t corrupt when we deploy. Those checks have evolved over time to catch mistakes that happened before.

For example, one time a required input file was missing and the script generated an empty organization database. We deployed it, and then got a bunch of emails…

Over time, the integration tests have become much more comprehensive!

What are some use cases for others to build on top of the API?

Content customization. An obvious example: any e-commerce site like Amazon has different stores for different countries. If you know a German visitor is looking at books, you can redirect them to the .de site and show German-language options.
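
With the single-field endpoints, that check is one request; a sketch (the redirect target and helper are made up):

    # ipinfo.io/<ip>/country returns just the two-letter country code
    COUNTRY=$(curl -s "ipinfo.io/$VISITOR_IP/country")
    if [ "$COUNTRY" = "DE" ]; then
      redirect_to "https://www.example.de/books"   # hypothetical helper
    fi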

Network customization. There are some very useful ad ideas (like being able to target T-Mobile and AT&T users differently), and I’m interested to see what else people can do with the data. For example, if a user is on a slow mobile network rather than wifi, maybe you serve low-resolution images, or you don’t show ads because they don’t convert as well.
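
The org field is enough for a crude version of this; a sketch (the helpers are hypothetical):

    # The org field includes the AS name, which often reveals the carrier
    ORG=$(curl -s "ipinfo.io/$VISITOR_IP/org")
    case "$ORG" in
      *"T-Mobile"*) show_offer tmobile ;;
      *"AT&T"*)     show_offer att ;;
      *)            show_offer default ;;
    esac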

Location mashups. If you have location-based data, you can mix it with the ipinfo API. For example, I see a lot of people integrating with weather databases.
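
For example, a sketch that chains the loc field into a third-party weather API (OpenWeatherMap here is just an illustration, and it needs its own API key):

    # ipinfo.io/loc returns "lat,lon" for the caller's own IP
    LOC=$(curl -s ipinfo.io/loc)
    LAT=${LOC%,*}; LON=${LOC#*,}
    curl -s "https://api.openweathermap.org/data/2.5/weather?lat=$LAT&lon=$LON&appid=$OWM_API_KEY"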

Want API interviews in your inbox? Subscribe here
