APIs to understand your customers and grow your business
Welcome to GET PUT POST, a newsletter all about APIs. Each edition features an interview with a startup about their API and ideas for developers to build on their platform. Want the latest interviews in your inbox? Subscribe now.
This edition, I spoke with Alex MacCaw, Co-Founder at Clearbit. Clearbit offers a suite of business intelligence APIs for tasks like sales lead research, outbound prospecting, and fraud detection.
You can easily POST an email address to the Clearbit API and get back a bunch of social and business data. I ran all 950+ GET PUT POST subscribers through Clearbit and it identified 135 engineers and 97 founders. Here’s where they live broken down by city:
Full disclosure — I’m an early investor in the company and worked on a similar startup that was acquired by LinkedIn.
What is Clearbit and how can developers use your API?
At Clearbit, we’re building a data backbone for business intelligence. Developers can use our products to discover emails and find person, company, and domain information. We have hundreds of customers and they use us mainly for lead qualification. That is, going from an email to information about the person or from a domain name to information about the company. With all our information, you can figure out if a lead is actually valuable.
Can you tell me about some customers and how they integrate Clearbit?
Sure, here are some examples:
- Stripe uses us to research all their inbound signups. [I interviewed Stripe about their API platform in the last edition]
- Braintree uses Clearbit to help underwrite transactions. They’ll use our data as part of their risk modeling when new merchants are on-boarded.
- Asana uses us in a few ways. They query our API for new customers and take all the data and put it in Redshift. Clearbit’s integral to both the sales and marketing side of their business.
Do most people find you because they’re engineers who want an API or sales and marketing people looking for a solution?
I think we have quite a lot of penetration into the developer market, and I think the average developer, at least in SF, will probably know who we are. But, they’re not usually the people who have this lead qualification problem. When it comes to the sales cycle, our target persona is head of sales or head of marketing at startups with a lot of inbound leads.
Down the line, as we launch more and more APIs, I think our target will be increasingly developers. Although we’re an API-first company, developers haven’t been as much of a focus. I think that’ll happen more in the future.
I know you use Clearbit for your own sales. Walk me through how it helps there.
We mainly do reactive sales: sales kicks in when we get an inbound lead.
As soon as someone signs up, we put their information through Clearbit, and we’ll give them different drip campaigns and marketing emails dependent on who they are. Because we target different typical buyers, we have to really customize the experiences for the different verticals. If you’re a salesperson or a marketer rather than a programmer, you’ll get different emails when you sign up. A programmer who signs up for Clearbit will get code in their first email, but sales and marketing won’t.
That’s another example of dogfooding our API. We had all these rules in Customer.io to try and determine who’s a developer or who’s in sales and marketing based on their social data. [Check out this great article from Customer.io] We ended up just extracting that and exposing it through our API.
For high-value customers, we’ll reach out directly and not send a drip campaign or use automated follow-ups. Our sales team will go through all the high-value leads and write a customized email to these people. They’ll look at the company, try to figure out a good use case, and put that in the email. This really helps start a dialogue.
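The routing described above can be sketched in a few lines. This is only an illustration: the `Lead` fields, the role labels, the campaign names, and the employee-count rule for "high value" are all assumptions, since the interview does not say how Clearbit actually defines them.

```python
from dataclasses import dataclass

# Hypothetical role labels; the real enrichment fields are not given in the interview.
DEVELOPER_ROLES = {"engineering"}
BUSINESS_ROLES = {"sales", "marketing"}


@dataclass
class Lead:
    email: str
    role: str       # assumed enrichment field, e.g. "engineering"
    employees: int  # assumed enrichment field: company headcount


def is_high_value(lead: Lead, min_employees: int = 200) -> bool:
    """Placeholder rule for 'high value'; the threshold is invented."""
    return lead.employees >= min_employees


def choose_follow_up(lead: Lead) -> str:
    """Pick a follow-up track for a new signup."""
    if is_high_value(lead):
        # High-value leads skip automation and go to the sales team.
        return "manual_outreach"
    if lead.role in DEVELOPER_ROLES:
        # Developers get code in their first email.
        return "developer_drip"
    if lead.role in BUSINESS_ROLES:
        return "business_drip"
    return "general_drip"
```

Checking for high-value leads first keeps a large account from being dropped into an automated drip just because the person who signed up happens to be a developer.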
For us, the key thing is to get people integrated. When it comes to churn metrics, we get significantly lower churn once customers are more deeply integrated (whether that’s a CRM plugin or code integration). And often, they will need a bit of hand-holding along the way. We have very specific verticals, like our Salesforce integration with a separate sign up channel through the Salesforce AppExchange store. That also triggers the higher-touch sales process.
How big is Clearbit?
API requests per day is a good metric for us. We’re doing about six million API calls a day. And that doesn’t include requests to the Logo API that are cached at the CDN.
We’re profitable. I think that’s an interesting metric for this city. I actually think this is the first time we’ve ever talked about it, but I think that’s okay. We’ve gotten an incredible amount of work done with a small team. If you add headcount too quickly, you can create a communication breakdown to some degree.
You have some interesting free products. How did those get started?
That’s right and they don’t even require a Clearbit account. They’re our way of giving back to the community, but they do help us as well. They certainly help us retarget and acquire new customers.
We have two free APIs that drive a lot of integrations:
- The Logo API, which can turn any domain name into a company logo. Segment uses this to automatically pull in your company logo during signup.
- The Autocomplete API, which can autocomplete company names. When you’re entering your company name in a signup form or you’re entering the name of a new lead into a CRM, you can use our Autocomplete API there.
What stack are you built on?
It’s a custom stack using CoreOS on EC2. And then, sitting on top of that is Ruby with Sinatra and HAProxy. And then the database backing everything is Postgres.
As a request comes in, it goes straight to AWS Elastic Load Balancing and SSL is terminated. Then it will go to HAProxy and HAProxy knows about all the other nodes and it’ll fan out requests.
CoreOS is a stripped-down operating system that just runs Docker containers. It comes with a tool called etcd, a distributed key-value store that’s key to our infrastructure. We write configuration variables in there, and as services come up and go down, they update etcd. HAProxy listens to etcd and reloads its configuration when services change. Going a step further, the communication between ELB and HAProxy is configured via etcd. When an HAProxy instance comes up or goes down, the ELB is notified and can start or stop sending it traffic.
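As a rough sketch of the reload-on-change idea, the snippet below renders an HAProxy backend section from a set of registered services and only counts a reload when the rendered config actually differs. A plain dict stands in for etcd here, and the key layout (`/services/api/node1`), the `ConfigWatcher` class, and the reload counter are all hypothetical; Clearbit's real tooling is not described in the interview.

```python
from typing import Dict, Optional


def render_backend(name: str, services: Dict[str, str]) -> str:
    """Render an HAProxy backend section.

    `services` maps a registry key (assumed layout) to a "host:port" address.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for key in sorted(services):
        server_name = key.rsplit("/", 1)[-1]
        lines.append(f"    server {server_name} {services[key]} check")
    return "\n".join(lines) + "\n"


class ConfigWatcher:
    """Tracks the last rendered config and reloads only when it changes."""

    def __init__(self, backend_name: str) -> None:
        self.backend_name = backend_name
        self.current: Optional[str] = None
        self.reloads = 0

    def on_change(self, services: Dict[str, str]) -> bool:
        new_config = render_backend(self.backend_name, services)
        if new_config == self.current:
            return False  # nothing changed, so no reload is needed
        self.current = new_config
        # A real setup would write the config file and reload HAProxy here.
        self.reloads += 1
        return True
```

Comparing the rendered output rather than the raw registry means that a key being rewritten with the same value does not trigger a needless reload.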
The nice thing about this is that we can lose machines without it being too much of a problem. I always liked the Netflix Chaos Monkey script, where they would terminate servers with no notice. I think we would fare pretty well if we ran that script on our servers. When we do lose a server, etcd and Fleet just reschedule the services on another server that boots up quickly.
How do you queue requests?
We do a lot of queueing on the backend. We use Sidekiq, which is a Ruby background job processor backed by Redis. Sidekiq is very battle-tested and works extremely well for us.
We do a lot of parallel processing. For example, when we’re processing an incoming domain that we haven’t seen before, we’ll hit maybe ten or twenty different services at the same time. We’ll do a crawl of the website, we’ll pull out SSL certificates, we’ll look at the SEC records of the company, etc. It all happens within a second or two because of parallelization.
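The fan-out pattern can be illustrated with a thread pool. This is a minimal sketch in Python rather than Clearbit's Ruby and Sidekiq setup, and the three source functions are stand-ins that return canned values instead of calling real services.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict

Source = Callable[[str], Dict[str, Any]]


# Stand-in sources; real ones would make network calls.
def crawl_website(domain: str) -> Dict[str, Any]:
    return {"title": f"Home of {domain}"}


def fetch_ssl_certificate(domain: str) -> Dict[str, Any]:
    return {"ssl_subject": domain}


def lookup_sec_records(domain: str) -> Dict[str, Any]:
    return {"sec_filings": 0}


SOURCES: Dict[str, Source] = {
    "crawl": crawl_website,
    "ssl": fetch_ssl_certificate,
    "sec": lookup_sec_records,
}


def enrich_domain(domain: str, sources: Dict[str, Source] = SOURCES) -> Dict[str, Any]:
    """Query every source concurrently and collect results by source name."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fn, domain): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing source should not sink the whole lookup.
                results[name] = {"error": str(exc)}
    return results
```

Because the sources run side by side, total latency is roughly that of the slowest source rather than the sum of all of them, which is how ten or twenty lookups can finish within a second or two.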
Another nice thing about CoreOS being so resilient is that we run a large amount of our queueing infrastructure off spot instances. This considerably lowers costs. [Read more about how Clearbit scaled in this StackShare post]
We have redundancy, but only so much redundancy. If your bid for a Spot Instance gets outbid, you lose the instance. It’s worth having Spot Instances in various AWS regions to have a little bit of redundancy when it comes to the bid price.
Do you know roughly how many different websites or services you’ll check given an email address?
Including all of the additional data from the company and their website, it’s quite a lot. Nearly a hundred different data sources. That’s actually one of the hard parts of scaling this kind of business: you do rely on external services for some data.
And over time, we’ll take various resources and make sure we don’t rely on them in real time. So, for example, the SEC database, EDGAR, we index that in Elasticsearch so it’s all on our own infrastructure. It’s updated once a day and we don’t have to worry about doing that in real time.
How long do you cache data? If I sign up for Segment and then go and sign up for Asana, how many requests are you doing each time?
We’ll cache data for 30 days, and then we’ll refresh it. At this point, most company data is served off cache from the past month. We’ve seen about 30 million companies in our index. The nice thing about doing it this way rather than with a static database, is that the data is fresh and gets updated regularly.
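A 30-day cache with refresh on expiry might look like the sketch below. The `RefreshingCache` class and its in-memory dict are assumptions made for illustration; the interview does not say where or how Clearbit stores cached records.

```python
import time
from typing import Any, Callable, Dict, Tuple

THIRTY_DAYS = 30 * 24 * 60 * 60  # cache lifetime in seconds


class RefreshingCache:
    """Serve cached values until they are older than the TTL, then refetch."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = THIRTY_DAYS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from cache
        value = self._fetch(key)  # missing or stale: do the expensive lookup
        self._entries[key] = (now, value)
        return value
```

Under this scheme, a second product looking up the same company within the 30-day window is served from cache and triggers no new upstream requests.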
Do you use any human analysts for collecting data?
Everything at Clearbit is a different API, so we have about 60 internal services that do things like sending webhooks or authentication. Of course, there are public-facing APIs like the person and company APIs.
There are a few things we haven’t found a good way of automating yet and we’ll do semi-automation with Amazon Mechanical Turk. One of the internal APIs is called the Turk API. You post a task that gets sent to Mechanical Turk and it returns a webhook back when there’s a response. We’ve used this pretty successfully to backfill spotty data sources.
This is still something I’d like to automate ultimately. For example, you could use an NLP algorithm to extract funding data from news reports, and we’ve been running a few experiments.
All our category data, for example, is extracted from the homepage text using a neural net. We get a threshold when we categorize companies, so we can have someone go through and recheck the ones that we’re not sure about.
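One way to read the threshold idea: accept a predicted category when the model is confident enough and queue everything else for a person to recheck. The sketch below assumes the classifier returns a `(category, confidence)` pair per domain and uses an arbitrary cutoff of 0.8; neither detail comes from the interview.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, float]  # (category, confidence between 0 and 1)


def triage(
    predictions: Dict[str, Prediction], threshold: float = 0.8
) -> Tuple[Dict[str, str], List[str]]:
    """Split predictions into auto-accepted categories and a human review queue."""
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for domain, (category, confidence) in predictions.items():
        if confidence >= threshold:
            accepted[domain] = category
        else:
            needs_review.append(domain)  # not sure enough: send to a person
    return accepted, needs_review
```

Raising the threshold sends more companies to manual review and lowers the error rate of the automated labels, so the cutoff is a direct trade between human effort and data quality.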
What are some things that you want people to build on top of Clearbit?
I want more CRMs to integrate Clearbit in useful ways. It’s just a no-brainer and can make a CRM 10 times better. When you enter a new lead, all the important information can be autocompleted without any manual research. Then, using Clearbit’s data, you can find similar companies and specific targets, including contact information. Your CRM kind of starts to become this living, breathing thing that is enriched with data and becomes a lot more useful.
Companies have an idea of their target customer, but it’s generally an educated guess. It’s based on their memorable successes and failures. It’d be quite interesting to actually take a more scientific approach to it. You could run a company’s customers and won/lost deals through Clearbit. With some really intelligent data around your target customer, you could programmatically generate new targets.
We haven’t productized it, but we do this internally with a large SQL query. It gives us some interesting info about what kind of deals we should prioritize. For example, we’re much more likely to win deals for companies who have raised less than $20 million.
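A query of that kind could be sketched as follows. The `deals` table, its columns, and the bucket names are invented for illustration, and SQLite is used only so the example is self-contained; Clearbit's actual query and schema are not shown in the interview.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

Deal = Tuple[str, int, int]  # (company, total funding raised in USD, won as 1 or 0)


def win_rate_by_funding(
    deals: Iterable[Deal], cutoff: int = 20_000_000
) -> Dict[str, Tuple[int, float]]:
    """Return {bucket: (deal_count, win_rate)} for deals below and above the cutoff."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE deals (company TEXT, raised INTEGER, won INTEGER)")
        conn.executemany("INSERT INTO deals VALUES (?, ?, ?)", deals)
        rows = conn.execute(
            """
            SELECT CASE WHEN raised < ? THEN 'under_cutoff'
                        ELSE 'at_or_over_cutoff' END AS bucket,
                   COUNT(*) AS total,
                   AVG(won) AS win_rate
            FROM deals
            GROUP BY bucket
            """,
            (cutoff,),
        ).fetchall()
    finally:
        conn.close()
    return {bucket: (total, win_rate) for bucket, total, win_rate in rows}
```

Since `won` is stored as 1 or 0, its average is the win rate for each bucket, and comparing the two buckets shows whether funding level tracks with closed deals.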
Tell me about some upcoming products.
We have two really awesome products that are about to be launched.
We have Connect, which is our version of Rapportive. It essentially pulls our person and company information into Gmail. One of the reasons we built it is that it helps us identify bad data. When someone sees something bad, they can flag it and update it. This allows us to fix our data in batches using the crowd.
We’ve got about 7,000 people using the beta right now and they’re really enjoying it. [This section is out of date. Clearbit Connect launched last month and now has 15,000 users.]
The other product I want to mention is the Risk API. Clearbit is building a suite of these APIs that tackle both sales and risk use cases. We’ve dabbled with this a little bit with the Watchlist API, but the Risk API is going to be really useful for people who have problems with spammy signups.
If you run a freemium service, some of those free users can actually cost you money. Maybe they sign up to your service to send spam emails or to test fraudulent credit cards.
The Risk API is a really lightweight solution for a first barrier to fraudsters. You can stick it on your signup form and it’ll ping Clearbit with a user’s IP address and email. We have so much information on invalid emails and other social profile details that we can generate a risk score.
We can detect that the email has been flagged in the past or it belongs to a really dodgy ISP or the social profile doesn’t match the rest of the information or maybe they’re using a proxy. We can organize all of this risk data and help you fight spam!
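To make the idea of combining signals into one score concrete, here is a toy scoring function. The signal names mirror the examples above, but the weights and the 0 to 100 scale are made up for this sketch and are not how the Risk API actually computes its score.

```python
from dataclasses import dataclass


@dataclass
class SignupSignals:
    email_previously_flagged: bool
    isp_is_suspicious: bool
    profile_mismatch: bool
    using_proxy: bool


# Arbitrary illustrative weights, not taken from the Risk API.
WEIGHTS = {
    "email_previously_flagged": 40,
    "isp_is_suspicious": 25,
    "profile_mismatch": 20,
    "using_proxy": 15,
}


def risk_score(signals: SignupSignals) -> int:
    """Sum the weights of every signal that fired, capped at 100."""
    score = sum(weight for name, weight in WEIGHTS.items() if getattr(signals, name))
    return min(score, 100)
```

A signup form could then block or challenge users above some cutoff, which fits the "first barrier" role described above rather than a full fraud system.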
From sales to marketing to fraud detection, Clearbit structures the torrent of online data exhaust. I think we’ll see more companies that build their product around internal and publicly-exposed APIs. All the pieces are composable and it’s easier to quickly iterate with customers.