Why we built DataFire

Bobby Brennan
Published in DataFire.io
Dec 15, 2015 · 3 min read

One of the great things about being a software startup today is the overwhelming abundance of SaaS products itching to take work off your plate. Most even offer free tiers while you’re in development or beta, in the hope of locking you in as a customer if your product takes off. Here’s a (truncated) list of some of the services we’ve used in our products so far:

  • Firebase (user management and BaaS)
  • Stormpath (user management)
  • Google Analytics (web analytics)
  • Mixpanel (web analytics, A/B testing)
  • MailChimp (mailing lists)
  • Mandrill (transactional email)
  • GitHub (code hosting, static sites via GH pages)
  • AWS (EC2, ELB, CodeDeploy, Route53)
  • MongoLab (MongoDB hosting)
  • Trello (issue tracking for sales and engineering)
  • Salesforce (lead tracking)

One drawback of using all these third-party services, however, is that your valuable data ends up stored in many different silos. Worse, the data in those silos needs to be coordinated: new Firebase users should get added to MailChimp; the most active Mixpanel users should become Salesforce leads; GitHub issues should map to Trello cards.

We quickly got sick of manually copy/pasting between these services, and were not happy with the complexity being introduced to some of our most essential server code. Take, for instance, a typical signup flow:

app.post('/signup', function(req, res) {
  Database.users.add(req.body, function(err, newUser) {
    if (err) return res.status(500).send(err);
    MailChimp.addUser(newUser.email, function(err) {
      if (err) return res.status(500).send(err);
      Salesforce.addLead(newUser, function(err) {
        if (err) return res.status(500).send(err);
        res.send("Success!");
      });
    });
  });
});

This, of course, is an oversimplification - formatting user data to create a new MailChimp recipient and a new Salesforce lead requires a bunch of extra data munging. Regardless, we’ve introduced two unnecessary points of failure: MailChimp and Salesforce. If either responds with a failure, the first thing our new user will see is an error message. If their APIs are slow at the moment, so are our signups. We could do some fancy stuff to parallelize these requests or hide them from the user, but no matter what, our super-critical Sign Up endpoint will depend on these sub-critical tasks.
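As a rough sketch of what "hiding them from the user" could look like, the handler below responds as soon as the database write succeeds and runs the two syncs in the background. It assumes the same placeholder `Database`, `MailChimp`, and `Salesforce` clients as the example above; `makeSignupHandler` and `onSyncError` are hypothetical names introduced only for illustration.

```javascript
// Sketch only: respond once the critical write succeeds, then run the
// sub-critical syncs in the background. The three clients are the same
// placeholders used in the signup example; `onSyncError` is a hypothetical
// callback for reporting failures (logging, alerting, etc.).
function makeSignupHandler({ Database, MailChimp, Salesforce, onSyncError }) {
  return function signup(req, res) {
    Database.users.add(req.body, function (err, newUser) {
      // Only a failure of our own database should block the signup.
      if (err) return res.status(500).send(err);
      res.send('Success!');

      // Fire and forget: failures are reported, never shown to the user.
      MailChimp.addUser(newUser.email, function (err) {
        if (err) onSyncError('mailchimp', newUser, err);
      });
      Salesforce.addLead(newUser, function (err) {
        if (err) onSyncError('salesforce', newUser, err);
      });
    });
  };
}

// With Express, this would be wired up roughly as:
//   app.post('/signup', makeSignupHandler({ Database, MailChimp, Salesforce, onSyncError: console.error }));
```

Even in this form, the signup code still has to know about MailChimp and Salesforce, and a failed sync is only reported, not retried.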

Searching for a Solution

Our first approach was to use Zapier to automate some of these tasks. One zap - pushing new Firebase users to our MailChimp mailing list - worked well, but it quickly became clear that Zapier was not a cure-all. For one, if the information in Firebase changed (e.g. the user adds their full name), Zapier would ignore it, so the two services quickly fell out of sync. For another, there was no way to orchestrate several APIs in one zap - e.g. only adding a user as a Salesforce lead if their usage in Mixpanel was above a certain threshold. And anytime the triggering conditions became even mildly complex, I ran into a wall and began to wish I could just write a little code.

The next idea was to use crontab - the decades-old Unix facility for running scripts on a schedule. But provisioning a server for these scripts and monitoring them quickly became unwieldy, so we decided to build crontab for the web.
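To make the cron approach concrete, here is a minimal sketch of the kind of script one might schedule to keep a mailing list in step with a user store, including the "user changed later" case that a one-shot trigger misses. The file path in the crontab comment, the `planSync` name, and the `{ email, name }` record shape are all assumptions made for illustration; fetching the two lists and applying the plan are left out.

```javascript
// Hypothetical crontab entry that would run this file every 15 minutes:
//   */15 * * * * /usr/bin/node /opt/jobs/sync-users.js

// Decide what has to change in the mailing list so it matches the user store.
// Both inputs are plain arrays of { email, name } records (an assumed shape).
function planSync(users, subscribers) {
  // Index subscribers by lower-cased email for case-insensitive matching.
  const byEmail = new Map(subscribers.map((s) => [s.email.toLowerCase(), s]));
  const toAdd = [];
  const toUpdate = [];
  for (const user of users) {
    const existing = byEmail.get(user.email.toLowerCase());
    if (!existing) {
      // New user that never reached the mailing list.
      toAdd.push(user);
    } else if (existing.name !== user.name) {
      // Existing subscriber whose details changed after signup.
      toUpdate.push(user);
    }
  }
  return { toAdd, toUpdate };
}
```

The script itself is short; the unwieldy part is everything around it, such as the server it runs on and noticing when a run fails.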

Enter DataFire

The idea was simple enough - connect the accounts you care about, then write a bit of JavaScript to call the APIs you need. We’ll take care of all the low-level stuff (authentication, formatting requests, etc.); you just tell us which endpoints you want to call and with what parameters.

In particular, there were three major pieces of functionality we felt were missing from existing products:

  • The ability to write code. While mapping data inside a pretty UI is approachable for spreadsheet-savvy professionals, there are some tasks that only code can solve, and many tasks that become unwieldy without it.
  • The ability to call 3 or more APIs in one flow. So many use cases fall by the wayside when users are limited to two API calls. Maybe I don’t want to just blindly retweet any mention of my company - surely it’d be better to use an NLP API to make sure it has positive sentiment and no curse words first. Or maybe I’d like to aggregate usage metrics from Mixpanel and Mandrill before inserting a new lead into Salesforce.
  • The ability to schedule or run on demand. Most existing services expect your job to run continuously - every 5 or 15 minutes, depending on how much you pay. But maybe some jobs only need to run daily, and others need to run every minute - with DataFire you only pay for what you need.
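The Mixpanel, Mandrill, and Salesforce case from the list above can be sketched as a single flow in plain JavaScript. This is not DataFire's API: the `mixpanel`, `mandrill`, and `salesforce` objects and their methods (`countEvents`, `countOpens`, `addLead`) are hypothetical stand-ins passed in by the caller, and adding the two counts together is just one possible way to score activity.

```javascript
// Sketch of a three-API flow. The client objects and their methods are
// hypothetical stand-ins supplied by the caller, not real SDK calls.
async function promoteActiveUsers({ mixpanel, mandrill, salesforce }, users, minScore) {
  const promoted = [];
  for (const user of users) {
    // Calls 1 and 2: gather usage signals from two services in parallel.
    const [events, opens] = await Promise.all([
      mixpanel.countEvents(user.email),
      mandrill.countOpens(user.email),
    ]);
    const score = events + opens;

    // Call 3: only sufficiently active users become Salesforce leads.
    if (score >= minScore) {
      await salesforce.addLead({ email: user.email, score });
      promoted.push(user.email);
    }
  }
  return promoted;
}
```

The condition in the middle is the part that is awkward to express in a point-and-click mapping and trivial to express in code.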

And so, DataFire was born. In another post we’ll go into specific use cases, along with how we’re using it in production, but for now you can check out a list of pre-built Dataflows or jump into the tutorial to learn how to build your own from scratch.

Bobby Brennan
DataFire.io

I’m a Software Engineer, specializing in Dev Tools, NLP, and Machine Learning