What’s New at DataFire

Bobby Brennan
Feb 4, 2016 · 4 min read

At DataFire, our goal is to empower developers to monitor, sync, and interact with their data, no matter where it lives. We provide a thin layer on top of first-class APIs like Gmail, Github, Salesforce, and Slack, allowing you to write code around these services without worrying about implementation details like authentication and formatting. We want DataFire to feel like one giant client library for the web.

However, as a cloud-based service, DataFire really acts more like a framework than a library, so we’ve been busy adding features to make sure our developers can access their data not just wherever, but however they want.

External Libraries

A lot of JavaScript’s built-in APIs leave something to be desired. To help our developers conquer common tasks more easily, we’ve added support for two popular libraries: lodash and MomentJS.

Lodash simplifies a lot of the work that goes into munging data from one API into another. It provides utilities for common tasks like merging two objects, offers stronger support for functional programming, and cuts out much of the boilerplate involved in data processing.
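
For instance, here's a quick sketch of two common tasks, merging a set of defaults with user-supplied overrides and reshaping a list of records (the field names here are just for illustration):

var defaults = {limit: 10, format: 'json'};
var overrides = {limit: 50};
var settings = _.assign({}, defaults, overrides);
// settings is now {limit: 50, format: 'json'}

var messages = [
  {from: 'a@example.com', subject: 'Hi', read: false},
  {from: 'b@example.com', subject: 'Re: Hi', read: true}
];
// Keep only unread messages and pull out just the fields we care about.
var unread = _(messages)
  .filter(function(msg) { return !msg.read; })
  .map(function(msg) { return _.pick(msg, ['from', 'subject']); })
  .value();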

MomentJS was another no-brainer. JavaScript’s built-in Date object is confusing at best, and Dataflows often need to track and compare dates/times in order to behave appropriately. Consider a common problem: get the date, 1 month ago, in the format “YYYY-MM-DD”. In vanilla JavaScript, this looks like:

var MILLISECONDS_IN_A_MONTH = 1000 * 60 * 60 * 24 * 30;
var today = new Date();
var oneMonthAgo = new Date(today.getTime() - MILLISECONDS_IN_A_MONTH);
var yyyy = oneMonthAgo.getFullYear().toString();
var mm = (oneMonthAgo.getMonth()+1).toString();
var dd = oneMonthAgo.getDate().toString();
var formattedDate = yyyy + "-" + (mm[1]?mm:"0"+mm[0]) + "-" + (dd[1]?dd:"0"+dd[0]);

Holy cow. That’s taken from the top answer on Stack Overflow. Here’s what it looks like with MomentJS:

var oneMonthAgo = moment().subtract(1, 'month');
var formattedDate = oneMonthAgo.format('YYYY-MM-DD');

The gain in productivity we’ve seen with these two libraries alone is huge. If there are any other libraries you’d like to see added, let us know in the comments!

Storage

One of the first walls our users ran into when writing large, complex Dataflows was dealing with enormous amounts of data. It’s easy enough to retrieve your last 10 messages in Gmail, but what if you want all of them? Trying to load your entire message history will cause your Dataflow to be pre-empted by the system for hogging memory, and the number of requests involved would quickly exhaust your quota. But what if we could crawl through messages, dealing with only 1 or 10 at a time?

We introduced the concept of Storage for this very reason - it helps you to keep track of state across different runs of your Dataflow. You can put up to 100KB of JSON-serializable data in the storage variable, and DataFire will save it.

For instance, this code will get the next page of results each time it is run, resetting to the first page if it gets an empty response.

Step 0:

function request() {
  storage.page = storage.page || 0;
  return {
    page: storage.page++
  };
}

Step 1:

function request(data) {
  if (!data[0].length) {
    storage.page = 0;
  } else {
    // do something with the data
  }
}

One way we use this in production is to power the running counters on our homepage. The “Operations Available” counter involves grabbing every document in our “Links” collection, getting the Swagger inside each Link, and counting the number of keys in the “paths” variable - an incredibly expensive operation for MongoDB to handle.

Instead of doing this counting each time the homepage loads, we have a Dataflow that iterates over Links in the database. Each time it runs, it grabs a new Link:

Step 0: GET /documents

function request() {
  storage.counts = storage.counts || {};
  storage.pageNumber = storage.pageNumber || 0;
  return {
    collection: 'links',
    q: {public: true},
    skip: storage.pageNumber++
  };
}

Then it counts the number of Operations in it, stores that count along with the Link’s ID, and patches a ‘totals’ document in MongoDB with the new total.

Step 1: PATCH /documents

function request(data) {
  var countOperations = function(link) {...}
  storage.counts[data[0].id] = countOperations(data[0]);
  var totalOps = _.keys(storage.counts)
    .reduce(function(prev, cur) {
      return prev + storage.counts[cur];
    }, 0);
  return {
    collection: 'totals',
    query: {},
    body: {operations: totalOps}
  };
}
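
The body of countOperations is elided above. As a rough sketch, assuming each Link document carries its Swagger spec in a swagger property, it could be as simple as counting the keys of the paths object:

var countOperations = function(link) {
  // Count the entries in the Swagger 'paths' object. The 'swagger'
  // property name is an assumption about the Link document's shape.
  return _.keys(link.swagger.paths || {}).length;
};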

This Dataflow runs continuously in the background, so when we add a new public Link, the Dataflow quickly picks it up, stores its Operation count, and updates the counter shown on the homepage.

Extras

We’ve also added a couple minor features to help you maintain and debug your code.

console.log

This will print messages and variables to the GUI when you run your Dataflow in the browser.
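
For instance, dropping a couple of calls into the Step 1 function from the Storage example above makes it easy to see what each run is working with:

function request(data) {
  // Print a quick summary of the previous step's output to the GUI.
  console.log('Previous step returned ' + data[0].length + ' items');
  console.log(data[0][0]);
  if (!data[0].length) {
    storage.page = 0;
  }
}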

GitHub Sync

If you visit the Settings tab on a Dataflow you own, you’ll see a new “Sync to Gist” option. Pushing your Dataflow to a new or existing Gist takes all the code you’ve written, along with some metadata about the Links you’re using, and stores it in the Gist. You can then import that Dataflow by pasting the Gist ID into the “Sync to Gist” box in a new Dataflow. For instance, try pasting bb3e64c9cdef6e59f7cc into the Gist box and clicking “Pull”.

This should help you keep track of changes and make it easy to revert any regressions in your code.

Globals

You can now share data, objects, and functions across Dataflow steps by using the global variable. For instance, if you write

global.isComment = function(item) {
  return item.type === 'comment';
};

You can then call global.isComment() in any subsequent step.
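
For example, a later step might use it to filter the previous step’s results. This is just a sketch; the parameters you return will depend on the Link that step calls:

function request(data) {
  // Keep only the items the previous step identified as comments.
  var comments = data[0].filter(global.isComment);
  console.log('Found ' + comments.length + ' comments');
  return {
    body: {comments: comments}
  };
}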

Going Forward

We’ll continue to launch new features and capabilities to address our users’ most common pain points. If there’s something new you’d like to see, let us know!
