Tables and objects and graphs, oh my!

In which I learn about software testing and different data structures, and write my first line (and subsequent thousand lines) of Scala in the 16 hours starting at midnight last Thursday. And in which I learn that living a startup life is living a “Swiss Army life”.

This is a story of startup life. I work at Datalogue, a startup based in New York and Montreal. Our vision is to put data into the hands of people who need it, no matter the form, format, or content (want to learn more? sign up for a demo).

I am the Chief Strategy Officer of the company. My job description says things like “turn engineering ideas into saleable ones”, “validate those products with customers”, “think about the future”, “do law stuff” and “do strategy stuff”.

There’s a thing that is absolutely true in every single startup that I have come across, but is a story that you hear less often. Your job description, your fancy title, in startup life, that’s merely a “suggested role” — definitely not the whole thing. This holds for the co-founders, this holds for the first hires, this holds for nearly every one of the first couple of handfuls of employees you have.

If you are a backend engineer, you are also a front-end user tester. If you are a COO, you are also the person juggling API calls seconds before product launch. If you are the CEO, you are also the data wrangler and literal janitor. And no matter who you are, you pitch in with sales, with demos and with VC meetings.

Everyone does everything.

So, last week, we were ramping up to launch a Lite version of our product: a version that users can jump into to get a flavor of what we do. In the rush to launch, I was yanked into a new role: test engineer.

This was something I wasn’t used to.


Now, my dad insisted that I learn programming as a kid.

When I was in primary school and wanted to learn my times tables a bit faster (we had arithmetic competitions and I was exceptionally competitive), my dad insisted that we crack open Excel and build a program that generated arithmetic tests, had a built-in timer, and lit bit-art fireworks when I beat my high score.

When I asked my dad to teach me to code, he handed me The Little Lisper, installed a compiler on the family’s old computer (if I remember correctly, it was an Apricot) and told me to have at it.¹

When I was in high school, I played with Arduino and Python.

But all of this was for fun. Now, it was business time.

Business time, in Scala.

Part of our product allows each data format² to be translated into an Abstract Data Graph (an ADG), which can then be translated into any other data format.³ This is the “Get” stage of the Get Know Transform framework for data preparation. And this is how we feed our neural networks for the “Know” and “Transform” stages.⁴

My job was to think of all the edge cases that might be translated improperly into ADGs, specifically for CSV/XLSX files and for JSON objects.

Draw a maze

My first task was, then, to think of all of these traps that might trip the transformer up, and write files that would test the transformer’s response.

This required me to understand the strengths and weaknesses of each data structure.⁵ And of our ADG.

So I wrote tables with merged cells and gaps in them, with missing headers, with structure that had to be inferred from the contents of the cells.

I wrote JSON objects with empty lists, with lists containing different “types” (and ones that would trip up the system even when we “up-typed” to the most expressive type⁶), and with way too much nesting.
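To make “up-typing” a little more concrete, here is a minimal sketch in Scala. Everything in it is an assumption for illustration: the names `JsonScalar`, `rank`, `promote` and `upType` are hypothetical and not taken from our codebase, and the real rules are surely richer. It only shows the general idea of promoting every value in a mixed array to the most expressive type present, and why an array like the one in footnote 6 can still be a trap: once promoted, the string "0" and the number 0 look identical.

```scala
// Hypothetical sketch: none of these names come from the real transformer.
object UpTyping {
  // A tiny model of JSON scalar values, just enough for mixed-type arrays.
  sealed trait JsonScalar
  final case class JsonInt(value: Long) extends JsonScalar
  final case class JsonDouble(value: Double) extends JsonScalar
  final case class JsonString(value: String) extends JsonScalar

  // Assumed ordering from least to most expressive: Int < Double < String.
  private def rank(v: JsonScalar): Int = v match {
    case _: JsonInt    => 0
    case _: JsonDouble => 1
    case _: JsonString => 2
  }

  // Promote one value to the target rank; values already there are unchanged.
  private def promote(v: JsonScalar, target: Int): JsonScalar = (v, target) match {
    case (JsonInt(i), 1)    => JsonDouble(i.toDouble)
    case (JsonInt(i), 2)    => JsonString(i.toString)
    case (JsonDouble(d), 2) => JsonString(d.toString)
    case _                  => v
  }

  // Promote every element to the most expressive type found in the array.
  def upType(values: List[JsonScalar]): List[JsonScalar] =
    if (values.isEmpty) values // an empty array carries no type information
    else {
      val target = values.map(rank).max
      values.map(v => promote(v, target))
    }
}
```

Under these assumptions, `UpTyping.upType(List(JsonString("0"), JsonInt(0)))` returns two identical `JsonString("0")` values, so the original distinction between the two elements is lost unless it is recorded somewhere else.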

I puzzled and pondered and figured out as many traps as possible for our poor transformer.

My job (for the night) was to design a maze filled with gotchas, to test if our system would fail.

And then I was done.

Draw the solution

And then I wasn’t.

Apparently tests aren’t interesting unless you can solve them.

So I had to draw all of the solutions to all of the tricksy problems I created.

Here are a couple of fiddly ones:

On the left, a few ADGs drawn to test XLSX files, including tests to ensure that our translator can differentiate between worksheets with the same column headers, and sheets missing various data points. On the right, a JSON object with as many expressions of nothing as I could think of, which surfaced issues about how we would express empty strings, arrays with different types, empty arrays and empty objects.

I like representing graphs on paper; it’s like a map of my data.

A visual map of my data.

Computers can’t read maps good

But apparently, the neural networks and I have different taste in how we like information. I like drawings of graphs. Neural networks want them all spelled out.

So, at 11pm on a Thursday night, I write my first ADG (my first line of Scala), and early Friday morning, I write my 60th ADG (and my thousandth line of Scala). Whilst listening to loud music in the depths of our office.

Loud music, Scala, and a notebook filled with maps of imaginary data.

Instead of a visual map, I wrote “directions”. Defining each node by its label and its content. Defining each edge connecting children to parents, and making data siblings. And defining the ADG itself, as the mix of both.
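For a flavor of what those “directions” might look like, here is a minimal sketch in Scala. It is not our actual ADG code: `Node`, `Edge`, `Adg`, `childrenOf` and `siblingsOf` are hypothetical names chosen for illustration, and the example row is made up. It only mirrors the description above: nodes with a label and content, edges from parents to children, and a graph that is the mix of both.

```scala
// Hypothetical sketch: names and shapes are assumptions, not the real ADG API.
object AdgSketch {
  // A node is defined by its label and its (optional) content.
  final case class Node(id: Int, label: String, content: Option[String])

  // An edge connects a parent node to a child node, by id.
  final case class Edge(parent: Int, child: Int)

  // The graph itself is the mix of both.
  final case class Adg(nodes: List[Node], edges: List[Edge]) {
    // All nodes directly below the given parent.
    def childrenOf(parent: Int): List[Node] = {
      val childIds = edges.filter(_.parent == parent).map(_.child)
      nodes.filter(n => childIds.contains(n.id))
    }

    // Nodes that share a parent with the given node (its data siblings).
    def siblingsOf(id: Int): List[Node] = {
      val parents = edges.filter(_.child == id).map(_.parent)
      parents.flatMap(p => childrenOf(p)).filter(_.id != id).distinct
    }
  }

  // One made-up table row (header "name,city", values "Ada,Montreal") as one graph.
  val row: Adg = Adg(
    nodes = List(
      Node(0, "row", None),
      Node(1, "name", Some("Ada")),
      Node(2, "city", Some("Montreal"))
    ),
    edges = List(Edge(0, 1), Edge(0, 2))
  )
}
```

In this sketch the two cells become siblings because they hang off the same parent, so `AdgSketch.row.siblingsOf(1)` returns the "city" node. Writing sixty of these by hand is where the thousand lines come from.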

What started as:

The CSV test file

Ended up like:

The ADG, showing in red the places that might cause problems, like the repeated header.


Test, hope, fix, rinse, repeat

When we have the input files and the output ADGs, all that’s left is to write the test: a program that takes the input files, sends them through the transformer and checks that each output ADG is what we expected.

And, of course, hope!
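The shape of such a test can be sketched in a few lines of Scala. This is an illustration under loud assumptions: `transform` below is a naive stand-in that only splits on commas, not our real transformer, and `Row`, `TestCase` and `run` are hypothetical names. The point is the loop: feed each input through, compare with the expected output, and collect whatever fails.

```scala
// Hypothetical sketch: `transform` is a toy stand-in, not the real transformer.
object TransformerTestSketch {
  // Stand-in output type: one row as ordered (label, content) pairs, so that
  // repeated headers are kept rather than overwritten as they would be in a Map.
  type Row = List[(String, String)]

  // Toy CSV handling: first line is the header, naive comma splitting.
  // The -1 limit keeps trailing empty cells instead of dropping them.
  def transform(csv: String): List[Row] = {
    val lines = csv.split("\n").toList.filter(_.nonEmpty)
    lines match {
      case header :: rows =>
        val labels = header.split(",", -1).toList
        rows.map(r => labels.zip(r.split(",", -1).toList))
      case Nil => Nil
    }
  }

  // An input file paired with the output drawn by hand beforehand.
  final case class TestCase(name: String, input: String, expected: List[Row])

  // Run every case and return the names of the ones that failed.
  def run(cases: List[TestCase]): List[String] =
    cases.filter(c => transform(c.input) != c.expected).map(_.name)

  val cases: List[TestCase] = List(
    TestCase("repeated header", "id,id\n1,2", List(List("id" -> "1", "id" -> "2"))),
    TestCase("missing value", "a,b\n1,", List(List("a" -> "1", "b" -> "")))
  )
}
```

An empty result from `TransformerTestSketch.run(TransformerTestSketch.cases)` means every case passed; any names that come back are the traps that worked.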

The true story of startup life

This isn’t a story about Abstract Data Graphs.

This isn’t a story about working hard late into the night and early into the morning to make sure that all of your tests are ready, so that you can be comfortable that your product is ready for release.

This isn’t a story of how to bring a little bit of order into a world with too many data formats.

This is a little glimpse at what startup life is really like.

There are no “non-technical” people in a technical startup. There is no one who is not a salesperson. There is no one who is not going to have to drag his or her friends into the office to test the product.

In a startup, no one is allowed to stay in their box. Everyone has to do everything, and no one is above any work. And that’s awesome.

Marginalia

Thanks to Tim Delisle and Nicolas Joseph for teaching me everything I know about ADGs and data structures. Learn more about Datalogue or sign up for a demo.

The beginning of a long night of ADGs.
  1. (I remember (when I was reading through the book) that I would (before hitting compile which would take ages) count (by hand) the (extremely nested (not kidding, extremely nested)) parentheses, to make sure there were as many “(s” as “)s”). 
    By the way, this is a list of lists, and would make an interesting ADG.
  2. Take, for example, key/value based data formats like JSON, tabular formats like CSV, relational databases, anything you can think of.
  3. The Abstract Data Graph is quite funky, as it is incomparably expressive, and quite efficient. Instead of displaying the whole database in a single graph, each “view” of the database (e.g. a row in a table, or an object in JSON) becomes one ADG, the group of which represents all of the information in the database.
  4. If you want to read a bit more about this, check out Our Story.
  5. This might be too much to get into exhaustively here, but, it is enough to understand that: tables are difficult because they don’t clearly separate data values from data labels; JSON is difficult because of its flexibility.
  6. E.g. {"array": ["0", 0]}.
This was the loud music I played whilst thinking abstractly about data in a room in our office way too late and perhaps a bit caffeinated. Made it a blast!