Business Intelligence tools for startups

Ben Picolo
Feb 27, 2018 · 9 min read


I’ve spent significant time investigating Business Intelligence (BI) tools for the startup I work at. There’s a huge mixed bag of services out there, and it took quite some time to find the right solution for our use case. This is a rundown of the landscape as I experienced it.

The Setup

Our application layout is straightforward. We have two separate databases — one for metrics data, and one for application data.

The important bit is that App/Metrics data are separate sources

We may well end up with other sources in the future, which is useful to think about when deciding on BI solutions.

When we first decided on tools, we didn't have a separate metrics database. Adding a second Postgres database forced us to evaluate solutions all over again.

The startup is a B2B service. Compared to a B2C business, we have a small number of customers, so each individual user action carries more weight. Aggregate metrics are less useful to us than the individual actions.

Given that, tabular data is most of what we’re looking to parse through. The datasets are small — you can browse through a whole week of some particular set of user actions. An Excel spreadsheet is almost the dream for people who have worked in finance, which describes a lot of our team. We also want to slice metrics by various date ranges and drill down on metrics given some field value.

From the engineering perspective, we want automatic updates so that data is always fresh. We also want emailed metrics summaries that people can browse without diving too deep. We have several data sources to deal with, databases being the most important. Given this, we want a cloud offering; we don’t have a dedicated data analyst around to boot up and update these tools day-to-day.

As an early stage startup we’re also somewhat cost sensitive, so we want to make sure we’re getting the right price.

The Rundown

The BI tools considered here are: Klipfolio, AWS Quicksight, Google Data Studio, Microsoft Power BI, ChartIO, and Domo.

There are a lot of similarities among the services. All connect to a variety of data sources, including SQL databases and SaaS providers. All offer more or less the same basic visualization options (charts and pivot tables and what have you). They also offer ways to share dashboards and data with colleagues.

There are enough differences, though, that we were able to exclude most of them through trial and error.

Klipfolio

Klipfolio was the first tool we used when we were working with a single Postgres instance. It fulfilled our basic requirements well — it supports tabular data and automatic data updates, and the pricing is attractive. It was easy to set up. It connects to our Postgres instance and supplies a list of IPs to whitelist to allow their connections. (That ended up being standard among all the services listed here.)

Unfortunately, we started to run into some undesirable quirks. The data tables only support up to 200 rows, which isn’t enough for our use case. The ticket to support more has been open for several years, a pattern we ran into with a few other issues as well. Tables were, in general, not particularly performant, which is likely why Klipfolio put that limit in place. It’s possible Klipfolio will perform better for you if you’re not looking to go through tabular data.

We also found that, though you can schedule data refreshes, data only actually refreshes when a user is active on the dashboard. We want to land on a dashboard and see fresh, up-to-date data — but their refresh strategy prevents that. We also don’t want to rig up some cron job to ping for updates; automatic updates are a feature we consider crucial.

The last big issue is that modeling different date ranges for metrics required us to create separate data sources, which means more development time and more complicated dashboards, because each date range needs its own widgets.

Given that, we started searching for other offerings.

AWS Quicksight

QuickSight is a service I very much wanted to like. It’s in AWS already, so naturally we hoped it would integrate well with the rest of our infrastructure.

A big advertising point for QuickSight is SPICE. As far as I can tell, it’s a bag of marketing buzzwords that boils down to the ability to do fast processing on large data sets. Unfortunately, it’s also a vague topic — I see the marketing for it everywhere, but how do I make use of it? The first Google search result for QuickSight SPICE is about how to manage capacity rather than how to actually use it. It’s also the first link in the QuickSight welcome guide. As a first-time QuickSight user, SPICE felt very in-my-face without actually providing useful functionality.

The first time we were searching for a service, QuickSight didn’t support basic tabular data at all, only pivot tables. It also had a tendency to hit a variety of errors when connecting to data sources (and generally when moving around pages), and there were no error messages to speak of, so we couldn’t figure out what we were doing wrong.

When we added our second Postgres instance, we took another look at QuickSight, which by then supported standard tabular data. Unfortunately, QuickSight didn’t appear to give us any way to join data across tables (or even across schemas, which Postgres supports by default). For that reason, it didn’t make the cut for take two either. Loading data into SPICE or similar may have fixed that, but overall the user experience was weak, and we felt we could do better.
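To make concrete what we were missing, here is a toy sketch of the kind of cross-database join we wanted. It uses SQLite in place of Postgres (via ATTACH, which plays a role similar to cross-schema querying), and all table and column names are made up for illustration:

```python
import sqlite3

# App data in one database; a second attached database stands in for
# the separate metrics instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com')")

conn.execute("ATTACH DATABASE ':memory:' AS metrics")
conn.execute("CREATE TABLE metrics.events (user_id INTEGER, action TEXT)")
conn.execute(
    "INSERT INTO metrics.events VALUES (1, 'login'), (1, 'export'), (2, 'login')"
)

# Join across the two "sources" in a single query -- the sort of thing
# we could not express in QuickSight at the time.
rows = conn.execute("""
    SELECT u.email, COUNT(*) AS actions
    FROM users u JOIN metrics.events e ON e.user_id = u.id
    GROUP BY u.email ORDER BY u.email
""").fetchall()
print(rows)  # [('a@example.com', 2), ('b@example.com', 1)]
```

In Postgres the equivalent would be an ordinary join across schemas (or across databases via a foreign data wrapper), which is exactly what we couldn't get the BI layer to issue for us.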

Google Data Studio

Google Data Studio is another one I wanted to like. Pricing is negligible — you mostly pay for the use of resources like BigQuery.

Unfortunately, when we first looked at Data Studio, it didn’t support joining tables. The recommendation I saw at the time was to create specific database views, which was a total non-starter on the usability front. It supports custom queries now, but I don’t think that was the case when we initially looked. (If it was, I missed it, and that would have made it suitable for our initial set of requirements.) With our new requirement to join data across multiple database sources, that still leaves a gap, though.

Microsoft Power BI

Microsoft Power BI is a service I heard a lot of good things about. It seems to have a separate offering for cloud-based BI as well as a local application for building visualizations.

The cloud-based service doesn’t have the ability to connect to Postgres, and as mentioned earlier, we didn’t want to nurse a local application, so Power BI didn’t make it past our initial look. If you buy into the Microsoft ecosystem or are looking specifically for tools for use by dedicated data analysts, Power BI is probably more in your wheelhouse.

ChartIO

ChartIO came pretty close to being what we wanted. It supports the features we required, like tabular data and joins across multiple data sets in queries, and it can pull and join data from our various sources.

One thing that was awesome about ChartIO was that it only required a single IP to whitelist for Postgres. Coming from the other services that required 5–10, that was a bit of a breath of fresh air. Whitelisting addresses is only mildly tedious, but it was nice to see anyway.

The main issue that prevented us from using ChartIO is that the visualization stage puts limits on the rows a query can return. As far as we could tell, this meant we couldn’t fully join across multiple data sources. If you have data in one database that you want to join against 120,000 rows of user data in another, row limits at query time make that impossible, so we were stuck there. ChartIO does seem to have a simple integration with Panoply.io for data warehousing, which likely would have solved this issue, but managing multiple services is something we were hoping to avoid.

Domo

Domo is the service we ultimately landed on, and for good reason — it combines data visualization and ETL in one cohesive offering. For us, that made a huge difference.

Domo takes data from your sources and loads it into a hosted, managed data warehouse. There’s no need to build your own ETL pipelines to get all of your data into Redshift or some other single store on your own time. They price largely based on rows ingested, which works out favorably for us. They support all sorts of refresh scheduling to keep data updated day-to-day.

Because they build a single data warehouse, it was really easy for us to join multiple sources of data, and their visual ETL is a killer feature. (Other cloud data-warehouse products probably offer this as well; we just didn’t try any others.) Visual ETL is so much simpler than building ETL pipelines yourself. This probably breaks down at high scale or with more complicated data transforms, but joining and filtering is all we were looking for.
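For a sense of scale, the join-and-filter transforms we build visually amount to something like the following plain-Python sketch. The field names and the "report_export" action are invented for illustration; Domo's actual ETL operates on warehoused tables, not Python objects:

```python
# Two toy "sources": application users and raw metric events.
app_users = [
    {"user_id": 1, "company": "Acme"},
    {"user_id": 2, "company": "Globex"},
]
metric_events = [
    {"user_id": 1, "action": "report_export"},
    {"user_id": 2, "action": "login"},
    {"user_id": 1, "action": "login"},
]

# Join events to users on user_id, then filter to one action type --
# the two operations our visual ETL flows consist of.
users_by_id = {u["user_id"]: u for u in app_users}
exports = [
    {**users_by_id[e["user_id"]], **e}
    for e in metric_events
    if e["action"] == "report_export"
]
print(exports)  # [{'user_id': 1, 'company': 'Acme', 'action': 'report_export'}]
```

The point is that the logic itself is trivial; the value Domo adds is running it against a managed warehouse on a schedule without us writing or hosting any of it.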

Domo also had the best support for data tables of the options we tried. The tables were quick and responsive, even for large datasets. They support slicing on date ranges (for any timestamp column in your dataset) on-demand, and they support drilling down on any row value we might be interested in.

Date range filtering option on all table views

Given that it met all the requirements we cared about, we’re currently sticking with Domo. In retrospect, the piece we really needed was ETL from multiple data sources, and Domo was the only service we tried that handles ETL for you all the way through to visualization. The user experience was also pretty exceptional the whole way through.

Final Thoughts

Definitely ask yourself if what you need is at least partially an ETL solution to combine your data into one place. That ended up being a huge driver in finding the right solution for us.

This is only a subset of available BI services — Tableau is a powerhouse, and Sisense is another I’ve heard good things about, among many others.

A big thing we learned is that a lot of these solutions are aimed at larger enterprises rather than startups, which makes sense. Enterprises are more likely to have more data to care about, teams of people focused on analysis, and extra hands to build a warehousing solution for all of their data, and they’re also likely to spend more money. But enterprise solutions weren’t what we needed — we wanted something especially straightforward.

There are a lot of options in the BI world. When deciding on a solution, definitely give a variety of services a trial run before declaring the right fit.
