Scaling BI with Custom Visualizations

ajo
4 min read · Dec 1, 2017


Even with the latest in-memory cube technology, your dashboards aren't snappy like Netflix.com or Google.com. Of course, during the proof-of-concept stage, the BI vendor showcased its product in the best light. You were smart, though, and built real-world dashboards to stress test the product. Everything looked great at first, but your dashboards soon degraded to slow load times and the exact opposite of a snappy user experience.

This happens to every BI team. Performance degradation over time is really difficult to anticipate.

In the original test you had anticipated a handful of dashboards driven by even fewer in-memory cubes. Your test may have looked something like this:

In-Memory BI Architecture Proof of Concept

However, over time, as you succeeded, demand also increased. Your production environment now looks something like this:

Successful deployment of In-Memory BI Architecture

Many dozens of cubes now power hundreds of dashboards. It's wonderful to have such high utilization, but you will eventually see your MAU % (monthly active users as a percentage of total possible users) decline. Your platform goes from sort of snappy to taking several minutes to load.

At some point you will have more data volume than available memory. Memory is cheaper than it used to be, but it's not free, and it's still not as inexpensive as disk. This means that for many users, their cube won't already be loaded into memory at request time, and loading a large cube into memory can take tens of seconds to minutes. We recently replaced an in-memory-powered dashboard with a Custom Viz. The in-memory, off-the-shelf dashboard used to take 1 to 2 minutes to load. The Custom Viz, on the other hand, is instantaneous.

Custom Visualizations give you many architectural options. You could store your viz data in a separate data store, let it ride on the same instance as the viz, or use newer open-source in-memory stores like Druid. Each option has different costs and benefits, and I'll go into them in detail in another post. Here I'm going to propose the mode that is easiest to deploy and will work in the majority of cases: the data store rides with the viz on the same machine instance. Our new architecture might look something like this:

BI Architecture with Custom Visualizations

Notice I kept the traditional BI server in place, but with fewer cubes. As I've argued before, custom visualizations might only represent about 30% of total visualizations/reports served, but they will command higher MAUs and MAU %.
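To make the colocated option a little more concrete, here is a minimal sketch of a viz whose data store rides on the same instance. Flask and SQLite are my assumptions for illustration only, and the endpoint, file path, and table are hypothetical; any lightweight web server and local store would work the same way.

```python
# Minimal sketch of the "data store rides with the viz" option:
# the viz's web service and its data store live on the same instance.
# Flask + SQLite are illustrative choices; the path, table, and endpoint
# below are hypothetical.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "/data/viz_metrics.db"  # local disk on the same machine instance


@app.route("/api/daily-revenue")
def daily_revenue():
    # Query the local store directly: no network hop, no shared cube to warm up.
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT day, revenue FROM daily_revenue ORDER BY day"
    ).fetchall()
    conn.close()
    return jsonify([{"day": d, "revenue": r} for d, r in rows])


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Because the data is already sitting next to the viz, every request is served from a warm local store instead of waiting on a large cube to load.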

So, does this cost more?

Let's play out the extreme scenario where everything is a custom viz, the total volume of in-memory data is 400GB, and the average cube is 10GB. Forty constantly running machine instances (400GB / 10GB per viz = 40) should cost a lot more, right?

The best AWS option to house 400GB of data in memory on a single BI server is the memory-optimized r4.16xlarge instance with 195 ECUs and 488GB of memory. At $4.742 per hour and 750 hours per month, that comes out to a monthly total of $3,556.50.

For our Custom Viz I could certainly go with very small instances with 2 ECUs, but we should allocate equivalent compute power. As a result, I'm choosing the r4.large instance with 15.25GB of memory and 7 ECUs. At $0.148 per hour and 750 hours per month times 40 instances, we arrive at $4,440 per month. That's only about 25% more in this extreme scenario. In reality I would choose much smaller instances for my Custom Viz, with fewer ECUs, resulting in lower monthly costs. Even so, my 40-instance scenario affords me 610GB of total memory capacity, which is 25% more than the single-instance scenario. On a per-GB-of-memory basis the price is practically the same (roughly $7.28 per GB for the fleet vs. $7.29 for the single server).
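If you want to sanity-check that arithmetic, here is a short sketch that reproduces the numbers above, using the 2017 on-demand hourly rates quoted in the text:

```python
# Reproduce the cost comparison using the hourly rates quoted above.
HOURS_PER_MONTH = 750

# Single in-memory BI server: r4.16xlarge, 488GB of memory, $4.742/hr
single_monthly = 4.742 * HOURS_PER_MONTH              # ≈ $3,556.50
single_gb = 488

# Custom viz fleet: 40 × r4.large, 15.25GB of memory each, $0.148/hr
fleet_instances = 400 // 10                            # 400GB of cubes / 10GB per viz = 40
fleet_monthly = 0.148 * HOURS_PER_MONTH * fleet_instances  # ≈ $4,440
fleet_gb = 15.25 * fleet_instances                     # 610GB

print(f"single server   : ${single_monthly:,.2f}/mo, ${single_monthly / single_gb:.2f} per GB")
print(f"custom viz fleet: ${fleet_monthly:,.2f}/mo, ${fleet_monthly / fleet_gb:.2f} per GB")
print(f"fleet premium   : {fleet_monthly / single_monthly - 1:.0%} more per month")
print(f"extra memory    : {fleet_gb / single_gb - 1:.0%} more capacity")
```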

You're probably thinking that managing 40 instances will be insanely costly or difficult. In 2017, you can run thousands of instances with a few clicks. I would be surprised if you weren't already running hundreds of machines to power your Big Data infrastructure alone, which would mean you already have the necessary automation and orchestration capabilities. Congratulations if that describes you. If not, now would be a good time to think about moving your infrastructure to the cloud.
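As a taste of how little friction there is, here is a minimal sketch that launches the 40-instance fleet with boto3. The AMI ID, key pair, and tag values are placeholders you would supply yourself, and in practice you would likely wrap this in an Auto Scaling group or your orchestration tool of choice.

```python
# Minimal sketch: launch the 40-instance custom viz fleet with boto3.
# ImageId, KeyName, and the tag values are placeholders, not real resources.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",        # pre-baked image with the viz + local data store
    InstanceType="r4.large",
    MinCount=40,
    MaxCount=40,
    KeyName="your-key-pair",
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "role", "Value": "custom-viz"}],
    }],
)
print(f"launched {len(response['Instances'])} instances")
```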

Finally, you can certainly deploy this architectural strategy with classic BI tools, but I would imagine the licensing costs would make it cost-prohibitive.

Remember to share this post or give me a clap when you celebrate your first Custom Viz launch!

Ajo Abraham is a big data expert known for building beautiful, fast, and high-impact visualizations. For consulting requests, you can email him at ajo@veroanalytics.com.
