The Road to an Agile Data Analytics Solution
Like most tech companies that want to grow quickly and keep improving their product and operations, we realized early on that we needed to be as data-driven as possible. That meant finding the best ways to collect, store and analyze data, and building an agile analytics solution that would grow along with us. In this blog post I’m going to share our decision-making process as we searched for the best systems on the market, and give some tips for anyone currently looking for an agile data solution.
Five years ago, Wibbitz was an early-stage startup that needed to measure the performance and usage of our product, and everything else around it, but lacked the resources to build such a solution on our own. Although Google Analytics was built for website analytics, it offered a generous free tier and simple implementation, so we decided to use it not only for our website but also for our product, which at the time consisted of a web application and a video player. This required some tweaking in how we measured and reported the data.
As the company grew quickly, we soon reached the Google Analytics free-tier limit and chose the fastest upgrade path, GAP (Google Analytics Premium). During this shift we also redefined most of our events using custom dimensions and metrics.
The goal of data is to help different teams in the company make the right decisions — so having a clear dashboard and easy way to query data is critical.
We noticed we were spending too much effort adjusting our processes and defining our events to fit Google Analytics’ constraints. Custom dimensions and metrics are hard to query and limit the ability to filter reports, and you cannot add calculations on dimension or metric columns. Other issues included sampled data and the inability to query real-time data.
Choosing the best agile solution
We decided to look for a different solution and defined the characteristics we were looking for:
- Agile system that can adapt to the company’s changing needs
- Handles over 3 billion events per month
- Lets us easily share data internally and with customers
- Connects data from multiple sources
- Stores raw data for up to one month and keeps older data aggregated
- Events available in under 5 minutes for querying (useful when monitoring gradual releases)
- Events have standard enrichment (Geolocation and Device) and user session information
- We own the data
- Easy for our team to implement
- Affordable pricing
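Two of the requirements above are quantitative, so it can help to turn them into numbers a pipeline vendor can be measured against. The following Python sketch assumes a 30-day month and treats "under 5 minutes" as a strict limit; it computes the average ingest rate implied by 3 billion events per month and checks whether a single event became queryable in time. The constant and function names are illustrative only.

```python
from datetime import datetime, timedelta

MONTHLY_EVENTS = 3_000_000_000          # "over 3 billion events per month"
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month
FRESHNESS_LIMIT = timedelta(minutes=5)  # "available in under 5 minutes for querying"


def average_events_per_second(monthly_events: int = MONTHLY_EVENTS) -> float:
    """Average ingest rate implied by a monthly volume; peak rates will be higher."""
    return monthly_events / SECONDS_PER_MONTH


def meets_freshness(event_time: datetime, queryable_time: datetime) -> bool:
    """True if an event became queryable less than FRESHNESS_LIMIT after it happened."""
    return queryable_time - event_time < FRESHNESS_LIMIT


print(f"{average_events_per_second():,.0f} events/second on average")
# prints: 1,157 events/second on average
```

That average is only a floor: traffic is uneven over a day, so a pipeline has to absorb peaks well above roughly 1,157 events per second.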
Analytics solutions consist of the following subsystems:
- Data pipeline solutions
- Data warehouse solutions
- Analysis & presentation solutions
Agile analytics solutions have loosely coupled subsystems and support many protocols and integrations.
The advantages of having an agile solution:
- It is easy to replace or upgrade every subsystem separately.
- You can have different presentation tools for different roles in your organization that connect to the same data warehouse.
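The loose coupling described above can be sketched as a set of narrow interfaces: the pipeline only has to hand over events, the warehouse only has to load and query them, and each presentation tool only talks to the warehouse. The following Python sketch uses hypothetical in-memory classes as stand-ins for real services; none of these names correspond to a vendor API.

```python
from typing import Iterable, List, Protocol

Event = dict  # an analytics event, represented as a plain dict for illustration


class Pipeline(Protocol):
    """Anything that can hand over a batch of collected events."""
    def collect(self) -> Iterable[Event]: ...


class Warehouse(Protocol):
    """Anything that can store events and answer simple queries."""
    def load(self, events: Iterable[Event]) -> None: ...
    def query(self, name: str) -> List[Event]: ...


class InMemoryPipeline:
    """Hypothetical stand-in for a hosted pipeline service."""
    def __init__(self, events: List[Event]) -> None:
        self._events = list(events)

    def collect(self) -> Iterable[Event]:
        return iter(self._events)


class InMemoryWarehouse:
    """Hypothetical stand-in for a warehouse such as Redshift."""
    def __init__(self) -> None:
        self._rows: List[Event] = []

    def load(self, events: Iterable[Event]) -> None:
        self._rows.extend(events)

    def query(self, name: str) -> List[Event]:
        return [row for row in self._rows if row.get("name") == name]


def run_pipeline(pipeline: Pipeline, warehouse: Warehouse) -> None:
    """Move events from any pipeline into any warehouse; neither depends on the other's type."""
    warehouse.load(pipeline.collect())


# Two presentation "tools" for different roles, both reading the same warehouse.
def analyst_view(warehouse: Warehouse) -> List[Event]:
    return warehouse.query("video_view")


def executive_view(warehouse: Warehouse) -> int:
    return len(warehouse.query("video_view"))
```

Replacing the pipeline or the warehouse then means writing one new class with the same methods, while `run_pipeline` and both views stay unchanged.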
Data pipeline solutions
Data pipeline solutions move and process data from different sources to your data warehouse. Here are some of the solutions we looked at:
- Used as a hub that can push your data to many services
- Based on specific DB schema but can be customized
- Data stored in AWS S3 and can be replayed. No pause/play support
- Data stored in AWS S3. Supports automatic pause in defined cases, after which the data can be replayed.
- Code engine supports real-time calculations, alerts and data enrichment.
- Available client SDKs
- Requires hosting the server
- Real-time loading of data into Elasticsearch, 15–60 minute loading into Redshift
- Available client SDKs
- Data hosted and managed by Panoply utilizing storage/model optimization algorithms
- Data stored in Elasticsearch, Redshift and AWS S3
- JS SDK or REST API endpoint for sending data with no data enrichment support
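The requirement for user session information, together with the enrichment features listed above, implies a transform step somewhere in the pipeline. The following Python sketch shows one common way to derive sessions: a user's events belong to the same session until there is a gap longer than an inactivity timeout. The 30-minute timeout and the `user_id`/`timestamp` field names are assumptions, not details from our setup or from any of the products above.

```python
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session (a common convention, not from the post).
SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(events: list[dict]) -> list[dict]:
    """Return copies of the events, in time order, each with a 'session_id' field.

    A user's new session starts when the gap since that user's previous event
    exceeds SESSION_TIMEOUT. Expects 'user_id' and 'timestamp' (datetime) keys.
    """
    last_seen: dict[str, datetime] = {}  # user -> timestamp of their latest event
    session_count: dict[str, int] = {}   # user -> number of sessions started so far
    enriched = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        user, ts = event["user_id"], event["timestamp"]
        previous = last_seen.get(user)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            session_count[user] = session_count.get(user, 0) + 1
        last_seen[user] = ts
        # Copy the event so the caller's original dicts are left untouched.
        enriched.append({**event, "session_id": f"{user}-{session_count[user]}"})
    return enriched
```

In a streaming pipeline the same state (last event time per user) would be kept between batches rather than rebuilt on every call.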
Data warehouse solutions
We were already using Redshift as our data warehouse. Another option was Google’s BigQuery, which might have lower maintenance requirements, but we decided to stick with Redshift to save money and resources.
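Whichever warehouse is used, the retention requirement (raw data for up to one month, older data aggregated) comes down to a periodic rollup job. The Python sketch below shows that logic in memory, assuming "one month" means 30 days and that daily counts per event name are a sufficient aggregate; the `timestamp` and `name` field names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# "Up to one month" of raw events, assumed here to mean 30 days.
RAW_RETENTION = timedelta(days=30)


def apply_retention(events: list[dict], now: datetime) -> tuple[list[dict], Counter]:
    """Split events into recent raw rows and a daily rollup of older ones.

    Returns (raw_events, rollup), where rollup maps (day, event_name) to a count.
    Expects 'timestamp' (datetime) and 'name' keys on every event.
    """
    cutoff = now - RAW_RETENTION
    raw: list[dict] = []
    rollup: Counter = Counter()
    for event in events:
        ts = event["timestamp"]
        if ts >= cutoff:
            raw.append(event)  # still inside the raw retention window
        else:
            rollup[(ts.date(), event["name"])] += 1  # keep only the aggregate
    return raw, rollup
```

Inside Redshift itself, the same job would typically be a scheduled `INSERT ... SELECT ... GROUP BY` into a summary table followed by a `DELETE` of the expired raw rows.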
Analysis & presentation solutions
We looked at the following solutions:
- SaaS. Gives power to the data analyst building the views, while view users have limited customization options
- Easy to share data and supports embedded views
- Priced by the amount of connected data
- Great for data research — easy data visualization and manipulation using drag & drop
- Sharing views (not reports) requires the viewer to have an online user license
- Priced per user (Desktop & Online)
- Requires hosting the server
- Relatively high price tag
- Joining data from several sources requires loading them all into one data warehouse
- No embedded views
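The note about joining data from several sources is a good illustration of why the warehouse sits at the center of this architecture: once product events and, for example, customer records are loaded into the same database, a single SQL join combines them. The following Python sketch uses the standard library's `sqlite3` as a stand-in for Redshift; the table names, columns and data are made up for illustration.

```python
import sqlite3


def views_per_customer() -> list[tuple[str, int]]:
    """Load two sources into one database and join them with plain SQL.

    sqlite3 stands in for the warehouse; tables, columns and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    # Source 1: product events, as a data pipeline might load them.
    conn.execute("CREATE TABLE events (customer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "video_view"), (1, "video_view"), (2, "video_view"), (2, "click")],
    )
    # Source 2: customer records, e.g. exported from a CRM.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, company TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    # Because both sources live in the same database, one query can join them.
    rows = conn.execute(
        """
        SELECT c.company, COUNT(*) AS views
        FROM events e
        JOIN customers c ON c.id = e.customer_id
        WHERE e.name = 'video_view'
        GROUP BY c.company
        ORDER BY c.company
        """
    ).fetchall()
    conn.close()
    return rows
```

A presentation tool pointed at that single warehouse can run the same query, which is much harder when the two sources sit in separate systems.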
We decided to base our system on Alooma’s data pipeline, Redshift’s data warehouse, and Periscope Data’s presentation solution with the option of adding some Tableau seats for our data analysts.
Alooma’s web-based management interface is powerful and easy to use. With easy schema mapping, real-time data manipulation and enrichment, and a friendly, responsive support team, it makes for a great solution.
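Schema mapping, mentioned above, means deciding which warehouse column and type each incoming event field lands in. The Python sketch below is a hypothetical illustration of that idea, not Alooma's API: the short raw field names and the mapping table are invented, and unknown fields are set aside rather than silently dropped.

```python
from datetime import datetime, timezone

# Hypothetical mapping: raw event field -> (warehouse column, conversion function).
SCHEMA_MAPPING = {
    "evt": ("event_name", str),
    "ts": ("event_time", lambda v: datetime.fromtimestamp(int(v), tz=timezone.utc)),
    "vid": ("video_id", str),
    "dur": ("watch_seconds", float),
}


def map_event(raw: dict) -> dict:
    """Rename and type-convert known fields; collect unknown fields under '_unmapped'."""
    row: dict = {}
    unmapped: dict = {}
    for key, value in raw.items():
        if key in SCHEMA_MAPPING:
            column, convert = SCHEMA_MAPPING[key]
            row[column] = convert(value)
        else:
            unmapped[key] = value
    row["_unmapped"] = unmapped
    return row
```

Keeping unmapped fields visible makes it easier to notice when a client starts sending a new field that the mapping does not cover yet.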
Periscope Data proved to be great for sharing and embedding views, as it gives power to the data analyst building the views and dashboards while leaving fewer customization options to the end viewer. Plus, views load quickly thanks to an internal cache, and you can see when a view was last updated and trigger an immediate refresh.
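The caching behavior described above (fast loads from a cached result, a visible last-updated time, and an on-demand refresh) can be sketched as a small class. This is a hypothetical illustration of the general pattern, not how Periscope Data implements it; the `max_age_seconds` parameter and the injectable clock are assumptions made to keep the example testable.

```python
import time
from typing import Any, Callable, Optional


class CachedView:
    """A dashboard view that serves a cached query result.

    Hypothetical illustration of the pattern, not Periscope Data's implementation.
    """

    def __init__(
        self,
        run_query: Callable[[], Any],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query
        self._max_age = max_age_seconds
        self._clock = clock
        self._result: Any = None
        self.last_updated: Optional[float] = None  # shown to viewers as "last updated"

    def refresh(self) -> Any:
        """Re-run the query immediately, regardless of how old the cache is."""
        self._result = self._run_query()
        self.last_updated = self._clock()
        return self._result

    def get(self) -> Any:
        """Return the cached result, re-running the query only if it is missing or stale."""
        if self.last_updated is None or self._clock() - self.last_updated > self._max_age:
            return self.refresh()
        return self._result
```

Viewers normally hit `get()` and see cached data instantly; `refresh()` is the "update now" button for when the cached result is too old for the question at hand.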
How long will this setup last? It’s hard to tell. But with fast implementation and the ability to upgrade each part separately, we have everything we need to get started.
If you want more advice, you can reach me at firstname.lastname@example.org or on Twitter (@UriNM).
Looking for a chance to overcome challenges with creative solutions like this one, alongside a talented and results-driven team of developers? Wibbitz is hiring in our Tel Aviv office!