SlicingDice Architecture & Features (and competitors comparison)

Published in SlicingDice.com Blog · 9 min read · Apr 5, 2017

This post gives a 30,000-foot view of SlicingDice’s most important features and architecture decisions, and compares them with well-known analytics data warehouse solutions such as Amazon Redshift, Google BigQuery, ElasticSearch Cloud and Keen IO.

TL;DR

If you are looking for the feature comparison, here it is.
Scroll down the page for all the details and considerations behind the comparison.

For better visualization, please download the PDF version. [1] — Increasing Redshift capacity; [2] — Increasing ElasticSearch capacity

SlicingDice Features

Forever free /test endpoint — do I need to pay just to test?

You don’t even need to have an account on SlicingDice to test our services.
You can play forever with SlicingDice using the demo API keys on our /test endpoint.

This means that while you are figuring out the data modeling to use for your database, you can use the /test endpoint without paying a cent.

Serverless — how should I dimension my “cluster”?

Forget about infrastructure provisioning. SlicingDice is a serverless solution, which means you don’t need to worry about infrastructure at all: it’s on us.
In order to use SlicingDice you simply need to send data to the /insert endpoint and later query it using the /query endpoint.
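
As a rough sketch of that two-step flow, the snippet below prepares (but does not send) one insert request and one query request with Python’s standard library. Only the /insert and /query paths come from this post. The base URL, the Authorization header and the payload shapes are assumptions made for illustration, so check the API documentation for the real formats.

```python
import json
from urllib.request import Request

# Assumption: placeholder host, not the real SlicingDice API address.
BASE_URL = "https://api.example.invalid"


def build_request(endpoint, payload, api_key):
    """Prepare a JSON POST request for an endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API key travels in an Authorization header.
        "Authorization": api_key,
    }
    return Request(BASE_URL + endpoint, data=body, headers=headers, method="POST")


# Hypothetical payloads: one entity with two columns, then a filter on one of them.
insert_req = build_request("/insert", {"user-1": {"age": 18, "state": "NY"}}, "DEMO_KEY")
query_req = build_request("/query", [{"age": 2, "equals": [18, 19]}], "DEMO_KEY")

print(insert_req.get_method(), insert_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out here because the host above is a placeholder.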

As we describe in the post about our history, we built this platform so you can slice and dice your data as easily as possible and spend your time generating insights and value for your business, not managing infrastructure.

UNLIMITED data storage — how much do I pay for storage space?

Send as much data as you want for each entity; it will not affect your cost.
Thanks to our strong compression and innovative pricing model, you don’t have to worry about how much data you are going to store on SlicingDice.
(Are we crazy? No, we simply fell in love with data compression!)

Up to 10 seconds, or it’s on us — how fast are the queries?

We have a public commitment to answer any query within 10 seconds, no matter its complexity. Each time a query takes longer than that, you get a 10¢ discount on your account.

If your query is slow we believe it’s our responsibility, not yours.
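
To put numbers on that commitment, here is a minimal sketch of how the credit adds up. The 10-second limit and the 10¢ figure come from this post. The function name, the sample durations and the choice to treat a query of exactly 10 seconds as on time are assumptions.

```python
SLA_SECONDS = 10    # response-time commitment stated in the post
CREDIT_CENTS = 10   # discount for each query that misses the commitment


def sla_credit_cents(query_durations_s):
    """Return the total credit, in cents, for queries slower than the SLA."""
    slow = [d for d in query_durations_s if d > SLA_SECONDS]
    return len(slow) * CREDIT_CENTS


# Hypothetical sample: five queries, two of them slower than 10 seconds.
durations = [0.8, 3.2, 12.5, 9.9, 41.0]
print(sla_credit_cents(durations))  # 20
```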

Highly available — how often do outages happen?

As we describe in detail in the SlicingDice Uncovered series of posts, we do everything possible in terms of infrastructure redundancy to NEVER lose any data or become unavailable.

We currently have 3 completely independent data centers from different providers in different countries that operate simultaneously in a high-availability configuration. That means that two data centers can fail and our service will continue to support data insertion and querying.

We invite you to check our Service Status Page and confirm our stability.
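
The redundancy described above can be read as "the service is up while at least one of the three data centers is up". The sketch below shows that rule and the resulting arithmetic. The 99% per-site availability and the assumption that sites fail independently are illustrative only, not figures published by SlicingDice.

```python
def service_is_up(site_states):
    """The service stays up as long as at least one data center is up."""
    return any(site_states)


def combined_availability(per_site, sites=3):
    """Chance that at least one of `sites` independent data centers is up."""
    return 1 - (1 - per_site) ** sites


print(service_is_up([False, False, True]))   # True: two of three sites failed
print(service_is_up([False, False, False]))  # False: every site is down

# Assumption: each site is up 99% of the time, independently of the others.
print(f"{combined_availability(0.99):.6f}")
```

Under those assumptions the three sites together are far more available than any single one, which is the point of running them simultaneously.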

Streaming live data — can it handle my data volume?

As of March 2017, our infrastructure is dimensioned to support as many as 90 BILLION new data insertion operations per day.

So, yes, you can send your data to SlicingDice directly from the source producing it or batch load later, at any time. All data sent to SlicingDice is available for querying within 5 seconds after the API request was received.
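
For a sense of scale, the daily capacity figure can be converted into an average rate. This is plain arithmetic on the number quoted above, assuming the load is spread evenly over the day (real traffic rarely is).

```python
OPS_PER_DAY = 90_000_000_000   # capacity figure quoted in the post (March 2017)
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

ops_per_second = OPS_PER_DAY / SECONDS_PER_DAY
ops_per_minute = OPS_PER_DAY // MINUTES_PER_DAY

print(f"{ops_per_second:,.0f} insertions per second")  # 1,041,667 insertions per second
print(f"{ops_per_minute:,} insertions per minute")     # 62,500,000 insertions per minute
```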

Very easy to start — does it support complex tables and data types?

Although we dynamically identify the type of data you are sending to SlicingDice and automatically create the columns you need, you can still model your data and create columns yourself at any time.

We have support for many data types, and are always adding more.

Flexibility — does it support updates and schema changes?

Absolutely. Columns configured with the (default) option storage=latest-value may have their values updated. As for schema changes, you can create new columns at any time.

It may seem absurd, but you can’t change a table schema on Redshift, and BigQuery has some known issues with data updates.

Simple to query — is the query language complex/difficult?

Making queries on SlicingDice only requires you to write simple JSON, avoiding the verbosity and overhead that SQL brings to your development.
We believe that making a query should be as easy as possible, and that we can’t expect everybody to be an expert in SQL syntax.

Take the SQL query below as an example:

SELECT STATE, AGE, MAX(PURCHASE_VALUE)
FROM
   (SELECT * FROM Transaction WHERE CREATED BETWEEN '2017-01-01T00:00:00Z' AND '2017-02-01T00:00:00Z') T1
JOIN
   (SELECT * FROM Customer WHERE AGE = 18 OR AGE = 19) T2
   ON T2.ID = T1.CUSTOMER_ID
GROUP BY STATE, AGE;

Below is the equivalent query using the SlicingDice API:

[
    {
        "state": 5
    }, {
        "age": 2,
        "equals": [
            18,
            19
        ]
    }, {
        "purchases_value": "max",
        "between": [
            "2017-01-01T00:00:00Z",
            "2017-02-01T00:00:00Z"
        ]
    }
]
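
To make the SQL side of this comparison concrete, here is a small sketch that runs the SQL query from this section against an in-memory SQLite database. The sample rows are made up, and SQLite is used only because it ships with Python. The table name is quoted because TRANSACTION is a keyword in SQLite, an ORDER BY is added so the output order is predictable, and the timestamps are stored as ISO 8601 text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, STATE TEXT, AGE INTEGER);
CREATE TABLE "Transaction" (CUSTOMER_ID INTEGER, CREATED TEXT, PURCHASE_VALUE REAL);
INSERT INTO Customer VALUES (1, 'NY', 18), (2, 'NY', 19), (3, 'CA', 18), (4, 'CA', 40);
INSERT INTO "Transaction" VALUES
  (1, '2017-01-05T10:00:00Z', 30.0),
  (1, '2017-01-20T10:00:00Z', 75.0),
  (2, '2017-01-09T10:00:00Z', 20.0),
  (3, '2017-02-15T10:00:00Z', 99.0),
  (4, '2017-01-11T10:00:00Z', 500.0);
""")

# Same shape as the query in the post: filter each side, join, then aggregate.
rows = conn.execute("""
SELECT STATE, AGE, MAX(PURCHASE_VALUE)
FROM
   (SELECT * FROM "Transaction"
    WHERE CREATED BETWEEN '2017-01-01T00:00:00Z' AND '2017-02-01T00:00:00Z') T1
JOIN
   (SELECT * FROM Customer WHERE AGE = 18 OR AGE = 19) T2
   ON T2.ID = T1.CUSTOMER_ID
GROUP BY STATE, AGE
ORDER BY STATE, AGE
""").fetchall()

print(rows)  # [('NY', 18, 75.0), ('NY', 19, 20.0)]
```

Customer 3 drops out because the purchase falls outside January, and customer 4 because of the age filter.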

JOINing data on queries — can it JOIN multiple data types?

As we also describe in the post about our history, one of the main motivations for building S1Search (our own database engine) was our strong need to JOIN time-series and non-time-series data in a single query.

As you know, most databases are optimized to hold either time-series or non-time-series data, and these optimizations lose their effect when both types of data are mixed.

On SlicingDice you can easily make a query that JOINs two (or more) completely different data types, like time-series and non-time-series, something that is really “expensive” for databases in general.

Data access permission — can I create role-based access to data?

It’s your data, so you must be able to define, in a fine-grained manner, who can access it and for what purposes.

Using the SlicingDice Control Panel you can create as many custom API keys as you want, each configured to allow inserting into and querying specific columns of a database. For instance, you can create a custom API key that is allowed to work with the columns visited-pages, age and gender, but forbidden to query username and address.
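
The idea behind column-level keys can be sketched as a simple allow-list check. The class below is a hypothetical model written for this example, not SlicingDice’s actual key format or enforcement logic. Only the column names come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomApiKey:
    """Hypothetical model of a custom API key; not SlicingDice's real schema."""
    name: str
    allowed_columns: frozenset

    def can_query(self, columns):
        """True only if every requested column is allowed for this key."""
        return set(columns) <= self.allowed_columns


marketing_key = CustomApiKey(
    name="marketing",
    allowed_columns=frozenset({"visited-pages", "age", "gender"}),
)

print(marketing_key.can_query(["age", "gender"]))    # True
print(marketing_key.can_query(["age", "username"]))  # False
```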

Write once, use everywhere — can I save my most common queries?

Wouldn’t it be cool if you could create a query just once and use it everywhere without typing it again and again? What if, whenever you had to update the query, every place using it automagically saw the most up-to-date version?

That’s exactly what you can achieve using the /query/saved endpoint.

Instead of rewriting the entire query whenever you need to analyze your data, simply save it and call it by name. You can also optionally define a cache period for that query, so SlicingDice returns results much faster.

Even better, saved queries can also be created, edited and deleted directly from SlicingDice’s control panel, where you can write queries and check their results in real time.

Scoring analysis within the database — can it do anything else?

Using the /data_extraction endpoint along with the score query type, you can retrieve your data while also receiving a score that indicates data relevance according to the statements in the query. For example, you can make a query to get a score for all your users based on how many times they clicked on the “add to basket” button but didn’t complete a purchase.
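
To illustrate what such a score could look like, the sketch below computes one locally in plain Python. It is not the /data_extraction API or its scoring formula: the event names, the sample data and the rule "one point per add-to-basket click for users with no purchase" are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical event log: (user, event) pairs.
events = [
    ("alice", "add_to_basket"),
    ("alice", "add_to_basket"),
    ("bob", "add_to_basket"),
    ("bob", "purchase"),
    ("carol", "add_to_basket"),
]


def abandonment_scores(events):
    """Score each non-buying user by their number of add-to-basket clicks."""
    clicks = Counter(user for user, event in events if event == "add_to_basket")
    buyers = {user for user, event in events if event == "purchase"}
    return {user: count for user, count in clicks.items() if user not in buyers}


print(abandonment_scores(events))  # {'alice': 2, 'carol': 1}
```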

Moving out from SlicingDice — is there any supplier lock-in?

You can leave us at any time. There is no upfront commitment to use SlicingDice and you can simply delete a database when you wish to.

If you also want to move all your data out of SlicingDice, you can use our /data_extraction endpoint to export it all, without any additional cost.

We are far from being perfect — what are you not telling me?

We are not looking to be a one-stop-shop analytics database that supports all possible requirements. We simply want to be the simplest, fastest and cheapest solution for anyone that needs to store and query analytics-related (and time-series) data.

We neither like nor try to hide what we really are. There are many things that we are really good at, but also things that we are not, and you should be aware of all of them before deciding to use SlicingDice. That includes checking our current restrictions.

Competitors Comparison — April 2017

Okay, fight time. Let’s compare SlicingDice’s features against well-known analytics data warehouse solutions, such as Amazon Redshift, Google BigQuery, ElasticSearch Cloud and Keen IO.

In order to make this comparison as independent as possible, we will be using the data from multiple existing public comparisons between these services to fill this comparison table. All sources are also listed below.

Important notes:
1. We tried our best to understand our competitors’ features and make a fair comparison across all of them. It isn’t a trivial task, so keep in mind that we may have oversimplified in places. Let us know if you disagree with any information in this comparison.
2. We are not really comparing apples to apples here, as Amazon Redshift and ElasticSearch Cloud are not serverless solutions and Keen IO is not a database.
3. Except for Keen IO, all of these solutions have more features and capabilities than SlicingDice, such as SQL support on Amazon Redshift and Google BigQuery.

Sources:
— How good is Google’s BigQuery as compared to Amazon’s Redshift?
— The Hitchhiker’s Guide to Redshift — part 1: With great power come performance issues
— Redshift v. BigQuery: Similarities, Differences and the Serverless Future?
— Which data warehouse should you use?
— Redshift Documentation and Redshift Pricing
— BigQuery Documentation and BigQuery Pricing
— ElasticSearch Documentation and ElasticSearch Cloud Pricing
— Keen IO Documentation and Keen IO Pricing
— Redshift Limits, BigQuery Limits and Keen IO Limits

Price Comparison

Comparing features alone will not give a complete picture of which solution best fits your needs, so we strongly recommend that you also take a look at the price comparison we made for the same services.

Another good feature comparison

The team from Panoply IO did an amazing job comparing multiple aspects of Redshift and BigQuery (in July 2016) and concluded that:

“On almost all fronts we found Amazon Redshift to deliver superior results. Significantly so for usability, performance, and cost for almost all analytical use-cases, especially at scale. And yes, at a glance there are apparent complexities to Redshift, but what it surrenders in terms of simplicity it gains in terms of functionality.”

Final Considerations

If you have a reasonable volume of data, say, dozens of terabytes, that you rarely query, and response times of up to a few minutes are acceptable when you do, then Google BigQuery is an excellent candidate for your scenario.
If you need to analyze a large amount of data (e.g., up to a few terabytes) by running many queries, each of which must be answered very quickly, and you don’t need to keep the data available once the analysis is done, then an on-demand cloud solution like Amazon Redshift is a great fit. But keep in mind that, unlike Google BigQuery, Redshift needs to be configured and tuned in order to perform well.
Although ElasticSearch is very often used to store and query analytics-related data due to its great aggregation capabilities, managing and tuning an ElasticSearch cluster can be a real pain, even when using a cloud version.
As we said before, Keen IO is not a database and does not have all the database capabilities of the other solutions; it is focused on providing an API-based analytics platform to store and process event data.
Although Amazon Redshift and ElasticSearch are currently used by thousands of companies as data warehouses, the only serverless data warehouse solution that competes against SlicingDice is Google BigQuery, as the other solutions are simply cloud versions of a server.

We would love to hear your opinion about this feature comparison. If you have anything to add or to contest, please let us know.

Still not sure if SlicingDice is a good fit for you?

Click here and schedule a 15-minute talk with our developers, totally free of charge, so we can evaluate your case together.