Gearing up to go very fast through the clouds.

A Firebolt from the blue

Why we invested in Firebolt

Gil Dibner
Angular Ventures
5 min read · Dec 13, 2020


Last week, we were thrilled to unveil our investment in Firebolt, the world’s fastest cloud-native analytical data warehouse — the second time I’ve been lucky enough to partner with Eldad Farkash at the early stage. This week, we wanted to dive a bit deeper into the thesis behind the investment itself.

So what exactly is Firebolt?

Every once in a while, a company comes along that seeks to completely re-invent an existing industry. Firebolt is one of those companies. Firebolt’s ambition is to reinvent the way that databases are designed and managed — and an implication of this is that Firebolt will ultimately seek to reinvent the way that database infrastructure is delivered, priced, and sold.

Introduction to databases. A database is a collection of information (data) stored on a computer system. Databases perform a set of basic functions known as CRUD (create, read, update, delete). They also perform more sophisticated functions such as queries and transformations. Databases typically fall into one of two categories. Operational (or transactional) databases are used to support the day-to-day operation of applications. For example, when a consumer buys a flight online, that transaction is written into an operational database at the airline. Analytical databases are used to respond to analytical questions (post facto or in real-time). For example, if we wanted to figure out how many flights were sold in a particular month, we would query an analytical database that would read the data and aggregate it into the answer (in this case, the sum) that we are looking for.
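To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical bookings table (the table and column names are illustrative, not from any real airline system). The single-row inserts are the operational, CRUD-style writes; the GROUP BY query is the analytical read that aggregates many rows into an answer.

```python
import sqlite3

# In-memory database with a hypothetical "bookings" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, month TEXT, price REAL)")

# Operational side: each purchase is a small single-row write (the "C" in CRUD).
conn.executemany(
    "INSERT INTO bookings (month, price) VALUES (?, ?)",
    [("2020-11", 120.0), ("2020-11", 95.5), ("2020-12", 210.0)],
)

# Analytical side: scan and aggregate many rows to answer a question
# like "how many flights were sold in a particular month, and for how much?"
rows = conn.execute(
    "SELECT month, COUNT(*), SUM(price) FROM bookings GROUP BY month ORDER BY month"
).fetchall()
print(rows)  # [('2020-11', 2, 215.5), ('2020-12', 1, 210.0)]
```

The same data supports both workloads here, but at scale the access patterns diverge: operational systems optimize for many small writes, analytical systems for large scans and aggregations.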

Historically, the most common form of an analytical database has been a data warehouse. Companies have relied on ETL (extract, transform, and load) processes to extract data from different operational databases and applications, transform and merge that data into a more usable form, and load it into a single unified set of analytical databases. These databases, together with the infrastructure that supports them, are typically called a “data warehouse.” In practice, “data warehouse” is a term with many uses, but generally, it refers to data stored in analytical databases designed to make querying that data more efficient and less expensive — or at least that has been the goal.
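The ETL pattern can be sketched in a few lines. Everything here is hypothetical — the two "source systems" are plain Python lists standing in for an operational bookings database and a CRM export, and sqlite3 stands in for the warehouse — but the three steps (extract, transform and merge, load) are the ones described above.

```python
import sqlite3

# Extract: pull records from two hypothetical operational sources.
bookings = [{"id": 1, "amount_usd": "120.00"}, {"id": 2, "amount_usd": "95.50"}]
crm = [{"booking_id": 1, "customer": "ACME"}, {"booking_id": 2, "customer": "Initech"}]

# Transform: merge the sources on booking id and normalize the amount to a float.
customer_by_id = {c["booking_id"]: c["customer"] for c in crm}
merged = [
    (b["id"], customer_by_id.get(b["id"], "unknown"), float(b["amount_usd"]))
    for b in bookings
]

# Load: write the unified rows into the analytical store (the "warehouse").
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (booking_id INTEGER, customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", merged)
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 215.5
```

Real pipelines add scheduling, incremental loads, and error handling, but the shape — denormalize and clean once so that analytical queries are cheap later — is the same.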

Scale, Speed, Agility, Cost. Owners of data warehouses are engaged in a constant balancing act between scale, speed, agility, and cost. Typically, an improvement along any one of those axes results in a degradation in the others. As datasets get larger, data warehouses become less agile, slower, and more expensive to operate. As a database grows, more storage is needed to hold the data — which is unavoidable but not necessarily very expensive. However, as the data grows, more computational power is necessary to answer the same query. As queries become more complex and varied (“how many flights were sold that arrived at Boston or departed from Boston that involved between two and four people but no children and were booked using frequent flier miles”), databases get slower and slower unless more computational power is added to answer queries in a reasonable time.

The economics of cloud data warehouses are all wrong. With more and more data moving to the cloud, the economics behind these dynamics are getting more and more complex — and less and less manageable for enterprises. Cloud database vendors (Amazon Web Services, Microsoft Azure, Google Cloud Platform, Snowflake) are ultimately in the business of selling infrastructure (storage and CPUs). As a result, as databases get larger, these vendors have no incentive to optimize the way a database handles queries. The result is that data management costs are skyrocketing — a trend with no end in sight. The cloud database vendors see themselves as being in the business of selling “iron” (i.e. infrastructure), as opposed to selling performance. In fact, their incentives are completely opposed to improvements in performance: if a given query were optimized, a customer would be able to spend less on cloud database services.

The cost of data ops. The result of this dynamic is that organizations large and small — whether or not their data is already resident on the cloud — are spending a tremendous amount of money on computing infrastructure to try to achieve high-performance agile databases — and it is not working. The talent required to optimize queries and database architectures in order to keep costs reasonable is extremely expensive. As data becomes ever more important to the enterprise — and as the sheer scale and volume of data increases — this problem and the associated costs are just expected to increase.

Enter Firebolt. This is where Firebolt has decided to attack the cloud database management market. Firebolt’s product consists of an integrated database with a query engine that compiles and continuously optimizes queries on the fly. Unlike any prior database, Firebolt’s database is designed to monitor all queries in real-time and work behind the scenes to optimize those queries automatically, ensuring that the database uses the minimal amount of infrastructure possible to return a result in the fastest possible time. This approach allows Firebolt to charge for consistently outstanding performance as opposed to ever-expanding elastic infrastructure.

To deliver this, Firebolt has built its product with several design goals in mind:

  • Generic. Firebolt is designed to handle any type of data, from structured to unstructured.
  • Expressive. Firebolt will be able to handle any type of query, from the most straightforward to the most complex.
  • Fast. Firebolt has already demonstrated that it can achieve 10x-100x the performance of the nearest competitors — specifically Snowflake and Amazon Redshift.

The next era of cloud-native data processing. Firebolt’s thesis is that the latest advances in analytical database technologies should be leveraged in the cloud. That includes newer advances ranging from data ingestion and storage to vectorized processing and cost-based query optimization, along with older advances like indexing. A corollary here is that making it really easy — trivial, perhaps — for any organization to provide many more users with blazingly fast performance on a much wider set of queries will actually increase the overall use of analytics itself, so that organizations get more value out of their data and become more data-driven. In so doing, Firebolt increases the utility and value of data infrastructure itself. As a result, the market size for data infrastructure, and the speed at which that opportunity can be captured, can dramatically increase from where it is today.
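One of the “older advances” mentioned above, indexing, is easy to illustrate in miniature. This is a toy sketch, not Firebolt’s implementation: a full scan touches every row to answer a lookup, while a simple hash index built once lets each subsequent lookup jump straight to the matching rows.

```python
# Hypothetical rows of (month, value) pairs.
rows = [("2020-%02d" % (i % 12 + 1), i) for i in range(10_000)]

# Full scan: every query walks all 10,000 rows — O(n) per query.
scan_hits = [r for r in rows if r[0] == "2020-03"]

# Index: one pass to build a month -> rows map, then O(1) lookups by key.
index = {}
for r in rows:
    index.setdefault(r[0], []).append(r)
index_hits = index.get("2020-03", [])

assert scan_hits == index_hits  # same answer, very different cost at scale
```

Real warehouse indexes (and the cost-based optimizers that decide when to use them) are far more sophisticated, but the underlying trade — spend a little work up front to make many later queries cheap — is the same one.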

Nearly everything in modern business is being captured in the form of digital data. That digital data is increasingly — overwhelmingly — in the cloud. The data pipelines and systems are in place to make that data accessible for analysis across the modern organization. The problem, however, has been that data warehouse infrastructure has not yet caught up. It is still one or two orders of magnitude too slow for data to actually be analyzed at scale, and at the speed of human thought. Enabling this is the fundamental value proposition of Firebolt — and it’s why we are so excited to be partners on their journey to reinvent the cloud data warehouse.

🔥🔩🚀!


Gil Dibner
Angular Ventures

A global venture investor. Fascinated by the finance of innovation. Trying to help the few to do the impossible. Investing across Europe + Israel.