Why We Invested in TileDB

Abhishek Sharma
Conversation with Nexus
2 min read · Mar 15, 2018

We connected with Dr. Stavros Papadopoulos, Founder & CEO of TileDB, Inc., while he was considering spinning TileDB out of the Intel Science and Technology Center for Big Data at MIT. His mission of “making access to massive amounts of array data fast and easy” instantly resonated with us, and we knew we had to partner with Stavros. Nexus Venture Partners recently co-led the seed round of TileDB, Inc., which enhances, maintains, and commercializes the open-source scientific data management system TileDB. The company will also provide tools to manage TileDB clusters, as well as popular interfaces.

Data generation is exploding inside enterprises, presenting challenges around how to optimally store, manage, retrieve, and prepare it in the appropriate format for high-performance analytics and data-science applications.

The enormous volumes of data arising from scientific and engineering applications can be naturally represented as multidimensional arrays. These arrays can be sparse (most of the elements are empty) or dense (every element has a value). Example datasets include gene sequences, satellite images, medical images, geo-locations, LIDAR data, and social media interactions.
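To make the sparse/dense distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the two representations, not TileDB's storage format or API; the names `dense`, `sparse`, `sparse_shape`, `dense_get`, `sparse_get`, and `fill_ratio` are all hypothetical.

```python
# Dense array: every cell holds a value, so a plain nested list is enough.
# (Think of a small image where each pixel has an intensity.)
dense = [
    [12, 15, 11],
    [14, 13, 16],
]

# Sparse array: most cells are empty, so we store only the non-empty cells,
# keyed by their (row, column) coordinates, plus the logical shape.
sparse = {
    (0, 2): 7,
    (999, 4): 3,
}
sparse_shape = (1000, 1000)


def dense_get(array, row, col):
    """Read a cell from a dense array by direct indexing."""
    return array[row][col]


def sparse_get(cells, row, col, empty=None):
    """Read a cell from a sparse array; empty cells return `empty`."""
    return cells.get((row, col), empty)


def fill_ratio(cells, shape):
    """Fraction of the logical cells that actually hold a value."""
    total = 1
    for extent in shape:
        total *= extent
    return len(cells) / total
```

In this sketch the sparse array describes a million logical cells but stores only two of them, which is why sparse and dense data usually call for different storage layouts.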

A single universal storage manager that is optimized for both dense and sparse data forms the core of TileDB’s data management system. It runs on Linux, macOS, and Windows, and can also be deployed in Docker containers.

TileDB focuses on maximizing the performance of large-scale analytics applications. It gives data scientists and developers complete flexibility in their choice of backend, interface, application type, and data type (sparse or dense, array dimensionality, schema).

TileDB works with a growing set of storage backends, including HDFS and object stores (Minio, AWS S3, Ceph). It offers optimizations tailored to the mechanics of each backend while exposing a unified API that works the same way regardless of where the array data is stored. TileDB also offers interfaces for a growing number of programming languages and tools, including C, C++, and Python. Conventional approaches treat storage management as a set of specialized packages tied to specific applications and data formats; TileDB instead eliminates the need for different custom storage managers, while providing excellent performance and increased productivity.
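The idea of one API in front of several storage backends can be sketched as follows. This is a hypothetical illustration of the general pattern, not TileDB's actual API: `StorageBackend`, `InMemoryBackend`, and `ArrayStore` are invented names, and each backend is simulated with an in-memory dictionary so the example runs without HDFS or S3.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical interface that every storage backend implements."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real backend (HDFS, S3, ...); keeps blobs in a dict."""

    def __init__(self, name: str):
        self.name = name
        self._blobs = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = bytes(data)

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class ArrayStore:
    """Single entry point: picks the backend from the URI scheme."""

    def __init__(self, backends):
        self._backends = backends  # maps URI scheme -> StorageBackend

    def _resolve(self, uri: str):
        parsed = urlparse(uri)
        scheme = parsed.scheme or "file"  # bare paths default to local files
        return self._backends[scheme], parsed.netloc + parsed.path

    def write(self, uri: str, data: bytes) -> None:
        backend, key = self._resolve(uri)
        backend.write(key, data)

    def read(self, uri: str) -> bytes:
        backend, key = self._resolve(uri)
        return backend.read(key)


# The calling code is identical no matter where the data lives.
s3 = InMemoryBackend("s3")
hdfs = InMemoryBackend("hdfs")
store = ArrayStore({"s3": s3, "hdfs": hdfs, "file": InMemoryBackend("file")})
store.write("s3://bucket/arrays/genome", b"\x01\x02")
store.write("hdfs://namenode/arrays/lidar", b"\x03")
```

In a real system each backend class would also be the natural place for storage-specific optimizations, while application code keeps calling the same `read` and `write`.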

It offers massive parallelization and, in its early deployments, is delivering orders-of-magnitude better performance than existing alternatives. Data scientists will benefit greatly from this unified model for storing and working with large sparse and dense arrays.

We at Nexus have long believed in the value of data and in the emerging opportunities in data management. TileDB will power many applications that require dramatically faster compute over colossal amounts of data. We’re very excited to partner with Stavros, Jake, and Tyler on the TileDB journey!

Abhishek Sharma
Conversation with Nexus

early-stage enterprise software vc; managing director @nexusvp