Concepts, techniques and papers to build a time-series database.
Below is a list of the concepts, books and techniques that helped us most while developing S1search, the database engine that powers SlicingDice.
We all know that developing a database is one of the craziest things to do nowadays (although highly talented developers who have done it before recommend the experience). That is not only because there are plenty of well-established solutions on the market, but also because it is technically very challenging. So we are trying to give a little help here.
Important Concepts
Here we list some important topics about database architecture, consistency concepts and compression techniques. They are not ranked or ordered by importance or priority.
— Architecture concepts
— Database-related concepts
- Column-oriented database
- Eventual Consistency
- ACID
- Shard
- Inverted Index and Posting Lists
- Bitwise Operations
- Linked Data Structure
- Binary Tree
- Bitmap Index
- Skip Lists
- Consistent Hashing
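Several of the concepts above fit together: an inverted index maps each value to the list of rows containing it (its posting list), and a bitmap index stores that list as a set of bits so that boolean queries become bitwise operations. Below is a minimal sketch in Python, assuming a toy in-memory table. The names `BitmapIndex`, `add` and `query_and` are hypothetical and chosen only for illustration; this is not how S1search or any particular engine implements it, and real systems compress the bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer bitmap per (column, value) pair.

    Bit i of a bitmap is set when row i holds that value.
    Hypothetical illustration only; bitmaps here are uncompressed.
    """

    def __init__(self):
        self.bitmaps = defaultdict(int)  # (column, value) -> bitmap as int
        self.row_count = 0

    def add(self, row):
        """Index one row (a dict of column -> value) and return its row id."""
        row_id = self.row_count
        for column, value in row.items():
            self.bitmaps[(column, value)] |= 1 << row_id
        self.row_count += 1
        return row_id

    def query_and(self, *conditions):
        """Return sorted row ids matching every (column, value) condition."""
        result = (1 << self.row_count) - 1  # start with every row selected
        for condition in conditions:
            # Unknown values match no rows, so they contribute an empty bitmap.
            result &= self.bitmaps.get(condition, 0)
        return [i for i in range(self.row_count) if (result >> i) & 1]


index = BitmapIndex()
index.add({"country": "BR", "device": "mobile"})   # row 0
index.add({"country": "US", "device": "mobile"})   # row 1
index.add({"country": "BR", "device": "desktop"})  # row 2
index.add({"country": "BR", "device": "mobile"})   # row 3

matches = index.query_and(("country", "BR"), ("device", "mobile"))
print(matches)
```

Here the query intersects the bitmap for `country = BR` (rows 0, 2, 3) with the one for `device = mobile` (rows 0, 1, 3) using a single bitwise AND, so `matches` is `[0, 3]`.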
— Database compression concepts/techniques
Courses, Books and others
Below are some courses, books, videos and conferences related to database development (mostly time-series). They are not ranked or ordered by importance or priority.
— Courses
- Stanford SQL Course
- In-Memory Data Management 2017
- CMU course on Advanced Database Systems
- CMU Time Series Database Lectures
- CMU The Databaseology Lectures
- Principles of Database Management
— Books
- Principles of Database Management
- Introduction to Information Retrieval
- Refactoring Databases: Evolutionary Database Design
- Joe Celko’s Trees and Hierarchies in SQL for Smarties
- Data Structures and Algorithms in Java
- Database System Concepts
- The Data Warehouse Toolkit
— Database-related Videos
- Jens Dittrich Videos
- The Architecture of a Distributed Analytics and Storage Engine for Massive Time-Series Data
— Conferences
Papers, lots of great papers
Below are some database architecture and compression papers. They are not ranked or ordered by importance or priority.
— Some Interesting Database Papers
- Optimizing Storage System Design for Timeseries Processing [PDF]
- The Design and Implementation of Modern Column-Oriented Database Systems [PDF]
- Comparison of Advance Tree Data Structures [PDF]
- Designing Fast Architecture-Sensitive Tree Search on Modern Multicore/Many-Core Processors
- Druid — A Real-time Analytical Data Store [PDF]
- Gorilla: A Fast, Scalable, In-Memory Time Series Database [PDF]
- Scuba: Diving into Data at Facebook [PDF]
- The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database [PDF]
- Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index [PDF]
- Accelerating Analytics with Dynamic In-Memory Expressions [PDF]
- A Performance Comparison of bitmap indexes [PDF]
- Check Lucene design papers too
— Some Interesting Data Compression Papers
- Integrating Compression and Execution in Column-Oriented Database Systems
- Optimal Space-time Tradeoffs for Inverted Indexes
- Query Optimization In Compressed Database Systems [PDF]
- Compact Storage of Binary Trees
- Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps [PDF]
- An Accuracy-Aware Compression Technique for Multidimensional Data Cubes
- Optimizing Query Execution for Variable-Aligned Length Compression of Bitmap Indices
- Performance of Compressed Inverted List Caching in Search Engines
- Document Identifier Reassignment and Run-Length-Compressed Inverted Indexes for Improved Search Performance
- Factorization-based Lossless Compression of Inverted Indices
- Compression of Inverted Indexes For Fast Query Evaluation
- Check Lucene compression papers too
Other databases to look at for inspiration
- Apache Arrow
- Apache Drill
- Apache Lucene (not a database per se, but great inspiration)
- Apache Parquet
- BTrDB paper
- Cityzen Data
- Druid
- ElasticSearch
- Infiniflux
- kdb+
- Pinot
- Pulsar
- Rocana
- SnappyData
- TempoIQ