Databases have been a topic that we have wanted to tackle for a while. We are not talking here just about relational vs unstructured, or just about types of databases. The focus of the roadmap is data: how do we store and retrieve it, especially when we are talking about big data.
Life is easy at the beginning of each project. The data can be saved in almost any storage without much thought.
But once the project is in production, we start to understand the real dimensions of the data. Then we need to sit down and rethink where to store the data, what transformations need to be done, and how to retrieve the data for the application in a timely fashion.
The classic dataflow is as follows:
All data is received by some API gateway. The data is then sent to a queue (Kafka, Pulsar …). From the queue, the first step is to save the data in a data lake. The data lake will be our source of truth for all data. Of course, within the data lake we will have different layers of governance (bronze, silver, gold). From the lake we will then build a BI layer to allow both unstructured and structured analysis of the data. Finally, we will need to refine the data so that we can serve it after it has been enriched and filtered.
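To make the flow above concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `queue.Queue` stands in for Kafka or Pulsar, a plain dictionary stands in for the data lake, and the function name `run_pipeline` and the `user`/`amount` fields are hypothetical. It also assumes one common reading of the layers, where bronze holds raw payloads, silver holds validated records, and gold holds aggregates ready to be served.

```python
import json
import queue


def run_pipeline(payloads):
    """Push raw payloads through gateway -> queue -> bronze -> silver -> gold."""
    # Hypothetical stand-ins: a real system would use Kafka/Pulsar and object storage.
    events = queue.Queue()
    lake = {"bronze": [], "silver": [], "gold": []}

    # API gateway: accept each payload and hand it to the queue.
    for payload in payloads:
        events.put(payload)

    # Bronze: store every payload unchanged, so the lake stays the source of truth.
    while not events.empty():
        lake["bronze"].append(events.get())

    # Silver: parse and filter, keeping only well-formed records.
    for raw in lake["bronze"]:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed input stays in bronze only
        if isinstance(record, dict) and "user" in record and "amount" in record:
            lake["silver"].append(record)

    # Gold: aggregate per user, ready to be served to the application or BI layer.
    totals = {}
    for record in lake["silver"]:
        totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]
    lake["gold"] = [{"user": u, "total": t} for u, t in sorted(totals.items())]

    return lake


if __name__ == "__main__":
    result = run_pipeline([
        '{"user": "a", "amount": 2}',
        "not json",
        '{"user": "a", "amount": 3}',
    ])
    print(result["gold"])
```

The point of keeping the malformed payload in bronze is that the raw layer can always be reprocessed if the validation rules change later.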
Part of the roadmap was to go over the different types of databases. The idea was not to learn about a specific database but to understand the strengths and weaknesses of each database solution. Each database tries to address some issue related to either ACID guarantees or the CAP theorem.
We went over the main questions that you need to ask when choosing a database:
- What is the structure of the data?
- Is the bulk of the traffic writing data or reading it (read vs write ratio)?
- How often does the data change (updates)?
- How much data do you need to store?
- What are the consistency constraints (for example, is it a bank application that must have transactions)?
Like every roadmap, we start with an introduction to the topic. We then choose topics that we feel will give those with less knowledge of the subject an easy way in. So in the database field, we felt it essential to understand how indexes are created and used in databases. We then had a session with an overview of the different types of databases and the major vendors.
Anyone who saves data needs to model it, so basic and advanced data modeling is a must.
We then had a panel where we brought different types of use cases so that the group could discuss different solutions for each case.
Since we believe in hands-on learning, to wrap up the roadmap we have a workshop on Elasticsearch so that people can get a practical feel for the theory.
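As a small illustration of how an index is created and used, the sketch below uses SQLite through Python's standard `sqlite3` module. This is an assumption chosen for convenience, not a database covered in the sessions, and the table, column, and index names are hypothetical. It asks SQLite for its query plan before and after creating an index on the looked-up column.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the textual description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

# Without an index, the only option is to scan every row of the table.
plan_before = query_plan(conn, lookup, args)

# Create an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index, the planner can search it instead of scanning the table.
plan_after = query_plan(conn, lookup, args)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should describe a scan of `users` and the second a search that names `idx_users_email`. The trade-off is that every insert or update now has to maintain the index as well, which ties back to the read vs write question above.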
* Database Roadmap Introduction
* Introduction to Indexing
* Databases Overview
* Data Modelling
* Database Panel