Data-lake Roadmap

About Us

Chaim Turkel
Israeli Tech Radar
Sep 3, 2020

I am part of a software expert company called Tikal (https://www.tikalk.com/). Tikal is a very special company. At the heart of our agenda is combining technology with people and companies. We are experts in the open source community across multiple disciplines (back-end [data engineers, distributed architects and more], front-end, mobile, DevOps).

I would like to share with you some of our processes to achieve this.

Tech Radar

As a group leader at Tikal, I work along two main prongs. The first is to monitor the market: we need to keep abreast of the latest innovations in the field and of the status of the different open source projects that exist.

For this we have a circle of people called the “tech circle” that meets every few weeks to go over what is new in the market. In addition, every quarter we update our public tech radar (have a look at: https://www.tikalk.com/community/radar/).

Personal Roadmap

The second prong is the personal advancement of each employee. This means that we sit with each person and tailor a personal roadmap based on his knowledge and the moving trends of the market. The roadmap includes learning new material and applying it in the field. After the learning phase, we believe that each person should contribute some of what he has learned back to the community. This can be done in various ways, including internal and external lectures, meetups, workshops and more.

Group Roadmap

Our latest addition is the group roadmap. In this circle we look at which areas of expertise are missing or lacking in our group as a whole.

The area we identified this time is data. We have data expertise in our group, but we felt that the knowledge had not spread through the group. Since data is a very big word and the field is very wide, we decided to focus the first quarter on the data-lake.

Data-Lake Roadmap

Similar to the personal roadmap, all group leaders within the back-end sat together to decide which areas we should focus on. We then set out to define the different areas of learning.

The first step was to create a mind-map with the items that we felt should be covered. As you can see below, this includes the ingestion phase, file formats, storage structure issues and more.

Once we had decided on the topics, we found the appropriate people from within the group to do the teaching.

The major platform for learning within the group is bi-weekly and monthly meetings. We try to meet online once a week for an hour, and in person (until the coronavirus) once a month for 3–4 hours.

To give you a taste of what we did, the itinerary for our roadmap was as follows:

27.4.20 — Roadmap Intro

4.5.20 — Introduction to data engineering — Part 1

11.5.20 — Introduction to data engineering — Part 2

25.5.20 — Introduction to Data Lake

1.6.20 — Project sharing

8.6.20 — Big Data File Formats

22.6.20 — Apache Pulsar

6.7.20 — Slowly changing dimension

13.7.20 — Project sharing

20.7.20 — Data Lake Ingestion Architecture

27.7.20 — Project/Client Infra Sharing — BigQuery

3.8.20 — Being a Data Engineer — Part 1

10.8.20 — Being a Data Engineer — Part 2

As you can see, we tried to mix a few dimensions into the learning. After introducing both the group roadmap concept and what it means to be a data engineer, we dove into some technical talks. In addition, a few people who are currently working on data projects brought their architecture to the group for learning and review.

For the finale we had two sessions on the soft skills of the job, to understand which interpersonal skills are needed specifically for a data-oriented role.

Augment frontal learning with hands-on

We believe that frontal learning is not enough, so for a smaller group that wanted to deepen their understanding of the data world, we created a data-lake course. In this course we went over the ingest architecture and held 4 hands-on workshops that included ingesting tweets using Flink and analyzing them with Spark. We wrapped up the course by learning Presto and trying it out to query the data.

Summary

I have to say, we were all very happy with the results, and we are currently doing a group retrospective of the process so that we can improve for the next quarter.

To summarize the flow of knowledge between circles, we used the following diagram:

Our goal is to have all the circles feed into each other, so that we can continue to learn new technologies and bring them to the companies we work with and to the community.
