How To Structure Your Git Branching Strategy — By A Data Engineer
Data pipelines require version control too!
If you’ve ever worked on code collaboratively, you understand the importance of version control and branching strategies. These are the key tools that allow multiple developers to work on a project in parallel. Without them, developers overwrite each other’s changes and your product is very likely to break.
For those unfamiliar with version control and branches, here is a short summary: version control is the practice of tracking and managing changes to your source code. It allows developers to clone, work on, and deploy code without interfering with other developers’ work.
Branches are simply parallel versions of your source code. They are useful for separating code that is still in development from the working, stable code that runs in production environments.
You’ve probably heard of the DEV, UAT, and MASTER branches that software engineers and developers use. But have you ever come across a branching strategy for Data Engineers and Data Scientists?
Instead of a product, Data Engineers and Data Scientists build and maintain data warehouses. Data Scientists do build data products, but they often cannot do so until a stable data warehouse is in place to gather the data.