How to Enable Your Data Analytics Team to Work in Parallel

Seven Steps to Implementing DataOps: Step 3 — Branch and Merge

In a previous blog post we introduced DataOps, a new approach to data analytics that can put analytics professionals at the center of the company's strategy, advancing its most important objectives. DataOps doesn't require you to throw away your existing tools and start from scratch. With DataOps, you can keep the tools you currently use and love. You may be surprised to learn that an analytics team can migrate to DataOps in seven simple steps. This blog entry covers step 3 of 7.

Imagine a team of developers working on a common code base. The team members each take a copy of the code they need and begin work on their respective enhancements. What if two developers make changes to the same file? How is that tracked? How are conflicts resolved? Software development teams use a tool called a version control system to manage the continuous changes being made to the code. We discussed the advantages of simply using version control in a previous blog post. In this post we discuss a more advanced capability: branching and merging.

If a developer wants to work on a feature, he or she pulls a copy of all relevant code from the version control tool and develops changes on that separate copy. This copy is called a branch. Changes made within a branch do not immediately affect the rest of the team. This is important because code is written gradually and may not work at intermediate stages. A branch also insulates a developer from changes made by other team members and lets the developer decide when to integrate those changes into his or her own work. A merge folds the code changes in a branch back into the main code base.
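To make branching and merging concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not how Git or any other real version control tool works internally: a code base is a dictionary mapping file names to contents, a branch is an independent copy of that dictionary, and a merge compares both sides against their common ancestor (a "three-way merge"). Real tools compare changes line by line; this model works at whole-file granularity. The functions `branch` and `three_way_merge` and the file names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (hypothetical): a code base is a mapping from file name to contents.
Snapshot = Dict[str, str]


def branch(trunk: Snapshot) -> Snapshot:
    """Create a branch: an independent copy, so edits do not touch the trunk."""
    return dict(trunk)


def three_way_merge(base: Snapshot, ours: Snapshot,
                    theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Merge two descendants of a common ancestor `base`, one file at a time.

    Returns the merged snapshot and the names of conflicting files (files
    changed differently on both sides). For a conflict, our version is kept
    (which may be a deletion) so that a person can resolve it by hand.
    """
    merged: Snapshot = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        result: Optional[str]
        if o == t:        # both sides agree (unchanged, same edit, or both deleted)
            result = o
        elif o == b:      # only "theirs" changed this file
            result = t
        elif t == b:      # only "ours" changed this file
            result = o
        else:             # both changed it, in different ways: a conflict
            conflicts.append(name)
            result = o
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


# Two analysts branch from the same trunk and edit different files.
trunk = {"etl.sql": "SELECT * FROM sales", "report.py": "print('v1')"}
alice = branch(trunk)
bob = branch(trunk)
alice["etl.sql"] = "SELECT region, SUM(amount) FROM sales GROUP BY region"
bob["report.py"] = "print('v2')"

# Bob merges first; Alice then merges into the updated trunk,
# using the original trunk as the common ancestor.
trunk_after_bob, _ = three_way_merge(trunk, trunk, bob)
merged, conflicts = three_way_merge(trunk, trunk_after_bob, alice)
print(merged)
print(conflicts)  # []
```

Because each analyst edited a different file, the merge combines both sets of changes with no conflicts, and the original `trunk` is never modified by work done on the branches. If both had edited `etl.sql` differently, the model would report that file as a conflict for a person to resolve.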

Branching and merging allows data analytics professionals to work in parallel

It is common to create a branch for each new feature being developed. A feature branch provides a single place to keep all of the file updates related to a particular feature. When development of the feature is complete, the developer tests the code by running the existing test suite as well as any new tests created specifically for the feature. Once these tests all pass, the developer pulls into the feature branch any code changes and new tests that teammates have merged into the parent branch since the feature branch was created. When the code is integrated and all tests run successfully, the branch is ready to be merged back into the parent branch, or trunk.
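The feature-branch workflow described above is essentially a series of gates. The Python sketch below checks them in order: test on the branch, bring in the parent's newer changes, retest, and only then merge. It rests on simplifying assumptions: a code base is modeled as a dictionary of file contents, and tests are plain Python functions passed in as arguments rather than files stored in the repository. The names `all_pass` and `finish_feature` are hypothetical; in practice the version control tool (for example, `git merge`) and a test runner or CI server carry out these steps.

```python
from typing import Callable, Dict, List

# Toy model (hypothetical): a code base is a mapping from file name to contents,
# and a test is a function that inspects a snapshot and returns True on success.
Snapshot = Dict[str, str]
Test = Callable[[Snapshot], bool]


def all_pass(snapshot: Snapshot, tests: List[Test]) -> bool:
    return all(test(snapshot) for test in tests)


def finish_feature(base: Snapshot, parent: Snapshot, feature: Snapshot,
                   existing_tests: List[Test], new_tests: List[Test]) -> Snapshot:
    """Walk a feature branch through test, integrate, retest, and merge.

    `base` is the parent as it was when the branch was created; `parent` is
    the parent now. Returns the new parent snapshot, or raises RuntimeError
    if any gate fails.
    """
    tests = existing_tests + new_tests

    # 1. Run the existing suite plus the feature's new tests on the branch.
    if not all_pass(feature, tests):
        raise RuntimeError("feature fails its tests; keep working on the branch")

    # 2. Pull in changes teammates merged into the parent since branching.
    integrated = dict(feature)
    for name in sorted(set(base) | set(parent)):
        before, now = base.get(name), parent.get(name)
        if now == before:
            continue  # the parent did not change this file
        if feature.get(name) not in (before, now):
            raise RuntimeError(f"conflict in {name}: resolve it on the branch first")
        if now is None:
            integrated.pop(name, None)  # the file was deleted on the parent
        else:
            integrated[name] = now

    # 3. Re-run every test against the integrated code.
    if not all_pass(integrated, tests):
        raise RuntimeError("integrated code fails its tests; fix it before merging")

    # 4. All gates passed: the integrated branch becomes the new parent.
    return integrated


base = {"etl.sql": "SELECT * FROM sales", "report.py": "print('v1')"}
parent = {**base, "report.py": "print('v2')"}  # a teammate merged first
feature = {**base, "etl.sql": "SELECT region, SUM(amount) FROM sales GROUP BY region"}

existing = [lambda s: "report.py" in s]
new = [lambda s: "GROUP BY" in s.get("etl.sql", "")]

new_parent = finish_feature(base, parent, feature, existing, new)
print(new_parent)
```

The design choice worth noting is step 3: passing tests on the isolated branch is not enough, because the teammate's changes pulled in during step 2 could interact with the new feature. Only code that passes every test after integration is allowed back into the parent.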

Branching and merging can be a major productivity boost for data analytics because it allows team members to change the same source code files in parallel without slowing each other down. Each team member controls his or her own work environment. Branching is also a great way to experiment with new features: developers can run their own tests, make changes, and take risks. No matter how much code a developer breaks within a branch, the rest of the team can continue working unaffected. If an experiment doesn't pan out, the developer can simply discard the branch and start over.

Another key to allowing team members to work well in parallel relates to providing them with an isolated machine and data environment. We’ll discuss environments in a future blog.