Version Control Systems Suck For Hackathons: Here’s Why

Joey Rose
14 min read · Apr 20, 2020


Introduction

Reader, you’ve sat through a redesign of Oxy’s course registration system, as well as the evolution of a chatbot potentially responsible for the downfall of humanity.

Now, I ask you to join my group’s journey to solve an unsolved problem in the field of human-computer interaction: version control systems (VCS) for hackathons.

At this moment you may be thinking:

How is this an unsolved problem?

Didn’t Git solve this like 15 years ago?

Why are ‘patio’ and ‘ratio’ pronounced differently?

Well, in this blog post, I will attempt to answer the majority of these questions.

Before we begin to understand the user need of hackathon attendees for VCS, let us discuss what the heck VCS are and why they’re so important!

What are VCS? Which one are you referring to?

What are VCS?

Defined by Atlassian as software that allows for the management of changes to documents, computer programs, large web sites, and other collections of information, VCS come in all different shapes and sizes. However, they all boil down to two categories:

  • Centralised Version Control Systems (CVCS)
  • Distributed Version Control Systems (DVCS)

While the two may sound vastly different, they’re not. In both systems, users…

  • update or pull from the main repository to update their files
  • commit or push to make changes to files in the repository
  • create branches, or alternate versions of the main source of code, in order to develop other features
  • merge to integrate branches into the main source of code

Furthermore, they’re both asynchronous in nature, meaning that someone’s edits to the codebase do not immediately affect the code of all users.

The main difference lies in the fact that CVCS use a central server to store all files and enable team collaboration on a single repository while DVCS feature a central server/repository and a local repository for every user.

This simple difference has big consequences: because everyone has a local repository in DVCS, users don’t always have to be connected to a network to perform an action, and files are safe if the main repository gets corrupted. As a result, CVCS are rarely used these days by developers, especially hackathon attendees.
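To make that difference concrete, here is a minimal Python sketch. It is a toy model and not how any real VCS is implemented: the class names (Repository, CentralizedClient, DistributedClient) and the online flag are made up for illustration.

```python
class Repository:
    """A toy repository: just an ordered list of commit messages."""

    def __init__(self):
        self.history = []

    def commit(self, message):
        self.history.append(message)


class CentralizedClient:
    """Hypothetical CVCS client: every commit goes straight to the server."""

    def __init__(self, server):
        self.server = server

    def commit(self, message, online):
        if not online:
            raise ConnectionError("a CVCS commit needs the central server")
        self.server.commit(message)


class DistributedClient:
    """Hypothetical DVCS client: commits land locally, push syncs them later."""

    def __init__(self, server):
        self.server = server
        self.local = Repository()
        self.pushed = 0  # how many local commits the server already has

    def commit(self, message):
        self.local.commit(message)  # works with or without a network

    def push(self, online):
        if not online:
            raise ConnectionError("pushing needs the central server")
        for message in self.local.history[self.pushed:]:
            self.server.commit(message)
        self.pushed = len(self.local.history)


server = Repository()
laptop = DistributedClient(server)
laptop.commit("add scraper")  # offline commit succeeds locally
laptop.push(online=True)      # history reaches the server once online
```

In this model the distributed client keeps a full copy of its own history, which is why losing the server would not lose the work.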

Which VCS I’m referring to

Among the DVCS that have standardized the field (Mercurial, Git, and Bazaar), Git is by far the most popular. Together, Git and GitHub make up roughly 44% of the source code management market share, significantly more than the next eight systems combined.

VCS popularity according to Stack Overflow activity (via RhodeCode)

Despite this large gap in popularity, there are only a few things that distinguish the rest of the DVCS from Git. Mainly, Mercurial emphasizes history preservation, meaning that users can only go back one commit and branches are fixed, preventing them from being easily modified or deleted as needed. Furthermore, Bazaar has bound branches, which check if users are up to date with a remote repo before allowing them to make a commit.

Because Git and other DVCS share just about the same properties, while I most likely will only refer to Git, my critiques hold for the rest of the DVCS.

Now that we understand what VCS are, let’s explore the user need of hackathon attendees while using them!

User Need

To assess the user need that hackathon attendees have for VCS, my team reached out to several Oxy students of varying class years, asking them to describe their ideal VCS in a hectic hackathon environment.

Outside of the usual needs of VCS, we received a wide array of needs that all boiled down to the following categories:

  • when to begin next task
  • information necessary when completing task
  • debugging
  • merge conflicts

While they varied immensely in nuances and complexity, they all were a part of a larger user need: to have a means of organizing and integrating correct file versions that also supports the quick workflow required by hackathons.

This is different from the typical need of VCS found in slower, enterprise environments. There, verbosity is prioritized over speed, and the user need for meticulous integration vastly outweighs the need for an efficient workflow.

So it becomes a project management issue.

Evidence That This User Need is Unmet

There are many ways in which current VCS fail to satisfy this user need. Let us evaluate each one!

When to begin next task

During the interview process, one senior computer science and Spanish major stressed that his ideal VCS would address the problem of not being able to see when he could start a task that depended on the completion of another.

We felt like he had a point.

As in all coding projects, many features vital to a product’s success in a hackathon are built on other core features. Because of the stringent time constraints in hackathons, however, a fast turnaround between features is much more important than normal. In hackathons, group members must complete their assigned tasks so other group members can start theirs.

Take this scenario for example:

Amanda and Kelly wish to scrape IMDb data and visualize it in a cool manner. Kelly is the web-scraping expert, so she is tasked with collecting the data. Before Amanda can showcase her data visualization skills, she must know the properties of the dataset (i.e. dimensionality, size, existence of clusters, and context). Because web-scraping takes a while, Amanda must wait until Kelly tells her she’s done.

Because current VCS are asynchronous, Amanda cannot see how far along Kelly is in her web-scraping, and Kelly has no way of conveying that her task is complete other than telling Amanda verbally. More generally, current VCS offer no standardized way of conveying when a task has been completed.

Instead, to ballpark when they can start working on their task, users have two choices: either get up and look at a member’s screen or verbally check in with them, both of which directly hinder their workflow.

When a group member finally completes a task, they must push and then tell their team verbally. This process can become cumbersome over the course of a 24 to 48 hour period, and forgetting to do so can drastically slow down the workflow (since members would not know to start their tasks).

Furthermore, third party project management systems such as Trello that offer tools like task boards, which display the status of each task, still require users to navigate to their platform and manually create and move tasks, which slows down the workflow, hindering the core user need.

Example Trello task board

In addition, several functions may have to be written in order to complete one task. Trello has no built-in dependency management that could track, for example, the completion of certain functions, so group members get no feedback about how close others are to completing a task.
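The kind of dependency tracking missing here can be sketched in a few lines of Python. This is only an illustration under assumptions: the task names and the ready_tasks helper are hypothetical, and no current VCS or Trello feature is being described.

```python
def ready_tasks(dependencies, done):
    """Return unfinished tasks whose prerequisites are all finished."""
    return sorted(
        task
        for task, prerequisites in dependencies.items()
        if task not in done and set(prerequisites) <= set(done)
    )


# Hypothetical task list for the Amanda and Kelly scenario
dependencies = {
    "scrape_imdb": [],
    "explore_dataset": ["scrape_imdb"],
    "build_visualization": ["explore_dataset"],
}

before = ready_tasks(dependencies, done=set())
after = ready_tasks(dependencies, done={"scrape_imdb"})
```

Here before contains only the scraping task, and once it is marked done, after contains the dataset exploration task. The hard part in a real hackathon is keeping done up to date without anyone having to stop and update it by hand.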

While Trello advertises automation via GitHub integration, that integration only attaches pull requests, issues, branches, and commits to their respective cards in a task board; it does not change the status of tasks as they’re being completed.

Example of Trello’s GitHub Integration

While real-time collaborative tools such as Cloud9 and Visual Studio Live Share allow users to view and make live changes to code, users are bound to the same version of each file, meaning that they often can’t compile and run code at any particular time (since other members may be mid-edit). As a result, they are impractical development environments for hackathons, where users often work on the same files concurrently.

Cloud9’s pair programming interface
Visual Studio Live Share’s pair programming interface

Which brings us to our next point!

Information necessary when completing task

In a similar light, a junior computer science and theater major wished that her version control system would show her the most recent code, particularly when working on features concurrently with another group member.

After mulling this over, we agreed that this scenario comes up quite frequently in hackathons. Take designing a backend to a RESTful API for example:

Sarah wants to write the backend to the REST API that she and Bobby are making for their web application. Sarah needs to know the names of the elements in the form Bobby is creating, as well as how far along in the process Bobby is, before she can begin writing it. Asking him slows down his process of completing the task, and when she does, she’s unsure if the names are spelled in camel-case or with underscores. To alleviate this, she gets up and looks at Bobby’s screen, but by the time she returns to her computer, her train of thought is gone.

Because current VCS fail to communicate vital information in between commits, group members must sacrifice their workflow for the sake of maintaining correct version control. This directly conflicts with the user need to maintain a fast workflow.

Additionally, consider working on the front-end of an application with another group member:

Timmy wishes to write the JavaScript associated with the front-end that Ella is in the process of writing. Before he can do so, however, he must wait for her to finish and push. When he finally has the ability to pull, Ella has already modified her code again, and class names have been changed (but not pushed). By the time he pushes, there’s no merge conflict, yet the JavaScript doesn’t work properly. The entire process has been slowed down, and a bug was easily introduced due to a lack of constant communication, jeopardizing the need for verbose version control.
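To show how this kind of bug slips past a merge, here is a rough Python sketch of a check that would catch it. Everything in it is an assumption for illustration: the helper names are made up, the regular expressions only handle double-quoted class attributes and getElementsByClassName calls, and the class names are invented.

```python
import re


def html_classes(html):
    """Collect every class name used in double-quoted class attributes."""
    names = set()
    for value in re.findall(r'class="([^"]*)"', html):
        names.update(value.split())
    return names


def js_class_references(js):
    """Collect class names passed to getElementsByClassName(...)."""
    pattern = r"""getElementsByClassName\(\s*['"]([\w-]+)['"]\s*\)"""
    return set(re.findall(pattern, js))


def missing_classes(html, js):
    """Class names the JavaScript expects but the HTML no longer defines."""
    return sorted(js_class_references(js) - html_classes(html))


# Hypothetical files: Ella renamed "movie-card" to "film-card" after Timmy pulled
ella_html = '<div class="film-card highlighted"></div>'
timmy_js = "const cards = document.getElementsByClassName('movie-card');"

stale = missing_classes(ella_html, timmy_js)
```

The two files never touch the same lines, so a VCS sees nothing to flag, even though stale is not empty.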

While Git’s “diff” command allows users to compare differences between commits, expecting users to run it every time they pull is not reasonable (especially considering how many times users pull an hour), nor is it a part of most established workflows.

Example output of “diff” command
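For readers who have never looked at a diff, the comparison can be approximated with Python’s standard difflib module. This is a sketch of the idea and not Git’s implementation; the describe_changes helper, the file labels, and the sample markup are hypothetical.

```python
import difflib


def describe_changes(old_text, new_text):
    """Return unified-diff lines between two versions of a file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="before_pull",
            tofile="after_pull",
            lineterm="",
        )
    )


# Hypothetical versions of the same file before and after a pull
old_version = '<div class="movie-card">\n</div>'
new_version = '<div class="film-card">\n</div>'

changes = describe_changes(old_version, new_version)
```

Removed lines come back prefixed with "-" and added lines with "+", so even a one-word rename has to be spotted by reading the output line by line.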

Again, the same reasons detailed above for Cloud9 and Visual Studio Live Share explain why they’re not viable for solving this need either.

Because current VCS fail to retrieve the most up-to-date information in rapidly evolving codebases, minute differences between file versions can lead to undetected bugs, which jeopardize the correctness of file versions, thus slowing down workflow down the line.

The takeaway: even when tasks are completed, they are revisited and evolve over time — not having access to the most recent/accurate info leads to errors in version control.

This is not something found in slower, open-source projects.

Now, on to the next point!

Debugging

On a separate note, a freshman computer science major mentioned that the faster-paced environment filled with strangers calls for a better means of debugging other people’s code.

He is right.

Unlike slower, open-source projects, where contributors have copious amounts of time to learn frameworks, libraries, and overall programming abilities, in a hackathon setting the people contributing within a team often come from vastly different coding backgrounds and have vastly different understandings of these aspects.

Thus, debugging sessions are much more likely to occur in hackathons, a setting, again, where the severity of spending time outside of developing features is amplified.

Despite this, current VCS fail to provide an efficient means of doing so. Take the following situation for example:

Susan and Matt decide to build a full-stack web application for their hackathon project. Because Susan knows a lot of Ruby on Rails, she and Matt agree to use it to build the backend despite Matt not knowing much about it. When writing the front end code, Matt needs to know how to do one thing in Ruby on Rails, but is unable to communicate the problem effectively in a verbal manner, and Susan is unable to verify the solution solely from a visual standpoint. As a result, Susan has to either stop what she’s doing, sit in front of Matt’s computer, and run the code, or have Matt create a separate branch and push the buggy file so that Susan can pull the new file, open it, and debug it.

In either solution, Susan’s workflow is immensely disturbed, and in the second solution, buggy code is saved into the codebase (if Matt forgets to delete the branch), jeopardizing the quality of the file versions present.

While GitHub has the “issues” feature as a means of communicating bugs, the time it takes to create and resolve an issue slows down the workflow even more than communicating the bug verbally.

GitHub Issues’ interface

Furthermore, in a hackathon setting, users don’t have the luxury of waiting for an issue to be resolved. In Matt’s case, one small misunderstanding of Ruby’s syntax prevented him from completing the front end, a feature that many features down the line depend on.

While real-time collaborative tools such as Cloud9 and Visual Studio Live Share allow users to see and run other people’s code, if two group members are working on the same file, should one wish to test out their changes to the other person’s code, they’ll have two options:

  • stop their task, make it compile-able, and wait for it to run
  • comment out their code as there’s no best practice for running specific chunks of code while keeping import statements at the top

Visualization of second option

Now, on to the last category of reported user needs!

Merge conflicts

On another separate note, a freshman computer science and economics major expressed a deep need to minimize the amount of time spent on resolving merge conflicts.

We couldn’t agree more.

Unlike slower-evolving collaborative environments, in a hackathon setting, there are often more moving pieces within a project at once that are expected to be completed at a faster rate over a 24 to 48 hour period of time. Because all current VCS are asynchronous, this makes merge conflicts extremely common in a setting where the amount of time spent resolving such conflicts should be at an all time low.

These contradictory expectations call for a version control system that effectively reduces time spent resolving merge conflicts.

While Git has done the best job of minimizing the time spent resolving them through its “status” command, none of the current VCS have succeeded at reducing the number of them. While Bazaar checks whether users are up to date with a remote repo before allowing them to make a commit, users must still evaluate differences between file versions to prevent conflicts, which still slows down the workflow.

“git status” tells users what file(s) need to be checked for conflicting code

At the moment, the current approach is to promote different workflows in order to integrate new features without causing merge conflicts. One of the most popular, ‘Git-Flow’, requires every feature to flow through several separate branches (develop, feature, release, and hotfix) before it’s finally integrated into a master branch. While this effectively partitions the number of moving parts, increasing the verbosity of the version control, every feature must travel from branch to branch before it’s integrated into the codebase, so the need for a quick workflow is hindered.

How Git-Flow works

Alternatively, ‘GitHub Flow’ uses a similar yet streamlined methodology, combining the mainline and release branches into a “master” and treating hotfixes just like feature branches. This sacrifices verbosity for the sake of a fast workflow.

How GitHub Flow works

At the end of the day, such workflows can only compensate so much for the asynchronous nature of current VCS. At the most fundamental level, because code is updated over time, and because features are often built off of other features, differences between files will almost always develop over time. Therefore, merge conflicts are essentially inevitable with current systems.

For both flows, a merge conflict will still occur in the following scenario:

In order to improve the functionality and appearance of their website, Becky decides to create a separate branch to incorporate JavaScript while Craig does the same for CSS. Both end up taking a long time. In the meantime, Toby adds some much needed content to their HTML file. When Becky finally attempts to merge her JavaScript branch into their development branch, her HTML file is missing Toby’s HTML. Craig finishes soon after, but his branch is missing the JavaScript import lines that Becky wrote at the top of her HTML, as well as Toby’s HTML, so he encounters a lengthier merge conflict.

Despite splitting up tasks into different feature branches, merge conflicts still occurred. While these conflicts can be resolved simply by accepting line changes, they build up over the course of a 24 to 48 hour long hackathon, leading to a slower overall workflow.
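A simplified model shows why the scenario above produces conflicts. The Python sketch below is not Git’s actual merge algorithm: it uses the standard difflib module to find which regions of a shared base file each branch touched, and it conservatively treats any two regions that overlap or touch as a conflict. The function names and the sample HTML are hypothetical.

```python
from difflib import SequenceMatcher


def changed_regions(base, branch):
    """Return (start, end) index ranges of `base` that `branch` touched.

    A pure insertion shows up as an empty range (start == end) at the
    position where the new lines were added.
    """
    matcher = SequenceMatcher(a=base, b=branch, autojunk=False)
    return [
        (i1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def conflicting_regions(base, ours, theirs):
    """Pair up regions of `base` that both branches touched.

    Conservative on purpose: regions that merely touch count as conflicts.
    """
    return [
        (mine, yours)
        for mine in changed_regions(base, ours)
        for yours in changed_regions(base, theirs)
        if mine[0] <= yours[1] and yours[0] <= mine[1]
    ]


# Hypothetical shared HTML file, one list entry per line
base = ["<html>", "<head>", "</head>", "<body>", "</body>", "</html>"]
becky = base[:2] + ['<script src="app.js"></script>'] + base[2:]
craig = base[:2] + ['<link rel="stylesheet" href="style.css">'] + base[2:]
toby = base[:4] + ["<p>Much needed content</p>"] + base[4:]

becky_vs_toby = conflicting_regions(base, becky, toby)
becky_vs_craig = conflicting_regions(base, becky, craig)
```

In this model Becky and Toby edit different parts of the file, so becky_vs_toby is empty, while Becky and Craig both add a line at the same spot in the head, so becky_vs_craig is not. Splitting work into feature branches does not help when those branches all need to touch the same few lines.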

Conclusion

Version control systems have allowed developers to manage changes to documents, computer programs, large web sites, and other collections of information for multiple decades, satisfying the need to integrate changes into a codebase in a structured manner.

However, hackathon users don’t share this same need.

According to them, their need for VCS is to have a means of organizing and integrating correct file versions that also supports the quick workflow required by hackathons.

While users may try to use a combination of VCS, workflows, and project management systems to satisfy this need, because they are additional platforms that require manual manipulation, users’ workflows can only be facilitated to a certain degree. Furthermore, there are certain things that they can’t provide, like the ability to update the statuses of tasks as they’re being completed and show exactly what functions members are working on.

At the end of the day, VCS must be the ones to satisfy this need.

However, current VCS fail to do so because they don’t allow users to know when they can start working on a task, see vital information needed when completing their task, debug with other members quickly, or significantly reduce the amount of time spent resolving merge conflicts.

When users try to compensate for this by starting tasks prematurely, using information based on verbal cues, visually debugging other people’s code, and merging feature branches as quickly as possible, bugs not detected by merge conflicts may enter the codebase, jeopardizing the reliability of the version control system.

While other external forces such as worsened communication (due to a loud environment, teams of strangers, and overworked programmers), poor health (no-sleep culture, lack of focus on nutrition), and lack of time (the usual 24 to 48 hours can create stress and pressure) may also lead to poor coding behaviors, it is a bit disingenuous to put the blame on VCS for this: at the end of the day, while VCS should be accommodating of mistakes, they are not at fault if they allow such mistakes to jeopardize version control. They are at fault if their design neglects the core need of hackathon attendees, which eventually introduces errors into the codebase … which is unfortunately the case.

Takeaways

Justin Li, cognitive science and computer science professor at Oxy, describes an “unsolved problem” as one whose solution isn’t standardized.

To see if this definition applied to VCS for hackathons, we conducted interviews with Oxy students who’d attended hackathons to see if their needs were satisfied by current VCS.

Through this process and our research on current VCS, we learned that the user need of hackathon attendees for their VCS is quite specialized and nuanced. While a combination of Git, real-time collaborative tools, and project management systems has come close to satisfying it, the solution is a manual one, which inherently prevents it from facilitating a quick workflow. A version control system is needed to solve this, and at the moment there is no standardized approach.

Given this, my team found that, in the realm of hackathons, version control systems are unsolved, and suck.


Joey Rose

Computer science and mathematics student at Occidental College