Angular vs React: text analysis of commit messages

Sergey Abakumoff
Sep 18, 2016


Every enthusiastic JavaScript blogger should write at least one article comparing front-end frameworks, so here is mine! But no worries, this article isn’t yet another attempt to describe the pros and cons of Angular and React. Instead, it looks at them from an unusual angle: it applies text mining methods to the commit messages collected from the version control system and comments on the results. The research is fully reproducible; you can find the data and the Rmd file here. The text analysis methods used in this article are described in detail in the great book “Tidy Text Mining with R” by Julia Silge and David Robinson.

Getting data

The first step in any data analysis is getting and cleaning the data. To collect the React and Angular commit messages along with the relevant information, I used the public GitHub dataset on BigQuery:

The results were saved to a table called “react_angular_commits”, which was exported to a CSV file and downloaded for further processing in RStudio. There are 7127 rows of React messages and 8035 rows of Angular messages.
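For readers who want to follow along outside R, here is a minimal Python sketch of the loading step. It is not the author’s code. It assumes, hypothetically, that the exported CSV has one row per commit with a `repo` column (“react” or “angular”) and a `message` column. The real export’s column names may differ, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the exported CSV: the column names "repo" and
# "message" are assumptions, and the rows are illustrative only.
SAMPLE_CSV = '''repo,message
react,"Merge pull request #2343 from zpao/proptypes-deprecation"
angular,"fix($compile): handle empty attribute values"
angular,"docs(tutorial): fix typo in step 3"
'''


def load_commits(text):
    """Parse CSV text into a list of (repo, message) tuples.

    Quoted fields may span several lines, which matters for
    multi-line commit messages; the csv module handles that.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["repo"], row["message"]) for row in reader]


commits = load_commits(SAMPLE_CSV)
# Rows per framework: the same kind of sanity check as the 7127 / 8035 totals.
rows_per_repo = Counter(repo for repo, _ in commits)
print(rows_per_repo)
```

With a real file, you would pass a file object opened with `open(path, newline="")` to `csv.DictReader` instead of the `StringIO` wrapper.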

Warming up

The first thing to compare is message length, and a box plot is well suited for that. Here is the corresponding R code and its output:

So, the medians are close (React: 81 characters, Angular: 74 characters), but the distribution of Angular message lengths appears considerably more right-skewed than React’s. Simply put, Angular contributors are more “voluble” when it comes to writing commit messages. The only reasonable explanation I could come up with is the difference in the guidelines for contributors:

But I would expect exactly the opposite effect under these conditions: if everyone writes messages according to predefined rules, their length should not vary much, and vice versa. Apparently, this is one of those cases where reality doesn’t meet expectations. Keep reading, more interesting stuff is coming :)
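The box-plot comparison can also be expressed numerically. The Python sketch below is an illustration, not the author’s R code. For each sample it computes three statistics:

- the median
- the interquartile range
- Pearson’s second skewness coefficient, 3 × (mean − median) / standard deviation, where a positive value indicates a long right tail

The message lengths are made-up toy values.

```python
import statistics


def length_summary(messages):
    """Summarize commit message lengths (in characters)."""
    lengths = [len(m) for m in messages]
    q1, _, q3 = statistics.quantiles(lengths, n=4)  # quartile cut points
    mean = statistics.mean(lengths)
    median = statistics.median(lengths)
    return {
        "median": median,
        "iqr": q3 - q1,
        # Pearson's second skewness coefficient: > 0 means skewed right.
        "skew": 3 * (mean - median) / statistics.stdev(lengths),
    }


# Toy data: each message is just a string of the given length.
react_messages = ["x" * n for n in (70, 75, 80, 85, 90)]
angular_messages = ["x" * n for n in (60, 65, 70, 75, 300)]

react = length_summary(react_messages)
angular = length_summary(angular_messages)
print(react["median"], angular["median"])  # 80 70
```

With these toy numbers the medians are close, while the second sample has a much larger positive skew. This mirrors the pattern described above: similar medians, but a longer right tail for Angular.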

Analyzing word and document frequency

What are the most common words in the Angular and React commit messages? To find out, the messages have to be split into individual words and stripped of stop words with the help of the tidytext package.
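The tokenize-and-filter step can be sketched in Python as follows. This is only a rough stand-in for tidytext’s tokenizer. It lowercases each message, splits it on anything that is not a letter, digit or apostrophe, and drops words from a tiny hypothetical stop-word list (tidytext’s stop_words data set is far larger). The sample messages are made up.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; the real list is much longer).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "on", "from", "with", "is"}


def word_counts(messages):
    """Count non-stop-word tokens across all messages."""
    counts = Counter()
    for message in messages:
        # Lowercase and split on anything that is not a letter, digit or apostrophe.
        tokens = re.findall(r"[a-z0-9']+", message.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


messages = [
    "fix($compile): handle empty attribute values",
    "docs(tutorial): fix typo in step 3",
    "chore(release): bump version to 1.5.8",
]
counts = word_counts(messages)
print(counts.most_common(3))
```

Note that this simple regex strips the “$” from “$compile” and splits version numbers into separate digits. Tokenizer choices like these shape the resulting word list.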

Here are the word frequency stats for Angular:

Almost all of these words look familiar to me except “feat” and “chore”. The Oxford Dictionary provides the following definitions of these words:

feat — an achievement that requires great courage, skill, or strength

chore — a routine task, especially a household one

So why are they used so frequently in Angular’s commit messages? Let’s turn to the guidelines for contributors for an explanation:

Okay, so “feat” and “chore” are just commit types! Other types also appear among the top 15 words: “test”, “fix” and “docs”. Hence the next idea: the header of a commit message also includes the scope of the change, so let’s find the most popular scopes along with the types of changes that affect them. The following code might look convoluted, but it isn’t. It extracts the type and scope from each Angular commit message, saves the result to a new dataset, joins that dataset with the most common scopes to filter out the remaining scopes, and finally plots a bar chart grouped by scope.

That makes a lot of sense: for example, the most popular type of change is “fix”, documentation changes affect the tutorial a lot, and so on. One interesting observation: the “$compile” scope demands a lot of attention across all areas of development. I don’t know much about $compile, but I found a fairly popular article that says:

View compilation in Angular is some of the most ingenious functional programming I’ve seen in JavaScript.

That explains the amount of attention it requires! How cool is that? Simple text mining tools available to everyone can reveal what’s going on in the development of one of the most popular open-source libraries!
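The type/scope extraction described above can be sketched in Python. The original analysis is in R, and this sketch stops at the counts rather than drawing the chart. It parses headers of the form type(scope): subject, following Angular’s commit guidelines, and skips messages without such a header. It then keeps only the most common scopes, which plays the role of the join with the top-scopes dataset. The `top_n` value and the sample messages are hypothetical.

```python
import re
from collections import Counter

# Commit header layout from Angular's guidelines: <type>(<scope>): <subject>
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^)]+)\):")


def type_scope_counts(messages, top_n=2):
    """Count (scope, type) pairs, keeping only the top_n most common scopes."""
    pairs = []
    for message in messages:
        match = HEADER_RE.match(message)
        if match:  # messages without a conventional header are skipped
            pairs.append((match["scope"], match["type"]))
    scope_counts = Counter(scope for scope, _type in pairs)
    top_scopes = {scope for scope, _count in scope_counts.most_common(top_n)}
    # Equivalent of joining with the top-scopes table: drop all other scopes.
    return Counter(pair for pair in pairs if pair[0] in top_scopes)


messages = [
    "fix($compile): handle empty attribute values",
    "feat($compile): support multi-element directives",
    "docs(tutorial): fix typo in step 3",
    "docs(tutorial): clarify step 7",
    "fix(ngModel): keep pristine state after reset",
    "Merge branch 'master'",
]
result = type_scope_counts(messages)
print(result)
```

Grouping the resulting (scope, type) counts by scope gives exactly the data a grouped bar chart would need.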

Okay then, what about the most common words in React’s commit messages?

A lot of messages contain the words “merge”, “pull” and “request”. In fact, 2274 React commit messages start with the string “Merge pull request #xxxx”. The culprit seems to be the “Collaborating on projects using issues and pull requests” workflow, and in particular its recommended approach to merging pull requests. The very same pattern of commit messages can be found in other open-source projects; for example, take a look at the Rails change history. But the Angular commit messages look nothing like that! Why is that? It seems that the Angular maintainers use an alternative technique for integrating contributors’ changes, which boils down to applying the series of patches from the pull requests. You can find a detailed explanation in the excellent “Merge pull request Considered Harmful” blog post. Angular’s approach, together with its rules for writing commit messages, turns the change history into a useful product story that is easy to read and analyze.

Unfortunately, React’s commits lack this beauty, but that’s no reason to give up. Other text mining methods can tell React’s story; in particular, term frequency–inverse document frequency (tf-idf) can be used effectively here. Here is the code that selects the words that are frequent in React’s messages but rare across the React and Angular messages combined.

These stats are much more informative than the previous ones. First of all, “spicyj”, “zpao”, “sebmarkbage”, “chenglou”, “syranide”, “jimfb” and “benjamn” are the GitHub accounts of the most active contributors. These names appear in messages like

Merge pull request #2343 from zpao/proptypes-deprecation Update PropTypes for ReactElement & ReactNode

so let’s ignore them and look at the other words with the highest tf-idf. What about “korean” and “japanese”, for example? It seems that React maintainers put a certain amount of effort into translating the flagship documentation into CJK languages. I did not observe this for Angular, whose documentation seems to be available in English only. The other common words indicate areas that are frequently affected by changes, and every React developer should be familiar with them. Funnily enough, they describe almost everything you need to implement and test a React-based application. Here is the illustration:

That’s all I was able to extract from React’s commit messages. Other methods, such as working with combinations of words, did not turn up any other interesting information.
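The tf-idf idea used above can be sketched in Python. This minimal version rests on an assumption: each framework’s messages are treated as one document. Under that setup:

- tf is a word’s share of that framework’s words
- idf is ln(number of documents / number of documents containing the word), as defined in the Tidy Text Mining book

The word counts below are made up; this is not the author’s R code.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a document name to a Counter of word counts.

    Returns {document name: {word: tf-idf score}}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())
    scores = {}
    for name, counts in docs.items():
        total = sum(counts.values())
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical word counts, one "document" per framework.
docs = {
    "react": Counter({"fix": 4, "docs": 2, "korean": 2, "jsx": 2}),
    "angular": Counter({"fix": 6, "docs": 3, "compile": 3}),
}
scores = tf_idf(docs)
print(sorted(scores["react"].items(), key=lambda kv: kv[1], reverse=True))
```

With only two documents, idf is either 0 (the word appears in both) or ln 2 (it appears in only one). The highest-scoring React words are therefore the frequent ones that never show up in the Angular messages. Under this setup, that is how contributor handles and words like “korean” can rise to the top.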

Sentiment Analysis

Finally, let’s try a sentiment analysis of the commit messages using a dictionary that labels each word as “positive” or “negative”. Here is the code that plots the most common positive and negative words in Angular’s messages:

As it turns out, the words labeled “negative” are not actually negative in context! We already know that “chore” is just a type of change suggested in the guidelines for contributors. Or, for example, “error” is certainly not negative in the message “fix ‘type mismatch’ error on IE8 after each request”. A similar picture is observed for React:

In practice, this means that React and Angular contributors do not put much emotion into their commit messages; for example, there is only one f-bomb among the 15K commit messages. See my previous post for examples of a slightly different approach to formatting commit messages ;)
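Lexicon-based labeling, and the reason it misfires on commit messages, can be illustrated with a small Python sketch. The lexicon is a hypothetical handful of entries; real sentiment lexicons contain thousands of words. Apart from the IE8 message quoted above, the sample messages are made up.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon: word -> "positive" / "negative".
LEXICON = {
    "error": "negative",
    "chore": "negative",
    "broken": "negative",
    "improve": "positive",
    "correct": "positive",
}


def sentiment_counts(messages):
    """Count lexicon hits as (label, word) pairs, ignoring context entirely."""
    counts = Counter()
    for message in messages:
        for token in re.findall(r"[a-z0-9]+", message.lower()):
            label = LEXICON.get(token)
            if label is not None:
                counts[(label, token)] += 1
    return counts


messages = [
    "fix 'type mismatch' error on IE8 after each request",
    "chore(release): bump version",
    "improve error message for broken directives",
]
counts = sentiment_counts(messages)
print(counts)
```

Both “error” hits are counted as negative, even though neither message expresses any emotion. A word-by-word lookup simply cannot see context.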

By way of conclusion

In his great book “How Google Works” Eric Schmidt explains how astonishing things are in the Internet Century:

Three powerful technology trends have converged to fundamentally shift the playing field in most industries. First, the Internet has made information free, copious, and ubiquitous — practically everything is online. Second, mobile devices and networks have made global reach and continuous connectivity widely available. And third, cloud computing has put practically infinite computing power and storage and a host of sophisticated tools and applications at everyone’s disposal, on an inexpensive, pay-as-you-go basis.

This story is an example of using these powerful technologies: the data was obtained from a publicly available source (Google Cloud) using free computational power (BigQuery), which took only 15.6 seconds to process 121 GB of data. The analysis was done on my local machine, but I could have leveraged the Kaggle Kernels platform to run the code in the cloud.
