Fixing Gluster’s git history…

Amar Tumballi
Sep 6, 2019

When I say I am fixing the ‘history’ of GlusterFS, it may surprise many people, because many wouldn’t know that there was anything before the current code. Instead of giving a one-line answer, let me write multiple paragraphs :-)

If you are in a hurry, all you need to know is that some ~2240 commits existed in Gluster before the current repository started. For those who have time to go through the timeline of how we lost this history, the write-up below throws some light on it. It also matters because most of the code that came in back then remains the same even now. So, to check whom to blame, we surely need this history :-)

Also, it is not always easy to record every decision and every code change in the documentation. A proper commit history provides all that information and is a good thing to have for all new developers of the project.

History: 2006–2008

When we started Gluster in 2006, the code was hosted on nongnu Savannah, in a tla archive. The ‘initial’ commit was from our Founder/CTO, AB. None of us were great at planning enterprise-level repo management policies. When we started, everyone had commit access. Since we were a startup, we wanted to be quick in prototyping and building features.

It took us almost 2 years to realise that we needed commit access control and a review system in place. The events that led us to this realisation seem funny now, but being on the receiving end, it didn’t seem funny then: I was embarrassed in front of a prospective customer. Ask me offline about what the funny thing was :-)

We also didn’t have any release branch practice, and most of the 1.x series releases of glusterfs were done from a commit on the master branch. To handle some of the performance issues of TLA, we did create multiple branches by copying over the code.

History: 2008–2009

This was the year Gluster started growing, especially in the number of engineers on the team. We needed something better for our development, and the git project was proving to be great. In particular, we could have better control over review, as we could apply patches from mutt clients and test them. At this time, we didn’t have a really good tool to carry our history over from tla to git, and we wanted to move to git quickly. All we did was copy the files over from the previous version to git, and start fresh.

Ref: Check this git commit.

We quickly forgot about it, as git was working great for us. We moved from an email-based review system to a Gerrit-based system, and suddenly our development process looked professional.

Thanks to Csaba, who sometime later created a git repo from the TLA archive, which is accessible @ https://github.com/gluster/historic.

History: 2017

This is when I rejoined the Gluster team after a gap of 40 months (the gap was 2013–2017). I realised that many developers had to run git blame multiple times to track a change if they didn’t understand something in libglusterfs. As most of the core infra was done before 2009, i.e., before the move to git, people didn’t have an easy way to check history. I checked whether we could fix the git history by rebasing all new development on top of the historic repo, but it was not easy. There were 10k+ commits, which caused many merge conflicts, and I felt it was not worth the effort since, after all, not many people were bothered about history.

Present

Some things were bothering me.

1. Stats for any open-source project are taken directly from what GitHub provides.

2. As per gluster/glusterfs, it was not AB who started the project.

3. Most of the documentation about some features, and about why we did something, was part of the commit messages, and that was lost too.

The sad part was the 1st point, which made the project look just 10 years old instead of 13. Also, Avati’s contributions came to just 625 commits if one looked @ the current repo, whereas his true contributions are surely above 1300. As for AB’s contribution, there are many other places where it can be verified.

Now that I have some free time, I thought of correcting this history. The outcome? Check this repo: https://github.com/amarts/glusterfs

Now the first commit in the repo is AB’s, and the project’s activity starts from 2006. The good part is, if someone wants to check why we introduced something in the libglusterfs part of the code, they can now read about it in the git history itself, instead of being clueless.
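For the curious: this post does not go into how the two histories were finally joined. One way git offers to attach an older history beneath a newer root commit, without replaying thousands of commits, is `git replace --graft`. Below is a minimal sketch of that idea, not the exact procedure used here. The function names and the two commit arguments are hypothetical, and it assumes the historic commits have already been fetched into the same repository.

```python
import subprocess


def graft_argv(new_root, historic_tip):
    """Build the argv for: git replace --graft <new_root> <historic_tip>.

    new_root     -- the first commit of the newer history (hypothetical input)
    historic_tip -- the last commit of the older history, which becomes
                    the parent of new_root (hypothetical input)
    """
    return ["git", "replace", "--graft", new_root, historic_tip]


def graft(repo_dir, new_root, historic_tip):
    """Run the graft inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(graft_argv(new_root, historic_tip), cwd=repo_dir, check=True)
```

A graft created this way is a local replacement ref. Making it permanent for everyone needs a history rewrite, which changes the hash of every commit on top of the graft point.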

As of September 6th, 2019, there is just 1 commit that differs between gluster/glusterfs and amarts/glusterfs, which was done to add a .travis.yml to the repo :-)

The plan is that I will try to keep this repo up to date with gluster/glusterfs (or persuade the gluster community to change the repo).

NOTE: The sad part is that the overall contribution list doesn’t capture people whose email IDs are not registered on GitHub. For example, the contributions of Basavanagowda Kanur, Raghavendra G, Krishna Srinivas, Vikas Gorur and Harshavardhana are not completely captured here (and, for that matter, in the gluster/historic repo either).
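To make the counting problem concrete, here is a small sketch of how per-author commit counts from two histories could be added up when the same person shows up under different email IDs. It assumes input in the format printed by `git shortlog -sne` (a count, a tab, then `Name <email>`). The function names, the alias table, and the sample identities and numbers are all hypothetical, made up only for illustration.

```python
from collections import Counter


def parse_shortlog(text):
    """Parse lines of the form '<count>\\t<Name> <email>' into a Counter."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, _, ident = line.partition("\t")
        counts[ident.strip()] += int(count)
    return counts


def merge_counts(histories, aliases):
    """Sum counts across histories, folding aliased identities into one name.

    aliases maps an identity string to a canonical name; identities that
    are not in the table are kept unchanged.
    """
    total = Counter()
    for counts in histories:
        for ident, n in counts.items():
            total[aliases.get(ident, ident)] += n
    return total


# Hypothetical sample data, not real project statistics.
historic = "   700\tDev One <dev1@old.example>\n    40\tDev Two <dev2@old.example>\n"
current = "   625\tDev One <dev1@new.example>\n"
aliases = {
    "Dev One <dev1@old.example>": "Dev One",
    "Dev One <dev1@new.example>": "Dev One",
}
total = merge_counts([parse_shortlog(historic), parse_shortlog(current)], aliases)
```

Within git itself, a `.mailmap` file serves the same purpose as the alias table here: it maps the different names and emails of one person to a single identity for tools like `git shortlog`.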
