GSoC Week 1 : Let’s get started

Abhradeep Chakraborty
4 min readJun 19, 2022

--

Hello readers, it’s about a week now the GSoC coding period has started. So, here is the first weekly update about my progress. From this onwards, I will post GSoC blogs every week.

But before that, let me give a summary of my project so that non git developers can also catch the logic of the project.

Git Reachability bitmaps

Whenever you run the command git fetch(or similar), git server first decides which objects it have to send. If you don’t know what is a git object, please refer to git objects. These objects are connected to each other forming a directed acyclic graph. Each node of this graph is an object.

The git client and deamon compares their branch tips and decides which ones to send. As git only know about the branch tips, there is no way for git to know about the desired objects beforehand. So, it has to traverse down from the tips to all the objects connected directly or indirectly to it. This traversal goes till a known tip (i.e. the tip which the client already has) has found.

In case of git clone , all the objects has to be fetched (assuming no partial-clone or shallow-clone). So, the git deamon has to traverse all the objects connected directly or indirectly to the branch tips till the last object. The traversal is necessary in order to restrict any non-reachable objects from sending to the client.

For small repositories, this might not be a great deal. But things become crazy, when the repository is quite huge (e.g. linux kernel or git itself).

To tackle the problem, reachability bitmaps were introduced. The idea is to store a bitmap for every selected commits in a .bitmap file and then refer to it whenever a revision is needed. Reachability bitmaps are arrays that store 1 and 0s. Each index of this array denotes a particular object. So, if a bitmap for a particular commit has the value 1 in the i’th index, it means the object denoted by the i’th index can be reachable from the commit. Counting objects has more info if want to know more.

My project is to improve the performance of the reachability bitmaps. See the GSoC project to know more.

Community Bonding period

Community bonding period was really good. I asked a lot of doubts to the mentors during the community bonding period. Thanks to the mentors — they helped me tremendously to clear all the doubts. During this time, we realized that the current Documentation/technical/bitmap-format.txt has a missing information — it does not say about the trailing checksum at the end of the .bitmap file.

The file had some other issues also. It was not passing to the asciidoc html generator and the formatting was not suitable for generating a good formatted html page.

So, I wrote a patch series fixing all the three issues. These patches are integrated into next and hopefully be merged into master soon.

My first task: adding a “lookup table extension” in the bitmap format

During the bonding period, my mentor suggested that instead of working on ewah -> roaring bitmap conversion first, I should start with integrating a “lookup table extension” as this task is simpler and smaller compared to other tasks. So, I started working on it. The idea is to have a table at the end of .bitmap file which will contain the offsets (and xor-offsets) of the bitmaps of selected commits. Whenever git try to get the bitmap of a particular commit, instead of loading each bitmaps one by one and then find the desired bitmaps from them, git will parse only the desired bitmap by using the offset and xor-offset of the table. This will reduce the overhead of loading each and every bitmap.

The proposed table contains the following things:

  • N object ids where N is the number of selected commits.
  • A list of <`offset`,`xor-offset`> pair (4-byte each; network byte order). The i’th pair denotes the offset and xor-offset(respectively) of the the bitmap denoted by the i’th object id in the previous list.
  • A 4-byte integer(network byte order)to denote the flags (currently no such flags).

As the implementation was already done by Taylor (one of my mentors), all I had to do was fixing some of the implementation code and adding tests. The tests include both performance tests and integration tests.

This week was a little slow for me. It was mainly for two reasons —

  1. I was facing some problems to setup the environment for performance testing. For some reason, some objects was missing from git checkout. This resulted the failure of all the bitmap performance tests (as bitmaps need full closure). Taylor helped me to solve the issue. I also downloaded a local copy of linux kernel for testing purpose.
  2. I was thinking if we can optimize it further.

The patch series is nearly ready for review. I will submit it by today or tomorrow.

Difficulties

I didn’t get the particular case refered by the NEEDSWORK comment (written by Taylor). I want to work on that before submitting it to the mailing list.

Next step

The next step is to submit the patch series to the mailing list and make changes according to the review comments.

--

--

Abhradeep Chakraborty

I am a 3rd year IT student. I like to build full stack web apps and have a strong interest in core development. Currently learning cloud technologies.