GSoC Week 7: improving Performance tests

Abhradeep Chakraborty
4 min read · Aug 2, 2022

Hello again readers, this is my week 7 work update. This week I worked further on Git-specific CRoaring fixtures. Besides that, there was another round of review on my bitmap-lookup-table patch series, so I worked on that as well.

Roaring bitmap related progress

First, I'll talk about my roaring bitmap progress. I previously integrated CRoaring into Git using its amalgamation.sh script, which combines all the library's source and header files into two files: roaring.c and roaring.h. This is good because all the roaring logic is compiled as a single unit, which can improve performance. But it has a downside too: I have to modify those files whenever I want to add Git-specific functions. This is because all the library's internal functions (i.e. the functions other than those the CRoaring API exposes in roaring.h) live in roaring.c.

Truly speaking, this is not a problem, because CRoaring is available under the MIT license. But I would have been happier if I could put those Git-specific functions in a separate file called roaring_git.c and then include roaring_git.h everywhere (instead of roaring.h). roaring_git.h would declare all the Git-specific functions as well as the built-in CRoaring functions. But as I said before, this is not easy to do. I would have to move all the roaring internal functions that I need out of the amalgamated roaring.c into another file (maybe roaring_internals.c) and then use them in roaring_git.c. I tried this, but it was tedious and needs more time. I am not even sure whether it is possible.

So, for now I wrote all the Git-specific functions in roaring.c itself. As Taylor told me, we first need to check whether roaring bitmaps really make a difference in performance. With these functions, roaring bitmaps can now be stored in network byte order, which means they can work on big-endian systems as well.
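To make "network byte order" concrete, here is a small shell illustration of the byte layout only. It is not the code from my patches (that part is written in C inside roaring.c), and write_be32 is a hypothetical helper name I made up for this sketch. It writes a 32-bit value most significant byte first, which is the order every reader sees regardless of the machine's own endianness.

```shell
#!/bin/sh
# Hypothetical helper for illustration: write a 32-bit unsigned integer
# to stdout as four bytes, most significant byte first (network byte order).
write_be32 () {
    n=$1
    for bits in 24 16 8 0
    do
        byte=$(( (n >> bits) & 255 ))
        # printf treats \ooo in its format string as one byte given in octal
        printf "\\$(printf '%03o' "$byte")"
    done
}

# 305419896 is 0x12345678; od lists the bytes in the order they were written,
# so the hex dump reads 12 34 56 78.
write_be32 305419896 | od -An -tx1
```

Because the order is fixed at write time, a reader on any architecture can reassemble the value by reading the bytes from left to right.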

Bitmap-lookup-table v5 progress

As I said before, there were some reviews of the bitmap-lookup-table series. The performance tests that I wrote previously were not accurate, because the second call to test_bitmap always works on the previously repacked repo, which makes the second call much faster than the first one.

One solution to this problem would be to set up the repo inside the function itself, removing any previous one first. For example, this is the updated test:

#!/bin/sh

test_description='Tests pack performance using bitmaps'
. ./perf-lib.sh
. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"

test_lookup_pack_bitmap () {
    test_expect_success "setup repo environment" '
        rm -fr * .git
    '

    # setup a large repo
    test_perf_large_repo

    test_expect_success 'setup bitmap config' '
        git config pack.writebitmaps true
    '

    test_expect_success 'create tags' '
        git tag --message="tag pointing to HEAD" perf-tag HEAD
    '

    test_perf "enable lookup table: $1" '
        git config pack.writeBitmapLookupTable '"$1"'
    '

    test_pack_bitmap
}

test_lookup_pack_bitmap false
test_lookup_pack_bitmap true

test_done

But it still seems that the results are not accurate: they differ between subsequent test runs.

So, the next option that came to my mind was creating a separate file for each test case (i.e. lookup table enabled and lookup table disabled). With this, each configuration starts from its own freshly set up repository, so the results should be accurate. Moreover, one can edit a particular file as needed without adding those cases to the other files (say, the one where the lookup table is enabled), where they may not make sense. I am currently working on it.
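A rough sketch of what the per-file layout could look like is below. It is only an illustration under some assumptions: the pNNNN-* file names are placeholders I made up, write_perf_script is a hypothetical helper that exists only to show both files side by side, and the body of each generated script simply reuses the steps from the test above (including test_pack_bitmap from lib-bitmap.sh). The real patch may end up looking different.

```shell
#!/bin/sh
# Hypothetical generator: writes one perf script per lookup-table setting.
# $1 is the value for pack.writeBitmapLookupTable, $2 is the output file.
write_perf_script () {
    value=$1
    out=$2
    # Unquoted heredoc: $value is expanded now, \$ stays a literal $.
    cat >"$out" <<EOF
#!/bin/sh

test_description='Tests pack performance using bitmaps (lookup table: $value)'
. ./perf-lib.sh
. "\${TEST_DIRECTORY}/perf/lib-bitmap.sh"

test_perf_large_repo

test_expect_success 'setup bitmap config' '
    git config pack.writebitmaps true &&
    git config pack.writeBitmapLookupTable $value
'

test_expect_success 'create tags' '
    git tag --message="tag pointing to HEAD" perf-tag HEAD
'

test_pack_bitmap

test_done
EOF
    chmod +x "$out"
}

# Placeholder file names; pick real numbers when adding them under t/perf.
write_perf_script false pNNNN-lookup-table-off.sh
write_perf_script true pNNNN-lookup-table-on.sh
```

Since every perf script gets its own working directory from perf-lib.sh, neither file needs the rm -fr step, and the two configurations never see each other's repacked state.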

While submitting the previous version, I noticed that a test case was failing (see my previous blog post for details). At that time I thought it was fixed in the most recently submitted patch series (because all CI tests were passing). But later Johannes informed me that this test case in t5326-multi-pack-bitmaps.sh and t5327-multi-pack-bitmaps-rev.sh is failing. Below is the error message:

Cloning into bare repository 'clone.git'...
remote: Enumerating objects: 756, done.
remote: Counting objects: 100% (754/754), done.
remote: Compressing objects: 100% (281/281), done.
remote: Total 756 (delta 245), reused 740 (delta 234), pack-reused 2
Receiving objects: 100% (756/756), 77.50 KiB | 8.61 MiB/s, done.
fatal: REF_DELTA at offset 221 already resolved (duplicate base 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf?)
fatal: fetch-pack: invalid index-pack output
error: last command exited with $?=128
not ok 319 - clone from bitmapped repository
#
# rm -fr clone.git &&
# git clone --no-local --bare . clone.git &&
# git rev-parse HEAD >expect &&
# git --git-dir=clone.git rev-parse HEAD >actual &&
# test_cmp expect actual
#

Now this is quite mysterious to me, because it only fails when GIT_TEST_DEFAULT_HASH=sha256 and passes in all other cases. I looked into it further and found that it is not related to the lookup-table implementation code at all. Rather, the test script itself is somehow causing this problem.

I was not able to work much in the last 3–4 days as I was busy moving hostel rooms, so I didn’t get enough time to investigate it.

I hope that I will figure out the problem soon and be able to submit my next version (hopefully the final one) in the next day or two.

Thanks :)

Abhradeep Chakraborty

I am a 3rd year IT student. I like to build full stack web apps and have a strong interest in core development. Currently learning cloud technologies.