Unmasking the GitHub Star Story: Track Daily Trends & Break the 40k Limit

Gain deeper insights into your favourite open-source repositories. Explore star trends showing daily stars and go beyond the 40k-star limit.

Emanuele Fumagalli
Oct 12, 2023

In this article, I’m excited to introduce a service that retrieves the full star history of a GitHub repo. At the moment, GitHub has no built-in way to show a repository’s full star history (I suppose it will be added at some point), and one workaround is to rely on external services like https://star-history.com/.

The web server is built on a Go library I created, which is available on GitHub at the following link: https://github.com/emanuelef/github-repo-activity-stats. Documentation is still pending, but I plan to provide comprehensive documentation in the near future.

The website showcases the capabilities of this library and serves as a practical tool to obtain comprehensive star history data for any public GitHub repository. The website is available here: https://emanuelef.github.io/daily-stars-explorer.
If you’re interested in the server side and the React app, the code is available here: https://github.com/emanuelef/daily-stars-explorer.

This project primarily serves as a response to inquiries regarding trends within GitHub repositories. The accompanying website is more of an experimental concept and may not undergo significant further development.
While there are certainly other factors to consider when evaluating the quality of a GitHub repository, I believe that tracking stars can provide valuable insights into trends and the repository’s popularity over time.

A key motivation behind creating this website is to provide access to the complete star history, including the number of stars gained each day. This more detailed data gives a more accurate picture of trends.

It’s worth noting that the REST API, used by services such as star-history.com, comes with a limitation: it returns at most 400 pages of stargazers, each containing up to 100 items. As a result, this approach can retrieve at most 40k (400 × 100) stars. As you can see in the graph, this limitation results in a straight line from 40k stars up to the current total, since the chart simply connects the last retrievable point with today’s star count.

https://api.github.com/repos/kubernetes/kubernetes/stargazers?page=400&per_page=100
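To make the arithmetic concrete, here is a small Go sketch that builds the stargazers page URL shown above and computes how many stars the REST endpoint can return before hitting the cap. The 400-page and 100-per-page values are the limits described above; `stargazersPageURL` and `reachableStars` are hypothetical helpers, not part of github-repo-activity-stats, and the total of 102,106 is taken from the star-history.com export later in this article.

```go
package main

import (
	"fmt"
	"net/url"
)

const (
	maxPages   = 400 // page cap described in the article
	perPageMax = 100 // maximum items per page on the stargazers endpoint
)

// stargazersPageURL builds the REST URL for one page of a repo's stargazers.
func stargazersPageURL(owner, repo string, page int) string {
	q := url.Values{}
	q.Set("page", fmt.Sprint(page))
	q.Set("per_page", fmt.Sprint(perPageMax))
	// Encode sorts keys, so the query is always "page=...&per_page=...".
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/stargazers?%s",
		url.PathEscape(owner), url.PathEscape(repo), q.Encode())
}

// reachableStars returns how many stars paginated REST calls can return
// for a repo with totalStars, given the 400-page cap.
func reachableStars(totalStars int) int {
	limit := maxPages * perPageMax
	if totalStars < limit {
		return totalStars
	}
	return limit
}

func main() {
	fmt.Println(stargazersPageURL("kubernetes", "kubernetes", maxPages))
	total := 102106 // from the star-history.com CSV in this article
	reachable := reachableStars(total)
	fmt.Printf("reachable: %d, missing: %d\n", reachable, total-reachable)
}
```

For kubernetes/kubernetes this reports 40,000 reachable stars and 62,106 that the REST pagination cannot reach, which is exactly the stretch the chart fills with a straight line.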

On my website, the same repository produces the following graph, which lets you zoom in on specific time periods and shows a tooltip for every day since the repository was created.
The URL for the next graph is https://emanuelef.github.io/daily-stars-explorer/#/kubernetes/kubernetes.

On the website I’ve developed, you can also export a CSV file containing the daily and total star counts for every day. By contrast, the CSV from star-history.com only includes data for a handful of sampled days, as shown in this example I downloaded:

kubernetes/kubernetes,Tue Jun 10 2014 18:13:03 GMT+0100 (British Summer Time),0
kubernetes/kubernetes,Tue Aug 12 2014 15:07:03 GMT+0100 (British Summer Time),2610
kubernetes/kubernetes,Wed Jan 28 2015 22:44:25 GMT+0000 (Greenwich Mean Time),5280
kubernetes/kubernetes,Tue Jul 14 2015 03:00:22 GMT+0100 (British Summer Time),7950
kubernetes/kubernetes,Tue Dec 15 2015 05:02:03 GMT+0000 (Greenwich Mean Time),10620
kubernetes/kubernetes,Thu May 26 2016 00:07:10 GMT+0100 (British Summer Time),13290
kubernetes/kubernetes,Tue Oct 11 2016 12:55:50 GMT+0100 (British Summer Time),15960
kubernetes/kubernetes,Thu Feb 02 2017 19:29:45 GMT+0000 (Greenwich Mean Time),18630
kubernetes/kubernetes,Thu May 11 2017 09:27:32 GMT+0100 (British Summer Time),21270
kubernetes/kubernetes,Tue Aug 15 2017 19:22:31 GMT+0100 (British Summer Time),23940
kubernetes/kubernetes,Fri Nov 10 2017 00:15:48 GMT+0000 (Greenwich Mean Time),26610
kubernetes/kubernetes,Sun Jan 21 2018 12:54:42 GMT+0000 (Greenwich Mean Time),29280
kubernetes/kubernetes,Wed Mar 28 2018 07:24:01 GMT+0100 (British Summer Time),31950
kubernetes/kubernetes,Sat Jun 02 2018 17:37:56 GMT+0100 (British Summer Time),34620
kubernetes/kubernetes,Thu Aug 09 2018 20:07:02 GMT+0100 (British Summer Time),37290
kubernetes/kubernetes,Tue Oct 16 2018 11:31:30 GMT+0100 (British Summer Time),39960
kubernetes/kubernetes,Fri Oct 06 2023 13:34:09 GMT+0100 (British Summer Time),102106

And here is an example of the CSV you can download from the website I created:

date,day-stars,total-stars
06-06-2014,0,0
07-06-2014,0,0
08-06-2014,0,0
09-06-2014,317,317
10-06-2014,441,758
11-06-2014,158,916
12-06-2014,96,1012
13-06-2014,31,1043
14-06-2014,37,1080
15-06-2014,79,1159
16-06-2014,63,1222
17-06-2014,55,1277
18-06-2014,34,1311
19-06-2014,26,1337
20-06-2014,12,1349
21-06-2014,10,1359
22-06-2014,18,1377
23-06-2014,10,1387
24-06-2014,10,1397
...

Currently, my solution demonstrates how to retrieve the complete star history, but it has some limitations. One drawback is that caching is limited. A more significant disadvantage is that the GraphQL API uses cursor-based pagination: each request needs the cursor returned by the previous one, so pages cannot be fetched in parallel. The only workaround I’ve considered is to fetch the first half starting from the beginning and, at the same time, the second half starting from the end. However, this would only partially reduce the total time, roughly halving it at best. If you have any suggestions on how to parallelise paginated GitHub GraphQL requests, please feel free to share them in the comments.
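The both-ends idea can be illustrated without touching the network. In GitHub’s GraphQL API, connections such as `stargazers` accept `first`/`after` to page forward and `last`/`before` to page backward; the Go sketch below mimics that with a hypothetical in-memory `fakeConnection` whose cursors are plain indexes, and it assumes the total count is known up front so each chain can stop at the midpoint. It is an illustration of the technique, not the project’s implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// page models one response from a cursor-paginated connection.
// In the real GraphQL API the cursor is an opaque string; here it is an index.
type page struct {
	items  []int
	cursor int
}

// fakeConnection is a hypothetical in-memory stand-in for the stargazers connection.
type fakeConnection struct{ data []int }

// forward mimics `first: n, after: cursor`: up to n items after the cursor.
func (c fakeConnection) forward(after, n int) page {
	end := after + n
	if end > len(c.data) {
		end = len(c.data)
	}
	return page{items: c.data[after:end], cursor: end}
}

// backward mimics `last: n, before: cursor`: up to n items before the cursor.
func (c fakeConnection) backward(before, n int) page {
	start := before - n
	if start < 0 {
		start = 0
	}
	return page{items: c.data[start:before], cursor: start}
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// fetchBothEnds runs two sequential cursor chains concurrently, one from the
// oldest item and one from the newest, and stitches them together. total plays
// the role of the connection's totalCount; pageSize must be positive.
func fetchBothEnds(c fakeConnection, total, pageSize int) ([]int, int) {
	half := (total + 1) / 2 // forward covers [0, half), backward covers [half, total)
	var head, tail []int
	var fwdReqs, bwdReqs int
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // each request still waits for the previous cursor
		defer wg.Done()
		for cursor := 0; cursor < half; fwdReqs++ {
			p := c.forward(cursor, minInt(pageSize, half-cursor))
			head = append(head, p.items...)
			cursor = p.cursor
		}
	}()

	go func() {
		defer wg.Done()
		var pages [][]int // collected newest-first
		for cursor := total; cursor > half; bwdReqs++ {
			p := c.backward(cursor, minInt(pageSize, cursor-half))
			pages = append(pages, p.items)
			cursor = p.cursor
		}
		for i := len(pages) - 1; i >= 0; i-- { // restore oldest-first order
			tail = append(tail, pages[i]...)
		}
	}()

	wg.Wait()
	return append(head, tail...), fwdReqs + bwdReqs
}

func main() {
	data := make([]int, 250) // pretend these are 250 stars in starredAt order
	for i := range data {
		data[i] = i
	}
	stars, requests := fetchBothEnds(fakeConnection{data: data}, len(data), 100)
	fmt.Println(len(stars), requests) // prints 250 4
}
```

The total number of requests does not shrink; what shrinks is the longest chain of dependent requests. For 100k stars at 100 per page, a single chain needs 1,000 round trips, while each of the two chains needs about 500, which is why the gain tops out at roughly 2x.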
For example, fetching the entire star history of a repository with 100k stars takes around 5 minutes. The result is then cached for a few days, and there’s also an option to force a refetch of the data up to the current day.
Please note that the server uses my Personal Access Token (which is not included in the repository), so it is subject to a limit of 5k API calls per hour; once that limit is reached, you need to wait before more data can be fetched.
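A back-of-the-envelope budget check helps put the limit in context. The Go sketch below assumes each request returns one page of 100 stars and counts once against the 5,000-calls-per-hour limit mentioned above (GitHub’s GraphQL API actually meters usage in points, but a simple 100-item page typically costs one point). `requestsFor` is a hypothetical helper.

```go
package main

import "fmt"

const (
	starsPerPage    = 100  // stars returned per paginated request (assumed)
	requestsPerHour = 5000 // hourly limit mentioned in the article
)

// requestsFor returns how many paginated requests a full history needs
// (ceiling division: a partial last page still costs one request).
func requestsFor(stars int) int {
	return (stars + starsPerPage - 1) / starsPerPage
}

func main() {
	for _, stars := range []int{40000, 100000, 250000} {
		r := requestsFor(stars)
		share := 100 * float64(r) / requestsPerHour
		fmt.Printf("%7d stars -> %5d requests (%.0f%% of hourly budget)\n", stars, r, share)
	}
}
```

By this estimate a 100k-star repository needs about 1,000 requests, a fifth of the hourly budget, so only around five such histories can be fetched from scratch per hour before the limit is hit, which is one reason the cache matters.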

Possible future improvements:
- query both forward and backward to reduce total time [DONE]
- add persistence for the cache
- analyse the time series to find peaks or trends
- compare multiple repos in the same graph [DONE]

Emanuele Fumagalli

Lead software engineer exploring tech. I write pragmatic articles, backed by code on GitHub, to share impactful discoveries. Join me in the journey!