Libraries.io April/May Progress Update
We missed our April update last month due to holidays, so this is a combined April/May progress report on Libraries.io since securing a grant from both the Sloan Foundation and the Ford Foundation. Here’s what we got up to.
Notable New Features
Importing GitLab and Bitbucket users and organisations
After much refactoring and some painful migration of large sections of the database, our concept of repository owners is no longer tied to the specifics of the GitHub API. We now have support for Bitbucket Users and Teams and GitLab Users and Groups.
It’s also going to be much easier for us to add support for more kinds of repository owners in the future.
Importing GitLab and Bitbucket Issues and Pull Requests
Along similar lines, the support for storing Issue and Pull Request metadata for repositories is no longer tied to the GitHub API, and we now support indexing both GitLab and Bitbucket Issues and Pull Requests/Merge Requests. This will be useful in the future for calculating community engagement, responsiveness and activity metrics.
Redesigned Project pages
After Ben’s comprehensive UX review of the search experience, we’ve made a number of simplifications and moved some elements around to make searching and browsing the site much more enjoyable.
We’re also loading some of the more intensive elements as you scroll down the page, which has reduced the load time by approximately 50%.
Search page refinements
You can also now search both libraries and repositories directly from the navigation header from any page of the site.
We also took the time to enable selecting multiple filters from the interface. This allows you to do more comprehensive searches, for example searching for Testing Libraries written in CoffeeScript or TypeScript, or for any repository available under some kind of GPL license.
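As a rough illustration of how multi-select filters like these can combine, here is a minimal Python sketch. It assumes that values chosen within one filter are OR-ed together while different filters are AND-ed. The post does not describe the actual implementation (which runs on Elasticsearch), and the project names, facet names and the `matches` and `filter_projects` helpers are all hypothetical.

```python
from typing import Dict, List, Set

Facets = Dict[str, Set[str]]


def matches(project_facets: Facets, selected: Facets) -> bool:
    """Return True when a project satisfies every selected filter.

    Assumption: values inside one filter are alternatives (OR),
    while separate filters must all be satisfied (AND).
    """
    for facet, wanted in selected.items():
        if not wanted:
            continue  # an empty selection places no constraint
        if not (project_facets.get(facet, set()) & wanted):
            return False
    return True


def filter_projects(projects: List[dict], selected: Facets) -> List[dict]:
    """Keep only the projects that match the selected filters."""
    return [p for p in projects if matches(p["facets"], selected)]


# Hypothetical sample data, for illustration only.
projects = [
    {"name": "alpha-test", "facets": {"keywords": {"testing"}, "languages": {"CoffeeScript"}}},
    {"name": "beta-test", "facets": {"keywords": {"testing"}, "languages": {"TypeScript"}}},
    {"name": "gamma-test", "facets": {"keywords": {"testing"}, "languages": {"Ruby"}}},
    {"name": "delta-http", "facets": {"keywords": {"http"}, "languages": {"TypeScript"}}},
]

# "Testing libraries written in CoffeeScript or TypeScript"
selected = {"keywords": {"testing"}, "languages": {"CoffeeScript", "TypeScript"}}
result = [p["name"] for p in filter_projects(projects, selected)]
print(result)
```

With this sample data, only the two testing libraries written in CoffeeScript or TypeScript are kept.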
The search pages are also now driven entirely by Elasticsearch, skipping the need to load records from PostgreSQL after querying Elasticsearch as we did before, which makes the search pages considerably quicker to load.
Civic Tech Research
One other large piece of work we did in April was researching The Impact of Civic Tech on Open Source, which Ben presented at TICTeC.
We compiled a list of the top one thousand most depended upon projects amongst the 93m declared dependencies tracked by Libraries.io, which we’re calling “Digital Infrastructure”, then looked into how the Civic Tech community uses and contributes to it.
Some highlights include:
We found that amongst the 5,034 projects we call Civic Tech, 6,815 users have contributed 1,135,846 commits. This compares with 82,514 people who have contributed 2,731,564 commits to any open source dependency of a Civic Tech project.
The Civic Tech Community contributes around 15% of the work needed to support its own foundations.
The Civic Tech Community contributes around 6% of the work needed to support our shared digital infrastructure.
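To make the idea of a “most depended upon” list concrete, here is a minimal Python sketch of one way such a ranking could be computed from declared dependencies. It assumes a project's rank is the number of distinct dependents that declare it. The post does not say how the real top one thousand list was produced, and the `most_depended_upon` function and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_depended_upon(
    dependencies: Iterable[Tuple[str, str]], n: int
) -> List[Tuple[str, int]]:
    """Rank projects by how many distinct dependents declare them.

    `dependencies` holds (dependent, dependency) pairs. Duplicate pairs
    are collapsed so each dependent is counted once per dependency.
    """
    unique_pairs = set(dependencies)
    counts = Counter(dependency for _, dependency in unique_pairs)
    # Sort by count (descending), then by name for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical declared dependencies, for illustration only.
declared = [
    ("app-a", "lib-x"),
    ("app-b", "lib-x"),
    ("app-c", "lib-x"),
    ("app-a", "lib-y"),
    ("app-b", "lib-y"),
    ("app-a", "lib-z"),
    ("app-a", "lib-x"),  # duplicate declaration, counted once
]

top_two = most_depended_upon(declared, 2)
print(top_two)
```

A real ranking might instead weight dependents, or distinguish runtime from development dependencies. This sketch takes the simplest reading.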
Other Notable Changes
- Allow users to resync out of date repositories
- Add support for GitHub and GitLab Topics
- Extract JSON generation for API into serializer
- Added support for parsing dependencies from package-lock.json
- Added support for parsing dependencies from *.csproj files
- Allow sorting of repository projects via the API
- Add schema_plus for better index configuration
- Add pghero dashboard for visualizing db stats
- Match HISTORY and COPYING when checking for repo metadata files
- Add support for Maven dependencies
- Add #potentially_outdated? method to projects
A full list of all changes right across the Libraries.io org on GitHub is available in two gists, April and May, generated by: https://github.com/librariesio/org-pulse
Statistics
April: 230 commits, 11 pull requests and 129 issues opened across all Libraries.io repositories on GitHub: https://github.com/librariesio
May: 279 commits, 15 pull requests and 228 issues opened across all Libraries.io repositories on GitHub: https://github.com/librariesio
We’ve now indexed 9,375,765 published versions of 2,291,604 libraries, 23,470,130 open source repositories, 23,644,149 issues/pull requests and 93,965,312 dependencies from GitHub, GitLab and Bitbucket.
Contributors
We had a few patches from outside contributors in April and May whom we would like to thank:
Plans for June
The main aim for June is to release our first public data dump of dependency graph and metric data. This will include information about the 2.3 million libraries we have indexed, their versions and dependencies, and the 25 million open source repositories and all of their dependencies.
It will be available under a Creative Commons Attribution-ShareAlike license. We’ve already had some amazing researchers start to play around with the data and find some interesting facts around dependency complexity, and we expect to see many more fascinating insights once people start to drill down into it.
Keep track of our progress via this GitHub issue: https://github.com/librariesio/supporters/issues/9
We’ll also be heading over to America for the Sustain OSS conference and a Digital Infrastructure workshop with the Ford and Sloan Foundations to share what we’ve been working on.
As always, follow us on Twitter at @teabass, @benjam and @librariesio for more updates.