MOSS Milestone Achieved! Over 20,000 Nodes Added Around scikit-learn.

Jonathan Starr
Open-Source Science (OSSci)
3 min read · Jul 6, 2024
Figure: scikit-learn’s Institutional Cloud

We are thrilled to announce a significant milestone for the Map of Open Source Science (MOSS) project! We can now efficiently retrieve and map large volumes of data on projects, papers, and authors from ecosyste.ms and OpenAlex. Much of this data comes out of the recent Chan Zuckerberg Initiative workshop. This advancement opens up new possibilities for visualizing and understanding the vast network of open source tool use in scientific research.
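For readers curious what a pull from OpenAlex might look like, here is a minimal sketch that only builds the request URL for "works that cite a given work". It is not the MOSS pipeline itself: the function name `citing_works_url` and the work ID `W0000000000` are placeholders made up for illustration, and the sketch assumes OpenAlex's `cites:` filter with cursor paging.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citing_works_url(work_id: str, per_page: int = 200, cursor: str = "*") -> str:
    """Build an OpenAlex URL listing works that cite `work_id`.

    `work_id` is an OpenAlex work identifier; the one used below is a
    placeholder, not the real ID of the scikit-learn paper.
    """
    params = {
        "filter": f"cites:{work_id}",  # works whose references include work_id
        "per-page": per_page,          # page size
        "cursor": cursor,              # "*" asks for the first page
    }
    # Keep ":" and "*" unescaped so the URL stays readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':*')}"


print(citing_works_url("W0000000000"))
```

Fetching each page and following the returned cursor is left out on purpose, so the sketch runs without network access.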

Visualizing the scikit-learn Ecosystem

To illustrate the power of our new capabilities, we’ve created some test visualizations of the scikit-learn system, showcasing the web of papers referencing scikit-learn, the authors of those papers, and the affiliated institutions of those authors. Here’s a glimpse into the data we’ve mapped:

1. Scikit-learn’s Reference Corona

Yellow nodes represent papers that reference scikit-learn. In this visualization, we’ve mapped 2,430 papers that cite scikit-learn, forming a vibrant corona of scholarly activity around this crucial tool.

2. Scikit-learn’s Authorship Orbit

Green nodes signify the authors of papers referencing scikit-learn. This orbit encompasses 16,444 authors, highlighting the extensive network of researchers contributing to and utilizing scikit-learn in their work.

3. Scikit-learn’s Institutional Cloud

White nodes denote the institutions affiliated with the authors of papers that reference scikit-learn. This cloud maps 3,186 institutions, underscoring the global reach and collaborative nature of the scikit-learn ecosystem.

4. Close-up of the Scikit-learn System

A detailed view of a sector of the scikit-learn system provides a closer look at the connections and relationships within this scientific network.
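The three layers described above (papers, authors, institutions) can be sketched as a small graph-building step. This is an illustration under stated assumptions, not the code behind the visualizations: the records are invented, their shape loosely follows OpenAlex's `authorships` structure, and the node kinds and edge names (`references`, `authored`, `affiliated_with`) are our own labels.

```python
from collections import Counter

# Hypothetical sample records, shaped loosely like OpenAlex works.
citing_works = [
    {
        "id": "W1",
        "authorships": [
            {"author": {"id": "A1"}, "institutions": [{"id": "I1"}]},
            {"author": {"id": "A2"}, "institutions": [{"id": "I1"}, {"id": "I2"}]},
        ],
    },
    {
        "id": "W2",
        "authorships": [
            {"author": {"id": "A2"}, "institutions": [{"id": "I2"}]},
        ],
    },
]


def build_graph(works, project="scikit-learn"):
    """Return (nodes, edges) for papers, authors, and institutions."""
    nodes = {project: "project"}  # node id -> node kind
    edges = set()                 # (source, target, relation), deduplicated
    for work in works:
        nodes[work["id"]] = "paper"
        edges.add((work["id"], project, "references"))
        for authorship in work.get("authorships", []):
            author_id = authorship["author"]["id"]
            nodes[author_id] = "author"
            edges.add((author_id, work["id"], "authored"))
            for inst in authorship.get("institutions", []):
                nodes[inst["id"]] = "institution"
                edges.add((author_id, inst["id"], "affiliated_with"))
    return nodes, edges


nodes, edges = build_graph(citing_works)
counts = Counter(nodes.values())
print(counts["paper"], counts["author"], counts["institution"], len(edges))
```

Tallying the real counts the same way gives 2,430 papers + 16,444 authors + 3,186 institutions = 22,060 nodes, which is where the "over 20,000" in the title comes from.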

Next Steps

While we’re excited about the progress made, we are continuously refining these visualizations to enhance their effectiveness. The foundational work is solid, and we’re eager to build upon it in the coming weeks by mapping additional projects.

We also need your help! Specifically, we are looking for assistance in porting MOSS to Neo4j, as the current number of nodes and connections overwhelms our existing platform, Kumu. If you’re interested in contributing, please check out the GitHub repository.
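To give potential contributors a feel for the porting task, here is a rough sketch of turning nodes and edges into parameterized Cypher statements. Everything in it is an assumption on our part: the helper names, the node labels, the relationship types, and the `id` property are illustrative, and the actual MOSS schema in Neo4j has yet to be designed.

```python
def node_statement(node_id, kind):
    """Return a (query, params) pair that creates a node if it is missing.

    Cypher cannot take a label as a parameter, so the label is interpolated;
    `kind` should come from a fixed, trusted set such as "paper" or "author".
    """
    label = kind.capitalize()
    return f"MERGE (n:{label} {{id: $id}})", {"id": node_id}


def edge_statement(source, target, relation):
    """Return a (query, params) pair that links two existing nodes."""
    rel_type = relation.upper()  # e.g. "authored" -> "AUTHORED"
    query = (
        "MATCH (a {id: $source}), (b {id: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": source, "target": target}


# Hypothetical example: one paper, one author, one authorship edge.
statements = [
    node_statement("W1", "paper"),
    node_statement("A1", "author"),
    edge_statement("A1", "W1", "authored"),
]
for query, params in statements:
    print(query, params)
```

Each pair could then be handed to a Neo4j session, for example via `session.run(query, params)` in the official Python driver. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or relationships.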

Join us for the next MOSS Interest Group (IG) call on July 8th at 1 PM ET, where we will discuss our next strategic steps, including expanding the scikit-learn test case by mapping other projects cited by the papers that cite scikit-learn.

To stay updated and participate in the discussion, join the MOSS Interest Group.

See you soon!
