Introducing Libscore

Overview

Every month, Libscore scans the top million sites on the web to determine which third-party JavaScript libraries are installed on each site.

Libscore aggregates this data to provide open source developers the numbers they need to measure their impact. Before Libscore, front-end developers only had GitHub star counts as a proxy for their library’s success. But most developers don’t star GitHub projects at all, opting instead to bookmark GitHub pages or download libraries exclusively through npm or Bower. (This is not a knock on GitHub; they’d be the first to admit that audience measurement is not the intention of starring.)

The end result is that developers contribute to open source in a vacuum: they develop hoping, but never knowing, that their library is being used at large. Libscore’s data substantiates this hope with facts, resulting in a tighter feedback loop between a project’s development and its adoption by developers. This feedback loop, in turn, motivates more frequent maintenance of open source projects, which ultimately leads to longer lifespans for those projects.

Specifically, Libscore detects modules loaded via RequireJS, jQuery plugins, window variables produced by non-jQuery libraries, and external scripts.

Libscore was created by Julian Shapiro and Thomas Davis. Jesse Chase built the website. The project is sponsored by Stripe and DigitalOcean.

Visit Libscore’s GitHub page to get links to library badges, which publicize a library’s site count. (Any number over 25 is something to be very proud of. Remember, only the top million sites are counted.) Consider embedding your badge in both your project’s GitHub README and official documentation page to showcase your library’s adoption at large to potential new users.

Solving open source’s age-old problem

Let’s say, three months after your open source project’s release, it has twenty-five stars on GitHub. That’s not very motivating, is it?

But what if one of those starrers, unbeknownst to you, was a developer for CNN.com, which is now using your library? Your library, in turn, is affecting tens of millions of people on a daily basis. All of a sudden that GitHub star count seems like an awfully poor metric for measuring your project’s impact.

Enter Libscore

Libscore.com was built to answer this question: Which sites are using which front-end JavaScript libraries? Every month, Libscore scans the homepages of the top million sites on the web, using a set of simple heuristics to determine which third-party JavaScript libraries are installed on each page.

Specifically, Libscore scans for third-party modules loaded with RequireJS, jQuery plugins, window variables produced by non-jQuery libraries, and cross-domain external scripts (e.g. analytics services and CDNs). It scans every site twice: once to pick up results for the desktop version of the site, and again to pick up results for the site’s mobile redirect page (if one exists). It then aggregates this data, allowing for the following queries:

  • Variable search. With variable search, you can enter “jQuery”, for example, to retrieve a list of all sites that contain jQuery as a window property. Even if only 50 sites out of the top million use your brand spanking new library, you will know exactly which sites they are thanks to Libscore’s comprehensive search. What can you do with the resulting data? Well, if any of the retrieved sites are well-known, you can list them on your project’s homepage or in your GitHub README file in order to establish your project’s credibility. Or, you could list the top sites on your resume when applying for a development position. After all, if CNN is using your library, that’s a pretty big achievement.
  • Popular libraries. With Libscore’s monthly distribution of the most popular front-end JavaScript libraries, you can finally cut through the hype in the open source community and discover for yourself which libraries developers are actually using en masse — and not just talking about. This data set also doubles as a means to perform competitive and trend analysis within the open source community.
  • Reverse domain search. With reverse search, you enter a domain, such as “Stripe.com,” to find all the libraries that the specified website uses. Similar to the popular libraries list mentioned in the previous point, reverse domain search is tremendously helpful for competitive analysis: If you want to know which libraries make a site you love come alive, now you can take a quick peek at the code behind it.
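The three queries above can be pictured as lookups over a pair of indexes built from the scan output. The following is only a minimal sketch under stated assumptions: the record shape, the sample domains, and every function name are hypothetical and are not taken from Libscore's actual API or storage layer.

```javascript
// Hypothetical scan output: one record per site, listing detected libraries.
const scanResults = [
  { domain: "example.com", libraries: ["jQuery", "Modernizr"] },
  { domain: "example.org", libraries: ["jQuery"] },
  { domain: "example.net", libraries: ["Backbone", "jQuery"] },
];

// Build both lookup directions in a single pass over the scan output.
function buildIndex(results) {
  const sitesByLibrary = new Map();
  const librariesBySite = new Map();
  for (const { domain, libraries } of results) {
    librariesBySite.set(domain.toLowerCase(), [...libraries]);
    for (const library of libraries) {
      if (!sitesByLibrary.has(library)) sitesByLibrary.set(library, []);
      sitesByLibrary.get(library).push(domain);
    }
  }
  return { sitesByLibrary, librariesBySite };
}

const index = buildIndex(scanResults);

// Variable search: which sites expose this name?
function variableSearch(name) {
  return index.sitesByLibrary.get(name) || [];
}

// Reverse domain search: which libraries does this site use?
function reverseDomainSearch(domain) {
  return index.librariesBySite.get(domain.toLowerCase()) || [];
}

// Popular libraries: rank names by how many sites use them.
function popularLibraries() {
  return [...index.sitesByLibrary]
    .map(([name, sites]) => ({ name, siteCount: sites.length }))
    .sort((a, b) => b.siteCount - a.siteCount || a.name.localeCompare(b.name));
}

console.log(variableSearch("jQuery"));
console.log(reverseDomainSearch("Example.net"));
console.log(popularLibraries());
```

Keeping both directions indexed means each query is a single map lookup, which matters once the input grows from three sample records to a million sites.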

How it works

Libscore connects to the sites in AWS’s top million sites list (http://aws.amazon.com/alexa-top-sites/) using a headless version of the WebKit engine. Once connected, Libscore diffs the window variables and jQuery variables found on the live instance of the website against the variables that were natively populated by the browser and jQuery, respectively.
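The diffing step can be illustrated with plain objects standing in for a pristine browser window and a loaded page's window. This is a sketch only: diffGlobals and both sample objects are hypothetical, and Libscore's real scanner runs inside a headless browser rather than on object literals.

```javascript
// Hypothetical helper: return the property names found on the live object
// that are absent from the baseline list of natively populated names.
function diffGlobals(baselineNames, liveObject) {
  const baseline = new Set(baselineNames);
  return Object.getOwnPropertyNames(liveObject)
    .filter((name) => !baseline.has(name))
    .sort();
}

// Stand-ins for a pristine window and the same window after a page has loaded.
const pristineWindow = { document: {}, location: {}, setTimeout() {} };
const pageWindow = {
  document: {},
  location: {},
  setTimeout() {},
  jQuery: function () {},
  _gaq: [],
};

const added = diffGlobals(Object.getOwnPropertyNames(pristineWindow), pageWindow);
console.log(added); // the two names the page added: _gaq and jQuery
```

The same helper could be pointed at a plugin namespace instead of the window, for example by comparing the property names of a freshly loaded jQuery.fn against those of the page's jQuery.fn.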

Libscore then runs heuristics on each of these variables to determine whether they are likely to be the result of a third-party JavaScript library (instead of arbitrary production variables that leaked to the window object). For each site, Libscore performs this scan process once with a desktop pass and again with a mobile pass. The results of this two-phase scan are aggregated. (Early tests found that a significant minority of sites that had mobile-specific pages/redirects contained a set of JavaScript libraries that differed significantly from those found on the desktop version of the site. Further, some JavaScript libraries are developed exclusively for mobile. Hence, this two-step aggregation process is performed for the sake of thoroughness.)
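Aggregating the desktop and mobile passes amounts to a set union over the names each pass detected. A minimal sketch, assuming each pass yields a simple list of library names (aggregatePasses and the sample lists are hypothetical):

```javascript
// Merge the names found by each pass into one de-duplicated, sorted list.
function aggregatePasses(desktopNames, mobileNames) {
  return [...new Set([...desktopNames, ...mobileNames])].sort();
}

// Hypothetical results for a single site.
const desktopPass = ["jQuery", "Modernizr"];
const mobilePass = ["jQuery", "Hammer"]; // a mobile-only library appears here

const siteLibraries = aggregatePasses(desktopPass, mobilePass);
console.log(siteLibraries);
```

A library seen in both passes is counted once for the site, while a library seen only on the mobile page is still credited to it.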

Beyond variable sniffing, Libscore also performs indirect analysis on the RequireJS and SeaJS (RequireJS’s Chinese counterpart) module loaders to detect which modules are likely to be third-party plugins, and which of those are likely to be jQuery plugins (if jQuery has not been made global).

If your library is not being picked up by Libscore’s scans, and you believe that it’s being used by at least 50 of the top million sites on the web, ensure that you’re following the recommended best practice of providing a top-level .version or .VERSION property on your library’s primary exposed function/object. While this is not the only heuristic Libscore uses to determine whether a window variable is a third-party plugin, it is the most surefire way of having Libscore consider including your code.
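Following the recommendation takes one line in a library. The sketch below shows a hypothetical library, MyLib, exposing a version string, along with a hypothetical check of the kind a scanner might apply. The check is an assumption made for illustration and is not Libscore's actual heuristic.

```javascript
// A hypothetical library whose primary exposed function carries a version.
function MyLib(selector) {
  return { selector };
}
MyLib.version = "1.2.0";

// Hypothetical check: does a value expose a top-level .version or .VERSION?
function hasVersionProperty(value) {
  const isContainer =
    typeof value === "function" || (typeof value === "object" && value !== null);
  if (!isContainer) return false;
  return [value.version, value.VERSION].some(
    (v) => typeof v === "string" || typeof v === "number"
  );
}

console.log(hasVersionProperty(MyLib)); // true
console.log(hasVersionProperty({ apiKey: "abc" })); // false
```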

If Libscore is failing to detect a well-known library with significant popularity (i.e. at least 1500 stars on GitHub), please tell us in this thread.

Who made Libscore?

The Libscore scanner and website were built by Julian Shapiro while on Stripe’s open source retreat. Stripe incorporated, administers, and funds the non-profit behind Libscore. Further, Stripe fully sponsored all 7 weeks of Libscore’s development.

The most operationally intensive aspect of Libscore is the scaling architecture that orchestrates its high-volume scanning. This architecture and the associated API were built by Thomas Davis, the co-founder of popular projects such as cdnjs and jsonresume, as well as a tech policy advocate who works closely with the EFF. Jesse Chase, the Creative Director at DigitalOcean and a shepherd of the Libscore project, built the website.

The entirety of Libscore’s architecture is hosted on DigitalOcean, which provides Libscore with the tremendously powerful and cost-effective computational resources it requires to regularly scan one million sites.

If you’ve found Libscore to be helpful, please shout out to @Stripe and @DigitalOcean on Twitter. They’re paying out of pocket so that front-end developers can have the data they need to measure their impact.

Where it falls short

The only completely accurate data set that Libscore can ever provide is cross-domain scripts data, which does not require in-depth filtering. In contrast, the variables that Libscore attempts to detect are subject to fallible heuristics. A minority of libraries will go undetected (we estimate ~3%), and false positives will be introduced (estimated at ~15%). The technical reasons for this are beyond the scope of this article, but the most important takeaway is: Libscore is an intelligent scanner that operates without the crutch of a whitelist; it does its best to detect everything and anything — even brand new libraries that only a couple dozen sites are currently using.

Further, note that Libscore only scans a site’s homepage. In our testing, this proved to be tremendously representative of library usage across the web as a whole, but it is certainly not fully representative. Given this, as well as the fact that Libscore limits itself to just the top million sites on the web (although the monthly traffic received by sites past the one million mark is negligible or nonexistent), recognize that the goal of Libscore is simply to provide an accurate representational sample of JavaScript library usage, and nothing more.

Therefore, if your library only shows 100 results in a Libscore search, that does not mean only 100 developers throughout the world are using your work. (My own library, Velocity.js, for example, does not rank highly. It’s commonly used on animation-heavy one-off subdomains/pages like this one from Mailchimp, or this one from IBM, but it’s not necessarily used on those respective websites’ homepages.)

Finally, keep in mind what type of data Libscore does not represent: Libscore does not address the tens of thousands of mobile apps that were coded in HTML and JavaScript then converted into native apps. Libscore also does not address JavaScript library usage on the back end (via Node.js). For that data, consult npm.

For developers

Check out Libscore’s API to dive deep into Libscore’s monthly data set. If you produce interesting infographics and charts using it, let us know on Twitter!

If you’re the developer of a library, get the embeddable link to your library’s Libscore badge, which publicizes your library’s site count. (Any number over 25 is something to be very proud of. Remember, only the top million sites are counted.) The badge image hits our API regularly to ensure its site count stays current. Consider embedding it in both your project’s GitHub README and official documentation page to showcase your library’s adoption at large to potential new users.

