Analyzing Books Cited in English Wikipedia

Numbers

3.79 million citations
1.7 million ISBN citations (books)
684,965 unique ISBNs

Year Published

Book count by year published
1999 19,379
2000 21,908
2001 22,393
2002 24,393
2003 26,782
2004 29,326
2005 30,283
2006 31,702
2007 33,039
2008 30,625
2009 29,421
2010 28,975
2011 25,856
2012 24,438
2013 24,111
2014 17,474
2015 12,012
2016 10,611
2017 6,974
2018 927
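A tally like the one above can be reproduced from the records described in the Data section. The following is a minimal sketch, not the method used for this post: it assumes each record is a dict with a 'year' field, that the year may be stored as either a string or a number, and that records with no year are skipped. The function name count_by_year and the sample records are made up for illustration.

```python
from collections import Counter

def count_by_year(records):
    """Count records per publication year, skipping records with no year."""
    counts = Counter()
    for rec in records:
        year = rec.get("year")
        if year:
            # Normalise to a string so "2005" and 2005 land in the same bucket
            # (assumption: the stored type of 'year' is not known).
            counts[str(year)] += 1
    return dict(sorted(counts.items()))

# Made-up sample records, for illustration only.
sample_years = [
    {"title": "Example A", "year": "2005"},
    {"title": "Example B", "year": 2005},
    {"title": "Example C", "year": "2007"},
    {"title": "Example D", "year": None},
]
year_counts = count_by_year(sample_years)
for year, n in year_counts.items():
    print(year, f"{n:,}")
```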

Authors

An author can rank highly in two ways:

  1. Many of their unique works are cited on Wikipedia (quantity: many works represented)
  2. Many different articles cite their works (perhaps the same work cited in 1,000 different articles)

Top 50 Authors by Most Article References
Top 50 authors by most articles
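The two rankings can be sketched as follows. This is an illustrative approach under stated assumptions, not necessarily how the charts were produced: it assumes each record carries the 'isbn13', 'authors', and 'pages' fields listed in the Data section, treats one ISBN as one work, and counts each citing article once per author. The function name author_rankings and the sample records are hypothetical.

```python
from collections import defaultdict

def author_rankings(records):
    """Return (authors ranked by distinct works cited,
               authors ranked by distinct citing articles)."""
    works = defaultdict(set)     # author -> set of ISBNs cited
    articles = defaultdict(set)  # author -> set of article titles citing them
    for rec in records:
        for author in rec.get("authors") or []:
            works[author].add(rec["isbn13"])
            articles[author].update(rec.get("pages") or [])
    # Sort by count descending, then by name so ties are deterministic.
    by_works = sorted(works, key=lambda a: (-len(works[a]), a))
    by_articles = sorted(articles, key=lambda a: (-len(articles[a]), a))
    return by_works, by_articles

# Made-up sample: author A has two works cited in one article,
# author B has one work cited in three articles.
sample_books = [
    {"isbn13": "9780000000001", "authors": ["A"], "pages": ["P1"]},
    {"isbn13": "9780000000002", "authors": ["A"], "pages": ["P1"]},
    {"isbn13": "9780000000003", "authors": ["B"], "pages": ["P1", "P2", "P3"]},
]
by_works, by_articles = author_rankings(sample_books)
print(by_works[:50])
print(by_articles[:50])
```

Slicing each list with [:50] gives a top 50 under either criterion.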

Holding Count

Books held by institution
Books held by institution 0–202 group

Subject Headings

Top 50 subject headings used

Data

684,965 records
Newline-delimited JSON file (each line is its own JSON object)
Fields:
'title' : 'title of the book'
'isbn13' : 'ISBN-13'
'year' : 'year published'
'isbn10' : 'ISBN-10'
'oclc' : 'OCLC number'
'authors' : 'array of authors'
'holdings' : 'holdings count from OCLC'
'oclcOWI' : 'OCLC Classify ID'
'google' : 'Google Books ID'
'pages' : 'array of Wikipedia article titles'
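
Because each line is its own JSON object, the file can be read one record at a time without loading everything into memory. The sketch below uses only Python's standard json module. The helper name read_records and the sample records are made up for illustration, and the value types shown (for example, 'year' as a string and 'holdings' as a number) are assumptions rather than something the file is known to guarantee.

```python
import io
import json

def read_records(stream):
    """Yield one dict per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Two made-up records standing in for the real file, for illustration only.
sample = io.StringIO(
    '{"title": "Example A", "isbn13": "9780000000001", "year": "2005", '
    '"authors": ["Doe, Jane"], "holdings": 120, "pages": ["Article one", "Article two"]}\n'
    '\n'
    '{"title": "Example B", "isbn13": "9780000000002", "year": "2007", '
    '"authors": ["Roe, Richard"], "holdings": 15, "pages": ["Article three"]}\n'
)
records = list(read_records(sample))
print(len(records), "records;", "first title:", records[0]["title"])
```

With the real data, the same function accepts an open file handle in place of the in-memory sample, so the 684,965 records can be streamed line by line.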

Matt Miller
