A brief analysis of class…. assignments.

What we can learn about sociology from a collection of course syllabi.

Kristin Thomson
May 18, 2016

The Open Syllabus Project (OSP) is an initiative that has collected and mapped over one million syllabi from college-level courses. Joe Karaganis and David McClure explain the project in this New York Times piece, and the website offers more detail about the project’s methodology, including their Teaching Score criteria.

In January 2016, the OSP launched the Open Syllabus Explorer, which ranks the texts in terms of the frequency with which they are taught, and allows exploration of the corpus on the basis of author’s name, title, course subject, and host school. It’s also quite easy for users to extract data from searches of the Open Syllabus Explorer. I was intrigued by Andrew Higgins’ analysis of philosophy texts ranked in the OSP, so I thought I’d try something similar. I extracted the metadata and rankings for the top 500 sociology* texts. Then I manually added in:

· Author’s year of birth

· Author’s year of death (if applicable)

· Author’s gender

· Text’s year of first publication

…for each of the top 500 assigned texts.

When combined with the existing OSP data — Teaching Score, ranking, text title and author — it is possible to analyze the assigned texts through interesting new lenses. I used Tableau Public to analyze and display this data. Visit these workbooks to interact with the sociology data in real time, or read on for some topline assessments.


First, let’s look at the authors of these texts. Manually entering their birth and (if relevant) death years shows that 60% of the authors of the top 500 sociology texts are currently alive.
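As a rough illustration of how a living-author share like this can be computed, here is a minimal Python sketch. It assumes one row per author with a blank (None) death year for living authors. The rows are made-up placeholders, not the actual OSP data, and `share_living` is a hypothetical helper name.

```python
# Hypothetical rows: (author, birth_year, death_year).
# None in the death_year slot marks a living author.
# These values are placeholders, not the article's data.
authors = [
    ("Author A", 1818, 1883),
    ("Author B", 1941, None),
    ("Author C", 1950, None),
    ("Author D", 1962, None),
]


def share_living(rows):
    """Return the fraction of rows with no recorded year of death."""
    if not rows:
        raise ValueError("no rows to analyze")
    living = sum(1 for _, _, death_year in rows if death_year is None)
    return living / len(rows)


living_share = share_living(authors)
print(f"{living_share:.0%} of authors in this sample are living")
```

Whether a co-authored text counts once per text or once per author changes the denominator, so that choice should be made explicit before comparing disciplines.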

It would be interesting to compare this percentage with those for other disciplines. It seems logical that sociology, a field that consolidated in the early twentieth century around the study of present-day society, might have a stronger bias toward recent scholarship than, for example, history or literature, while the life sciences, which depend heavily on evolving knowledge and methods, may have an even higher percentage of living authors.

Gender of authors

Female authors are underrepresented in the corpus: only 18% of the top 500 texts are written by women. Barbara Ehrenreich’s Nickel and Dimed is the highest-ranked, coming in at #21 in the OSP rankings, followed by Candace West and Don Zimmerman’s article “Doing Gender” (#24) and Arlie Russell Hochschild’s The Second Shift (#29).


The ratio of female to male authors improves for books published more recently. The pie chart on the right looks at the gender breakdown for texts published between 1970 and 2015.

Clearly, a simple pie chart cannot capture the complex issues that determine what texts are assigned to sociology students. As with the living authors chart above, consider this an interesting data point that could encourage conversations about the trajectory of female authors in sociology, as well as comparative research on the gender of authors in other fields of study captured by the OSP.
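The gender shares discussed above, both overall and for the 1970 to 2015 window, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each text carries a single gender code ("F" or "M"), which glosses over co-authored works, and the rows and the `female_share` helper are hypothetical, not the real dataset or the Tableau workflow.

```python
# Hypothetical rows: (title, author_gender, first_publication_year).
# Placeholder values only; one gender code per text is a simplification.
texts = [
    ("Text A", "M", 1848),
    ("Text B", "F", 1989),
    ("Text C", "M", 1959),
    ("Text D", "F", 2001),
    ("Text E", "M", 1995),
]


def female_share(rows, start_year=None, end_year=None):
    """Fraction of texts by female authors, optionally limited to an
    inclusive range of publication years."""
    selected = [
        gender
        for _, gender, year in rows
        if (start_year is None or year >= start_year)
        and (end_year is None or year <= end_year)
    ]
    if not selected:
        raise ValueError("no texts in the requested range")
    return selected.count("F") / len(selected)


overall = female_share(texts)
recent = female_share(texts, 1970, 2015)
print(f"overall: {overall:.0%}, 1970-2015: {recent:.0%}")
```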

Author’s age when texts are published

The addition of dates of birth and death for authors and the year of publication for the top 500 texts leads to two interesting analyses.

First, let’s look at the relationship between authors and their own published works.

The box and whisker plot below shows the range of authors’ ages when their texts were published. Half of the texts (those inside the shaded box) were published by authors aged 38 to 54. The youngest author of a top 500 sociology text was 25, the oldest was 80, and the median age was 45. Marx and Engels were in their late twenties when some of their early works were published. Howard Saul Becker was 25 when he published Becoming a Marihuana User, while Charles Tilly was 80 when the final edition of his Social Movements, 1768–2008 was released.
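The numbers behind a box and whisker plot like this one are a five-number summary of author age at publication (publication year minus birth year). The sketch below uses Python's standard `statistics` module on made-up rows. The `age_summary` helper is hypothetical, and its quartile method ("inclusive") is an assumption that may not match the one Tableau uses.

```python
import statistics

# Hypothetical rows: (author_birth_year, first_publication_year).
# Placeholder values only, not the article's data.
records = [
    (1900, 1925),
    (1910, 1948),
    (1920, 1965),
    (1930, 1975),
    (1925, 1979),
    (1929, 2009),
    (1940, 1985),
]


def age_summary(rows):
    """Five-number summary of author age at publication."""
    ages = sorted(pub_year - birth_year for birth_year, pub_year in rows)
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    return {"min": ages[0], "q1": q1, "median": median, "q3": q3, "max": ages[-1]}


summary = age_summary(records)
print(summary)
```

Calling the same function on the female-authored and male-authored rows separately would give the per-gender comparison.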


Now let’s look at the same data broken down by authors’ gender. The top box plot shows female authors, while the bottom plot shows male authors. Among the women, the youngest author age is 27, the oldest is 66, and the median is 47. Among the men, the youngest is 25, the oldest is 80, and the median is 44.


The box plots also illustrate two interesting points. For women, there is scant authorship below the age of 39, and a visible lack of authorship between 29 and 32. For men, not only is the prime publishing age window wider, but the distribution of author ages is also more continuous.

The box plot below focuses on texts published between 1970 and 2015. While there is less disparity between male and female authors on more recently published works, gaps in the publishing age still exist.


Perhaps this difference can simply be chalked up to an abundance of male authors, but the delayed and shortened publishing window for women is a revealing finding based simply on the analysis of syllabi. I would be curious to see similar assessments of gender and publishing age in other disciplines.

Publication year

Finally, let’s look at the publication year of assigned texts. In the OSP, a quick scan of the top ranked sociology texts shows works by Marx, Mills, Weber and Durkheim, but there are plenty of late 20th century scholars as well. Examining a longer list of assigned texts, what can we surmise about the eras of social research and theory from which professors are drawing?

The chart below plots the top 500 assigned sociology texts on a timeline based on the text’s year of first publication (I used the first edition publishing year when applicable). The height of the bar reflects a basic count of the books that have that publishing year.
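The count behind each bar is a simple tally of texts per publication year. Here is a minimal Python sketch of that tally, using placeholder years rather than the real list of 500. `texts_per_year` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical first-publication years, one entry per text (placeholders).
publication_years = [1848, 1959, 1995, 1995, 1990, 1995, 1983, 1990]


def texts_per_year(years):
    """Count how many texts share each first-publication year."""
    return Counter(years)


counts = texts_per_year(publication_years)

# Print a crude text histogram in chronological order.
for year in sorted(counts):
    print(year, "#" * counts[year])

# The tallest bar: the year with the most texts in this sample.
peak_year, peak_count = counts.most_common(1)[0]
```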


The histogram suggests that the majority of the top 500 sociology texts have a publication year between 1950 and 2008. By this simple measurement, the best-represented publishing year is 1995, when twenty of the top 500 texts were released, including Connell’s Masculinities (ranked #58 among sociology texts), Bourgois’ ethnography In Search of Respect: Selling Crack in El Barrio (#71), Barber’s geopolitical assessment Jihad vs. McWorld (#75), and Newman’s textbook Sociology: Exploring the Architecture of Everyday Life (#88).

However, some of the texts that were published prior to 1950 are real heavyweights. The high OSP Teaching Scores for works by Marx, Engels, Durkheim, Tocqueville, Mills and Weber indicate that they are still widely assigned.

To account for the popularity/OSP Teaching Score of texts, the timeline below plots the top 500 ranked texts by publication year, but in this case the height of the bar is the aggregate Teaching Score for texts published in that year.
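Weighting by popularity means summing Teaching Scores within each publication year instead of counting texts. The sketch below shows that aggregation and a ranking of years by their totals. The scores are invented placeholders (the real values come from the OSP), and `score_by_year` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical rows: (first_publication_year, teaching_score).
# Placeholder numbers only, not real OSP Teaching Scores.
rows = [
    (1848, 90.0),
    (1990, 12.5),
    (1990, 20.0),
    (1995, 8.0),
    (1995, 7.5),
    (1995, 6.0),
    (1983, 15.0),
]


def score_by_year(scored_rows):
    """Sum Teaching Scores for all texts sharing a publication year."""
    totals = defaultdict(float)
    for year, score in scored_rows:
        totals[year] += score
    return dict(totals)


totals = score_by_year(rows)

# Rank years from highest to lowest aggregate score.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for year, total in ranked:
    print(year, total)
```

In this toy sample a single heavily assigned text outweighs a year with three lightly assigned ones, which is the effect the weighted chart is meant to surface.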


Here is the same publication-by-year data, flipped on its side and sorted so that the publication years with the greatest aggregate Teaching Score come first.


The sorted chart above shows that the publication years with the greatest accumulated Teaching Score are:

1. 1990 (example: Giddens’ Consequences of Modernity)

2. 1995 (ex.: Connell’s Masculinities)

3. 1983 (ex.: Gellner’s Nations and Nationalism)

4. 1993 (ex.: Ritzer’s McDonaldization of Society)

5. 1984 (ex.: Luker’s Abortion and the Politics of Motherhood)

Sociology is a dynamic discipline, so the inclusion of many texts published in the past 30 years is not surprising. Nor is the continued importance of the foundational sociology texts published between 1850 and 1950. But perhaps we can see another kind of generational dynamic at work here. Most of the OSP collection comes from courses taught between 2006 and 2014. Perhaps the emphasis on works published from the mid-1980s to the mid-1990s reflects a process of canonization that takes roughly 10 or 15 years, as faculty in their 40s become senior faculty in their 50s or 60s. That process would be balanced by the need to assign material that still feels relevant to the analysis of contemporary problems, which may have a roughly similar temporal horizon. Again, the OSP offers only some data points, at present, toward an understanding of contemporary sociological knowledge. But they are suggestive ones and worth further exploration as the data set matures.

* Notes

I undertook this data entry project for three reasons. Sociology was my undergraduate major and closely related to my graduate work in urban affairs and public policy, so I was curious about the texts and what additional data would reveal about the corpus. I was also excited to access a well-organized, robust, and open data set so I could use Tableau Public to both analyze and share the results. Finally, I hope that my experiments with a subset of data can help the OSP team as they consider future phases of this project.


I used OSP’s “Filter by Field” option to focus on the sociology texts, of which there were 53,177 total. I copied the top 500 texts straight off the website and pasted them into Excel. After some formatting cleanup, I used Google searches by author name and by text title to find publication dates, author gender, and author birth and death years (if applicable). In many cases, the author was notable enough to have an entry on Wikipedia. In other cases, I searched for information via Google Scholar or on the copyright page of books listed on Amazon or publisher websites.

After completing the data search for the top 500 sociology texts, I loaded the sheet into Tableau Public. From there, I was able to generate various analyses of the data and chart views. Visit this workbook to view the data.

Kristin Thomson is co-director of Future of Music Coalition’s Artist Revenue Streams project, a multi-method, cross-genre examination of musicians’ income streams and how they are changing over time. @kristinthomson