Textbook transparency: estimating textbook costs for students

Harshita Gupta
Harvard College Open Data Project
Oct 30, 2016

Hello all, my name is Harshita Gupta! I’m a hybrid technologist-humanitarian in my freshman year at Harvard. Split between concentrating in computer science and social studies, I’m always looking for ways to integrate the two. I bounce my time between comping the Crimson, volunteering for Harvard Square Homeless Shelter, reading for the Humanities Colloquium, and splitting hairs over Systems Programming psets.

I’m excited by the meeting of people with different minds, and as a melting pot of policy-makers, human resource analysts, and engineers, open data is a great place for that. The way I see it, cross-collaboration is the way to better serving people and their needs. Having reliable, accessible data lays the foundation for engaging in conversation with that data and using it to improve student life at Harvard — and that’s what I’m all about.

The challenge: publishing the prices of course books

Course materials and the miscellaneous costs of being a Harvard student add up quickly, and they can often be overwhelming, especially for those on financial aid.

The prices of course textbooks have been escalating and have become a sort of “hidden cost” of taking a course. For most classes, there’s no reliable information online about how much a student can expect to pay for materials over the course of the semester. Case studies, online portal access keys, and physical textbooks are all priced differently and not transparently. There’s no reliable “average textbook price” metric, and certainly no way to hold professors and publishers accountable for escalating and unfair costs.

The Harvard Undergraduate Council has recently made it part of its mission to address textbook prices and the financial barrier they present to students with limited means. A quick look through the Coop’s textbook search tool (one of the first sources our data scientists plan to extract from) shows that courses often have multiple required and recommended texts. Add rental options from Amazon and discount deals from eBay, and textbook data is scattered and hard to find, to say the least.

We aim to find a way to estimate the price of the books and other materials required by each Harvard course. Then we’ll visualize that data so that students can save money when planning courses, libraries know which books to keep more of on reserve, and professors can be lauded for reducing their courses’ prices. This presents an exciting data science challenge, and leaves plenty of room to expand into data visualization, machine learning, and policy writing.
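To make the idea of a per-course estimate concrete, here is a minimal sketch of one way it could work, not a description of what the team has built. It assumes we already have a list of price listings, keeps only required books, takes the cheapest listing for each book, and sums those per course. The course codes, titles, vendors, and prices are all made up for illustration, and `estimate_course_costs` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical listings: (course, book title, required?, vendor, price in USD).
# Every value here is invented for illustration only.
LISTINGS = [
    ("COURSE-A", "Intro Text", True, "Coop", 120.00),
    ("COURSE-A", "Intro Text", True, "Amazon", 95.50),
    ("COURSE-A", "Optional Reader", False, "Coop", 30.00),
    ("COURSE-B", "Case Pack", True, "Coop", 45.00),
]


def estimate_course_costs(listings):
    """Return {course: sum of the cheapest listing for each required book}."""
    cheapest = {}
    for course, title, required, _vendor, price in listings:
        if not required:
            continue  # recommended texts are left out of this estimate
        key = (course, title)
        # Keep the lowest price seen so far for this course/book pair.
        if key not in cheapest or price < cheapest[key]:
            cheapest[key] = price

    totals = defaultdict(float)
    for (course, _title), price in cheapest.items():
        totals[course] += price
    return dict(totals)


if __name__ == "__main__":
    for course, total in sorted(estimate_course_costs(LISTINGS).items()):
        print(f"{course}: ${total:.2f}")
```

Taking the cheapest listing gives a lower bound on what a student might pay. An average or median across vendors would be an equally reasonable choice, depending on what question the estimate is meant to answer.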

A dynamic, cross-functional team

I’m proud of and excited by the team we’ve put together this semester: at four people strong, we’re the perfect size for individual agency and tight collaboration to reign. I’ll be giving direction to our data science work and our partnerships, and working to merge the two into a cohesive product. Maxwell Levenson, a senior, and Dhruv Gupta, a freshman, will be handling the nitty-gritty of our data science work: scraping websites and picking the best APIs to build a reliable database of course book prices. Sarah Wu, a sophomore, will be leading our partnership ventures with Harvard faculty and the Coop.

As we dive in, we’re beginning by learning about the data that’s already out there and accessible. Maxwell and Dhruv will extract what they can about course books and prices from textbook providers’ websites; we’re running the full gamut, from SlugBooks to Amazon. Sarah and I, meanwhile, will reach out to and begin collaborating with the Coop and Harvard faculty. We’re hoping to gain access to the machine-readable data already out there, so that we can focus on turning it into a dataset useful for students.

It’s sure to be an exciting experience learning about the ins and outs of open data at Harvard, and we hope to emerge with a way for students to take charge of their education, and for institutions to be held to higher standards of equity and access.
