My first attempt at coding came early. Back in middle school, a friend and I were obsessed with creating digital paper dolls — hundreds of drag-and-drop outfits that fit on a single, problematically waifish figure. I’d spend hours drawing pixelated clothes in Microsoft Paint, uploading the jpegs to my bare-bones Angelfire site, and tweaking the HTML until it worked just so. Every time I successfully ported a new image over, I’d get a little thrill. I built a thingie! The project was purely for my own amusement, but if I was particularly proud of a crop halter-and-bell bottoms ensemble, I’d put the URL in my away message on AIM, the precursor to Twitter’s “this ====>.”
It’s taken me more than 15 years to dip my toes into programming again. And I know why. Every time I’ve tried to “learn to code” before, it was to build a skill that I abstractly believed would be useful. The last time I gave it a shot was during my first years as a fact-checker at Wired, when I tried to teach myself Python in the lull after we shipped the latest issue off to the printers. I got partway through an online course before abandoning it, unable to connect the dummy practice examples to anything I experienced in my work.
This time, at Stanford’s John S. Knight Journalism Fellowship, I have a real project to hack on — and it’s made all the difference. During my first quarter, I wondered how journalists could learn from the successes and failures of scientific publishing. That led me to think about the scientific literature not as a subject to report on, but as a potential journalistic tool. Scientists struggle to keep up with the thousands of papers published every year, so they’re experimenting with AI-driven search engines to surface the most important stuff for them. Why couldn’t journalists build something on top of that huge pile of data that serves their own investigative ends?
All of the Python I’ve stuffed into my brain this quarter has been in service of understanding what that something could be. Here’s where I started: Almost every scientific publication has a section that lays out the researchers’ personal and financial interests in their subjects. I want to see if it might be possible to build a database of those conflicts, mining the text for connections. Scientists may not always reveal everything they should, but allowing journalists to easily search for public disclosures could be a way to identify leads and avoid biased sources.
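To make the idea of mining disclosures for connections concrete, here is a minimal sketch. It assumes the disclosure statements have already been collected as plain text keyed by a paper ID. The cue phrases, the function names, and the sample statements are all hypothetical illustrations rather than anything drawn from a real paper, and real disclosures are far messier than one regular expression can handle.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases: a capitalized name following one of these
# is treated as an organization. Real statements vary much more widely.
CUE_PATTERN = re.compile(
    r"(?:consulting fees|grants?|funding|honoraria|personal fees) from "
    r"([A-Z][\w&-]*(?: [A-Z][\w&-]*)*)"
)


def extract_funders(statement):
    """Return the organization names that follow a known cue phrase."""
    return CUE_PATTERN.findall(statement)


def build_index(disclosures):
    """Map each organization name to the set of paper IDs that mention it."""
    index = defaultdict(set)
    for paper_id, statement in disclosures.items():
        for org in extract_funders(statement):
            index[org].add(paper_id)
    return index


if __name__ == "__main__":
    # Made-up statements, for illustration only.
    sample = {
        "paper-1": "Dr. A reports personal fees from Acme Pharma.",
        "paper-2": "The authors declare no competing interests.",
    }
    for org, papers in build_index(sample).items():
        print(org, sorted(papers))
```

Looking up an organization in the resulting index returns every paper whose disclosure names it, which is the kind of lead a reporter could then verify by hand.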
If you’re chuckling to yourself: Yes, I know this is enormously difficult. But I decided to try programming again because I knew I needed to have some minimum technical proficiency to understand what’s doable. And in the process of realizing what is not possible, I’ve learned so much more that will support my journalism.
I dove in with two data journalism-focused courses, taught by the incredible Serdar Tumgoren and Cheryl Phillips, which introduced me to the basics of the Unix shell, Python, and APIs. Both courses gave me a sense of the quality and formatting of data that journalists need to produce bulletproof investigations. With that background, I was able to reach out to researchers using natural language processing, who pointed me to the existing APIs and mining interfaces that could support my work and helped me troubleshoot the many hurdles along the way.
As I use my buffed-up Python skills to mess around with open science APIs, I’m discovering more than I ever knew about how those databases are constructed. To pull just the metadata I want out of PubMed Central, the NIH’s free full-text archive of biomedical research, I had to understand how it is organized. In the process, I found search parameters I never knew existed. And as I chip away at each part of the problem (first collecting the papers to analyze, then mining the text I need out of them, then organizing the results so they can be searched), I’m learning discrete programming skills that I can apply to future projects.
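The first step, collecting papers, might look something like the sketch below. It assumes NCBI’s public E-utilities `esearch` endpoint with `db=pmc` and JSON output. The function names and the search term are hypothetical, and this is only one plausible way to query PubMed Central, not necessarily the approach used in this project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, max_results=20):
    """Build an esearch URL that queries the PubMed Central database."""
    params = {
        "db": "pmc",            # search PubMed Central
        "term": term,           # the query string
        "retmax": max_results,  # cap on the number of IDs returned
        "retmode": "json",      # ask for JSON instead of the default XML
    }
    return ESEARCH_URL + "?" + urlencode(params)


def parse_ids(response_text):
    """Pull the list of matching record IDs out of an esearch JSON reply."""
    data = json.loads(response_text)
    return data["esearchresult"]["idlist"]


def search_pmc(term, max_results=20):
    """Run the search over the network and return the matching IDs."""
    with urlopen(build_search_url(term, max_results)) as response:
        return parse_ids(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical query; requires a network connection.
    print(search_pmc("conflict of interest", max_results=5))
```

Keeping the URL building and the response parsing in separate functions makes each piece easy to check without touching the network, which helps when most of the work is staring at error messages.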
None of this would feel remotely fun if I didn’t have a project I was passionate about. Until I was googling for a reason, searching Stack Overflow was like trying to learn a language by reading the dictionary. And there’s no way I would spend this much time staring at error messages unless I could imagine the pot of gold (or dossier of docs) at the end of the rainbow. For now, conflicts of interest are my digital paper dolls.
They even give me the same zing of accomplishment I got back in middle school. My fellow fellows have gotten used to the signals: After working silently, head down in my laptop, I’ll push back in my chair, raise my arms, and triumphantly whisper: “Yesssss!” Whatever just happened, it was probably the tiniest, most marginal step in my actual project. But it got me that much closer to a source of knowledge that doesn’t yet exist in the world. I can’t imagine anything more exciting.
A footnote: If you don’t have a project to test things out on, that’s just fine. Coding isn’t a necessary skill for every journalist — at least not yet — but the fruits of other journalists’ code can absolutely be part of every reporter’s toolkit. My courses this quarter gave a broad survey of open source tools that do the hard work of tracking and scraping valuable sources, no programming required. You don’t need to be at Stanford to learn about some of these shortcuts for yourself. If you’re like me, you didn’t know any of these applications existed, and you wouldn’t have even thought to search for them. As soon as I learned about them, I couldn’t help but imagine applications on my beat. Maybe you will too.