Reading List

What are the best practices in data science documentation?


Data science projects increasingly use a complicated tech stack, and documentation that omits important information can become a failure point when the project changes hands. I experienced this firsthand during a data science internship. Our team was brought on to refine and extend an existing machine learning model in production. We inherited a significant codebase from the previous team, without any documentation of their process or assumptions. We also had to do a lot of data cleaning, as the dataset was dirty and lacked a data dictionary.

We ended up starting over from scratch. It was easier than struggling to…

Reading List

Learn how to write better for your colleagues and peers


The data science blogging ecosystem is rich and growing. TDS alone has an archive of more than 20,000 posts across numerous topics. Many experts have launched Substacks, newsletters, or personal blogs. If you’re looking for great new reads to add to your roster, check out Vicky Boykis, Randy Au, or start from this list of ten foundational ML blog posts.

Many of us are also interested in contributing our thoughts and perspectives; one of my goals for 2021 is to write and publish more. But for those of us more technically inclined, writing can feel harder than coding. How do…

Making Sense of Big Data

How we built and iterated on a machine learning model to identify instances of illicit Russian arms trade.

Written by: Elliot Gunn

Data science team: Sean Antosiak, Elliot Gunn, Andrew Mikol, Jason Nova

Project Lead: Elan Riznis

In this post, we describe a novel approach that applies machine learning to a large trade dataset to identify Russian parties that ship to foreign militaries, or parties acting on behalf of Russian state military firms.

Russia is the world’s second-largest arms exporter. The sheer volume of trade data available poses a challenge and an opportunity to researchers and government agencies in the arms non-proliferation field. …

Art curation has been heavily biased towards male representation in the most elite art institutions. In 1985, a group of anonymous American female artists, the Guerrilla Girls, plastered New York City with 30 different posters. The group came about when the Museum of Modern Art (MoMA) held an exhibition in which fewer than 10% of the featured artists were women. Their work sought to motivate “museums, dealers, curators, critics and artists who they felt were actively responsible for…the exclusion of women and non-white artists from mainstream exhibitions and publications”.

Where did they get that 5% statistic from? They created…

It’s been said that data scientists spend most of their time cleaning, manipulating, and transforming data before it can be analyzed. And it’s not the most fun, as you can see below.

Because of my background in economics, I had previously encountered this very tedious process in Stata. If you type “stata” and “reshape” into Twitter’s search function, you’ll find a community utterly befuddled and frustrated by the seemingly simple process of changing a dataset from wide to long or vice versa.

In Python, reshaping your data feels a little more intuitive. If you’re using Seaborn, for instance…
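As a rough illustration of the wide-to-long reshape mentioned above, here is a minimal sketch using pandas. The column names and values are made up for the example, and this is just one common way to do it, not necessarily the approach the full post takes.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per year.
wide_df = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [1.0, 3.0],
    "2020": [2.0, 4.0],
})

# Wide -> long: each (country, year) pair becomes its own row.
long_df = wide_df.melt(id_vars="country", var_name="year", value_name="value")

# Long -> wide: pivot back to one column per year.
wide_again = (
    long_df.pivot(index="country", columns="year", values="value")
    .reset_index()
)
wide_again.columns.name = None  # drop the leftover "year" label on the columns

print(long_df)
print(wide_again)
```

The long form, with one observation per row, is generally the shape that plotting libraries such as Seaborn are most comfortable with, which is why this step tends to come up right before visualization.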

Elliot Gunn

Data + Editorial @ TDS. Prev @ C4ADS (data science intern), Global Strategy Lab (research fellow). @elliot_j_g for tweets on data, writing, econs, tech!
