What’s in a Name: SR Data Science Python Libraries 2.0

Nicole Carlson
6 min read · Aug 7, 2020

Three hexagons with cartoon animals. The first has five octopuses, the second has a stork, and the third has a wildebeest.
Logos for octopod, stork, and wildebeest libraries

When I first started at ShopRunner three years ago, my coworker Hanna and I excitedly chose Harry Potter as the theme for our repo names. We both loved the series, and it seemed like it would be relatively easy to find names that loosely connected with our projects. Fast-forward to the dumpster fire known as 2020: JK Rowling (author of the Harry Potter series) went into overdrive with transphobic comments on Twitter. Once I heard about those tweets, I knew we could not continue using Harry Potter names. This blog post is the story of how we tackled renaming our open source libraries.

Team Decision

After JK Rowling’s transphobic Twitter spree, I raised the issue on Slack. I was a little nervous that not all of my team members would agree that this was important, but I didn’t need to worry at all. Everyone on the team immediately agreed that our names were problematic and that we should rename our repos. The only concern raised was making sure we didn’t make any changes that would break other people’s code if they were using our libraries.

We decided that the most important repos to rename were our open source projects since these repos are the public representation of our data science team. We wanted their names to be consistent with our values and to not cause harm to those who would be targeted/excluded by a naming scheme that promotes transphobia.

Announcement to company

At the same time that our team was discussing our repo names, our company’s LGBTQIA+ ERG was also discussing JK Rowling in their Slack group. I wanted them to know that we were taking this seriously even if there did not appear to be any immediate action.

We sent this message out to the #general Slack channel so everyone knew we were taking action:

Hi all, as many of you know, for the past three years, the data science team has named all of our repos after Harry Potter related stuff. Unfortunately, JK Rowling said some terrible transphobic things this weekend.

Because of this, the DS team has decided we are no longer going to use Harry Potter names for our repos. We could not in good conscience call ourselves allies while using names created by JK Rowling.

We are in discussion within our team figuring out a new naming scheme and our process for renaming old repos. We will do our best to mitigate the risk of breaking services. Please let me know if you have any questions.

We hoped that sending that message would help our company see that we were serious about putting in the work to rename our libraries.

Yellow hexagon containing a cartoon wildebeest sitting down with the word “wildebeest” underneath
Logo for wildebeest library

Adding notes to existing repos

We also wanted to get a message out to any public consumers of our libraries. Our team had three open source repos with Harry Potter affiliated names. Our first step was to remove all Harry Potter imagery from the repos and leave a version of this note (after consulting with our People team):

Note: Our team previously had a tradition of naming projects with terms or characters from the Harry Potter series, but we are disappointed by J.K. Rowling’s persistent transphobic comments. In response, we will be renaming this repository, and are working to develop an inclusive solution that minimizes disruption to our users.

A small step, but an important one to show we were aware of the issue and making plans.

Technical plan for deprecating the old names

Our next step was determining how the heck you actually rename a library on PyPI.

Google was not super helpful so I turned to Twitter for help:

Tweet from parsing_science: Hi #python friends, does anyone know what the best practices are for renaming a package on @pypi
Tweet asking for help with PyPI renaming

Luckily, Dustin Ingram wrote back and gave us some advice. He suggested different options depending on whether we wanted to force people to upgrade or if we were willing to cause breaking changes, etc. Side note: Dustin is one of the PyPI maintainers; his PyCon 2018 talk taught me that “PyPI” is pronounced pie — pee — eye, not pie — pie.

We decided our goal was to transition people away from the old library name to the new name without deleting the old versions. Dustin suggested we publish one final version of the old library with a warning about the renaming and then republish the library with the new name.

Choosing a new naming scheme

Now we had to choose a new naming scheme for our repos. This was by far the most difficult part of the whole process since my teammates (and I) are extremely opinionated.

After a lot of debate, we came up with the following criteria:

  • Not taken on PyPI (especially important for libraries if they are potential candidates for open-sourcing)
  • Has some kind of meaningful connection to what the repo does — e.g. canaria (Latin name for the canary species) for the model performance reporting repo because a model performance report acts as a “canary in a coal mine” for model problems
  • Single word (to avoid confusion around underscores vs. hyphens)
  • Inclusive (e.g. not associated with racist tropes)
  • [Optional] easy for our users to read and spell (if possible, choose words that people are likely to be familiar with)
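To make the single-word criterion concrete: when PyPI compares project names, it treats runs of hyphens, underscores, and periods as equivalent and ignores case (the normalization rule described in PEP 503). The import name used in Python code, on the other hand, cannot contain a hyphen. Here is a minimal sketch of that rule. The two-word name `model_report` is a made-up example, not one of our repos.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name following the rule described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical two-word name: every spelling maps to the same project,
# so users have to remember which spelling goes where.
spellings = ["model_report", "model-report", "Model.Report"]
normalized = {normalize_project_name(s) for s in spellings}

# A single-word name has only one spelling to remember.
single = normalize_project_name("octopod")
```

With a single word, the name you `pip install` and the name you `import` can be identical.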

We ultimately settled on using animals as our naming scheme: either a species or specific animal. Animals seemed like a safe bet because we thought we were unlikely to discover that an animal was a jerk (except for sea lions; those pinnipeds are mean!). An added benefit was we were able to keep some of our existing repo names since they were already animals, e.g. our API named newt.

Some discarded ideas included:

  • condiments (“what condiment means recommendation systems?”)
  • plants (“similar to animals but more boring”)
  • Chicago landmarks (“what if the architect/artist turns out to be a jerk”, sadly a highly likely outcome).

The hardest part of choosing new names was finding ones that were free on PyPI. Most of the common animals were already taken so we had to start digging deeper.
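Checking candidates one at a time in a browser gets tedious, so here is a rough sketch of how that check could be scripted. It is not the process we actually used. It assumes PyPI's JSON endpoint at `https://pypi.org/pypi/<name>/json`, which answers with a 404 when no project is published under that name. The helper names `pypi_status` and `unclaimed_names` are invented for this example.

```python
import urllib.error
import urllib.request


def pypi_status(name: str) -> int:
    """Return the HTTP status of PyPI's JSON endpoint for a project name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is what we want.
        return err.code


def unclaimed_names(candidates, get_status=pypi_status):
    """Keep the candidates that have no published project (HTTP 404)."""
    return [name for name in candidates if get_status(name) == 404]
```

Calling `unclaimed_names(["newt", "wildebeest"])` would query PyPI for each name. Treat a 404 as a hint rather than a guarantee, since PyPI may still refuse a name at upload time, for example when it is too similar to an existing project.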

However, even the animal naming scheme had some potential issues. One name we considered for our multi-task learning library was Balaur because the neural networks contain multiple task heads. A balaur is a multi-headed dragon in Romanian folklore. My partner is Romanian, and after consulting with him and his family, we realized this would be cultural appropriation. We think this demonstrates how important it is to think through repo names and ask tough questions before choosing a final version.

Adding deprecation warnings

One of our data scientists, Morgan Cundiff, took the lead on the technical work. First, she created a branch where she added a warning message to every importable item in the library (https://github.com/ShopRunner/octopod/pull/18/files). This was the message we used:

rename_message = 'Tonks has been renamed to Octopod. Please install and use the new package name from PyPI.'
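For readers who have not done this before, here is one way a message like that could be attached to a public function. This is a simplified sketch and not the code from the pull request. The `renamed` decorator and the `load_model` function are hypothetical stand-ins, and the choice of `FutureWarning` is an assumption made for the example.

```python
import functools
import warnings

rename_message = (
    "Tonks has been renamed to Octopod. "
    "Please install and use the new package name from PyPI."
)


def renamed(func):
    """Wrap a callable so that each call emits the rename warning first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 points the warning at the caller's line,
        # not at this wrapper.
        warnings.warn(rename_message, FutureWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper


@renamed
def load_model(path):
    """Hypothetical stand-in for one of the library's public functions."""
    return f"loaded {path}"
```

`FutureWarning` is used here because Python shows it by default, while `DeprecationWarning` is hidden in most contexts unless the user opts in. For a rename, you probably want end users to see the message without changing any settings.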

Hexagon containing five different colored octopuses on a light blue background with the word “octopod” underneath
Logo for octopod library

Renaming the repo

We knew we wanted the renamed repo to be released right after the deprecation warning release, so Morgan also completed the branch where we changed the name tonks to octopod everywhere (https://github.com/ShopRunner/octopod/pull/19).

One important part of the process was renaming the GitHub repo itself, so we consulted with Cloud Ops to make sure someone on our team had the correct permissions on GitHub. We decided to rename the repo instead of creating a new one to help force our users to switch to the new name.

One deliberate choice we made was to have no breaking changes in the first release. This was important to us because we wanted to reduce the friction of having to update to a new library.

Once we merged in these branches, our rename was complete!

Blue hexagon with a cartoon stork carrying a bundle containing the Apache Spark logo. The word “Stork” is at the bottom.
Logo for stork library

Introducing our new libraries: Octopod, Stork, and Wildebeest

Now we are pleased to introduce you to:

  • Octopod (https://pypi.org/project/octopod/), formerly known as Tonks, the ShopRunner multi-task deep learning library.
  • Stork (https://pypi.org/project/stork/), formerly known as apparate, a tool to manage libraries in Databricks in an automated fashion.
  • Wildebeest (https://pypi.org/project/wildebeest/), formerly known as creevey, a file processing framework, designed for IO-bound workflows that involve reading files into memory, doing some processing on their contents, and writing out the results.

We hope you enjoy using our newly renamed libraries!
