Ethical Design in CS 1: Building Hiring Algorithms in 1 Hour
I am an Assistant Professor of Computer Science at Bucknell University, a liberal arts university in central Pennsylvania. You can learn more about me on my website or on Twitter.
(Late edit: You can find an updated version of the activity talked about here at https://ethicalcs.github.io)
It’s no secret that algorithms impact our lives. 72% of resumes are never seen by employers, and we know that hiring algorithms aren’t exactly neutral. As we continue to shift the decisions of many into the hands of a few programmers, how can we ensure that our CS majors are reflective enough to consider the tradeoffs?
Can we build this kind of ethical thinking directly into Intro to CS courses? I think so — and I’ve written about it in the past. Briefly, I believe…
- … that ethical thinking should be integrated into existing CS curriculum from the very first day.
- … that ethical thinking should be paired with programming projects.
With those goals in mind, I created a short, simple assignment that can be integrated directly into my CS 1 course, and that can push my students to reflect on the tradeoffs of automated screening.
Programming and Ethical Objectives
Context: This module is built for a CS 1 course. Our CSCI 203 course uses Python and is built on Harvey Mudd’s CS for All intro course.
- Programming objective: To improve competency with nested for loops and 2D lists (or arrays).
- Ethical objective: To reflect on the tradeoffs between human and automated decision-making.
The Scenario: A Resume Screening Algorithm
Imagine you are hired by Moople, a well-known tech company that receives thousands of job applications from students every year. You are asked to create a program that algorithmically selects the applications that are worth a second look. Moople claims that algorithms will reduce costs and help negate the biases that result from tired application readers (will they?).
To help with this process, Moople’s application collects numerical data about each applicant’s computer science education. Applicants must enter the grades they received across 6 core CS courses as well as their overall GPA. The information for each applicant will be stored in a list, such as the following:
[100, 95, 80, 89, 91, 75, 83]
This list will represent their performance across the following courses:
- [0] Intro to CS: 100
- [1] Data Structures: 95
- [2] Software Engineering: 80
- [3] Algorithms: 89
- [4] Computer Organization: 91
- [5] Operating Systems: 75
- [6] Overall College GPA: 83
Index 0 always refers to Intro to CS, index 1 always refers to Data Structures, etc. If there was just 1 applicant, we’d just read it ourselves. But reality is messier. We have to deal with thousands of applications (a list of lists).
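As a minimal sketch of this layout in Python (the names `COURSES` and `applicants` are placeholders chosen for illustration, not names from the course materials), a list of lists can be indexed first by applicant and then by course:

```python
# Labels for each position in an application, in the order given above.
COURSES = [
    "Intro to CS",
    "Data Structures",
    "Software Engineering",
    "Algorithms",
    "Computer Organization",
    "Operating Systems",
    "Overall College GPA",
]

# A list of lists: one inner list per applicant.
# Both rows are taken from examples in this post.
applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [65, 75, 85, 95, 100, 100, 80],
]

first_application = applicants[0]      # the whole first inner list
algorithms_grade = applicants[0][3]    # applicant 0, course index 3

# Nested for loops visit every value of every application.
for row in range(len(applicants)):
    for col in range(len(COURSES)):
        print(COURSES[col] + ":", applicants[row][col])
```

The outer index picks the applicant and the inner index picks the course, which is the access pattern the nested for loops in the assignment rely on.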
By the end of the class, students must…
- Determine the criteria by which they are going to select the top applicants.
- Given a list of 5000 applicants’ data (a list of lists), write a Python function that returns a new list of worthwhile candidates.
- Articulate the tradeoffs of their criteria.
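To make the task concrete, here is one possible shape such a function could take. This is only a sketch: the name `select_candidates`, its parameters, and the cutoffs of 70 and 80 are assumptions made up for illustration, not the template code from the repository or a recommended set of criteria.

```python
def select_candidates(applicants, min_course_grade=70, min_gpa=80):
    """Return a new list of the applications that pass two hypothetical cutoffs.

    Each application is a list of 7 numbers: six course grades
    (indices 0-5) followed by the overall GPA (index 6).
    """
    selected = []
    for application in applicants:           # outer loop: each applicant
        passes = True
        for index in range(6):               # inner loop: the six course grades
            if application[index] < min_course_grade:
                passes = False
        if application[6] < min_gpa:         # index 6 holds the overall GPA
            passes = False
        if passes:
            selected.append(application)
    return selected


sample = [
    [100, 95, 80, 89, 91, 75, 83],     # the example application from this post
    [65, 75, 85, 95, 100, 100, 80],    # a second row, also taken from this post
]
print(select_candidates(sample))
```

With these made-up cutoffs, the first application is kept and the second is dropped because of its 65 in Intro to CS. Whether that is the right call is exactly the tradeoff students are asked to articulate.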
The Material
The following material is uploaded to this GitHub repository (please help me make it better!):
- Basic template code for the hiring algorithm: This is where students will write their algorithm — iterating through a 2D data structure using nested for loops.
- A Python module that contains a list of 5000 applications: At this point in our course, students haven’t seen file I/O, so we use a Python module that includes 5000 randomly generated applications.
For the instructor:
- A second set of 5000 applications that intentionally hides a few ‘interesting’ cases (more on this in a moment).
The Reflection: What did we miss?
Once students have finished (or come close to finishing) their algorithms, it’s time to chat:
- What criteria did you select? Why?
- What kind of applicants were you trying to make sure got through?
While I’ve found that students are naturally critical of the kinds of applications they admit, reframing the question as “who did you miss?” opens up an entirely different discussion.
To help nudge the reflection, I also reveal a set of stories that we could easily imagine happening to applicants — everything from input errors to unexpected data that results from real-life crises:
What if someone misinterprets the instructions?
[4, 3.9, 4, 4, 3.95, 4, 3.9]
['A', 'A', 'A', 'A', 'A', 'A', 'A']
What if someone skipped CS 1 and thought that putting a -1 would make that obvious?
[-1, 95, 99, 94, 96, 98, 95]
What if someone makes an error in their entry?
[100, 100, 100, 100, 100, 100, 10]
[681, 68, 73, 70, 81, 91, 59]
How should we compare two students with different trajectories?
[65, 75, 85, 95, 100, 100, 80]
[100, 100, 95, 85, 75, 65, 80]
What about a student who dealt with a personal trauma one semester?
[95, 93, 50, 91, 98, 90, 90]
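One way to explore these stories in class is to run them through two simple screening rules and compare who gets through. The sketch below is an assumption-laden illustration: the helper names, the labels, and the cutoffs (an average of at least 85, a lowest grade of at least 70) are all hypothetical. The letter-grade entry is left out because arithmetic on strings would raise an error before any rule could be applied.

```python
def course_average(application):
    """Mean of the six course grades (indices 0-5); the GPA at index 6 is ignored."""
    total = 0
    for index in range(6):
        total += application[index]
    return total / 6


def lowest_course_grade(application):
    """Smallest of the six course grades (indices 0-5)."""
    lowest = application[0]
    for index in range(1, 6):
        if application[index] < lowest:
            lowest = application[index]
    return lowest


# (label, application) pairs; the labels are made up, the lists are from the stories.
stories = [
    ("used a 4.0 scale", [4, 3.9, 4, 4, 3.95, 4, 3.9]),
    ("skipped CS 1", [-1, 95, 99, 94, 96, 98, 95]),
    ("typo in GPA", [100, 100, 100, 100, 100, 100, 10]),
    ("typo in a grade", [681, 68, 73, 70, 81, 91, 59]),
    ("rising trajectory", [65, 75, 85, 95, 100, 100, 80]),
    ("falling trajectory", [100, 100, 95, 85, 75, 65, 80]),
    ("one hard semester", [95, 93, 50, 91, 98, 90, 90]),
]

for label, application in stories:
    passes_average = course_average(application) >= 85        # hypothetical rule 1
    passes_minimum = lowest_course_grade(application) >= 70   # hypothetical rule 2
    print(label, "| average rule:", passes_average, "| minimum rule:", passes_minimum)
```

Under these made-up cutoffs, the two trajectories are indistinguishable by average (their course grades both sum to 520), the 681 typo sails through the average rule, and the student with one hard semester passes the average rule but fails the minimum rule.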
And this is where we get to that key moment of reflection… What have we lost in our transition to automation?
In some cases there are simple algorithmic fixes, such as input validation. But in other cases, students find that their algorithms excluded applicants who they may have picked by hand, and that every additional tweak seems to unintentionally ignore other applicants they meant to keep. This brings up questions that are wonderful to start considering from the first moments of their CS education…
- What does it mean to design a fair algorithm?
- What is the human cost of efficiency? More permissive algorithms may capture more interesting candidates, but they also mean more costly human work. What is the ideal balance?
- What systemic advantages/disadvantages are your algorithms likely to amplify (particularly focusing on the last 2 stories above)?
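The input-validation fix mentioned above might look like the following sketch. The function name `is_valid_application` and the assumption that every value should be a number between 0 and 100 are mine, chosen to match the example application format rather than taken from the course materials.

```python
def is_valid_application(application):
    """Check the shape of one application: 7 numeric values, each in 0-100.

    This is a hypothetical validity rule; it says nothing about whether
    the applicant is a good candidate.
    """
    if len(application) != 7:
        return False
    for value in application:
        if not isinstance(value, (int, float)):   # rejects letter grades like 'A'
            return False
        if value < 0 or value > 100:              # rejects -1 and 681
            return False
    return True


print(is_valid_application(['A', 'A', 'A', 'A', 'A', 'A', 'A']))
print(is_valid_application([-1, 95, 99, 94, 96, 98, 95]))
print(is_valid_application([681, 68, 73, 70, 81, 91, 59]))
print(is_valid_application([4, 3.9, 4, 4, 3.95, 4, 3.9]))
print(is_valid_application([100, 100, 100, 100, 100, 100, 10]))
```

A check like this flags the letter grades, the -1, and the 681, but the 4.0-scale entry and the GPA of 10 are still "valid" numbers, and the trajectory and trauma stories are not input errors at all. Validation narrows the problem without answering the questions above.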
I often tell my class that I don’t know the answers to many of these questions, but if this exercise prompts a moment of reflection later in their career, then we’re at least on the right track.
There’s time in your content-jammed course
If someone tells you there isn’t time for ethics in CS 1, don’t believe them! I’m excited to slowly build a repository of exercises that fit right alongside the normal curriculum in CS 1 courses. This hiring exercise helps reinforce nested for loops and 2D arrays. Justin Li has a fantastic lab on diversity that I’ve adopted to help students practice if/else if/else statements, and I’ll use the Ethical Engine project (along with Justin’s reflections!) again to help teach Object-Oriented Programming.
There’s still a lot of work to do, and I’d like to think more carefully about the opportunities for reflection that I give students. But to me, the only thing stopping the integration of ethical thinking in Intro to CS is a lack of imagination and effort. Just as we reinforce variable naming throughout our CS courses, by designing assignments that naturally pair with existing curriculum, I believe that we can also reinforce important habits of reflection.
So please build an activity and tell me about it!