Teacher to data scientist is a pretty short swim

Jesse Markowitz
Published in Age of Awareness
9 min read · Aug 2, 2021
It might be a short swim, but it’s a deep pool. (Photo by David Boca on Unsplash)

Where is data science?

The more you learn about data science, the more you see it everywhere. The New York Times’s daily covid data graphs. Nate Silver’s election projections. Johns Hopkins predicting covid-19 progression in patients. Waze optimizing your drive in real time. Apple Music curating a list of new music with artists you’ve never heard of. Once you peer behind the curtain, what happens on stage becomes even more impressive. It’s not magic — it’s just data.

The failures are there too. Gaps in the data because of a lack of will or resources, or too much faith put in imperfect algorithms. The Horizon scandal in the UK demonstrates what can happen when we blindly accept the output of a block of code we’ve never seen. Caroline Criado-Perez illustrates in Invisible Women how half the population is excluded from medical research, auto safety testing, and policy considerations. Cathy O’Neil points out in Weapons of Math Destruction that many job applicants are rejected by a computer before their resume ever reaches human eyes. ProPublica and others have shown how algorithms in the criminal justice system, created using decades of racially biased data from the past, perpetuate those patterns today.

I take encouragement from the positive examples — they are many and they are inspiring. But it’s the failures that get me fired up. I want to work with the hard societal problems, the ones that seem ancient and intractable. I see data science as a tool for addressing problems at a scale too large to tackle without technological assistance. It’s the submersible at the bottom of the Mariana Trench — without it we’d be crushed by the volume around us. There are just too many data points. We need the digital world, but in service of the human world.

Dive in. (Photo by Michal Mrozek on Unsplash)

Where is the data?

Until recently, I was fully immersed in the human world, trying to alter one young data point at a time. For the last several years I taught 6th grade science in the public school system in New York City. The field of education often felt lacking in usable data for making decisions in my classroom. In some ways that’s an unavoidable aspect of the job — no one else was teaching science to my students at this stage in their development. Not to mention the vast differences between school districts, schools within one district, or students, teachers, and classrooms within one school. Most of what’s available is anecdotal: “I did this in my classroom and it was the best ever! You should do it too!”

This lack of data, and the limited application of the data that does exist, required me to experiment and evaluate on the fly, continually assessing the realtime qualitative and quantitative data I was getting from my students. Who understands my instructions? How many students have mastered this concept? What’s the distribution of grades on this quiz by class? By question? By student? Why did this class do better than that class on this project? Every choice I made, everything I did or said, everything I asked my students to do — it all generated data I had to sift through and interpret in order to decide what to do next.

Solving a big problem at a small scale

My last year in the classroom started out of the classroom, over Zoom. My biggest problem at the beginning of the year was how to form a community out of Zoom boxes. I had to get to know my students and they had to get to know each other. I couldn’t teach people I knew nothing about, but how could I learn anything about them when most students didn’t turn on their cameras and some didn’t even speak? I asked my students to create simple surveys and share them with each other, then use the results to find out new things about their classmates. A lot of students asked about favorite colors, or school subjects, or anime. A few asked about sports and extracurricular activities. Some asked about family. Many asked questions with an eye toward making new friends.

I started with questions I knew they’d want to answer.

I gave them a survey of my own. I started by asking about favorite seasons and fruits — easy, fun questions to get them into it and model a couple of low-key questions. Then I got into the data I was really after for myself. What kind of technology did they use at home to access school online? Did they share their tech with anyone else? How loud was it around them? How private? Did they have good headphones or wifi? In the new world of teaching online I needed this information to know how to help them. I had no idea what to expect from my students and didn’t want to assume anything. I knew many of them had received iPads or laptops from the school back in the spring or in the beginning of the new school year, but I knew nothing about their tech savvy or work environment. Looking at the data gave me some insight into my students’ new lives as remote students.

But I wanted to go a little deeper.

Nothing like freshly cleaned data!

I gathered all of the data from every class into one big spreadsheet. I made a copy and got rid of the stuff that was irrelevant (a timestamp for each response) or that I didn’t want my students to see (emails, names). I renamed columns. I realized that the short-response questions needed to be organized. I changed “brother” and “sister” to “sibling”. Some questions allowed multiple answers, so I manually changed some responses to “multiple”. Next I needed to visualize it all. I made bar graphs and pie charts and histograms of the different rows, arranging and editing them all on a separate sheet that I could lock and then send to the students to analyze for themselves.
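
For readers who think in code, here is a rough sketch of those same cleaning steps in plain Python. It is not what I actually did (I worked by hand in Google Sheets). The column names, the sample responses, and the rule that a comma marks a multiple-answer response are all assumptions made up for illustration.

```python
# Hypothetical survey export: every column name and value here is invented.
raw_rows = [
    {"Timestamp": "2020-09-21 09:02", "Email": "a@example.com", "Name": "Student A",
     "Class": "601", "What is your favorite season?": "Summer",
     "Who do you share your device with?": "brother"},
    {"Timestamp": "2020-09-21 09:05", "Email": "b@example.com", "Name": "Student B",
     "Class": "601", "What is your favorite season?": "Winter",
     "Who do you share your device with?": "Sister, Parent"},
    {"Timestamp": "2020-09-21 10:11", "Email": "c@example.com", "Name": "Student C",
     "Class": "602", "What is your favorite season?": "Summer",
     "Who do you share your device with?": "No one"},
]

# Columns that are irrelevant or that students should not see.
DROP = {"Timestamp", "Email", "Name"}

# Long survey questions become short column names.
RENAME = {
    "Class": "class",
    "What is your favorite season?": "season",
    "Who do you share your device with?": "shares_with",
}

# Free-text answers that should be grouped together.
RECODE = {"brother": "sibling", "sister": "sibling"}


def clean_row(row):
    """Drop private columns, rename the rest, and tidy each answer."""
    cleaned = {}
    for column, value in row.items():
        if column in DROP:
            continue
        value = value.strip().lower()
        if "," in value:
            # Assumption: a comma means the student picked several answers.
            value = "multiple"
        else:
            value = RECODE.get(value, value)
        cleaned[RENAME.get(column, column)] = value
    return cleaned


cleaned_rows = [clean_row(row) for row in raw_rows]
for row in cleaned_rows:
    print(row)
```

Keeping the drop, rename, and recode rules in three small lookup tables means the cleaning can be rerun on a fresh export, which is the main thing manual edits in a spreadsheet cannot give you.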

The power of a graph — instant visual understanding.

But I wanted more. Wouldn’t it be nice if instead of scrolling through a big page of charts, my students could just choose a topic from a list? Down the rabbit hole of query functions and select statements and data validation I went. I had touched SQL a little before so the concept was somewhat familiar, but the symbolic syntax of cell names with dollar signs and ampersands was a confusing new layer. After an hour I finally emerged victorious: I could click a drop-down menu cell and, depending on what I chose, display a graph of one class’s results or one question using the whole grade’s data.

Choose a class for the Seasons question … or choose a question for the whole grade

But…wouldn’t it be better if you could choose a question to see the whole grade’s result, then also choose a class to compare to the overall data, on that same question? After another hour of googling, reading, trying, testing, googling again, and testing again, it worked. It was late at night. I still had no idea how or when or if I would even be able to incorporate this into an upcoming lesson. And I definitely didn’t know if my students would think it was nearly as cool as I thought it was. But I had done it.

Choose a question to view, then a class to compare to the whole grade.
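
The comparison behind that drop-down can be sketched in a few lines of Python. This is only an analogue of the idea, not the spreadsheet formulas I actually wrote, and the sample responses, the class labels, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical cleaned responses: two classes, one question.
responses = [
    {"class": "601", "season": "summer"},
    {"class": "601", "season": "summer"},
    {"class": "601", "season": "summer"},
    {"class": "601", "season": "winter"},
    {"class": "602", "season": "summer"},
    {"class": "602", "season": "winter"},
    {"class": "602", "season": "winter"},
    {"class": "602", "season": "fall"},
]


def percent_by_answer(rows, question):
    """Return the share of each answer to one question, as percentages."""
    counts = Counter(row[question] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


def compare_class_to_grade(rows, question, class_name):
    """Pair one class's percentages with the whole grade's, answer by answer."""
    grade = percent_by_answer(rows, question)
    in_class = percent_by_answer(
        [row for row in rows if row["class"] == class_name], question
    )
    return {
        answer: {"class": in_class.get(answer, 0.0), "grade": pct}
        for answer, pct in grade.items()
    }


# The two arguments play the role of the two drop-down menus.
comparison = compare_class_to_grade(responses, "season", "601")
for answer, shares in comparison.items():
    print(answer, shares)
```

Percentages are used instead of raw counts so that one class of a few dozen students can sit on the same chart as the whole grade.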

I needed to know my students without meeting them in person. I could have probably played an ice breaker game or just talked to them or asked them to draw a picture of themselves. I made a spreadsheet. And loved it. If that’s not a clue that data science is a good fit for me, then I don’t know what is.

My first attempt at communicating my results goes as well as you might expect

I tried to explain how amazing all of this was to my wife, who it turns out is closer to the population mean in terms of her excitement about spreadsheets.

Me: “I did it! Check it out. Here are all of the kids’ responses, and I made some charts for each question, but look — you can compare each class to the overall grade. Isn’t that cool?!”

Wife: “That’s great honey, I’m thrilled for you about your spreadsheet.”

Me: “Isn’t that cool though?”

Wife: “Amazing.”

Me: “You don’t care, do you?”

Wife: “I’m so happy that you’re so happy about it.”

Turns out I was doing data science, but without all those nifty libraries

So what did I learn? That summer was unsurprisingly the most popular season for 6th graders, but winter was popular too. That most considered themselves strawberries or pineapples, but a strange few chose dragonfruit. That about a quarter of my students shared their technology with someone else at home, so I shouldn’t expect them to always be available. That a few students had only their phones or no reliable technology at all. That only a quarter of my students were the only ones doing remote learning at home — most would be near another student in another class or grade or school. That most students didn’t have a dedicated work area just for themselves. Most students dealt with background noise every day. And very few had reliable privacy during class.

So what did I do? I created work in multiple formats for multiple devices. Any directions I said out loud I also put in writing. I asked students to message me whenever they had technology issues and checked in constantly. I was flexible with deadlines and understanding about sudden wifi blackouts. I told jokes and put outdated memes into my lessons that I knew would make the kids cringe and roll their eyes. I tried to incorporate hands-on, non-tech projects whenever I could. I did everything I could to make sure my students would want to log onto science day after day from home.

Diving into the process of data science in my first two weeks at Flatiron, I look back and realize I was doing data science without knowing I was doing it. I used Google Sheets, not Pandas. I made simple pie charts and bar graphs with default settings, not Matplotlib. I had 92 responses, not a database with hundreds of thousands of rows. But my process was the same: collecting and aggregating data; cleaning it; visualizing it in different ways; drawing inferences and conclusions; and using those conclusions to make decisions. I found and read documentation for functions I had never heard of. I tweaked colors and labels on my charts. And when I finally got the graphs to change based on drop-down menus and query searches, I threw my fists up in victory.

Why data science?

I used a little data to glimpse the beginning of a solution: Who are my students, what are they like, and what do they need? Thinking of my students as a whole at first helped me to get to know them individually and provide for their needs. This project taught me the value of data. Without my survey, I wouldn’t have known if students were prepared for online school. I might have assumed they all had their own desk at home, or their own device. This project reminded me that even the large-scale problems are rooted in the small-scale. Technology in the service of individuals.

There’s often an incredibly vast amount of information hovering over you all the time. You have to block out most of it just so you can get through the day. Opening the spigot and standing in its unfiltered flow would overwhelm you. Data science is a diving dress that lets you wade in deeper and look around, to hopefully glimpse what would otherwise remain hidden, to sift through all that data, run your fingers through it, hoping to catch something you missed, closing your hand around some new truth that you haven’t seen before.

Diver in standard diving dress, Ožbalt, Slovenia (1958) [https://en.wikipedia.org/wiki/File:Gradbi%C5%A1%C4%8De_hidroelektrarne_O%C5%BEbalt_1958,_potaplja%C4%8D.jpg]
Wade in and then just keep wading. (Diver in standard diving dress, Ožbalt, Slovenia [1958])

My survey and my data

Jesse Markowitz
Age of Awareness

Data scientist with a background in science education and a passion for creative problem solving for public good. New York, NY