Office Hours
How Music Made Me a Better Data Scientist
My music and data science journeys have shown me that personal growth arises from a greater understanding of the bigger picture.
I still remember picking out my first electric guitar on a hot Texas summer day 11 years ago. It was an unremarkable cream-white Fender Squier with slightly rusted strings, but I couldn’t wait to take it home. When I did, I took the first steps towards becoming a shred lord. I learned my scales, turned up the distortion, and conditioned my fingers to move faster and faster. I was so captivated by the awe-inspiring guitar solos of 70s and 80s rock that I didn’t care to learn much else. There was little room for rhythm, chords, or tone; I just wanted to play fast and flashy.
After a year or two, I could stumble through a decent number of famous guitar solos, so I felt ready to join a band. I began jamming with some friends from my high school, and we tried our hand at writing original music. I saw this as an opportunity to write guitar parts that showcased my lead guitar skills whenever possible. In one fateful jamming session, my friends started playing a riff reminiscent of early 2000s rock from bands like Green Day or Blink-182, and they asked me to add a lead guitar part. In retrospect, this groove called for a simple, energetic, and melodic guitar solo. Instead, I aimlessly zoomed up and down the pentatonic scale. Once the music stopped, my friend gently told me that while my solo was impressive, it “did not fit” with what the band was playing. This feedback caught me off guard: in my mind, an impressive guitar part was a good guitar part.
In the years following that incident, I began learning from more experienced guitarists and teachers. Time and time again, these mentors stressed the importance of “serving the song.” This means that a musician’s primary focus should be to produce music that connects with the listener, and the instrument is merely a tool to aid in this endeavor. Sometimes serving the song requires virtuosic solos, sometimes it requires playing something simple, and sometimes it requires not playing at all. This paradigm shift fundamentally altered my practice routine. In recent years, I have focused on developing my sense of timing, pitch, and harmony, all of which increase awareness of how one’s part fits into the bigger picture. Serving the song also encourages collaboration because musicians must coordinate to write parts that interact effectively.
So how does this relate to data science? Well, my journey in data science mirrors my journey in music in many ways. When I began learning data science, I was enamored with sophisticated machine learning algorithms, and I viewed them as the core competency of a successful data scientist. Now that I have more experience, I value other skills — such as writing production-quality code, business sense, and communication — much more highly than before. In other words, I learned the value of serving the song as a data scientist. In professional data science projects, the “song” is usually a product or a decision, and data scientists must learn to quantify their success in terms of their contributions to these endeavors. This mindset greatly increases a data scientist’s employability. As a recruiter, I would much rather see a resume that says “built a model that reduced maintenance costs by 10%” than “built a model with 95% accuracy.”
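To make the resume comparison concrete, here is a minimal sketch of how a model with higher accuracy can still deliver less value. Everything in it is hypothetical: the machine counts, the inspection and failure costs, and the names `ConfusionCounts` and `maintenance_cost` are made up for illustration, and the cost rule assumes that every flagged machine is inspected and that an inspection prevents the failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a failure-prediction model (hypothetical numbers)."""
    true_pos: int   # failures the model flagged in time
    false_pos: int  # healthy machines flagged unnecessarily
    true_neg: int   # healthy machines correctly left alone
    false_neg: int  # failures the model missed

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.true_neg + self.false_neg


def accuracy(c: ConfusionCounts) -> float:
    """Share of machines classified correctly."""
    return (c.true_pos + c.true_neg) / c.total


def maintenance_cost(c: ConfusionCounts, inspection_cost: int, failure_cost: int) -> int:
    """Assumed cost rule: each flagged machine is inspected, each missed failure is paid in full."""
    flagged = c.true_pos + c.false_pos
    return flagged * inspection_cost + c.false_neg * failure_cost


# Hypothetical costs, chosen only so that a missed failure is far more expensive
# than an unnecessary inspection.
INSPECTION_COST = 100
FAILURE_COST = 5_000

# 1,000 machines, 50 of which will fail (all counts are invented).
no_model = ConfusionCounts(true_pos=0, false_pos=0, true_neg=950, false_neg=50)
model_a = ConfusionCounts(true_pos=20, false_pos=5, true_neg=945, false_neg=30)
model_b = ConfusionCounts(true_pos=45, false_pos=100, true_neg=850, false_neg=5)

baseline_cost = maintenance_cost(no_model, INSPECTION_COST, FAILURE_COST)
for name, counts in [("model A", model_a), ("model B", model_b)]:
    cost = maintenance_cost(counts, INSPECTION_COST, FAILURE_COST)
    savings = 1 - cost / baseline_cost
    print(f"{name}: accuracy={accuracy(counts):.1%}, cost={cost}, savings={savings:.1%}")
```

With these invented numbers, model A is the more accurate of the two, yet model B is the one that cuts maintenance costs the most, because it misses far fewer expensive failures. Which metric belongs on the resume depends on what the "song" is.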
To be clear, many fast guitar solos serve songs, and many cutting-edge machine learning models improve products. Above all, it is important to understand what provides value and what does not. The guitarists I admire play virtuosic solos when the moment is right, but they strum open chords when the song calls for it. When I interview data science applicants, I typically ask an open-ended question about choosing an initial model for a basic binary classification problem. I have found that the candidates who suggest starting with a simple logistic regression model are the ones who can explain how more complicated models operate in detail. Ironically, some candidates who suggest starting with more complicated models cannot clearly explain how these models operate. In both music and data science, the most technically skilled practitioners typically recognize that many situations do not benefit from the most advanced techniques in their toolkit.
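For readers who want to see what "starting simple" can look like, here is a minimal sketch of a logistic regression baseline compared against a majority-class guess. It is an illustration under assumptions, not a recommended pipeline: the data are synthetic (two Gaussian clusters), the model is trained with plain batch gradient descent using only the standard library, and the function names, learning rate, and epoch count are all arbitrary choices made for this example. In practice one would more likely reach for an established library.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the average log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            err = p - yi  # gradient of the log loss with respect to the logit
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, X):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= 0.5 else 0
        for xi in X
    ]


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_toy_data(n: int, seed: int):
    """Synthetic data: two Gaussian clusters, one per class (made up for this sketch)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_toy_data(400, seed=0)
X_test, y_test = make_toy_data(200, seed=1)

# Reference point: always predict the most common training label.
majority_label = max(set(y_train), key=y_train.count)
baseline_acc = accuracy(y_test, [majority_label] * len(y_test))

weights, bias = fit_logistic_regression(X_train, y_train)
model_acc = accuracy(y_test, predict(weights, bias, X_test))

print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
print(f"logistic regression accuracy:     {model_acc:.3f}")
```

The point of the comparison is the workflow rather than the numbers: a simple, explainable model establishes how much of the problem is already solved, and any more complicated model then has to justify itself against that reference.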
Looking back, some of my overemphasis on playing fast as a guitarist came from playing the video game Guitar Hero. In this game, players earn points for each correct note they play, so fast songs with many notes are ideal for achieving high scores. Although I still love Guitar Hero and credit it with instilling my initial passion for guitar, its incentives led me to overlook vital aspects of playing an actual instrument with others. I wonder if Kaggle competitions have a similar effect on some aspiring data scientists. These competitions certainly advance the field and are great for learning, but I fear that they lead some to overemphasize building performant models and underemphasize generating business value.
Sometimes there is actually a tradeoff between a model’s performance and its value. This tradeoff is often due to time constraints. On some projects, I have faced decisions of whether to spend my time making models more performant or more impactful. Making a model more performant includes tasks such as feature engineering, evaluating different models, and building ensemble models. On the other hand, making a model more impactful includes tasks such as deploying it in a production environment, configuring it to send automated notifications, and improving its documentation. The tradeoff between performance and impact sometimes occurs in model selection as well. Simple models such as linear regression are easier to explain to non-technical stakeholders than deep neural networks or random forests. Stakeholders are more likely to trust and ultimately act on the results of a model they understand. Consequently, there are some situations where it is desirable to sacrifice some performance for impact to maximize value. One hallmark of a mature data scientist is the ability to discern these situations from those requiring more rigorous approaches.
Ultimately, my music and data science journeys have shown me that personal growth arises from a greater understanding of how one’s contribution fits into the bigger picture. In both endeavors, I once believed that excellence consisted merely of mastering difficult and complex skills. While these skills are useful, I now recognize that excellence is the ability to “serve the song.”