5 Things I Learned from My First Data Project
Last weekend, I finally finished my first data project and put it on my GitHub. It was a fairly simple EDA on Oklahoma powerlifting, using data from the ambitious and amazing OpenPowerlifting project. In their own words on their website, “OpenPowerlifting is a community service project to create a permanent, open archive of the world’s powerlifting data.” My project would not have been possible (or would at least have been far more difficult) without theirs.
Anyway, when I finally crossed the finish line and took a deep breath, I had a lot of thoughts about the things I’d learned along the way that I wanted to get down on paper. Frankly, I expect to be reminded of these lessons over and over again for the rest of my life!
In no particular order:
Understanding your data is crucial.
One thing that exploring this data forced me to do was… explore the data. Among other things, that involves learning how the data was collected and doing outside research to put it in context. For example, I had never thought about how each federation provides its data individually, and I didn't know that a lifter could enter multiple divisions in the same meet. I knew Wilks existed but never took the time to put different Wilks scores into context. What scores were unrealistic? Was 400 good? How much better is 400 than 350? (For the record, in order: 600 is around the upper end of what's possible. Good is relative, but my Wilks is 261, so yes. At my bodyweight, 160 lbs better.)
There was just as much googling about powerlifting as there was googling about Python syntax. Every time I looked at the data, I was forced to admit I only thought I understood anything at all.
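The two surprises above (unrealistic Wilks scores, and one lifter appearing in several divisions at the same meet) are the kind of thing a few lines of Python can surface early. Here is a minimal sketch, not the code from my notebook. The column names (`Name`, `Division`, `MeetName`, `Date`, `Wilks`) are assumed to resemble the OpenPowerlifting CSV export, the rows are made up purely for illustration, and the 600 cutoff is just the rough ceiling mentioned above, not an official limit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, invented for illustration only (not real results).
# Column names are an assumption about the shape of the CSV export.
SAMPLE_CSV = """Name,Division,MeetName,Date,Wilks
Lifter A,Open,Example Meet,2018-01-01,350.0
Lifter A,Masters 1,Example Meet,2018-01-01,350.0
Lifter B,Open,Example Meet,2018-01-01,912.4
Lifter C,Juniors,Another Meet,2018-02-01,
"""

# Rough "upper end of possible" from the text; an assumption, not a rule.
WILKS_CEILING = 600.0


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def suspicious_wilks(rows, ceiling=WILKS_CEILING):
    """Return rows whose Wilks score is above the ceiling (blank scores are skipped)."""
    flagged = []
    for row in rows:
        raw = row["Wilks"].strip()
        if not raw:
            continue  # missing score: nothing to check
        if float(raw) > ceiling:
            flagged.append(row)
    return flagged


def multi_division_entries(rows):
    """Map (name, meet, date) to its divisions, keeping only entries with more than one."""
    divisions = defaultdict(set)
    for row in rows:
        key = (row["Name"], row["MeetName"], row["Date"])
        divisions[key].add(row["Division"])
    return {key: sorted(divs) for key, divs in divisions.items() if len(divs) > 1}


rows = load_rows(SAMPLE_CSV)
print(suspicious_wilks(rows))
print(multi_division_entries(rows))
```

Neither check proves anything is wrong. A flagged score might be a data-entry error or a genuinely historic lift, and matching lifters by name alone can merge two different people. The point is only to generate questions worth googling.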
We all have unconscious biases that shape our thoughts and actions.
Sometimes they manifest in ways that have serious consequences. Sometimes all it means (thankfully) is that you’re surprised that some truly impressive feats have happened in a “small place”. I was born and raised in a major city, which unwittingly (and unfortunately) led me to the vague notion that cities like Oklahoma City would just not be as impressive. Much to my delight, I was wrong.
It’s always wonderful to learn new things, gain new perspectives and challenge your assumptions. Oklahoma powerlifting wasn’t the only thing I learned about.
Perfection is a trap.
There will always be something you can improve on in anything you do. Good enough means it’s good. The problem is it’s hard to know when your work is good enough to be considered good enough. This seems especially true when you’re new to a field and barely even know what you don’t know, which is how you (or at least I) fall into the trap in the first place.
Knowing where that line is probably gets easier as you gain experience… which at this stage probably means starting and finishing more things. I think? I’ll let you know when I’m good enough (hah).
Be a part of communities.
Share your work with other people. See what work other people are doing. After working on one project for long enough, I start focusing on everything I think it lacks and where it could be better, simply because that’s where I need to look when I’m trying to make progress. Evaluating the project for what it is takes a different mindset entirely. Sharing your work helps bring that perspective back. I was genuinely surprised when people told me my notebook seemed “so neat and so well done” and that it was “very educational” and that they were “going to learn from this and learn how to properly organize [their] notebook”.
I certainly don’t say this to toot my own horn; I share this simply to remind myself that I’m probably my own worst critic. Sometimes it’s okay to pat yourself on the back too. And if you forget, sometimes other people will do it for you.
Even if your work does actually suck, that’s okay too.
More than anything, this is a note for future me. Until you decide to stop trying, you’ll always have another chance to build something else with the knowledge you’ve gained from past experiences. There’s a popular parable about this that you may have seen before. A ceramics teacher split his class into two groups. One group was tasked with making as many pots as possible, while the other was instructed to make just one pot, as perfectly as they could. All the highest-quality pots came from the group that made as many pots as possible, because learning is an iterative process that you refine by doing.
If you’re consistently growing, you never stop making mistakes; you just start making different ones (and sometimes still the same ones).