10 Things I wish I knew before I started my first data science project

Shawn Sobieski
Oct 7, 2020


I’ll be honest: when I started my Data Science program at Flatiron School, I knew basically nothing about the industry I was trying to break into. After years of working as a classroom teacher, I found the ins and outs of computer science to be completely new territory. I had done a few months of independent practice with Python, and I knew I was ready for a career change, but I still had so much to learn about what it takes to complete a data science or machine learning project.

So, in the hope that it might help someone out there like me who feels intimidated before their first big project, I compiled a list of ten things that I wish I had known before I ever opened Jupyter Notebook.

1. Plan Backwards

Start with the end in mind. Ask yourself: ‘What do I want other people to get out of this project, and what will my final draft look like?’ Although many of the details are likely to change, knowing where you’re headed when you first clean and structure the data will save a great deal of time that you would otherwise spend figuring out what to do next.

2. Be Prepared to Adapt

Let’s say that, like me, you set out to find a link between a movie’s rating and the month it was released. You explore the data, plot trends, and analyze the results, only to find that there is no such link. Unfortunately, there’s nothing you can do to change the data you have. Sometimes you’ll need to change the way you’re asking the question; sometimes you’ll have to pivot in an entirely new direction.
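Before investing hours in plots, a quick summary can show whether the link you’re hoping for is even there. The minimal sketch below averages ratings by release month using only Python’s standard library. The `movies` records, their `release_month` and `rating` fields, and the helper `mean_rating_by_month` are hypothetical stand-ins for whatever your dataset actually contains, and the numbers are toy values, not real ratings.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_month(movies):
    """Average the rating for each release month.

    Assumes each record is a dict with hypothetical keys
    'release_month' (1-12) and 'rating' (a number).
    """
    ratings_by_month = defaultdict(list)
    for movie in movies:
        ratings_by_month[movie["release_month"]].append(movie["rating"])
    # Sort by month so the summary reads January through December.
    return {month: mean(ratings) for month, ratings in sorted(ratings_by_month.items())}


# Hypothetical toy data, not real movie ratings.
movies = [
    {"title": "Movie A", "release_month": 1, "rating": 6.5},
    {"title": "Movie B", "release_month": 1, "rating": 7.5},
    {"title": "Movie C", "release_month": 7, "rating": 7.0},
    {"title": "Movie D", "release_month": 12, "rating": 7.0},
]

summary = mean_rating_by_month(movies)
print(summary)  # {1: 7.0, 7: 7.0, 12: 7.0}

# A small spread between the best and worst month hints at no strong link.
spread = max(summary.values()) - min(summary.values())
print(f"Spread between best and worst month: {spread:.2f}")  # 0.00
```

If the monthly averages barely differ, as in this toy example, that is an early signal to reframe the question. A flat average is not proof on its own, though: months with only a handful of movies can hide or exaggerate a pattern.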

3. Stay Organized

Be strategic about how you run your code. Sometimes, such as between work sessions, you’ll need to walk away from your Jupyter Notebook and rerun it later. I wish I had known how much time I could save by organizing my notebook well enough to pick up right where I left off, instead of rerunning the entire notebook every time I took a break.
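One way to pick up where you left off is to save the result of each slow step to disk and reload it in later sessions instead of recomputing it. Here is a minimal sketch using Python’s standard `pickle` and `pathlib` modules; `load_or_compute`, the cache path, and `clean_movie_data` are hypothetical names chosen for illustration.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return a cached result if one exists; otherwise compute and cache it.

    `compute` is any zero-argument function that does the slow work
    (for example, a hypothetical data-cleaning step).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Reuse the result saved in an earlier session.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    # Create the cache folder if needed, then save the result for next time.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Example usage in a notebook cell (hypothetical names):
# cleaned = load_or_compute("cache/cleaned_movies.pkl", clean_movie_data)
```

Delete the cache file whenever you change the code that produced it, so the step runs again. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.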

4. Edit As You Go

I wish that I had understood how simple it was to format as I went. As a student used to the typical workflow of write, review, edit, I underestimated how time-intensive formatting already-written code could be. Make sure your code is properly formatted as you write it, and avoid the mistake of leaving ‘editing’ for last.

5. Write Functions One Piece at a Time

Functions that bundle a series of tasks that often go together are great. I made functions for just about everything in this project, and not only did it save me time and space, it was one of the more fun parts of the project. Functions can be interesting puzzles, and I had a blast making them more and more complex. What I wish I had known from the beginning was to split a function up when it does multiple things. Adapting a long function to serve additional purposes is time-intensive, and it won’t always be possible to give one function the flexibility to do two things well.
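As an illustration of the idea, here is a hypothetical cleaning step written as small single-purpose functions plus a short pipeline that chains them. The function names and the `title`/`rating` fields are assumptions made for the example, not code from the original project.

```python
def strip_whitespace(records):
    """Trim stray spaces from every string field."""
    return [
        {key: value.strip() if isinstance(value, str) else value for key, value in row.items()}
        for row in records
    ]


def drop_missing_ratings(records):
    """Remove rows whose hypothetical 'rating' field is missing."""
    return [row for row in records if row.get("rating") is not None]


def ratings_to_float(records):
    """Convert the 'rating' field to a float."""
    return [{**row, "rating": float(row["rating"])} for row in records]


def clean_movies(records, steps=(strip_whitespace, drop_missing_ratings, ratings_to_float)):
    """Apply each single-purpose step in order.

    Because each step does one thing, a new purpose can reuse a subset
    of the steps instead of forcing one long function to do everything.
    """
    for step in steps:
        records = step(records)
    return records


# Hypothetical raw rows with messy values.
raw = [
    {"title": " Movie A ", "rating": " 8.3 "},
    {"title": "Movie B", "rating": None},
]
print(clean_movies(raw))  # [{'title': 'Movie A', 'rating': 8.3}]
```

If a later analysis needs to keep rows with missing ratings, you can pass only the steps it needs (for example, `steps=(strip_whitespace,)`) rather than rewriting a single do-everything function.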

6. Use Copy and Paste

This one is about accuracy. Before long, I found myself copying and pasting even single words and simple changes. It did make me faster, but the bigger benefit was confidence: when I copied and pasted, I knew the changes I was adding were accurate and identical every time. Typos can be a real pain, so when possible, use copy and paste.

7. Have a Process for Naming Things

Once I had a large collection of figures, I couldn’t find the one I wanted in the folder without opening a few similarly named ones. It ended up being an inconvenient time vacuum. As for variables, finding the right balance between descriptiveness and length is difficult. When in doubt, remember that your code is probably meant to be read by other people, so err on the side of descriptiveness.
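One way to make figures easy to find is to generate every filename from the same rule instead of typing names by hand. The helper below uses a hypothetical convention (a zero-padded number, then the topic, then the chart type); the exact pattern matters less than applying it consistently.

```python
import re


def figure_filename(number, topic, chart_type, extension="png"):
    """Build a consistent, sortable figure filename.

    Hypothetical convention: '<NN>_<topic>_<chart_type>.<extension>'.
    The zero-padded number keeps files in creation order when sorted.
    """
    def slug(text):
        # Lowercase, then replace runs of non-alphanumeric characters with '_'.
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

    return f"{number:02d}_{slug(topic)}_{slug(chart_type)}.{extension}"


print(figure_filename(3, "Rating by Release Month", "bar chart"))
# 03_rating_by_release_month_bar_chart.png
```

The descriptive parameter names (`topic`, `chart_type`) follow the same advice for variables: a few extra characters are cheap compared with a reader guessing what `t` or `ct` means.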

8. Take Advantage of the Internet

Stubbornly, my first instinct when confronting an error was always to try to work it out myself. On occasion, that led to a learning experience and a crisp line of code. More frequently, however, it led to a lot of frustration and wasted time. I wish I hadn’t been too proud to look things up on Stack Overflow as a first response. It would have saved time, and I would have learned even more from the experienced coders of the internet.

9. Choose Data Wisely

There is so much data out there. When I started this project, I assumed that meant the more, the better. What I found was that much of it was redundant or easy to replace with a more organized source. My mistake was cleaning every bit of data I had before making a plan for how to use it. Save yourself time and be careful about the data you choose.
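Before cleaning a second source, it can be worth measuring how much it actually adds. The minimal sketch below compares two hypothetical sources by a shared key (here, a movie title) using Python sets; `overlap_report` and the toy records are illustrative assumptions, and in practice you would match on whatever identifier your sources really share.

```python
def overlap_report(primary, secondary, key):
    """Summarize how much of `secondary` is already covered by `primary`.

    `primary` and `secondary` are lists of dicts; `key` names the field
    used to match records (a hypothetical shared identifier).
    """
    primary_keys = {row[key] for row in primary}
    secondary_keys = {row[key] for row in secondary}
    shared = primary_keys & secondary_keys
    new_only = secondary_keys - primary_keys
    return {
        "shared": len(shared),
        "new_in_secondary": len(new_only),
        # Fraction of the secondary source that duplicates the primary one.
        "redundant_fraction": len(shared) / len(secondary_keys) if secondary_keys else 0.0,
    }


# Hypothetical toy sources.
source_a = [{"title": "Movie A"}, {"title": "Movie B"}, {"title": "Movie C"}]
source_b = [{"title": "Movie B"}, {"title": "Movie C"}, {"title": "Movie D"}, {"title": "Movie E"}]
print(overlap_report(source_a, source_b, key="title"))
# {'shared': 2, 'new_in_secondary': 2, 'redundant_fraction': 0.5}
```

If most of a source turns out to be redundant, you may be able to skip cleaning it entirely. Titles are a fragile key (remakes can share a name), so prefer a proper ID when your sources have one.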

10. SAVE OFTEN

This is self-explanatory. Save, then save, then save again. Think of it like saying “I love you” to your family: you don’t want to regret not doing it more!
