My Journey into Python and Programming Part II

This is Part II of my blog. See Part I here

Time flies

I can’t believe it has already been a month since I undertook this mentorship project. Time really does fly. I have learned so much, but I have made even more mistakes along the way.

My mentor

I couldn’t have asked for a better mentor. Aly and I get along great and have very similar backgrounds. I really relate to him because he came from the Finance side of things too. It gives me great hope and makes this feel like less of a Herculean task. He has been constantly reaching out to keep me on track and to encourage me to stay motivated.

He recently started a new role at a data science firm, so we are going through similar situations with changes at work. I really recommend that others getting started in the mentorship program get to know their mentor on a personal level. It makes it so much easier for both of you to understand what the other is going through, and it helps with setting expectations.

The biggest thing that I have taken away from Aly is the confidence that he has instilled in me. He certainly hasn’t given me the impression that this journey will be easy, but giving me a road map and guidance has allowed me to gain the confidence that I need to press on. Anyone thinking about switching careers to data science should find those who have gone before them; it makes the switch seem like a much less daunting task. I would have given up or gotten sidetracked many times if it weren’t for my mentor.

Things I’ve been working on

One of the first things Aly did was get me organized and set up a road map for us. I highly recommend a Trello board. Aly made three lists: “To-do”, “Work-in-progress” and “Completed”. The first thing was to get the basics down.

The first thing on the list was “Setup Trello Board”. It may seem silly, but even checking off one simple thing made me feel like I was making progress and getting somewhere. Next he had me read the first four chapters of Jake VanderPlas’ “Python Data Science Handbook” (or the pdf version here). I recommend this book to everyone who is starting out or has limited experience in Python. I read this book cover to cover and it gave me a great jumping-off point. The book explores Python, NumPy, Pandas and Machine Learning. It is not a deep dive into any one of the topics but a step-by-step guide to the basics. It was invaluable for me to read through once and then go back with my own data set to analyze and use his code.

To complement this, Aly had me work through the website firstpythonnotebook.org. It is similar to VanderPlas’ book but limited to Pandas. The author sets up a Jupyter Notebook environment and analyzes data using Pandas. These two resources were great for helping me understand what I was looking at and how to get started. The website is broken up into 16 chapters. Trello lets users make a “checklist” within each task, which made it easy for me to finish a chapter and check it off. I believe seeing progress is important for those just starting out.

When I first started analyzing data, Aly recommended that I get comfortable with the basics of analysis first; the really heavy lifting will come later. I feel this was, and is, invaluable advice. I have been working with data from Kaggle.com, a community of data scientists that hosts free datasets. Downloading data and writing simple lines of code has made me comfortable enough to answer a few questions about any dataset I receive within minutes. As a person who can sometimes get ahead of myself and wants the answer right away, it has been very helpful to see that I need to get the basics down. One thing that has become very handy is shortcuts for my notebook. Like all programmers, when you find yourself doing something over and over again, you look to see if it can be done faster and easier another way. Since I was doing a lot of the same things with all of these datasets from Kaggle, I found this blog for beginners a great tool.

Things outside of coding

While Aly has had me doing a lot of coding and analyzing data to get the basics down, he has also given me some great tools to work with when I’m not coding. Podcasts and videos have allowed me to complete the picture of learning and given me alternative ways to learn or think about coding. Here are three that were most useful to me:

Becoming a Data Scientist (This podcast is great for those looking to get into Data Science from another field. Highly recommend.)

Python Bytes

Talk Python to Me

I also found this PyOhio talk especially useful and on-topic. While it may not be the topic others would choose, it is a very well put together video of one person’s story of facing hurdles and how he overcame them:

PyOhio Machine Learning the Hard Way- a story about ponies

Aly and I are reading “Data Science for Business”. It has been a great resource for getting into the analytical mindset that is critical for data analysis. It takes a holistic, non-coding view of data science. This book is good for anyone looking to get into data science with the right mindset, and also for managers of data scientists.

The most productive task to do

Every time Aly and I meet we do what he calls “Retros.” It is a way of reviewing what I have been up to, what worked for me, what didn’t work and what I can improve on. Next he shares what he thinks I should work on, and also what he can do to improve my learning experience. This has helped me clarify the difference between “activity” and “productivity.” I am quickly learning that programming is an ever-evolving process. I believe that one must always be learning and improving in this space. Having a mentor who is so engaged in my learning process has been the most valuable thing thus far.

One thing that has helped me the most

I am glad about my choice of data, since I have previous knowledge of horse racing and of what the data should look like. That has made it much easier to get a result and be able to quickly say “that doesn’t seem right.” Obviously, one will want to progress to working with data they know nothing about, and I’m sure it will be exciting to learn something about a new topic through data science. I would recommend that anyone just starting out, like myself, start with data they understand so they have an expectation of what their code should be telling them. Again, we have focused on the “basics” to get a good understanding of what I’m looking at and where I’m going. This has been easier because I have a good understanding of what things should “look” like.

What I’ve learned the most from

Every new coder learns very quickly that the majority of time is spent cleaning up one’s data. While it is not a very fun thing to do, it is essential. Aly has stressed the importance of getting the basics down before moving forward. For example, I have been working with a data set in which I found “0's” in the column “DeclareHorseWt”, the horse’s weight before the race. Obviously, this will skew my data, so I did a simple mean calculation, checked that it seemed reasonable compared to the median of the column and replaced all “0's” with the mean. This will at least give me a more accurate understanding of how weight overall might affect the horse’s performance, while keeping in mind that some of the data isn’t accurate. Again, this is just one example of getting down the basics.
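A minimal sketch of that cleanup step in Pandas. The sample weights are made up for illustration; only the column name “DeclareHorseWt” comes from the data set described here. It also assumes the mean is taken over the non-zero rows only, which may differ from the calculation actually used.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the post.
df = pd.DataFrame({"DeclareHorseWt": [1100, 0, 1050, 1150, 0, 1120]})

# Assumption: use only the non-zero weights, so the placeholder
# zeros do not drag the average down.
nonzero = df.loc[df["DeclareHorseWt"] != 0, "DeclareHorseWt"]
mean_wt = nonzero.mean()
median_wt = nonzero.median()

# Sanity check: the mean should sit close to the median
# if the column is not badly skewed.
print(f"mean={mean_wt:.1f}, median={median_wt:.1f}")

# Convert to float first, then swap every zero for the mean.
df["DeclareHorseWt"] = df["DeclareHorseWt"].astype(float).replace(0, mean_wt)
print(df)
```

Replacing missing values with a column average keeps every row in play, but it also flattens the variation in that column, so it is worth remembering which rows were filled in.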

I then checked my work to make sure that I had replaced all “0's” correctly:

“DeclareHorseWt” column is false when == 0
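A check along these lines can be sketched as follows. The data frame is a small made-up stand-in for the cleaned column, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the column after the zeros were replaced.
df = pd.DataFrame({"DeclareHorseWt": [1100.0, 1105.0, 1050.0, 1150.0]})

# Compare every value to 0; .any() is True if at least one zero remains.
has_zero = (df["DeclareHorseWt"] == 0).any()

# Summing the boolean mask counts how many zeros are left.
zero_count = int((df["DeclareHorseWt"] == 0).sum())

print(has_zero, zero_count)
```

If the replacement worked, the first value is False and the count is 0.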

There’s sometimes a better way to do something

Another valuable lesson I learned is that there is more than one way to do something and that this is a continuous learning process. In horse racing there are more than two categories of sex. I wanted to see all the different categories listed so I knew what I was dealing with and how many I had of each.

Sex of horses unique categories

I first used the “count” method to see how many I had of each sex. While I got the information I was looking for, I kept looking for an easier and more efficient way to do it. After more Googling and trying a few other options, I found that “value_counts()” was what I was looking for.
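To illustrate the difference, here is a small sketch. The column name “Sex” and the category values are assumptions made for the example, not taken from the actual data set.

```python
import pandas as pd

# Hypothetical column name and values, for illustration only.
df = pd.DataFrame(
    {"Sex": ["Gelding", "Mare", "Gelding", "Colt", "Filly", "Gelding", "Mare"]}
)

# unique() lists the distinct categories, but not how many of each.
categories = df["Sex"].unique()
print(categories)

# count() gives a single number: how many non-null entries there are.
total = df["Sex"].count()
print(total)

# value_counts() gives the number of rows per category in one call,
# sorted from most to least frequent.
counts = df["Sex"].value_counts()
print(counts)
```

With “count” alone, getting a per-category breakdown takes extra filtering or grouping; “value_counts()” returns it directly.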

Tips to remember and things I’ve learned:

  • Cleaning data, while tedious and boring, is essential
  • Don’t be satisfied with just getting the answer; look for the most efficient and productive way to get it. It will make you a better coder in the future.
  • The first way isn’t usually the best way.
  • Don’t give up, but don’t hesitate to ask questions
  • You don’t have to be taught everything formally. A lot of this will be learned by doing.
  • The only limitation is one that you put on yourself
  • Never stop learning