2nd Story — The Eternal Conflict of Python or R
(or JavaScript or Julia)
Once you have decided you want to learn Data Science and have a good idea of why you are starting on this path (for more about the why, check out the first story in this series, Start with Why Data Science), then comes one of the biggest questions to answer: should I learn Python or R?
In short, knowing how to program is a core skill for any Data Scientist for a myriad of reasons, from being able to access the computing power you need to, well, the simple fact that it is how you communicate with computers and tell them what you need. When I was starting, some people suggested you could get away with just using spreadsheets. Not only is that a clunky, limited and obsolete way of doing things, like using a fax instead of email, but once you get into big data, or simply have to deal with a data stream, spreadsheets will not do.
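A spreadsheet expects the whole table to sit in front of you at once; a short program can instead handle values as they arrive. As a minimal sketch, written in Python since that is where this series ends up, here is a running average over a stream that never stores the past values. `sensor_readings` is a hypothetical stand-in for a real feed such as a log file or a message queue.

```python
from typing import Iterable, Iterator


def sensor_readings(n: int) -> Iterator[float]:
    """Hypothetical data stream: yields n readings one at a time (values 0 to 9 repeating)."""
    for i in range(n):
        yield float(i % 10)


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of everything seen so far, using constant memory."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        mean += (x - mean) / count  # incremental update; past values are never stored
        yield mean


# Process a million readings without ever holding them all in memory.
latest = 0.0
for latest in running_mean(sensor_readings(1_000_000)):
    pass
print(f"mean after 1,000,000 readings: {latest:.2f}")
```

The same loop would work unchanged if the readings kept arriving indefinitely, which is exactly the situation where a spreadsheet breaks down.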
Programming will also teach you something that is difficult to learn otherwise: computer logic. In data science, and especially in Machine Learning, we give computers instructions, in mathematical and computer language, about how to interpret things that come easily to us, like knowing when a picture has a cat in it. We must be able to tell computers exactly what we want them to do.
So, which programming language should I learn? This could be a long discussion, so please sit tight. Python. The answer is Python. Well, that was short, but let us get into the interesting reasoning behind why Python.
First, let us talk about the other languages being used for ML and DS. JavaScript has some remarkably interesting tools for it, and the advantage of being great for web development. However, the tools are limited, and it is not the friendliest language to learn. So, unless you already have JS under your belt, you are better off staying away from it, at least for now.
There are also newer languages like Julia, which is interesting too but, in reality, is being used less and less in Data Science, and its community is much smaller than those of R or Python.
Back to the main event. As I mentioned in my last story, my first programming language (after BASIC at 10 years old) was R, so it holds a big spot in my heart. It is a great language for Data Science and Machine Learning. It is free software (not to be confused with open source; here is a good article about the differences), it has a large and thriving community, and it has tons of packages like caret, ggplot2, Plotly, superml and many more to make your life easier. It is also a very efficient language built for math and statistics, so it tends to handle statistical computations quickly. It is not incredibly hard to learn and has an easy-to-use IDE (Integrated Development Environment) in RStudio.
Since I already knew some R when I started learning Data Science for real, you might be asking why on earth I would pick anything else. After all, R seems to be the perfect language for DS. Well, it is. But there is just one small issue: Python. Sometimes in life, you can find something better than perfection.
Python, named after the British comedy troupe Monty Python and not the gigantic constricting snake, is open source software that is easy to install, learn and use. It has one of the largest communities around, much larger than R's, and that community is vocal, friendly, open and collaborative. It also has tons and tons of packages, from Matplotlib and Seaborn for visualization, to NumPy and pandas for dealing with numbers and data, to scikit-learn and PyTorch for machine learning. The amount of learning resources available for Python online, free and paid, is mind-blowing. It also has a variety of IDEs available, like Spyder, and it is the main language for Jupyter Notebooks, my favorite tool for programming and DS projects.
The other incredible advantage of Python is that it will allow you to do many other things, from apps to websites to video games. I would probably use other languages for some of these tasks, but it is good to know you have many options with this one.
I did a lot of research before embarking on my road to data science and did not take this decision lightly. To this day, I am happy to have chosen Python as my main language for DS.
The most important thing is to go deep into the language. Many Data Science learning outlets will give you some basic programming skills before jumping straight into DS, but believe me, take your time to learn how to program first, before getting sidetracked by NumPy, pandas, Matplotlib and all the other amazing libraries.
Take your time to understand variables, particularly the nuances of ‘True’ and ‘False’, the ever-useful ‘if’ statements, functions, list comprehensions, *args and **kwargs (these took me forever to grasp), and do take quite a bit of time understanding classes; it will make working with libraries much more enjoyable. Finally, give the basic concepts the time they need to really sink in, and delve into the differences between functional and object-oriented programming. In short, take your time to develop your programming skills before moving on. I know there is a lot of temptation to jump to data analysis, ML algorithms and all the interesting stuff, but just like in any craft, you need to get your core skills off the ground first.
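To make that checklist a bit more concrete, here is a minimal Python sketch that touches each concept in turn. The names `describe`, `summarize`, `PositiveCounter` and `count_positive` are made up for illustration and do not come from any library. The last two solve the same small task (counting positive numbers) once in an object-oriented style and once in a functional style, so the difference is easy to compare.

```python
# A quick tour of the fundamentals listed above (all names are illustrative).

# 1. True, False and "truthiness": these values all act like False in an if-test.
falsy_values = [False, None, 0, 0.0, "", [], {}]
print(any(falsy_values))  # False: none of them is truthy
print(bool("False"))      # True: any non-empty string is truthy, even this one


# 2. if statements inside a function
def describe(n: int) -> str:
    """Return a short label for an integer."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"


# 3. List comprehension: build a new list from an iterable in one expression.
squares_of_evens = [n * n for n in range(10) if n % 2 == 0]
print(squares_of_evens)  # [0, 4, 16, 36, 64]


# 4. *args gathers extra positional arguments into a tuple;
#    **kwargs gathers extra keyword arguments into a dict.
def summarize(*args: float, **kwargs: str) -> str:
    label = kwargs.get("label", "total")
    return f"{label}: {sum(args)}"


print(summarize(1, 2, 3))               # total: 6
print(summarize(1, 2, 3, label="sum"))  # sum: 6


# 5. Object-oriented style: a class keeps state and updates it step by step.
class PositiveCounter:
    def __init__(self) -> None:
        self.count = 0

    def add(self, n: int) -> None:
        if n > 0:
            self.count += 1


# 6. Functional style: the same task as a pure function with no stored state.
def count_positive(numbers: list[int]) -> int:
    return sum(1 for n in numbers if n > 0)


data = [3, -1, 0, 5]
counter = PositiveCounter()
for value in data:
    counter.add(value)
print(counter.count, count_positive(data))  # 2 2
```

The string "False" counting as true is exactly the kind of nuance that trips people up later, for example when a CSV column full of "True"/"False" text gets used in a filter.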
If you want to be a drummer, you need to understand rhythm. If you want to be a data scientist, you need to understand computer language.
This is about my Road to Data Science, so if you decide to learn R, Julia or JavaScript instead, go for it. You can also learn any of them afterwards. Just make sure to pick one and stick with it until you reach a deep understanding before moving on to another language.
Below are some of the amazing resources I used for learning how to program. Some are free or offer some free content; others are not. Check them out first to see if the way they teach is useful and enjoyable for you.
But do remember, programming should be the first, and a vital, step on your Road to Data Science.
Hope we cross paths through our Journeys…
Jack Raifer Baruch
Follow me on Twitter: @JackRaifer
Follow me on LinkedIn: jackraifer
Next Story: To Math or not to Math
About the Road to Data Science Series
Today, I am working on the first steps of remarkably interesting projects for human development based on Data Science and Machine Learning.
But not that long ago (really, not long at all) I knew extraordinarily little about data science, and even less about what it all meant (and I am still learning more about it every day). In my quest to reinvent myself from a Psychologist working in Behavioral Economics into a Data Scientist, I went through an incredibly interesting journey and learned a lot. This series is mostly a letter to my past self, meant to help anyone like me take this amazing road and, hopefully, avoid some of the mistakes I made along the way due to lack of knowledge or perspective.
Hope you enjoy my ramblings as much as I enjoyed my Road to Data Science.
Need Help on your Journey?
This can be a difficult path to walk alone, so feel free to reach out to me through LinkedIn or Twitter. I started this series because of the #66DaysOfData initiative by Ken Jee. It is a great way to connect and get support, so check out Ken on Twitter (@KenJee_DS) and join the #66DaysOfData challenge.
Learning Resources I have Used and Enjoyed:
A LOT of content, some free, most paid. Check out coupon sites, where you can usually find free coupons for courses on Python, R, data science, machine learning and much more.
An interesting place to learn, with some free courses and then paid content. Very hands-on coding exercises, few videos, mostly reading.
My favorite place to learn. Thousands of courses, with a lot of content on programming, Data Science and Machine Learning. The University of Michigan has many Python programming courses here, from the very basics to complex topics. All courses are free to audit; you only pay if you want to earn a certificate.
The top free place to learn to code. Hundreds of hours of free videos on almost any language. They now also offer certifications, also for free.
The place to learn anything. All of it is free, though it might take a while to find the content you want and enjoy.
Top site for data science, which also runs many competitions. They have many free courses, but the programming part is scarce: a few basic ones, all focused on Data Science and Machine Learning.
Similar to Codecademy, with many paths and courses. Some free content, the rest is paid. Very focused on Data Science.
My favorite place to practice coding, with challenges for every level from beginner to advanced. This is a good place to challenge yourself and check your progress.