Homemade Data Scientist

Marina Bischof
womenplusplus


My path to Data Science was long and not straightforward, so I am happy to share my experience and help others on their way into this adventurous world of data.

I have a background in economics and more than 4 years of experience in risk management and business analysis. While doing my PhD I started to program in R. My programming skills were not that strong in the beginning, so I took a few online courses to reach the level of my PhD group mates. Struggling with the large amount of data I had to handle for my research (and procrastinating from it a bit), I decided to learn Python as well, and Python brought me to Machine Learning and Deep Learning. I acquired most of my knowledge from different online sources, and most of it was free. Below, I will give you an overview of the sources you can use to learn Data Science. If you have been curious about ML for a long time but don’t know how to start, today is the day to do it, and below I outline the plan:

1. What Is Data Science And How To Eat It

For those who are unsure how and where to start, the course Elements of AI (30 Hours, Beginner level, Free) by the University of Helsinki will be very helpful. It is very well structured and provides a basic understanding of Artificial Intelligence and its implications. I would recommend it to everyone who wants to have an informed conversation about AI and ML without going too deep into the details of programming.

The initial course in the Data Science specialisation offered by IBM on Coursera is What is Data Science? (6 Hours, Beginner level, $39/month after 7-day full access free trial). While the previous course is interesting for readers from any field, this one is for those who have already decided to go deeper and need a starting point. You can continue from that course by choosing either the “Introduction to Data Science” or the “IBM Data Science Professional Certificate” track. As the course is offered by IBM, all the practical work happens on its platforms.

Trick: If you finish a few courses of the program during the first-week trial period, you sometimes receive certificates for free on Coursera even if you cancel the subscription shortly thereafter.

If you started with the IBM course above, you can continue with the Introduction to Data Science Specialization (29 Hours, Beginner level, $39/month after 7-day full access free trial), a very nice specialization with an overview of the main tools and methodology and an intro to SQL. As in the previous course, be prepared to work with IBM products and platforms.

2. Python Or What?

I came to Data Science through the programming language R, which I used for my PhD research. Initially, I took courses to improve my R programming skills, and thanks to recommendations from Coursera, the absence of a deadline for my research, and my curiosity, I started to learn Python. My path was thorny, and I suggest that you avoid my mistakes and choose Python from the beginning, as it has become the most powerful and popular language for ML nowadays. Even though I still love R with all my heart, I list only Python courses here (for those who, like me, come from RStudio, I suggest downloading Spyder to soften the “culture” shock).

The University of Michigan has a few specialisations on Coursera for learning Python. They suggest that you start with Python for Everybody Specialization (57 hours, Beginner level, $49/month after 7-day full access free trial) followed by the more in-depth Python 3 Programming Specialization (73 hours, Beginner level, $49/month after 7-day full access free trial).

If you started with IBM at the previous stage, you can continue learning Python with them as well. Python for Data Science and AI (10 hours, Beginner level, $39/month after 7-day full access free trial) is relatively short and covers all the basic topics.

Udacity is a relatively expensive platform with a monthly fee of $399. It offers many nanodegrees, a more personal touch, real-world projects from industry experts, a 1-on-1 technical mentor, and even a personal career coach and career services. That said, they also offer a free Python course: Introduction to Python Programming (25 hours, Beginner level, Free).

Learn Python Basics for Data Analysis (12 Hours, Beginner level, Free) and Use Python libraries for Data Science (8 Hours, Intermediate level, Free) are other free courses for getting started with Python, both on the OpenClassrooms platform. They consist of short videos (which can easily be played at 1.5x speed) and text, and provide a fast overview of Python with a few projects for peer review. The platform works mainly in a path mode with a price of €400 per month, individual mentorship and a job guarantee, so the more advanced courses are all paid.

DataCamp also has an Introduction to Python course (4 hours, Beginner level, Free) and many further options to continue learning to program for $29 per month. The platform is very convenient, but all exercises run on their webpage, so you do not touch “real” Python. I started with this course and was a bit confused when I later needed to download Anaconda and Spyder for real work.

The Dataquest platform offers very good Python courses as part of the paid paths “Data Scientist” or “Data Analyst” (which cost from $29/month later on). Python for Data Science: Fundamentals and Intermediate, as well as the main part of Pandas and NumPy Fundamentals (15–20 Hours for all three courses, Beginner level, Free), are free and have only text content. The main concepts are clearly explained, and there are guided projects at the end of each part to put what you have learned into practice.

3. ML, DL, NLP And So On And So Forth

Machine Learning (56 Hours, Beginner level, $79/month after 7-day full access free trial, audit for free) offered by Stanford University on Coursera seems to be the most famous course in the field. The instructor is Andrew Ng, a co-founder of Coursera, adjunct Professor at Stanford University and much more. The course is very good from a theoretical point of view but has the serious disadvantage of using MATLAB for practical training.

For R and Python users, Machine Learning A-Z™: Hands-On Python & R In Data Science (41 Hours, Beginner level, $19.99 with 90% discount) on Udemy provides, in contrast to the course above, less theory but much more practical training in both languages. It works very well in combination with the above course. In my case, it helped me master Python by comparing the Python code with the R scripts that were more familiar to me. At the end of the course, you will have ready-to-use templates for all ML models covered; just do not forget to take notes.

There is also another course on Udemy from the same instructors: Deep Learning A-Z™: Hands-On Artificial Neural Networks (22.5 Hours, Intermediate level, $10.99 with 94% discount). It is also very practical and includes only Python this time. It is an apt continuation of the previous one.

Machine Learning with Python (14 Hours, Intermediate level, $39/month after 7-day full access free trial or free audit) by IBM on Coursera is another option for learning ML, especially if you have already started with the previous IBM courses.

Neural Networks and Deep Learning (18 Hours, Intermediate level, $49/month after 7-day full access free trial or free audit) by deeplearning.ai on Coursera is another course taught by Andrew Ng, but this time it includes practical Python programming exercises. To advance your knowledge further, you can continue with the full Deep Learning Specialization.

Machine Learning Crash Course (15 hours, intermediate level, free) is compact, free and fast. It helps refresh your knowledge in a short time, for example, while preparing for an interview.

For those who are interested in Natural Language Processing, this course could be interesting as well: Natural Language Processing in TensorFlow (9 Hours, Intermediate level, $49/month after 7-day full access free trial or free audit) by deeplearning.ai on Coursera. I suggest that you first do the ML and DL courses.

4. Data? Where? How?

Having SQL in your skillset is essential if you are looking for a job in Data Science/Analysis, so I have included a few SQL courses.

Retrieve data using SQL (20 Hours, Intermediate level, free) is one of three free courses on the OpenClassrooms platform. This is a one-day “quick dive” into SQL. It is good as an introduction as well as for refreshing your knowledge before an interview.

SQL for Data Analysis (31 Hours, Beginner level, free) on Udacity is also free. It is a bit longer than the previous one but contains common examples that could come up in interviews.

Databases and SQL for Data Science (11 Hours, Beginner level, $39/month after 7-day full access free trial or free audit) by IBM on Coursera is part of IBM’s specialisation. The course provides not only concepts but also how to access databases from Jupyter notebooks using SQL and Python.
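To give a feeling for what “SQL from Python” looks like, here is a minimal sketch that you could paste into a Jupyter notebook cell. It is not taken from the IBM course (which works on IBM’s own platforms); it assumes an in-memory SQLite database from Python’s standard library, and the `courses` table and its rows are made up purely for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only to have something to query.
conn.execute("CREATE TABLE courses (title TEXT, platform TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("Elements of AI", "University of Helsinki", 30),
        ("What is Data Science?", "Coursera", 6),
        ("Databases and SQL for Data Science", "Coursera", 11),
    ],
)
conn.commit()

# The aggregation happens in SQL; Python only receives the result rows.
rows = conn.execute(
    "SELECT platform, COUNT(*) AS n, SUM(hours) AS total_hours "
    "FROM courses GROUP BY platform ORDER BY platform"
).fetchall()

for platform, n, total_hours in rows:
    print(platform, n, total_hours)

conn.close()
```

The same pattern (connect, execute a query, fetch the rows) carries over to other databases; typically only the connection library changes.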

5. Can You Be A Data Scientist/Analyst?

Yes, you can!
