Coursera IBM Data Science Professional Certification Program Review
Coursera, an online learning platform for massive open online courses, and IBM have partnered to create the Data Science Professional Certification program. This program consists of 9 courses designed to take someone from the very basics of data science to creating their own unique capstone project. At the end of the program, you will have a completion certificate as well as 9 badges, one for each course in the program. These badges can be added to your portfolio, LinkedIn profile, or taped to your fridge to remind yourself of your accomplishments!
I recently finished the program and would like to share my thoughts on what I have learned so far, my tips for students, and where I plan to go next in my self-taught data science journey.
The program is a great introduction to the basics of data science, and I recommend it to anyone looking to get started in the field. It is very high-level, however, so further learning and portfolio building are required before entering the job market. That said, the freeform final project is a great primer on what it is like to rely entirely on yourself to write every line of code in a project.
The program uses a subscription-based payment model. There is a monthly fee of $39 (at the time of writing) that gives you access to all the course modules, discussion forums, and assignments, including the peer-graded ones. No matter how long it takes you to complete the courses, you will be charged the monthly fee until you finish the program or cancel your subscription.
Because the payment is monthly, the less time you take, the less you pay. I was able to complete the program in just under a month (28 days, to be precise). However, I will note that I was on summer break from university at the time, had no main job except for some freelance writing projects, and generally had an astounding amount of time to devote to the program. Additionally, I had already completed the Codecademy Data Science stack, so I was able to complete all the assignments for the first few classes within the first week.
That being said, if you are an absolute beginner, Coursera estimates the program will take 2 months to complete. Some people, of course, may take longer. However, if you are looking to finish the program within 1 month, I highly recommend completing some prerequisite learning programs before starting.
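To make the arithmetic concrete, here is a minimal sketch of the cost estimate. It assumes one $39 charge per started 30-day billing cycle and ignores free trials, discounts, and price changes. The function name `total_cost` and the 30-day cycle length are illustrative assumptions, not Coursera's actual billing rules.

```python
import math

MONTHLY_FEE_USD = 39  # fee quoted in this review; check the current price


def total_cost(days_to_finish, monthly_fee=MONTHLY_FEE_USD, days_per_cycle=30):
    """Estimate the total paid, assuming one charge per started billing cycle."""
    if days_to_finish <= 0:
        raise ValueError("days_to_finish must be positive")
    # Any partially used cycle is billed in full (an assumption of this sketch).
    cycles = math.ceil(days_to_finish / days_per_cycle)
    return cycles * monthly_fee


print(total_cost(28))   # finishing in 28 days: 1 cycle
print(total_cost(60))   # Coursera's 2-month estimate: 2 cycles
print(total_cost(100))  # a slower pace: 4 cycles
```

Under these assumptions, finishing in 28 days costs one monthly fee, while the estimated two-month pace costs twice that.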
Course 1: What is Data Science
This was a very high-level introduction to the field of data science. It went over basic techniques, such as data mining and linear regression, and real-world uses, such as engineering. Most of the course was interviews with professionals or students, relaying their experiences in the field. However, there was also an introduction to IBM Watson, the first instance of the tools of data science in this class, as well as the first introduction to IBM itself.
Course 2: Open Source Tools for Data Science
This short course flew through some of the main open source tools used in data science. It focused mostly on tools like Jupyter Notebooks, Zeppelin, RStudio, and, again, IBM Watson.
The use of real-world tools stood out to me immediately during the course. Most of my learning previously had been using tools like Codecademy. While Codecademy is a great place to learn the basics, it has been criticized for its isolated learning platform, and many students say there is a steep learning curve when applying their knowledge off the website. This is not a problem with the IBM program, as right from the beginning you are using open source tools, mostly Jupyter Notebooks.
Course 3: Data Science Methodology
This course taught how to think like a programmer. The thinking model provided in the course was highly recursive and iterative. Each step encouraged thinking back to previous steps and looking for points of improvement. I greatly appreciated this course as learning to think like a programmer and data scientist is one of the most challenging aspects of becoming a strong analyst.
Course 4: Python for Data Science
This is where the program begins to dive into actual coding. In this course, you learn the basics of Python, Pandas, and NumPy. Because I had previously finished the Codecademy Data Analysis stack, I flew through this course. However, I did watch all the videos and, frankly, I don’t think this course would be a good resource for someone who has never coded in Python before. Learning to code, in my opinion, should be highly interactive and the videos and quizzes did not offer the level of student input that I think is key to really learning how to code, especially for the basics.
This course also involves the first real project of the certification program. In this course, students are required to analyze a set of economic data using Watson Studio. Coursera uses an interesting grading model. Except for multiple choice quizzes, all assignments are graded by your peers. Unfortunately, this means that you likely will not get comments or feedback on your work, which may be a drawback for some students.
Course 5: Databases and SQL for Data Science
Again, this course used IBM Cloud as the main teaching platform. This was also the first course where virtually everything I learned was new content. However, the course was designed well and I learned a lot about not only how to build databases, but also how to query and analyze data from them using Python.
This course also incorporated the first long, robust project of the program. The previous projects had been very straightforward and most of the code was provided. This slow progression to more challenging material is one of the great strengths of how this program was designed.
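To give a flavor of what querying a database from Python looks like, here is a minimal sketch. The course itself worked on IBM Cloud; this example swaps in Python's built-in `sqlite3` module with an in-memory database so it runs anywhere. The table name, columns, and rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a hosted database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and data, invented for this example.
cur.execute(
    "CREATE TABLE schools (name TEXT, community TEXT, safety_score INTEGER)"
)
rows = [
    ("School A", "North", 80),
    ("School B", "North", 90),
    ("School C", "South", 70),
]
# Parameterized inserts keep values separate from the SQL text.
cur.executemany("INSERT INTO schools VALUES (?, ?, ?)", rows)
conn.commit()

# Aggregate in SQL, then pull the results back into Python for analysis.
cur.execute(
    "SELECT community, AVG(safety_score) "
    "FROM schools GROUP BY community ORDER BY community"
)
results = cur.fetchall()
conn.close()

for community, avg_score in results:
    print(community, avg_score)
```

The same pattern (connect, execute SQL, fetch rows) carries over to other databases, although the connection library and setup details will differ.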
Course 6: Data Analysis with Python
This was my favorite course in the program. I had learned much of the material in my Codecademy studies, but nevertheless, I liked revisiting Pandas, NumPy, and SciPy. This course covered a range of data analysis techniques, from finding and wrangling data to statistical analysis and modeling.
Interestingly, there was no peer-graded project in this course. The entire grade relied on a large number of quizzes. I think this course would have benefited from a peer-graded final project to help solidify the concepts covered. However, as the skills are required for the later courses and the capstone project, I understand why this decision was made.
Course 7: Data Visualization with Python
This course was a rapid-fire introduction to a range of data visualization techniques, including line graphs, bar charts, pie charts, and specialized visualizations like waffle charts and Folium maps. I particularly enjoyed learning about Folium and choropleth maps. I’ve always loved looking at map-based data visualizations, but I never knew they could be generated with Python. I have a feeling these maps will find themselves embedded in many of my future projects.
Course 8: Machine Learning with Python
This was another high-level survey course and, frankly, I found it to be the most challenging. I don’t think machine learning is inherently a difficult topic, but I had no experience with it before this course, so it was demanding.
The course quickly covered a lot of topics, including simple regression models, classification, clustering and recommendation systems. However, because of the tangible applications of what I was learning (we covered Netflix-style recommendation algorithms for example), this was one of the most interesting courses in the program.
The final project for this course involved applying four different machine learning algorithms to a data set to determine which performed best. It was a fun challenge to complete.
Course 9: Applied Data Science Capstone
Finally! The capstone project!
The capstone project for this program consisted of two parts. First, there was another learning module where we learned to use the Foursquare API to retrieve location information. Again, as this was a tangible application of data science I had seen before, this was a very interesting module.
The first peer-graded project of the capstone was a simple use of the Foursquare API to understand the venues in a city.
Then came the real challenge. The final project of the capstone was entirely open-ended. We had to make up our own question to answer using the tools we had learned. The only requirements were to use the Foursquare API, use data analytics, and build a Folium map as part of the presentation. Additionally, to build job skills, we were required to write a full report and develop a slide deck to explain our results.
I am interested in using data analytics for health and science, so I chose to build a correlational model and a Folium map showing all the hospitals in Toronto, Canada, alongside population data for the city. The goal was to pinpoint a neighborhood that would most benefit from the addition of a new hospital.
I will be the first to admit that my model was highly simplistic. I did not incorporate socioeconomic data, statistics on health and wellness in the community, or really any data that would be actually helpful to the recommendation system except population. However, I appreciated the freeform nature of the project. I literally started with a blank Jupyter Notebook and through the project, I was able to learn more than I ever had through Codecademy as I had to come up with every line of code on my own.
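For readers curious what a population-only ranking like this could look like, here is a minimal sketch. It is an illustration rather than the capstone notebook itself: the neighborhood names and numbers are invented, and the scoring rule (residents per hospital if one more hospital were added) is an assumption chosen to keep the example simple.

```python
# Hypothetical data: names, populations, and hospital counts are made up.
neighborhoods = [
    {"name": "Neighborhood A", "population": 50000, "hospitals": 2},
    {"name": "Neighborhood B", "population": 80000, "hospitals": 1},
    {"name": "Neighborhood C", "population": 30000, "hospitals": 0},
]


def residents_per_hospital_after_adding_one(neighborhood):
    """Score a neighborhood by how crowded it would remain with one more hospital.

    Adding 1 to the hospital count also avoids dividing by zero for
    neighborhoods that currently have no hospital.
    """
    return neighborhood["population"] / (neighborhood["hospitals"] + 1)


# Highest score first: the most residents per hospital even after adding one.
ranked = sorted(
    neighborhoods,
    key=residents_per_hospital_after_adding_one,
    reverse=True,
)

for n in ranked:
    print(n["name"], round(residents_per_hospital_after_adding_one(n)))
```

A ranking like this only reflects population. A more useful version would fold in the socioeconomic and health data mentioned above.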
Tips for Learners
My biggest tip for this course, and for any online coursework, is to practice discipline and be engaged in the content. It can be easy to passively watch the videos, run the code in the given Jupyter Notebooks and pass the quizzes without really solidifying your knowledge. If data science is something you are genuinely interested in learning, take the time to really go through each module and assignment to make sure you understand the code. You may even choose to rewrite portions of what is provided to help in the learning process.
Additionally, even though you may be a beginner to data science, start immersing yourself in the field. Follow data science-focused subreddits, YouTube channels, Medium writers, etc. I found that spending 20–30 minutes a day reading about data science, techniques, or breakthroughs in the field helped keep me motivated and showed me what is possible with the skills I was learning.
Finally, be sure to have fun with the course. Especially with the capstone project, the Coursera IBM Data Science certification program is designed to be highly personalized. You can really put your own personality and interests into the projects you do and the types of analysis you complete. No matter what field you are in or are hoping to join, data analytics is always present, so be sure to mold the program to whatever type of work you are interested in, be it biological sciences, civil engineering, marketing, or beyond.
The course was an action-packed few weeks during which a lot of topics were covered, but here are the key takeaways from my experience:
- Data science is a highly malleable, diverse field that attracts people from a wide range of disciplines. Learning data science is not only understanding how to use a set of tools, but more importantly, becoming comfortable with using those tools to answer a variety of questions and thinking like a data scientist.
- Of course, since the program was created by IBM, it relies heavily on IBM products like Watson and Cloud Services. However, I think the videos do a good job of generalizing what you are learning to be able to apply it to other systems. Additionally, most of the work is done on Jupyter Notebooks (unlike the isolated platform of programs like Codecademy), so you can go straight into other projects after finishing the course.
- This program definitely built confidence in my programming abilities as well. I learned data science in an iterative process, slowly building on my preexisting skills over time. The program was challenging, but doable.
- When working on real-world projects, you are not going to have your hand held. Often, if you are starting a project, you are starting with an empty document and you are responsible for coming up with every line of code that is needed to get to the results you are looking for. The final project in this program was a great primer on what that feels like. For that alone, I highly recommend this certification program.
I think this program is a wonderful option for those interested in getting a high-level overview of the basics of data analytics. The course is highly independent, however, and there is little to no opportunity to get valuable feedback from experienced programmers. All assignments are either multiple choice, and therefore automatically graded, or graded by your peers who must answer simple yes or no questions to reach a final score.
However, with further work and commitment, the skills learned in this program are applicable to real-world data science, and the program as a whole is a valuable way to get a broad understanding of what you need to know to be a good data scientist.
My Next Steps
Now, while this certification program may seem comprehensive, and it certainly is, it cannot be the only training you do to enter the data science market. After all, you will only have a few projects in your portfolio, and only one truly unique project (the capstone).
Thus, for my next steps, I will work on expanding my portfolio to show my skills in all the topics covered in the certification program. To find projects, I am turning to the Rosalind project, DataCamp, Dataquest, and any other projects I can find that interest me.
Thanks to everyone who has read this far. If you have completed the certification program, I’d love to know what you thought.