How to Teach an Awesome Data Analysis Course

Last semester I taught a grad course named Essentials of Data Analysis. Honestly, I’m not there yet [to claim such awesomeness], but the semester was amazing nonetheless. Here I explain my course’s (kinda experimental) workflow and what I learned from it.

Some Background

  • The university is MEF University in Istanbul, Turkey. The programme is called Big Data Analytics and it is an executive master's programme. I taught BDA503, Essentials of Data Analysis, which is the first core course of the programme. All remarks and opinions in this post are solely mine and do not necessarily represent the views of the university or the programme.
  • The programme is hard, especially for non-coders and non-engineers. In one semester, they take courses on R & Python, SQL, statistics, Hadoop and some big data operations. Students also have day jobs, and classes run from 6:30 PM to 9:30 PM.
  • The coordinator of the programme is Prof. Ozgur Ozluk, an amazing person and a very good academic. He gave me enough structure and lots of freedom to build the course. A bit of a gamble, I must say...
  • I have a PhD, but I am not a full academic (nor do I intend to be one). I am experienced with data and tools (especially R). I had previous experience in teaching related topics but I am not a “veteran”. Despite my shortcomings in teaching, I somehow seem to leave a positive impression on students (judging by those who share their opinions in course evaluations).

Course Structure

In a 14-week semester, we have 7 face-to-face classes in a computer lab and continuous online support. I was expected to teach some R, give some data analysis tips and tricks (data manipulation, visualization etc.) and show some modeling. Nothing more serious than linear/logistic regression, k-means, hierarchical clustering, decision trees etc. They have a full machine learning course in the next semester; this is just a warm-up.

I was supposed to provide study material, some quizzes/assignments, a small group data project and a take-home final. Then, we call it a (good) semester. Here is what I did instead…

Enter the Instructor

My “teaching style” (if you can say I have one) is closer to a drill sergeant’s than a proper teacher’s. I wanted them to be able to deal with data as early as possible with the proper tools. I am also a fan of tidyverse and reproducible research (see my full rant on the topic, here). Finally, I am a product person and I wanted them to have something they could show to display their progress.

  • I gave only a single 3-hour lecture (one week) on base R. I covered most of the basics with some exercises, but anyone familiar with learning programming languages knows that 3 hours is not enough. I counted on them to work through the material I provided and grasp the basics. My plan was also to continue the course on (almost) full tidyverse, so proficiency in base R, beyond the essentials, was a secondary objective.
  • The next two lectures were on full tidyverse + rmarkdown. In the first, I covered the basics of dplyr and ggplot2. I also showed them the basics of rmarkdown, so they could generate reports. Imagine the struggle.
  • In the second lecture, I had them build GitHub repositories and pages with the help of GitHub Classroom. It was literally the make-or-break point of the course. If the majority of them had failed to properly build a GitHub page, this course wouldn’t have been half as successful. Instead, with the help of a quickly made tutorial, they managed to handle their repositories and web pages. I called those pages Progress Journals and they have a full dedicated section in this post.
  • After the third lecture, students started to hone their tidyverse skills on some real data sets. The flow was simple and very similar to case studies: I gave them a data set, a background story and an objective. They created their own analysis of the case and presented it in accordance with reproducibility principles. It was also a good exercise for the group projects.
  • Once I saw that the class's overall performance and motivation were high, I briefly showed them some “advanced” material like Shiny and package making. Some groups picked up Shiny as they thought it would be a good idea to present their work interactively.
  • Finally, I covered some basic machine learning models (regression, logistic regression, CART etc.) and the intuition (but not the math) behind them. This was by design, since they had a full machine learning course next semester.
  • In the final week, they presented their data projects. Most of them did quite well on the projects.
  • There was a take-home final with three parts: their opinions on some debates in data analysis and presentation (e.g. use of double y-axes), an extension of their projects (one or two extra analyses/visualizations) and a full-blown mini data project (starting from gathering the data).

DataCamp

DataCamp has an educational program. Initially, it was not part of the course, but mid-semester a student asked me to apply on behalf of the course. I did, got free access for the whole class for the semester and even assigned some bonus homework from there. Even though I have my doubts about video learning, I highly recommend it as supporting material. I use it in my other courses too.

Feedback

Apart from the usual course evaluation, I asked the students five questions about the course in general. Here are the questions.

  1. Do you think that this course improved your data analysis skills? (Yes/No) Comments?
  2. Do you think that you learned an R + tidyverse workflow that you can actually apply outside the classroom? (Yes/No) Comments?
  3. Do you think that the GitHub-based workflow (Pages + progress journals + sharing all your code and stuff with the public) is good practice? (Yes/No) Comments?
  4. What are your suggestions to the students of next year about BDA503? (write at least 1–2 pieces of good advice)
  5. What are your suggestions to the instructor of next year about BDA503?

The answers were generally positive. Here is a summary.

  • They reported that they got the gist of R and that their data analysis skills improved along the way.
  • They liked tidyverse and they think they can apply it in their professional lives. So, many hours of repetitive work are saved! (You are welcome, employers.)
  • GitHub was a huge success. They liked the idea of having a product and presenting their work. It is a great leap forward in how they work, because our work culture generally does not include openly sharing stuff.
  • They advised prospective students to “keep calm and carry on”: it is stressful at first, but you quickly get into the rhythm of it.
  • They advised the prospective instructor (ahem) to be more organized, lay out a clearer path and go easy on the students.

Conclusion

This course was designed as the entry point to data science for the uninitiated. Students learned and worked with different tools and conventions, all with different purposes but in a collaborative context. I think we established a nice pipeline for data analysis and some groundwork for machine learning. The workload was high but students managed to deliver. I got tired as well :) But the results were awesome, and after all it was the students who made the course awesome.

See all the materials and student work on the course webpage. You can always contact me on LinkedIn about this course.