GRADUATE SCHOOL

[CMU MISM-BIDA] Attending Graduate School at CMU in Spring 2021

In a Heinz uniform, as a Tartan, on CMU campus!

Jack Chang
As a Graduate Student in Data


I guess it’s after you betrayed yourself many times in life that you’ll finally find something you really enjoy doing.

Hamburg Hall at CMU, Source: Me

Preface

A long spring semester has finally concluded. I won’t say I’m too excited to share my experience here in Pittsburgh, but I feel utterly relieved somehow! If you are wondering what graduate school is like in a pandemic setting, I think you are reading the right article. Over the past semester, I also attended several in-person classes at Carnegie Mellon University. If you are looking for some negative energy as well, this article may be worth skimming through. My previous article about this adventure can be found here: [CMU MISM-BIDA] Attending Graduate School at CMU in Fall 2020 (use wisely)

Carnegie Mellon University
Heinz College of Information Systems and Public Policy
Master of Information Systems Management — Business Intelligence and Data Analytics (MISM-BIDA)

Overview

This is actually my first official semester at CMU. With 54 units on the line, I think I can say that I survived the journey. After 2 months of rest and all the awkward things that happened, I can finally take the time to sit down and think back on how it all went. Again, this article is simply about “graduate life” and is posted on my publication: As a Graduate Student in Data. If you are interested in reading more about my graduate experience, feel free to check it out! I will go over my experience in the courses and life in Pittsburgh! You will not, however, see any internship-related parts in this article (future articles to come!). Let’s get started!!

Spring 2021 Class Schedule

CMU campus and great skies, Source: Me

In the 1st semester, students are required to finish a lot of core courses. Writing (95–717) and Speaking (95–718) are required in order to do an internship in the summer. I also took some of the Heinz core courses in data science: MLPS (95–828) and UDA (95–865). Last but not least, I challenged myself with a course in the computer science department, MLLD (10–605). The following courses were on my schedule:

  • 95–717 Writing for Information Systems Management: 6 credits (Mini 4)
  • 95–718 Professional Speaking: 6 credits (Mini 3)
  • 95–719 Accounting and Finance Foundations: 6 credits (Mini 3)
  • 95–796 Statistics for IT Managers: 6 credits (Mini 3)
  • 95–828 Machine Learning for Problem Solving: 12 credits (Full Semester)
  • 95–865 Unstructured Data Analytics: 6 credits (Mini 4)
  • 10–605 Machine Learning with Large Datasets: 12 credits (Full Semester)
Photo by Tim Collins on Unsplash

95–717 Writing for Information Systems Management

I have to say that I did learn to write a proposal in this course! Writing in a more structured way is the main objective of the course, and it is a basic soft skill required in the industry. Overall, I think the course was well taught. Professor Haylee Massaro’s classes are all so well-structured, which is why I really enjoyed it. Each writing class starts off with Haylee giving us a brief overview of the topics for the day. We often had class discussions and activities that exposed us to real-life ideas. For assignments, we began with cover letters and professional briefs, and eventually worked towards a 4-page professional proposal on a topic of our choice.

In-person WRTG class with Prof. Haylee Massaro, Source: Me

95–718 Professional Speaking

Professional Speaking with Chris Labash was not a heavy workload at all. We usually had 1 lightning round speech per week and 1 final presentation at the end of the mini. Topics ranged from simple explanations and Dear Data to tech talks and global crises.

SPKG class final presentation: Market Manipulation, Source: Me

My final presentation was on “Market Manipulation,” where I learned about and presented the “GameStop Stock Frenzy” that was on fire at that time. I would say the goal of the course is to help students get more exposure to presenting. You might pick up some useful tips in each class. For instance, I learned that establishing your credentials first and stating your references are important for getting your audience to believe in what you want to deliver. For those of you who are dog lovers, I suppose Chris’s dog Luke is a highlight of the course if you are taking it in person! Speaking really takes practice! From that day on, I have worked hard to be a better speaker every day!

In-person SPKG class with Prof. Chris Labash and Luke, Source: Me

95–719 Accounting and Finance Foundations

Since I was an accounting major back in college, I did not enjoy Professor Lynne Pastor’s AFF course. However, if you come from a non-business background, this course might help you learn more about the financial terms used in the industry. AFF includes weekly assignments, in-class quizzes, and 2 exams. One suggestion for those who want to survive the course: take notes after each class while you can! You will definitely do better on exams if you have notes to review. (Note that exams are often packed into the same midterm or finals week!)

95–796 Statistics for IT Managers

What is P-value?

Professor Janusz Szczypula, one of my favorite teachers at CMU, taught me statistics in Spring 2021. Janusz’s classes were all so well-prepped! I love it when he draws out and explains the concepts. (See picture below!) For a mini, I think the course gives a decent introduction to probability, sampling, hypothesis testing, and regression models. All of these topics are relevant for a data science interview! We also had the chance to use Minitab (a statistical software package) in some of our assignments. Overall, I would say that if you are not proficient in statistics, this class may be a good chance to brush up!

STATS class with Prof. Janusz Szczypula, Source: Me

95–828 Machine Learning for Problem Solving

Congrats to Leman on winning the 2020 SDM/IBM Early Career Data Mining Research Award

Now, one of the core courses of the BIDA program must be machine learning (ML) with Professor Leman Akoglu! Leman is a really hardworking professor who tries to make the course accessible to those who have never learned ML. Even with some ML background, I still learned a lot while taking her course. All the basic ML algorithms are included in this course: linear regression, logistic regression, k-NN, random forests, support vector machines, neural networks, k-means, PCA, GMMs, and so on.

At the beginning of the semester, students will form groups to do class assignments and final projects. This was great since students can work in teams to solve problems (rather than working on their own). Whenever you run into an error, it is always better to have someone on your team for feedback. I learned a lot from my teammates as well while struggling in the ML realm. For those of you taking this course in future springs (offered only in spring), I recommend reading “An Introduction to Statistical Learning” as the course goes on!

95–865 Unstructured Data Analytics

I took UDA in person (and also online)! For those of you wondering about the UDA prerequisite in Python, this simply means that you are required to take Python before taking the UDA course. I often see people in my cohort take Python in mini 3 and UDA in mini 4 of the same semester!

In-person UDA class with Prof. George Chen, Source: Me

UDA is a mini-course that covers a variety of topics, typically in the areas below:

  1. Text analysis (Frequency Analysis, Co-occurrence Analysis, PMI, Phi, Chi-squared, Cramer’s V)
  2. Dimension Reduction (PCA) and Manifold learning (Isomap, t-SNE)
  3. Clustering (k-means, GMMs, DP-GMMs, DBSCAN) and Topic Modeling (LDA)
  4. Predictive Data Analytics (k-NN, Decision Trees, Random Forest)
  5. Neural Nets and Deep Learning (CNNs, RNNs)

There is a lot of coding in this course, and Professor George Chen uses NumPy (no pandas) and PyTorch in his teaching. It is an introductory course for students who want to explore different ML domains (ML, NLP, DL, RL, etc.).
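To give a flavor of the co-occurrence analysis topic in the list above, here is a minimal sketch of pointwise mutual information (PMI) in plain Python. It is not taken from the course materials: the function name `pmi_scores`, the toy documents, and the choice to count co-occurrence at the document level are all my own assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI (log base 2) for every word pair that shares a document.

    Assumption: two words "co-occur" if they appear in the same document,
    and probabilities are estimated as document frequencies.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Sort the unique words so each pair has one canonical ordering.
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        # PMI > 0: the pair appears together more often than chance predicts.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus, made up for this example.
toy_docs = [
    "machine learning course",
    "machine learning project",
    "speaking course",
    "writing course",
]
scores = pmi_scores(toy_docs)
print(scores[("learning", "machine")])
print(scores[("course", "machine")])
```

In this toy corpus, "machine" and "learning" always appear together, so their PMI is positive, while "course" shows up in most documents and ends up with a negative PMI against "machine".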

10–605 Machine Learning with Large Datasets

Congrats to Virginia on receiving the 2021 Google Research Scholar Award and 2020 Towards On-Device AI Research Award (Facebook Research)

Congrats to Ameet on receiving the 2021 National Science Foundation CAREER award and 2020 Towards On-Device AI Research Award (Facebook Research)

The course that I struggled with the most must be 10–605 with Professors Virginia Smith and Ameet Talwalkar. Both professors are titans in the field of large-scale ML. I would say the course assignments are really worthwhile (or time-consuming) since each one has a written part and a programming part. In some of the assignments/projects, students get free AWS credits to process 280 GB of the Million Song Dataset. That was quite an experience, since one misstep may cost you extra dollars if you exceed the budget limit!

The tools I used in this course are Databricks, AWS, PySpark, and TensorFlow in Python. We also had two guest lectures on “Data Preparation for Tabular Data” and “SW/HW Innovations in Emerging DL Training Systems.” Words of wisdom for those who want to take the course: take it in your last semester (after taking ML-related courses). Also, it would be great to review linear algebra and probability, since much of the written homework requires this prior knowledge. Since 605 is more concept-based, if you are looking for a more practical course, you may also consider 95–869 Big Data and Large-scale Computing at Heinz College.

MLLD class with Prof. Virginia Smith and Ameet Talwalkar, Source: Me

605 covers useful techniques when you are dealing with large-scale ML, mainly on 2 problem settings:

  • large k problem (features): curse-of-dimensionality
  • large n problem (observations): efficient learning

The topics discussed range from the beginning of the ML pipeline to the end:

  • Visualization: Distributed PCA, Johnson–Lindenstrauss, t-SNE
  • ML Methods and Efficient Data Structures: Distributed Linear Regression, Distributed Logistic Regression, Kernel Ridge Regression (Kernel Approximations), Distributed Decision Trees (PLANET, Yggdrasil), Hashing, Randomized Algorithms (Count-Min Sketch, LSH)
  • ML Frameworks and Hardware
  • Large-Scale Optimization: computation vs communication (GD, SGD, Mini-batch SGD, One-shot Averaging, CoCoA)
  • DL Optimization: Adaptive Learning Rates (Newton’s Method, AdaGrad, RMSProp, AdaDelta), Momentum (Polyak Momentum, Nesterov Momentum), Adam, Batch Normalization, Early Stopping
  • Parallel & Distributed Deep Learning: Communication Strategy (Centralized, Decentralized), Synchronization Model, Compression (Quantization, Pruning, Knowledge Distillation, Efficient Architectures)
  • Hyperparameter Tuning: Random Search, Adaptive Search, Neural Architecture Search (ASHA, ENAS, DARTS, RSWS)
  • Federated Learning: FedAvg, FedProx
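As one concrete example from the randomized algorithms in the list above, here is a minimal sketch of a Count-Min Sketch, which estimates item frequencies in a stream using a small fixed-size table. It is not the course's implementation: the class layout, the `width` and `depth` defaults, the use of salted BLAKE2b hashes, and the song names are all assumptions of mine.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter that never underestimates a count."""

    def __init__(self, width=256, depth=4):
        # Assumed defaults; a larger width/depth lowers the collision error.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted hash per row stands in for `depth` different hash functions.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


# Hypothetical stream of song plays, made up for this example.
sketch = CountMinSketch()
for _ in range(5):
    sketch.add("song_a")
for _ in range(2):
    sketch.add("song_b")

print(sketch.estimate("song_a"))
print(sketch.estimate("song_b"))
```

The memory used stays at `width * depth` counters no matter how many distinct items pass through, which is the appeal when the number of observations is large. The trade-off is that estimates can be too high when items collide, but never too low.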

Pittsburgh Life

Photo by Fabio Comparelli on Unsplash

This part of the article will be told through photos rather than words. These are the places I found most worth a visit.

National Aviary

Sloth at the National Aviary, Source: Me

The Duquesne Incline and Grandview Overlook

The Grandview Overlook, Source: Me
The Duquesne Incline, Source: Me

Phipps Conservatory and Botanical Gardens

Phipps Conservatory and Botanical Gardens in spring, Source: Me
Phipps Conservatory and Botanical Gardens in winter, Source: Me

Carnegie Museum of Natural History and Carnegie Museum of Art

Carnegie Museum of Art, Source: Me
Carnegie Museum of Natural History, Source: Me

Pittsburgh Zoo & PPG Aquarium

Capybara at Pittsburgh Zoo & PPG Aquarium, Source: Me
Monkey at Pittsburgh Zoo & PPG Aquarium, Source: Me

There are also a lot of parks in Pittsburgh if you enjoy hiking and going outdoors. Namely, Schenley Park, Frick Park, and Highland Park are the top 3 I like to visit after school or work! My next aim is to visit the state parks around the city!

Epilogue

I have been through so much in the past 6 months while I was here. I have lost what was mine but also gained what I did not expect. Life is a rollercoaster ride and after all that, I said to myself:

Why not enjoy the journey?

My 1st official semester at CMU has been hard work but also rewarding! After my internship and some time for revision, I suppose I will get myself ready for Fall 2021. Thanks for joining me on the ride; I will dive into more of my school life at CMU next time. If you have learned something or have had a similar experience, feel free to comment below! If you have more to discuss, you can contact me via LinkedIn, Instagram, or Twitter. Also, feel free to follow me on Medium for more graduate life articles to come!

Jack Chang
As a Graduate Student in Data

Top Writer in AI, ML, & MLOps | ML Eng ✖ CMU Alum | #datascience #AI #ML | Follow me on LinkedIn: linkedin.com/in/yung-linchang