How is Cloud Computing (15619/15319) at CMU?

Lisa Hou
Jun 12, 2019

A random good-looking image for Cloud Computing, because I need a cover image.

Right after I finished the class, I started writing this post, because I had so many mixed feelings about it. Then I waited until I was sure that I had passed the course. I waited even longer, and now I am 3 weeks into my internship and this course is influencing me, mostly in a positive way.

Introduction

15619/15319 is a course at CMU on cloud computing. It does not simply teach you how to use the consoles of cloud platforms, because you can mostly learn those by yourself. It treats you as if you were a software engineer in the industry who wants to build products that can actually be published.

The course is completely online, whether you are a CMU student in Pittsburgh or in Silicon Valley. There are no in-person lectures. You read. You finish the quizzes. You do your own projects. If you have questions, you ask them on Piazza or go to a TA’s office hours.

This course is completely project-based. It is composed of 11 individual projects, weekly for the first half of the course and biweekly for the second half. If you take the course in the Spring semester, you can drop your lowest individual project score. If you are going to take it in the Fall, I’m sorry. For 15619, there is an additional team project interleaved with the biweekly individual projects, which is very time-consuming. Each project has its own budget, and exceeding it can cost you a 10% or 100% penalty depending on how much you exceed it by. Yes, you could spend hours and hours on a project and lose all the points. That happened to me once in Phase 1 of the team project.

In addition, every week there is a quiz based on some readings. Sometimes there are online programming training modules that you need to complete to prepare for the individual projects. You may also need to review other people’s code from time to time.

IT IS A TON OF WORK!! Think carefully and read this article before you take it.

Projects

I rank how difficult each project is based on my personal experience. The more * I put in front of the project, the more difficult I think it is.

Module 1: Big Data Analytics

Before Spring 2019, this module had only one individual project, but it was too much work for students who had just started the course. That one project was divided into two in Spring 2019.

*Project 1.1 Sequential Analysis: 1) get familiar with coding in Java and using Maven to compile, 2) start using Terraform to launch and destroy instances (scripts are provided), 3) know how to use Jupyter Notebook and analyze data in Python, 4) write JUnit tests in Java. This project is the easiest. If you have prior Java programming experience, it should take you no more than 10 hours to complete. Don’t be fooled by how easy it is. This is just the beginning.

**Project 1.2 Big Data Analytics: 1) learn MapReduce, 2) learn to use an AWS EMR Hadoop cluster, 3) learn some basic search techniques such as the inverted index, 4) get familiar with YARN. This is one of the easier projects. The MapReduce, EMR and YARN techniques it uses are easy to learn. It gets much more complicated later on.
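
If the inverted index is new to you, here is a minimal sketch of the idea in the map/reduce style. It is plain Python, not Hadoop and not the course's code, and the function names and sample documents are made up for illustration. The "map" step emits (word, document) pairs and the "reduce" step groups them by word.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, doc_id) pairs, roughly what a mapper would output."""
    for word in text.lower().split():
        yield word, doc_id

def reduce_phase(pairs):
    """Group document ids by word, roughly what shuffle + reduce would do."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    # Sort the ids so the result is deterministic.
    return {word: sorted(ids) for word, ids in index.items()}

def build_inverted_index(docs):
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return reduce_phase(pairs)

# Hypothetical sample data.
docs = {
    "d1": "cloud computing is fun",
    "d2": "cloud storage is cheap",
}
index = build_inverted_index(docs)
print(index["cloud"])  # ['d1', 'd2']
```

On a real cluster the pairs would be partitioned across machines instead of collected in one list, but the shape of the computation is the same.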

Module 2: Automating and autoscaling distributed services

*****Project 2.1 Horizontal Scaling and Advanced Resource Scaling: 1) invoke AWS APIs to autoscale resources horizontally (=adjust the number of instances) based on the current load, 2) trade off the number of resources against the budget. This project not only has a total budget, it also has a live budget while your program is being tested. Simply launching too many instances might give you a 0. This project was a nightmare for me. It sounds very straightforward, but calling the AWS APIs from Java is a pain. The documentation for the AWS Java API is so hard to read. I spent three days on this project, 8am to 3am every day, without doing any other assignments.
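
To give a feel for the tradeoff between load and budget, here is a minimal sketch of a scaling decision in Python. It makes no AWS calls, and everything in it is an assumption for illustration: the function name, the per-instance capacity, and the prices (kept in integer cents to avoid floating-point surprises). The actual project has its own rules and limits.

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance,
                      price_cents_per_hour, budget_cents_per_hour,
                      min_instances=1):
    """Pick an instance count that covers the load without exceeding the budget."""
    # How many instances the current load would need.
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # How many instances the hourly budget can pay for.
    affordable = budget_cents_per_hour // price_cents_per_hour
    # Never go below the minimum, never above what the budget allows.
    return max(min_instances, min(needed, affordable))

# Hypothetical numbers: each instance handles 100 req/s and costs 10 cents/hour.
print(desired_instances(250, 100, 10, 50))  # 3: the load needs 3, budget allows 5
print(desired_instances(950, 100, 10, 50))  # 5: the load needs 10, budget caps it at 5
```

The second call is the painful case: the budget wins, so some requests will be slow or dropped, and that is exactly the tradeoff the project makes you think about.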

****Project 2.2 Containers: Docker and Kubernetes: 1) administer Docker containers (and write Dockerfiles) and know how to use Kubernetes clusters on GCP, 2) learn Docker commands and manage Docker images, 3) create and deploy Helm charts to manage Kubernetes applications. This project is the most interesting one in my opinion, but it’s far from the easiest. The application code is provided so that you can focus on managing Docker images and communicating with different instances in a Kubernetes cluster. The concepts are hard to grasp at first and you may spend hours trying to understand them without coding anything, but keep in mind that this is very normal for this project.

**Project 2.3 Functions as a Service: 1) know how to use Google Cloud Functions, AWS Lambda functions and Azure Functions, 2) understand cloud events and react to them using functions. This project focuses on using cloud services, so it has far fewer difficult concepts and is probably the easiest apart from the projects in Module 1.

Module 3: Storage and DBs on the cloud

Information on databases in this module is very important for the team project in 15619. Be careful.

***Project 3.1 Files v/s Databases: 1) analyze files using awk and grep, 2) know how to write some easy MySQL queries, 3) use MySQL in Java programs with JDBC, 4) search a NoSQL database such as HBase from Java programs, 5) use MapReduce to load data into HBase. Items 2), 3) and 5) are extremely important in the team project. Item 5) is not emphasized here because the MapReduce script is already provided for you. Make sure that you look through the code to familiarize yourself with how it works.

***Project 3.2 Social Networking Timeline with Heterogeneous Backends: I can’t talk much about this project because I waived it. My friend who did it told me it wasn’t hard. It was before my spring break and I was very busy that week so I skipped it. However, if you are not so busy and you have one full day left for this project, DON’T SKIP THIS!! I know it’s spring break but don’t be lazy. There are much more difficult projects that you can choose to skip.

At this point, the team project has started and will interleave with individual projects. I will first continue with only individual projects for now, but keep in mind that there is a lot more work from the team. This is the start of the time when this one course can easily take you 50 hours a week.

*****Project 3.3 Multi-threading Programming and Consistency: 1) know the different consistency models, 2) implement strong consistency for a key-value store in Java, 3) know how to use locks and when to acquire and release them. You will build a program mimicking a data center that can take read and write operations. This is the monster. This project is so difficult. It needs a lot of understanding. I devoted four full days to this project, one of which I spent just trying to understand the different consistency models. Go to TA office hours, get help and discuss concepts with others. Try to understand as much as you can before coding, because it is very difficult to debug multithreaded programs. You should probably just go ahead and drop this one if you haven’t used your waiver. Again, if you take the course in the Fall, unfortunately you have to finish this. Of course, if you still choose to do it anyway, you will feel like a hero.
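
To show what "knowing when to acquire and release locks" looks like in the smallest possible case, here is a sketch of a key-value store with one lock per key. It is in Python rather than Java, it runs in a single process, and the class and method names are made up. The real project coordinates reads and writes across replicas, which is far harder than this. The sketch only shows how serializing operations on the same key prevents lost updates.

```python
import threading
from collections import defaultdict

class KeyValueStore:
    """A toy in-memory store where operations on the same key are serialized."""

    def __init__(self):
        self._data = {}
        self._locks = defaultdict(threading.Lock)  # one lock per key
        self._locks_guard = threading.Lock()       # protects the lock table itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def put(self, key, value):
        with self._lock_for(key):
            self._data[key] = value

    def get(self, key):
        with self._lock_for(key):
            return self._data.get(key)

    def update(self, key, fn):
        # Read-modify-write under one lock, so no other writer can interleave.
        with self._lock_for(key):
            self._data[key] = fn(self._data.get(key))
            return self._data[key]

store = KeyValueStore()
store.put("counter", 0)

def worker():
    for _ in range(1000):
        store.update("counter", lambda v: v + 1)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.get("counter"))  # 8000
```

If `update` were written as a separate `get` followed by a `put`, two threads could read the same old value and one increment would be lost. Holding the lock across the whole read-modify-write is the point.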

Module 4: Iterative Processing with Spark

At this point, the team project really gets in the way, and chances are that you can’t get all the points in these individual projects. It’s ok. Do your best and you shall pass.

****Project 4.1 Iterative Processing with Spark: 1) use Scala to write a Spark program, 2) be able to speed up Spark programs using the right data structures and functions, 3) debug Spark with the YARN UI and YARN logs, 4) tune system parameters to optimize your program. You will eventually implement something similar to PageRank using Spark. The skeleton (a basic implementation) is provided for you, but you will need to understand everything to modify the code and make the right adjustments. This is the last demon in this course in terms of individual projects. I feel it’s not that hard, because at this point you should feel comfortable exploring different tools and reading through documentation (which is the most valuable lesson that I learned from this course). You will spend less time getting stuck and more time coding.
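
For anyone who has not seen PageRank before, here is a minimal sketch of the iteration in plain Python. It is not Spark, not Scala, and not the course skeleton. The graph, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration. Each round, every page splits its rank evenly among the pages it links to, and pages with no outgoing links spread their rank over everyone.

```python
def pagerank(links, damping=0.85, iterations=20):
    """links maps each page to the list of pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(ranks[node] for node in nodes if not links.get(node))
        new_ranks = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other page passes an equal share of its rank to each target.
        for src, targets in links.items():
            if targets:
                share = damping * ranks[src] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks

# Hypothetical three-page graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In Spark the same loop becomes a join between the links and the ranks followed by a reduce by key, which is where choosing the right data structures and partitioning starts to matter.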

***Project 4.2 Machine Learning on the Cloud: 1) know how to do reasonable feature engineering, 2) know how to use existing machine learning infrastructure and tools in the cloud (hyperparameter tuning, pre-trained models, etc.). If you are into machine learning, this project is really for you. It is quite an interesting project if you love seeing your prediction accuracy grow as you tune everything.

***Project 4.3 Stream Processing with Kafka and Samza: 1) understand IoT stream data, 2) deploy a Kafka and Samza stream processing system on a YARN cluster, 3) debug using the YARN logs and UI. You will build a service similar to Uber that streams real-time data and recommends the best-matching driver to a user depending on the request. By now it’s almost the end of the semester, and a lot of duties from other classes start to get in the way. Hang in there!
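
Here is a rough sketch of what "streaming events in and matching a driver to a request" can look like, in plain Python with no Kafka or Samza. The event format, the field names and the matching rule (nearest available driver by straight-line distance) are all assumptions made up for this example. The actual project defines its own streams and scoring.

```python
import math

def process_events(events):
    """Consume events in order, keeping driver state and emitting matches."""
    drivers = {}   # driver_id -> (x, y) for drivers that are currently available
    matches = []
    for event in events:
        if event["type"] == "driver_location":
            drivers[event["driver_id"]] = (event["x"], event["y"])
        elif event["type"] == "ride_request" and drivers:
            rider_pos = (event["x"], event["y"])
            # Assumed rule: pick the closest available driver.
            best = min(drivers, key=lambda d: math.dist(drivers[d], rider_pos))
            matches.append((event["rider_id"], best))
            del drivers[best]  # the matched driver is no longer available
    return matches

# Hypothetical event stream.
events = [
    {"type": "driver_location", "driver_id": "d1", "x": 0, "y": 0},
    {"type": "driver_location", "driver_id": "d2", "x": 5, "y": 5},
    {"type": "ride_request", "rider_id": "r1", "x": 4, "y": 4},
    {"type": "ride_request", "rider_id": "r2", "x": 4, "y": 4},
]
matches = process_events(events)
print(matches)  # [('r1', 'd2'), ('r2', 'd1')]
```

The interesting part in a real stream processor is that the `drivers` state has to live somewhere durable and be partitioned across tasks. In this sketch it is just a dictionary.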

That’s it with the individuals! If you are going to take 15319, this is all you need to do, along with weekly quizzes. For 15619 students though…keep reading.

The Team Project

Content: This project lets you use both MySQL and HBase as your backend. You will build web servers that take requests, process the queries in your front-end program, retrieve data from your backend databases and return a response in a certain format. There are 3 queries in total, each requesting different data. The first query does not involve any database. You just need to familiarize yourself with using web frameworks and optimizing your front-end code. The second and third queries involve backends, and you will need to design a schema for your databases that optimizes performance.

There are 3 phases in this project. In phase 1, you focus on building query 1 and query 2. You want to make sure that you put a lot of effort into optimizing them well, even if it means that you may go a little over budget. Otherwise, there will be so much more pain and so many more penalties later. In phase 2, query 3 comes in and there is a live test for all three queries. Phase 3 is another live test for the three queries, this time using managed services (=not using EC2). If your code is optimized enough, this phase should be very easy.

Live Test: For phase 2 and phase 3, there are live tests, as I mentioned. On a specific Sunday night, there are a few hours of live testing: load is sent to your servers to test their throughput. You want to make sure that you warm up your instances well before the test. Afterwards, just enjoy your life with your teammates and sit there waiting for the results. Keep an eye on your instances (e.g. CloudWatch metrics) and make sure that they are working properly, because some of them might already have died. Oops. We ran everything in tmux so that our servers wouldn’t stop if the internet connection unexpectedly broke down.

Budget: This is what makes the team project so hard. A big proportion of your budget can be wasted on the repetitive process of testing, redesigning and building again. The day of the live test costs a lot of money because your machines need to warm up, and the test itself takes three hours. Save your money for that. My team ran out of money in phase 1 and we didn’t get any points. This happens so easily and so often.

How do I feel about the course?

This is the part I most want to write about.

To be honest, had I not waited for more than one month before writing, I would probably only have complained about the course. Now I can share some feelings from a more objective perspective.

First of all, this course is heavy. I have to give it that. It’s heavy in many ways. To begin with, there were two weeks in which I spent more than 50 hours on this course. I was taking 5 courses in total and one course took me 50 hours. As I write this down now, I can’t imagine how I managed to do that. I was suffering a lot because I was a math major in undergrad and didn’t have a good enough foundation in CS when I started. I ate one meal a day and slept five hours a day for a few weeks. I noticed I wasn’t in good shape and I am glad that I was able to find help soon enough. This class tries to teach you a little bit of everything. However, because it’s so heavily loaded, I felt I didn’t learn each topic well. I talked to many people in the class and that was how the majority of them felt. That being said, I now think this is actually not a big deal. I will come back to this point later. Throughout the course, you will probably neither sleep well nor eat well, but be sure to find help if you feel your life is in disorder. Please take care of yourself!

I did learn a lot from the course. I’m not talking about the materials. Yes, you will probably never use the AWS API to launch an Auto Scaling group in your life, but you have probably never used calculus to buy groceries either. Learning how to learn is more important than learning what to learn. The most important thing is that this class forces you to learn a brand new tool or concept in a very short time. You will gradually discover the most effective way for YOU to learn new things. This ability to learn fast has helped me a lot since I began my internship 3 weeks ago. I had to get familiar with internal tools in a day. I was able to do that because I had done it before: I knew what to read and where to find help.

Because this course is completely online, it forces you to learn on your own and find help when you can’t do it on your own. In real life, you can’t do everything on your own, so go ask for help. I befriended a lot of nice people (TAs and people who go to the TAs) who were patient and willing to answer my dumb questions. My team is perfect. We are far from the “best” team if you measure us by grades. We are best friends now because we went through all the suffering together. We were together almost all the time once the team project began. This kind of friendship lasts a lifetime.

You don’t have to be a good programmer but you have to be mentally well-prepared before taking this class. The class will also train you on that even more.

One last thing. Don’t compare yourself to others. Everyone’s background is so different. Some people worked in the industry for years before taking this class, while others had only taken 1 or 2 programming classes. Some take only 3 classes in one semester and some take 5. Keep an eye on how YOU grow and how YOU learn. You will be forced to have this growth mindset as well. Otherwise, it will be so difficult to even get through this class.

Two months ago, I felt miserable that I had to take this class (it was a required course in my program). Now I know it was an opportunity for me to grow, because otherwise it would have been so tempting to just take easy courses and get a degree without the learning that comes from suffering.

Lisa Hou

Facebook SWE. CMU & UCSD Alumni. Still learning to be a semi-good writer.