Scaling Data Science Learning for the Masses

How can individuals best harness data to create meaningful change? We’re helping answer that question by building new infrastructure for UC Berkeley’s growing Data Science department.

By Sam Lau


Data Science: What Is It, Really?

To be completely honest, it took me a long time to understand and appreciate data science. I used to think data science was one of those buzzwords that gets thrown around whenever someone wants to sound important.

Eventually, I realized that there is a lot more substance to data science. During the spring of 2015, John DeNero, a faculty member in the UC Berkeley Computer Science Department, mentioned to me that he was helping to create a new data science course for freshmen. I joined mostly for the fun of building a new course, but I didn’t expect to find the concepts themselves so engaging.

Students come into Data 8, the introductory data science class, with no background in computer science or statistics. That alone sets it apart, since more and more students now enter introductory engineering courses with some prior experience. Even so, they quickly learn how to apply data science principles to a variety of real-world situations: spotting a biased jury, plotting bike rides across San Francisco, and even tracking the location patterns of a former mayor of Oakland. This class is a lot of fun to teach because students come from all over campus, with over 50 distinct majors represented.

First day of Data 8 lecture

More importantly, I get to watch students develop the ability to ask, “Why?” Why is it that policy A is better than policy B? Is it really true that smoking lowers life expectancy? Knowing data science ultimately allows you to make informed decisions. That’s why teaching it is so important to me.


Data Science: Future Individual & Social Change

Why is understanding data science so important for the future? I’ll speak from two perspectives here. The first is that of a citizen who wants to make informed decisions. The second is that of someone who actually gathers and analyzes data and presents conclusions from it.

  1. For the citizen, data is quickly becoming a personal issue. For example, Facebook is probably quite good at automatically tagging you in photos by now, because it knows what your face looks like. How much privacy should we be willing to trade for convenience?
    It’s probably even more important to understand where data goes wrong than where it goes right. Entire books have been written about how numbers can lie. How do you know whether an article on the internet is interpreting the facts correctly? When it comes to topics that matter, we all too often debate on the basis of rhetoric rather than on the data and its assumptions. Yet a clear conclusion can often be drawn from the available data, and people would make better decisions if they were aware of it.
  2. For the data scientist, the situation is reversed: with enough data, you can make a convincing case for just about anything you want. Having more data at your fingertips also means more opportunities to use your skills to help people understand situations better and reach better-informed conclusions. That is no small responsibility.
    I hope that more people choose to use their skills for social good, because it’s a slippery slope the other way. Many people are surprised to hear that they can use their knowledge and abilities for social good. Still, I’m encouraged that most only need to hear it once to experience a shift in perspective.

The Data 8 Project: Scaling Up For More Students

Part of what makes data science so exciting is how rapidly the field is advancing, and student demand for data science courses is growing along with it. To help meet that demand, Blueprint is partnering with UC Berkeley’s Data 8 class on a year-long initiative to expand data science offerings to students.

For Data 8, we have infrastructure set up to provide students with Jupyter notebooks in the cloud. This means that they don’t have to install anything on their computers. They can just visit a URL and practice coding in the same environment that academia and industry use today.

Many other classes at Berkeley and at other universities want the same setup for their students. It lowers the barrier to entry for non-technical students and makes it easier to provide support. Unfortunately, it’s difficult for these classes to use our infrastructure because setting it up is complicated and error-prone.

Blueprint’s Data 8 Team (from left to right): Jeff Gong, Derrick Mar, Sam Lau, Allan Wu, and Peter Veerman

My team’s goal this semester is to create a robust, scalable deployment of our infrastructure that we can give to anyone. This is especially aimed at professors in other departments and at other universities, who could then incorporate more sophisticated analyses into their classes. We want anyone to be able to copy and customize Data 8 for their students.

Ultimately, we want to make learning data science as easy and accessible as possible for students across different campuses.

And with that comes a newly educated group of students empowered to make better decisions: for themselves and for the community.


Sam Lau is a senior studying EECS at UC Berkeley. Sam is currently leading Blueprint’s Data 8 project team.


To learn more about Blueprint and read more thought leadership about using tech for social good, follow us on Facebook and sign up for our community newsletter!
