Podcast Episode #8: The Insights of a Data Science Academic with Justin Eldridge

Wilson Xie
Data Science Student Society @ UC San Diego
5 min read · Aug 12, 2020

This article recaps the main takeaways of our podcast episode with Justin Eldridge. Make sure to listen to the full podcast below or on Podbean. Follow us to stay tuned for more episodes!

Justin Eldridge is an Assistant Teaching Professor of Data Science at UCSD who has taught DSC 10 Principles of Data Science, DSC 40A Theoretical Foundations of Data Science I, and DSC 40B Theoretical Foundations of Data Science II. In this podcast episode, Justin recounts his steady journey through academia and offers advice to students considering a double major or a data science project.

Educational Background and Research Works

Justin spent his entire undergraduate and graduate education at the Ohio State University. He initially intended to pursue an undergraduate degree in Aerospace Engineering, but after realizing the subject was not for him, he switched to Physics and Applied Mathematics. During this time, he became interested in machine learning and artificial intelligence and decided to pursue a PhD in Computer Science at OSU. His dissertation concentrated on hierarchical clustering and sought to examine the accuracy of clustering algorithms. Justin primarily focused on developing definitions and theory for proving clustering algorithms correct. For example, when we listen to music, we often try to categorize songs by genre, mood, and more. But there is no one “right way” to categorize (cluster) our music, because we can organize it using different methods. So Justin attempted to define what it means to cluster correctly and to prove which algorithms meet that definition. When Justin came to UCSD, he brought this clustering focus into his classes, DSC 40A and DSC 40B. These two courses concentrate on the mathematical theory and applications within Data Science, so understanding the concepts behind clustering is extremely useful for students learning the foundations of Data Science.
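To make the music example concrete, here is a minimal sketch of one common hierarchical method, single-linkage agglomerative clustering, written in plain Python. It is not Justin's own algorithm or definition of correctness. The tempo values and the function name `single_linkage` are made up for illustration. The point is only that the same data yields different groupings depending on where you stop merging.

```python
from itertools import combinations


def single_linkage(points, num_clusters):
    """Merge the two closest clusters until only num_clusters remain.

    points is a list of numbers; each cluster is a list of indices into it.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Hypothetical song tempos in beats per minute (illustrative data only).
tempos = [60, 62, 90, 94, 140, 141]

three_groups = single_linkage(tempos, 3)  # slow / medium / fast
two_groups = single_linkage(tempos, 2)    # slow-and-medium / fast

print(three_groups)
print(two_groups)
```

Both results are reasonable ways to organize the same songs, which is why a formal definition of a “correct” clustering is needed before an algorithm can be proven right or wrong.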

Teaching at UCSD

Before becoming a professor at UCSD, Justin faced the classic dilemma of what to pursue: academia or industry? Justin picked the academic path and mainly focused on teaching rather than research. However, the University of California offered him the opportunity to continue his research while teaching at UCSD as a Data Science professor. This opportunity attracted Justin because he had always wanted to continue his research while working as a teaching professor. Additionally, Justin liked the idea of working in San Diego because he had always wanted to move back to California after graduating from OSU.

When Justin arrived at UCSD, HDSI was newly established, so Justin was given the opportunity to further develop the Data Science program by creating new major classes and writing their corresponding textbooks. The Data Science presence in San Diego also contributed to HDSI’s success because it offered so many opportunities. Justin added, “San Diego has many more jobs in Data Science especially in Biotech, such as Bioinformatics. It is definitely a popular destination for people in Data Science because the environment could help their curriculum.” Besides contributing to the department of Data Science, Justin also met many great people at UCSD who share a similar mindset. He said, “There are so many people here who are famous in theory of machine learning. It’s ridiculous. A lot of people I see in textbooks or giving Keynote speeches at conferences are teaching and working here. It’s so cool to be in the same place as them.”

Many professors with industry experience like to share their experiences and incorporate what they learned into their teaching style. Justin, however, is a pure academic, having only seen industry through a short-lived consulting lens. When asked about this, he acknowledged the difference and replied that his teaching method was shaped by his physics background. He enrolled in most of his physics classes at OSU out of pure interest. Justin promotes a similar philosophy in his Data Science classes, believing that students should bring the same motivation to them. “Treat people like they’re interested in the thing,” he said, “and even if they aren’t, if you show them what’s interesting about it, and have enthusiasm for the subject, then it’s contagious to the students who weren’t interested.”

Babypandas

When Justin began teaching DSC 10, an entry-level Data Science class, he and Professor Fraenkel intended to model it after UC Berkeley’s Data 8 class. Although the material helped introduce Data Science programming (such as how to build and organize tables and create simple data visualizations through the NumPy library), it did not prepare students for the Pandas library.

Looking for a way to improve DSC 10, they decided to create a simplified Pandas library for the class. The Babypandas library introduces Pandas with fewer methods and arguments, so that students with no coding experience can learn Pandas in a simpler way. It replaced Berkeley’s Data Science module in DSC 10, which gives students a smoother transition to full Pandas later on. Justin also rewrote the textbook to emphasize Babypandas, since the previous textbook used the Data Science module. Here is the textbook link.
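As a rough illustration of why “fewer methods and arguments” helps beginners, the sketch below uses full Pandas only. It first shows several equivalent ways to select a single column, then a short analysis written with one consistent spelling. The data is made up, and the choice of `.get` as the single spelling is an assumption for illustration, not a statement of exactly what Babypandas includes. Consult the Babypandas documentation for its actual method list.

```python
import pandas as pd

# Made-up data for illustration only.
songs = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": ["rock", "jazz", "rock"],
    "plays": [120, 45, 300],
})

# Full Pandas offers several equivalent spellings for selecting one column,
# which can overwhelm someone writing code for the first time.
by_brackets = songs["plays"]
by_attribute = songs.plays
by_loc = songs.loc[:, "plays"]
by_get = songs.get("plays")

# A restricted teaching style (assumed here) sticks to one spelling and a
# handful of methods: filter rows, sort, then pull out a single value.
top_rock = (
    songs[songs.get("genre") == "rock"]
    .sort_values(by="plays", ascending=False)
    .get("title")
    .iloc[0]
)

print(top_rock)
```

Because every call above is ordinary Pandas, code written in the restricted style keeps working unchanged when a student later moves to the full library.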

Advice for Data Science Students

· Double Major

Justin primarily recommends that students consult their Data Science advisor when considering a double major. In his opinion, though, if a student is interested in academia, double majoring in Data Science and math would be an interesting approach, because graduate school involves theory that requires math. A student who double majors in Data Science and another science would be well positioned for a career in a field such as Bioinformatics. For students double majoring in Data Science and history, Justin believes they would bring unique and useful perspectives because they are majoring in two very different subjects, which takes a lot of time and effort.

· Data Science Projects

For students who want to work on data science projects, Justin recommends looking for the data first, before settling on the question to answer. Students sometimes come up with impactful questions, but the data needed to answer them is not always available. Therefore, Justin suggests that students work on projects that interest them and that other people have already answered. Justin also advises students to always organize their work before starting a data science project. He adds, “you don’t want to get into a point where it’s a pain to do a simple thing in a project.” So be sure to plan out the schedule before starting a project, or you may lose interest in it.

Comments on the Data Science Society

When asked about DS3, Justin said he was very satisfied with the organization. He said that Data Science professionals have to work on projects in order to stay competitive in the field, and that DS3 plays an important role in giving students the opportunity to work on data science projects outside of class.

Future and More

In the future, Justin will be teaching DSC 80, The Practice and Application of Data Science. He is also currently working on establishing DSC 40C, a new course in the DSC 40 series that features the application of probability in linear algebra.

If you’re interested, check out Justin’s personal website to learn more about him and his research on clustering.
