Practicum Spotlight

Evie Klaassen
Published in USF-Data Science
9 min read · Jun 17, 2022

A candid conversation with four current M.S. in Data Science students about their internship experiences at W.L. Gore & Associates, the New York Mets, the Golden State Warriors, and Nextracker.

MSDS Students Ashwani, Brendan, David, and Lucas

Background and Introduction

Ashwani: Before coming to the M.S. in Data Science program, I completed my Bachelor’s in Engineering and Physics, and I was actually working in astrophysics for quite some time. I did some data science-related work while I was there, and that’s how I got interested in the field and wanted to learn more. To get more exposure to data science, I started working at a consulting firm in India, where I worked on some fintech-related projects. At a certain point in my work, I figured I should probably get a formal education in data science, so that’s how I decided to pursue a Master’s degree. I ended up hearing about USF’s program through some friends and colleagues, especially about how well the curriculum here is structured. The curriculum, along with the practicum component, is why I decided to apply, and I was lucky enough to be accepted and be able to attend this program. I’m currently doing my practicum at W.L. Gore & Associates.

Brendan: I’m originally from the Bay Area and earned my undergraduate degree in Math at USF, so I had heard about this program then. A friend of mine who previously went through the program told me how much he enjoyed it, how rigorous it was, and how well it prepared him to be a data scientist. Right now, I’m with the New York Mets for my practicum, and what inspired that was actually my time in junior college, which is when I was exposed to the data behind baseball. Ever since then, I’ve wanted to find a way to apply data to my own baseball career, and the fact that the New York Mets were a practicum partner was one of the main reasons I wanted to come back to USF for this program.

David: A little bit about me: I completed my undergrad at UC Irvine, where I studied math and quantitative economics, and after I graduated in 2017, I worked at a startup for about three years. When the pandemic hit, I realized that I had always wanted to get my Master’s, so I decided this was the time to take that chance. When I was at that startup, we had a lot of data but we didn’t really have anyone who could make use of it, so that was one of the reasons I wanted to pursue a career in data science. Right now, I’m doing my practicum with the Golden State Warriors. Go Warriors!

Lucas: I was born in Brazil and moved to the States when I was young, and I studied math and economics in college. After I graduated, I moved out to Los Angeles to play music with my band at the time. While I was doing that, I was also tutoring students in math, and eventually I wanted to move out of the music industry. I started working at a tax incentive firm, and I spent a lot of my time handling data there. With all of the grunt work that was originally being done, I decided I wanted to automate some of these things, which sparked my interest in programming. Then I figured I would take that interest more seriously, and I started looking into schools. Here at USF, I’m doing my practicum at Nextracker.

Choosing the Practicum

Ashwani: I chose W.L. Gore because the projects were based on material science and I was really interested in that. I had some background in computer vision and physics, so the practicum projects seemed to be a good mix of that.

Brendan: One of the reasons I chose the Mets, besides my own baseball career, was that I really wanted to get an in-depth look into baseball analytics. There’s a lot of stuff online, but it doesn’t always explain how the analysis was done. It’s been really cool getting to view baseball from a different perspective through my practicum.

David: For me, an obvious reason is getting to work for one of the best teams in the NBA — that’s super cool. Another reason is the marketing focus of the project and how the goal is to predict customer behavior, and that was really interesting to me. Before, I was just someone who purchased tickets, but now I understand the business side of how ticket sales work.

Lucas: I’ve thought about getting into the green energy and renewable energy field for some time now, and Nextracker fit that perfectly. They also listed a project that involved economic thinking and forecasting, which I found to be a great way to tie my undergraduate degree into my practicum experience.

Projects

Ashwani: At W.L. Gore, we had three main projects. One focused on computer vision, where we had to identify various regions within a material. The second project was related to medical imaging, where our goal was to segment different layers in the image and identify where the medical device was located in the cross section image. The last project is maintaining the PyTorch deep learning library at W.L. Gore, adding new features and models to it after we create them for the other projects, so that these tools can continue to be useful for future projects.

Brendan: For the Mets, we had a couple of projects. The first was the shift project, which is all about defensive placement; in baseball, the shift is a specific play, and through a lot of analytics and machine learning, we can create a model that says where teams should put their players based on certain characteristics. It’s a tough problem and a big cold-start problem, but we take the batter and the pitcher and we try to figure out what the possible distributions for a batted ball are. The second project is about trying to find a player’s best complement, which can be applied both to current players and when working on obtaining new players.

David: Right now, our main project is analyzing ticket and merchandise sales, and then predicting whether or not a customer will buy something in the next 6–12 months. Once that prediction is made, we also want to predict whether they will buy a ticket or merchandise; if it’s a ticket, we predict what section they would buy, and if it’s merchandise, what type of merchandise they’ll buy. We then pass these results on to the marketing team to increase the conversion rate from marketing to actual purchases.

Lucas: Like Ashwani, my practicum is also related to the physical sciences. Last semester, I was working on this library that helps in the optimization and the simulation efforts to see how much more energy we can generate from the sites where our clients operate. Lately, I’ve been in the world of forecasting, and specifically have been trying to forecast solar irradiance. I’ve been doing so using some traditional time series methods like ARIMA and SARIMA, as well as some deep learning methods.
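
For readers unfamiliar with seasonal forecasting, here is a minimal sketch of two ideas that sit underneath models like SARIMA: seasonal differencing and a seasonal-naive baseline that a fitted model would be expected to beat. It is not Lucas’s code and not a SARIMA implementation; the function names and the toy readings are hypothetical, chosen only for illustration.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def seasonal_naive_forecast(series, period, steps):
    """Forecast each future step with the value observed one season earlier."""
    if len(series) < period:
        raise ValueError("need at least one full season of history")
    history = list(series)
    for _ in range(steps):
        # The value `period` steps back is the same point in the previous season.
        history.append(history[-period])
    return history[len(series):]


# Hypothetical toy data: two "days" of four readings each (not real irradiance).
readings = [0.0, 5.0, 8.0, 2.0, 0.0, 6.0, 9.0, 3.0]

diffs = seasonal_difference(readings, period=4)
forecast = seasonal_naive_forecast(readings, period=4, steps=4)
print(diffs)
print(forecast)
```

A baseline like this is useful mainly as a yardstick: if a time series or deep learning model cannot improve on repeating the previous season, the extra complexity is probably not paying off.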

Most Useful Class

Ashwani: I think because I started working with deep learning models so early on, the deep learning certificate course that was offered was really helpful. Advanced machine learning was also really useful in the way it gave us more clarity around deep learning, understanding how different models work, and using PyTorch. The data structures and algorithms class was also useful in the way it emphasized object-oriented programming; we use object-oriented programming a lot when maintaining the library I mentioned earlier.

Brendan: I feel like the practicum presented me with a very unique problem where we didn’t really use any classical machine learning — it wasn’t a very well structured problem. For that reason, I’d say exploratory data analysis (EDA) has been extremely useful. Whether it’s understanding the missing values in our data or observing some wonderful distributions within our data, being able to dive into that dataset with EDA has been so important. I’d also say the introduction to machine learning course, specifically for when we covered clustering, since I’ve used clustering quite a bit in our unstructured problem as well.
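
As an illustration of the kind of missing-value check Brendan describes, here is a small standard-library sketch that reports the share of missing entries per column. The records and field names are invented for the example and are not the Mets’ data; with pandas, a similar summary is commonly obtained with `df.isna().mean()`.

```python
def missing_rate_by_column(rows):
    """Return {column: fraction of rows where the value is None or absent}."""
    # Collect column names in first-seen order so the report is stable.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    total = len(rows)
    rates = {}
    for col in columns:
        # row.get(col) is None both for explicit None and for a missing key.
        missing = sum(1 for row in rows if row.get(col) is None)
        rates[col] = missing / total if total else 0.0
    return rates


# Hypothetical batted-ball records; the field names are assumptions.
records = [
    {"exit_velocity": 98.2, "launch_angle": 12.0},
    {"exit_velocity": None, "launch_angle": 25.0},
    {"exit_velocity": 101.5, "launch_angle": None},
    {"exit_velocity": None},
]

rates = missing_rate_by_column(records)
print(rates)
```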

David: Definitely the machine learning courses. We use a lot of models and techniques from those classes in my work, such as how to tune hyperparameters. I would also say our SQL classes — there’s so much data to work with, and the data is coming from a variety of vendors, so it’s not very well organized. Because of this, when it’s in the database, we need to write really complex queries in order to get the data that we want. It’s not always fun, but it’s really necessary, so SQL has been extremely useful for that reason.

Lucas: I would agree with a lot of what everyone else has said, mostly because a lot of these classes have made me a better programmer in general. The emphasis on documentation, keeping your code clean, and being good to your future self from a programming perspective has resonated with me through all the classes we’ve taken. More specifically, the time series class has been extremely useful for me; I basically keep all of the notebooks from that class open to guide me through my own time series problems. Advanced machine learning has also powered me through this semester, helping me build stronger models for projects.

Tools & Technologies

Ashwani: We’re mostly using Python and PyTorch in all of our projects. To help other people in the company who may not be as familiar with coding, we’ve been working on converting a lot of our code from PyTorch to PyTorch Lightning, to make it more accessible for people to work with. Python and PyTorch alone are so powerful that we can do a lot with them and don’t really need a huge tech stack to get our work done.

Brendan: For us, we’re using a ton of Python — NumPy and Pandas being some of the main packages we use to work with the data — and we’re working in Jupyter Notebooks almost all the time.

David: Just like everyone else, it’s pretty much all Python, and some SQL to pull the data. For our models, we use a lot of random forests, XGBoost models, and neural networks, and then we work on training these models and fine-tuning our models’ hyperparameters.

Lucas: All the work we do is on virtual machines, but we’re working primarily with Python, and working in a lot of notebooks. I also use a lot of stats model packages, a lot of scikit-learn, and of course NumPy, Pandas, and PyTorch for deep learning.

Most Interesting Thing Learned

Ashwani: I feel like I’ve learned so many things during my practicum. We had issues with not having enough data, so I learned a lot about the challenges that come with that. At first, the problem looked impossible, but then I read some research papers and it opened my eyes to a very innovative approach. Another big thing I learned is how to develop something in a very collaborative environment, instead of just working on my own on a project. The last thing, which may sound a bit naive, is how to Google stuff. Knowing how to find relevant research papers or look up different techniques, and in general understanding how to be resourceful, is always going to be important in data science.

Brendan: One of the main things I learned is to be very methodical when writing code, and to always be kind to your future self. In the beginning, my notebooks were super messy, and it was really frustrating when I would have to go back through my code to fix something small, so I’d definitely say being methodical when coding.

David: The most interesting thing I’ve learned is how to deal with imbalanced data: as you could probably imagine, nobody is going to attend a basketball game every day. It’s also interesting to learn what other stakeholders may need, and how we can adjust the work that we’ve done to fit their needs.
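
One common way to handle the kind of class imbalance David mentions is to weight the rare class more heavily during training. The sketch below is not the Warriors’ approach, which the interview does not describe; it only computes the widely used "balanced" heuristic, `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn documents for `class_weight="balanced"`. The labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in `labels`."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Hypothetical labels: 1 = customer purchased, 0 = did not (9-to-1 imbalance).
labels = [0] * 9 + [1]

weights = balanced_class_weights(labels)
print(weights)
```

The resulting dictionary can be passed to any model that accepts per-class weights, so that mistakes on the rare "purchased" class cost more than mistakes on the common one.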

Lucas: The main lesson that I learned is that wherever you are, it’s important to be nice to the data engineers on the team. You never know when stuff in the background is breaking, and you’re going to need them to help you out. I think I’ve also learned a lot along the lines of not letting the perfect be the enemy of the good; it’s important to know when it’s time to call it quits and try something else, instead of spending too much of your time trying to overdevelop something.
