Practicum Spotlight #2

Jay Chung
Published in USF-Data Science
7 min read · May 17, 2024

Continuing our ongoing Practicum Spotlight series, I had the privilege of sitting down with more students from the MSDS program at the University of San Francisco to delve into their practicum experiences.

Sissi Shen — Data Scientist Intern at Atlassian

Q: “Who is your company mentor (name, title, team, etc.)? How often do you connect and what kind of help do they provide?”

A: “We have two mentors: Steve Shibuya, Principal Data Scientist, and Nathan Murstein, Associate Data Scientist in the Decision Science team. We meet with them once a week on Monday, and occasionally on Wednesday for a code review. In our Monday meetings, we usually reflect on the work from the previous week and set the agenda for the following week. Steve oversees our progress, and Nathan helps us on a more detailed level.”

Q: “Can you describe the project(s) you worked on so far?”

A: “The Decision Science team has been working on analyzing customer behaviors and has built an econometric model to predict a customer’s propensity to purchase a license for Atlassian products after signing up for a trial. Our main project so far is to build a scalable model validation framework specifically designed to test and facilitate the improvement of this Top of Funnel Quality (TOFU) model. To do it, we developed a model-agnostic Python package that streamlines the model validation process, and we’re now optimizing the MLOps production pipeline with MLflow for the TOFU model to incorporate this validation functionality. Since our package has contributed to an 80% reduction in computation cost and total man-hours for the team, we are also looking at encouraging the other machine learning and data science teams to incorporate our validation package into their pipelines.”
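For readers curious what "model-agnostic" validation can look like in practice, here is a minimal sketch. It is not Atlassian's package: the function names (`validate`, `accuracy`, `brier_score`) and the toy `trial_days_active` feature are hypothetical, chosen only for illustration. The idea is that the validator only needs a callable that returns a score, so any model can be plugged in.

```python
from typing import Callable, Dict, Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Share of examples whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return hits / len(y_true)


def brier_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean squared error between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


def validate(
    predict: Callable[[dict], float],
    rows: Sequence[dict],
    y_true: Sequence[int],
    metrics: Dict[str, Callable[[Sequence[int], Sequence[float]], float]],
) -> Dict[str, float]:
    """Score every row with `predict`, then compute each metric.

    `predict` can wrap any model (hypothetical interface): the validator
    never inspects the model itself, only its outputs.
    """
    if not rows or len(rows) != len(y_true):
        raise ValueError("rows and y_true must be non-empty and equal in length")
    y_score = [predict(row) for row in rows]
    return {name: metric(y_true, y_score) for name, metric in metrics.items()}


# Toy stand-in for a propensity model (made-up rule, not the TOFU model).
def toy_model(row: dict) -> float:
    return 0.9 if row["trial_days_active"] >= 5 else 0.1


rows = [{"trial_days_active": 7}, {"trial_days_active": 1}, {"trial_days_active": 6}]
labels = [1, 0, 0]
report = validate(toy_model, rows, labels, {"accuracy": accuracy, "brier": brier_score})
```

Because the metrics are passed in as a dictionary, a team could add its own checks without touching the validator.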

Q: “Has your practicum experience changed your career plan/outlook?”

A: “This valuable experience has allowed me to really get the feeling of how a Data Scientist works at a SaaS company, exposing me to the types of data they handle and their day-to-day activities. Analyzing customer behaviors within the Decision Science team has opened new doors for me in this field, and I am eager to pursue further opportunities that allow me to continue learning and growing in this area.”

Irene García Montoya — Senior Data Analytics Intern at Eventbrite

Q: “Can you describe the projects you worked on so far?”

A: “With the other intern from the program, we have developed a data pipeline and analysis system to find discrepancies between the fees in signed contracts and the actual fees that are being charged to clients. We have identified lost revenue of ~10M dollars. Our second project, which we are currently working on, is about early churn detection for key clients. We plan on implementing a supervised and an unsupervised model and seeing which performs better.”
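The contract-versus-charged comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not Eventbrite's pipeline: the `Charge` record, the idea of a single percentage rate per client, and all the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Charge:
    client_id: str
    amount: float       # transaction amount the fee applies to
    fee_charged: float  # fee that was actually billed


def find_fee_discrepancies(
    contract_rates: Dict[str, float],
    charges: List[Charge],
    tolerance: float = 0.01,
) -> List[Tuple[str, float, float, float]]:
    """Return (client_id, expected_fee, fee_charged, shortfall) for mismatches.

    A positive shortfall means the client was charged less than the
    contract specifies.
    """
    discrepancies = []
    for charge in charges:
        rate = contract_rates.get(charge.client_id)
        if rate is None:
            continue  # no contract on file for this client; nothing to compare
        expected = round(charge.amount * rate, 2)
        shortfall = round(expected - charge.fee_charged, 2)
        if abs(shortfall) > tolerance:
            discrepancies.append((charge.client_id, expected, charge.fee_charged, shortfall))
    return discrepancies


# Made-up sample data for illustration only.
contract_rates = {"a": 0.05, "b": 0.03}
charges = [
    Charge("a", 100.0, 5.0),  # matches the contract
    Charge("b", 200.0, 4.0),  # undercharged
    Charge("c", 50.0, 1.0),   # no contract on file
]
mismatches = find_fee_discrepancies(contract_rates, charges)
```

Summing the shortfall column over all flagged rows would give a rough estimate of the revenue at stake.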

Q: “What have you learned so far in the practicum?”

A: “I have greatly improved my business sense and skills, so I feel that I communicate better with fellow employees. I have also become proficient in using tools such as Snowflake, Tableau and Salesforce. Although our first project didn’t have many ML elements to it given we were hired as Data Engineer Interns, it taught me to adapt my classroom knowledge to real data.”

Q: “Has your practicum experience changed your career plan/outlook? If so, can you describe how?”

A: “My practicum has helped me solidify the feeling that I had about favoring Data Engineering over pure Machine Learning. It was one of the factors that pushed me to enroll in the DE extension of the MSDS program. I have also become more conscious of the value of data lineage, data consistency and data quality after having to consistently work with data that needed workarounds.”

Ian Duke — Data Scientist Intern at ACLU of Northern California

Q: “Can you describe the project(s) you worked on so far?”

A: “The majority of our practicum work supports a case related to racially discriminatory traffic stops. Through this case, the ACLU has collected a lot of body camera footage. Instead of having the office’s investigator review every single body camera video, we are building machine learning models employing techniques in natural language processing and computer vision to flag videos that warrant more extensive manual review. We have engineered a pipeline that automatically associates over 550 body camera videos with a corpus of more than 3,500 written police reports. This combined information has been used to train machine learning models to classify videos containing relevant events, such as searches. We estimate that our efforts to date have allowed the investigative team to review videos more than 358 times faster than manual review, which is pretty neat!”

Q: “How did you choose your practicum?”

A: “I actually knew I wanted to work with the ACLU when I applied to the MSDS program. Prior to attending University of San Francisco, I worked as an Investigative Specialist at the Public Defender Service for the District of Columbia. I decided to go to graduate school to learn everything I could about data science and its many applications to social justice movements. Working with the ACLU was such a natural fit! In fact, I came into this experience hoping to learn more about how to use applied statistics to advance civil rights efforts, and my experiences at ACLU have left me with an even stronger dedication to do just that.”

Q: “What are some new technical things (skills, knowledge) that you have learned?”

A: “I’ve learned quite a bit about techniques in computational linguistics — how to convert language to numbers in ways that capture the meaning of the words. I’ve really enjoyed exploring ways that we can translate human-interpretable language into something our computers can understand, and then using features like cosine similarity to strengthen classification models. I’ve also gained a lot of experience in communicating these ideas to legal professionals who haven’t had much exposure to machine learning.”
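As a rough illustration of "converting language to numbers" and using cosine similarity as a feature, the sketch below turns two sentences into word-count vectors and measures the angle between them. It assumes a simple bag-of-words representation; the actual project may well use learned embeddings instead, and the example sentences are invented.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a text to word counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Invented snippets standing in for a transcript line and a report line.
transcript = bag_of_words("officer searched the vehicle")
report = bag_of_words("the vehicle was searched")
similarity = cosine_similarity(transcript, report)
```

A score near 1 means the two texts use the same words in similar proportions, and a score of 0 means they share none. That single number can then be fed to a classifier alongside other features.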

Seneth Waterman — Data Scientist Intern at The Nature Conservancy

Q: “Can you describe the project(s) you worked on so far?”

A: “Groundwater is a critical yet diminishing natural resource in California, where aquifers supply about 40% of the state’s water in typical years, and even more during droughts. However, some regions are experiencing rapid aquifer depletion. In response, the California legislature passed the Sustainable Groundwater Management Act (SGMA) in 2014. This legislation established a framework for managing groundwater by requiring local agencies to form Groundwater Sustainability Agencies (GSAs) and develop comprehensive Groundwater Sustainability Plans (GSPs). These plans are vital for preventing groundwater overdraft and securing long-term water sustainability.

To support these efforts, The Nature Conservancy conducts an annual review of over 100 GSPs, offering detailed feedback to the state on how these plans accommodate the water management needs of natural environments and disadvantaged communities. Analyzing these extensive plans, each spanning hundreds of pages, is both time-consuming and expensive. To improve efficiency and cut costs, our team developed a large language model, called ChatGDE, specifically designed to automate the GSP review process. In its testing phase, it’s proven to have about 70% accuracy.”

Q: “What have you learned so far from your practicum experience?”

A: “Prior to starting the MSDS program, I was a high school math teacher. So, my calculus knowledge was sharp, but my technical skills were very limited! But I have learned so much over the course of the MSDS program and during my work with The Nature Conservancy, and I have seen other teachers who joined USFCA’s MSDS program go on to work for major corporations like Coinbase and CapGemini.

While working on ChatGDE, I’ve refined my understanding of large language models and learned how to develop a RAG model. My Python skills have significantly improved, along with my ability to work with APIs. This experience has provided a comprehensive insight into the entire ML pipeline, covering everything from data cleaning and processing to model creation and evaluation.

Additionally, working on the dashboard has provided valuable experience in creating an ETL pipeline. We used SQL queries embedded in API calls to retrieve data from the California Natural Resources Agency’s database, stored this data in AWS, and then transformed it into clean and interactive visualizations for a live, weekly updated dashboard.”

Q: “Has your practicum experience changed your career plan/outlook?”

A: “Working with The Nature Conservancy, I’ve seen firsthand the substantial and positive impact that data science can have on conservation efforts. This experience has inspired me to continue focusing on environmental issues. I am eager to apply the skills and insights gained at TNC to future projects, continuing to address challenges related to mitigating climate change and enhancing our natural resources.”

The MSDS program is looking for practicum partners for the 2024–2025 cohort! If you are interested, please contact Victor Palacios, Director of Data Science Partnerships (vpalacios@usfca.edu).

Jay Chung
USF-Data Science

Data + AI Product Manager. I'm passionate about ungatekeeping AI and write about AI for a non-technical audience.