DSC Project Exposition 2023 Recap

Nurturing the Diversity of Data Science


How it all Began…

From early October to late November, NYU’s Data Science Club ran a series of workshops in which invited speakers mentored students in the essential skills they would need to start their own data science projects. The workshops led to students building unique projects, which they presented at NYU’s Data Science Club Project Exposition on December 1st.

The series started off with a Project Development workshop led by Jason Sheu from XiFin that focused on identifying what problems exist within participants’ fields of interest and what type of project (prediction, classification, or data analysis) would best address them. Participants were able to focus on the holistic development of projects so that they could apply these concepts to a wide range of data.

Data Science Students at the Project Development Workshop

The level of excitement demonstrated in the previous workshop was also seen at the “How to Process Data” workshop with Aditya Singhal, the founder of the NYU Data Science Club and current SWE at Amazon Health. Aditya led an interactive session using a Reddit dataset from Kaggle, performing beginner-level data cleaning techniques to help participants begin their data science projects.

Following this workshop, the Data Science Club hosted the GitHelp Workshop with Yucen (Lily) Li, an ML PhD student at NYU and former SWE at Meta. Here, participants learned about tips for data visualization. Throughout the workshop, participants used R, Python, and SQL to work on identifying problem statements like predicting housing prices, classifying email messages, and analyzing climate-related data to discover patterns.

Many of those who attended the workshops intended to apply the skills they learned, from GitHub logistics to cleaning data, to their own projects! Most who attended the first workshop started with only ideas, and some didn’t even have that. However, workshop after workshop, students built the foundational skills needed to bring these ideas to life.

Data Science Students at the GitHelp Workshop

Participants meticulously selected their dataset, preprocessed it, and created their visualizations. By choosing either the data visualization or machine learning route (or both), each group utilized a unique method to present their solution to a contemporary problem.

On the day of the project expo, four industry professionals judged the projects while the participants presented the findings of their months-long effort: Ren Liu, an NYU math graduate with a master’s in data analytics from UChicago and 4 years of industry experience; Alex Kass, a Yale cognitive science major with a master’s in statistics from Columbia and the director of data science for MongoDB; Sophia Tee, a Northwestern math undergraduate with a master’s in statistics from Yale and 10 years of experience with companies like Verizon and Samsung; and Will David, who has a computer science degree from Johns Hopkins and a master’s in engineering, with 3 years of experience at companies like JPM Chase and Microsoft.

Taking a Closer Look at Participants and Projects

A total of 9 projects were submitted, and judges chose the top 2 projects in the batch. Participants were given 7 minutes to present their analysis and findings, followed by a 3-minute round of questioning from the judges.

The final day of the Project Expo series, the NYU Data Science Club Project Exposition

The event opened with the four judges attentively taking notes as seven teams took center stage to present their analyses, complete with compelling graphs and insightful inferences. The atmosphere was electric as attendees eagerly awaited the unveiling of these data-powered solutions. Alex Frzysucha, who presented a project on French manuscript translation, said, “This is more than just a project presentation; it’s my opportunity to demonstrate how data can breathe new life into historical texts.” The audience was treated to a diverse array of projects spanning a wide range of domains, from plastic waste management to French literature. Here are the details of the projects that made an unforgettable impression at the Project Expo 2023:

1. EloBAR: Predicting Live Win Probabilities — Jihwang Sung

EloBAR aims to predict live win probabilities for sports events, particularly focusing on the Korean Volleyball League. It combines ELO and BAR rating algorithms to calculate win probabilities based on the sets won by each team. The fusion of these two ratings provides a more comprehensive outlook on the game’s outcome, considering both live game scores and player statistics, and produces impressive results in predicting the winning team in a game.
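For readers unfamiliar with Elo ratings, here is a minimal sketch of the standard Elo formulas only. It is not the EloBAR implementation: the article does not describe how the BAR component or live set scores are folded in, so those are left out. The function names, the 400-point scale, and the K-factor of 20 are illustrative assumptions.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Win probability of team A against team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A won and 0.0 if A lost.
    k (the K-factor) is an assumed value, not one taken from the project.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical teams with made-up ratings.
    print(elo_expected(1600, 1400))
    print(elo_update(1500, 1500, score_a=1.0))
```

The update is zero-sum: whatever rating one team gains, the other loses, so an upset by a lower-rated team moves both ratings more than an expected result does.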

2. Plastic Waste Analysis — Edward Wu and Vincent Qin

This project examined plastic waste data from the years 2010 and 2019, using GDP per capita and population density as features to analyze plastic waste contribution across countries. The analysis compared the change in plastic waste trends between the two years using various economic factors. A multiple regression analysis improved accuracy to 75% and revealed a positive correlation between a country’s GDP and its plastic waste generation in both 2010 and 2019.
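As a rough illustration of what a two-feature multiple regression involves, the sketch below fits ordinary least squares in plain Python and reports R². This is an assumption-laden example, not the team's code: the numbers are invented toy values rather than real plastic waste data, the helper names are hypothetical, and the article does not say how the 75% figure was measured, so R² is used here only as one common goodness-of-fit measure.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations.

    X is a list of feature rows; returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def r_squared(X, y, coef):
    """Share of variance in y explained by the fitted model."""
    preds = [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy rows of [gdp_per_capita, population_density]; values are made up.
    features = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
    waste = [3.2, 6.8, 6.1, 11.3, 8.7]
    coef = fit_linear(features, waste)
    print(coef, r_squared(features, waste, coef))
```

A positive fitted coefficient on the first feature would correspond to the kind of positive GDP relationship the project reported.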

Edward Wu and Vincent Qiu presenting “Plastic Waste Around the World” (left). Jihwang Sung presenting “EloBAR: Predicting Live Win Probabilities” (right).

3. Predicting Musical Trends with Spotify Data — Annie Zhang and Krystal Wu

This project explores the prediction of future music trends using Spotify data. The authors conducted data cleaning and made use of features like acoustic quality and valence, and highlighted trends popular within the music community. Their work is an exploratory data analysis (EDA), and they plan to delve into time series analysis in future iterations.

4. Analyzing Professor Feedback — Ibrahim Sheikh

This project focuses on analyzing three years’ worth of weekly professor feedback data. The data was collected and preprocessed using web scraping and regex, and the author employed natural language processing (NLP) techniques to predict whether students would find the next week of class confusing. His SVM model achieved an accuracy of 83.75%, but the author expects that transformer models would yield better text analysis.

5. Training AI Agents with Personalities — Zekai Zhang and Chan Hyun Yoo

This project aims to fine-tune AI agents around specific personalities by interacting with them in hypothetical scenarios. Each response generated by the agent becomes a data point, contributing to the development of its personality. Although there were challenges in UX and training, the project demonstrates the potential for creating AI versions of individuals.

6. Society Viewed Through Popular Music — Samuel Lee

This project analyzed the connection between music and society. Using data from Spotify and the Genius Lyrics API, the author conducted data cleaning and lemmatization, revealing recurring themes in music, such as love. The author intends to continue this analysis by broadening the scope of his work beyond the U.S. and implementing Natural Language Processing (NLP) techniques.
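To make the lemmatization step concrete, here is a small sketch of how reducing words to a base form lets recurring themes surface in a word count. It is only an illustration under stated assumptions: the lyric snippets are invented, and the tiny `LEMMAS` lookup and `STOPWORDS` set are hypothetical stand-ins for a real lemmatizer and stopword list, which the article does not name.

```python
import re
from collections import Counter

# Hypothetical, hand-written lemma table standing in for a real lemmatizer.
LEMMAS = {
    "loved": "love",
    "loving": "love",
    "loves": "love",
    "hearts": "heart",
    "nights": "night",
}

# Small illustrative stopword set; a real analysis would use a fuller list.
STOPWORDS = {"i", "you", "the", "a", "my", "and", "in", "all", "is", "me", "your"}


def lemmatize(token: str) -> str:
    """Map a token to its base form, falling back to the token itself."""
    return LEMMAS.get(token, token)


def theme_counts(lyrics):
    """Count lemmatized, non-stopword tokens across a list of lyric strings."""
    counts = Counter()
    for text in lyrics:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(lemmatize(t) for t in tokens if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    # Invented snippets, not real song lyrics.
    sample = [
        "I loved you all night",
        "Loving you is in my heart",
        "She loves the nights",
    ]
    print(theme_counts(sample).most_common(3))
```

Without lemmatization, "loved", "loving", and "loves" would be counted as three separate words and the shared theme would be harder to see.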

Annie Zhang and Krystal Wu presenting “Predicting Musical Trends with Spotify Data” (left). “Training AI Agents with Personalities” presented by Zekai Zhang and Chan Hyun Yoo (middle). “Society Viewed Through Popular Music” presented by Samuel Lee (right).

7. Manuscript Translation using Computer Vision — Alexandra Frzysucha

Alex’s project seeks to translate French manuscripts using Computer Vision (CV). Current progress involves collecting scanned images of manuscripts and employing Optical Character Recognition (OCR) to convert text into strings. The next phase will incorporate Convolutional Neural Networks (CNNs) to further improve translation. Future plans include extending the project to translating manuscripts of other languages.

At the Project Expo 2023, two standout projects left a lasting impression on both the judges and the audience — the French Manuscript Translation project and the Class Feedback analysis. These projects not only showcased remarkable technical expertise but also the ability to effectively communicate complex concepts.

The French Manuscript Translation project, spearheaded by Alexandra, drew admiration for its clear objective. What truly captivated the audience was her phenomenal awareness and exceptional presentation skills, making the project both informative and engaging.

On the other hand, the Class Feedback analysis project, presented by Ibrahim, left judges impressed with its comprehensive approach. The project encompassed a well-structured presentation that covered data collection, analysis, model development, and prediction results in a logical order.

Alexandra Frzysucha presenting “Manuscript Translation using Computer Vision” (right). Ibrahim Sheikh presenting “Analyzing Professor Feedback” (left).

The Judges’ Corner: Remarks and Feedback

At the end of the expo, the judges shared their final thoughts, with an emphasis on clarity in defining a project’s results and purpose. They encouraged participants to reflect deeply on their project choices, urging them to consider the real-world applicability of their findings. Another notable expectation from the judges was the creation of an end product from the data analyses. Be it a unique application, a product, or even a piece of algorithmically generated music, the final product serves as a tangible testament to the practical utility and innovative spirit inherent in data science.

Transparency about methodologies and assumptions in data processing was another critical point of feedback. The judges stressed the importance of acknowledging any limitations or caveats in the approach. This transparency not only enhances the project’s credibility but also provides an educational component, offering insights for others in the field.

While many presentations at the expo were lauded for their design, the judges highlighted that the delivery of these presentations is just as vital. Effective storytelling in data science is about more than just conveying information; it’s about engaging the audience and making complex concepts understandable and relatable. This skill elevates a project from merely informative to genuinely engaging, connecting the audience with the data on a more profound level.

Now, let’s pivot for a moment to reflect on the versatile nature of data science, as shared by one of the judges at the Project Expo, Alex Kass. In a candid interview, Kass described his remarkable journey, highlighting how data science has become an adaptable and widely applicable field. From prospective doctor to musician, Kass pursued various career paths before ultimately finding his passion in data science. Reflecting on his journey, Kass expressed astonishment at how adaptable data science has been across all of his different jobs: “You can apply it anywhere and in any context” [1].

Judges Panel: Alex Kass (the first person on the left), Sophia Tee (second), Ren Liu (third), and Will David (fourth)

Like many other data scientists, Kass began his academic journey in an entirely different discipline: cognitive science. He got his first taste of computer programming when taking a computer science class on building neural networks, generating syntax, and constructing rules around multiple languages. Kass continued his education at NYU by pursuing a master’s degree in music, aiming to find a career that would lie at the intersection of all of his interests. During the next six years, he gradually realized the importance of analyzing data in producing and managing music technology and therefore decided to pursue a master’s degree in statistics. As his passion for data analytics grew, he started building machine learning models for multiple companies across various industries, such as commercial real estate and music technology. No matter where he works, he always finds data analytics skills at the heart of problem solving in businesses. “You never run out of things to learn as a data scientist” [1], Kass commented.

After a long journey, Kass has found his passion in solving technological problems in business by applying machine learning. Given his multifaceted background, Kass placed heavy emphasis on the practicality of the projects that students displayed at the Project Expo. In particular, he focused on how useful a project was in providing solutions to, or deepening our understanding of, real-world problems. Another aspect of the evaluation was whether students used storytelling techniques to get the audience excited about the story they were conveying. Overall, the judges agreed that students showed outstanding enthusiasm and creativity in their projects and successfully demonstrated data interpretation and visualization skills to address real-world issues.

Kass’ experience serves as a mirror for students to reflect on their own journeys. With their varied backgrounds and academic interests, participants found resonance in Kass’ story, as they themselves are still striving to craft their own paths. What they know for certain is that data science matters in any intellectual endeavor they take on. From multilingual translation and environmental waste calculation to music generation and financial analysis, data plays a critical role in every single one of the students’ diverse interests.

All participants and judges

We like to call ourselves data scholars, as we seek to use data to tell stories about the real world. Philosophers usually say that we improve the world not by solving the problems but by understanding them better. The project expo served as an intellectual gathering where data scholars and professionals could connect and produce better interpretations of the world with data. The past two months have shown us the amazing possibilities of data science. Not only did we end the year with a bang by celebrating the hard work and enthusiasm of the participants, but we also experienced the thrill of watching how data deepens our knowledge of real-world problems and brings new ideas to life.

Article written by Connor, Kathy Tran Anh Ngan and Joshua Alfred Jayapal

References

[1] K. Tran (interviewer), interview with Alex Kass, December 1, 2023.

Data Science Club @ NYU (Center for Data Science)
NYU Data Science Review

With one of the best data science programs in the nation, DSC@NYU aims to foster a strong community with students who span across multiple disciplines.