Data I/O 2024 Recap

Ryan Lingo
Published in 99P Labs
11 min read · Feb 26, 2024

The Ohio State University’s Data Hackathon

Another weekend, another Ohio State University hackathon. This past weekend, I had the privilege of representing 99P Labs at Data I/O, The Ohio State University’s hackathon focused on data analysis and problem solving. Organized by the OSU Big Data & Analytics Association, the event gathered students passionate about leveraging data to develop innovative solutions.

As a mentor and judge, my role was to guide and evaluate the participants as they tackled complex challenges over the course of the event. I was consistently impressed with the creative solutions and interest in learning on display. The students’ enthusiasm for collaborating and creating social impact through data was infectious.

99P Labs was proud to sponsor this hackathon, which aligns closely with our mission of supporting the next generation of data scientists and changemakers. By providing resources and mentorship, we aim to empower students to harness the power of data for social good. Events like Data I/O are crucial touchpoints that allow us to engage with promising talent and build our community.

In this blog post, I’ll share my experiences from Data I/O, offering insights into the projects I encountered and the overall atmosphere of the event. Join me as we look back on a weekend filled with learning, innovation, and collaboration.

Event Overview

Data I/O brought together 92 participants, who made 26 team submissions. The event was structured to provide a mix of hands-on work and learning opportunities.

Activities began at 10:30 a.m. with the hacking phase where teams started to work on their data-driven projects. Shortly after, the first workshop at 10:45 a.m., ‘Hacking the Hackathon,’ provided practical tips for navigating the event.

Lunch at 1:00 p.m. offered a chance for attendees to take a break and discuss their projects with peers. The afternoon session featured a second workshop at 2:00 p.m., focusing on ‘Dashboarding in Python’, which was directly applicable to the tasks the teams were tackling.

The hacking phase ended at 5:30 p.m. (there are always uploading issues :) ), shifting focus to the judging process as teams prepared to present their results. The event concluded with the awards announcement at 6:30 p.m., where teams were recognized for their efforts throughout the day.

The schedule was thoughtfully designed to facilitate a productive day of coding, learning, and networking. BDAA did a great job, especially Suvan Dommeti!

A team hard at work

Keynote: Foundational Advice for Data Enthusiasts

In my keynote before the hackathon started, I had the chance to share some guiding principles with the participants — principles important for anyone looking to pursue a career in data.

I emphasized the critical nature of continuous learning by developing a daily habit. The data world is always changing, with new tools and methods coming out all the time. Staying up to date by improving your skills isn’t just recommended; it’s essential. By taking part in events like this hackathon, the participants were already taking important steps on this lifelong learning path.

Another key point was the importance of critical thinking. Unlike tidy academic problems, real-world data challenges are messy and complicated. Being able to break down a project, ask the right questions, and define a problem effectively is a core skill for data professionals.

I also talked about starting simple with a minimum viable product, or MVP. This approach focuses first on the core features before adding complexity. It’s about getting the basics right and then gradually building more advanced solutions.

Finally, I stressed the importance of seeing a project to the end. The goal of a hackathon goes beyond finishing a project; it’s about practicing how to package and present your work to an audience. I encouraged the attendees to see their final presentations not as a formality, but as valuable practice in storytelling and communication — key skills for any data professional.

With these thoughts, I aimed to set a tone for the day that was as much about developing technical skills as personal and professional growth.

The Challenge

This year’s Data I/O hackathon prompt centered around a practical and socially impactful problem set by a hypothetical client, Cycle, a bike-sharing company based in Chicago. The challenge? To delve deep into the company’s open-source dataset and conduct an analysis of bike trip data collected over the past year.

Participants were tasked to employ numerous techniques, including data cleaning, exploratory analysis, and visualization. The goal was to find actionable insights that could steer strategic decision-making and drive innovation within the bike-sharing domain.

The deliverables were clear-cut: each team had to submit their source code, a well-crafted presentation deck, and a concise 3-minute video that articulated their findings and recommendations. These elements were not only meant to assess the technical proficiency of the participants but also their ability to communicate complex data stories effectively.

Highlighting the Top Five: A Closer Look at Data I/O’s Winning Projects

In this section, we dive into the standout projects from this year’s Data I/O. With 26 submissions in total, each entry showcased a high level of skill and innovation, making them excellent examples for anyone looking to enrich their resume and portfolio. Despite the strong competition, we narrowed down the field to five winners. Below, we’ll explore each of these top projects in detail, shedding light on what set them apart.

Best Insight

Best Insight Winners

Team Yeetcode delivered standout insights on their bike rental pricing model project, aiming to maximize revenue. The team analyzed membership data and bike usage patterns, distinguishing between members and casual riders across classic, docked, and electric bikes. They concentrated their efforts on understanding ride duration patterns, discovering that most rides fell within the 0 to 50-minute range, with the highest frequency of rides occurring between 5 and 10 minutes. This analysis was grounded in a comprehensive examination of a random sample of compiled data, revealing an average ride duration of 15 minutes.

Insights from Team Yeetcode revealed significant behavioral differences between casual riders and members, with casual riders’ trips lasting 2.4 times as long as members’ trips on average. This critical finding, alongside a competitive analysis of pricing strategies by other bike rental services, informed their innovative pricing model. By strategically setting lower unlock prices and higher per-minute rates, and considering casual riders’ longer usage patterns, they crafted a membership pricing scheme that was competitive yet advantageous. This approach not only differentiated their service but also targeted maximizing revenue through tailored pricing that reflects actual user behavior and market conditions.
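To make the pricing trade-off concrete, here is a minimal sketch of how an unlock fee plus a per-minute rate turns ride duration into revenue. Only the 2.4x casual-to-member duration ratio comes from the team's analysis; the `ride_price` helper, the 10-minute member ride, and both price plans are hypothetical numbers chosen for illustration, not Team Yeetcode's actual model.

```python
def ride_price(minutes: float, unlock_fee: float, per_minute: float) -> float:
    """Price of one ride under an unlock-fee-plus-per-minute scheme."""
    return unlock_fee + per_minute * minutes

# Hypothetical durations: only the 2.4x ratio comes from the team's finding.
member_minutes = 10.0
casual_minutes = member_minutes * 2.4

# Two hypothetical price plans (illustrative numbers, not the team's).
plans = {
    "high unlock, low rate": {"unlock_fee": 3.00, "per_minute": 0.10},
    "low unlock, high rate": {"unlock_fee": 1.00, "per_minute": 0.25},
}

for name, plan in plans.items():
    member = ride_price(member_minutes, **plan)
    casual = ride_price(casual_minutes, **plan)
    print(f"{name}: member ride ${member:.2f}, casual ride ${casual:.2f}")
```

Under these made-up numbers, the low-unlock plan is cheaper for a short ride but earns more from a long casual ride, which is the general shape of the trade-off the team described.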

Team Yeetcode’s Presentation

Best Visualization

Best Visualization Winners

Team Janardhan won Best Visualization. Their project featured a series of plots and charts that illuminate the patterns and trends within the dataset. Their work included plotting the total number of bike trips throughout the year, identifying a significant uptick in rides during the summer months, particularly in June and July, followed by fall. This seasonal trend was vividly captured in a pie chart, revealing that these periods combined account for more than 70% of the annual biking activity. Additionally, their analysis extended to comparing ride frequencies between weekdays and weekends, uncovering that weekdays experienced a higher volume of rides, largely due to commuting patterns, while weekends saw a balance between member and casual riders.

Digging deeper, Team Janardhan’s insights shed light on the habits and preferences of the biking community. They discovered that members typically ride for an average of 12.34 minutes, suggesting shorter, possibly routine commutes, whereas casual riders, potentially engaging in the activity for leisure, spent more time per ride. The team also pinpointed the most popular stations and routes, highlighting Streeter Drive, Grand Avenue, and areas around Northwestern University and the University of Chicago as focal points of biking activity. This specificity not only emphasizes the role of students in the biking ecosystem but also guides infrastructure and service improvements. Their creative use of heat maps further illustrated peak riding times, aligning with the end of office hours. Additionally, the revelation of 6,000 stolen bikes within the dataset points towards a critical area for security enhancements, underlining the team’s contribution to both understanding and potentially improving the biking landscape.

Team Janardhan’s Presentation

3rd Place: Muk

3rd Place Winners

Team Muk, securing the third-place finish, took on a project to tackle bike theft. They began by analyzing usage patterns, identifying peak hours with the majority of bikes in use around 5:00 p.m. Their investigation revealed a significant issue: 6,000 bikes were missing, primarily in Southside Chicago. They pinpointed the times when bikes were most frequently disappearing, notably around 4:00 a.m. and during rush hour. Their analysis extended to the distribution of bike types, noting 15,000 electric, 13,000 classic, and approximately 1,000 docked bikes. A crucial finding was that no electric bikes were stolen, attributed to their GPS systems, while classic and docked bikes were the primary targets.

From their data, Team Muk discovered that the majority of users were members, yet a substantial proportion of casual riders were linked to the missing or stolen bikes. This insight led to the calculation of potential losses, with 6,000 missing bikes averaging $500 each, resulting in a direct loss of $3 million. Considering usage rates and profit margins, the additional lost revenue was estimated at around $820,000. Team Muk proposed a solution to mitigate this issue: leveraging the data to establish a baseline and imposing a penalty on casual riders who fail to return classic or docked bikes, charging them 50% of the bike’s price as collateral. This strategy aimed to address the disproportionate theft and loss associated with casual riders, particularly in high-risk areas like Southside Chicago.
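The loss figures above are simple enough to check in a few lines. This sketch uses only the numbers quoted in the presentation; the variable names are mine, and summing the direct loss with the lost-revenue estimate is my reading of "additional", not a total the team reported.

```python
# Figures as quoted in Team Muk's presentation.
missing_bikes = 6_000
avg_bike_price = 500            # dollars per bike
lost_revenue_estimate = 820_000 # dollars, the team's own estimate

# Direct replacement cost of the missing bikes.
direct_loss = missing_bikes * avg_bike_price

# Proposed collateral: 50% of the bike's price per unreturned bike.
collateral = 0.5 * avg_bike_price

# Assumption: the lost revenue is added on top of the direct loss.
total_impact = direct_loss + lost_revenue_estimate

print(f"Direct loss: ${direct_loss:,}")
print(f"Collateral per unreturned bike: ${collateral:,.2f}")
print(f"Direct loss plus lost revenue: ${total_impact:,}")
```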

Team Muk’s Presentation

2nd Place: Small P Values

2nd Place Winners

The second-place finishers, Team Small P Value, created an impressive analysis of the Chicago Bike Share system. They began by generating a heat map to visualize the distribution of bike rides, revealing a concentration in downtown and near the University of Chicago in Hyde Park. The team then cleaned the dataset, removing entries without return information and correcting data anomalies, such as negative or overly long ride durations. They employed a strategy of data imputation for missing variables, ensuring a robust dataset for analysis. Their investigation uncovered patterns in rental activity, notably identifying peak hours between 6:00 a.m. and 9:00 p.m., with the highest frequency around 5:00 p.m. The analysis did not initially distinguish between weekdays and weekends, but further exploration revealed consistent rental patterns throughout the week, with members primarily renting on weekdays and casual riders on weekends.
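For readers who have not done this kind of cleaning before, here is a rough sketch of the filtering step using only the Python standard library. The field names, the sample records, and the 24-hour cutoff are all assumptions for illustration. The team also corrected some anomalies and imputed missing values, which this simplified version does not attempt: it just drops the bad rows.

```python
from datetime import datetime, timedelta

# Hypothetical trip records; field names are assumptions, not the dataset's schema.
trips = [
    {"started_at": "2023-07-01 08:00:00", "ended_at": "2023-07-01 08:12:00"},
    {"started_at": "2023-07-01 09:00:00", "ended_at": None},                   # no return info
    {"started_at": "2023-07-01 10:00:00", "ended_at": "2023-07-01 09:50:00"},  # negative duration
    {"started_at": "2023-07-01 11:00:00", "ended_at": "2023-07-03 11:00:00"},  # overly long
]

FMT = "%Y-%m-%d %H:%M:%S"
MAX_DURATION = timedelta(hours=24)  # assumed cutoff for "overly long"

def clean(records):
    """Keep trips that have return info and a duration in (0, MAX_DURATION]."""
    kept = []
    for record in records:
        if not record["ended_at"]:
            continue  # bike was never recorded as returned
        start = datetime.strptime(record["started_at"], FMT)
        end = datetime.strptime(record["ended_at"], FMT)
        duration = end - start
        if timedelta(0) < duration <= MAX_DURATION:
            kept.append({**record, "minutes": duration.total_seconds() / 60})
    return kept

cleaned = clean(trips)
print(f"Kept {len(cleaned)} of {len(trips)} trips")
```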

Through their analytical efforts, Small P Value developed a generalized linear model (GLM) to delve deeper into the factors influencing ride durations, incorporating day of the week and membership status as predictors. This model highlighted significant predictors with P values less than 0.001, underscoring the different usage patterns between members and casual riders, especially during commuting hours. Their insights informed several business strategies, including focusing on retaining current members who account for 60% of all rides, and introducing targeted recreational plans to attract casual riders towards membership. They also suggested a tailored plan offering two 45-minute rental periods on weekdays for commuters, based on the finding that 45 minutes represented the 95th percentile of ride durations. The team acknowledged limitations in their dataset and proposed improvements, such as tracking individual user and bike IDs to enhance service offerings and address infrastructure challenges like dock overflow, highlighting a nuanced understanding of operational efficiencies and customer engagement strategies.
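The 45-minute figure rests on a percentile calculation, which is easy to reproduce. Below is a minimal sketch using the standard library's `statistics.quantiles`; the sample durations are invented for illustration and are not the hackathon data, and the `p95` helper name is my own. It does not attempt the team's GLM.

```python
import statistics

def p95(durations):
    """95th percentile: the last of the 19 cut points that split the data into 20 parts."""
    return statistics.quantiles(durations, n=20, method="inclusive")[-1]

# Hypothetical ride durations in minutes (not the hackathon data).
sample = [4, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 18, 20, 22, 25, 30, 38, 60]

cap = p95(sample)
# Share of rides that would fit inside a rental period of that length.
covered = sum(d <= cap for d in sample) / len(sample)

print(f"95th percentile: {cap:.1f} min")
print(f"Share of rides within it: {covered:.0%}")
```

Setting a rental period at the 95th percentile means nearly every ride fits inside it, which is presumably why the team chose that threshold for a commuter plan.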

Team Small P Value’s Presentation

1st Place: Byte Busters

1st Place Winners

Now for the first-place project: Byte Busters tackled the challenge of optimizing bike distribution within a bike-sharing system. By analyzing the provided bike-sharing data, their goal was to enhance resource efficiency through strategic bike relocations to meet demand patterns. Through a series of exploratory analyses, including histograms of start times and density maps of bike usage, they identified key temporal and spatial trends in bike sharing. These analyses helped them understand when and where bikes were most needed, focusing on seasonal and hourly variations in bike usage. Their approach was data-driven, utilizing visual tools like bubble maps and animations to illustrate the dynamics of bike movement across different times and areas within the city.

The insights gleaned from their analysis revealed several important trends: there was a clear seasonal variation in bike usage, with different patterns emerging in January and July, attributed to factors like the school season. Time-based analysis further indicated peak usage times, suggesting optimal periods for bike relocation. Specifically, they observed a general movement of bikes towards the city center, with certain areas experiencing balanced bike inflow and outflow, thus not requiring relocations. The project concluded with actionable recommendations, such as the need to relocate bikes from downtown to suburbs around specific times to address the increased demand during peak hours. This strategic approach aimed to ensure that bikes were available where and when users needed them most, thereby improving the efficiency and effectiveness of the bike-sharing system.
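The idea of balanced inflow and outflow can be sketched as a net-flow count per station: arrivals minus departures. This is an illustration of the general technique, not Byte Busters' code; the station names and trips are made up.

```python
from collections import Counter

# Hypothetical trips as (start_station, end_station) pairs; names are made up.
trips = [
    ("Suburb A", "Downtown"),
    ("Suburb A", "Downtown"),
    ("Suburb B", "Downtown"),
    ("Downtown", "Suburb B"),
    ("Suburb B", "Suburb B"),
]

def net_flow(trips):
    """Arrivals minus departures per station; positive means bikes pile up there."""
    flow = Counter()
    for start, end in trips:
        flow[start] -= 1  # a bike leaves the start station
        flow[end] += 1    # and arrives at the end station
    return dict(flow)

flow = net_flow(trips)
for station, net in sorted(flow.items(), key=lambda item: item[1]):
    if net > 0:
        status = "surplus, candidate to relocate bikes from"
    elif net < 0:
        status = "deficit, candidate to relocate bikes to"
    else:
        status = "balanced, no relocation needed"
    print(f"{station}: {net:+d} ({status})")
```

Computing this per hour or per season, rather than over the whole dataset, would show when each station tips into surplus or deficit.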

Team Byte Busters’ Presentation

Conclusion

Data I/O demonstrated the immense potential of data science to drive impact. Through dedication and teamwork, participants showcased how analytics can uncover insights to inform business strategy and civic progress. The projects highlighted innovative applications of data across domains like transportation, pricing, infrastructure, and theft prevention.

Equally valuable were the connections and conversations cultivated at Data I/O. The collaborative atmosphere nurtured learning and growth, allowing students to share knowledge and motivate one another. Mentorship played a pivotal role as well, enabling experienced data scientists like myself to impart guidance to promising talent.

At its core, Data I/O embodied a spirit of community building around data for social good. The participants’ drive to gain hands-on skills and develop practical solutions revealed a generation eager to advance society responsibly. Events like this catalyze the future of data science by convening diverse perspectives under a common purpose.

The ingenuity on display inspires optimism about where data-driven innovation can lead. But realizing its full potential requires ongoing collaboration across sectors. If you found this glimpse into the hackathon rewarding, I welcome you to stay engaged with our community. Consider subscribing to the 99P Labs blog, connecting on LinkedIn, or reaching out to explore partnership opportunities.

Through open exchange of ideas, we can direct data science’s immense capabilities toward the human experiences that matter most. Technological hype often loses sight of real needs. Working together, we can ensure analytics improves lives. There are always more perspectives to incorporate and connections to make. 99P Labs looks forward to continuing the conversation with you.
