My first Data Science Internship

Ahmed Jouda
Published in DataSoc
Sep 13, 2020

Data science is widely applied in the sports industry. Predictive models, including machine learning and AI, are built from previous data to help maximise performance and achieve the best results under the given conditions.

Last week I completed my summer internship as a data science intern at the Insight Centre for Data Analytics. I designed and built an app that gives training recommendations to marathon runners based on data from the 2016 Dublin Marathon. My tasks included cleaning the data, finding the best software for building the app, and manipulating the data and displaying it in the most user-friendly way. I used Python and Streamlit to build the app and handle the data.

App Home page

Going into this internship with limited Python knowledge and never having heard of Streamlit was quite scary and challenging. However, once I got over the fear factor, I picked up the tools quite easily as I worked. I also learnt how to follow a proper app-building process, starting with a detailed app design, and how to compare software tools in order to choose the most appropriate one.

Working from home was new to all of us, and it taught me the importance of having colleagues working around you in the office. Problems with my work were harder to deal with by myself, locked up in my room. However, I received tremendous support from my supervisor, Dr. Aonghus Lawlor, which allowed me to keep going in difficult times.

I started with a huge data set of over 100,000 rows and 20 columns. After 10 weeks of work, the data was cleaned, a software tool was chosen after a comparison, and an app was built. The app asks the user for their marathon goal time and analyses their training pattern. The user’s training data is graphed, then compared with the training data of runners who achieved the specified goal time and had a similar training pattern to the user, based on the number of runs and the gap between runs. The key part of the program is the K-nearest-neighbours algorithm, which finds the “similar runners”. It is a simple, easy-to-implement supervised machine learning algorithm.
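
To make the "similar runners" idea a little more concrete, here is a minimal sketch of a nearest-neighbour search in plain Python. It is not the code from the app: the Runner fields, the similar_runners function, the 10-minute tolerance around the goal time, the range-based scaling and the sample data are all assumptions made up for illustration. It only uses the two features mentioned above, the number of runs and the gap between runs.

```python
import math
from dataclasses import dataclass


@dataclass
class Runner:
    # Hypothetical record layout, not the real Dublin Marathon columns
    runner_id: str
    num_runs: int          # training runs logged before the race
    avg_gap_days: float    # mean number of days between consecutive runs
    finish_minutes: float  # marathon finish time in minutes


def similar_runners(user_runs, user_gap, goal_minutes, runners, k=3, tolerance=10.0):
    """Return up to k runners with the closest training pattern to the user,
    considering only runners who finished within `tolerance` minutes of the goal."""
    candidates = [r for r in runners
                  if abs(r.finish_minutes - goal_minutes) <= tolerance]
    if not candidates:
        return []

    # Scale each feature by its range so run counts do not swamp the gap feature
    def span(values):
        width = max(values) - min(values)
        return width if width > 0 else 1.0

    runs_span = span([r.num_runs for r in candidates])
    gap_span = span([r.avg_gap_days for r in candidates])

    def distance(r):
        # Euclidean distance in the scaled (runs, gap) space
        return math.hypot((r.num_runs - user_runs) / runs_span,
                          (r.avg_gap_days - user_gap) / gap_span)

    return sorted(candidates, key=distance)[:k]


# Made-up sample data for illustration only
runners = [
    Runner("A", 60, 2.0, 238.0),
    Runner("B", 95, 1.2, 242.0),
    Runner("C", 58, 2.2, 245.0),
    Runner("D", 62, 1.9, 300.0),  # similar training, but far from the goal time
    Runner("E", 30, 4.0, 236.0),
]

# A user with 60 runs, 2.1 days between runs, aiming for a 4-hour marathon
matches = similar_runners(60, 2.1, 240.0, runners, k=2)
print([r.runner_id for r in matches])
```

Runner D is dropped before the search because their finish time is far from the goal, even though their training looks like the user's. A library such as scikit-learn offers ready-made nearest-neighbour tools, but writing the distance out by hand shows how little is needed for the core idea.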

Runner inputs their goal time
Similar runners comparison

The first thing I did was manipulate the data in a Jupyter Notebook to understand it fully. I then compared the candidate tools, Streamlit and Dash, by building trial applications with each, and ended up choosing Streamlit. A basic app with very few features was completed first in a Jupyter Notebook, then built in Streamlit. Once that ran, I worked on the final application, again starting in a Jupyter Notebook before developing it in Streamlit. All of my research and trials are uploaded to my GitHub repository (link below).

Completing this internship in Machine Learning and Data Science has allowed me to explore these fields in a work environment and made me realise that I would love to pursue a career in them. It has taught me not only technical skills but also how to conduct comparative research and work in a company setting. For once, I am not dreading my coding modules next semester!

Link to the GitHub repository: https://github.com/AhmedJouda2000/Marathon-App

Ahmed Jouda, writer for DataSoc

Consultant Developer @Guidewire Software || UCD Computer & Data Science Graduate || DataSoc || Enactus