Fueling Student Success

Tenicka Terell Norwood
6 min read · Jul 24, 2023

The Power of Machine Learning Pipelines

Photo by Christophe Dion on Unsplash

An average person with only a high school diploma or GED would earn an estimated 1.6 million dollars in their lifetime, compared to a person with a bachelor’s degree who potentially can make ~2.8 million dollars in their lifetime. — Carnevale et al. (2021) Georgetown University The College Payoff

This profound disparity in earning potential underscores the transformative power of higher education and highlights the significance of timely graduation in enhancing students’ long-term financial security.

· Choosing a meaningful project
· Leveraging the OSEMN Pipeline
· Attempting to stay DRY
· Pay attention to your AUDIENCE
    · Insight 1: Curricular Units in 2nd Semester Matter
    · Insight 2: Monitoring Tuition Fee Payments is Pivotal
    · Insight 3: Evaluations and Grades are Crucial
· What are my next steps?
· Contact Me

Choosing a meaningful project

As I progress in my data journey, one of the things that I have learned is that domain knowledge and genuine interest MATTER. Continuously reworking projects that a multitude of folks have completed simply will not help you build your skillset. Along these lines, here are some of the practices that I have collected along the way:

  • Simple is best. I am a true believer. Every language has its own conventions for style and structure that make code written in it readable. For my Python users out there, I currently follow PEP 8, a style guide whose conventions enhance readability and collaboration.
  • Stay DRY. The Don’t Repeat Yourself principle saves me from the exhausting practice of rewriting large code blocks that NO ONE will read, use, or adapt.
  • Pay Attention to Your Audience. Figure out what matters to your audience. Then find a way to show them things they care about (so what), and what to do with the insights you shared (now what).

I have often leaned into my education domain knowledge. So when I started a project on using machine learning for classification, I chose a dataset related to something near and dear to my heart: academic success. The University of California at Irvine hosts lots of datasets here that you can use to hone your machine learning knowledge. They are a bit squeaky clean and yet rather useful, and I highly recommend them.

For this project:

  • Stakeholder: Instituto Politecnico de Portalegre
  • Business Case: Researchers at the Instituto Politecnico de Portalegre want to reduce the rate of student academic failure in higher education.
  • Metric: Recall was valued over precision because I was trying to minimize false negatives. I postulated that minimizing false negatives would help ensure that fewer students who need educational assistance slip through the cracks without being identified and supported.
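
To make the recall-versus-precision trade-off concrete, here is a minimal sketch that computes both metrics from the same set of predictions. The labels are invented for illustration and are not results from this project, and the encoding (1 = student needs support) is an assumption.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented for illustration: 1 = student needs support, 0 = does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=0.60, recall=0.75
```

The one missed student (a false negative) lowers recall, while the two students flagged unnecessarily (false positives) lower only precision. Favoring recall means accepting some extra flags in exchange for fewer missed students.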

Leveraging the OSEMN Pipeline

I wanted to tackle this data in a logical way that aligns with steps that folks in the industry would implement when tackling projects that matter to their stakeholders. So I used the OSEMN pipeline:

Obtain → Import the data.

Scrub → Manage the datatypes, and resolve missing data or duplicates.

Explore → Identify patterns within the relationships between variables in the data.

Model → Create a set of predictive models.

iNterpret → Identify insights and create visualizations of findings.

Attempting to stay DRY

In my latest iteration of attempting to stay DRY, I began writing classes so that I can in essence stop trying to reinvent the wheel. Classes help with organization, the reusability of code, making objects interchangeable, and making the overall implementation feel a bit more polished.

Here are two classes I used in this project that helped me explore concepts like constructors (which initialize an object's attributes), inheritance (a subclass can reuse the attributes and methods of its parent class), and polymorphism (different objects can be used interchangeably when they share a common method).

The ObtainData class loads the data, which I stored in a semicolon-separated .csv file, from a specified data path.
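
The class itself is not reproduced in this text, so what follows is only a rough sketch of what a loader like this might look like, not the version in my repo. The `load` method name, the `separator` parameter, and the sample columns are assumptions made for illustration. It uses the standard-library `csv` module so that it runs without extra dependencies; with pandas, `pd.read_csv(path, sep=";")` would do the same job.

```python
import csv
import tempfile
from pathlib import Path


class ObtainData:
    """Load a delimited text file into a list of row dictionaries."""

    def __init__(self, data_path, separator=";"):
        # The constructor only stores configuration; nothing is read yet.
        self.data_path = Path(data_path)
        self.separator = separator

    def load(self):
        """Read the file and return one dict per row, keyed by column name."""
        with self.data_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter=self.separator)
            return list(reader)


# Usage with a tiny throwaway file (hypothetical columns and values).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.csv"
    sample_path.write_text("age;target\n19;Graduate\n23;Dropout\n", encoding="utf-8")
    rows = ObtainData(sample_path).load()

print(rows)
```

Keeping the path and separator as constructor arguments means the same class can be pointed at a different file without touching the loading logic.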

After creating the ObtainData class, I created an instance of it to load the data and pass it through the pipeline. While this data is pretty clean, I still crafted classes to scrub, explore, and analyze it.
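
As a hedged illustration of how inheritance and polymorphism can keep pipeline classes DRY, the sketch below defines a shared base class and two scrub steps. All class names, the `run` method, and the sample rows are hypothetical and are not taken from my repo.

```python
class PipelineStep:
    """Base class: every step has a name and a common run() method."""

    def __init__(self, name):
        self.name = name

    def run(self, rows):
        raise NotImplementedError("Subclasses must implement run().")


class DropDuplicates(PipelineStep):
    """Scrub step: remove exact repeat rows, keeping first occurrences."""

    def __init__(self):
        super().__init__("drop_duplicates")

    def run(self, rows):
        seen = set()
        unique_rows = []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique_rows.append(row)
        return unique_rows


class CastInts(PipelineStep):
    """Scrub step: convert the named columns from text to int."""

    def __init__(self, columns):
        super().__init__("cast_ints")
        self.columns = columns

    def run(self, rows):
        return [
            {key: int(value) if key in self.columns else value
             for key, value in row.items()}
            for row in rows
        ]


# Hypothetical rows, as a CSV reader might return them (all values are text).
rows = [
    {"age": "19", "target": "Graduate"},
    {"age": "19", "target": "Graduate"},
    {"age": "23", "target": "Dropout"},
]

# Polymorphism: the loop relies only on the shared run() method.
for step in [DropDuplicates(), CastInts(columns={"age"})]:
    rows = step.run(rows)

print(rows)
# prints: [{'age': 19, 'target': 'Graduate'}, {'age': 23, 'target': 'Dropout'}]
```

Because every step exposes the same `run` method, adding a new step means writing one small subclass rather than editing the loop.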

Here are some of the plots from my exploration:

Image by author/Influence of Curricular units 1st semester (credited) and 2nd semester (approved) on Graduation status
Image by author/Influence of Tuition Fees and Age at enrollment on Graduation status

Check out my entire repo here.

Pay attention to your AUDIENCE

For this project, my audience was a set of researchers trying to identify students at risk of not graduating on time. The general idea is that if you can identify the features of groups that are at risk and provide them with timely interventions, you can put them back on the path to graduation. So I provided them with two main pieces of evidence:

  • Feature importances and coefficients from machine learning models that I trained on the data using an 80/20 train/test split and evaluated with a recall score.
  • Insights from the exploration that related to key features that the models identified as important.

Image by author/Evaluation of Machine Learning Models on Ternary Classification of Student Data
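
For readers who want to see the mechanics of the 80/20 split and a recall score on a three-class target, here is a small dependency-free sketch. The helper names, the class labels, the predictions, and the choice to average recall equally across classes (macro averaging) are all assumptions for illustration, not the exact setup or results of this project. In practice, scikit-learn's `train_test_split` and `recall_score` cover the same ground.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def recall_per_class(y_true, y_pred):
    """Return {label: recall} for every label present in y_true."""
    recalls = {}
    for label in sorted(set(y_true)):
        actual = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / actual
    return recalls


# 80/20 split on 100 placeholder rows.
train, test = train_test_split_rows(list(range(100)))
print(len(train), len(test))
# prints: 80 20

# Invented labels and predictions for a three-class target.
y_true = ["Dropout", "Dropout", "Enrolled", "Enrolled", "Graduate", "Graduate"]
y_pred = ["Dropout", "Graduate", "Enrolled", "Enrolled", "Graduate", "Dropout"]

recalls = recall_per_class(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)
print(recalls, round(macro_recall, 3))
# prints: {'Dropout': 0.5, 'Enrolled': 1.0, 'Graduate': 0.5} 0.667
```

Looking at recall per class, rather than only the average, shows which group of students the model is most likely to miss.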

Insight 1: Curricular Units in 2nd Semester Matter

  • Students with more approved units in the 2nd semester have a higher chance of academic success.
  • The number of enrolled units in the 2nd semester also positively impacts academic outcomes.
  • Encouraging students to pass more units in the 2nd semester can lead to a reduced rate of academic failure.

Insight 2: Monitoring Tuition Fee Payments is Pivotal

Being up-to-date with tuition fees is crucial for predicting academic success. Timely tuition fee payments correlate with better student outcomes. Implementing strategies to ensure prompt tuition fee payments can positively impact academic performance and reduce failure rates.

Insight 3: Evaluations and Grades are Crucial

Academic performance in the 1st- and 2nd-semester evaluations significantly influences students’ likelihood of success. Students who achieve higher grades tend to have better academic outcomes overall. Early identification and intervention for students struggling with evaluations or grades can lead to improved academic performance and decreased failure rates.

Generally, I want to improve my craft and design projects that are insightful and clear to the audience. I will keep pushing myself to build more skills and improve my implementation of data analysis tools.

What are my next steps?

I want to gain a better understanding of the following:

  • How can NLP techniques be applied to projects related to the education domain?

I am currently working on creating visualizations for all of the elementary schools of Sumner County in Tennessee using tools from my ever-growing toolkit.

Contact Me

If you would like to be updated with my latest articles, follow me on Medium. You can also connect with me on LinkedIn or email me at tenicka.norwood@gmail.com.
