Many Nigerian graduates hope to be retained after their National Youth Service Corps (NYSC) year, especially if they served somewhere they had always dreamed of working. Some expect their first job after NYSC to come through the place where they served: by being retained, being posted to another branch of the same company or organisation, being recommended to another company, or something similar. But this isn’t always the case. Most people end up starting afresh, searching for a job and trying to fit in.
**NYSC is a program undertaken by most Nigerian graduates after graduating from a university or a polytechnic. Its aim is nation building and development.**
So should I have some hope of getting my first job via NYSC? Knowing this would help with some major decisions, such as paying the next rent, accepting or turning down a job offer, relocating to another town, switching careers or learning a new skill. It would also help people understand how these selections, recommendations and retentions are made, and generally let them plan their lives after service as well as they can.
Well, trusting a model with such an important life decision requires that the model be properly validated. How much will you trust my model? Because that’s what I’m about to do ;): build a predictive model.
I downloaded the dataset (Stutern Nigerian Graduate Report) from Kaggle; the data originally comes from Stutern.
I first did some data cleaning in Excel because the data was very raw. I removed 15 columns that weren’t related to the task at hand, renamed the columns and shortened the answers.
Then I took my data set to Jupyter Notebook.
You can find my code HERE on GitHub.
Next, I extracted information from the dataset (the column types, the null values present, and the number of rows and columns) to understand the data better.
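As a rough sketch of that inspection step in pandas: the tiny DataFrame and its column names below are made-up placeholders, not the real survey columns, so treat this as an illustration of the calls rather than the actual notebook.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned survey; column names are assumptions.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", None],
    "grad_year": [2015, 2016, 2015, 2017],
    "school_type": ["Federal", "Private", None, "State"],
})

df.info()                        # column types and non-null counts
null_counts = df.isnull().sum()  # number of missing values per column
n_rows, n_cols = df.shape        # number of rows and columns

print(null_counts)
print(n_rows, n_cols)
```

With the real file, `df` would come from something like `pd.read_csv(...)` on the cleaned export instead of being built by hand.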
EXPLORING THE DATASET
The graduates in this survey finished school between 2013 and 2017, with most graduating in 2015, 2016 and 2017. The year with the highest number of graduates is 2015, with a total of 1327.
The values in the gender column are Male, Female and ‘Prefer not to say’. The dataset is roughly balanced between male and female, but very few respondents (3) chose ‘Prefer not to say’. Female graduates are slightly fewer than male graduates in this survey.
Polytechnic or University Attended:
Different polytechnics and universities are represented in this survey, with a total of 158 unique schools.
University of Lagos has the highest number of students in this survey, followed by Covenant University. Dorben Polytechnic, Federal Polytechnic, Mubi and some others have the lowest. The dataset isn’t balanced here: comparing the number of rows for the best-represented school with the number for Dorben Polytechnic shows that the dataset isn’t quite representative of all schools in Nigeria.
Federal universities and polytechnics have the highest representation, followed by private universities, then state schools, foreign schools and other Nigerian schools not mentioned by name. The number from federal universities is quite understandable as most Nigerian universities and colleges are owned by the Federal Government.
Qualification before First Job:
Bachelor’s degree holders are the most represented in this survey, followed by holders of HND, Master’s, OND, MBA and PhD qualifications. This suggests that most respondents went through universities rather than polytechnics.
Job Search Mode:
Most graduates search for or hear about jobs via personal contacts, including family and friends. Social media is the second most common source of job vacancy information, and recruitment agencies and online job sites come third. People rarely get news about vacancies directly from employers’ websites.
Course and University Type:
In this survey, a lot of Nigerians studied Computer Science. This is an interesting insight, since the growth of science and technology is a big part of what makes developed countries what they are.
We can also see that more Nigerians in this survey attended federal schools than any other kind.
PREPARING THE DATASETS FOR THE MODEL
This data set had a lot of missing values. For the columns required for the model, missing values were replaced with the mode of the column, while rows were dropped where only very few values were missing.
I later dropped irrelevant columns. The job search mode column was filled with the mode.
The string values were then converted to numerical values.
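A minimal sketch of mode imputation followed by string-to-number encoding, assuming pandas. The column names and values are hypothetical, and the plain dictionary mapping is just one way to encode categories; the notebook may have used a different encoder.

```python
import pandas as pd

# Hypothetical columns and values, for illustration only.
df = pd.DataFrame({
    "job_search_mode": ["Personal contacts", "Social media", None, "Personal contacts"],
    "gender": ["Male", "Female", "Female", "Male"],
})

# Fill missing values with the most frequent value (the mode) of the column.
mode_value = df["job_search_mode"].mode()[0]
df["job_search_mode"] = df["job_search_mode"].fillna(mode_value)

# Map each distinct string to an integer code, keeping the mapping
# so that the digits can be decoded back to their meanings later.
mappings = {}
for col in ["job_search_mode", "gender"]:
    categories = sorted(df[col].unique())
    mappings[col] = {label: code for code, label in enumerate(categories)}
    df[col] = df[col].map(mappings[col])

print(mappings)
print(df)
```

Keeping `mappings` around is what makes it possible to say later which digit stands for which answer.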
SPLITTING THE DATA SET TO TRAIN SET AND TEST SET
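A sketch of how such a split is commonly done with scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state` and the stratification are assumptions for illustration (the post does not state the values used), and `X` and `y` are synthetic.

```python
from sklearn.model_selection import train_test_split

# Synthetic encoded features and 0/1 labels (1 = first job via NYSC).
X = [[i % 2, i % 3, i % 5] for i in range(20)]
y = [0] * 15 + [1] * 5

# Hold out 20% of the rows for testing; stratify keeps the Yes/No ratio
# the same in both parts, which matters when one class is rare.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
```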
DEVELOPING THE DECISION TREE MODEL
The decision tree model was developed, and the accuracy on the train and test sets was displayed along with the confusion matrix.
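The general shape of this step, sketched with scikit-learn on synthetic data. The features, the target rule, `max_depth=3` and the split settings are all assumptions made so the example runs on its own; they are not the settings from the notebook.

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, already-encoded features (placeholders for the survey columns).
X = [[i % 2, i % 3, i % 5] for i in range(60)]
# Synthetic 0/1 target: 1 stands for "Yes, first job via NYSC".
y = [1 if (row[0] == 1 and row[2] >= 3) else 0 for row in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

# Rows are the true classes, columns the predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])

print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
print(cm)
```

Comparing train and test accuracy side by side is a quick check for overfitting: a large gap suggests the tree has memorised the training rows.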
From the confusion matrix, we can see that 777 graduates were correctly predicted as No (no job via NYSC), while 205 were wrongly predicted as No when they should have been predicted as Yes (job via NYSC). 24 were predicted as Yes when they were actually No, and 17 were correctly predicted as Yes.
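To see what these four counts imply, the usual metrics can be worked out by hand. This is plain arithmetic on the numbers quoted above; only the variable names are mine.

```python
# Counts taken from the confusion matrix described in the text.
tn = 777  # actual No,  predicted No
fn = 205  # actual Yes, predicted No
fp = 24   # actual No,  predicted Yes
tp = 17   # actual Yes, predicted Yes

total = tn + fn + fp + tp
accuracy = (tn + tp) / total   # share of all predictions that are right
precision_yes = tp / (tp + fp) # of those predicted Yes, how many really are Yes
recall_yes = tp / (tp + fn)    # of the real Yes cases, how many were found

print(f"test rows: {total}")                    # 1023
print(f"accuracy: {accuracy:.3f}")              # 0.776
print(f"precision (Yes): {precision_yes:.3f}")  # 0.415
print(f"recall (Yes): {recall_yes:.3f}")        # 0.077
```

So although about 78% of predictions are correct, only 17 of the 222 actual Yes cases are picked up. Most of the accuracy comes from the majority No class, which is worth keeping in mind when deciding how far to trust a Yes or No from this model.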
Next, the classification report was displayed and the model was pickled (saved).
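A small sketch of both steps, assuming scikit-learn and the standard-library `pickle` module. The toy data and the file name `nysc_tree.pkl` are hypothetical, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import pickle
import tempfile

from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier

# Toy encoded data: the label is 1 only when both features are 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
y = [0, 0, 0, 1] * 5
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Per-class precision, recall and F1, labelled No/Yes instead of 0/1.
report = classification_report(y, model.predict(X), target_names=["No", "Yes"])
print(report)

# Save the fitted model to disk, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nysc_tree.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.predict([[1, 1]]))
```

A pickled model should only be loaded from a source you trust, and ideally with the same scikit-learn version that saved it.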
Predicting someone’s first job status via NYSC is shown below:
The example is a prediction for a female graduate who studied Economics at a foreign university and whose job search mode is personal contacts, including family and friends.
Note that the meanings of the digits are displayed here
The predicted value (1-Yes) means that this graduate will likely be retained, be recommended by NYSC or its staff, or something related; in general, she will likely be offered a job via NYSC.
The features for this prediction can be expanded to include the graduates’ state of origin, the location (state) where they served, their roles during service and their age during the service year. Using a larger data set would also make the model more dependable. For better accuracy, boosting algorithms such as XGBoost and CatBoost can be used.