# Job after MBA — A Data Analysis View

## Data Description

You can find the data here. A short description of the data set follows, so the rest of the analysis is easy to follow.

The data set consists of campus placement records for a cohort of students. It covers secondary and higher secondary school percentages, boards, and specialization, as well as degree percentage and field, work experience, and the salary offered to the placed students.

The data set contains the following predictors:

1. Gender (Categorical)
2. Secondary Education Percentage (Numerical)
3. Secondary Board of Education (Categorical)
4. Higher Secondary Education Percentage (Numerical)
5. Higher Secondary Board of Education (Categorical)
6. Specialization in Higher Secondary Education (Categorical)
7. Degree Percentage (Numerical)
8. Field of degree Education (Categorical)
9. Work Experience (Categorical)
10. Employability Test Percentage (Numerical)
11. MBA Percentage (Numerical)

The response variables are:

1. Status of Placement (Categorical)
2. Salary (Numerical)
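Before modeling, it is worth checking how R parsed each column. A minimal sketch, assuming the file is saved locally as `data.csv` (the same assumption the analysis code below makes):

```r
# Load the placement data and take a first look
data <- read.csv("data.csv")

str(data)             # which columns came in as character vs. numeric
summary(data)         # ranges of the percentage columns, category counts
colSums(is.na(data))  # NA counts per column; salary is NA for unplaced students
```

This is also where the NA pattern becomes visible: only unplaced students have missing salaries, which is why the cleaning step below simply replaces NA with 0.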

# Question

Which factors are really important in getting placed?

```r
install.packages("RColorBrewer")
install.packages("rattle")
install.packages("rpart")
install.packages("rpart.plot")

library(rpart)
library(rpart.plot)
library(RColorBrewer)
library(rattle)
```

```r
# Read the data
data <- read.csv("data.csv")

# Convert columns to factor / numeric. lapply() keeps the data frame
# structure; sapply() would collapse the result to a character matrix.
data[, c(2, 4, 6, 7, 9, 10, 12, 14)] <- lapply(data[, c(2, 4, 6, 7, 9, 10, 12, 14)], as.factor)
data[, c(1, 3, 5, 8, 11, 13, 15)]    <- lapply(data[, c(1, 3, 5, 8, 11, 13, 15)], as.numeric)

# Replace NA with 0 (salary is NA for students who were not placed)
newdata <- data
newdata[is.na(newdata)] <- 0

# Data for the placement-status model: drop the serial number (column 1)
# and salary (column 15), the other response variable
Data_status <- newdata[, -c(1, 15)]
head(Data_status)
```

## Method 1 (Decision Tree)

```r
# Job-status decision tree
model_status <- rpart(status ~ ., data = Data_status, method = "class")
rpart.plot(model_status, box.palette = "RdBu", shadow.col = "gray", nn = TRUE)
fancyRpartPlot(model_status)  # rattle version, for better visualization

# Confusion matrix and accuracy (computed on the same data the tree was fit on)
predict_status <- predict(model_status, Data_status, type = "class")
table_mat <- table(Data_status$status, predict_status)
table_mat
accuracy <- sum(diag(table_mat)) / sum(table_mat)
print(paste("Accuracy", accuracy))
# "Accuracy 0.893023255813953"
```

From the decision tree, the important variables for getting placed are:

1. Secondary Percentage
2. Higher Secondary Percentage
3. Degree Percentage
4. MBA Percentage
5. Work Experience
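The tree plot shows which splits were used, but `rpart` also stores a numeric importance score for every predictor in the fitted model's `variable.importance` field. A minimal sketch of reading it, shown on a small synthetic placement-style data frame (the column names `ssc_p`, `degree_p`, `workex`, and `status` here are made up for illustration, not taken from the real file):

```r
library(rpart)

# Synthetic stand-in for the placement data (illustration only)
set.seed(1)
n <- 200
toy <- data.frame(
  ssc_p    = runif(n, 40, 90),  # secondary percentage
  degree_p = runif(n, 50, 90),  # degree percentage
  workex   = factor(sample(c("Yes", "No"), n, replace = TRUE))
)
toy$status <- factor(ifelse(
  toy$ssc_p + toy$degree_p + 10 * (toy$workex == "Yes") + rnorm(n, 0, 10) > 135,
  "Placed", "Not Placed"))

fit <- rpart(status ~ ., data = toy, method = "class")

# Named vector of importance scores, largest first
sort(fit$variable.importance, decreasing = TRUE)
```

On the real data, `model_status$variable.importance` gives numerically the same ranking that the plot shows visually.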

## Method 2 (Boruta: Feature Selection by Random Forest Wrappers)

```r
library(Boruta)

# Data for the salary model: drop the serial number (column 1)
# and placement status (column 14), the other response variable.
# NAs were already replaced with 0 in newdata above.
Data_salary <- newdata[, -c(1, 14)]

boruta_salary <- Boruta(salary ~ ., data = Data_salary, doTrace = 2, maxRuns = 100)
plot(boruta_salary, las = 2, cex.axis = 0.5)
getSelectedAttributes(boruta_salary, withTentative = FALSE)
```

Boruta (run here with salary as the response) flags the following variables as important:

1. Secondary Percentage
2. Higher Secondary Percentage
3. Degree Percentage
4. Gender
5. Specialization
6. MBA Percentage
7. Work Experience
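The Boruta plot gets crowded with many predictors; `attStats()` returns the same importance information as a data frame, which is easier to sort and read. A sketch, assuming the `boruta_salary` object fitted above:

```r
# One row per predictor: mean/median/min/max importance across runs,
# the fraction of runs it beat the best shadow feature (normHits),
# and the final decision (Confirmed / Tentative / Rejected)
stats <- attStats(boruta_salary)
stats[order(-stats$meanImp), ]
```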

Note: Predicting salary with interpretable methods such as ANOVA, linear models, and decision trees gave a very poor fit, so I have left those results out to avoid confusion.

Note: I also did not show a train/test split in the first part; the goal there was only to show that the decision tree offers a quick, plausible way of reasoning about which factors matter.