The pursuit of job satisfaction is an important factor that directly affects workers’ employment goals and, in turn, company performance. So what do we actually mean by “job satisfaction”? According to Arne Kalleberg, “A worker’s level of job satisfaction is a function of the range of specific satisfactions and dissatisfactions that he/she experiences with respect to the various dimensions of work. It is thus ‘the pleasurable emotional state resulting from the appraisal of one’s job as achieving or facilitating the achievement of one’s job values’ (Locke, 1969).” (Kalleberg 127)
The concept of job satisfaction has generated an outpouring of research among social scientists. As discussed in the previous post, for this project we chose to look at job satisfaction levels among developers in the 2017 and 2018 Stack Overflow surveys posted on Kaggle.
With these datasets, we decided to build a descriptive regression model for the 2017 job satisfaction levels, then use that model to predict the 2018 data and compare the predictions with the actual 2018 survey results. We started by cleaning the dataset in Python, keeping only the variables we wanted to include in our model, such as job satisfaction, career satisfaction, salary, expected salary, number of hours per week spent looking for new opportunities, benefits, gender, race, etc. We also rescaled CareerSatisfaction and JobSatisfaction in the 2017 data so that they match the scale of the 2018 data. For instance, a JobSatisfaction level of 10 or 9 becomes 7, a level of 8 or 7 becomes 6, and so on.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the 2017 survey results
dat = pd.read_csv("survey_results_public.csv")

# Keep only the variables we want in the model
use_data = dat[['JobSatisfaction', 'CareerSatisfaction', 'LastNewJob', 'Salary', 'ExpectedSalary', 'HoursPerWeek', 'ImportantBenefits', 'Gender', 'Race']].copy()

# How to scale the 2 satisfaction ratings: one condition per 2018 level, from 7 down to 1
conditions = [(use_data['JobSatisfaction'] == 10) |
              (use_data['JobSatisfaction'] == 9),
              (use_data['JobSatisfaction'] == 8) |
              (use_data['JobSatisfaction'] == 7),
              (use_data['JobSatisfaction'] == 6),
              (use_data['JobSatisfaction'] == 5),
              (use_data['JobSatisfaction'] == 4),
              (use_data['JobSatisfaction'] == 3) |
              (use_data['JobSatisfaction'] == 2),
              (use_data['JobSatisfaction'] == 1) |
              (use_data['JobSatisfaction'] == 0)]

# Rows matching no condition (e.g. missing ratings) keep their original value
use_data['CL_JobSatisfaction'] = np.select(conditions, [7, 6, 5, 4, 3, 2, 1], default=use_data['JobSatisfaction'])
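Since both satisfaction columns are rescaled, the same mapping can also be written once as a lookup table and applied to each column. The snippet below is only a sketch under a few assumptions: CareerSatisfaction is taken to use the same 0 to 10 scale and the same mapping as JobSatisfaction, and the names `SCALE_MAP`, `rescale_satisfaction` and `sample` are made up for illustration, with a tiny invented DataFrame standing in for the survey data.

```python
import numpy as np
import pandas as pd

# 2017 (0-10) -> 2018 (1-7) mapping, taken from the conditions used in this post.
# Assumption: CareerSatisfaction follows the same mapping as JobSatisfaction.
SCALE_MAP = {10: 7, 9: 7, 8: 6, 7: 6, 6: 5, 5: 4, 4: 3, 3: 2, 2: 2, 1: 1, 0: 1}


def rescale_satisfaction(series):
    """Map a 0-10 rating column onto the 1-7 scale; missing ratings stay NaN."""
    return series.map(SCALE_MAP)


# Tiny made-up sample standing in for the survey data
sample = pd.DataFrame({
    "JobSatisfaction": [10, 8, 6, 3, 0, np.nan],
    "CareerSatisfaction": [9, 7, 5, 4, 1, 2],
})

for col in ["JobSatisfaction", "CareerSatisfaction"]:
    sample["CL_" + col] = rescale_satisfaction(sample[col])

print(sample)
```

One difference from the `np.select` version: a value that is not in the lookup table becomes NaN here instead of keeping its original value, which may or may not be what you want.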
Next, we generated some visualizations to get an idea of what the relationships between the explanatory variables and the dependent variable look like. Using Python, we plotted the number of developers reporting each job satisfaction level within different salary categories.
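For readers who want to reproduce this kind of chart, here is a rough sketch of one way to do it, not necessarily the exact code behind our graph. The salary bin edges beyond the $30,000 to $60,000 range, the labels, the helper names and the small sample DataFrame are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical salary bins (USD); only the 30k-60k range is mentioned in this post.
SALARY_BINS = [30_000, 60_000, 90_000, 120_000]
SALARY_LABELS = ["30k-60k", "60k-90k", "90k-120k"]


def satisfaction_counts_by_salary(df):
    """Count respondents for each (salary range, rescaled satisfaction level) pair."""
    salary_range = pd.cut(df["Salary"], bins=SALARY_BINS, labels=SALARY_LABELS)
    # Rows with a missing salary fall outside every bin and are not counted
    return pd.crosstab(salary_range, df["CL_JobSatisfaction"])


def plot_counts(counts):
    """Draw one group of bars per salary range, one bar per satisfaction level."""
    import matplotlib.pyplot as plt  # imported here so the counting step runs without a display

    ax = counts.plot(kind="bar")
    ax.set_xlabel("Salary range (USD)")
    ax.set_ylabel("Number of developers")
    ax.legend(title="Job satisfaction (1-7)")
    plt.tight_layout()
    plt.show()


# Tiny made-up sample standing in for the cleaned survey data
sample = pd.DataFrame({
    "Salary": [35_000, 45_000, 58_000, 75_000, 110_000, float("nan")],
    "CL_JobSatisfaction": [6, 6, 7, 5, 6, 4],
})

counts = satisfaction_counts_by_salary(sample)
print(counts)
```

Calling `plot_counts(counts)` on the real data would draw the grouped bar chart; the counting step is kept separate so the numbers can be inspected on their own.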
From the graph, we noticed that, overall, people are satisfied with their jobs regardless of salary level. In addition, the distribution of job satisfaction levels looks similar across salary ranges. However, the number of people reporting that they are satisfied or very satisfied with their job is highest in the $30,000 to $60,000 range, which is the lowest salary level. Could it be that a high salary does not necessarily make developers more satisfied with their jobs? Keep an eye on our next posts to find out!