# Analyzing Los Angeles City Employee Payroll Data Using R

13 min readFeb 16, 2023

--

# Introduction

The “City Employee Payroll” dataset, updated bi-weekly by the Los Angeles City Controller’s Office, provides an in-depth analysis of the total payments made to Los Angeles City Employees. This report aims to provide valuable insights into the financial situation of each city department and its employees, helping researchers, policymakers, and anyone interested in the topic to gain a deeper understanding of the total payments made to Los Angeles City Employees.

“Payroll is the oxygen of the company. Without payroll, you cannot breathe.” — Jay Reinemann, Managing Partner of Propel Venture Partners.

Additionally, this article will serve as a comprehensive story-telling example for novice learners who are venturing into the world of R programming and exploring the functionalities of ggplot2.

# Section 1: Insights into Budget Allocation and Departmental Payouts

In this section, we will take dive into our dataset and explore various avenues to answer a variety of questions. The questions presented below serve as an example of exploratory data analysis, providing insights into the workings of Los Angeles City’s payroll system.

## Question 1: Which city departments have the highest volume of payouts in LA?

`colnames_tree <- c("Department.Title","Job.Class.Title","Total.Payments")df_tree <- df[colnames_tree]df_tree\$Department.Title <- gsub("\\s*\\(.*\\)", "", df_tree\$Department.Title)df_tree <- aggregate(df_tree\$Total.Payments, by=list(df_tree\$Department.Title), FUN=mean)colnames(df_tree)[1] = "department_title"colnames(df_tree)[2] = "avg_earning"df_tree <- df_tree[order(-df_tree\$avg_earning),] # Sort the data in descending order by valueggplot(df_tree, aes(area = avg_earning,                    fill = avg_earning,                    label = department_title)) +  geom_treemap() +  geom_treemap_text( place = "centre", reflow = TRUE,                    size = 11, grow = FALSE, color = "white") +  scale_fill_gradient(high = "#926efa", low = "#472a58", na.value = NA, space = "Lab") +   ggtitle("VOLUME OF AVERAGE PAYOUT BY CITY DEPARTMENT (IN USD)") +  labs(fill = "Average Payout \n(In USD)", subtitle = "Fig 1A") +   theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black")) + theme_hc()+   theme(legend.position = "bottom", legend.key.width = unit(2.8, "cm"))`

## Insight:

The treemap provides a visual representation of the city departments with the highest payroll. The departments listed, such as the Public Accountability Department, Fire Department, City Attorney Department, and Water and Power Department, are just some of the departments that have a significant payroll output. The volume of payouts in most Council districts is similar, indicating a balanced allocation of resources among the various council districts.

## Question 2: How much do departments pay their top employees?

`df_top10 <-df[colnames_tree]df_top10\$Department.Title <-gsub("\\s*\\(.*\\)", "", df_top10\$Department.Title)df_top10 <-aggregate(df_top10\$Total.Payments, by = list(df_top10\$Department.Title, df_top10\$Job.Class.Title), FUN = median)colnames(df_top10)[1] = "department_title"colnames(df_top10)[2] = "profession"colnames(df_top10)[3] = "median_earning"df_top10 <-df_top10[order(-df_top10\$median_earning), ]df_top10 <-head(df_top10, n = 10)#adjusting records in descending order for x axisdf_top10\$department_title <-factor(df_top10\$department_title, levels = c('Harbor', 'Fire', 'Police', 'Airports', 'Water And Power',                                                                         'Office of the Chief Legislative Analyst', 'Public Works - Engineering'))df_top10\$professionx <-paste0(df_top10\$profession, "\n( \$", round(df_top10\$median_earning), " )")df_top10\$professionx <-str_wrap(df_top10\$professionx, width = 20)custom_colors <- c("#926efa","#8c65e6","#855dd2","#7e54bf","#764dac","#6d459a","#643e89","#5b3778","#513068","#472a58")prof_map <- c("Chief Port Pilot II", "General Manager Harbor Department", "Port Pilot II", "Fire Chief", "Fire Helicopter Pilot V", "Chief of Police", "General Manager Airports", "General Manager & Chief Engineer Water & Power", "Chief Legislative Analyst", "City Engineer")ggplot(df_top10, aes(x = department_title, y = median_earning, fill = profession, label = profession)) +  geom_bar(position = "stack", stat = "identity") +  ggtitle("TOP 10 CITY EMPLOYEE EARNERS AND THEIR DEPARTMENT") +  scale_x_discrete(labels = function (x) str_wrap(x, width = 20)) +  ylab("Earnings (In USD)") +  xlab("Department") +  geom_text(aes(label = professionx),            size = 3,            color = "white",            position = position_stack(vjust = 0.5)) +  scale_fill_manual(name = "profession", guide = FALSE,                    values = custom_colors,                    breaks = prof_map,                    labels = prof_map) +   theme_hc() +  labs(subtitle = "Fig 1B")+   theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black"))`

## Insight :

Surprisingly, the top three highest-paying professions are found within the Harbour Department, despite the overall payroll budget being lower compared to other departments such as Fire, Police, and Airport. Upon further analysis, the data suggests that the payroll budget of the Harbour Department is skewed towards higher-paying positions. This finding may be explained by the ratio of jobs to the department, indicating that the Harbour Department has fewer employees relative to other departments.

## Question 3: What are the employee benefits provided by the City Council? How have they changed throughout the years?

`df_benefits <- df[c("Average.Health.Cost","Average.Dental.Cost","Average.Basic.Life","Average.Benefit.Cost","Year")]colnames(df_benefits)[1] = "Health"colnames(df_benefits)[2] = "Dental"colnames(df_benefits)[3] = "Life Insurance"colnames(df_benefits)[4] = "Total Benefit"df_benefits <- melt(df_benefits, id = 'Year')df_benefits <- aggregate(df_benefits\$value, by=list(df_benefits\$variable,df_benefits\$Year ), FUN=mean)colnames(df_benefits)[1] = "variable"colnames(df_benefits)[2] = "Year"colnames(df_benefits)[3] = "value"df_benefits\$value <- round(df_benefits\$value,2)LINES <- c("Health" = "solid","Dental" = "solid","Life Insurance" = "solid", "Total Benefit" = "dotted")ggplot(data=df_benefits, aes(x=factor(Year), y=value, group=variable, color=variable, size = 0.5, linetype = factor(variable))) +  geom_line(lwd=1) +  geom_point() +  ggtitle("AVERAGE EMPLOYEE BENEFITS PROVIDED BY CITY COUNCIL") +  ylab("Average Benefits (In USD)") +  xlab("Year") +  labs(colour="Benefit Type", subtitle = "Fig 1C") +   guides(alpha = "none", size = "none") +  scale_linetype_manual(values = LINES,                        guide = FALSE) + labs(fill = "Benefit")+  scale_color_manual(values=c("purple","#E8A9FF","#8E7FFF",'black'))+   theme_hc()+   theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black"))`

## Insight :

The City Council allocates a significant portion of its payroll budget towards healthcare benefits, while comparatively less is allocated to other benefits. This allocation reflects the City Council’s recognition of the high healthcare costs in California and the importance of providing adequate healthcare benefits to its employees. Compared to a developing country like India, the life insurance benefit to overall compensation ratio for city council employees in Los Angeles is low. This comparison highlights the need for the healthcare industry to improve, not only in the United States but globally.

## Question 4: How are payroll resources allocated to bonus payments? Does the City Council favor a particular type of bonus?

`bonus <- df %>%   group_by(Department.Title) %>%   summarise(Permanent_Bonus_Pay = sum(Permanent.Bonus.Pay),             Longevity_Bonus_Pay = sum(Longevity.Bonus.Pay),             Temporary_Bonus_Pay = sum(Temporary.Bonus.Pay)) colnames(bonus)[2] <- "Permanent"colnames(bonus)[3] <- "Longevity"colnames(bonus)[4] <- "Temporary"bonus_payments <- melt(bonus,id.vars = "Department.Title") %>%                  group_by(variable) %>%                  summarise(value=sum(value))bonus_payments\$proportions <- bonus_payments\$value / sum(bonus_payments\$value)colnames(bonus_payments)[colnames(bonus_payments) == "variable"] <- "Bonus_Type"colnames(bonus_payments)[colnames(bonus_payments) == "value"] <- "Amount"custom_pie <- c("purple","#E8A9FF","#8E7FFF")ggplot(bonus_payments, aes(x = "", y = Amount, fill = Bonus_Type)) +   geom_bar(stat="identity", width=1, color="white") +   coord_polar("y", start = 0) +   geom_text(aes(label = paste(round(proportions * 100), "%")), color = 'white',            position = position_stack(vjust = 0.5))+  ylab("") +  xlab("") +  labs(fill="Bonus Type", subtitle = "Fig 1D") +  ggtitle('PERCENTAGE SHARE OF BONUS TYPES')+  theme(axis.line = element_blank(),         axis.text = element_blank(),         axis.ticks = element_blank(),         panel.grid = element_blank(),         panel.border = element_blank(),        plot.title = element_text(hjust = 0.5)) +  theme_hc()+   scale_fill_manual(values=custom_pie)+   theme(plot.subtitle=element_text(size=9, hjust= 0, face="italic", color="black"))`

## Insight :

According to the data presented, a significant portion of payroll allocation is dedicated to permanent bonuses. Meanwhile, the allocation for longevity and temporary bonuses is roughly equal among city departments. While it is beneficial for employees to receive a high permanent bonus, it is also crucial for city council departments to pay attention to longevity bonuses, especially considering that the rate of attrition has increased in recent times. This could negatively impact a company’s growth if not properly addressed.

## Question 5: What percent of the employment is part-time in the top 10 part-time offering departments?

`options(dplyr.summarise.inform = FALSE)df <- read.csv("City_Employee_Payroll__2013-2018_.csv")df\$Department.Title <- gsub("\\s*\\(.*\\)", "", df\$Department.Title)df_new = df[df\$Total.Payments>=0,]a=df_new %>% group_by(Year,Department.Title,Employment.Type) %>% summarize(x=n())dept_type=a %>% group_by(Department.Title,Employment.Type) %>% summarize(x=ceiling(mean(x)))#widening for graphnew_df <- pivot_wider(dept_type, names_from = Employment.Type, values_from = x)df_new = df[df\$Total.Payments>=0,]pt_top10 <- new_df %>% arrange(desc(`Part Time`)) pt_top10 <- head(pt_top10,10)part_time_df <- pt_top10 %>%  mutate(total_count = `Part Time` +  `Full Time`,         part_time_pct = `Part Time` / total_count,         full_time_pct = `Full Time` / total_count)part_time_df <- part_time_df[order(part_time_df\$part_time_pct, decreasing = TRUE),]# convert dataframe from wide to long format for ggplot2part_time_df_long <- part_time_df %>%  pivot_longer(cols = c("part_time_pct", "full_time_pct"), names_to = "employment_type", values_to = "pct")# create stacked bar chart using ggplot2ggplot(part_time_df_long, aes(x = str_wrap(reorder(Department.Title,-pct),width=5), y = pct, fill = employment_type)) +  geom_bar(stat = "identity", position = "stack") +  labs(x = "Department", y = "Percentage", fill = custom_colors, subtitle = "Fig 1E") +  xlab("")+  ylab("Percent Shared") +  scale_y_continuous(labels = scales::percent_format()) +  ggtitle("PART TIME VS FULL TIME ROLE AVAILABILITY") +    scale_fill_manual(values = c("#7e54bf","#472a58"), name = "Employment Type", labels = c("Part-time", "Full-time"))+  theme_hc()+   theme(plot.subtitle=element_text(size=9, hjust= 0, face="italic", color="black"))`

## Insight :

A closer examination of the data reveals that while the Recreation and Parks Department might initially seem to have the largest representation of part-time positions, it is noteworthy that the City Clerk and Los Angeles Convention Center are equally competitive in this regard, both offering ample opportunities for individuals looking to pursue a part-time career path. With over 75% of part-time positions available, these departments present a viable option for job seekers seeking a balanced work-life setup.

# Section 2: Are Employees Being Paid Fairly?

It is important that employees are paid fairly as it helps to ensure that they feel valued, motivated, and engaged in their work. Fair pay can also help to attract and retain talented employees, which is crucial for the success of any organization. Additionally, paying employees fairly can help to reduce turnover, increase productivity, and promote a positive work culture. Fair pay is not only a moral and ethical responsibility of employers, but it also makes good business sense.

## Question 1: Is the yearly payroll increment enough?

`df <- data.frame(df[df['Total.Payments'] >= 32240,])colnames_med <- c("Total.Payments",'Year')df_med <- df[colnames_med]df_med <- aggregate(df_med\$Total.Payments, by=list(df_med\$Year ), FUN=median)colnames(df_med)[1] = "Year"colnames(df_med)[2] = "actual_median_income"#adding inflation rates - check referenceinflation_usd <- c(1, 1.02, 1.02, 1.03, 1.05, 1.08)buying_power <- c(1, 0.99, 0.97, 0.96, 0.94, 0.91)df_med\$inflation_adjusted <- inflation_usd * df_med[1,2]df_med\$buying_power_adjusted <- buying_power * df_med[1,2]label <- c("","","","","","Actual Income","","","","","","Inflation \n Adjusted \n Income","","","","","","Adjusted \nBuying Power")df_med_melt <- melt(df_med, id = 'Year')ggplot(data=df_med_melt, aes(x=factor(Year), y=value, group=variable, color=variable)) +  geom_line(lwd=1) +  geom_point() +  geom_vline(aes(xintercept=factor(Year), alpha = 0.8))+  ggtitle("INFLATION AND WAGE GROWTH") +  ylab("Median Income (In USD)") +  xlab("") +  labs(colour="Benefit Type", subtitle = "Fig 2A") +   guides(alpha = "none", size = "none", colour = 'none') +  scale_linetype_manual(values = LINES,                        guide = FALSE) + labs(fill = "Benefit")+  scale_color_manual(values=c("purple","#E8A9FF","#8E7FFF"))+ theme_hc() +   geom_label_repel(aes(label = label),                   nudge_x = 1,                   na.rm = TRUE)+   theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black"))`

## Insight:

The parallel coordinate chart in Figure 2A shows that city council employees have received substantial increases in their total salary. This is significant because it indicates that their actual income has managed to outpace inflation from the beginning. For example, if an employee started with a salary of \$100,000 in 2013, by 2018 they would need to earn \$108,000 just to keep pace with inflation. In other words, the value of \$1000 in 2013 would have decreased to approximately \$910 in the present day. However, the chart demonstrates that the salary of city council employees has grown to an extent that they have effectively “beaten” inflation and maintained their purchasing power.

## Question 2: Is the pay gap between top 10% of earners and the bottom 10% of earners getting closer?

`df_lol <- df[colnames_med]# Determine the threshold valuethreshold_high = quantile(df_lol\$Total.Payments, 0.9)threshold_low = quantile(df_lol\$Total.Payments, 0.1)# Filter the dataframe to only include rows where A exceeds the thresholdtop_10_percent <- df_lol[df_lol['Total.Payments'] >= threshold_high,]bottom_10_percent <- df_lol[df_lol['Total.Payments'] >= threshold_low,]top_10_percent<- aggregate(top_10_percent\$Total.Payments, by=list(top_10_percent\$Year ), FUN=median)colnames(top_10_percent)[1] = "Year"colnames(top_10_percent)[2] = "top_median_income"bottom_10_percent <- aggregate(bottom_10_percent\$Total.Payments, by=list(bottom_10_percent\$Year ), FUN=median)colnames(bottom_10_percent)[1] = "Year"colnames(bottom_10_percent)[2] = "bottom_median_income"consolidated_df <- merge(top_10_percent, bottom_10_percent, by = "Year")ggplot(consolidated_df) +  geom_segment( aes(x=Year, xend=Year, y=bottom_median_income, yend=top_median_income), color="grey") +  geom_point( aes(x=Year, y=bottom_median_income), color="#E8A9FF", size=4 ) +  geom_point( aes(x=Year, y=top_median_income), color="#8E7FFF", size=4 ) +  ylim(0, 250000) +   theme_hc() +  theme(legend.position = "none") +  xlab("") +  ylab("Median Income (In USD)") +  ggtitle("PAYGAP BETWEEN TOP 10% EARNERS AND BOTTOM 10% EARNERS") +  labs(subtitle = "Fig 2B") +   theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black"))`

## Insight :

The Lollipop chart in Figure 2B shows that the pay gap between the top 10% highest earning employees and the bottom 10% lowest earning employees has slightly improved from 2013 to 2018. It is important to note that reducing the pay gap between top and bottom earners is crucial for several reasons. Firstly, it helps to promote equality and fairness in the workplace. When the pay gap is reduced, employees are more likely to feel valued and motivated, which can improve their overall job satisfaction and performance. Secondly, reducing the pay gap can have positive impacts on the wider economy and society. When lower-earning employees have more disposable income, they are able to contribute more to the economy through spending on goods and services, which can stimulate economic growth and create jobs. Finally, reducing the pay gap is important for addressing issues of income inequality and poverty. When the pay gap is reduced, lower-earning employees are better able to support themselves and their families and have more financial security and stability. Overall, while the marginal improvement shown in the Lollipop chart is a step in the right direction, continued efforts to reduce the pay gap between top and bottom earners are important for promoting equality, fairness, and economic growth.

# Section 3: Addressing Controversies Using Statistics

This section employs statistical techniques to answer questions regarding fair pay among LA city department employees with the aim of addressing questions related to equitable compensation for employees across various departments in the city of Los Angeles.

## Question 1 — Is Temporary Bonus Pay more likely to be received by Police Officer-II employees?

`#population mean, stdpop_mean <- mean(df\$Temporary.Bonus.Pay)pop_std <- sd(df\$Temporary.Bonus.Pay)#sample mean, stddf_pol<-subset(df, Job.Class.Title=='Police Officer II')index <- sample(1:nrow(df_pol), size = 120)df_pol_sample <- df_pol[index,]sample_mean = mean(df_pol_sample\$Temporary.Bonus.Pay)sample_std = sd(df_pol_sample\$Temporary.Bonus.Pay)ts = ((sample_mean - pop_mean)*sqrt(120))/sample_stdt.test <- t.test(df\$Temporary.Bonus.Pay, mu=pop_mean, alt="greater", conf=0.95)#T-value lies in the rejection regiondata_t <- data.frame(group = c("Population", "Police Officer II"),                    mean = c(pop_mean, sample_mean),                    sd = c(pop_std, sample_std))ggplot(data_t,aes(x=group, color=group)) +  geom_boxplot(aes(lower=mean-sd,                   upper=mean+sd,                   middle=mean,                   ymin=mean-3*sd,                   ymax=mean+3*sd),               stat="identity")+   theme_hc() +  labs(colour="Concentration Group", subtitle = paste("Fig 3A | p-value = ",round(t.test\$p.value, 4))) +  ggtitle("Mean Bonus of Police Officer II Vs. Population") +  xlab("") +  ylab("Mean Bonus") +    theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black")) +  scale_color_manual(values=c("purple","#E8A9FF","#8E7FFF"))`

## Insight :

We conducted one sample t-test where the hypothesis is as follow:

• Null Hypothesis -> H0 : mu = pop_mean

• Alternate Hypothesis -> Ha: mu > pop_mean

• Confidence Level — 95%

According to the results of the t-test shown in Figure 3A, we observed that the sample mean is greater than the population mean, with a p-value of 0.5. However, this won’t allow us to reject the null hypothesis and conclude that Police Officer II employees are more likely to receive a Temporary Bonus compared to other employees in both the same and different departments. There isn’t enough data to support the alternate hypotheses.

## Question 2 — Was the data justified in supporting the statement that Recreation and Parks Department Employees were denied Longevity Bonus Pay in 2014?

`#Population and their mean, stdp <- subset(df,df\$Year=='2014' )pop_mean <- mean(p\$Longevity.Bonus.Pay)pop_std <- sd(p\$Longevity.Bonus.Pay)#Sample and their mean, stddataset <- subset(df,df\$Department.Title=="Recreation And Parks" & df\$Year=='2014' )sample = sample(1:nrow(dataset), size=120)longevity_s <- dataset[sample,]sample_mean <- mean(longevity_s\$Longevity.Bonus.Pay)sample_std <- sd(longevity_s\$Longevity.Bonus.Pay)# t-distributiont_dataset <- ((sample_mean-pop_mean)*sqrt(120))/sample_std#Function of t-testt.test2 <- t.test(longevity_s\$Longevity.Bonus.Pay, mu=pop_mean, alt="greater", conf=0.95)data_t <- data.frame(group = c("Population", "Recreation And Parks"),                      mean = c(pop_mean, sample_mean),                      sd = c(pop_std, sample_std))ggplot(data_t,aes(x=group, color=group)) +  geom_boxplot(aes(lower=mean-sd,                   upper=mean+sd,                   middle=mean,                   ymin=mean-3*sd,                   ymax=mean+3*sd),               stat="identity")+   theme_hc() +  labs(colour="Concentration Group", subtitle = paste("Fig 3B | p-value = ",round(t.test2\$p.value, 4))) +  ggtitle("Longevity Bonus for Population Vs. Recreation And Parks Department Employees") +  xlab("") +  ylab("Mean Longevity Bonus") +    theme(plot.subtitle=element_text(size=9, hjust=0, face="italic", color="black")) +  scale_color_manual(values=c("purple","#E8A9FF","#8E7FFF"))`

## Insight :

We conducted one sample t-test where the hypothesis is as follow:

• Null Hypothesis -> H0 : mu = pop_mean

• Alternate Hypothesis -> Ha: mu < pop_mean

• Confidence Level — 95%

The results of the t-test displayed in Figure 3B revealed that the sample mean was lower than the population mean, with a p-value of 0.99. This supports the acceptance of the null hypothesis and suggests that employees in the Recreation and Parks Department received similar Longevity Bonus Pay. Though we can still agree that, on average, Recreation and Parks Department employees are paid marginally less compared to population mean.

# Conclusion :

The data analysis of the city’s budget allocation and departmental payouts provides a comprehensive understanding of the allocation of resources within Los Angeles. Through the use of treemaps, parallel coordinate charts, and t-tests, the study sheds light on some of the key findings related to employee compensation and benefits. This analysis of budget allocation, departmental payouts, and employee compensation can be beneficial in several ways.

Firstly, it provides an in-depth understanding of the distribution of payroll among different departments and council districts. This information can be useful for stakeholders, such as taxpayers, who want to understand where their money is going and if it is being used efficiently.

Secondly, insights into employee compensation can help individuals assess the fairness of pay and determine if they are being paid equitably in comparison to their peers. The findings on the growth of salaries and reduction of the pay gap between top and bottom earners can provide a positive outlook on the current state of employee compensation and serve as a motivation for continued efforts to promote equality and fairness.

Finally, the statistical analysis on controversial topics like Temporary and Longevity 9 Bonus Pay can provide valuable insights into the allocation of bonuses and help address any discrepancies that may exist. These findings can be useful in addressing any workplace issues and promoting the equitable treatment of employees. Further analysis and efforts to address the issues highlighted in the study would lead to a more equitable and efficient allocation of resources and promote fair treatment of employees.