Stability, Socioeconomics, and the Role of Virtual Learning in South Carolina Public High Schools

Applying SQL and Tableau to Analyze Academic Trends Following COVID-19

28 min readJun 12, 2024

The public education system is one the oldest and most essential pillars of the United States’ infrastructure. Where implemented effectively, it can have a long lasting impact on a student’s success, health, and the influence they have in their community.¹ In many cases it’s the cornerstone of an individual’s societal progress. Where young men and women begin to discover their identities, collaborating with educators, coaches, and friends to assemble the building blocks for their future. And until 2019, this framework was largely unaltered.

The COVID-19 pandemic, brought unprecedented disruptions to various aspects of society, including academia. In response to this public health crisis, schools across the globe were compelled to adapt rapidly, implementing remote learning strategies and employing new curriculums. Given the modern world had never experienced an international event to this scale, understanding the impact these adaptations had on students is critical for us to understand if we want to move forward, better.

Objectives

For my first data analysis project, I chose to explore a subject that is important to me: the state of education in South Carolina following the COVID-19 pandemic. To assess its impact, I will be addressing the following questions:

What impact did COVID-19 have on traditional public and charter high school graduation rates?
What impact does a school’s poverty level have on their graduation rate and how did COVID-19 influence this relationship?
What was the impact of a school’s student-teacher ratio on their graduation rate?
Which schools demonstrated the greatest improvement following COVID-19? Conversely, which schools experienced the most significant setbacks?

To accomplish this project, I used the SQL platform, Google BigQuery for dataset management, querying, and analysis, and Tableau for analysis and visualization.

Data Collection

First party data was collected from the South Carolina Department of Education’s website. It was determined that nominal data such as school name and their respective unique identifier would be needed as well as geographical information. This unique identifier or ID, would be the primary key and relate each dataset to one another. Additionally, quantitative metrics like enrollment, number of teachers, poverty index, and graduation rate were the key figures to answer the stated questions. This study decided to analyze data from school years ending in 2017, 2018, 2020, 2022, and 2023. This will provide data preceding COVID-19, during its peak, and the subsequent years up until now. From each year, three different datasets published by the Department of Education will be utilized for analysis (report card, poverty index, graduation rate). In total, 15 different datasets will be used. These are housed in GitHub and can be found here.

Throughout the study period, many of the academic metrics varied except for graduation rate, which was consistently recorded every year. The reliability of this data point is why it was chosen as the measure of student performance. The reported poverty index is a school-level measurement designated by the percentage of pupils-in-poverty (PIP). PIP is defined with the help of several data sources and is classified as a family with any of the following criteria:

Served through Medicaid within the last three years
Served by the Department of Social Services (SNAP, TANF, or Foster) within the last three years
PowerSchool data relating to current year to include homeless, foster, and migrant status
Direct certification and direct certification by extension (sibling match in the same household)²

A school where 68% of students fulfilled one of more of these criteria would have a poverty index of 68.

The strategy for this project will be to process, clean, and perform initial analysis on each year independently, then JOIN all tables at the end for a comparative analysis in BigQuery and Tableau.

Data Processing and Cleaning

Data Processing:

All raw tables were stored as a .xlsx file, and converted to a .csv to be compatible with BigQuery. These raw .csv files were then uploaded to BigQuery and housed in datasets broken down by year. A new, empty table was created that contained the necessary fields for analysis. During this table creation, data types were assigned to each field. The query shown in Figure 1.01 was the template used for 2018 then modified for other years.

CREATE TABLE `scedu-418601.2022.combined2018` (

  reportcardyear int64,
  distname string, 
  schoolname string,
  id int64,
  type string, 
  street string,
  city string,
  state string, 
  zipcode int64, 
  enrollment int64, 
  teachernum int64
  )

#Figure 1.01: Creation of an empty table with field names

These fields were then populated by migrating data from the raw, ‘report card’ table and into the combined table, as seen in Figure 1.02. This combined table was where all raw data from a given year’s multiple datasets would be collected.

INSERT INTO `scedu-418601.2022.combined2018`

SELECT
  ReportCardYear, 
  DistrictNm, 
  SchoolNm, 
  SCHOOLID, 
  Report_Card_Type,
  street, 
  city, 
  state,
  zip,
  enrollment,
  TeacherCount

FROM `scedu-418601.2018.raw-2018-reportcard`


#Figure 1.02: Raw data fields being migrated to the combined2018 table

The raw ‘report card’ table contained all necessary fields except for graduation rate and poverty index. These fields were added to the combined table through the ALTER TABLE and ADD COLUMN command, then updated using the query in Figure 1.03.

UPDATE `scedu-418601.2022.combined2018` as c

SET c.grad_rate = g.GPERCENT_OVERALL

FROM `scedu-418601.2018.raw2018graduation` as g

WHERE c.ID = g.BEDSCODE


#Figure 1.03: Migration of graduation rate and poverty index into combined2018 table

This same workflow was applied to poverty index and for all subsequent years.

Prior to data cleaning, all tables were reviewed to ensure they were structurally consistent. During this exercise, it was found that the city, state, and zip code fields from 2017 were held in a single column named citystzi. Though this will not directly impact any analysis, aligning its format with tables from all other years, where geographical data is held in individual city, state, and zip code columns, is preferred. To accomplish this, the citystzi column was SPLIT into an array and delimited by a comma. The two arrays, [0] and [1], were city and stzip, respectively. These were queried along with the rest of the table and saved as a new table, as seen in Figure 1.04. This same strategy was to be implemented for the SPLIT of stzip to state and zip.

SELECT 
  reportcardyear, 
  distname, 
  schoolname, 
  id, 
  type,
  street, 
  split(citystzi, ', ')[offset(0)] as city,
  split(citystzi, ", ") [offset(1)] as stzip, 
  enrollment, 
  teachernum,
  grad_rate,
  pov_index

FROM `scedu-418601.2017.combined2017_copy`


#Figure 1.04: Splitting of the field citystzp to facilitate indexing city and stzip into seperate columns

A visual representation on what is occurring when splitting strings can be seen in Figure 1.04. In Row 1, original_string for Lee Central High School was broken into substrings that can be individually indexed.

Figure 1.05: Splitting and indexing of a string via the SPLIT function

While attempting to SPLIT these fields further, an inconsistency was found in the stzip column. Here, South Carolina was primarily listed as ‘SC’ but had several instances of its long form, ‘South Carolina’. A query to identify the frequency of this occurrence revealed 11 instances of ‘South Carolina’. These were updated utilizing the CASE function when ‘South Carolina’ was present, ELSE the field was left the same. Seen in Figure 1.06.

SELECT 
  reportcardyear, 
  distname, 
  schoolname, 
  id, 
  type,
  street,
  city,
  CASE
    when stzip like '%South Carolina%' then 'SC'
    else stzip
    end as statezip,
  enrollment, 
  teachernum,
  grad_rate,
  pov_index

FROM `scedu-418601.2017.combined2017_city_stzp`


#Figure 1.06: Replacement of ‘South Carolina’ with ‘SC’ using CASE

Indexing the state and zip in one query like Figure 1.04 resulted in an error because several of the statezip values were just ‘SC’. This left no delimiter to separate those values with. To work around this, a SUBQUERY was utilized to isolate statezip fields containing zip code, then separated them with the SPLIT clause. The filter, ’%2%’ was utilized because all South Carolina zip codes have a ‘2’ present. This query can be seen in Figure 1.07.

SELECT
  *,
  split(statezip, ' ')[offset(0)] as state,
  split(statezip, ' ')[offset(1)] as zipcode


FROM 
  (
    SELECT *
    FROM `scedu-418601.2017.combined2017_city_stzp_recent`
    WHERE statezip like '%2%' 
  )


#Figure 1.07: Splitting of the statezip string to state and zipcode

These query results were saved as a temporary table and used to update the state and zipcode fields of the primary, 2017 table where the primary key, ID, matched. The state and zip fields of unmatched records remained null.

After processing, it was confirmed that all tables were structurally aligned with the same fields. The names of these tables are:

combined2017, combined2018, combined2020, combined2022, combined2023

The fields at this point in the analysis are:

year, distname, schoolname, id, type, street, city, state, zipcode, enrollment, teachernum, gradrate, povindex

With their introduction, a brief explanation of terms is provided below. The field, ‘year’ denotes the graduation year, ‘distname’ refers to the district where that school resides, ‘schoolname’ is the school’s official name, ‘id’ the primary key for this project and is the school’s unique identifier, and ‘type’ is the classification level of school. All schools in this study are high schools and classified by an “H”. The geographical information of ‘street’, ‘city’, ‘state’, and ‘zipcode’ were also included. Lastly, the term ‘enrollment’ refers to the student population of the school, ‘teachernum’ is the total number of teachers on staff, ‘gradrate’ is the number of graduating seniors that year, and ‘povindex’ is the poverty index of that school. As stated in the Data Collection section, a high poverty index corresponds to a high poverty level in that school’s community.

All of this data came directly from the South Carolina Department of Education’s website. Later in this project, charter and public schools will be notated and student-teacher ratio will be calculated.

Data Cleaning

A systematic process was developed and applied to each table to ensure data was accurately and reproducibly cleaned. This process is as follows:

Copy the dataset
Delete rows with null or zero values in graduation rate
Isolate high schools
Check for duplicate entries utilizing DISTINCT
Confirm all necessary fields are populated
Confirm consistent formatting and data types
Trim extra spaces
Validate accuracy of information against raw data

This process was first applied to the 2017 data then completed chronologically. To illustrate the steps above, this study will highlight the 2020 table.

A backup copy was made of the table in BigQuery.
Schools with a zero or an unreported graduation rate will be omitted from this analysis as these are deemed outliers due to faulty reporting. The query seen in Figure 1.08 was used to facilitate this.

DELETE FROM `scedu-418601.2020.combined2020`

WHERE gradrate is null or gradrate = 0

#Figure 1.08: Deleting of null and zero values from the table

Many of the tables still contained data from middle and elementary schools, this was denoted by type M and E, respectively in the type field. High schools were isolated though filtering by type = ‘H’ with the WHERE clause, and saving this query as a new table.
To determine if duplicate records were present for a given school, a query was used to compare the total COUNT of IDs and the COUNT of DISTINCT IDs. If these two numbers were the same, one could confirm that there were no duplicate records present in the table.
Through visual inspection and querying for null values this study confirmed that all fields necessary for analysis were not null. When confirming nominal data fields were correctly populated, it was found that in 2020 forward, several of the records in distname, schoolname, street, and city had strings spread across separate lines with no space between characters. The absence of a uniform string and space between separate lines, prevents us from splitting and indexing the string like we did previously in Figure 1.04. These abnormal strings can be seen in Figure 1.09.

Figure 1.09: Sample of the 2020 table, showing strings spread across multiple lines in the same cell

To rectify this, the correctly formatted, 2018 nominal data will be JOINED with the 2020 year specific, quantitative data. Specifically, RIGHT JOIN was used to retain all records that were present in 2020 but not in 2018. For the records that were present in both tables, COALESCE was used to populate the field with 2018 data, if available. Otherwise, it was filled with the unformatted 2020 data. This merging left several records that were not present in 2018 incorrectly formatted; these were updated manually. This query can be seen in Figure 1.10 and the results were saved as a new table.

SELECT
  t.reportcardyear,
  coalesce(e.distname, t.distname) as distname,
  coalesce(e.schoolname, t.schoolname) as schoolname,
  t.id,
  t.type,
  coalesce(e.street, t.street) as street,
  coalesce(e.city, t.city) as city,
  t.state,
  t.zipcode,
  t.enrollment,
  t.teachernum,
  t.gradrate,
  t.povindex

FROM `scedu-418601.2018.cleaned_combined2018` as e

RIGHT JOIN `scedu-418601.2020.combined2020_step5_comp` as t

ON e.id = t.id


#Figure 1.10: RIGHT JOIN clause, merging 2018 nominal data to 2020 quantitative data

This cadence was followed for future tables, using RIGHT JOIN to merge a target year’s quantitative data with the previous year’s nominal data.

Through visual inspection, consistent formatting was confirmed across all fields. The data types were also established and confirmed during initial data migration, seen in Figure 1.01.
Once all table manipulation had been completed, spaces were trimmed from strings, and the absence of any trailing spaces was confirmed in numerical fields. Trimming of strings can be seen in Figure 1.11. The result of the query was saved as a new table.

SELECT
  reportcardyear,
  trim(distname) as distname,
  trim(schoolname) as schoolname,
  id,
  trim(type) as type,
  trim(street) as street,
  trim(city) as city,
  trim(state) as state,
  trim(zipcode) as zipcode,
  enrollment,
  teachernum,
  gradrate,
  povindex

FROM `scedu-418601.2020.combined2020_step6_comp`


#Figure 1.11: Cleaning of strings for the 2020 table

The following query, as seen in Figure 1.12 was used to confirm if there were any spaces present in any numerical fields. In a SUBQUERY, this study CAST the numerical value as a string, then in the outer query, counted the number of fields where a space was present. This was executed several times for each table, inspecting each of the numerical fields independently (ID, enrollment, teachernum, gradrate, and povindex). Throughout the entire dataset it was confirmed that no spaces were present. The accuracy of this query was confirmed by testing it on a field with a known space.

SELECT
  count(id)

FROM 
  ( 
   SELECT cast(id as string) as id
   FROM `scedu-418601.2020.combined2020_strings_trimmed`
  )

WHERE id like '% %'


#Figure 1.12: Query to confirm presence of trailing spaces in numerical fields

This cleaned data was cross-verified with the raw data for three schools to ensure accuracy.

Merging Tables, Aligning Unique Identifiers, and Recleaning

With the data cleaned, this study’s strategy was to work backward from 2023 and JOIN each school’s yearly, quantitative data into one combined table. When this JOIN was performed between 2023 and 2022, 15 records were left unmatched. This was surprising as one would expect the number of schools to be generally consistent year to year. Following an investigation of the data, it was determined that these unmatched IDs were slight variations of other IDs with the same school name. Many of these being charter schools. The field, ‘id’ was established as the primary key in the beginning of this project and is now found to be mostly reliable, but not universally. To ensure there was no error made through the cleaning process, these differing IDs were verified against raw data and confirmed to be correct.

The approach to resolving this problem was to first build a principal table where primary keys matched between years. This principal table will be in a wide format and built on 2023 nominal data as this is assumed to be the most accurate, then each preceding year’s quantitative data will be JOINED to it. The unmatched keys are then isolated in another query that will be appended to the principal table. This will create some duplicate entries for a given school that will be consolidated once all years are complete.

The following three steps were repeated for all years. First, each preceding year will be linked via LEFT JOIN to the principal table where their IDs match. The result of the query was saved as its own table. This excludes any record where the ID of the master table does not match the preceding years table. The query to facilitate this can be seen in Figure 1.13.

SELECT 
  three.distname,
  three.schoolname,
  three.id,
  three.type,
  three.street,
  three.city,
  three.state,
  three.zipcode, 
  three.enrollment as enrollment_2023,
  three.teachernum as teachernum_2023,
  three.gradrate as gradrate_2023,
  three.povindex as povindex_2023,
  two.enrollment as enrollment_2022,
  two.teachernum as teachernum_2022,
  two.gradrate as gradrate_2022,
  two.povindex as povindex_2022


FROM `scedu-418601.2023.cleaned_combined2023` as three 

LEFT JOIN `scedu-418601.2022.cleaned_combined2022` as two 

ON three.id = two.id


#Figure 1.13: LEFT JOIN to build the principal table of correctly formatted nominal data with year specific quantitative data

Second, the excluded records from this LEFT JOIN, unique to the preceding years data will be isolated. This will be facilitated through a RIGHT JOIN of the master table WHERE the master tables ID is null. I refer to this type of JOIN as an OUTER RIGHT JOIN. The result of this query was also saved as its own table. The query can be seen in Figure 1.14.

SELECT 
  two.reportcardyear,
  two.distname,
  two.schoolname,
  two.id,
  two.type,
  two.street,
  two.city,
  two.state,
  two.zipcode, 
  two.enrollment as enrollment_2022,
  two.teachernum as teachernum_2022,
  two.gradrate as gradrate_2022,
  two.povindex as povindex_2022

FROM `scedu-418601.2023.cleaned_combined2023_copy` as three
 
RIGHT JOIN `scedu-418601.2022.cleaned_ combined2022` as two 

ON three.id = two.id

WHERE three.id is null


#Figure 1.14: OUTER RIGHT JOIN to isolate remaining records

The JOINED relationships can be better visualized through the Figure 1.15.

Figure 1.15: LEFT JOIN, Figure 1.13 (left) and OUTER RIGHT JOIN, Figure 1.14 (right)³

Lastly, UNION ALL was used to combine these two tables, including all null values. This workflow as continued by adding the 2020, 2018, and 2017 tables to the principal table.

After compiling all data into a wide format, there was a total of 291 schools. Several of these are the same school but with differing IDs from year to year. The dataset was queried to determine the count of duplicate school names. This resulted in the 22 school with two or three instances in the table.

The following was the process for cleaning and consolidating these duplicate schools:

Update a seperate query that depict duplicate school names.
Isolate individual schools through filtering with WHERE. A sample of this result can be seen in Figure 1.16.
Manually consolidate older data into the more recent record. A query based off the sample table in Figure 1.16 can be seen in Figure 1.17.
Confirm that the old ID was not used for another school.
Delete the record with the old ID.

Figure 1.16: 2020 and 2018 data from the same school documented under different IDs

UPDATE `scedu-418601.2022.all_years.hs_consolidation`

SET
  enrollment_2018 = 665
  teachernum_2018 = 48

WHERE id = 3809024


#Figure 1.17: Updating and consolidating numerical data

Through the clause ORDER BY, school names and streets were organized alphabetically. From this, six other instances of duplicate entries of the same school were found, but with varying spelling or abbreviations. These were corrected in the same manner as above.

The final number of South Carolina high schools that tracked senior graduation and existed between 2018 and 2023 is 257.

Data Analysis

To clarify this study’s goals again, the initial questions are:

What impact did COVID-19 have on traditional public and charter high school graduation rates?
What impact does a school’s poverty level have on their graduation rate and how did COVID-19 influence this relationship?
What was the impact of a school’s student-teacher ratio on their graduation rate?
Which schools demonstrated the greatest improvement following COVID-19? Conversely, which schools experienced the most significant setbacks?

In order to draw a distinction between charter and traditional high schools, a field was created in BigQuery to denote their respective classification as ‘C’ and ‘P’. The student-teacher ratio for each school was calculated and also included in the dataset.

In the early stages of analysis, this study encountered the problem of not being able to categorize data chronologically. This was the result of the dataset being in a wide format with each individual school having one row. To categorize by year, the wide data would need to be TRANSPOSED into long. This was accomplished by first transposing each year’s data by itself then appending them to each other using UNION ALL to create a long table. The query used to accomplish this can be seen in Figure 2.01.

SELECT distname, county, schoolname, id, type, street, city, state, zipcode, '2023' as year, 
enrollment_2023 as enrollment, teachernum_2023 as teachernum, gradrate_2023 as gradrate, povindex_2023 as povindex
FROM `scedu-418601.final_dataset.student teacher ratio included` 

UNION ALL

SELECT distname, county, schoolname, id, type, street, city, state, zipcode, '2022' as year, 
enrollment_2022 as enrollment, teachernum_2022 as teachernum, gradrate_2022 as gradrate, povindex_2022 as povindex
FROM `scedu-418601.final_dataset.student_teacher_ratio_included` 

UNION ALL

SELECT distname, county, schoolname, id, type, street, city, state, zipcode, '2020' as year, 
enrollment_2020 as enrollment, teachernum_2020 as teachernum, gradrate_2020 as gradrate, povindex_2020 as povindex
FROM `scedu-418601.final_dataset.student_teacher_ratio_included` 

UNION ALL

SELECT distname, county, schoolname, id, type, street, city, state, zipcode, '2018' as year, 
enrollment_2018 as enrollment, teachernum_2018 as teachernum, gradrate_2018 as gradrate, povindex_2018 as povindex
FROM `scedu-418601.final_dataset.student_teacher_ratio_included` 

UNION ALL

SELECT distname, county, schoolname, id, type, street, city, state, zipcode, '2017' as year, 
enrollment_2017 as enrollment, teachernum_2017 as teachernum, gradrate_2017 as gradrate, povindex_2017 as povindex
FROM `scedu-418601.final_dataset.student_teacher_ratio_included` 


#Figure 2.01: Transposition of wide data into long

Prior to investigating any relationships between variables, this study wanted to establish a general understanding of the graduation trends present since the beginning of our dataset. From Figure 2.02, it can be seen that while traditional public high school graduation rates remained consistent over the sampling period, charter schools have been improving year over year since 2017 until 2022. An 11% increase in charter school’s overall graduation rate was observed during this time.

Figure 2.02: South Carolina, public and charter high school graduation rates between 2017 and 2023

With a better understanding of overall graduation trends, this study wanted to investigate the factors that impacted this behavior, the first of these being a school’s poverty index. To determine the degree which graduation rate and poverty index were related, the Pearson coefficient of correlation (r) was calculated utilizing the CORR function in BigQuery. The Pearson coefficient is an analytical tool that measures the strength and direction of the relationship between two continuous variables. It ranges from 1 to -1 where 1 indicates a perfect positive linear relationship and -1 indicates a perfect negative linear relationship.⁴ Meaning that the closer the value is to 1 or -1, the stronger the two variables relationship.

The coefficient for these variables across the whole sample, can be seen below. This value represents a moderate negative correlation between the two variables.

This negative or inverse relationship between graduation rate and poverty index is expanded on in Figure 2.03. Here, the data was aggregated by pre-COVID (2017 and 2018) and post-COVID (2022 and 2023) values then further segmented by a school’s classification as public or charter. From this figure, we can see that both public and charter schools exhibited higher graduation rates in communities where the poverty index was lower. Put differently, students from more affluent communities had a higher likelihood of graduating high school. Comparing these two tables, one can observe a much more dispersed distribution of public school graduation rates post-COVID, while the overall relationship to poverty index remained consistent. This can be contrasted with a general increase in graduation rates observed from charter schools. This aligns with the data presented in Figure 2.02. It should be noted that for the charter school cohort, the number of schools increased from 30 to 42 between these time periods.

Figure 2.03: Relationship between graduation rate and poverty index, preceding and following COVID-19

To gain a better idea of the impact COVID-19 had on this data, the change in each school’s graduation rate from 2020 to 2023 was plotted against its poverty level from that same period. In Figure 2.04, one can see that as poverty increases, the variance in graduation rates increase as well. An analysis of those which experienced the greatest changes in graduation rate, will be discussed later in the study.

Figure 2.04: Change in graduation rates since 2020, relative to their poverty index

Next, this study wanted to look at the relationship between student-teacher ratio and a school’s graduation rate. As with poverty index, the degree of correlation between student-teacher ratio and graduation rate was calculated and can be seen below. Due to the variation in these figures, pre-COVID, post-COVID, public, and charter schools were segmented independently. Though none of these ratios are exceptionally strong, it should be noted that prior to COVID-19, there was a positive relationship between a school’s student-teacher ratio and their graduation rate. This counterintuitive figure indicates that pre-COVID, South Carolina public high schools with a higher ratio of students to teachers would also exhibit higher graduation rates. Post-COVID, this relationship did shift to where there was almost no correlation between these variables.

These relationships are represented graphically in Figure 2.05. South Carolina Virtual Charter School and Cyber Academy of South Carolina were omitted from pre-COVID data due to their extraordinarily high student-teacher ratio that distorted the rest of the data, 173.4 and 134.5 respectively. Their ratio fell within a normal distribution post-COVID, and were included in the visualization.

Figure 2.05: Relationship between graduation rate and student teacher ratio, before and after COVID-1

The aggregate numbers for pre-COVID (2017 & 2018) and post-COVID (2022 & 2023) statewide student enrollment and teacher count, broken down by type of school can be found in Figure 2.05.a. This data was extrapolated to calculate student teacher ratio and can be found in Figure 2.05.b. From these tables one can observe a considerable, 56% increase in charter school enrollment following COVID-19. This was accompanied by the addition of 70% more teachers resulting in an improvement in their overall student teacher ratio.

Figure 2.05.a: Change in enrollment and teacher count

Figure 2.05.b: Change in student-teacher ratio

To recognize the underlying factors that drove some schools to excel, while others digressed in the years following COVID-19, this study will look at the most extreme cases for both of these scenarios. From Figure 2.06, it can be seen that the graduation rates from Odyssey Online and Greg Mathis Charter High School improved almost 40% from 2020. South Carolina Virtual Charter School and Great Falls High School had the next best improvements with 23.7% and 19.7%, respectively. It should be noted that from this group, the only traditional public high school is Great Falls High School. Now looking at the other end of the spectrum. Allendale-Fairfax High and Ridgeland-Hardeeville High School experienced the greatest drop in graduation rates since 2020, with the number of successfully graduating seniors falling over 20% in both cases. Following these two were Mullins High School and Swansea High School, where graduation rates that fell by 14.3% and 12.3%.

Figure 2.06: Schools impacted the most by remote learning

Conclusion

The COVID-19 pandemic destabilized the lives of everyone on a global scale, but no two people were impacted the same. Some presented little to no symptoms while others spent weeks in the hospital. Regardless of their own condition, the world, and how they interacted with it changed dramatically. From this study, we can see that academic institutions, much like people, were each affected in their own, unique way. Many South Carolina public schools, especially those in affluent communities, experienced little disruption in the years following 2020. Meanwhile, some leveraged the advances in remote learning to improve the quality of the education they provided and saw their graduation rates grow. Others, however, adjusted poorly. This was indicated by falling rates of graduating seniors, and in extreme cases the closure of two schools, Phoenix Charter High School and Richland 2 Charter High School. These schools, like some of those who contracted the virus, experienced a fatal encounter that left a gap in their community.

From Figure 2.02, one can see that the average graduation rate for public schools remained constant while charter schools experienced improved graduation rates in the years preceding COVID-19, then continued this trend until 2022 where they were roughly equivalent with their public school counterpart. To contextualize this, two factors were analyzed in regard to their impact on graduation rate: poverty index and student teacher ratio.

Shared by both public and charter schools, an inverse relationship between a school’s poverty index and their graduation rate can observed in Figure 2.03. With a Pearson coefficient of -0.412 we can conclude that statistically, these two variables have a fairly strong degree of correlation. Though the pre-COVID and post-COVID data initially appear very similar, upon closer inspection, the distribution of both school types is markedly different. Post-COVID, the distribution of public schools can be seen to be more disperse across the range of graduation rates, while charter schools consolidated greatly above 90% in comparison to their pre-COVID values. This can be visualized more clearly in Figure 3.01, found in the appendix. The behavior of public schools during this time period can be seen to be highly diverse, with some schools maintaining graduation rates while others greatly improved or declined. But for charter schools, what could cause such a dramatic shift in performance over just a three-year period? Contributing in part to this were the numerous virtual high schools that thrived thanks to their existing infrastructure. Equipped with an established online curriculum, these schools transitioned effectively to widespread remote learning, providing students with flexible schedules and personalized coursework.

While socioeconomics is a good indicator of academic achievement, within this data, another trend arose. Seen in Figure 2.04, the variability in graduation following COVID-19 was significantly lower for schools with a lower poverty index. This observation, that affluent communities facilitated stable academic results in the midst of uncertainty, is unsurprising. But as you move higher in the spectrum, the observed change becomes much more dispersed, in both a positive and negative fashion. It’s clear that while there is a correlation between economics and performance, it is far from the rule. A stronger conclusion to be drawn from this data may be that of stability. Economically advantaged communities don’t always have the best performance, but throughout COVID-19 they did foster a more stable environment which produced more stable results.

Along with poverty index, this study analyzed the impact student-teacher ratio had student success. From the Pearson coefficient, it was determined that charter schools demonstrate an inverse, moderate relationship between the number of successfully graduating seniors and the ratio they hold with their teachers. This behavior is to be expected. As the ratio of students to teachers becomes more favorable, classroom size will diminish, attention becomes more personalized, and the success of students should scale appropriately. But with public schools, an inverse pattern arises. Here, an inconsistent coefficient was calculated; pre-COVID there was a positive correlation between student-teacher ratio while post-COVID these two were not correlated at all, 0.354 and -0.061, respectively. This suggests that prior to 2020, in the state of South Carolina, public schools with the most favorable student-teacher ratios also exhibited the lowest graduation rates; a finding that seems to challenge prevailing narratives in education.

Through further analysis and seen in Figure 3.02 in the appendix, it was identified that many public schools with the lowest student-teacher ratios also were the most disadvantaged economically. And as determined earlier in this discussion, these economically disadvantaged schools are often the most susceptible to low graduation rates. From this, we can conclude that the state of South Carolina has asymmetrically allocated teacher personnel to help support chronically underperforming, economically disadvantaged schools. This initiative helps explain the pre-COVID, positive correlation between graduation rates and student teacher ratio and is an inspiring deviation from historical norms. One where classrooms in low-income communities often suffered from the overwhelming numbers of students.⁵ While at face value this appears to be a favorable statistic, unfortunately it’s at these schools where the highest rates of teacher turnover are reported. In fact, poverty level was found to be the strongest corollary to teacher retention of all school-level factors.⁶ Quantity is being utilized as a remedy for quality as many of these instructors are early in their career. And where instability is observed in graduation rate, the product of education, instability can also be found in the educators, the infrastructure that student success is built on.

Following COVID-19, we can see in Figure 2.05.a, that there was a departure of almost 500 public school teachers while the number of students increased by over 9,300. This resulted in the overall student-teacher ratio to increase by 9% while the public school graduation rate remained consistent over this time. From this, it can be concluded that student-teacher ratio at public schools in South Carolina has very little impact on graduation rates. Charter schools on the other hand exhibited a moderate correlation between student-teacher ratio and graduation rate, pre-COVID. Post-COVID, this relationship strengthened while adding 56% more students, 70% more teachers, and seeing graduation rates continue to rise. Given this growth compared to the stagnation of their public school counterpart, one can see a trend of students and staff migrating to charter schools, specifically virtual charter schools, during the pandemic. This was influenced in part by concerns surrounding virus transmission in traditional schools, prompted families to turn to virtual options for a safer learning environment. These schools also provided access to innovative teaching methods, eliminated geographic barriers, and provided stability in uncertain times.

Lastly, this study examined the distinguishing characteristics of schools that experienced the most significant improvement and regression in the wake of COVID-19. Many of the conclusions drawn by this study are further illustrated by these extreme cases. In the top four schools, three were charter schools with two of these, South Carolina Virtual Charter School and Odyssey Online Learning, being virtual programs. In the years following the pandemic, these schools saw both graduation rates and student enrollment drastically climb. While their success should not be discounted, the context of their initial academic standing is important to understand. The virtual schools above and the other charter school, Greg Mathis Charter High School comprised three of the bottom five worst performing schools in the state of South Carolina, in 2020. Beside stagnation, progress upwards was the only avenue for these institutions. But they did, to a tremendous degree. Greg Mathis Charter High School, only graduated a quarter of their students in 2020. Despite having the highest poverty index in the state, they improved the number of graduating seniors to almost 70%. Though this is still quite far off from the state average in 2023 of 85%, given their socioeconomic status, this is a monumental step in the right direction.

Greg Mathis may have been the poorest institution in South Carolina, but economic insecurity is a through line for all schools in this sub-analysis. This includes the only public school with a substantial positive change during this period, Great Falls High School. Ranking 54th in the state’s poverty index, Great Falls improved from 76.7% of seniors successfully receiving diplomas in the beginning of COVID-19, to 96.4%, three years later. In this time, their student teacher ratio increased as well. In fact, an increase in the number of students to faculty was observed in all eight of these schools, except the South Carolina Virtual Charter School. These findings further support that student-teacher ratio itself does not directly promote academic achievement. But then what does?

Looking now at the high schools who regressed the most following COVID-19. In 2020, these schools had a near average graduation rate, favorable student-teacher ratios less than the state average of 17:1, and reported poverty indexes in the top 30%. Geographically, they are all located in rural communities and besides for Swansea High School, are centralized to the low country of South Carolina.

They also reported a similar perception of school climate. This metric, which began recording after 2020, is an aggregate measure of the instructional focus, safety, and environment of the school, taken from a sample of students and teachers.⁷ On a scale of 0.0 to 5.0, with 5.0 being excellent, these schools averaged 2.5 and is considered below average. Contrast this with the four schools on the other end of the spectrum, collectively reporting a 3.8 and is considered above average. Self-reported data like this is incredibly vulnerable to manipulation, but these figures still paint an important picture of how students see their school and in turn, their level of respect for it. Characteristics like the disrepair of facilities or a lack of safety can be signals to the general population of the school’s quality, or lack thereof. Left unchecked, disorder like this leads to greater disorder and can permeate into classrooms and how students approach their coursework. In some situations, student success and graduation rate can become a product of the environment instead of the quality of the education itself.

In conclusion, the COVID-19 pandemic significantly disrupted academic institutions, revealing stark disparities in how schools managed the crisis. In South Carolina, affluent schools experienced little disruption in graduation rates due to their robust infrastructure and stable support systems. Conversely, high poverty index schools faced substantial challenges, including higher teacher turnover and lack of support, which correlated with declining graduation rates. Charter schools, especially those with established virtual programs, demonstrated remarkable resilience and growth, capitalizing on their pre-existing remote learning frameworks. These schools saw a consolidation of graduation rates above 90%, contrasting with the more dispersed outcomes of public schools. This analysis highlights the lack of impact that student-teacher ratio has on graduation rates. Instead, it is the support structures within a community that plays the most critical role in facilitating academic success. Ultimately, this study underscores the importance of stability in education, particularly during times of crisis, and suggests that targeted efforts to bolster teacher longevity and student perception at economically disadvantaged schools could improve future outcomes.

Limitations

Graduation rate was analyzed because of its consistency between datasets. While beneficial, it only provides a narrow view of education. One that focuses on completion without considering educational quality or student engagement. For a comprehensive assessment, multiple metrics including academic achievement, student well-being, and post-graduation success should be considered.

Future Considerations

One trend I found interesting was the low graduation rates observed at schools with the favorable student-teacher ratios. We identified that many teachers at these schools have a relatively short tenure. If available, I would like to incorporate the average tenure of teachers at each school so its impact can be better understood.

Also, the perception of school climate I believe could be one of the strongest indicators of overall performance. The only reason it was not a primary focus of this study was because its data collection only began after 2020. This metric can be related to the Broken Windows theory developed by Dr. James Wilson and Dr. George Kelling, then popularized through its implementation by the New York City Police Department in the 1990’s. Simply stated, a remedally maintained environment creates a negative feedback loop of worsening conditions and attitudes, unless purposeful and targeted intervention is introduced. These new metrics, along with poverty index and student teacher ratio, I believe could contribute to a better understanding of what drives student success.

Lastly, as I progress in my data analytics journey, I would like to develop a model using the variables on hand that can be used to predict the success of future classes.

Lessons Learned

This being my first data analysis project, there were many learning experiences. The initial strategy of creating individual datasets for each year then joining them at the end was incredibly inefficient. This created more work for me overall and also hid the inconsistent IDs, that were used as the primary key. Best practice moving forward will be to consolidate all data into one dataset to begin with.

I also wasted time cleaning data that was not necessary for analysis, specifically the oddly formatted district and street names (Figure 1.09). Working from one dataset would have mitigated some of this but the exercise was unnecessary to accomplish my goals. The alignment of formatting really only impacted the cosmetics of my SQL query. In a broader sense, the fields selected should be the minimum critical to successfully perform an analysis. Superfluous columns such as zip code and state took time to clean but provided no value. Best practice moving forward will be to only select the fields that are necessary for analysis and validation.

Lastly, the narrative of this project was entirely too long. As this was an exercise to showcase my analytical and software capabilities, that should have been my primary focus. While I accomplished those goals, I also wanted to craft a beautiful essay. And though I thoroughly enjoyed the data journalism above, its application to industry is limited. Best practice moving forward will be to keep all prose concise.

References

Zajacova, A., Lawrence, EM. “The Relationship Between Education and Health: Reducing Disparities Through a Contextual Approach”. Annu Rev Public Health. 2018 Apr 1;39:273–289. doi: 10.1146/annurev-publhealth-031816–044628. Epub 2018 Jan 12. PMID: 29328865; PMCID: PMC5880718
“The Pupil in Poverty Indicator”. The ORDA Download. Volume 1, Issue 4, August 2018. https://ed.sc.gov/data/information-systems/accountability-resources/the-orda-download/the-orda-download-volume-1-issue-4/
“SQL Joins”. Tech Agilist. https://www.techagilist.com/mainframe/db2/outer-join-step-by-step-walkthrough-with-examples/
“Pearson’s Correlation Coefficient”. In: Kirch, W. (eds) Encyclopedia of Public Health, 2008.Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5614-7_2569
Harris, D.N., Phillip Levine, L.P. and Carly D. Robinson, K.M. (2023) Do school districts allocate more resources to economically disadvantaged students?, Brookings. https://www.brookings.edu/articles/do-school-districts-allocate-more-resources-to-economically-disadvantaged-students/
Starrett, A. Ph.D., Barth, S., Gao, R. Liu, J. Ph.D., DiStefano, C. Ph.D. (2023) South Carolina Teacher Retention Rates for the 2020–2021 Academic Year: One-Year and Three-Year Averages. SCTeacher. https://sc-teacher.org/wp-content/uploads/2023/02/FINAL-SCT-2023-004-Teacher-Retention-Report.pdf
“South Carolina Department of Education’s Annual School and District Report Card System for South Carolina Public Schools and School Districts 2022–2023”. South Carolina Department of Education, Office of Research and Data Analysis. (2023). https://screportcards.com/files/2023//data-files/