Demystifying California State Tests Part 2: The Biases

Tracy in LAUSDLand
13 min read · Feb 10, 2023

--

J. M. Urrutia, Ph.D.

February 8, 2023

Photo (Shutterstock)

In my effort to unpack the rhetoric about California test scores, I explained in “Demystifying California State Tests. Part 1: The Sky Is Not Falling” that politicians’ demands for 100% “proficiency” are not just a fool’s errand but an impossible goal because of the design of the tests currently in use in California. In this part, I examine the biases inherent in the tests.

The documentation of the Smarter Balanced Assessment Consortium (SBAC) tests shows they are generated from a “large pool of test questions statistically calibrated on a common scale to cover the ability range.” It is therefore not surprising that the technical reports produced by the SBAC vendor, Educational Testing Service (ETS), contain a narrative describing in detail the manipulation of the raw scores produced by the testing.

There is no doubt that ETS personnel have the technical expertise required to design, generate, and properly process the outcomes of the test. However, their product is designed to compare results from one year to the next on a fixed scale, which makes the annual increase in scores that politicians and others have demanded for many years a mathematical impossibility.

It must be noted that the conversion of raw scores to scaled scores is the product of considerable statistical manipulation that is likely completely opaque to the non-specialist. Despite these caveats, ETS does publish the distribution of scores for each grade tested; the distributions differ from grade to grade because each grade receives a different battery of questions.

One can therefore explore the information made available for each test administration and search for patterns. ETS makes the data available in what it calls “research files.” The website’s root URL, https://caaspp-elpac.ets.org/caaspp/Default, is static, but the page it serves is generated dynamically. The results can be reached by clicking the button labeled English Language Arts/Literacy and Mathematics; a link labeled “Research Files” within that page leads to the data for all administrations since 2015.

The structure and contents of the research files are also documented there, informing the researcher that student performance is reported for 54 subgroups for each educational entity (state, county, district, school), for a total of 11,340 entities in 2022. The research file for 2022 is a text file containing 3,855,782 lines; it is 442.6 MB after expansion (85.1 MB compressed). As noted in Part 1, the distribution of scores is different for each grade tested, and the data chosen for this analysis come from the 2019 ELA administration for 3rd graders. Throughout this section, the data for this population are examined to compare how different student subgroups responded to the test; I therefore cannot offer any opinion on whether the 3rd grade SBAC test is a “reasonable” test. For those who wish to judge for themselves, the CDE offers a website where sample tests can be taken online: https://www.caaspp.org/practice-and-training/.
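For readers who want to repeat this kind of subgroup extraction, a minimal sketch follows. The column names, subgroup codes, and sample lines below are illustrative placeholders, not the real research-file layout — the actual field list must be taken from the file structure documented on the ETS page; only the 21.32% disabled-subgroup value echoes a figure cited later in this article.

```python
import csv
import io

# Hypothetical excerpt mimicking a delimited CAASPP research file.
# Real column names, order, and subgroup codes come from the published
# file layout; these are stand-ins for illustration only.
SAMPLE = """\
Year^EntityCode^SubgroupID^Grade^TestID^PctMetAndAbove
2019^0000000^1^3^2^51.10
2019^0000000^128^3^2^21.32
"""

def rows_for(grade, test_id, text):
    """Yield parsed rows for one grade/test from a caret-delimited file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="^")
    for row in reader:
        if row["Grade"] == grade and row["TestID"] == test_id:
            yield row

# Pull every subgroup row for the 3rd grade administration of one test.
matches = list(rows_for("3", "2", SAMPLE))
```

The same filter, pointed at the full multi-million-line file, is all that is needed to isolate any subgroup discussed below.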

The simplest question that can be asked of the data is: what is the difference in academic achievement, as measured by the SBAC, between children labeled as “disabled” and their non-disabled peers? Of course, the disability label is rather broad and covers everything from mild learning disabilities to severe disability conditions. Disabled students are provided with accommodations in order to participate in the SBAC administrations, but their scores are still processed the same way and included in the final results. There is an expectation that disabled students cannot perform on par with their non-disabled peers.

Separating the data confirms this expectation as shown in Fig. 1b: the proficiency level of non-disabled 3rd graders in ELA for the entire state increases to 52.1% after the disabled students are disaggregated. The proficiency of disabled students is 21.32%, and a gap of 30.78% exists between the two groups. While the levels are similar between the entire state and the entire County of Los Angeles, the proficiencies for LAUSD students are lower: 46.99% for non-disabled students and 14.66% for disabled students. Despite the lower levels, the gap is roughly the same: 32.33%.

Figure 1: (a) Percent of 3rd grade students in the defined achievement bands for the 2019 ELA administration as obtained from the research file. (b) Same as (a) but disaggregated for disability. (c) Percent of proficient and non-disabled 3rd grade students in California schools as a function of school poverty. (d) Same as (c) but for disabled students. (e) Same as (c) but for students attending LAUSD schools. (f) Same as (e) but for disabled students.

As noted in Part 1, there is an expressed concern that the proficiency levels of LAUSD students are lower than the state’s as a whole. True. But there is plenty of unfounded criticism that students in the District are not well served by their teachers and their administrators. It is true that activists in the community claim that not enough resources are invested in schools and often demand considerable increases in local school funding. They argue that more funding goes to wealthy areas of the District to the detriment of underserved areas. This is not true today. Historically, yes, funding did favor the wealthier areas of LAUSD up until the late 1970s. But the passage of Proposition 13 upended the way schools were funded. Since then, LAUSD has funded all school operations based on a school’s enrollment, a practice that was ruthlessly reinforced during the Great Recession (those interested in this issue should take a look at the staffing bulletins found at this URL: https://achieve.lausd.net/Page/18003). The purse strings were to be loosened after the full implementation of the Local Control Funding Formula, which would drive more funding to underserved schools through the Supplemental and Concentration Grants. This did not happen at LAUSD, and that failure helped lead to the UTLA strike of 2019.

As the schools are currently funded at the same insufficient level, what then accounts for the difference in scores between the state and LAUSD? Analysis of the data suggests a very simple answer: it is the socioeconomic environment in which the students have been raised and live. The proficiency level of a school for any student subgroup can be examined as a function of the poverty of the school. However, the poverty level of a school is not available in the proficiency research files. That information can be extracted from the Free or Reduced-Price Meal (Student Poverty) Data files maintained by the CDE.
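Pairing the two sources amounts to a join on a common school identifier (CDE files key schools by CDS code). The sketch below shows the shape of that join; the identifiers and percentages are made-up illustrations, not values from the real proficiency or FRPM files.

```python
# Sketch of joining school-level proficiency with FRPM poverty rates,
# keyed on a school identifier such as the CDS code used in CDE files.
# All records below are illustrative placeholders.
proficiency = {"19647330000001": 46.5, "19647330000002": 71.2}  # % met+
frpm = {"19647330000001": 88.0, "19647330000002": 24.0}         # % FRPM

def join_on_school(prof, poverty):
    """Return (poverty, proficiency) pairs for schools present in both files."""
    return [(poverty[k], prof[k]) for k in sorted(prof) if k in poverty]

# Each pair becomes one point in the poverty-vs-proficiency scatter plots.
pairs = join_on_school(proficiency, frpm)
```

Schools missing from either file simply drop out of the scatter, which is why only schools reporting both quantities appear in the figures.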

As shown in Fig. 1c, the proficiency of a school’s non-disabled students is related to the school’s poverty. Yes, there is a spread around the “trend-line,” but it is undeniable that proficiency is higher when the school community is less poor. (For those interested, the least-squares fit for Fig. 1c is proficiency = 82.43 − 0.575 × school poverty, with an R-squared value of 0.561.) Figure 1e shows that the schools in LAUSD’s territory (including all charter schools authorized by LAUSD) include a greater proportion of high-poverty schools, whose low proficiencies lower the District’s overall proficiency as shown in Fig. 1b. The effect of the school’s poverty on the proficiency of disabled students is starkly evident in Fig. 1f, where the majority of schools reporting results for this subgroup report low levels of proficiency.
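The quoted trend-line is an ordinary least-squares fit, which anyone can reproduce on the joined data. A minimal stdlib-only sketch follows; the five (poverty, proficiency) points are synthetic, chosen only so the fitted line loosely echoes the published slope and intercept, and are not the real school data.

```python
# Ordinary least-squares fit of proficiency (y) against school poverty (x),
# returning intercept a, slope b, and R-squared for y = a + b*x.
def ols(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n          # mean poverty
    my = sum(y for _, y in points) / n          # mean proficiency
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx                               # slope
    a = my - b * mx                             # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in points)
    ss_tot = sum((y - my) ** 2 for _, y in points)
    return a, b, 1 - ss_res / ss_tot

# Synthetic (poverty %, proficiency %) pairs for illustration only.
pts = [(10, 75), (30, 68), (50, 52), (70, 44), (90, 30)]
a, b, r2 = ols(pts)
```

Run on the full set of schools, the same computation yields the 82.43 intercept, −0.575 slope, and 0.561 R-squared quoted above; the modest R-squared is the “spread around the trend-line” visible in the figure.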

Figure 1d makes it evident that when the poverty of the school’s community is lower, higher achievement is possible for its disabled students. Since the educational activities of all schools are funded equally, these increases are likely due to greater access to enriching activities within the community itself.

This, sadly, is unlikely to happen in underserved communities.

Before continuing this exploration, it is worthwhile to ask whether there is a gender bias in the SBAC 3rd grade test. Not surprisingly, there is. But it is not the one most would expect: the data show that female students meet the ELA standards more readily than males. This is shown in Fig. 2b, where the gap between female and male students is roughly 7% regardless of which regional cohort is examined.

Figure 2: (a) Percent of 3rd grade students in the defined achievement bands for the 2019 ELA administration as obtained from the research file. (b) Same as (a) but disaggregated for gender. (c) Percent of proficient 3rd grade male students in California schools as a function of school poverty. (d) Same as (c) but for female students.

Figures 2c-d show that the community’s poverty is still a factor, but the achievement gap arises from the entire female population scoring higher than its male counterpart. The spread around the trend-line is also smaller for the female cohort.

In theory, the ELA tests are an inquiry into the ability of children to use the English language to communicate effectively and to understand texts, whether informational or narrative. There is abundant evidence, easily accessed by searching for “importance of reading to children,” that introducing children to narrative texts as early as possible increases their vocabulary and communication ability.

Since the availability of reading material correlates highly with the parents’ level of formal education as well as with their disposable income, it is reasonable to compare how the scaled scores are distributed by parental education and by economic disadvantage. This is done in Figs. 3a, c, and e and in Figs. 3b, d, and f, respectively. California children whose parents have less than a high school education have a proficiency of 27.53%, while 75.16% of children whose parents have a postgraduate degree are proficient, a gap of 47.63%. This gap is larger than the gap attributed to economic disadvantage, 31.51%. The results are similar for LAUSD: the gap due to parental education, 45.39%, is larger than the gap due to economic disadvantage, 36.55%.
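Each gap cited here is a straight percentage-point difference between two subgroup proficiencies taken from the research file. A tiny sketch, using the statewide parental-education figures quoted above, makes the arithmetic explicit:

```python
# Statewide 3rd grade 2019 ELA proficiency percentages quoted in the text.
proficiency = {
    "parents: less than high school": 27.53,
    "parents: postgraduate degree": 75.16,
}

def gap(levels, low, high):
    """Percentage-point gap between two subgroups, rounded to 2 decimals."""
    return round(levels[high] - levels[low], 2)

parental_gap = gap(proficiency,
                   "parents: less than high school",
                   "parents: postgraduate degree")
# 75.16 - 27.53 = 47.63 percentage points, the gap cited above.
```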

Figure 3: (a) Percent of 3rd grade students in the defined achievement bands for the 2019 ELA administration as obtained from the research file disaggregated by level of parental education. (b) Same as (a) but disaggregated by economic status. (c) Percent of proficient 3rd grade students in California schools as a function of school poverty who have parents with a college diploma. (d) Percent of proficient 3rd grade students in California schools as a function of school poverty who are not economically disadvantaged. (e) Same as (c) but for students with parents with a high school diploma. (f) Same as (d) but for students who are economically disadvantaged.

Displaying the proficiency of the students as a function of the school’s poverty confirms what the performance bands indicate: 3rd graders score higher as a group if their parents have a higher educational background. These graphs also indicate that the poverty of the school community as a whole is not the main determinant of the proficiency level of the subgroup. That is, a child with a parent with college education is likely to be proficient even while attending a school where the poverty is high.

Similarly, a child with a parent with a high school diploma is not as likely to be proficient even though s/he is attending a school where the community is not economically disadvantaged. Hence, the argument that the school’s poverty is dragging down the scores is more than likely not true because the root reason is the cultural and linguistic poverty in which the child has been raised. This suggests that the best way to reduce the performance gap is to focus on programs that help children to develop higher levels of language and listening skills. While these programs should include early childhood components, they must certainly be implemented in the early grades.

The alternative to implementing such programs is to pour resources into increasing the number of college graduates in the state. As noted in Fig. 3a, 151,236 3rd graders in California had college-educated parents while 246,428 did not. Increasing the income level of the non-college educated would cost far more than investing in literacy programs that repair the damage to language skills that poverty causes.

The patterns displayed when the proficiency of 3rd graders is categorized by economic disadvantage and plotted against the school’s poverty suggest other generalizations. For instance, the schools with the lowest poverty show a narrowing of the spread in proficiency when the students are not economically disadvantaged, likely because their parents have access to greater educational resources outside the school. Conversely, these same schools are helping some, but not all, of their economically disadvantaged students to perform at higher levels. This achievement is, however, scatter-shot, no pun intended, as the scatter of the points indicates that not all such schools offer the same types of programs that help these students. Nevertheless, this suggests an avenue of research to identify the programs generating these apparent increases.

The media invariably focuses on the lower proficiency of African American and Latino students in comparison to White and Asian American students. Here’s how the Los Angeles Times characterized the 2022 results:

The test results are even more devastating for Black, Latino, low-income and other historically underserved students — 84% of Black students and 79% of Latino and low-income students did not meet state math standards in 2022.

EdSource does no better:

The pandemic’s effects were widespread; the scores fell roughly the same — 5 to 7 percentage points among most racial and ethnic groups. But disparities in scores among those groups were already chasmic, and the declines in 2022 wiped out six years of slow, steady progress since Smarter Balanced was introduced in 2014–15. The 69.4% of Asian students who scored at or above standard in 2022 is more than triple the rate for Latino and Black students.

Indeed, the proficiency rates dropped when compared to 2019 as well as to prior years. But are the drops devastating? To put these interpretations in the context explored above, one needs to know what the gaps were prior to the pandemic and what influence economic disadvantage has. This is done in Fig. 4, where the proficiency of the four major ethnic groups in California is compared to the total population, to each other, and to themselves. Figure 3b is repeated in Fig. 4a to allow comparison of how proficiency responds to economic disadvantage with how it changes when the population is separated by ethnic group in Fig. 4b, the separation also used by the Los Angeles Times and EdSource. Yes, there is a gap between the ethnic groups, but when economic disadvantage is included, the gap is not between ethnic groups but between economically different groups.

Figure 4: (a) Percent of 3rd grade students in the defined achievement bands for the 2019 ELA administration as obtained from the research file. (b) Same as (a) but disaggregated for the four major ethnic groups (White, Asian American, Latino and African American). (c) Same as (b) but for students in the selected ethnic groups who are not economically disadvantaged. (d) Same as (c) but for students who are economically disadvantaged. (e) Same as (a) but for students identified as White. (f) Same as (a) but for students identified as African American.

Figure 4b includes what the media focuses on: Latino and African American students. These groups perform significantly lower than Whites, who themselves perform lower than Asian Americans. The gap between Whites and Asian Americans is interesting: in 2019 it was 5.5% when comparing the full state populations, but it increased to 10.56% for the not economically disadvantaged and 10.79% for the economically disadvantaged. Why focus on these populations? Because economically disadvantaged 3rd grade Latinos and African Americans numbered 213,743 in 2019, more than twice the number of not economically disadvantaged Whites and Asian Americans (93,466). Given that the distribution of scaled scores requires nearly 50% of the population to not be proficient, is it a surprise that half of the population is comprised of poor Latinos and African Americans?

Comparing students of different economic circumstances within the same ethnic group is also informative. The gap between poor and not-poor students ranges between 20 and 34%, with the biggest gap, 33.27%, between poor and not-poor White students at LAUSD. Why hasn’t the Los Angeles Times, the paper of record for the city, ever looked into this fact? Why are there no articles about school issues where worker wages and educational levels are included? Is it because historically the Los Angeles Times has put the blame for poor educational outcomes at the feet of teachers? Why do the Los Angeles Times and other papers in the city not question the biases of these standardized tests?

It is clear that the SBAC tests are biased: they need half the students taking the test to be not proficient in order for the other half to be proficient. The design of the test “volunteers” economically disadvantaged students, who are also highly likely to have parents who do not have a college education, to “fail” the test and perpetually be the scapegoat. This is not hyperbole because the same results were obtained during the California Standards Test era when the tests were not adaptive. As discussed in Part 1, the new-and-improved test is supposed to be better at judging a student’s ability because it “requires fewer questions to obtain an equally precise estimate of a student’s ability.” If the results are the same even after changing the test, doesn’t this lead to the inescapable conclusion that using such a test cannot be appropriate to judge schools?

Nothing in the media nor in the statements from CDE and district administrators indicates that testing is all about accountability. But that is exactly what is stated in Education Code Section 60602.5(a):

It is the intent of the Legislature in enacting this chapter to provide a system of assessments of pupils that has the primary purposes of assisting teachers, administrators, and pupils and their parents; improving teaching and learning; and promoting high-quality teaching and learning using a variety of assessment approaches and item types. The assessments, where applicable and valid, will produce scores that can be aggregated and disaggregated for the purpose of holding schools and local educational agencies accountable for the achievement of all their pupils in learning the California academic content standards.

(emphasis mine). This is the same stick that was waved by the federal No Child Left Behind law: get every student to proficiency or else. Since pesky facts have proven that aspirational goal impossible, no politician talks about this stick anymore. That does not stop some, including Superintendent Carvalho, from still calling for harsh measures, at the local or state level, when scores do not increase.

The data presented in the graphs of this part and Part 1 demonstrate that the SBAC tests are not a metric of students mastering the standards. Instead, they are a method for generating a distribution of scaled scores from a set of questions that forces students into pre-defined bins. Leaders in Los Angeles, as well as educational leaders across the state, must have a serious conversation about what the true outcome of a public education in California should be. The state has been forcing students to take one standardized test or another since the mid-1990s, and the results have been the same. In the meantime, parents at LAUSD and in the rest of the state should opt out of the SBAC tests as permitted by Education Code Section 60615:

Notwithstanding any other provision of law, a parent’s or guardian’s written request to school officials to excuse his or her child from any or all parts of the assessments administered pursuant to this chapter shall be granted.

By this one act, parents will save their child, their school, and our community from wasting resources, and the testing insanity will stop. We are tired of politicians beating their chests and of newspaper articles wasted on test scores that prove nothing. It’s time to start fresh. Unplug From Test Scores.

Finally, for the reader’s information, Table I has the proficiency levels of all the subgroups discussed in this writing. It also includes the data for 2022 so that readers can judge for themselves if the sky is or is not falling.

Table 1: Percent of 3rd grade students in the selected subgroups (number in parentheses) who met the standards for the 2019 and 2022 ELA administrations, as obtained from the respective research files.
