How data analysis helped uncover the “cheating teachers” in Chicago Public Schools

Aneri Sheth · Published in Technology titbits · Sep 3, 2017

I would like to share with you an interesting application of data analysis: uncovering the malpractice of a few teachers in the Chicago Public Schools who doctored their students' test answers to make the students look more knowledgeable and to enhance their own reputations as teachers.

This is a true story from the late 1990s, discussed at length in the famous book Freakonomics by Steven Levitt and Stephen Dubner. What amazes me about this incident is not only the skillful application of data analysis, but also the meticulous, systematic thinking that went into solving the problem.

Chicago Public Schools is a huge system, educating more than 400,000 students a year. During the 1990s, a new concept of “high-stakes” testing was being debated in the US educational system. The testing was called high-stakes because, instead of merely measuring students' progress, it held schools accountable for the results. The Chicago Public School system embraced high-stakes testing in 1996. Under the new policy, a school with low reading scores would be placed on probation and face the threat of being shut down, with its staff dismissed or reassigned. The CPS also did away with what is known as social promotion. In the past, only a dramatically inept or difficult student was held back a grade. Now, in order to be promoted, every student in third, sixth, and eighth grade had to achieve a minimum score on the standardized, multiple-choice exam known as the Iowa Test of Basic Skills.

Although the policy served to raise the standards of learning and incentivize students to study harder, it also tempted students to cheat more, since their promotion to the next grade was now at stake. Children, of course, have had the incentive to cheat for as long as there have been tests. But “high-stakes” testing brought about one radical change: it created a huge incentive for teachers to cheat, because their personal evaluation and growth were now directly linked to the results. If her students perform poorly on the test, a teacher may not be considered for a raise or promotion. If the entire school tests poorly, its federal funding can be withheld and the staff fired. The state of California at one point introduced bonuses of $25,000 for teachers who produced big test-score gains.

As suspicions of cheating teachers surfaced, there was a need to devise a means of uncovering what was going on in the schools.

Chicago Public Schools made available a database of the test answers for every CPS student from third grade through seventh grade from 1993 to 2000. This amounts to roughly 30,000 students per grade per year, more than 700,000 sets of test answers, and nearly 100 million individual answers. The data, organized by classroom, included each student’s question-by-question answer strings for reading and math tests.

Let me take you through the data analysis process that was followed, using some excerpts of this data.

Consider now the answer strings from the students in two sixth-grade Chicago classrooms who took the identical math test. Each horizontal row represents one student’s answers.

The letter a, b, c, or d indicates a correct answer.

A number indicates a wrong answer, with 1 corresponding to a, 2 corresponding to b, and so on.

A zero represents an answer that was left blank.
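
To make this encoding concrete, here is a minimal Python sketch of a decoder for one student's answer string. The function name and the four-option assumption are mine for illustration, not from the original analysis.

```python
# Minimal decoder for the answer-string encoding described above:
# a letter a-d = a correct answer, a digit 1-4 = the wrong option chosen
# (1 -> a, 2 -> b, ...), and 0 = a question left blank.

def decode_answers(answer_string):
    """Return a list of (chosen_option, is_correct); None means blank."""
    decoded = []
    for ch in answer_string:
        if ch in "abcd":
            decoded.append((ch, True))        # correct answer
        elif ch in "1234":
            chosen = "abcd"[int(ch) - 1]      # wrong answer: map digit to option
            decoded.append((chosen, False))
        else:                                 # '0': question left blank
            decoded.append((None, False))
    return decoded

print(decode_answers("a40b"))
# [('a', True), ('d', False), (None, False), ('b', True)]
```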

One of these classrooms almost certainly had a cheating teacher and the other did not. Try to tell the difference, although be forewarned that it's not easy with the naked eye.

Classroom A

Classroom B

If you guessed that classroom A was the cheating classroom, congratulations!! Here again are the answer strings from classroom A, now reordered by a computer that has been asked to apply the cheating algorithm and seek out suspicious patterns.

Classroom A (with cheating algorithm applied)

As can be seen from the answers marked in red, the algorithm managed to surface a very clear, obvious pattern: 15 of the 22 students gave exactly the same six consecutive correct answers, which seems like more than a coincidence when combined with the following information (a sketch of this kind of pattern search follows the list):

  1. These questions, coming near the end of the test, were harder than the earlier questions.
  2. This was a group of average students, and very few of them had gotten six consecutive answers correct anywhere else on the test, making it even more unlikely that they would get six consecutive answers right in the harder part of the test.
  3. Up to this point in the test, the fifteen students’ answers were virtually uncorrelated.
  4. Three of the students (row numbers 1, 9, and 12) left more than one answer blank before the suspicious string and then ended the test with another string of blanks. This suggests that a long, unbroken string of blank answers was broken not by the student but by the teacher.
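
Levitt's actual detector was considerably more sophisticated, weighting question difficulty and each student's ability as points 1 to 3 suggest. But a minimal sketch of its core idea, scanning a classroom for a long block of consecutive answers shared verbatim by many students, might look like this (the function name and the toy data are illustrative assumptions):

```python
from collections import Counter

def most_suspicious_block(answer_strings, block_len=6):
    """Return (start_index, block, n_students) for the identical block of
    consecutive answers shared by the most students in one classroom."""
    best = (0, "", 0)
    n_questions = min(len(s) for s in answer_strings)
    for start in range(n_questions - block_len + 1):
        counts = Counter(s[start:start + block_len] for s in answer_strings)
        block, n_students = counts.most_common(1)[0]
        if n_students > best[2]:
            best = (start, block, n_students)
    return best

# Toy classroom: most students share one identical mid-test block.
classroom = [
    "1a2bdad4cb",
    "302bdad4cb",
    "4acbdad4cb",
    "a1cbdad401",
]
print(most_suspicious_block(classroom, block_len=5))
# (3, 'bdad4', 4): all four students share 'bdad4' starting at question 4
```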

The algorithm also uncovered another important pattern: the six correct answers are preceded by another identical string, 3-a-1-2, which includes three incorrect answers out of four. And on all fifteen tests, the six correct answers are followed by the same incorrect answer, a 4. Why on earth would a cheating teacher go to the trouble of erasing a student’s test sheet and then fill in a wrong answer? Perhaps she is merely being strategic, leaving a trail of wrong answers to avoid suspicion of forgery.

Another indication of teacher cheating in classroom A is the class’s overall performance. As sixth graders who were taking the test in the eighth month of the academic year, these students needed to achieve an average score of 6.8 to be considered up to national standards. (Fifth graders taking the test in the eighth month of the year needed to score 5.8, seventh graders 7.8, and so on.) The students in classroom A averaged 5.8 on their sixth-grade tests, which is a full grade level below where they should be. So plainly these are poor students. A year earlier, however, these students did even worse, averaging just 4.1 on their fifth-grade tests. Instead of improving by one full point between fifth and sixth grade, as would be expected, they improved by 1.7 points, nearly two grades’ worth.
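
To make the grade-equivalent arithmetic concrete, here is a tiny sketch. The "grade plus month/10" norm and the 4.1-to-5.8 jump come from the passage above; the 0.5-point slack in the flagging rule is an illustrative assumption of mine.

```python
def expected_score(grade, month=8):
    # National norm: grade g tested in month m should score about g + m/10
    # (6.8 for sixth graders, 5.8 for fifth graders, 7.8 for seventh).
    return grade + month / 10

def gain_is_suspicious(prev_score, curr_score, expected_gain=1.0, slack=0.5):
    # Flag a year-over-year jump well beyond the expected one full point;
    # the slack parameter is an illustrative assumption.
    return (curr_score - prev_score) > expected_gain + slack

print(expected_score(6))                 # 6.8, the sixth-grade standard
print(round(5.8 - 4.1, 1))               # 1.7, nearly two grades of gain
print(gain_is_suspicious(4.1, 5.8))      # True: classroom A's jump is flagged
```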

It’s mesmerizing how a logical and meticulous application of data analysis can bring out facts and trends from humongous data sets in a way that would never be possible with the naked eye.

In addition to detecting cheaters, the algorithm could also identify the best teachers in the school system. A good teacher’s impact was nearly as distinctive as a cheater’s. Instead of getting random answers correct, her students would show real improvement on the easier types of questions they had previously missed, an indication of actual learning. And a good teacher’s students carried over all their gains into the next grade.
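
One hedged way to express that "carried over gains" signature in code; both multipliers below are illustrative assumptions, not Levitt's actual parameters:

```python
def gain_persists(score_y1, score_y2, score_y3, expected_gain=1.0):
    """Real learning carries over: a big gain between years 1 and 2
    should not evaporate in year 3."""
    gain = score_y2 - score_y1
    followup = score_y3 - score_y2
    big_spike = gain > 1.5 * expected_gain
    collapsed = followup < 0.5 * expected_gain
    # Cheating signature: a one-year spike followed by a collapse.
    return not (big_spike and collapsed)

print(gain_persists(4.1, 5.3, 6.4))   # True: steady gains, likely real teaching
print(gain_persists(4.1, 5.8, 6.0))   # False: spike then collapse, suspicious
```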

In early 2002, the new CEO of the Chicago Public Schools, Arne Duncan, decided to act on this analysis and take action against the cheating teachers. The best way to get rid of them, Duncan decided, was to readminister the standardized exam. He only had the resources to retest 120 classrooms, however, so he asked the creators of the cheating algorithm to help choose which classrooms to test.

In order to make the retest results convincing, the 120 classrooms were chosen such that more than half were those the algorithm suspected of having cheating teachers. The rest were classrooms predicted to have excellent-to-mediocre, non-cheating teachers.
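
A minimal sketch of how such a retest roster might be assembled, assuming each classroom carries a numeric suspicion score from the algorithm. The 60/40 split and all names here are my own illustrative choices; the account above only says "more than half" went to suspected classrooms.

```python
def choose_retest_classrooms(suspicion_scores, n_retest=120):
    # Rank classrooms from most to least suspicious.
    ranked = sorted(suspicion_scores, key=suspicion_scores.get, reverse=True)
    n_suspected = int(n_retest * 0.6)      # "more than half" go to suspects
    n_controls = n_retest - n_suspected    # the rest are honest controls
    suspected = ranked[:n_suspected]
    # Controls: the classrooms the algorithm rates least suspicious,
    # spanning excellent-to-mediocre (but non-cheating) teachers.
    controls = ranked[-n_controls:]
    return suspected, controls

scores = {"room_0": 9.1, "room_1": 0.2, "room_2": 7.5, "room_3": 0.1}
suspects, controls = choose_retest_classrooms(scores, n_retest=2)
print(suspects)   # ['room_0']  (most suspicious)
print(controls)   # ['room_3']  (least suspicious)
```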

When the retest was conducted, the results were as compelling as the cheating algorithm had predicted. In the classrooms where no cheating was suspected, scores stayed about the same or even rose. In contrast, the students of the classrooms suspected to have cheating teachers scored far worse than the initial “adjusted” scores.

This is how data analysis, complemented by logical thinking, helped the Chicago Public School system gather sufficient evidence against the cheating teachers and fire them, thereby improving the educational system.
