My current growth as a teacher is partly measured by quarterly student evaluations that ask students to assess my ability as a consistent, respectful, and capable teacher. When I gave my students the links to the surveys, at least four of them (semi?)jokingly told each other that they would “tear me apart” on the surveys and “rate me as low as possible” on all of the measures.
I’m working hard as a first-year teacher. I’m not perfect, but I’m not bad. I do what I can to teach well and maintain a consistent classroom environment. Unfortunately, part of that means pushing students with an active learning curriculum that is likely to frustrate them, while also holding them to high expectations so that everyone can learn as much as possible.
So when I was faced with the fact that my teaching ability was going to be evaluated by students who might not take the surveys seriously, or who might intentionally give me poor ratings because of practices that are good for learning, I couldn’t help but wonder whether student evaluations are a good measure of teacher effectiveness. So I dove into a bit of the research.
The socially desirable answer is that students obviously know what’s good for them, and that they are best positioned to tell how much they are learning and whether their teacher is “good.” I have been told this many times by different instructional coaches and educational consultants. And in some cases that is obviously true. Student evaluations could be a helpful measure of many different and important things in a classroom. What I’m worried about is whether they are consistently effective as measures of a teacher’s actual ability: how good the teacher is at helping students conceptually understand content and apply it in different circumstances.
Anecdotally, when I was a student I was often not diligent about answering the surveys truthfully and carefully. I often gave the roughest critiques to the teachers who pushed me to learn the most in their classes. For instance, I ended up having two of the “hardest” chemistry teachers when I was in college. Neither of them took any b*llshit, and both pushed us to really understand the material. It was well known that if you made it through their courses, you would be successful in the chemistry classes that followed. Nonetheless, I gave them poor reviews, and they had consistently bad reviews on RateMyProfessor. I didn’t realize how wrongheaded that was until years later; in the moment, my frustration with them was enough for me to overlook how much I had learned. The same goes for the other side: I knew plenty of professors who were popular just for being easy.
This all comes down to how we define “a good teacher.” In some cases, you may want to define “good teachers” as people who are motivating and inspiring, who are a friend to the student and spark extra interest in the subject they teach. Those intangibles are valuable even if they aren’t measurable. But if we define a “good teacher” as someone who helps students gain a deeper, conceptual understanding of the content that they can apply to different subjects, then much of the research shows that student evaluations have either zero correlation or a negative correlation with student learning. The first study was a meta-analysis of many studies on teacher effectiveness, and the second was a careful, randomized study at Bocconi University in Italy that tried to correlate student evaluations with performance on the exams of follow-up courses. The poorest-rated professors ended up preparing their students best for their future coursework.
On top of the poor connection between teacher effectiveness and student evaluations, there is also a good deal of evidence that student evaluations are strongly biased by the identity of the teacher. The primary study on this topic tried to measure gender bias in student evaluations: assistant instructors in an online class each operated under two different gender identities, and students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender.
At the very least, it is clear that student evaluations are heavily biased against female teachers. And although I could find no formal studies on racial bias in student evaluations, there is some evidence that students are also racially biased against their teachers. These results depend on the identity of the student and are bound to differ across countries and cultures. More research is needed, but at first glance it seems convincing that many students can be biased against their teachers for reasons that are separate from their actual teaching ability.
One of the interesting trends in a few studies was that long-term learning was negatively correlated with student evaluations. One study from the Air Force Academy (recently profiled in Range) elaborated on the results of the aforementioned Bocconi study. Across ten years, the academy randomized students into different sections of the calculus courses that all cadets take in their first two years on campus. The researchers controlled for a number of relevant attributes of student personality and performance, and every section used the same protected exams and syllabi. Echoing the Bocconi study, long-term learning, as measured by performance in the next calculus classes, was negatively correlated with student evaluations, despite a weak positive correlation between evaluations and performance in the first calculus course. In other words, the teachers who were best for long-term understanding of the material were the ones students rated lowest.
In Range, Epstein explains that this may be because long-term conceptual learning takes time to foster, and students are often frustrated when they are pushed to gain a deep, non-procedural understanding of the content: situations where they aren’t just given processes for solving a derivative in a few types of circumstances, but are pushed toward a deeper, more widely applicable understanding of what a derivative is. This deep, non-procedural understanding may not even correlate well with short-term performance, which can be inflated when a teacher “overfits” by teaching mostly to the test, with a heavy emphasis on procedural understanding that takes away from the work needed for deeper understanding. Conceptual knowledge takes time and frustration to foster, and if we evaluate teachers with short-term data alone, we will likely misidentify some good teachers as “bad.”
One other possible explanation is that students are very resistant to the learning processes that are best for forming long-term conceptual knowledge. Many educators distinguish between passive learning and active learning. In passive learning, the teacher does most of the talking while students take notes and write down processes they can use to solve questions. In active learning, students do most of the work while the teacher mostly guides them through questions without giving away the answers. Although there is a lot of nuance here and it isn’t always clear which method is best in a given situation, one study demonstrated that students learn the most from active learning. The catch is that they don’t feel like they are learning at all. Active learning is hard and frustrating and requires students to make a ton of mistakes in order to build a deeper understanding of the content. Students don’t feel like they learn much from the process, despite doing better on tests and having a more flexible, applied understanding of the content. I’ve noticed the same thing in my own classes. My students tend to be quiet and relaxed during passive lessons, but they become more agitated when I push them with good active learning. Though certain students are more likely to act out during active learning, it still serves more students than passive learning alone.
If we measure teachers by student evaluations, teachers are implicitly incentivized to use techniques optimized for students “feeling” like they learned something. But if we optimize for learning and test performance itself, teachers are freer to use techniques that are more effective, even when those techniques frustrate students and give them an inaccurately negative view of their teacher’s competence.
Most of the frustration over student evaluations has come from colleges, where evaluations can be used to determine tenure, and most of the research on student evaluations has been done on college students. However, many public K-12 charter networks, public K-12 school districts, and K-12 teacher education programs (such as Teach For America and the Relay Graduate School of Education) are becoming more reliant on student evaluations to assess the effectiveness of their teachers. Since most of the research has been done on college students, we should be careful about overgeneralizing these findings to K-12. More research on the use of student evaluations in K-12 is needed, and it could come out differently than the college studies. But, as of now, the research suggests we should be wary of student evaluations across the board. From what we can tell, they are racially and gender biased, biased toward certain ineffective forms of instruction, and biased against the best teachers.