Student Evaluations of Teaching — A Valuable Tool or A Waste of Time? Part 5

Part 5 — Bias

In all research, whether academic or market research, there is always some sort of bias that needs consideration. This is well known and nothing out of the ordinary: bias is always a factor.
Bias can be handled in two ways: 1) by limiting it from ever entering your research, or 2) by limiting its effects.

Keeping bias out is, of course, the better of the two options, but there is no way to escape bias completely. People may interpret questions in different ways, you may ask the wrong question, or a respondent from the target group might be missing. More often than not, you don't realize this until you look at the responses.

What's left, then, is to work with the second option: limiting its effects. The better bias is handled, the more reliable the results. As simple as that.

Student evaluations of teaching are no exception, and bias can be expected to occur quite frequently in these surveys. What matters is how you handle it.

In this post, we will look at which biases can be expected in student ratings of teaching and how they might impact results. Scholars still debate this topic intensely and many have contributed to the field. We have studied a large chunk of the available literature and summarize our findings in this blog post.

Different reports claim to have found bias in a wide range of areas, from personality and sense of humor to the attractiveness and physical appearance of the teacher. We have chosen to focus on the most well-known biases, which also happen to be the best understood by science. Six biases stand out.

Firstly, there is the notorious grade bias.

Remember the story from an earlier blog post about the teacher who had to water down his course to get better ratings? Grade bias is to blame for that.

The theory is that students who receive better grades from the teacher also give more positive feedback in the teacher rating, as a sort of reward. As the example shows, this can also lead teachers to purposely give higher grades and expect better ratings in return.

Multiple scientific reports have indeed found a positive correlation between higher ratings and higher grades. BUT, the theory of students rewarding teachers who give good grades with high evaluation scores is just one of four plausible explanations for why this correlation exists. In fact, most scientists consider grade leniency the least likely explanation.
Interestingly, another explanation is what is known as the 'validity hypothesis'. It holds that one reason this correlation exists is that the method is working as intended.
Let me explain:

If we can all agree that there is a strong correlation between learning and grades, then a correlation between high ratings of teaching effectiveness and grades is only natural. It is meant to be that way and simply confirms that learning, grading and teacher effectiveness all go hand in hand.

Thus: the correlation between grades and ratings is not considered a bias in the majority of research reports.

Workload bias

Another rigorously researched bias is the workload bias. All students know that the workload can differ widely between courses, but how does workload affect evaluations? Is it true that less demanding courses get higher evaluation scores?

As it turns out: nope. Contrary to what many believe, workload is actually positively correlated with ratings; when workload goes up, ratings follow, and vice versa. Studies have found that students who perceive a heavy workload as valuable and relevant tend to appreciate it. Expecting a lot from students is actually an aspect of good teaching.

Therefore: workload isn't a bias.

Student Interest and Effort Bias

How does student interest in the subject taught impact evaluations? And effort? Will the SET score go down if student effort is low?

Research shows that when student interest is high, ratings tend to be high as well. Whether student interest can be considered a bias is tricky to say without knowing what caused the student's interest in the first place.

A truly great teacher should be able to inspire students to develop an interest in the subject and put more effort into it. If this is the case, then student interest is closely related to teacher effectiveness and should, therefore, be expected to rise along with ratings of teacher effectiveness. To complicate things further, there are, of course, many other things besides the teacher that can influence student interest.

The verdict on whether this is a bias is therefore: maybe.

Many studies also suggest ways to handle this potential bias. The best approach is often to control for prior interest measured before the course starts, or simply to ask students why they are taking the course, how much effort they put into other courses, and so on.

If a student's interest in a specific course is so high that he or she would enroll regardless of who is teaching it, the rating can be expected to be somewhat higher for reasons unrelated to the teacher.
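To make the idea of controlling for prior interest concrete, here is a minimal sketch in Python. The data, variable names and the simple linear adjustment are all hypothetical; real SET tools may use richer models, but the principle of comparing teachers on interest-adjusted scores rather than raw averages is the same.

```python
# A minimal sketch (entirely hypothetical data) of one way to control for prior
# interest: pool ratings across courses, regress them on a pre-course interest
# measure, and compare teachers on the residuals instead of the raw means.
import numpy as np

# Hypothetical pooled data: one row per student response
teacher        = np.array(["A", "A", "A", "B", "B", "B", "B", "A"])
prior_interest = np.array([5, 4, 5, 2, 3, 2, 3, 4], dtype=float)  # asked before the course
rating         = np.array([5, 4, 5, 3, 4, 3, 4, 4], dtype=float)  # end-of-course SET score

# Fit rating ≈ a + b * prior_interest on the pooled data
X = np.column_stack([np.ones_like(prior_interest), prior_interest])
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)

# Residuals = the part of the rating NOT explained by prior interest
residuals = rating - X @ coef

# Compare teachers on interest-adjusted scores instead of raw averages
for t in ["A", "B"]:
    mask = teacher == t
    print(t, "raw mean:", round(rating[mask].mean(), 2),
          "adjusted:", round(residuals[mask].mean(), 2))
```

In this toy example, teacher A has the higher raw average, but once prior interest is accounted for the gap all but disappears.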

Class size bias

Many studies have shown that ratings tend to go down as class size increases. This bias is also a bit tricky to define. It is commonly accepted that learning is often of higher quality when classes are small. However, class size is usually not something the individual teacher can control, so holding it against the teacher is unfair.

Class size, then: might be a bias.

There are, however, many other factors at play here, not least that the line between a large and a small class is a bit fuzzy.

Discipline bias

Scholars and university faculty have noticed a peculiar trend in student ratings: humanities teachers often have a higher SET average than social science teachers, who in turn have a higher average than STEM teachers.

Researchers have had a hard time explaining where this phenomenon comes from. Some say that humanities professors might actually be better teachers, on average, than their social science counterparts. Another hypothesis is that STEM students are stricter in their ratings. Either way, it is recommended for this reason that SET results for STEM teachers are compared only with those of other STEM teachers.

So, yes: when different disciplines are compared, this is a bias.
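One simple way to respect this recommendation in practice is to standardize scores within each discipline before any cross-discipline comparison. The sketch below uses entirely made-up numbers and is just one possible approach, not a description of how any specific SET tool works.

```python
# A minimal sketch (hypothetical data) of within-discipline standardization:
# convert each teacher's SET average to a z-score relative to their own
# discipline, so a STEM teacher is only ever compared with STEM peers.
from statistics import mean, stdev

# Hypothetical (teacher, discipline, SET average) records
scores = [
    ("T1", "humanities", 4.6), ("T2", "humanities", 4.4), ("T3", "humanities", 4.2),
    ("T4", "stem", 3.9),       ("T5", "stem", 3.6),       ("T6", "stem", 4.1),
]

# Group SET averages per discipline
by_disc = {}
for _, disc, s in scores:
    by_disc.setdefault(disc, []).append(s)

stats = {d: (mean(v), stdev(v)) for d, v in by_disc.items()}

# Z-score each teacher within their own discipline
for name, disc, s in scores:
    mu, sd = stats[disc]
    z = (s - mu) / sd if sd else 0.0
    print(f"{name} ({disc}): raw={s:.1f}, within-discipline z={z:+.2f}")
```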

Gender bias

Some studies suggest that SETs generally favor female teachers. Other, more recent studies indicate that female students rate female instructors higher, while male students rate male professors higher. Even though this bias is small, it should still be accounted for.

Ultimately: Gender bias can exist to some extent and should be adjusted for.

Considering how often student evaluations of teaching are accused of being unreliable due to heavy bias, the bias found in the majority of studies is quite small. It's important to recognize that very few studies have found a correlation of more than 0.30 for any of these biases; a correlation of 0.30 means that the factor in question explains at most about nine percent of the variance in ratings.

To sum things up

Yes, bias exists in student evaluations of teaching, just as it does in academic research and in market research. However, student ratings seem to have gotten an undeserved reputation for containing more bias than other types of research. Studies show that many of the biases SETs have been accused of are irrelevant, and that the biases that do exist have a relatively small impact on results.
Still, as teacher ratings are sometimes used in high-stakes decisions, it is crucial to recognize and address these biases in order to develop a valid assessment tool and to interpret results accurately.

Next up in this series: What factors need to be handled when developing a teacher evaluation and feedback tool? Don't miss out.


About the author:

Hubert.ai is developing the next generation of teacher insight tools. Shortly, we will introduce the first-ever purpose-built chatbot for teacher feedback, and it's completely free!