Teaching A Class With A Bimodal Distribution — If You Have One!

Andrew Robinson
Precarious Physicist
Sep 28, 2016
Bimodal distribution with two Gaussian curves superimposed. Generated in Matlab.

On his blog Small Pond Science, Terry McGlynn posed a question: “If you have a bimodal grade distribution, does this change the way you teach the class?” This is a great question, and since I have taught classes which I believe to have been bimodal in distribution, I thought that I would throw around a few ideas.

So, what do we mean by a bimodal distribution? Let’s go back to the classic unimodal distribution, called variously the bell curve, the normal distribution or a Gaussian function. (Disclaimer: I’m a spectroscopist, so Gaussian is my favourite term!)

Here is a simulated normal distribution. This one is centred around a mean mark of 50%.

Normal distribution (the bell curve or Gaussian function). Centred with a mean value of 50%.

These days, with the dreaded grade inflation, this tends to get shifted off towards higher marks. It’s still a symmetric distribution, but truncated at the high end. When people talk about “grading to a curve”, this is the curve they mean.

The same distribution, but shifted to a mean value of 80%.

Now if we have a bimodal distribution, then we get two of these distributions superimposed on each other, with two different values of the mean score. The Matlab simulation looks like this:

Bimodal Distribution. Twin Peaks. Cherry Pie.

You can see that in this particular example, there is a clear trough between the two peak maxima, and we could interpret this as two different populations within the class, one group of higher achievers, with a high mean score, and a second group of relative underperformers, with a lower mean score.
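The same picture can be sketched in a few lines of Python (rather than the Matlab used for the figures). This is only an illustration under stated assumptions: the means, the common width of 8 marks, and the choice to make the lower-scoring group the smaller one (half the size of the other) are my own placeholder values, not the ones behind the plots. The function names are hypothetical too.

```python
import math

def gaussian(x, mean, sd):
    """Normal probability density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture(x, mean_low, mean_high, sd, frac_low):
    """Density of two superimposed Gaussians that share one width."""
    return (frac_low * gaussian(x, mean_low, sd)
            + (1.0 - frac_low) * gaussian(x, mean_high, sd))

def count_peaks(mean_low, mean_high, sd=8.0, frac_low=1.0 / 3.0):
    """Count local maxima of the mixture on a 0.1-mark grid from 0 to 100."""
    xs = [i / 10.0 for i in range(1001)]
    ys = [mixture(x, mean_low, mean_high, sd, frac_low) for x in xs]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])

# Assumed placeholder means: one pair far apart, one pair close together.
well_separated = count_peaks(40.0, 75.0)   # means 35 marks apart
close_together = count_peaks(50.0, 62.0)   # means 12 marks apart
print(well_separated, close_together)
```

With these assumed values, the widely separated pair shows two maxima with a trough between them, while the closer pair merges into a single peak even though two populations are still present underneath.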

Of course, this is a nice, simulated and fairly unambiguous interpretation. If the two peaks get closer together, then things become a little messier:

Two peaks moved closer together. Interpretation of the overall shape as a bimodal distribution is now ambiguous

All I’ve done here is move the two peaks a little closer together. As you can see, the overall shape is now rather a blobby mess (to use the technical term), but it’s not far removed from a normal distribution shape itself. Speaking as a spectroscopist who has spent many, many hours fitting curves to noisy experimental data, I would be hard put to justify fitting this distribution to two peaks: it might just be a single distribution with some statistical fluctuations on it. In the real world of finite class sizes and smaller data sets, there are always random fluctuations which might make a distribution look bimodal without any statistical justification for that assertion. If we examine a grade distribution we might find in a real class, we might see something like the figure below. The simulated grade distribution looks like it might be bimodal, but because of the sample size, it is entirely possible that it is really a unimodal distribution with fluctuations on top.

Simulated grade distribution. Are the two peaks really just due to statistical “noise”?
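One rough way to get a feel for how often noise alone produces "two peaks" is to simulate many classes drawn from a single normal distribution and count how many of their histograms show more than one bump. The sketch below does that in Python. Everything specific in it is an assumption made for illustration: the mean of 65%, the standard deviation of 12, the 10-mark bins, the class sizes, and the deliberately crude definition of a peak (any bin, or flat run of bins, higher than its neighbours).

```python
import random

def simulate_class(rng, n_students, mean=65.0, sd=12.0):
    """Draw marks from ONE normal distribution, clipped to the range 0-100."""
    return [min(100.0, max(0.0, rng.gauss(mean, sd))) for _ in range(n_students)]

def histogram(marks, bin_width=10):
    """Count marks per bin; a mark of exactly 100 goes in the top bin."""
    counts = [0] * (100 // bin_width)
    for m in marks:
        counts[min(int(m // bin_width), len(counts) - 1)] += 1
    return counts

def count_histogram_peaks(counts):
    """Count bins (or flat runs of bins) that are higher than both neighbours."""
    padded = [0] + list(counts) + [0]
    collapsed = [padded[0]]
    for c in padded[1:]:
        if c != collapsed[-1]:          # merge equal neighbours so a flat top counts once
            collapsed.append(c)
    return sum(1 for i in range(1, len(collapsed) - 1)
               if collapsed[i - 1] < collapsed[i] > collapsed[i + 1])

def fraction_looking_multimodal(n_students, trials=1000, seed=1):
    """Fraction of simulated unimodal classes whose histogram shows 2+ peaks."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if count_histogram_peaks(histogram(simulate_class(rng, n_students))) >= 2)
    return hits / trials

results = {n: fraction_looking_multimodal(n) for n in (30, 100, 1000)}
for n, frac in results.items():
    print(f"{n} students: {frac:.2f} of simulated classes show two or more peaks")
```

Every simulated class here is unimodal by construction, so any second peak it reports is noise. The exact fractions depend heavily on the assumed bin width and peak definition, so treat them as indicative only.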

Recently Elizabeth Patitsas and co-workers analysed a large number of Computer Science classes and found that in the vast majority of cases, the bimodal distribution interpretation (a widely held truth in CS circles) is purely due to statistical fluctuation.

So the moral of the story so far is to exercise extreme caution in interpreting your grade distribution as bimodal, especially if you only have a limited number of students in the class. As a rough rule of thumb, I’d suggest 100 students is the absolute minimum to start making interpretations. But there are certainly cases where a bimodal distribution could occur. Remember that the study by Patitsas et al. covers a relatively homogeneous group of students, all taking CS courses at various stages of their studies. I don’t doubt the validity of their findings at all, but many of us teaching large classes, particularly service classes to non-majors, have a vastly more heterogeneous composition to our classes. One of the general introductory physics classes which I teach in first year has biology, biochemistry, chemistry, earth sciences and neurosciences students, and budding physicians, in it. What they have in common is the requirement to take a physics course, but they enter my classroom with vastly different sets of skills and knowledge in terms of their preparedness in mathematics and physics. So I have to remain open to the possibility that there might be a bimodal distribution (or even more peaks under the grade curve).

Can I see the various different groups within the class? Well, I can get a rough idea because each class is also broken down into laboratory groups of up to 65 students. These groups do tend to be lumped together by program, due to timetabling constraints. It is noticeable that some groups are better performers in the laboratory than others. Programs with high entry grades or a competitive entrance requirement often produce far better work in the laboratory than others. So I can see that the large number of students is certainly not a homogeneous group, but rather splits into several subgroups. These subgroups might not be large enough to see in the distribution curve. All my examples above have the smaller group being exactly half the size of the larger one, for reasons of clarity. If the smaller subgroup is only 10% of the class, it becomes difficult to make it out.
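To put a rough number on that last point, the Python sketch below asks how far apart two equal-width Gaussian peaks must be before the combined curve shows two separate maxima at all, first when the smaller group is half the size of the larger one and then when it is 10% of the class. This is an idealised, noise-free calculation with hypothetical function names. It assumes both groups have the same spread and measures separation in units of that spread, so it gives a lower bound on the difficulty rather than a statement about any real class.

```python
import math

def mixture(x, separation, frac_small):
    """Two unit-width Gaussians: the smaller group at 0, the larger at `separation`.
    The shared normalisation constant is dropped; it does not move the peaks."""
    small = math.exp(-0.5 * x * x)
    large = math.exp(-0.5 * (x - separation) ** 2)
    return frac_small * small + (1.0 - frac_small) * large

def has_two_peaks(separation, frac_small, step=0.01):
    """Check for two local maxima on a grid from -5 to separation + 5."""
    n_points = int(round((separation + 10.0) / step)) + 1
    ys = [mixture(-5.0 + i * step, separation, frac_small) for i in range(n_points)]
    peaks = sum(1 for i in range(1, n_points - 1) if ys[i - 1] < ys[i] > ys[i + 1])
    return peaks >= 2

def smallest_visible_separation(frac_small, max_sd=8.0, step_sd=0.05):
    """Smallest tested separation (in units of the peak width) that gives two peaks."""
    k = 1
    while k * step_sd <= max_sd:
        if has_two_peaks(k * step_sd, frac_small):
            return k * step_sd
        k += 1
    return None  # no second peak found within the tested range

half_size = smallest_visible_separation(1.0 / 3.0)  # smaller group half the size of the larger
ten_percent = smallest_visible_separation(0.10)     # smaller group is 10% of the class
print(half_size, ten_percent)
```

The 10% subgroup needs a noticeably wider gap between the two means before it appears as its own peak, and that is before any of the sampling noise found in a real class is added.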

What to do if there is a bimodal curve

Firstly: can you analyse the distribution to find a pattern? You might ask questions such as:

Are students in a particular program at a disadvantage? If so, why?

Are students from a program at an advantage, and why?

Could there be a language problem for ESL students?

Could there be a cultural barrier?

These are not always easy questions to answer, and often the instructor either doesn’t have the necessary data set to work with or simply does not have the time or resources to do the necessary amount of sleuthing to find out. Getting a breakdown of which student is in which program is virtually impossible from our learning management system — it’s relatively easy to view the background of an individual student, but difficult to pull it together for all students.
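If you can get the program information out at all, the breakdown itself is simple. Here is a minimal Python sketch, assuming you have managed to export one record per student containing a program name and a final mark. The record layout, the program names and the marks are all made-up placeholders, not real data and not the format of any particular learning management system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export: (student id, program, final mark in %). Placeholder values only.
records = [
    ("s01", "Program A", 58.0),
    ("s02", "Program A", 64.0),
    ("s03", "Program B", 81.0),
    ("s04", "Program B", 77.0),
    ("s05", "Program C", 62.0),
]

def summarise_by_program(rows):
    """Return {program: (number of students, mean mark)}."""
    marks = defaultdict(list)
    for _student, program, mark in rows:
        marks[program].append(mark)
    return {program: (len(ms), mean(ms)) for program, ms in marks.items()}

summary = summarise_by_program(records)
for program, (n, avg) in sorted(summary.items()):
    print(f"{program}: n={n}, mean={avg:.1f}")
```

With groups this small, differences between means are not meaningful; the point is only to see whether any program stands out enough to be worth a closer look.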

I now make sure that I carry out a pre-class survey of all my students to find out a little about their background in physics and mathematics. Consequently, I have developed problem sets and work sheets for those with a relatively weak level of preparedness in mathematics. I also try to make sure that technical language is explained thoroughly, and that if I ask questions on tests, then there are diagrams or pictograms to help in interpretation for non-native English speakers. The example below is from my introductory kinematics course.

Students might not be familiar with the word “rhinoceros”. Adding a picture is worth a thousand words.

You will note from these methods that I’m working to move the scores of the lower scoring students upwards, by providing extra support and assistance. I am not adjusting the general teaching methods, because I still have to push and challenge those people who are in the high scoring cohort. In general, I have found the “Give the weaker students extra resources” strategy to be successful in terms of student engagement, attainment and retention in class. I can’t state with certainty that this has prevented a bimodal distribution, because the classes where I did see that bimodal distribution were at my previous university, and I would not be comparing like-for-like student cohorts. In addition, of course, I’ve also become a more experienced teacher, and hopefully increased my ability to communicate with and teach all members of the class. What I can say is that I do not see bimodality in my present classes, with these methods in place.

So, in summary:

Beware of finding bimodal distributions where there aren’t any!

See if there is an obvious reason why your class might have a group with a relative disadvantage.

Try to remedy that disadvantage without compromising the teaching for the better students.

Aim High.


Andrew Robinson
Precarious Physicist

Physics Teacher at Carleton University ; British immigrant; won some teaching awards. Physics Ninja Care Bear; Baker of Cakes; he/him