Teaching biostatistics to the math phobic

Francis Boscoe
Published in The Startup, Dec 16, 2019

I just finished teaching Introduction to Biostatistics online to master’s students in public health. Most of them were already employed in state and county health departments; most were seeking a certificate as opposed to an actual master’s degree. For most of those seeking a certificate, this was their final class — course 5 of 5, the one they had put off the longest. In the introductions that everyone posted, many indicated a fear of all things mathematical.

I wondered if it would be possible to teach the course in a way that might win some of these folks over. You can’t avoid math, of course. But you can stick to methods and concepts that are relevant to their jobs, and offload a lot of the computation onto simple R programs. At the beginning of the course, none of the students had ever used R, and I thought I would just use it to help with a few of the knottier topics. But we ended up doing absolutely everything this way; even the final exam consisted of copying and pasting from their RStudio console.

Though I had not begun with this in mind, I quickly got in the habit of writing a daily mini-lecture in as plain English as possible to accompany the current textbook section. A few students indicated they preferred learning through video lectures, so I did a few of those, too. By the end I probably had written 25,000 words, and spoken a few thousand more. I thought it might be useful to share some of them more widely, for the benefit of any introductory statistics students out there looking for answers.

I suppose this could be the seed of an eventual textbook. At the moment, that strikes me as overly ambitious, but many a textbook has grown out of lecture notes. My approach would involve:

  1. Limited emphasis on theory. Much of it was developed before computers existed, and represents clever shortcuts to avoid difficult or impossible calculations. Computers have no such limitations. You can get the same answer, and sometimes a better one, through simulation.
  2. Heavy emphasis on R. Just as the invention of affordable calculators killed off the need to calculate logarithms (either by slide rule or hand), fast laptops mean we don’t need to waste time calculating sums of squares or consulting probability tables. R takes care of these things, leaving more time for interpretation of results.
  3. Heavy emphasis on data visualization. Tables of numbers cannot and should not be the sole means of communicating results. While I think this message is now widely understood, it still has not trickled down to the level of this course, where traditional approaches dominate, along with the traditional idea that numbers are objective.
  4. Heavy emphasis on relevance and practice. My textbook, as up-to-date as it was with recent developments and trends in the field, was still full of stodgy old statistical tests that are taught to this day but that I never encountered during a 20-year career in a public health department. Consequently, it is over 1300 pages. Just because a test was named after your advisor’s advisor does not mean it deserves to be enshrined in textbooks forever.
  5. Some acknowledgement of controversy and uncertainty. Wherever appropriate, I mention how some statisticians like to (or used to) do things one way, while others do it another way. This comes as a surprise to most students, who were taught that, as in algebra, there is always a single correct answer to seek. Of course, many of the statistical debates are too esoteric for students at this level.
  6. Use of clear English. I have worked with many statisticians over the years, and only a tiny handful have communicated in equations and Greek letters. The rest have used stories, analogies, toy examples, charts, diagrams, whatever it takes. Many abandon this when they write their papers, which I suppose is a combination of shorthand and meeting peer expectations. But I’ve always thought that nearly everything can be explained in a way accessible to the educated layperson. Even if you disagree with this, we are only talking about an introduction to statistics here.

I am not aware of any textbook that does all of these things, though all do some of them. Those that come the closest pertain more to general statistics than biostatistics, and maybe that distinction is less important than I make it. In teaching public health folks, I want the examples to be things they are likely to encounter, not just playing cards, balls in urns, widgets, characteristics of 1970s automobiles or characteristics of irises (the flowers). The principles are the same but it is an additional level of abstraction I’d rather avoid.
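
To make the first two points in the list above a little more concrete, here is a minimal sketch of the kind of thing I mean. The numbers are invented for illustration and do not come from the course: suppose a county expects 7 cases of some disease in a year and observes 12. The sketch asks how unusual a count of 12 or more would be, first with the Poisson formula (what a probability table would give you) and then by simply simulating a large number of years.

```r
# Hypothetical numbers, chosen only for illustration
expected <- 7    # cases expected in a year
observed <- 12   # cases actually observed

set.seed(2019)   # make the simulation reproducible

# Theory: probability of 12 or more cases when the true mean is 7.
# ppois() with lower.tail = FALSE returns P(X > q), so q is observed - 1.
p_theory <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)

# Simulation: generate 100,000 years of counts and see how often
# a year reaches the observed count or higher.
sims <- rpois(100000, lambda = expected)
p_sim <- mean(sims >= observed)

# Print the two answers side by side
round(c(theory = p_theory, simulation = p_sim), 3)
```

The two numbers should agree closely, which is the point: the simulation needs no formula beyond "count how often it happened," and the same pattern carries over to situations where no convenient formula exists.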


Francis Boscoe founded Pumphandle, LLC in 2019 to help clients solve their vexing data problems, primarily in the realm of public health.