How many people who use this term actually understand what it means?

If your experience is anything like mine, you’ve probably heard numerous people talk about ‘statistical power’ in conversations at work. I’m pretty sure that for the most part these people are pushing for larger sample sizes on the basis of some vague conception that a larger n is always good.

But how many of these people can actually define what statistical power is? In this article I want to take a look at the concept and definition of statistical power and identify where it is useful as a measure.

Hypothesis testing


How to move fluidly between R and Python in the same project or document

In this article I want to demonstrate how R Markdown is a nice setting for coding your project in both R and Python, allowing you to code some elements of your project in each language and manipulate objects created in one language using the other.

This could be valuable to you for a number of reasons:

  1. It will allow you to code in your native language but bring in features that might exist only in the other language.
  2. It will allow you to directly collaborate fluidly with a colleague who programs in the other language.
  3. It will give you the…


R Markdown is more versatile than you might think

Though I code in both R and Python, R Markdown is my only route for writing reports, blogs or books. It is incredibly flexible, has many beautiful design options and supports many output formats really nicely.

If you have never worked in R Markdown, I highly recommend it. If you have worked in it before, here are ten little tricks I’ve learned which have served me well in numerous projects, and which highlight how flexible it is.

1. Parameterizing documents


She would use tidymodels

The output of many models in R can be inconsistent. Often we are given more information than we need, and in some cases less than we need. Function arguments and output formats can vary a lot, and we sometimes need to look in different parts of the output to find the specific statistics that we seek.

The tidymodels meta-package is a collection of packages which collectively apply the principles of tidy data to the construction of statistical models. More information and learning resources on tidymodels can be found here. …


Stay up to date with these ten simple examples using a dataset of cute penguins

Whether or not you are a fan of the tidyverse, there is no doubt that this collection of R packages offers some neat and attractive ways of wrangling data that are often very intuitive to users. In the earlier versions of tidyverse packages, some elements of user control of output were sacrificed in favor of simpler functions which could be picked up and easily used by newbies. In recent updates to dplyr and tidyr, there has been significant progress in restoring some of this control.

This means that there are new functions and methods available in the tidyverse that you…


Communicating your work and results with laypeople can be a challenge — here are five things to watch out for

For many data scientists, often the coding and the analytics are the easy part. The challenge comes when you have to communicate the results of your work to non-data scientists. In many cases those individuals are clients or customers, or they hold positional superiority in the organization. This means that it’s important to get the communication right. If they leave the room or Zoom with the wrong conclusions, or just plain confused, you risk all your previous work being for nothing.

The goals of any communication of your work or results should be threefold:

  1. Ensure a common understanding of the…


Medium’s native code display capability is basically hopeless, so here’s what you should do…

Those of you who read my stuff will know that I publish a lot of code. And for a long time I made do with Medium’s native code blocks. But let’s face it — they are awful and the platform really needs to upgrade to be more coder friendly. Writing code in Medium code blocks is frustrating for the writer because there is no decent formatting, alignment or left-right scrolling capabilities. It’s also a pain for readers because it just looks like random strings of characters in a slightly different font.

One thing I do NOT recommend is to solve…


These examples show why R is now the go-to language for intuitive data manipulation

It struck me recently through collaborating with a number of other users of the tidyverse that there are many people who are not aware of all the things that this collection of packages offers them to help with their day to day data wrangling. In particular, two critical packages have had major updates in the past year, and have introduced new features which I regard as transformative — allowing users to step up a gear in the control of their data and in the efficiency of their code.

In late 2019, tidyr 1.0.0 was released. Of many updates, the key…


I first heard of Learning Through Play when I sent my kids to pre-school, but now I realize it’s how all Data Scientists should learn

If there is one thing that really annoys me nowadays, it’s when people look at something I am working on and ask me, often in quite ‘holier than thou’ tones: What is your use case? What is the problem you are trying to solve? As a trained McKinsey consultant, nobody knows better than I do the principle of having to define your problem up front, laying out a use case for the work you are doing.

But if you are learning data science, I think you should throw that principle out the window and take up a new one: Learning…


Each character has their own mini-network, but what does it look like?

In this article I will use the community detection capabilities of the igraph package in R to show how to detect communities in a network. By the end of the article we will be able to see how the Louvain community detection algorithm breaks up the Friends characters into distinct communities (ignoring the obvious community of the six main characters), and if you are a fan of the show you can decide whether this analysis makes sense to you.

Note to reader: if you are finding it difficult to follow code in the Medium formatting environment, you can also read this in…

Keith McNulty

Analytics leader at McKinsey. I am interested in Mathematics disciplines and People disciplines. Find me on LinkedIn or Twitter or at my blog drkeithmcnulty.com
