If your experience is anything like mine, you’ve probably heard numerous people talk about ‘statistical power’ in conversations at work. I’m pretty sure that for the most part these people are pushing for larger sample sizes on the basis of some vague conception that a larger n is always good.
But how many of these people can actually define what statistical power is? In this article I want to take a look at the concept and definition of statistical power and identify where it is useful as a measure.
The term ‘statistical power’ only has meaning when it is referring…
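As a rough illustration of why a larger n is usually treated as a route to more power, here is a minimal sketch that uses the standard textbook definition: power is the probability that a test rejects the null hypothesis when a specific alternative is true. The function name `z_test_power`, the one-sided z-test with known unit variance, and the effect size of 0.2 are all assumptions chosen for this example, not something taken from the article.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided, one-sample z-test with known unit variance.

    effect_size is the true mean shift in standard-deviation units
    (a hypothetical input chosen for this illustration).
    """
    std_normal = NormalDist()
    # Critical value: reject the null when the z statistic exceeds this.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at effect_size * sqrt(n).
    shift = effect_size * n ** 0.5
    # Power = P(z statistic > z_crit | the alternative is true).
    return 1 - std_normal.cdf(z_crit - shift)


# Power for the same assumed effect size at three sample sizes.
for n in (10, 50, 200):
    print(n, round(z_test_power(0.2, n), 3))
```

In this sketch power rises with n, but it also depends on the assumed effect size and on alpha, so a power figure only makes sense once the test and the alternative are specified.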
In this article I want to demonstrate how R Markdown is a nice setting for coding your project in both R and Python, allowing you to code some elements of your project in each language and manipulate objects created in one language using another language.
This could be valuable to you for a number of reasons:
Though I code in both R and Python, R Markdown is my only route for writing reports, blogs or books. It is incredibly flexible, has many beautiful design options and supports many output formats really nicely.
If you have never worked in R Markdown, I highly recommend it. If you have worked in it before, here are ten little tricks I’ve learned which have served me well in numerous projects, and which highlight how flexible it is.
So you write a lovely R Markdown document where you’ve analyzed a whole bunch of facts about dogs. And then you get told…
tidymodels
The output of many models in R can be inconsistent. Often we are given more information than we need, and in some cases we have less than we need. Function arguments and output formats can vary a lot, and we sometimes need to look in different parts of the output to find the specific statistics we are after.
The tidymodels meta-package is a collection of packages which collectively apply the principles of tidy data to the construction of statistical models. More information and learning resources on tidymodels can be found here. …
Whether or not you are a fan of the tidyverse, there is no doubt that this collection of R packages offers some neat and attractive ways of wrangling data that are often very intuitive to users. In the earlier versions of tidyverse packages, some elements of user control of output were sacrificed in favor of simpler functions which could be picked up and easily used by newbies. In recent updates to dplyr and tidyr, there has been significant progress in restoring some of this control.
This means that there are new functions and methods available in the tidyverse that you…
For many data scientists, often the coding and the analytics are the easy part. The challenge comes when you have to communicate the results of your work to non-data scientists. In many cases those individuals are clients or customers, or they hold positional superiority in the organization. This means that it’s important to get the communication right. If they leave the room or Zoom with the wrong conclusions, or just plain confused, you risk all your previous work being for nothing.
The goals of any communication of your work or results should be threefold:
Those of you who read my stuff will know that I publish a lot of code. And for a long time I made do with Medium’s native code blocks. But let’s face it — they are awful and the platform really needs to upgrade to be more coder friendly. Writing code in Medium code blocks is frustrating for the writer because there is no decent formatting, alignment or left-right scrolling capabilities. It’s also a pain for readers because it just looks like random strings of characters in a slightly different font.
One thing I do NOT recommend is to solve…
It struck me recently through collaborating with a number of other users of the tidyverse that there are many people who are not aware of all the things that this collection of packages offers them to help with their day to day data wrangling. In particular, two critical packages have had major updates in the past year, and have introduced new features which I regard as transformative — allowing users to step up a gear in the control of their data and in the efficiency of their code.
In late 2019, tidyr 1.0.0 was released. Of many updates, the key…
If there is one thing that really annoys me nowadays, it’s when people look at something I am working on and ask me, often in quite ‘holier than thou’ tones: What is your use case? What is the problem you are trying to solve? As a trained McKinsey consultant, nobody knows better than I do the principle of having to define your problem up front, laying out a use case for the work you are doing.
But if you are learning data science, I think you should throw that principle out the window and take up a new one: Learning…
In this article I will use the community detection capabilities in the igraph package in R to show how to detect communities in a network. By the end of the article we will be able to see how the Louvain community detection algorithm breaks up the Friends characters into distinct communities (ignoring the obvious community of the six main characters), and if you are a fan of the show you can decide if this analysis makes sense to you.
Note to reader: if you are finding it difficult to follow code in the Medium formatting environment, you can also re-read this in…
Analytics leader at McKinsey. I am interested in Mathematics disciplines and People disciplines. Find me on LinkedIn or Twitter or at my blog drkeithmcnulty.com