The word ‘storytelling’ has become more prevalent and popular in analytics circles recently. As I read various articles and advice on ‘storytelling’, I see a dangerous trend that steers individuals away from a research-based approach and runs the risk of major organizations making erroneous decisions based on a glossy but often inaccurate version of the facts.
I am not saying that how you present your results is not important. I regularly consider the most compelling ways of presenting results, with some formats and visualizations clearly more intuitive and understandable than others.
But when data professionals have been told that they…
There’s no doubt that Python has the widest range of ML algorithms of all the programming languages and in general Python is my first port of call if I am intending to do any form of predictive modeling. That said, I much prefer R for the tidying and preparation of data and would love to be able to import Python algorithms into R so I can have the best of both worlds. So I recently decided to see if I could easily run a Python ML algorithm in R. I opted to try a k-fold cross-validated XGBoost model.
Lately I’ve found myself doing more projects using both R and Python together. It has become more important to me to use the best tools for the job and not be constrained by a single language. There are things that Python does best and there are things that R does best, so if we can use both when we need to, we can produce our best. …
Many data scientists in business reporting settings are asked to pipe their data into PowerPoint presentations (whether they like it or not!). Most commonly these are numerous parameterized presentations where the format, charts and content are the same but the data changes, for example by organization unit. Frequently there is a need to generate tens, hundreds or even thousands of such reports from a specified example template. This tutorial gives an example of how to code up and automate such a workflow.
All the code for this tutorial can be found in my GitHub repo, which is referenced…
Since last year’s release of dplyr 1.0.0 I’ve really enjoyed experimenting with what is possible with this seminal R package, and lately I’ve been considering how dplyr can be used to run any sort of function against inputs from a data frame. This means that you can use dplyr to perform as many actions as there are rows in your data frame in a single (often very simple) command.
In this article I will show you how to use this concept to do the following using a single piped command in dplyr:
Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct — often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article I want to use an example to show three common hypothesis tests and how they work under the hood, as well as showing how to run them in R and Python and to understand the results.
Hypothesis testing exists because it is almost never the case that…
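To make the "single line of code" point concrete, here is a minimal Python sketch of one such test. This excerpt does not name the three tests the article covers, so the two-sample z-test below, the helper name `two_sample_z_test` and the simulated data are all illustrative assumptions. It uses only the standard library and assumes samples large enough for the normal approximation to be reasonable.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_test(a, b):
    """Two-sided z-test for a difference in means (hypothetical helper).

    Assumes both samples are large and have non-zero variance.
    """
    # Standard error of the difference between the two sample means
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    # Probability of a statistic at least this extreme if the true means are equal
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Simulated data, purely for illustration
rng = random.Random(42)
group_a = [rng.gauss(50, 10) for _ in range(200)]
group_b = [rng.gauss(53, 10) for _ in range(200)]

z, p = two_sample_z_test(group_a, group_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value says the observed difference would be unlikely if the two groups shared the same mean. It does not say how large or how important the difference is.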
If your experience is anything like mine, you’ve probably heard numerous people talk about ‘statistical power’ in conversations at work. I’m pretty sure that for the most part these people are pushing for larger sample sizes on the basis of some vague conception that a larger n is always good.
But how many of these people can actually define what statistical power is? In this article I want to take a look at the concept and definition of statistical power and identify where it is useful as a measure.
The term ‘statistical power’ only has meaning when it is referring…
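As a rough illustration of the concept, power is the probability that a test rejects the null hypothesis when a specific alternative is actually true. The sketch below computes it for a one-sided, one-sample z-test with known variance. The choice of test, the helper name `z_test_power` and the example effect size are assumptions made for illustration, not taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (illustrative assumption).

    effect_size is the true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    # Rejection threshold for the test statistic under the null hypothesis
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the statistic is centered at effect_size * sqrt(n)
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))


# Power for a fixed effect size at a few sample sizes
for n in (10, 30, 100):
    print(n, round(z_test_power(0.5, n), 3))
```

For a fixed effect size and significance level, power rises with n. That is the kernel of truth behind "a larger n is always good", but the number only means something once the test, the significance level and the effect size are stated.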
In this article I want to demonstrate how R Markdown is a nice setting for coding your project in both R and Python, allowing you to code some elements of your project in each language and manipulate objects created in one language using another language.
This could be valuable to you for a number of reasons:
Though I code in both R and Python, R Markdown is my only route for writing reports, blogs or books. It is incredibly flexible, has many beautiful design options and supports many output formats really nicely.
If you have never worked in R Markdown, I highly recommend it. If you have worked in it before, here are ten little tricks I’ve learned which have served me well in numerous projects, and which highlight how flexible it is.
So you write a lovely R Markdown document where you’ve analyzed a whole bunch of facts about dogs. And then you get told…
The output of many models in R can be inconsistent. Often we are given more information than we need, and in some cases less than we need. Function arguments and output formats can vary a lot, and we sometimes need to look in different parts of the output to find the specific statistics we seek.
The tidymodels meta-package is a collection of packages which collectively apply the principles of tidy data to the construction of statistical models. More information and learning resources on tidymodels can be found here. …