Why R should be part of your marketing toolbox

From pivot table-like functions and graphing to web-scraping, sentiment analysis and machine learning. And it’s free.

12 min read · Jul 29, 2018

The problem with a lot of modern marketing is that we are supposed to be making our decisions on insights from the huge amount of data that we generate. While making the decision from the insight can often be straightforward — campaign A lifted sales 112% more than campaign B, let’s roll campaign A out to the world — sometimes getting to that insight can be difficult; hidden as it can be inside increasingly corpulent datasets that go beyond just ‘eyeballing’ the numbers.

In this world of multi-channel, multi-variate data with multiple touch points across multiple devices, taking control of your data and turning it into something you can action can be difficult, particularly if you are an SME and your marketing department resources are limited at best. Using R and a few simple statistical and database-inspired tools, it doesn’t have to be a %>%* dream…

What is R?

Developed from the earlier S, R is a programming language for statistical computing. Now, that might sound like exactly the thing you don’t need in your marketing life, but bear with me.

R can be expanded with a huge library of packages: add-on bundles of features that can take care of a whole bunch of common tasks, and some quite esoteric ones as well.

Perhaps the best way to think of R, from a data analysis point of view, is like a spreadsheet, but one you interact with through a command line, rather than a mouse. Admittedly, that does mean there is a bit of a learning curve, but believe me, as someone who came to R relatively late and is still learning, the initial few days might be hard work, but if you use it for a couple of hours a day for a few weeks, you won’t believe what you’ll be able to code at the end of that time.

Oh, and the thing that makes it risk-free to give it a go: it’s free.

Having said that, in some of my experiences of the commercial world, being free can often be a detriment. I’ve seen too many places committed to paid-for solutions; there often seems to be a culture of distrust for things that are free, and ‘open-source’ seems to be a term that strikes fear into the heart of many businesses. Hopefully the data science era — driven by R and Python — will change that culture.

RStudio: the R IDE

If you just download R, you can get to work straight away in its GUI, but if you grab RStudio as well, things can get a lot easier, a lot more organised and nicer to look at as well.

RStudio is an integrated development environment (IDE) for R. While that sounds fancy, it’s really just a nicer and more manageable way of working with R. RStudio takes all of R’s functionality, but wraps it in a more, dare I say, commercial-looking package. And it’s perfect for organising all your code and your projects, handling version control by integrating with Git / GitHub, preparing reports and presentations, and more.

And, by the way, it’s free too.

The statistics

There’s no escaping it: if you want to do some analytics on your data, and you want to do it properly, you’re going to need to know some statistics. R makes it ridiculously easy to perform a lot of the statistical tests you’ll need, but you will need to know which test to use, when to use it, and how to interpret the results.

Once again though, to start to take advantage of your data, knowing just two or three tests will see you well on the way, and you’ll still be in a better place than if you simply eyeball graphs in Google Analytics.

How can R benefit my marketing?

Let me count the ways… Actually, there are so many, I’ll just list a few. This is neither an exhaustive list, nor a thorough ‘how-to’ guide, just a bit of an overview of some of the ways I used R in my job as a data-driven marketer.

While this article [clearly] is not a ‘how-to’ guide, it doesn’t need to be. As the software is free, lots, lots and lots more people use it, and contribute to the community via their own blogs. If you read this list and there is something you want to learn how to do, I’m pretty confident you can find a step-by-step guide fairly quickly. Between the bloggers, the R Documentation and the question-answerers on Stack Overflow, you shouldn’t be short of information.

Okay, I’ve made the point that R is useful in marketing, but perhaps some examples might help. Without further ado:

Correlation and regression analysis

Want to see how footfall, PPC spend, social media posts etc. correlate with your sales? No problem, R’s very clever plot function can see to that. Give it a table with your input variables of interest and your revenue, and just feed it into plot. It really is just as simple as

plot(table_name)

and plot, clever as it is, will know that a scatterplot is the most appropriate thing to make, and give you a matrix of every column plotted against every other one, making it easy to see what might be going up as your revenue does.
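If you don’t have a table to hand, here is a rough sketch you can paste in to see what that looks like. The data is simulated and every column name and number is made up for illustration; the only real thing is plot (and cor, which gives you the same relationships as numbers).

```r
# Hypothetical weekly data; all column names and values are invented.
set.seed(42)
weeks <- 52
table_name <- data.frame(
  footfall     = round(rnorm(weeks, mean = 1200, sd = 150)),
  ppc_spend    = round(runif(weeks, min = 200, max = 600)),
  social_posts = rpois(weeks, lambda = 10)
)
# Revenue is simulated to depend on footfall and PPC spend, plus noise.
table_name$revenue <- 5 * table_name$footfall +
  2 * table_name$ppc_spend +
  rnorm(weeks, sd = 300)

# For a data frame with more than two columns, plot() draws a scatterplot matrix.
plot(table_name)

# The same relationships as pairwise correlation coefficients.
round(cor(table_name), 2)
```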

Of course, you’re into data now, so eyeballing things isn’t how you roll. Let’s look at that with a proper, actual linear regression model. Let’s say your footfall correlates with revenue, but how strong is that correlation? How many more people do you need through the door to put £84.70 in the till? Easy. Let’s build a linear model called lin_mod:

lin_mod <- lm(revenue ~ footfall, data = table_name)

Want to know what’s going on with that model and get the p-value to find out if that correlation is statistically significant?

summary(lin_mod)

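What about working out the footfall you need to put that £84.70 in the till? The perfectly named predict function gives you the expected revenue for any footfall figure you feed it, and the footfall coefficient reported by summary tells you how much each extra visitor is worth, so you can work backwards from there. And if you’re ready for multiple regression and want to add in the effect of how loud you play your music in the store?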

lin_mod <- lm(revenue ~ footfall + volume_van_halen, data = table_name)

Barely any code, potentially lots of insight.
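To make that a bit more concrete, here is a minimal sketch on simulated data (the numbers, and the idea that revenue rises by about £5 per visitor, are assumptions for the example). It shows predict going from footfall to revenue, and one simple way of going in the other direction using the slope.

```r
# Hypothetical data: revenue rises by roughly 5 pounds per extra visitor.
set.seed(1)
table_name <- data.frame(footfall = round(runif(100, 800, 1600)))
table_name$revenue <- 500 + 5 * table_name$footfall + rnorm(100, sd = 200)

lin_mod <- lm(revenue ~ footfall, data = table_name)

# predict() takes footfall values and returns the expected revenue for each.
predict(lin_mod, newdata = data.frame(footfall = c(1000, 1500)))

# Going the other way: the slope is the extra revenue per extra visitor,
# so the visitors needed for a target uplift is uplift / slope.
slope <- coef(lin_mod)[["footfall"]]
target_uplift <- 84.70
extra_visitors <- target_uplift / slope
ceiling(extra_visitors)
```

Dividing by the slope only makes sense if the relationship really is roughly linear over the range you care about, which is what the plot and the summary output are there to help you judge.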

Customer churn

Back when I worked in infectious disease research, survival analysis was just that: for looking at survival. The Kaplan-Meier survival curve was a staple of my journal reading. Want to compare survival post-infection between vaccinated and unvaccinated subjects? The Kaplan-Meier survival analysis was what you did.

These days though, you can use exactly the same test to look at whether particular groups of customers are likely to churn. The survival package provides a convenient way of performing survival analysis, allowing you to get a feel for how customer lifetime value may differ between various customer groups.
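As a rough sketch of what that might look like, the example below simulates two customer segments and fits a Kaplan-Meier curve to each with the survival package. The segments, churn rates and 36-month observation window are all invented for the illustration.

```r
library(survival)

# Hypothetical customers: a made-up segment label for each.
set.seed(7)
n <- 200
customers <- data.frame(
  segment = rep(c("newsletter", "no_newsletter"), each = n / 2)
)
# Simulate longer lifetimes (in months) for the newsletter group.
true_lifetime <- rexp(n, rate = ifelse(customers$segment == "newsletter", 1 / 24, 1 / 12))
customers$months  <- pmin(true_lifetime, 36)          # we only observe 36 months
customers$churned <- as.integer(true_lifetime <= 36)  # 0 = still a customer (censored)

# Kaplan-Meier curves, one per segment.
km_fit <- survfit(Surv(months, churned) ~ segment, data = customers)
plot(km_fit, lty = 1:2, xlab = "Months as a customer", ylab = "Proportion retained")
legend("topright", legend = names(km_fit$strata), lty = 1:2)

# Log-rank test: is the gap between the curves more than chance?
survdiff(Surv(months, churned) ~ segment, data = customers)
```

The important detail is the churned column: customers who are still with you are not thrown away, they are marked as censored, which is exactly the situation survival analysis was designed for.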

Put the pivot tables away

If you do anything more than a little bit of basic analysis in Excel, you may well be familiar with pivot tables. If you have your data in a sprawling, raw form, pivot tables can quickly allow you to group your data by date, sales-person, store etc. They’re quick and easy and very powerful. And they’re also something you can do very quickly in R using the dplyr package.

With its functions such as group_by and summarise, it’s a piece of cake to, well, group and summarise your data by whatever criteria you see fit, producing speedy summary reports and, if you need, you can use the mutate function to create new columns based on your summarised data. What’s the advantage in this method over pivot tables? We’ll get to that when we talk about quickly making reports…
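Here is a small sketch of the pivot table idea in dplyr. The sales table, its columns and the store names are made up; group_by, summarise, mutate and arrange are the real dplyr functions.

```r
library(dplyr)

# Hypothetical raw sales data, one row per transaction.
sales <- data.frame(
  store   = c("Leeds", "Leeds", "York", "York", "York", "Hull"),
  channel = c("online", "in-store", "online", "online", "in-store", "in-store"),
  revenue = c(120, 80, 200, 50, 150, 90)
)

store_summary <- sales %>%
  group_by(store) %>%
  summarise(
    transactions  = n(),
    total_revenue = sum(revenue)
  ) %>%
  # mutate() adds a new column calculated from the summarised ones.
  mutate(avg_order_value = total_revenue / transactions) %>%
  arrange(desc(total_revenue))

store_summary
```

Swap store for channel in group_by (or give it both) and you have a different pivot without touching the rest of the code.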

Experiments and split testing

As a marketer, you probably spend a good amount of time A/B testing. Whether that’s email subject lines, landing page conversion rates or PPC campaigns, experimental design and analysis is part of the job these days.

R makes it straightforward to analyse the results of your experiments with appropriate statistical rigour. From Chi-squared and Kruskal-Wallis tests to t-tests, analysis of variance, checking your data for normality and much more besides, R has all the tools you need to let you know how confident you can be in your results.
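By way of illustration, here is a sketch of two common cases using only base R: comparing conversion rates between two landing pages, and comparing a continuous metric such as order value. All of the figures are invented.

```r
# Hypothetical results: conversions out of visitors for two landing pages.
conversions <- c(A = 120, B = 90)
visitors    <- c(A = 2000, B = 2000)

# Test of equal proportions (equivalent to a Chi-squared test on the 2x2 table).
ab_test <- prop.test(conversions, visitors)
ab_test$estimate  # conversion rate for each variant
ab_test$p.value

# For a continuous metric such as order value, check normality first...
set.seed(3)
order_value_a <- rnorm(50, mean = 42, sd = 8)
order_value_b <- rnorm(50, mean = 45, sd = 8)
shapiro.test(order_value_a)

# ...then compare the groups with a t-test,
# or with wilcox.test() if the data look non-normal.
t.test(order_value_a, order_value_b)
```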

The ubiquitous and obligatory machine learning

Everything seems to be machine learning now, from optimising your display campaign target audience to putting your elbow in a blob of ice cream**, someone has probably built an AI solution to help you do it better. And don’t they all make it sound very technical?

While a lot of the underlying statistics is very technical, and developing a machine learning product to roll out for real-time use at scale is very technical, actually just making one to solve a specific problem in your day to day work is incredibly easy. You can do it in a line of code, although doing it thoroughly, tuning, testing, tuning again, brings us back to the state of ‘becoming increasingly technical’.

Let’s imagine you want to increase your amount of repeat business from new customers. One way to do this might be to give new customers a voucher. However, some of those new customers might shop again anyway, so giving them the voucher is giving away margin unnecessarily. What you want to know is: given a number of criteria about the customer and the first transaction, how likely are they to shop again? Once you know that, you can make an informed decision about whether or not you need to incentivise their return.

Say you have a historical set of data in which you have captured 6 pieces of information, A to F, about each customer and transaction (online vs in-store, value of transaction, items in transaction, customer gender, age etc.) and whether or not that customer bought again in the next year.

Using very similar code to how we built our regression model, you can build and train a machine learning model, using all sorts of techniques such as classification trees, logistic regression, random forests and myriad more algorithms, that you can use to build predictions on new customers, and work out to whom you should send a voucher.
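As a minimal sketch of the voucher example, here is a logistic regression built with base R’s glm. The historical data is simulated, the three predictor columns stand in for the ‘A to F’ information, and the 0.5 cut-off for sending a voucher is an arbitrary choice for the illustration.

```r
# Hypothetical historical data, one row per new customer.
set.seed(11)
n <- 500
history <- data.frame(
  online      = rbinom(n, 1, 0.5),             # 1 = first purchase was online
  order_value = round(runif(n, 10, 200), 2),
  items       = rpois(n, 2) + 1
)
# Simulate the outcome: bigger first orders make a repeat purchase more likely.
p_true <- plogis(-2 + 0.02 * history$order_value + 0.3 * history$online)
history$bought_again <- rbinom(n, 1, p_true)

# Logistic regression: the same formula interface as lm(), a different function.
repeat_mod <- glm(bought_again ~ online + order_value + items,
                  data = history, family = binomial)

# Score two new customers and decide who gets a voucher.
new_customers <- data.frame(online = c(1, 0), order_value = c(150, 20), items = c(3, 1))
new_customers$p_repeat <- predict(repeat_mod, newdata = new_customers, type = "response")
# Only incentivise those unlikely to return on their own (0.5 is arbitrary).
new_customers$send_voucher <- new_customers$p_repeat < 0.5
new_customers
```

Doing it thoroughly would mean holding back some of the historical data to test the model on, which is where the tuning and testing mentioned earlier comes in.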

Doing things with lists

Do you ever have to work with VLOOKUPs in Excel? They’re just a bit awkward, and then when you start nesting them inside IFERROR functions and things, they just don’t look that pretty. The dplyr package that we met when we were talking about pivot tables offers some database functions that you might be familiar with if you know some SQL. These can make cross-referencing lists a piece of cake.

As part of that whole GDPR thing, I had two lists I had to check: one really very large email list, and a smaller list of people from the larger list who had given us a further opt-in. I needed to remove those people from the first list, so that the remaining list could be unsubscribed from the email database. The Excel solution involved just one of those VLOOKUPs inside an IFERROR with an IF involved as well, followed by a filtering stage. Let’s take a quick look at how we can do that the R and dplyr way with a quick example.

We’ll start with two lists of email addresses:

list_a: ray@doors.com, john@doors.com, robbie@doors.com

list_b: square@pusher.com, ray@doors.com, muddy@waters.com, john@doors.com, venetian@snares.com, robbie@doors.com, sarah@vaughan.com, andy@lamb.com

And let’s say we want to get the email addresses that are common between the lists:

inner_join(list_a, list_b)

     email_address
1    ray@doors.com
2   john@doors.com
3 robbie@doors.com

Or if we want the addresses in list_b that don’t match to list_a:

anti_join(list_b, list_a)

        email_address
1   square@pusher.com
2    muddy@waters.com
3 venetian@snares.com
4   sarah@vaughan.com
5       andy@lamb.com

And there are your lists, ready to export as a .csv file for whatever you need to do with them. And with code that can often be a lot simpler than the Excel equivalent.
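For completeness, here is the whole thing as a script you could run, from building the two lists to writing the .csv. The join functions work on data frames, so I’m assuming each list is a one-column table with a column called email_address; the output file name is just a placeholder.

```r
library(dplyr)

# Each list is a one-column data frame so the join functions can match on it.
list_a <- data.frame(email_address = c("ray@doors.com", "john@doors.com", "robbie@doors.com"))
list_b <- data.frame(email_address = c(
  "square@pusher.com", "ray@doors.com", "muddy@waters.com", "john@doors.com",
  "venetian@snares.com", "robbie@doors.com", "sarah@vaughan.com", "andy@lamb.com"
))

# Addresses that appear in both lists.
in_both <- inner_join(list_a, list_b, by = "email_address")

# Addresses in list_b with no match in list_a.
only_in_b <- anti_join(list_b, list_a, by = "email_address")

# Export for the email platform; the file name is a placeholder.
write.csv(only_in_b, "to_unsubscribe.csv", row.names = FALSE)
```

Naming the column in by is optional here, since dplyr will find the shared column on its own, but being explicit saves surprises when your tables have more than one column in common.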

Sentiment analysis

Want to get a feeling for how people are talking about you and perhaps how those feelings are changing over time? Sentiment analysis might be for you. Using the tidytext and dplyr packages, and not too many lines of code, you can take your customer reviews, social media mentions, live chat transcripts or any other source of text, and look to see whether you’re being spoken of positively or negatively.
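A minimal sketch of the idea, using three invented reviews and the Bing sentiment lexicon that ships with tidytext (other lexicons are available, and which one suits your text is a judgement call):

```r
library(dplyr)
library(tidytext)

# Hypothetical customer reviews.
reviews <- data.frame(
  review_id = 1:3,
  text = c(
    "Great service and fast delivery, love it",
    "Terrible experience, the item arrived broken",
    "Good price but slow delivery"
  )
)

review_sentiment <- reviews %>%
  unnest_tokens(word, text) %>%                        # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep words found in the lexicon
  count(review_id, sentiment)                          # positive / negative words per review

review_sentiment
```

Add a date column to the reviews and include it in the count, and you have the beginnings of sentiment over time.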

Competitor pricing

In retail, keeping track of your competitors’ prices can be more than slightly useful, particularly if you are selling branded products where exactly the same item is available from multiple retailers. Yes, we’re all told, repeatedly, that it’s about the ‘value proposition’ and what you offer beyond the goods rather than it being about price, because that makes us feel good, but looking at the current state of retail, it seems as though it’s generally not.

There are a lot of commercially available platforms you can use to harvest data from your competitors’ websites and let you know when you’re too expensive (potentially losing sales) or too cheap (giving away margin perhaps unnecessarily), but they can often be beyond the budget of a small business. Why not build one yourself?

Using the rvest package, you can build yourself a tool to check pricing for particular products on your competitors’ websites, pull those into a table along with your own prices, and check to see where you’re looking less competitive than you might want.
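Here is a rough sketch of how such a tool might start. To keep it self-contained, the ‘competitor page’ is a scrap of HTML I’ve made up; on a real site you would use read_html with the page’s URL, and the CSS selectors would need to match that site’s actual markup. The product names and prices are invented too.

```r
library(rvest)

# A stand-in for a competitor's product page; structure and classes are invented.
page <- minimal_html('
  <div class="product"><span class="name">Amp A</span><span class="price">£199.00</span></div>
  <div class="product"><span class="name">Amp B</span><span class="price">£349.50</span></div>
')

# Pull product names and prices out of the page using CSS selectors.
competitor <- data.frame(
  product          = page %>% html_elements(".product .name")  %>% html_text2(),
  competitor_price = page %>% html_elements(".product .price") %>% html_text2()
)
# Strip the currency symbol and convert the price to a number.
competitor$competitor_price <- as.numeric(gsub("[^0-9.]", "", competitor$competitor_price))

# Our own prices (also made up), matched on product name.
ours <- data.frame(product = c("Amp A", "Amp B"), our_price = c(209.00, 339.00))
comparison <- merge(ours, competitor, by = "product")
comparison$difference <- comparison$our_price - comparison$competitor_price
comparison
```

A positive difference means you are the more expensive option for that product; a negative one means you might be giving away margin.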

Social media benchmarking

Okay, so in this whole post-Facebook / Cambridge Analytica world, this is something that is changing, but hopefully common sense will be restored. While you will need to create developer accounts and applications in the sites of interest, packages are available to allow you to query Facebook and Twitter via their APIs through R.

With a bit of crafty coding, you can build a script that takes a list of profiles, fetches the information, and reports tweets, retweets, engagement and lets you see how you compare with your competitors. You could always combine this with sentiment analysis and get some interesting insight into how you’re being talked about.

Streamline your reporting

Do you have to prepare regular reports? You know the ones I mean, the ones that are fundamentally the same each week or month, but that involve exporting the data from AdWords or Analytics, loading it into Excel, preparing the same usual graphs, pasting those into Word and then writing about them?

If that sounds like your life, you should take a look at R Markdown documents. R Markdown provides a convenient method of bringing formatted text and R code together in one document. And it’s great for report writing.

This brings us back to why R and dplyr make a great alternative to pivot tables. Let’s say you have to prepare a weekly web analytics report. You report on the same KPIs each week, calculating some summary statistics, drawing some graphs and writing about the key bullet-point findings.

In R Markdown, you can quickly set up a template document, with the first step being the importing of your dataset — perhaps a .csv file that you exported from a custom report in Google Analytics. Each week, all you need to do is change the file name and hit run, and R Markdown can work through all its pre-coded sections, producing the required summary statistics, tables and graphs. All you need to do is fill in the text. No more creating pivot tables, drawing graphs, amending pivot table, drawing graphs, copying and pasting into Word…
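To give a flavour of what such a template might contain, the sketch below writes a small one to a temporary file. The file names, the params entry and the sessions and channel columns are all assumptions about what your export looks like; the render call at the end is the real rmarkdown function, left commented out because it needs an actual data file.

```r
# Build the chunk delimiter in code so the template can live inside this script.
fence <- strrep("`", 3)

# Contents of a hypothetical template. The params entry in the header lets the
# data file change each week without touching the chunks below it.
template <- c(
  "---",
  "title: \"Weekly web analytics report\"",
  "output: html_document",
  "params:",
  "  data_file: \"ga_export.csv\"",
  "---",
  "",
  paste0(fence, "{r load, echo = FALSE}"),
  "traffic <- read.csv(params$data_file)",
  fence,
  "",
  "## Sessions by channel",
  "",
  paste0(fence, "{r channel-plot, echo = FALSE}"),
  "barplot(tapply(traffic$sessions, traffic$channel, sum))",
  fence,
  "",
  "Key findings this week: (written by hand)"
)

rmd_path <- file.path(tempdir(), "weekly_report.Rmd")
writeLines(template, rmd_path)

# Each week, point the template at the new export and render it:
# rmarkdown::render(rmd_path, params = list(data_file = "ga_export_week_30.csv"))
```

In practice you would normally write the template straight into an .Rmd file in RStudio rather than generating it from a script; it is built this way here only so the example is a single runnable piece of R.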

Or, if you want some output in the style of a dashboard, then you can spend some time with the flexdashboard package. Additionally, if you want to flex your ninja skills, you can get the data out of Analytics directly via R, and get around that whole custom report and export stage.

The take home messages

This bit seems to be popular: I see it a lot so who am I to buck the trend? Let’s hit the bullet points:

R offers a vast range of tools that can be applied to myriad marketing tasks, chores and functions
If you have ever written a formula in Excel, you’ve already done some coding, so don’t let a command line-based interface put you off!
R and RStudio are free to download and use, so you can give them a try without having to get the credit card out or raise a PO, or justify them to anyone
Yes, there is a learning curve and the first few days might be a bit challenging, but it’s not as bad as you might think…
In my experience, when you discover a new set of tools, you find ways to use them, just for fun, and often from the playground come those unexpected findings…

Next steps

Where to go next with R? Well, what about creating your presentations in R? Or turning some of your workflows into interactive web applications using Shiny? There’s plenty of scope to get creative — it is a programming language after all — so you might find you’re just limited by your imagination. And your time of course. After all, there are still tweets to schedule and BingAds campaigns to set up.

If this article has piqued your curiosity, the best thing to do is download, install, and have a look. There are plenty of datasets built-in, included with packages or freely available if you want something to practise with.

As the wide world of marketing continues to produce bigger and bigger datasets, if you’re in a position to take advantage of those, you’re in a good place to gain some competitive advantage. You just need the tools, a bit of knowledge, and a spark of inspiration to come up with that game-changing bit of insight.

Follow Chris on twitter

*dplyr joke; **Blackadder reference