# 10 Levels of ggplot2: From Basic to Beautiful

Dec 13, 2019 · 9 min read

Recently I discovered WIRED’s “Levels” series on YouTube. The concept is simple. An expert in some interesting skill (e.g., ice sculpture, origami, or knife making) explains the concept at many levels, from easy to complex.

This format is a wonderful way to explore many different skills.

One essential skill for anyone who works with data is building a graph that tells a story. The `ggplot2` package is one of the best tools for that. In this article I’ll explore how to build a graph with `ggplot2`, from basic to beautiful.

# Data To Explore

Also, as we’re exploring the data, you can follow along with all of the code for it here:

# Level 0: A Basic Plot

• Data (which we see in the `tickets` object)
• Mapping (which we see in the `aes(x = violation_desc)` parameter)
• Geometry (which we see from `geom_bar`)

For more details about how this works, check out this article. Our code block ends up looking like:

```r
tickets %>%
  ggplot(aes(x = violation_desc)) +
  geom_bar()
```

Here’s what we get from this: hot garbage. There’s good news, though. We can only get better from here!

# Level 1: Data Cleaning

The Level 0 graph has 95 distinct values on the x-axis. This is far too many for a person to be able to interpret. Trimming the number of distinct values to 10 will dramatically improve the interpretability of the graph. We can start doing this by inspecting the distinct values. Here’s a sample of the values with their counts:

```
 1 METER EXPIRED CC     281060
 2 METER EXPIRED        181329
 3 OVER TIME LIMIT      156859
 4 EXPIRED INSPECTION   138575
 5 STOP PROHIBITED CC   115898
 6 STOPPING PROHIBITED   47395
 7 PARKING PROHBITED     47232
 8 PARKING PROHBITED CC  45082
 9 OVER TIME LIMIT CC    24585
10 PASSENGR LOADNG ZONE  24359
```

Immediately we start to see ways to reduce the number of distinct values. First, for our purposes, we can remove the `CC` suffixes. We also notice that some words are misspelled, like `PROHBITED`, and that some words are near-duplicates of each other, like `STOP` and `STOPPING`. The `stringr` package’s `str_replace` function can help us make those changes.
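Here is a minimal sketch of what that cleanup could look like. The six-row `tickets` tibble is a made-up stand-in for the real data, and the three patterns are just one plausible way to handle the cases mentioned above, not necessarily the exact replacements used for the original graph.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the real `tickets` data (illustrative values only)
tickets <- tibble(
  violation_desc = c("METER EXPIRED CC", "METER EXPIRED",
                     "STOP PROHIBITED CC", "STOPPING PROHIBITED",
                     "PARKING PROHBITED", "PARKING PROHBITED CC")
)

tickets_clean <- tickets %>%
  mutate(
    violation_desc = violation_desc %>%
      str_replace(" CC$", "") %>%                 # drop a trailing " CC" suffix
      str_replace("PROHBITED", "PROHIBITED") %>%  # fix the misspelling
      str_replace("^STOPPING", "STOP")            # fold STOPPING into STOP
  )

# Six raw labels collapse into three cleaned ones
n_distinct(tickets_clean$violation_desc)
```

Anchoring the patterns with `$` and `^` keeps the replacements from touching a `CC` or `STOPPING` that happens to appear in the middle of a description.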

Once we make all of these changes, we’re still left with 81 distinct values. A great method to fix this is to use the `fct_lump` function from the `forcats` package. The function is simple enough — it allows you to decide how many distinct values you’d like to keep based upon some other value. Everything else is then designated as “Other”. Check those steps out here.
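As a rough illustration of the lumping step, here is a sketch on made-up data. It keeps the three most frequent values by row count; the real graph keeps more, and the values and the choice of `n = 3` are assumptions for the example.

```r
library(dplyr)
library(forcats)

# Toy stand-in for the cleaned data (illustrative values only)
tickets <- tibble(
  violation_desc = c(rep("METER EXPIRED", 5),
                     rep("OVER TIME LIMIT", 4),
                     rep("EXPIRED INSPECTION", 3),
                     "BUS ONLY ZONE",
                     "FIRE HYDRANT")
)

# Keep the 3 most common values; everything else becomes "Other"
tickets_lumped <- tickets %>%
  mutate(violation_desc = fct_lump(violation_desc, n = 3))

count(tickets_lumped, violation_desc)
```

Recent versions of `forcats` also offer `fct_lump_n`, which does the same thing with a more explicit name.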

After implementing these changes we’re almost able to read all of the words on the x-axis. There are a bunch of ways to take care of that. We’ll get to it at Level 4.

# Level 4: fct_reorder + coord_flip()

The `coord_flip` function can then be included as part of our `ggplot2` code. This function also does what it says: it flips the axes so that our x-axis becomes our y-axis and vice versa. Our code ends up looking like this. After implementing it we end up with our first passable graph.
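A minimal sketch of this level, assuming a small pre-aggregated data frame (the `tickets_agg` name, its column `n`, and the counts are all invented for the example):

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy aggregated data (illustrative counts only)
tickets_agg <- tibble(
  violation_desc = c("METER EXPIRED", "OVER TIME LIMIT",
                     "EXPIRED INSPECTION", "Other"),
  n = c(460, 180, 140, 300)
) %>%
  # Order the factor levels by count so the bars are sorted
  mutate(violation_desc = fct_reorder(violation_desc, n))

p <- ggplot(tickets_agg, aes(x = violation_desc, y = n)) +
  geom_col() +
  coord_flip()  # swap the axes so the long labels read horizontally

p
```

Because `fct_reorder` sorts in ascending order and `coord_flip` draws the first level at the bottom, the largest bar ends up at the top of the graph.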

This would be a great graph to use internally or as part of a draft. I would want to make more changes if it were being shared externally.

# Level 5: Window Dressing

1. Change the titles of the axes
2. Add a title for the graph
3. Add some color to the bars

We can take care of the first two issues by adding the `labs` function at the end of our `ggplot2` code. Any of the titles on the plot (`title`, `subtitle`, `caption`, the axis titles `x` and `y`, and legend titles such as `fill`) can be altered in this way.

To add color, we need to go to our `geom_col` function. We can pass an additional parameter to it called `fill`. We can specify the fill color with this parameter in several ways, for example as a name (`"red"`) or a hex code (`"#E83536"`). Now we have a graph that is very presentable. We’re well past “basic” and on our way to “beautiful”!
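Put together, those two changes might look something like this sketch. The data frame and all of the title text are placeholders, not the wording from the original graph.

```r
library(ggplot2)

# Toy aggregated data (illustrative counts only)
tickets_agg <- data.frame(
  violation_desc = c("METER EXPIRED", "OVER TIME LIMIT", "Other"),
  n = c(460, 180, 300)
)

p <- ggplot(tickets_agg, aes(x = violation_desc, y = n)) +
  geom_col(fill = "#E83536") +     # a fixed color goes outside aes()
  coord_flip() +
  labs(title = "Placeholder title",
       x     = "Violation",        # x and y keep their pre-flip meaning
       y     = "Number of tickets")

p
```

Note that `fill` is set outside of `aes()` here because it is a single fixed color rather than something mapped from the data.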

# Level 6: Build Your Own Theme

At this point, we have something that looks like it could be our own theme. If we’re making multiple plots, it would be worthwhile to officially build our own theme. Here’s what that could look like:

```r
theme_compassred <- function () { 
  theme_minimal(base_size = 10, base_family = "Roboto") %+replace% 
    theme(
      axis.title = element_text(face = "bold"),
      axis.text  = element_text(face = "italic"),
      plot.title = element_text(face = "bold",
                                size = 12)
    )
}
```

Even further, we can set this theme for every plot automatically:

`theme_set(theme_compassred())`

# Level 7: Layer Multiple Geometries

When we have multiple `geom_` functions as part of our code, it is possible for the geometries to either use the same aesthetic mapping or to have different aesthetic mappings. In this case, `geom_label` can use the same mappings for its `x` and `y` coordinates, but requires an additional mapping called `label`. There are also several parameters that we must pass to `geom_label` in order to properly place the labels on the graph. Here’s the code and the graph:
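A small sketch of the idea on made-up data: `geom_label` inherits `x` and `y` from the `ggplot()` call and only adds the `label` mapping. The placement values (`hjust`, `nudge_y`) are guesses that suit this toy data, not the settings from the original graph.

```r
library(ggplot2)

# Toy aggregated data (illustrative counts only)
tickets_agg <- data.frame(
  violation_desc = c("METER EXPIRED", "OVER TIME LIMIT", "Other"),
  n = c(460, 180, 300)
)

p <- ggplot(tickets_agg, aes(x = violation_desc, y = n)) +
  geom_col(fill = "#E83536") +
  # x and y are inherited; only the label mapping is new
  geom_label(aes(label = n),
             hjust   = 1,     # right-align the label at its anchor point
             nudge_y = -5) +  # pull it slightly inside the end of the bar
  coord_flip()

p
```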

# Level 8: Highlight a Key Field

One great way to accomplish this is by highlighting a key field. We can do so with four small changes:

• Transform the data by creating a new field called `highlight`
• Move the `fill` to inside the aesthetic mapping and set it to `highlight`
• Change colors with `scale_fill_manual`
• Remove the legend with `guides`

After those small code changes, we end up with:
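Those four changes could be sketched as follows. The data, the highlighted category, and the grey used for the other bars are all assumptions made for the example.

```r
library(dplyr)
library(ggplot2)

# Toy aggregated data (illustrative counts only)
tickets_agg <- tibble(
  violation_desc = c("METER EXPIRED", "OVER TIME LIMIT",
                     "EXPIRED INSPECTION", "Other"),
  n = c(460, 180, 140, 300)
) %>%
  # 1. Flag the one category we want to stand out
  mutate(highlight = violation_desc == "EXPIRED INSPECTION")

p <- ggplot(tickets_agg,
            aes(x = violation_desc, y = n, fill = highlight)) +  # 2. fill inside aes()
  geom_col() +
  scale_fill_manual(values = c("TRUE" = "#E83536",               # 3. pick the colors
                               "FALSE" = "grey70")) +
  guides(fill = "none") +                                        # 4. drop the legend
  coord_flip()

p
```

Older versions of `ggplot2` used `guides(fill = FALSE)` for the last step; `"none"` is the current spelling.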

# Level 9: Annotate

At a high level, here is how our code is reorganized:

```r
# Build an aggregated data frame of all of the tickets
tickets_agg <- 
  mutate(...)

# Create the note that we'd like to share
ticket_note <- 
  paste("Your text here")

# Create a data frame to position your arrow
arrow_position <- 
  data.frame(...)

# Build your graph
ggplot() + 
  geom_col(data = tickets_agg,
           mapping = aes(x    = field,
                         y    = count,
                         fill = highlight)) + 
  geom_label(data = tickets_agg,
             mapping = aes(x     = field,
                           y     = count,
                           label = count)) + 
  # Only the highlighted row gets the note (comparison needs `==`, not `=`)
  geom_label(data = filter(tickets_agg,
                           highlight == TRUE),
             mapping = aes(x     = field,
                           y     = count,
                           label = ticket_note)) + 
  geom_curve(data = arrow_position,
             mapping = aes(x    = x_start,
                           y    = y_start,
                           xend = x_end,
                           yend = y_end))
```

In Level 7 we mentioned that when there are multiple `geom_` functions in our code, we sometimes need to give them different data and mappings. We take full advantage of this here.

# Level 10: Tidy It Up

• We don’t need all of the grid lines. Having labels for all of the bars makes the grid lines redundant.
• Similarly, we don’t need the values of our x-axis. These are also redundant because of the labels.
• We don’t need either of our axis labels. The title makes it very clear what each axis represents. They end up just being noise.

Putting all of this together, we end up with our final, Level 10 graph.
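One way to make those three removals is with a few `theme` settings, as in the sketch below. The data is made up, and `theme_minimal` stands in for whatever base theme you are using.

```r
library(ggplot2)

# Toy aggregated data (illustrative counts only)
tickets_agg <- data.frame(
  violation_desc = c("METER EXPIRED", "OVER TIME LIMIT", "Other"),
  n = c(460, 180, 300)
)

p <- ggplot(tickets_agg, aes(x = violation_desc, y = n)) +
  geom_col(fill = "#E83536") +
  geom_label(aes(label = n), hjust = 1) +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.grid  = element_blank(),  # no grid lines
    axis.text.x = element_blank(),  # no count values along the bottom axis
    axis.title  = element_blank()   # no axis titles
  )

p
```

Theme elements refer to where an axis is drawn, so after `coord_flip` it is `axis.text.x` that controls the count values along the bottom.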

# Putting it All Together

And here’s how our final block of code ends up looking:

Written by

## CompassRed Data Blog

#### We live for data and analytics.
