# ggplot2, dplyr, pipes: A few tips and tricks

### Scatter plots: diamonds data

Before making any plots, make sure the tidyverse package is installed. The lines below install it and then load it with `library()`:

``install.packages("tidyverse")``
``library(tidyverse)``
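If you call `install.packages()` every time you run your script, R reinstalls the package each time. Here is a minimal sketch of a guarded install. It uses base R's `requireNamespace()`, which returns `FALSE` when a package is not installed:

```r
# Install tidyverse only if it is missing, then attach it
# (library(tidyverse) attaches ggplot2, dplyr, tidyr and friends)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
```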

Using the diamonds data (it ships with ggplot2, which `library(tidyverse)` loads), plot price vs. carat and color the points by cut. You may want to use `sample_n()` to take a subsample first:

``ggplot(diamonds %>% sample_n(1000), aes(x = price, y = carat, color = cut)) + geom_point()``
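`sample_n()` draws a different random subset on every run. Since dplyr 1.0.0 it has also been superseded by `slice_sample()`. Below is a sketch of a reproducible version. The seed value 42 and the name `diamonds_sub` are arbitrary, illustrative choices:

```r
library(ggplot2)
library(dplyr)

# Fix the random seed (42 is arbitrary) so the same 1000 rows are drawn each run
set.seed(42)
diamonds_sub <- diamonds %>% slice_sample(n = 1000)  # diamonds_sub: illustrative name

p <- ggplot(diamonds_sub, aes(x = price, y = carat, color = cut)) +
  geom_point()
p
```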

Now plot all the data points. To reduce overplotting, set a small point size and a low transparency (`alpha`) value:

``# alpha sets the transparency and size sets the size of each point``
``ggplot(diamonds, aes(x = price, y = carat, color = cut)) + geom_point(alpha = 0.1, size = 0.1)``

### Economics data

Using the economics data, plot the personal savings rate (`psavert`) over time:

``ggplot(economics, aes(x = date, y = psavert)) + geom_line(linewidth = 0.3)  # ggplot2 < 3.4.0: use size = 0.3 instead``

Plot the median duration of unemployment (`uempmed`, in weeks) and the number of unemployed people (`unemploy`, in thousands) over time. Normalize both variables so that both are visible on the same plot. (Hint: one easy scheme is to normalize with respect to the period mean, i.e. `var / mean(var)`.)

``# Without normalization: plotting the two raw series on one axis squashes uempmed (weeks) flat against the x-axis, because unemploy (thousands of people) is orders of magnitude larger``
``ggplot(economics, aes(x = date)) + geom_line(aes(y = uempmed)) + geom_line(aes(y = unemploy))``
``# With normalization: dplyr's mutate() creates two new variables, duration and norm_unemployed, each equal to the original series divided by its own mean``
``# date is shared by both lines, so it stays in the top-level aes(); each geom_line() maps only its own y and gets its own color``
``ggplot(economics %>% mutate(duration = uempmed / mean(uempmed), norm_unemployed = unemploy / mean(unemploy)), aes(x = date)) + geom_line(aes(y = duration), color = "gold4") + geom_line(aes(y = norm_unemployed), color = "purple")``
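Two separate `geom_line()` calls with fixed colors produce no legend. A common alternative is to reshape the normalized data to long format with tidyr's `pivot_longer()`. You can then map the series name to `color`, and ggplot2 builds the legend itself. In this sketch, `econ_norm`, `econ_long` and the `series`/`value` column names are illustrative choices:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Divide each series by its own period mean, so both hover around 1
# (econ_norm / econ_long are illustrative names)
econ_norm <- economics %>%
  mutate(
    duration = uempmed / mean(uempmed),
    norm_unemployed = unemploy / mean(unemploy)
  ) %>%
  select(date, duration, norm_unemployed)

# Long format: one row per (date, series) pair
econ_long <- econ_norm %>%
  pivot_longer(c(duration, norm_unemployed),
               names_to = "series", values_to = "value")

# Mapping color to the series column makes ggplot2 draw a legend
p <- ggplot(econ_long, aes(x = date, y = value, color = series)) +
  geom_line() +
  scale_color_manual(values = c(duration = "gold4", norm_unemployed = "purple"))
p
```

A normalized value of 1 means "equal to the period average", so both lines share a meaningful common scale.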

Plot the median duration of unemployment (`uempmed`) vs. the number of unemployed people (`unemploy`), using a different aesthetic (here, color) to mark time:

``# Map date to color: the continuous color gradient shows where each point falls in time``
``ggplot(economics, aes(x = uempmed, y = unemploy, color = date)) + geom_point()``
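Color shows when each point occurred, but it does not show the order in which the points were visited. One option is to also draw the trajectory with `geom_path()`, which connects points in row order. By contrast, `geom_line()` sorts by the x variable. The sketch below uses `econ_sorted` as an illustrative name, and the small point size is arbitrary:

```r
library(ggplot2)
library(dplyr)

# geom_path() joins points in row order, so sort by date first
# (econ_sorted is an illustrative name)
econ_sorted <- economics %>% arrange(date)

p <- ggplot(econ_sorted, aes(x = uempmed, y = unemploy, color = date)) +
  geom_path() +
  geom_point(size = 0.5)
p
```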

### Smoothers & Colors

Using the diamonds data, plot carat vs. price, colored by cut, with an RColorBrewer palette. For clarity, this uses a subsample of 1000 rows:

``ggplot(diamonds %>% sample_n(1000), aes(x = carat, y = price)) + geom_point(aes(color = cut)) + scale_color_brewer()``
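With no arguments, `scale_color_brewer()` uses the sequential "Blues" palette. To pick another palette, you can list what the RColorBrewer package offers. If the package is missing, install it with `install.packages("RColorBrewer")`. Here is a minimal sketch. The variable names are illustrative:

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette, with columns
# maxcolors, category ("div", "qual" or "seq") and colorblind
info <- brewer.pal.info

# Diverging palettes ("Spectral" is one of them); illustrative name
diverging <- rownames(info)[info$category == "div"]
print(diverging)

# cut has 5 levels, so ask for 5 colors from the Spectral palette
spectral5 <- brewer.pal(5, "Spectral")
print(spectral5)
```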

Using the diamonds data, plot carat vs. price and add a smoother for each cut. Use an RColorBrewer palette and tinker with the aesthetics to get a clean figure. The three versions below vary the palette and the smoothing method:

``ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point(size = 0.3) + scale_color_brewer(palette = "Spectral") + geom_smooth(method = "loess", se = FALSE) + ggtitle('Diamonds data set: "Spectral" color palette + smooth method "loess"')``
``ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point(size = 0.3) + scale_color_brewer(palette = "Spectral") + geom_smooth(method = "lm", se = FALSE) + ggtitle('Diamonds data set: "Spectral" color palette + smooth method "lm"')``
``ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point(size = 0.3) + scale_color_brewer() + geom_smooth(method = "loess", se = FALSE) + ggtitle('Diamonds data set: default color palette + smooth method "loess"')``
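Because `color = cut` is mapped in the top-level `aes()`, ggplot2 groups the data by cut and fits one smoother per group. With `method = "lm"`, each of these is a straight-line `y ~ x` fit. You can reproduce those per-cut fits yourself to see what is being drawn. Here is a minimal base-R sketch, where `fits` and `lm_fits` are illustrative names:

```r
library(ggplot2)  # provides the diamonds data

# Split by cut and fit one straight line (price ~ carat) per group,
# mirroring the per-group fits behind geom_smooth(method = "lm")
fits <- lapply(split(diamonds, diamonds$cut),
               function(d) coef(lm(price ~ carat, data = d)))

# One row per cut level; columns "(Intercept)" and "carat" (the slope)
lm_fits <- do.call(rbind, fits)
print(lm_fits)
```

If you leave `method` at its default, ggplot2 switches to `mgcv::gam()` once the largest group has 1,000 or more rows, which is the case here. The ggplot2 documentation notes that loess is O(N^2) in memory. That is why the explicit `method = "loess"` plots above can be slow on the full data set.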
