1 Introduction
In the last session, we went through the basics of R. Instead of continuing with R programming, in this session we are going to jump directly to rendering plots.
We make this choice for three reasons:
- Rendering nice plots is directly rewarding
- You will be able to apply what you learn in this session to your own data (provided that they are correctly formatted)
- We will come back to R programming later, when you have all the necessary tools to visualize your results
The objectives of this session will be to:
- Create basic plots with the ggplot2 library
- Understand the tibble type
- Learn the different aesthetics in R plots
- Compose complex graphics
1.1 Tidyverse
The tidyverse is a collection of R packages designed for data science that includes ggplot2.
All packages share an underlying design philosophy, grammar, and data structures (plus the same shape of logo).
tidyverse is a meta library, which can take a long time to install with the following command:
install.packages("tidyverse")
Luckily for you, tidyverse is preinstalled on your RStudio server, so you just have to load the library:
library("tidyverse")
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.3 ✓ dplyr 1.0.7
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 2.0.1 ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
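The Conflicts lines above mean that, once the tidyverse is loaded, a bare filter() refers to the dplyr version and no longer to the one from stats. As a small sketch (the variable names two_seaters and moving_avg are only illustrative), the :: operator lets you pick either version explicitly:

```r
# dplyr::filter() keeps the rows of a data frame that match a condition.
# ggplot2::mpg is the toy dataset shipped with ggplot2.
two_seaters <- dplyr::filter(ggplot2::mpg, class == "2seater")
nrow(two_seaters)

# stats::filter() is unrelated: it applies a linear filter to a numeric
# series, here a moving average over 3 consecutive values.
moving_avg <- stats::filter(1:10, rep(1 / 3, 3))
moving_avg
```

Writing the package name in front of the function is optional in this course, but it removes any ambiguity when two packages export the same name.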
1.2 Toy data set mpg
This dataset contains a subset of the fuel economy data that the EPA makes available on fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008.
You can use the ? command to learn more about this dataset.
?mpg
But instead of using a dataset included in an R package, you may want to use any dataset with the same format.
For that we are going to use the command read_csv, which reads a csv file.
This command also works with a file URL:
new_mpg <- read_csv(
  "http://perso.ens-lyon.fr/laurent.modolo/R/mpg.csv"
)
You can check the number of rows and columns of the data with dim:
dim(new_mpg)
[1] 42551 12
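read_csv works the same way on a local path. The sketch below does not rely on the URL above: it writes the first rows of the built-in mpg dataset to a temporary file and reads them back. The names tmp_file and small_mpg are illustrative, and the show_col_types argument assumes readr 2.0 or later.

```r
library(ggplot2) # provides the built-in mpg dataset
library(readr)   # provides read_csv() and write_csv()

# Write the first 5 rows of mpg to a temporary csv file
tmp_file <- tempfile(fileext = ".csv")
write_csv(head(mpg, 5), tmp_file)

# Read the file back; show_col_types = FALSE silences the column summary
small_mpg <- read_csv(tmp_file, show_col_types = FALSE)

# Number of rows and columns, as with new_mpg above
dim(small_mpg)
```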
To visualize the data in RStudio you can use the command View:
View(new_mpg)
Or simply call the variable.
As with simple data types, calling a variable prints it.
But complex data types like new_mpg can use more elaborate print functions.
new_mpg
# A tibble: 42,551 × 12
id make model year class trans drive cyl displ fuel hwy cty
<dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 13309 Acura 2.2CL/3.0CL 1997 Comp… Auto… Fron… 4 2.2 Regu… 26 20
2 13310 Acura 2.2CL/3.0CL 1997 Comp… Manu… Fron… 4 2.2 Regu… 28 22
3 13311 Acura 2.2CL/3.0CL 1997 Comp… Auto… Fron… 6 3 Regu… 26 18
4 14038 Acura 2.3CL/3.0CL 1998 Comp… Auto… Fron… 4 2.3 Regu… 27 19
5 14039 Acura 2.3CL/3.0CL 1998 Comp… Manu… Fron… 4 2.3 Regu… 29 21
6 14040 Acura 2.3CL/3.0CL 1998 Comp… Auto… Fron… 6 3 Regu… 26 17
7 14834 Acura 2.3CL/3.0CL 1999 Comp… Auto… Fron… 4 2.3 Regu… 27 20
8 14835 Acura 2.3CL/3.0CL 1999 Comp… Manu… Fron… 4 2.3 Regu… 29 21
9 14836 Acura 2.3CL/3.0CL 1999 Comp… Auto… Fron… 6 3 Regu… 26 17
10 11789 Acura 2.5TL 1995 Comp… Auto… Fron… 5 2.5 Regu… 23 18
# … with 42,541 more rows
Here we can see that new_mpg is a tibble. We will come back to tibbles later.
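If you want to check the type of an object without reading the printed header, the sketch below shows one possible way, using the built-in mpg dataset rather than new_mpg so that it runs without downloading anything.

```r
library(ggplot2) # provides the built-in mpg dataset
library(tibble)  # provides is_tibble() and glimpse()

is_tibble(mpg)     # is the object a tibble?
is.data.frame(mpg) # a tibble is also a data frame

# glimpse() prints one line per column: name, type and first values
glimpse(mpg)
```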
1.3 New script
Like in the last session, instead of typing your commands directly in the console, you are going to write them in an R script.

2 First plot with ggplot2
We are going to make the simplest plot possible to study the relationship between two variables: the scatterplot.
The following command generates a plot of engine size displ against fuel efficiency hwy, both present in the new_mpg tibble.
ggplot(data = new_mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Are cars with bigger engines less fuel efficient?
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
- you begin a plot with the function ggplot()
- you complete your graph by adding one or more layers
- geom_point() adds a layer with a scatterplot
- each geom function in ggplot2 takes a mapping argument
- the mapping argument is always paired with aes()
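To make the template concrete, here is a sketch that fills it in with the built-in mpg dataset and stores the result in a variable (p is an illustrative name, and cty is chosen only to differ from the plot above). The plot is drawn when you type p in the console.

```r
library(ggplot2)

# <DATA> = mpg, <GEOM_FUNCTION> = geom_point, <MAPPINGS> = x and y
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = cty))

# layer_data() returns the values ggplot2 computed for a layer:
# one row per point, so as many rows as there are in mpg
nrow(layer_data(p, 1)) == nrow(mpg)
```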
What happens when you use only the command ggplot(data = mpg)?
Make a scatterplot of hwy (fuel efficiency) vs. cyl (number of cylinders).
Solution
ggplot(data = new_mpg, mapping = aes(x = hwy, y = cyl)) +
  geom_point()
What seems to be the problem?
3 Aesthetic mappings
ggplot2 will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. ggplot2 will also add a legend that explains which levels correspond to which values.
Try the following aesthetics:
- size
- alpha
- shape
3.1 color mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point()
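To see the scaling at work, the following sketch inspects the colors that ggplot2 assigned to the points. It uses the built-in mpg dataset so that it runs without the downloaded file; p and point_colors are illustrative names.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Colors actually assigned to the points (stored as hex codes)
point_colors <- layer_data(p, 1)$colour

# One distinct color per distinct value of class
length(unique(point_colors))
length(unique(mpg$class))
```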
3.2 size mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, size = class)) +
geom_point()
3.3 alpha mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, alpha = class)) +
geom_point()
3.4 shape mapping
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, shape = class)) +
geom_point()
You can also set the aesthetic properties of your geom manually. For example, we can draw all of the points in our plot as blue squares:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue", shape = 0)
What’s gone wrong with this code? Why are the points not blue?
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = "blue")) +
geom_point()
Solution
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(color = "blue")
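The difference between the two versions can be checked on the computed layer data. This is a sketch on the built-in mpg dataset, with p_mapped and p_set as illustrative names.

```r
library(ggplot2)

# "blue" inside aes(): treated as a data value, then scaled like any variable
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# "blue" outside aes(): used directly as the color of the points
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Color used for the points in each plot
unique(layer_data(p_mapped, 1)$colour) # a color picked by the scale
unique(layer_data(p_set, 1)$colour)    # the color given by hand
```

In the first plot, the string is a variable with a single value, so the scale picks the first color of its default palette and adds a legend for it.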
3.5 mapping a continuous variable to a color
You can also map a continuous variable to a color:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = cyl)) +
geom_point()
What happens if you map an aesthetic to something other than a variable name, like color = displ < 5?
Solution
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = displ < 5)) +
geom_point()
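An expression inside aes() is evaluated once per row, exactly like a column. The sketch below, on the built-in mpg dataset, computes the same logical values by hand and maps them by name. The names small_engine and mpg_flagged are illustrative.

```r
library(ggplot2)

# The expression gives one TRUE or FALSE per row
small_engine <- mpg$displ < 5
table(small_engine)

# The same values stored as a column, then mapped by name
mpg_flagged <- mpg
mpg_flagged$small_engine <- small_engine
p <- ggplot(mpg_flagged, aes(x = displ, y = hwy, color = small_engine)) +
  geom_point()

# Two values (TRUE and FALSE), hence two colors
length(unique(layer_data(p, 1)$colour))
```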
4 Facets
You can create multiple plots at once by faceting. For this you can use the command facet_wrap.
This command takes a formula as input.
We will come back to formulas in R later; for now, you just have to know that formulas start with a ~ symbol.
To make a scatterplot of displ versus hwy per car class you can use the following code:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~class, nrow = 2)
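As a sketch of what happens behind the scenes (on the built-in mpg dataset, with p as an illustrative name), the formula is an ordinary R object, and each point is assigned to one panel according to its value of class:

```r
library(ggplot2)

# ~class is an ordinary R object of class "formula"
class(~class)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class, nrow = 2)

# Each point belongs to a panel; there is one panel per value of class
length(unique(layer_data(p, 1)$PANEL))
length(unique(mpg$class))
```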
Now try to facet the plot of the built-in mpg dataset by fl + class
Solution
Formulas allow you to express complex relationships between variables in R!
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ fl + class, nrow = 2)
5 Composition
There are different ways to represent the information:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth()
We can add as many layers as we want
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
We can make mappings layer-specific:
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
We can use different data for different layers (you will learn more about filter() later):
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"))
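To see what the filter() call passes to the smoothing layer, the sketch below stores the subset in a variable first. It uses the built-in mpg dataset for both layers; subcompact and p are illustrative names.

```r
library(ggplot2)
library(dplyr)

# filter() keeps only the rows for which the condition is TRUE
subcompact <- filter(mpg, class == "subcompact")
nrow(mpg)
nrow(subcompact)

# Points use all rows; the smoothed line uses only the subcompact rows
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = subcompact)
```

A layer without its own data argument inherits the data given to ggplot(), which is why geom_point() still shows every car.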
6 Challenge !
6.1 First challenge
Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
ggplot(data = new_mpg, mapping = aes(x = displ, y = hwy, color = drive)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE)
- What does show.legend = FALSE do?
- What does the se argument to geom_smooth() do?
6.2 Second challenge
How does being a 2seater car impact the relationship between engine size and fuel efficiency?
Make a plot that highlights this information with color.
Solution
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_point(data = filter(mpg, class == "2seater"), color = "red")
Write a function called plot_color_2seater that takes the variable mpg as its sole argument and plots the same graph.
Solution
plot_color_2seater <- function(mpg) {
  ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_point(data = filter(mpg, class == "2seater"), color = "red")
}
plot_color_2seater(mpg)
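One possible next step, shown here only as a sketch, is to turn the highlighted class into a second argument. The function plot_color_class and its arguments data and highlighted_class are hypothetical names, not part of the exercise.

```r
library(ggplot2)
library(dplyr)

# Hypothetical generalization: the class to highlight becomes an argument
plot_color_class <- function(data, highlighted_class) {
  # Rows of the class to highlight
  highlighted <- filter(data, class == highlighted_class)
  ggplot(data = data, mapping = aes(x = displ, y = hwy)) +
    geom_point() +
    geom_point(data = highlighted, color = "red")
}

# Same plot as plot_color_2seater(mpg)
p <- plot_color_class(mpg, "2seater")
```

Calling plot_color_class(mpg, "suv") would then highlight another class without rewriting the function.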
6.3 Third challenge
Recreate the R code necessary to generate the following graph

Solution
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(mapping = aes(linetype = drv))