R.1: Introduction to R and RStudio

Laurent Modolo laurent.modolo@ens-lyon.fr, Hélène Polvèche hpolveche@istem.fr

2021

1 Introduction

The goal of this practical is to familiarize yourself with R and the RStudio environment.

The objectives of this session will be to:

  • Understand the purpose of each pane in RStudio
  • Do basic computation with R
  • Define variables and assign data to variables
  • Manage a workspace in R
  • Call functions
  • Manage packages
  • Be ready to draw graphics!

1.2 Some R background

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

  • Created by Ross Ihaka and Robert Gentleman
  • initial version released in 1995
  • free and open-source implementation of the S programming language
  • currently developed by the R Development Core Team.

Reasons to use it:

  • It’s open source, which means that we have access to every bit of the underlying computer code to check that our results are correct (which is always a good point in science)

  • It’s free, well documented, and runs almost everywhere

  • it has a large (and growing) user base among scientists

  • it has a large library of external packages available for performing diverse tasks.

  • 18082 available packages on https://cran.r-project.org/

  • 2041 available packages on http://www.bioconductor.org

  • >500k available repositories using R on https://github.com/

1.3 How do I use R ?

Unlike other statistical software programs like Excel, SPSS, or Minitab that provide point-and-click interfaces, R is an interpreted language.

This means that you have to write instructions for R, so you are going to learn to write code (programs) in R.

R is usually used in a terminal in which you can type or paste your R code:

But navigating between your terminal, your code and your plots can be tedious. This is why, in 2021, there is a better way to use R!

1.4 RStudio, the R Integrated development environment (IDE)

An IDE application provides comprehensive facilities to computer programmers for software development. RStudio is free and open-source.

To open RStudio, you can install the RStudio application and open the app.

Otherwise you can use the link and the login details provided to you by email. The web version of RStudio is the same as the application, except that you can open it in any recent browser.

1.5 RStudio interface

1.6 The same console as before (in Red box)

1.7 Errors, warnings, and messages

The R console is a textual interface: you enter code, and R writes information back to you. You will have to pay attention to what is written.

There are 3 categories of messages that R can send you: Errors prefaced with Error in…, Warnings prefaced with Warning: and Messages which don’t start with either Error or Warning.

  • Errors: consider them a red light. You must figure out what is causing them. Usually you can find useful clues in the error message about how to solve the problem.
  • Warnings: warnings are a yellow light. The code is running but you have to pay attention. It’s almost always a good idea to try to fix warnings.
  • Messages: these are just friendly messages from R telling you how things are running.

2 R as a calculator

Now that we know what we should do and what to expect, we are going to try some basic R instructions. A computer can perform all the operations that a calculator can do, so let’s start with that:

  • Add: +
  • Divide: /
  • Multiply: *
  • Subtract: -
  • Exponents: ^ or **
  • Parentheses: (, )

Now open RStudio. Type the commands shown in color in the blue boxes into the console. The expected results are always printed in white in a blue box.

You can copy and paste, but I advise you to practice typing directly in the console. As with any language, you will become more familiar with R by using it.

To validate the line at the end of your command: press Return.

2.1 First commands

You should see a > character before a blinking cursor. The > is called a prompt. The prompt is shown when you can enter a new line of R code.

1 + 100

For standard output, R prefixes each line of results with [N], where N is the index of the first element printed on that line. Here the result has a single element, so the line starts with [1]

[1] 101

Do the same things but press (return) after typing +.

1 +

The console displays +.
The > prompt becomes a + when a command spans several lines. Because the + operator needs a value on each side, R knows that you still have to enter the right-hand side of your formula and waits for it. Type just 100 and press Return:

100
[1] 101

2.2 R keeps to the mathematical order

The order of operations in R is the natural mathematical order:

3 + 5 * 2
[1] 13

You can use parentheses ( ) to change this order

(3 + 5) * 2
[1] 16

But too many parentheses can be hard to read

(3 + (5 * (2 ^ 2))) # hard to read
[1] 23
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
[1] 23

Note : The text following a # is a comment. It will not be interpreted by R. In the future, I advise you to use comments a lot to explain in your own words what the command means.
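The order of operations also applies to the minus sign placed in front of a number, which is a frequent source of surprises. The two lines below are a small illustration that is not part of the original exercise: ^ is evaluated before the sign, unless parentheses say otherwise.

```r
-2^2     # read as -(2^2), gives -4
(-2)^2   # the parentheses apply the sign first, gives 4
```

When in doubt, adding one pair of parentheses costs little and makes the intent explicit.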

2.3 Scientific notation

For small or large numbers, R will automatically switch to scientific notation

2/10000
[1] 2e-04

2e-4 is shorthand for 2 * 10^(-4). You can use e to write your own scientific notation

5e3
[1] 5000
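Scientific notation only changes how a number is displayed, not its value. As a small sketch (format() is a base R function that this practical does not otherwise use), you can ask R to write the same number in fixed notation:

```r
1.5e3                             # 1.5 * 10^3, printed as 1500
format(2e-4, scientific = FALSE)  # the text "0.0002": same value, fixed notation
```

Note that format() returns text, not a number, so it is meant for display only.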

2.4 Mathematical functions

R is distributed with a large number of built-in functions. To call a mathematical function, write function_name(<number>).

For example for the natural logarithm:

log(1)  # natural logarithm
[1] 0
log10(10) # base-10 logarithm
[1] 1
exp(0.5)
[1] 1.648721

Compute the factorial of 9 (9!)

9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
[1] 362880

or

factorial(9)
[1] 362880

2.5 Comparing things

We have seen that R can do everything a calculator can do. But when we speak of a programming language, we think of writing computer programs. Programs are collections of instructions that perform specific tasks. If we want our future programs to make choices automatically, they need to be able to perform comparisons.

Comparisons can be made with R. The result is a TRUE or FALSE value (which is not a number as before but a boolean, called logical in R).

Try the following operators to get a TRUE, then change your command to get a FALSE.

You can use the (upper arrow) key to edit the last command and go through your history of commands

  • equality (note two equal signs read as “is equal to”)
1 == 1
[1] TRUE
  • inequality (read as “is not equal to”)
1 != 2 
[1] TRUE
  • less than
1 < 2
[1] TRUE
  • less than or equal to
1 <= 1
[1] TRUE
  • greater than
1 > 0
[1] TRUE
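The list above can be completed with the greater than or equal to operator, >=. The sketch below also shows a classic pitfall, which goes slightly beyond the exercise: decimal numbers are stored with a tiny rounding error, so == can return FALSE where you expect TRUE. all.equal() is the base R function that compares numbers with a small tolerance.

```r
2 >= 2                             # greater than or equal to: TRUE
0.1 + 0.2 == 0.3                   # FALSE: the two sides differ by a tiny rounding error
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: comparison with a small tolerance
```

For whole numbers such as the ones used in this section, == behaves as expected.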

Summary so far

  • R is a programming language and free software environment for statistical computing and graphics (free & open source) with a large library of external packages available for performing diverse tasks.
  • RStudio is an IDE application that provides comprehensive facilities to computer programmers for software development.
  • R can be used as a calculator
  • R can perform comparisons

3 Variables and assignment

In addition to performing a huge number of computations very fast, computers can also store information in memory. This is essential for loading your data and storing intermediate states of your analysis.

In R, <- is the assignment operator (read as “the left-hand side takes the value of the right-hand side”).

= also exists but is not recommended! It is preferably kept for other uses (we will see them later). If you really don’t want to press two consecutive keys for assignment, you can press alt + - to write <-. RStudio provides lots of such shortcuts (you can display them by pressing alt + shift + k).

We assign a value to x; x is called a variable.

x <- 1/40

We can then ask R to display the value of x.

x
[1] 0.025

3.1 The environment

You now see the x value in the environment box (in red).

This variable is present in your work environment. You can use it to perform different mathematical operations.

log(x)
[1] -3.688879

You can assign another value to x.

x <- 100
log(x)
[1] 4.60517
x <- x + 1  # x becomes 101 (100 + 1)
y <- x * 2
y
[1] 202

A variable can hold a numeric value as well as a character value.

Just put your character (or string) between double quotes " when you assign the value.

z <- "x"  # One character
z
[1] "x"
a <- "Hello world"  # Multiple characters == String
a
[1] "Hello world"

You cannot mix different types of variable together:

x + z
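Running the line above stops with an error, because + has no meaning for a number and a string. If you want to look at such an error without interrupting a script, one possible sketch (an addition to the exercise, using the base R function try()) is:

```r
x <- 101
z <- "x"
result <- try(x + z, silent = TRUE)  # try() catches the error instead of stopping
class(result)                        # "try-error"
cat(result)                          # prints the error message R would have shown
```

The message tells you that one of the two arguments of + is not numeric, which is exactly the kind of clue described in the section on errors.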

How to test the type of the variable?

is.character(z)
[1] TRUE
b <- 1/40
b
[1] 0.025
typeof(b)
[1] "double"

You can type is. and press the Tab key. RStudio will show you a list of functions whose names start with is.. This is called autocompletion; don’t hesitate to press Tab often as you write R code.

3.2 Variables names

Variable names can contain letters, numbers, underscores and periods.

They cannot start with a number or an underscore, and they cannot contain spaces.

Different people use different conventions for long variable names, these include:

periods.between.words
underscores_between_words
camelCaseToSeparateWords

What you use is up to you, but be consistent.

Which of the following are valid R variable names?
min_height
max.height
_age
.mass
MaxLength
min-length
2widths
celsius2kelvin
Solution

min_height
max.height
.mass
MaxLength
celsius2kelvin
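You can check your answers with R itself. This is only one possible way to do it: the base R function make.names() rewrites a string into a valid variable name, so a string that comes back unchanged was already valid.

```r
make.names("min_height")  # returned unchanged: already a valid name
make.names("_age")        # rewritten as "X_age": a name cannot start with an underscore
make.names("min-length")  # rewritten as "min.length": - is the subtraction operator
make.names("2widths")     # rewritten as "X2widths": a name cannot start with a number
```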

3.3 Functions are also variables

logarithm <- log

Try to use the logarithm variable.

An R function can have several arguments

function (x, base = exp(1))
  • arguments are read from left to right
  • base is a named argument with a default value, exp(1)
  • named arguments break the reading order
  • named arguments make your code more readable
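The points above can be illustrated with log itself. The calls below are a short sketch; the first three compute the same logarithm in base 10 and differ only in how the arguments are passed.

```r
log(100, 10)             # by position: x = 100, base = 10
log(100, base = 10)      # same call, with the base named
log(base = 10, x = 100)  # named arguments can be given in any order
log(100)                 # base omitted: the default value exp(1) is used
```

The second form is usually the easiest to read: the main argument comes first and every option is named.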

To know more about the log function we can read its manual.

help(log)

or

?log

This block allows you to view the different outputs (?help, graphs, etc.).

Test that your logarithm function can work in base 10

Solution

10^logarithm(12, base = 10)

3.4 A code editor

We are now going to write our first function. We could do it directly in the R console with multi-line commands, but this process is tedious.

Instead we are going to use the RStudio code editor panel to write our code. You can go to File > New File > R script to open your editor panel.

Writing a function

We can define our own function with :

  • function name,
  • declaration of function type: function,
  • arguments: between ( ),
  • { and } to open and close function body,

Here is an example of a function declaration with two arguments a and b.

function_name <- function(a, b){


}
  • a series of operations,

The arguments a and b are accessible from within the function body as the variables a and b.

function_name <- function(a, b){
  result_1 <- operation1(a, b)
  result_2 <- operation2(result_1, b)
  
}
  • return operation

At the end of a function we want to return a result, so that a call to the function evaluates to this result.

function_name <- function(a, b){
  result_1 <- operation1(a, b)
  result_2 <- operation2(result_1, b)
  return(result_2)
}

Note: if you don’t use return, the value of the last expression evaluated in the function body is returned by default.

Try to write a function that tests whether a number is even. You can use the %% modulo operator.

Name this function even_test and use the == comparison to test whether the result of the modulo is equal to 0.

Solution

even_test <- function(x){
  modulo_result <- x %% 2
  is_even <- modulo_result == 0
  return(is_even)
}
even_test(4)
[1] TRUE
even_test(3)
[1] FALSE

Note: A function can be written in several forms.
Solution

even_test2 <- function(x){
  (x %% 2) == 0
}
even_test2(4)
[1] TRUE
even_test2(3)
[1] FALSE
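Functions that you write can also have named arguments with default values, like base in log. As a sketch, is_multiple is a hypothetical generalisation of even_test (the name and the divisor argument are ours, not part of the exercise):

```r
# is_multiple is a hypothetical helper generalising even_test
is_multiple <- function(x, divisor = 2) {
  (x %% divisor) == 0
}
is_multiple(4)               # divisor defaults to 2: TRUE
is_multiple(9, divisor = 3)  # TRUE
is_multiple(10, 4)           # FALSE, 10 %% 4 is 2
```

Called with a single argument, is_multiple behaves exactly like even_test.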

RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can

  • click on the Run button above the editor panel, or
  • select “Run Lines” from the “Code” menu, or
  • hit Ctrl+Return in Windows or Linux or Cmd+Return on OS X. To run a block of code, select it and then Run.

If you have modified a line of code within a block of code you have just run, there is no need to reselect the section and Run, you can use the next button along, Rerun the previous region. This will run the previous code block including the modifications you have made.

3.5 Cleaning up

We can now clean up our environment

rm(x)

What happened in the Environment panel? Check the documentation of this command

Solution

?rm

ls()
 [1] "a"                     "b"                     "bioconductor_packages"
 [4] "biocPackages"          "cran_packages"         "even_test"            
 [7] "even_test2"            "logarithm"             "url"                  
[10] "y"                     "z"                    

Combine rm and ls to cleanup your Environment

Solution

rm(list = ls())

ls()
character(0)
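Besides looking at the Environment panel, you can ask R whether a given variable still exists. This is a minimal sketch using the base R function exists(); tmp_value is a hypothetical variable created only for the illustration.

```r
tmp_value <- 42                        # hypothetical variable, for illustration only
exists("tmp_value", inherits = FALSE)  # TRUE: the variable is in the environment
rm(tmp_value)
exists("tmp_value", inherits = FALSE)  # FALSE: rm() has removed it
```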

Summary so far:

  • Assigning a variable is done with <-.
  • The assigned variables are listed in the environment box.
  • Variable names can contain letters, numbers, underscores and periods.
  • Functions are also variables and can be written in several forms.
  • A code editor is available in RStudio.

4 Complex variable types

You can only go so far with the variables we have already seen. R also has complex variable types, which can be seen as combinations of simple variable types.

4.1 Vector (aka list)

A vector is a simple list of values that all have the same type

c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5

or

c(1:5)
[1] 1 2 3 4 5

A mathematical calculation can be performed on the elements of the vector:

2^c(1:5)
[1]  2  4  8 16 32
x <- c(1:5)
2^x
[1]  2  4  8 16 32

Note: this kind of operation is called vectorisation and is very powerful in R.
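Vectorisation is not limited to powers. As a short sketch, most operators and many functions work on all the elements of a vector at once:

```r
x <- c(1:5)
x * 2    # every element is doubled: 2 4 6 8 10
x + x    # element-wise sum of two vectors of the same length: 2 4 6 8 10
x > 3    # comparisons are vectorised too: FALSE FALSE FALSE TRUE TRUE
sum(x)   # some functions summarise the whole vector in one value: 15
```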

To determine the type of the elements of a vector:

typeof(x)
[1] "integer"
typeof(x + 0.5)
[1] "double"
x + 0.5
[1] 1.5 2.5 3.5 4.5 5.5
is.vector(x)
[1] TRUE

Vectors can be extended to named vectors:

y <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
y
a b c d e 
1 2 3 4 5 

We can compare the elements of two vectors:

x
[1] 1 2 3 4 5
y
a b c d e 
1 2 3 4 5 
x == y
   a    b    c    d    e 
TRUE TRUE TRUE TRUE TRUE 
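A comparison between two vectors returns one TRUE or FALSE per element. When you need a single answer, the base R functions all() and any() can summarise it. The sketch below also shows how the names of y can be used; it goes slightly beyond the exercise.

```r
x <- c(1:5)
y <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
all(x == y)  # TRUE: every pair of elements is equal
any(x > 4)   # TRUE: at least one element is greater than 4
names(y)     # the names attached to y
y["c"]       # a named element can be accessed by its name
```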

Summary so far

  • A variable can be of different types : numeric, character, vector, function, etc.
  • Calculations and comparisons apply to vectors.
  • Do not hesitate to use the help box to understand functions!

We will see other complex variable types during this course.

5 Packages

As we have seen, R has a large library of external packages available for performing diverse tasks.

5.1 Installing packages

install.packages("tidyverse")

or click on Tools and Install Packages...

install.packages("ggplot2")
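Installing a package takes time, so it can be useful to install it only when it is missing. The sketch below is one possible way to do this and is not part of the original practical: install_if_missing is a hypothetical helper, and it relies on an if test, which has not been introduced yet.

```r
# install_if_missing is a hypothetical helper, not a function provided by R
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE when the package cannot be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
install_if_missing("stats")  # "stats" ships with R, so nothing is installed here
```

You could call it with "ggplot2" instead, which would download the package only on a computer where it is not installed yet.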

5.2 Loading packages

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rvest_1.0.1

loaded via a namespace (and not attached):
 [1] knitr_1.33        xml2_1.3.2        magrittr_2.0.1    R6_2.5.1         
 [5] rlang_0.4.11      fansi_0.5.0       stringr_1.4.0     httr_1.4.2       
 [9] tools_4.1.1       xfun_0.25         utf8_1.2.2        htmltools_0.5.1.1
[13] ellipsis_0.3.2    yaml_2.2.1        assertthat_0.2.1  digest_0.6.27    
[17] tibble_3.1.3      lifecycle_1.0.0   crayon_1.4.1      bookdown_0.23    
[21] klippy_0.0.0.9500 vctrs_0.3.8       curl_4.3.2        evaluate_0.14    
[25] rmarkdown_2.10    stringi_1.7.3     compiler_4.1.1    pillar_1.6.2     
[29] rmdformats_1.0.2  pkgconfig_2.0.3  
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.3     ✓ dplyr   1.0.7
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   2.0.1     ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter()         masks stats::filter()
x readr::guess_encoding() masks rvest::guess_encoding()
x dplyr::lag()            masks stats::lag()
sessionInfo()
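The Conflicts lines above mean that two packages define a function with the same name, for example dplyr::filter() and stats::filter(). When that happens, you can state which package you mean with the package::function notation. The sketch below uses only the stats package, which is attached by default, so it runs without the tidyverse.

```r
"package:stats" %in% search()  # is stats among the attached packages?
stats::sd(c(1, 2, 3))          # the prefix states explicitly which package is used: 1
stats::median(c(1, 2, 10))     # 2
```

The same notation also lets you call a function from an installed package without loading it with library().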

5.3 Unloading packages

unloadNamespace("tidyverse")
sessionInfo()

See you in Session 2: “Introduction to Tidyverse”