R is primarily a statistical programming language, although it has become a more general-purpose language in recent years. It is open source, free, cross-platform software, developed by volunteers.
Although it can be used through various graphical interfaces such as R Commander (Fox 2017), we can only take full advantage of its power by writing R scripts.
To get used to R, let's begin with simple arithmetic operations in the R console:
5 + 3
[1] 8
2^3
[1] 8
However, we usually interact with R using functions, as in the following example:
log(10)
[1] 2.302585
A function can have one or more (mandatory or optional) arguments that modify its behavior. For example, we can specify a given base in the calculation of a logarithm by using the argument “base”:
log(10, base = 2)
[1] 3.321928
In R, the different arguments of a function are therefore separated by commas.
To find out which arguments a given R function accepts, just read its help page; for instance:
help(log)
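As a small complement to the help page, here is a sketch of how arguments are matched when calling a function. It only uses base R: args() displays the arguments of a function, and log() is the same function as above.

```r
## Display the arguments of log() without opening its help page:
args(log)

## Arguments can be matched by position...
log(10, 2)

## ...or by name, in which case their order no longer matters:
log(base = 2, x = 10)

## When "base" is omitted, its default value exp(1) is used,
## which gives the natural logarithm:
log(10)
```

The two calls with base 2 return the same value as log(10, base = 2) above. Naming the optional arguments usually makes a script easier to read.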
R comes natively with a limited collection of basic functions, allowing you to carry out the most common tasks (usual graphical representations, basic tests, etc.). To implement more advanced (or simply less common) methods, there are nearly 20,000 (as of May 2024) additional packages freely downloadable from CRAN.
It is only necessary to install them once (using the install.packages() function), but they must then be loaded (when you need them) each time the software is started (via the library() function). Here is an example:
## Install the R package geomorph:
install.packages("geomorph")
## Load this package for the current R session:
library(geomorph)
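Since a package only needs to be installed once, one possible pattern (a sketch, not something required by R) is to install it only when it is missing. The helper install_if_missing() below is hypothetical and defined here for illustration; requireNamespace() and install.packages() are standard R functions.

```r
## Hypothetical helper: install a package only if it is not yet available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  ## Return (invisibly) whether the package is now available:
  invisible(requireNamespace(pkg, quietly = TRUE))
}

## Demonstration with {stats}, which ships with R,
## so nothing is downloaded here:
install_if_missing("stats")
```

In a real session, you would call install_if_missing("geomorph") and then load the package with library(geomorph) as shown above.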
Generally speaking, we never write instructions in the R console (see footnote 1), but in a separate plain-text file, which is then called an R script (or source code file). An R script is thus a sequence of R statements stored in a plain-text file, whose extension should be .R.
If you are not used to programming, and have only used graphical interfaces so far, R introduces a new paradigm, in the sense that we no longer save results, but instructions allowing these results to be obtained:
“The source code is real. The objects are realizations of the source code” — Manual of Emacs Speaks Statistics
A script must remain “clean”: well organized, clear, understandable. It must be both complete (contain all necessary commands) and minimal (contain nothing superfluous).
For instance, in the IDE RStudio (see footnote 2), you can create a new R script by visiting the menu File > New File > R Script. The keyboard shortcut Ctrl + Shift + N is another quick way to do it.
When writing an R script, it is useful (essential?) to enter comments, to explain in plain language at least the most technical parts of the code. The comment character in R is #:
sqrt(2) # compute the square root of 2
Now, let’s begin with a first concrete use case of R!
We will work on a fictional dataset based on the TV show Kaamelott.
First, let’s create objects with specific values. We say that we assign values to these objects, using the assignment operator <- (see footnote 3). You can then display the value stored in an object with the function print():
King <- 'Arthur'
Queen <- 'Guenievre'
print(King)
[1] "Arthur"
You can create an object from other pre-existing objects, so that:
RoyalCouple <- c('Arthur', 'Guenievre')
print(RoyalCouple)
[1] "Arthur"    "Guenievre"
is equivalent to:
RoyalCouple <- c(King, Queen)
print(RoyalCouple)
[1] "Arthur"    "Guenievre"
Note that we used the function c() above to create vectors, i.e. one-dimensional, ordered sequences of values. The name “c()” stands for “combine”, “concatenate” or “collection”. Since vectors are ordered, we can extract, say, the first element of a vector by using the following syntax:
print(RoyalCouple[1])
[1] "Arthur"
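The same bracket syntax supports a few other common patterns. The short sketch below recreates the vector so that it can be run on its own. Note that indices start at 1 in R.

```r
## Recreate the vector so that this example is self-contained:
RoyalCouple <- c('Arthur', 'Guenievre')

## Number of elements in the vector:
length(RoyalCouple)

## Extract the second element:
RoyalCouple[2]

## Extract several elements at once (here, in reverse order):
RoyalCouple[c(2, 1)]

## A negative index removes the corresponding element:
RoyalCouple[-1]
```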
Now, let’s create a vector of four characters of the TV show, along with their gender and their (not so) arbitrary scores in courage and intelligence:
characters <- c('Arthur', 'Bohort', 'Karadoc', 'Séli')
gender <- c('m', 'm', 'm', 'f')
courage <- c(500, 0, 100, 300)
intelligence <- c(500, 400, 100, 400)
We can combine these four vectors into a dataframe, which is the standard way in R to build a data structure with \(n\) individuals described by \(p\) variables:
kaamelott <- data.frame(
  characters,
  gender,
  courage,
  intelligence
)
print(kaamelott)
  characters gender courage intelligence
1     Arthur      m     500          500
2     Bohort      m       0          400
3    Karadoc      m     100          100
4       Séli      f     300          400
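Once a dataframe is built, a few base R functions help to check its structure. The sketch below rebuilds the same dataframe so that it can be run on its own; dim(), str() and the $ operator are standard R, and the choice of what to inspect is only a suggestion.

```r
## Rebuild the dataframe so that this example is self-contained:
characters <- c('Arthur', 'Bohort', 'Karadoc', 'Séli')
gender <- c('m', 'm', 'm', 'f')
courage <- c(500, 0, 100, 300)
intelligence <- c(500, 400, 100, 400)
kaamelott <- data.frame(characters, gender, courage, intelligence)

## Number of rows (individuals) and columns (variables):
dim(kaamelott)

## Compact overview of the type of each variable:
str(kaamelott)

## Extract one variable by name with the $ operator:
kaamelott$courage

## Any function accepting a vector can then be applied to it:
mean(kaamelott$courage)
```

Here the dataframe has \(n = 4\) individuals and \(p = 4\) variables, and the mean courage score is 225.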
In R, it’s easy to select a subset of individuals according to one or several conditions:
## Select only males:
subset(kaamelott, gender == "m")
  characters gender courage intelligence
1     Arthur      m     500          500
2     Bohort      m       0          400
3    Karadoc      m     100          100
## Select only brave people:
subset(kaamelott, courage >= 300)
  characters gender courage intelligence
1     Arthur      m     500          500
4       Séli      f     300          400
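Several conditions can be combined with the logical operators & (and) and | (or). The following sketch rebuilds the dataframe to be self-contained; the object names smart_males, brave_or_smart and brave_scores are arbitrary and only used for illustration.

```r
## Rebuild the dataframe so that this example is self-contained:
kaamelott <- data.frame(
  characters = c('Arthur', 'Bohort', 'Karadoc', 'Séli'),
  gender = c('m', 'm', 'm', 'f'),
  courage = c(500, 0, 100, 300),
  intelligence = c(500, 400, 100, 400)
)

## Males with an intelligence score of at least 400 (both conditions hold):
smart_males <- subset(kaamelott, gender == "m" & intelligence >= 400)
print(smart_males)

## Individuals who are brave or smart (at least one condition holds):
brave_or_smart <- subset(kaamelott, courage >= 300 | intelligence >= 400)
print(brave_or_smart)

## The argument "select" keeps only some of the columns:
brave_scores <- subset(kaamelott, courage >= 300, select = c(characters, courage))
print(brave_scores)
```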
Dataframes are 2-dimensional data structures: each element stored in a dataframe can be extracted by specifying the indices of its row and column. For instance:
## Display the value stored in row 1, column 3:
print(kaamelott[1, 3])
[1] 500
Note that if you only specify a column index and no row index, the whole column is displayed (see footnote 4):
## Display the whole 3rd column:
print(kaamelott[, 3])
[1] 500   0 100 300
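Rows and columns can also be designated by name or by a logical condition rather than by a numeric index, which tends to make scripts more robust to changes in column order. A minimal sketch, rebuilding the dataframe so that it runs on its own:

```r
## Rebuild the dataframe so that this example is self-contained:
kaamelott <- data.frame(
  characters = c('Arthur', 'Bohort', 'Karadoc', 'Séli'),
  gender = c('m', 'm', 'm', 'f'),
  courage = c(500, 0, 100, 300),
  intelligence = c(500, 400, 100, 400)
)

## Value stored in row 2, column "intelligence":
kaamelott[2, "intelligence"]

## Whole column "courage", designated by its name:
kaamelott[, "courage"]

## Courage score of a given character, using a logical condition on rows:
kaamelott[kaamelott$characters == "Karadoc", "courage"]

## First two rows, restricted to two columns:
kaamelott[1:2, c("characters", "courage")]
```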
Another data structure (see footnote 5) that is very frequently used in morphometrics is the array. Arrays can be seen as generalized matrices: in morphometrics, you’ll often find 3D-arrays \(A=(a_{ijk})\) where each value \(a_{ijk}\) has three indices. It’s particularly useful for representing landmark data, since \(a_{ijk}\) will correspond to the \(j\)-th coordinate of the \(i\)-th landmark for the \(k\)-th individual. We will see examples of arrays later on.
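To make the three indices more concrete, here is a toy sketch using the base R function array(). The dimensions (4 landmarks, 2 coordinates, 3 individuals) and the values 1 to 24 are purely hypothetical and do not correspond to real landmark data.

```r
## Hypothetical 3D-array: 4 landmarks x 2 coordinates x 3 individuals,
## filled with the integers 1 to 24 for illustration only:
A <- array(1:24, dim = c(4, 2, 3))

## Dimensions of the array:
dim(A)

## Second coordinate of the first landmark for the third individual:
A[1, 2, 3]

## All landmark coordinates of the first individual (a 4 x 2 matrix):
A[, , 1]
```

As with dataframes, leaving an index empty selects everything along the corresponding dimension.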
We will use the following packages for subsequent chapters:
library(geomorph)
library(Morpho)
library(rgl)
library(Rvcg)
library(shapes)
The package {shapes} is the historical package for shape analysis in R, and is exhaustively documented in Dryden and Mardia (2016). However, {geomorph} and {Morpho} have a more “modern” design, and more helpers to make analyses easier. The choice of one package or another is purely a matter of taste.
Some other more general packages will also be used for multivariate statistical analysis and graphical representations:
library(factoextra)
library(FactoMineR)
library(ggpubr)
1. Except for “one-shot” instructions you don’t want to keep track of, such as installing packages, asking for help, etc.
2. There are many other excellent interfaces to interact efficiently with R: Emacs ESS, JupyterLab, etc. RStudio is not the only way!
3. You can also use = as the assignment operator, so that x <- 2 and x = 2 are equivalent when written as standalone statements. However, the recommendation is to prefer <-.
4. And similarly, of course, kaamelott[1, ] is a way to extract the whole first row of the dataframe.
5. There are many other data structures in R, such as factors, matrices, lists and so on. We will not cover them here, but you can find a good description here.