1 General introduction to R

Authors: Samuel Bédécarrats, Floriane Rémy, Frédéric Santos

Affiliation (all authors): CNRS, Univ. Bordeaux, MCC – UMR 5199 PACEA

1.1 What is R?

R is primarily a statistical programming language, although it has become a more general-purpose language in recent years. It is open source, free, cross-platform software, developed by volunteers.

Although it can be used through various graphical interfaces such as R Commander (Fox 2017), we can only take full advantage of its power by writing R scripts.

1.2 First interactions with R

To get used to R, let’s begin with simple arithmetic operations in the R console:

5 + 3

[1] 8

2^3

[1] 8
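As a further small sketch (these particular expressions are illustrative and not taken from the text above), the usual operator precedence applies in R, and parentheses change the order of evaluation:

```r
## Multiplication is evaluated before addition:
5 + 3 * 2
## Parentheses force the addition to be evaluated first:
(5 + 3) * 2
## Integer division and remainder:
17 %/% 5
17 %% 5
```

These four instructions return 11, 16, 3 and 2 respectively.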

However, we usually interact with R using functions, as in the following example:

log(10)

[1] 2.302585

A function can have one or more (mandatory or optional) arguments that modify its behavior. For example, we can specify a given base in the calculation of a logarithm by using the argument “base”:

log(10, base = 2)

[1] 3.321928

In R, the arguments of a function are separated by commas.
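Arguments can be matched either by name or by position. The following sketch (the value 8 is an arbitrary choice for illustration) shows three ways of writing the same call to log():

```r
## Second argument matched by name:
log(8, base = 2)
## Both arguments matched by position (the second argument of log() is base):
log(8, 2)
## Named arguments can be given in any order:
log(base = 2, x = 8)
```

All three calls return 3. Naming the arguments is usually preferable, since it makes the script easier to read.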

You can find out which arguments a given R function accepts by reading its help page; for instance:

help(log)
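If you only need a quick reminder of the argument list, a possible alternative to the full help page is the base function args(). A minimal sketch:

```r
## In an interactive session, ?log is a shortcut for help(log).
## Show the argument list of log() without opening the help page:
args(log)
## Names of the arguments only:
names(formals(args(log)))
```

The last instruction returns the two argument names "x" and "base".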

1.3 R and its packages

R comes natively with a limited collection of basic functions, allowing you to carry out the most common tasks (usual graphical representations, basic tests, etc.). To implement more advanced (or simply less common) methods, there are more than 22,000 (as of June 2025) additional packages freely downloadable from CRAN.

A package only needs to be installed once (using the install.packages() function), but it must then be loaded with the library() function in every R session where you need it. Here is an example:

## Install the R package geomorph:
install.packages("geomorph")
## Load this package for the current R session:
library(geomorph)
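Since a script is typically run many times, you may prefer not to reinstall a package at each run. One possible approach is sketched below; install_if_missing() is a hypothetical helper written for this illustration, not a function provided by R:

```r
## Hypothetical helper (not part of R): install a package only if it is missing
install_if_missing <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg)
    }
    ## Return (invisibly) whether the package is now available:
    invisible(requireNamespace(pkg, quietly = TRUE))
}

## The package "stats" ships with R, so nothing is downloaded here:
install_if_missing("stats")
## For the package used above, you would call:
## install_if_missing("geomorph")
```

The helper relies on requireNamespace(), which returns TRUE or FALSE depending on whether the package is available, without attaching it to the session.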

1.4 General workflow to use R

Generally speaking, we never write instructions in the R console¹, but in a separate plain-text file, which is then called an R script (or source code file). An R script is thus a sequence of R statements stored in a plain-text file, whose extension is conventionally .R.

If you are not used to programming, and have only used graphical interfaces so far, R introduces a new paradigm, in the sense that we no longer save results, but instructions allowing these results to be obtained:

“The source code is real. The objects are realizations of the source code” — Manual of Emacs Speaks Statistics

A script must remain “clean”: well organized, clear, understandable. It must be both complete (contain all necessary commands) and minimal (contain nothing superfluous).

For instance, in the RStudio IDE², you can create a new R script from the menu File > New File > R Script. The keyboard shortcut Ctrl + Shift + N is another quick way to do it.

When writing an R script, it is useful (essential?) to enter comments, to explain in plain language at least the most technical parts of the code. The comment character in R is #:

sqrt(2) # sqrt() is the square root function
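To tie these ideas together, here is a minimal sketch of the "instructions rather than results" workflow. It writes a tiny commented script to a temporary file and then runs it with source(); the file content and the object names (a, b, hypotenuse) are made up for this illustration. In practice you would of course write the script in your editor rather than with writeLines().

```r
## Create a temporary plain-text file with the extension .R:
script_path <- tempfile(fileext = ".R")

## Write three commented R statements into this file:
writeLines(c(
    "## Lengths of the two short sides of a right triangle:",
    "a <- 3",
    "b <- 4",
    "## Length of the hypotenuse:",
    "hypotenuse <- sqrt(a^2 + b^2)"
), script_path)

## Run all the statements stored in the script:
source(script_path)
print(hypotenuse)
```

The result (here 5) is never saved: it can be obtained again at any time by running the script.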

Now, let’s begin with a first concrete use case of R!

1.5 RthuR, GuineveRe et al.

We will work on a fictional dataset based on the TV show Kaamelott.

1.5.1 Creating R objects

First, let’s create objects with specific values. We say that we assign values to these objects, using the assignment operator <-³. You can then display the value stored in an object with the function print():

king <- 'Arthur'
queen <- 'Guenievre'
print(king)

[1] "Arthur"

You can create an object from other pre-existing objects, so that:

royalCouple <- c('Arthur', 'Guenievre')
print(royalCouple)

[1] "Arthur"    "Guenievre"

is equivalent to:

royalCouple <- c(king, queen)
print(royalCouple)

[1] "Arthur"    "Guenievre"
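The equivalence can be checked with the base function identical(). The sketch below also illustrates a point that may be worth keeping in mind: royalCouple stores a copy of the values, so modifying king afterwards does not change it (the name 'Lancelot' is used here only as an example).

```r
king <- 'Arthur'
queen <- 'Guenievre'

## identical() returns TRUE when two objects are exactly the same:
identical(c('Arthur', 'Guenievre'), c(king, queen))

## royalCouple keeps its own copy of the values:
royalCouple <- c(king, queen)
king <- 'Lancelot'
print(royalCouple)
```

The last instruction still displays "Arthur" and "Guenievre".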

1.5.2 Vectors

Note that above we used the function c() to create vectors, i.e. one-dimensional, ordered sequences of values. The name “c()” stands for “combine”, “concatenate” or “collection”. Since vectors are ordered, we can extract, say, the first element of a vector by using the following syntax:

print(royalCouple[1])

[1] "Arthur"
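The same square-bracket syntax supports a few other common operations. A short sketch, using the vector defined above (note that indices start at 1 in R):

```r
royalCouple <- c('Arthur', 'Guenievre')

## Number of elements in the vector:
length(royalCouple)
## Second element:
royalCouple[2]
## Several elements at once, here in reverse order:
royalCouple[c(2, 1)]
## A negative index drops the corresponding element:
royalCouple[-1]
```

Here, royalCouple[-1] returns the vector without its first element, i.e. "Guenievre".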

1.5.3 Dataframes

Now, let’s consider four characters of the TV show, along with their gender and their (not so) arbitrary scores in courage and intelligence.

We can combine all these features into a dataframe, which is the standard way in R to build a data structure with \(n\) individuals described by \(p\) variables:

kaamelott <- data.frame(
    character = c('Arthur', 'Bohort', 'Karadoc', 'Séli'),
    gender = c('M', 'M', 'M', 'F'),
    courage = c(500, 0, 100, 300),
    intelligence = c(500, 400, 100, 400)
)
print(kaamelott)

  character gender courage intelligence
1    Arthur      M     500          500
2    Bohort      M       0          400
3   Karadoc      M     100          100
4      Séli      F     300          400

In R, it’s easy to select a subset of individuals according to one or several conditions:

## Select only males:
subset(kaamelott, gender == "M")

  character gender courage intelligence
1    Arthur      M     500          500
2    Bohort      M       0          400
3   Karadoc      M     100          100

## Select only brave people:
subset(kaamelott, courage >= 300)

  character gender courage intelligence
1    Arthur      M     500          500
4      Séli      F     300          400
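Several conditions can be combined with the logical operators & (and) and | (or). The following sketch rebuilds the dataframe so that it can be run on its own; the object names braveMales and braveOrSmart, as well as the thresholds, are arbitrary choices for this illustration:

```r
kaamelott <- data.frame(
    character = c('Arthur', 'Bohort', 'Karadoc', 'Séli'),
    gender = c('M', 'M', 'M', 'F'),
    courage = c(500, 0, 100, 300),
    intelligence = c(500, 400, 100, 400)
)

## Both conditions must hold (logical AND):
braveMales <- subset(kaamelott, gender == "M" & courage >= 100)
print(braveMales)

## At least one condition must hold (logical OR):
braveOrSmart <- subset(kaamelott, courage >= 300 | intelligence >= 400)
print(braveOrSmart)

## The same kind of selection can be written with square brackets:
kaamelott[kaamelott$courage >= 300, ]
```

The first selection keeps Arthur and Karadoc; the second keeps every character except Karadoc.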

Dataframes are 2-dimensional data structures: each element stored in a dataframe can be extracted by specifying the indices of its row and column. For instance:

## Display the value stored in row 1, column 3:
print(kaamelott[1, 3])

[1] 500

Note that if you only specify a column index and no row index, the whole column is displayed⁴:

## Display the whole 3rd column:
print(kaamelott[, 3])

[1] 500   0 100 300

But generally, we will select a column by giving its name rather than its index:

## Display the whole column "courage":
print(kaamelott$courage)

[1] 500   0 100 300
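Since a column is simply a vector, it can be passed to any function that works on vectors, and new columns can be computed from existing ones. A minimal sketch, in which the new column name total is a hypothetical choice:

```r
kaamelott <- data.frame(
    character = c('Arthur', 'Bohort', 'Karadoc', 'Séli'),
    gender = c('M', 'M', 'M', 'F'),
    courage = c(500, 0, 100, 300),
    intelligence = c(500, 400, 100, 400)
)

## Three equivalent ways to extract the column "courage":
kaamelott$courage
kaamelott[["courage"]]
kaamelott[, "courage"]

## Average courage score:
mean(kaamelott$courage)

## Create a new column from two existing ones:
kaamelott$total <- kaamelott$courage + kaamelott$intelligence
print(kaamelott)
```

The average courage score is 225, and the new column contains the values 1000, 400, 200 and 700.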

1.5.4 Arrays

Another data structure⁵ in R, which is used very frequently in morphometrics, is the array. Arrays can be seen as generalized matrices: in morphometrics, you’ll often find 3D arrays \(A=(a_{ijk})\) where each value \(a_{ijk}\) has three indices. This is particularly useful for representing landmark data, since \(a_{ijk}\) corresponds to the \(j\)-th coordinate of the \(i\)-th landmark for the \(k\)-th individual. We will see examples of arrays later on.
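As a first glimpse, here is a minimal sketch of such an array, built with the base function array(). The values 1 to 24 are arbitrary placeholders, not real landmark coordinates; the dimensions are assumed to be 4 landmarks, 3 coordinates and 2 individuals.

```r
## Arbitrary values, for illustration only:
## 4 landmarks (rows), 3 coordinates (columns), 2 individuals (slices)
A <- array(1:24, dim = c(4, 3, 2))
dim(A)

## Third coordinate of the second landmark for the first individual:
A[2, 3, 1]

## All landmark coordinates of the second individual (a 4 x 3 matrix):
A[, , 2]
```

As with dataframes, leaving an index empty selects everything along that dimension.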

1.6 Useful packages for doing morphometrics

We will use the following packages in subsequent chapters:

library(geomorph)
library(Morpho)
library(rgl)
library(Rvcg)
library(shapes)

The package {shapes} is the historical package for shape analysis in R, and is exhaustively documented in Dryden and Mardia (2016). However, {geomorph} and {Morpho} have a more “modern” design, and more helpers to make analyses easier. The choice of one package or another is purely a matter of taste.

Some other more general packages will also be used for multivariate statistical analysis and graphical representations:

library(factoextra)
library(FactoMineR)
library(ggpubr)
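Before moving on, it may be convenient to check which of these packages are not yet installed on your computer. A possible sketch, where pkgs and not_installed are names chosen for this illustration:

```r
## Packages used in this book:
pkgs <- c("geomorph", "Morpho", "rgl", "Rvcg", "shapes",
          "factoextra", "FactoMineR", "ggpubr")

## Packages from this list that are not found in your R library:
not_installed <- setdiff(pkgs, rownames(installed.packages()))
print(not_installed)

## If this vector is not empty, you could then run:
## install.packages(not_installed)
```

This check uses installed.packages(), which lists the installed packages without loading them.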

¹ Except for “one-shot” instructions you don’t want to keep track of, such as installing packages or asking for help.
² There are many other excellent interfaces for interacting efficiently with R, such as Emacs ESS or JupyterLab. RStudio is not the only way!
³ You can also use = as the assignment operator: at the top level of a script, x <- 2 and x = 2 have the same effect. However, for several reasons, it is better to use <-; for instance, inside a function call, = names an argument instead of creating an object.
⁴ Similarly, kaamelott[1, ] extracts the whole first row of the dataframe.
⁵ There are many other data structures in R, such as factors, matrices and lists. We will not cover them here, but you can find a good description here.