1  General introduction to R

Authors
Affiliation

Samuel Bédécarrats

Chloé Lauer

CNRS, Univ. Bordeaux, MCC – UMR 5199 PACEA

Floriane Rémy

Frédéric Santos

CNRS, Univ. Bordeaux, MCC – UMR 5199 PACEA

1.1 What is R?

R is primarily a statistical programming language, although it has become a more general-purpose language in recent years. It is open source, free, cross-platform software, developed by volunteers.

Although it can be used through various graphical interfaces such as R Commander (Fox 2017), we can only take full advantage of its power by writing R scripts.

1.2 First interactions with R

To get used to R, let's begin with simple arithmetic operations in the R console:

5 + 3
[1] 8
2^3
[1] 8
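A few more operators are worth trying at this stage. The lines below are a small illustrative sketch (the numbers are arbitrary): they show that R follows the usual precedence rules, and that it provides operators for integer division and for the remainder of a division.

```r
2 + 3 * 4      # multiplication is evaluated first: 14
(2 + 3) * 4    # parentheses change the order: 20
7 %/% 2        # integer division: 3
7 %% 2         # remainder of the division: 1
```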

However, we usually interact with R using functions, as in the following example:

log(10)
[1] 2.302585

A function can have one or more (mandatory or optional) arguments that modify its behavior. For example, we can specify a given base in the calculation of a logarithm by using the argument “base”:

log(10, base = 2)
[1] 3.321928

The general syntax for calling a function in R is therefore as follows:

function(argument1 = value1, argument2 = value2, ...)

Note in particular that the arguments of a function are separated by commas.
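As a brief illustration of this syntax, arguments can be passed by name or by position. The sketch below reuses log() and also calls round(), another base R function that is not discussed in this chapter. Named arguments may appear in any order, whereas unnamed ones are matched in the order in which the function defines them.

```r
# Arguments matched by position: the value first, then the base
log(8, 2)
# The same call with explicitly named arguments
log(x = 8, base = 2)
# Named arguments can be given in any order
log(base = 2, x = 8)
# An optional argument: digits defaults to 0 when it is omitted
round(3.14159)
round(3.14159, digits = 2)
```

The first three calls are equivalent; naming the arguments usually makes a script easier to read.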

You can find out which arguments a given R function accepts by reading its help page; for instance:

help(log)
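Two related base R helpers can be sketched here as well: ? is a shorthand for help(), and args() prints the arguments of a function together with their default values, without opening the full help page.

```r
# Shorthand for help(log)
?log
# Display the arguments of log() and their default values
args(log)
```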

1.3 R and its packages

R comes natively with a limited collection of basic functions, allowing you to carry out the most common tasks (usual graphical representations, basic tests, etc.). To implement more advanced (or simply less common) methods, there are more than 23,000 (as of November 2025) additional packages freely downloadable from CRAN.

A package only needs to be installed once (using the install.packages() function), but it must then be loaded in every new R session in which you need it (via the library() function). Here is an example:

## Install the R package geomorph:
install.packages("geomorph")
## Load this package for the current R session:
library(geomorph)
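Because installation is only needed once, a script can check whether a package is already available before loading it. The following is a minimal sketch rather than a required idiom: requireNamespace() is a base R function that returns TRUE when the package can be loaded, and the message text is purely illustrative.

```r
# Check whether geomorph is installed, without attaching it yet
if (requireNamespace("geomorph", quietly = TRUE)) {
  # Installed: load it for the current R session
  library(geomorph)
} else {
  # Not installed: remind the user to install it once
  message("Please run install.packages(\"geomorph\") first.")
}
```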

1.4 General workflow to use R

Generally speaking, we never write instructions in the R console (see footnote 1), but in a separate plain-text file, which is then called an R script (or source code file). An R script is thus a sequence of R statements stored in a plain-text file, whose extension should be .R.

If you are not used to programming, and have only used graphical interfaces so far, R introduces a new paradigm, in the sense that we no longer save results, but instructions allowing these results to be obtained:

“The source code is real. The objects are realizations of the source code” — Manual of Emacs Speaks Statistics

A script must remain “clean”: well organized, clear, understandable. It must be both complete (contain all necessary commands) and minimal (contain nothing superfluous).

For instance, in the RStudio IDE (see footnote 2), you can create a new R script through the menu File > New File > R Script. The keyboard shortcut Ctrl + Shift + N is another quick way to do it.

When writing an R script, it is useful (essential?) to enter comments, to explain in plain language at least the most technical parts of the code. The comment character in R is #:

sqrt(2) # this is the square root function

Now, let’s begin with a first concrete use case of R!

Figure 1.1

1.5 Basic operations with R

1.5.1 Creating R objects

First, let’s create objects with specific values. We say that we assign values to these objects, using the assignment operator <- (see footnote 3). You can then display the value stored in an object with the function print():

## Here we define an R object, as a sequence of numeric values:
x <- c(2.5, 3.2, 5.7, 1.8)
## And here we display it:
print(x)
[1] 2.5 3.2 5.7 1.8
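Once a value is stored in an object, that object can be reused in later computations. Here is a short sketch with the vector x defined above (redefined so that the block runs on its own); the object names n, x_mean and x_centered are arbitrary choices.

```r
x <- c(2.5, 3.2, 5.7, 1.8)
# Number of values stored in x
n <- length(x)
# Arithmetic mean of the values, stored in a new object
x_mean <- mean(x)
# Arithmetic operations are applied element by element
x_centered <- x - x_mean
print(n)
print(x_mean)
print(x_centered)
```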

1.5.2 Vectors

In the previous example, we used the function c() to create a vector, i.e. a one-dimensional, ordered sequence of values. The name “c” stands for “combine”, “concatenate” or “collection”. Since vectors are ordered, we can extract, say, the first element of the vector x by using the following syntax:

print(x[1])
[1] 2.5
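The same bracket syntax accepts more than a single position. As an illustrative sketch on the vector x from above (redefined so that the block is self-contained), we can select several elements at once, drop elements with a negative index, or keep only the elements that satisfy a condition.

```r
x <- c(2.5, 3.2, 5.7, 1.8)
x[c(1, 3)]   # first and third elements
x[2:4]       # elements 2 to 4
x[-1]        # all elements except the first
x[x > 3]     # elements greater than 3
```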

1.5.3 Dataframes

A dataframe is a two-dimensional data structure, where \(n\) individuals (rows) are described by \(p\) variables (columns). For instance, here is how we can define (by hand) a \(5 \times 2\) dataframe giving the \((x, y)\) coordinates of 5 points:

coord <- data.frame(
    x = c(5, 8, 10, 2, 14),
    y = c(2, 7, 6, 5, 3)
)
print(coord)
   x y
1  5 2
2  8 7
3 10 6
4  2 5
5 14 3

Thus, for instance, a dataframe can be an appropriate way of storing the 2D coordinates of all landmarks for a given individual.
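Rows and columns of a dataframe can be accessed in several ways. The sketch below uses the dataframe coord from above (redefined so that the block runs on its own) and only base R functions.

```r
coord <- data.frame(
    x = c(5, 8, 10, 2, 14),
    y = c(2, 7, 6, 5, 3)
)
dim(coord)        # number of rows and number of columns
coord$x           # the column x, as a vector
coord[2, ]        # the second row, i.e. the second point
coord[2, "y"]     # the y coordinate of the second point
colMeans(coord)   # mean of each column, i.e. the centroid of the points
```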

1.5.4 Arrays

Now, how can we store the data corresponding to the 2D-coordinates of all landmarks for a whole sample of individuals?

An appropriate data structure for this (see footnote 4), and therefore one that is used very frequently in morphometrics, is the array. Arrays can be seen as generalized matrices: in morphometrics, you’ll often find arrays with a three-way indexing system, \(A=(a_{ijk})\), where each value \(a_{ijk}\) corresponds to the \(j\)-th coordinate of the \(i\)-th landmark for the \(k\)-th individual. We will see examples of arrays later on.
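In the meantime, a toy array can be built by hand to make the indexing concrete. The sketch below is only an illustration: the coordinates are made up, and the names ind1, ind2 and lm1 to lm3 are hypothetical labels. It stores 3 landmarks in 2 dimensions for 2 individuals, following the (landmark, coordinate, individual) order described above.

```r
# Hypothetical 2D coordinates of 3 landmarks for a first individual
ind1 <- matrix(c(0, 0,
                 1, 0,
                 0, 1), ncol = 2, byrow = TRUE)
# Same landmarks for a second individual
ind2 <- matrix(c(0.1, 0.0,
                 1.2, 0.1,
                 0.0, 0.9), ncol = 2, byrow = TRUE)
# Stack both matrices into a 3 x 2 x 2 array
A <- array(
    c(ind1, ind2),
    dim = c(3, 2, 2),
    dimnames = list(c("lm1", "lm2", "lm3"), c("x", "y"), c("ind1", "ind2"))
)
dim(A)              # landmarks, coordinates, individuals
A[, , "ind2"]       # all landmarks of the second individual
A[2, "x", "ind2"]   # x coordinate of landmark 2 for individual 2
```

Leaving an index empty, as in A[, , "ind2"], keeps every element along that dimension, just as with the rows and columns of a dataframe.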

1.6 Useful packages for doing morphometrics

We will use the following packages for subsequent chapters:

library(geomorph)
library(Morpho)
library(rgl)
library(Rvcg)
library(shapes)

The package {shapes} is the historical package for shape analysis in R, and is exhaustively documented in Dryden and Mardia (2016). However, {geomorph} and {Morpho} have a more “modern” design, and more helpers to make analyses easier. The choice of one package or another is purely a matter of taste.

Some other more general packages will also be used for multivariate statistical analysis and graphical representations:

library(factoextra)
library(FactoMineR)
library(ggpubr)
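Before loading these packages, it can be convenient to check which of them still need to be installed. The following is one possible sketch using only base R functions; the object names needed, available and to_install are arbitrary.

```r
# Packages listed in this section
needed <- c("geomorph", "Morpho", "rgl", "Rvcg", "shapes",
            "factoextra", "FactoMineR", "ggpubr")
# Names of all packages currently installed on this computer
available <- rownames(installed.packages())
# Packages from the list that are not installed yet
to_install <- setdiff(needed, available)
print(to_install)
# If this vector is not empty, install them once with:
# install.packages(to_install)
```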

  1. Except for “one-shot” instructions you don’t want to keep track of, such as installing packages, asking for help, etc.

  2. There are many other excellent interfaces for interacting efficiently with R: Emacs ESS, JupyterLab, etc. RStudio is not the only way!

  3. You can also use = as the assignment operator: at the top level of a script, x <- 2 and x = 2 are equivalent for R. However, for several reasons, it is better to use <-.

  4. There are many other data structures in R, such as factors, matrices, lists and so on. We will not cover them here, but you can find a good description here.