Module I · Day 1 · 3 hours
Department of Quantitative Methods in Economics and Management · ULPGC
April 23, 2026
| Tool | What it is | You write… |
|---|---|---|
| R | The language and engine. | Code that computes. |
| RStudio / Positron | The IDE where you work. | Nothing R-specific — it’s the editor. |
| Quarto | The publishing system. | Reports, slides, web pages, books. |
| tidyverse | A family of R packages with consistent design. | library(tidyverse) at the top of every script. |
Install R from https://cloud.r-project.org, RStudio from https://posit.co/download/rstudio-desktop/.
┌───────────────────────────────┬──────────────────────────┐
│ │ │
│ Editor │ Environment / History │
│ (your .R or .qmd files) │ (objects currently │
│ │ in memory) │
│ │ │
├───────────────────────────────┼──────────────────────────┤
│ │ │
│ Console │ Files / Plots / │
│ (live R session) │ Packages / Help │
│ │ │
└───────────────────────────────┴──────────────────────────┘
.Rproj
Double-click quantitative-methods-master-tides.Rproj to open the project.
Tip
Never run setwd("C:/Users/yourname/..."). If you see that in any tutorial, close the tab.
Base R is the core. Most of the tools in this course (the tidyverse, modern plotting, reading Excel, hitting the Eurostat API) ship as packages that you install separately from CRAN, the Comprehensive R Archive Network.
Tip
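install.packages() downloads a package. On Windows and macOS it usually gets a prebuilt binary; on Linux it often compiles from source. This is slow, but you only do it once per machine. library() is fast and goes at the top of every script. A single library(tidyverse) attaches the core packages (dplyr, ggplot2, tibble, readr, tidyr, purrr, stringr, forcats and, since tidyverse 2.0, lubridate). These are the workhorses for the rest of the course.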
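Here is a minimal sketch of the install-once / load-always pattern. It only checks which core packages are already installed. The install and load lines are left as comments so you run them deliberately. The package list assumes tidyverse 2.0 or later, where lubridate is part of the core.

```r
# Core tidyverse packages (assuming tidyverse >= 2.0, which includes lubridate)
core <- c("dplyr", "ggplot2", "tibble", "readr", "tidyr",
          "purrr", "stringr", "forcats", "lubridate")

# TRUE/FALSE for each package: is it installed on this machine?
installed <- vapply(core, requireNamespace, logical(1), quietly = TRUE)
installed

# Run once per machine, only for what is missing:
# install.packages(core[!installed])

# Run at the top of every script:
# library(tidyverse)
```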
[1] 4
[1] 2
[1] 1024
[1] 9
[1] 1
R evaluates each line top-to-bottom and prints the result. Nothing else happens.
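The expressions below are illustrative; any arithmetic behaves the same way. Typed at the console, each line prints one result.

```r
2 + 2       # addition: 4
10 / 5      # division: 2
2^10        # exponentiation: 1024
sqrt(81)    # square root, a built-in function: 9
10 %% 3     # remainder of integer division (modulo): 1
```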
[1] 446250
Use <- for assignment (in RStudio, Alt + - inserts it; Option + - on macOS). Names can contain letters, digits, _ and ., and should start with a letter. A name such as 2024_nights is not allowed.
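Here is a short sketch of assignment. The figures and object names are hypothetical, chosen only to show the mechanics.

```r
nights  <- 1200            # hypothetical number of bed-nights
price   <- 95              # hypothetical price per night (EUR)
revenue <- nights * price  # store the result under a new name
revenue                    # typing the name prints it: 114000

room_rate_2026 <- 95       # valid: starts with a letter
# 2026_rate <- 95          # invalid: a name cannot start with a digit
```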
Functions take inputs and return an output:
[1] 4
[1] 7.6
[1] 2024 2025 2026
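The calls below are the kind of function calls shown above; their inputs are illustrative. You can pass arguments by position or by name.

```r
round(3.7)                   # one input, one output: 4
round(3.14159, digits = 2)   # a second, named argument: 3.14
mean(c(6.2, 9.0))            # 7.6
seq(2024, 2026)              # 2024 2025 2026
seq(from = 2024, to = 2026)  # the same call with the arguments named
```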
Tip
Type mean( and press F1 in RStudio to open the help page for the function. ?mean does the same from the console.
Sites you will use a lot:
[r].
[1] "double"
[1] "integer"
[1] "character"
[1] "logical"
[1] "factor"
[1] "Date"
In practice we lump double and integer together as "numeric".
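You can check a value's type yourself. typeof() reports how the value is stored, and class() reports the class R uses for printing and methods. The values below are illustrative.

```r
typeof(3.14)                  # "double"
typeof(3L)                    # "integer" (the L suffix asks for an integer)
typeof("Tenerife")            # "character"
typeof(TRUE)                  # "logical"
class(factor("Tenerife"))     # "factor"
class(as.Date("2026-04-29"))  # "Date"

is.numeric(3.14)              # TRUE
is.numeric(3L)                # TRUE: integers count as numeric too
```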
[1] TRUE
[1] 3
Comparison operators (==, !=, <, >, <=, >=) return logicals:
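The sketch below compares the star ratings from the hotel table used later in this session. TRUE counts as 1 and FALSE as 0, so sum() counts the matches and mean() gives their share.

```r
stars <- c(4, 5, 4, 3, 3)

stars >= 4        # TRUE TRUE TRUE FALSE FALSE
stars == 5        # FALSE TRUE FALSE FALSE FALSE
sum(stars >= 4)   # 3 rows with four or more stars
mean(stars >= 4)  # 0.6: the share of such rows
any(stars == 5)   # TRUE: at least one five-star row
```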
[1] "Gran Canaria" "Tenerife" "Lanzarote" "Fuerteventura"
[5] "La Palma" "La Gomera" "El Hierro"
island
Gran Canaria Tenerife Lanzarote Fuerteventura La Palma
2 1 1 0 0
La Gomera El Hierro
0 0
Factors matter when the order of categories is meaningful (e.g. low, medium, high) or when all levels should appear in a table even if some have zero observations.
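The sketch below shows both uses; the four observations in `visits` are hypothetical. Fixing the levels to all seven islands makes table() report the empty ones too. An ordered factor makes comparisons and sorting follow the declared order instead of the alphabet.

```r
islands <- c("Gran Canaria", "Tenerife", "Lanzarote", "Fuerteventura",
             "La Palma", "La Gomera", "El Hierro")

visits <- c("Gran Canaria", "Tenerife", "Gran Canaria", "Lanzarote")  # hypothetical
island <- factor(visits, levels = islands)
table(island)       # all seven islands listed, zeros included

rating <- factor(c("high", "low", "medium"),
                 levels = c("low", "medium", "high"), ordered = TRUE)
rating < "high"     # FALSE TRUE TRUE: compared by level order
sort(rating)        # low medium high
```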
[1] "2026-04-29"
[1] "Date"
[1] "2026-05-29"
Time difference of 40 days
[1] "Wednesday, 29 April 2026"
Use the lubridate package (inside the tidyverse) for anything more serious — we’ll cover it in Data wrangling.
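Here is a base-R sketch of the date operations above. The second date is only there to illustrate subtraction. format() with %A and %B prints weekday and month names in the session's locale, so on a Spanish-configured machine they may come out in Spanish.

```r
d <- as.Date("2026-04-29")   # ISO format: year-month-day
class(d)                     # "Date"
d + 30                       # "2026-05-29": arithmetic counts days
as.Date("2026-06-08") - d    # Time difference of 40 days
format(d, "%A, %d %B %Y")    # "Wednesday, 29 April 2026" in an English locale
```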
[1] "1" "2" "three"
[1] 1 2 1
[1] 1.00 2.00 3.14
R’s rule: when mixing types in one vector, coerce up to the most flexible type (character > double > integer > logical).
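Here is the rule in action. Each c() call mixes types, and typeof() reports the type R coerced everything to. The last line shows that converting back does not rescue non-numbers.

```r
c(1, 2, "three")           # "1" "2" "three": numbers become text
c(1, 2, TRUE)              # 1 2 1: TRUE becomes 1
c(1L, 2L, 3.14)            # 1.00 2.00 3.14: integers become doubles

typeof(c(1, 2, "three"))   # "character"
typeof(c(1L, TRUE))        # "integer": logical is below integer

as.numeric(c("1", "2", "three"))  # 1 2 NA, with a warning
```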
An atomic vector is homogeneous: every element must be of the same type — all double, or all integer, or all character, or all logical, etc.
[1] "double"
[1] "integer"
[1] "character"
[1] "logical"
Important
Mixing types is not forbidden. R silently coerces everything to one common type (see Coercion gotchas), which is almost never what you want. If you need several types in one object, use a data frame: each column is its own vector with its own type.
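Here is a minimal illustration of the data-frame alternative. Each column is stored as its own homogeneous vector, so several types live side by side without coercion. The `open` column is hypothetical.

```r
hotels <- data.frame(
  island = c("Gran Canaria", "Tenerife"),  # character
  stars  = c(4L, 5L),                      # integer
  open   = c(TRUE, FALSE),                 # logical (hypothetical column)
  stringsAsFactors = FALSE                 # the default since R 4.0
)

sapply(hotels, typeof)   # one type per column, no coercion across columns
```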
[1] 1 2 3 4 5
[1] 0.00 0.25 0.50 0.75 1.00
[1] "GC" "GC" "GC"
[1] 5
[1] 164 475 770 1000 1092
[1] 700.2
[1] 3501
[1] 164 1092
Vectorisation is the point of R: arithmetic and summary functions work on whole vectors at once. Whenever you are tempted to write a for loop, ask yourself whether a vectorised operation already does the job.
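The sketch below starts with the usual helpers for building vectors. It then uses the five values whose summaries appear above to compare a loop with vectorised calls. Both give the same total; the vectorised form is shorter and, on large data, faster.

```r
1:5                    # 1 2 3 4 5
seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
rep("GC", times = 3)   # "GC" "GC" "GC"

x <- c(164, 475, 770, 1000, 1092)
length(x)              # 5

# Loop version: accumulate element by element
total <- 0
for (value in x) {
  total <- total + value
}
total                  # 3501

# Vectorised versions: one call each
sum(x)                 # 3501
mean(x)                # 700.2
range(x)               # 164 1092
x / 1000               # every element divided at once
```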
[1] 2 5 7 10 14
[1] 2
[1] 2 7 14
[1] 5 7 10 14
[1] 7 10 14
The logical form (x[condition]) is the one you will use 95 % of the time.
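The four indexing forms, applied to the vector whose outputs appear above, look like this.

```r
x <- c(2, 5, 7, 10, 14)

x[1]            # by position: 2
x[c(1, 3, 5)]   # several positions: 2 7 14
x[-1]           # a negative index drops: 5 7 10 14
x[x > 5]        # by condition: 7 10 14
x > 5           # the logical vector that does the selecting
```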
Tenerife
95
Tenerife Lanzarote Fuerteventura
95 110 100
Named vectors are a stepping stone to data frames.
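The sketch below stores the prices per island from the hotel table as a named vector. Names that contain spaces need quotes.

```r
price <- c("Gran Canaria" = 82, Tenerife = 95, Lanzarote = 110,
           Fuerteventura = 100, "La Palma" = 78)

price["Tenerife"]                                   # Tenerife 95
price[c("Tenerife", "Lanzarote", "Fuerteventura")]  # three prices, names kept
names(price)                                        # the labels
price[price > 90]                                   # logical indexing still works
```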
# A tibble: 5 × 4
island stars price nights
<chr> <int> <dbl> <dbl>
1 Gran Canaria 4 82 12.5
2 Tenerife 5 95 18.3
3 Lanzarote 4 110 9.8
4 Fuerteventura 3 100 11.2
5 La Palma 3 78 6.4
[1] 5 4
[1] 5
[1] 4
[1] "island" "stars" "price" "nights"
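The sketch below rebuilds the five-row table printed above and asks for its shape. It uses base data.frame() so it runs without extra packages. tibble::tibble() accepts the same arguments and prints the table the way shown above.

```r
hotels <- data.frame(
  island = c("Gran Canaria", "Tenerife", "Lanzarote", "Fuerteventura", "La Palma"),
  stars  = c(4L, 5L, 4L, 3L, 3L),
  price  = c(82, 95, 110, 100, 78),
  nights = c(12.5, 18.3, 9.8, 11.2, 6.4)
)

dim(hotels)     # 5 4: rows, columns
nrow(hotels)    # 5
ncol(hotels)    # 4
names(hotels)   # "island" "stars" "price" "nights"

hotels$price    # each column is an ordinary vector, reached with $
```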
# A tibble: 5 × 5
island stars price nights revenue
<chr> <int> <dbl> <dbl> <dbl>
1 Gran Canaria 4 82 12.5 1025
2 Tenerife 5 95 18.3 1738.
3 Lanzarote 4 110 9.8 1078
4 Fuerteventura 3 100 11.2 1120
5 La Palma 3 78 6.4 499.
Later, with dplyr, we’ll write this as:
# A tibble: 3 × 5
island stars price nights revenue
<chr> <int> <dbl> <dbl> <dbl>
1 Gran Canaria 4 82 12.5 1025
2 Tenerife 5 95 18.3 1738.
3 Lanzarote 4 110 9.8 1078
# A tibble: 3 × 5
island stars price nights revenue
<chr> <int> <dbl> <dbl> <dbl>
1 Gran Canaria 4 82 12.5 1025
2 Tenerife 5 95 18.3 1738.
3 Lanzarote 4 110 9.8 1078
Pick one style and stick to it. We’ll standardise on dplyr from Data wrangling onwards.
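The sketch below runs the two styles side by side on the same table. It adds revenue = price × nights, then keeps rows with four or more stars. That condition is one choice that reproduces the three rows printed above. The dplyr part runs only if dplyr is installed.

```r
hotels <- data.frame(
  island = c("Gran Canaria", "Tenerife", "Lanzarote", "Fuerteventura", "La Palma"),
  stars  = c(4L, 5L, 4L, 3L, 3L),
  price  = c(82, 95, 110, 100, 78),
  nights = c(12.5, 18.3, 9.8, 11.2, 6.4)
)

# Base R: new column with $, rows selected by a logical condition
with_rev <- hotels
with_rev$revenue <- with_rev$price * with_rev$nights
top_base <- with_rev[with_rev$stars >= 4, ]
top_base

# dplyr: the same two steps as verbs in a pipeline
if (requireNamespace("dplyr", quietly = TRUE)) {
  top_dplyr <- hotels |>
    dplyr::mutate(revenue = price * nights) |>
    dplyr::filter(stars >= 4)
  print(top_dplyr)
}
```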
The repo does not ship CSV/XLSX files — they are gitignored and fetched on demand from the source APIs. After cloning, run this once from the R console at the project root:
The script writes Eurostat and ISTAC files into datasets/raw/ and a MANIFEST.md recording filenames, source URLs and download date.
Warning
If install.packages("eurostat") fails (CRAN occasionally archives it when a transitive dependency drops), use the R-universe binary build:
Fallback if R-universe is unreachable: remotes::install_github("rOpenGov/eurostat") (slower, compiles).
| Format | Package | Function |
|---|---|---|
| CSV | readr | read_csv() |
| Excel (.xlsx, .xls) | readxl | read_excel() |
| SPSS (.sav) | haven | read_sav() |
| Stata (.dta) | haven | read_dta() |
| JSON | jsonlite | read_json() |
| Eurostat | eurostat | get_eurostat() |
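readr is attached by library(tidyverse). readxl, haven and jsonlite are installed with the tidyverse but need their own library() call. eurostat is a separate install that plays nicely with tidyverse tools.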
readr::read_csv
Tip
Always build file paths with here::here(). Your script will then work on the lab machines, on your laptop, and on my laptop with zero changes.
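Here is a self-contained sketch of read_csv(). It writes a tiny CSV to a temporary file so it runs anywhere; the rows reuse figures from the hotels example later in this session. In the course repo you would build the path with here() instead. The file name in the comment is a placeholder.

```r
library(readr)

# A tiny CSV in a temporary file, so the example needs no downloaded data
tmp <- tempfile(fileext = ".csv")
writeLines(c("island,month,nights",
             "Gran Canaria,2024-06,142500",
             "Tenerife,2024-06,198200"), tmp)

arrivals <- read_csv(tmp, show_col_types = FALSE)  # column types are guessed
arrivals

# With real data: read_csv(here::here("datasets", "raw", "<file>.csv"))
```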
readxl
Excel files often have merged cells, titles and footers around the data. Use skip to jump over title rows, or range = "B5:K120" to carve out exactly the data table (if you give both, range takes precedence).
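The sketch below uses the example workbook that ships with readxl rather than course data. It lists the sheets, then reads a rectangular block with range. The row in range ("A1") is read as the header.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")  # example file bundled with readxl
excel_sheets(path)                       # sheet names in the workbook

# Header in row 1, then 5 data rows, first 3 columns only
block <- read_excel(path, sheet = "mtcars", range = "A1:C6")
block
```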
haven
Use as_factor(survey) to convert labelled numeric variables into R factors in one go.
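Here is a small sketch of what as_factor() does. Instead of reading a .sav file, it builds a hypothetical `survey` table with one labelled column, the way read_sav() returns SPSS value labels. as_factor() then turns the labels into factor levels, ordered by their codes.

```r
library(haven)

# Hypothetical survey: codes 1-3 with SPSS-style value labels
survey <- tibble::tibble(
  satisfaction = labelled(c(1, 3, 2, 3),
                          labels = c(Low = 1, Medium = 2, High = 3))
)

survey <- as_factor(survey)    # every labelled column becomes a factor
survey$satisfaction            # Low High Medium High
levels(survey$satisfaction)    # "Low" "Medium" "High"
```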
The eurostat package caches downloads locally, so the slow step only happens the first time.
here::here()
here() always resolves paths relative to the project root (the folder with the .Rproj), regardless of which subfolder your .R or .qmd file lives in.
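Here is a short sketch of here() in use. Each argument is one path component, so the same call works on Windows, macOS and Linux. Outside a project, here() falls back to the current working directory.

```r
library(here)

here()                                            # absolute path of the project root
here("datasets", "raw")                           # <root>/datasets/raw
here("exercises", "day1", "exercise-template.R")  # the Day 1 template
```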
library(dplyr)
library(tibble)

# One row per island and month: nights sold and bed capacity
hotels <- tribble(
  ~island,        ~month,    ~nights, ~beds,
  "Gran Canaria", "2024-06", 142500,  165000,
  "Gran Canaria", "2024-07", 168300,  165000,
  "Tenerife",     "2024-06", 198200,  220000,
  "Tenerife",     "2024-07", 231100,  220000,
  "Lanzarote",    "2024-06", 88100,   105000,
  "Lanzarote",    "2024-07", 98400,   105000
)

hotels |>
  mutate(occupancy = nights / beds) |>       # ratio of nights to beds, per row
  group_by(island) |>                        # one group per island
  summarise(mean_occupancy = mean(occupancy), .groups = "drop") |>  # average per island
  arrange(desc(mean_occupancy))              # highest first
# A tibble: 3 × 2
island mean_occupancy
<chr> <dbl>
1 Tenerife 0.976
2 Gran Canaria 0.942
3 Lanzarote 0.888
This is essentially every report in tourism statistics: load · clean · group · summarise. Tomorrow we unpack each verb.
numeric, character, logical, factor, Date) and you understand 90 % of what R prints.
tibble / data.frame is the 99 %-of-the-time workhorse.
readr, readxl, haven, eurostat.
here::here() for paths.
Short break, then one hour on Git & GitHub to set up your submission workflow.
After that, the Day 1 exercise: open exercises/day1/exercise-template.R, load one ISTAC CSV and one Eurostat dataset, and inspect them with glimpse(), summary() and head().
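Here is a sketch of the three inspection calls, run on a three-row version of this session's hotel table rather than on ISTAC or Eurostat data. glimpse() comes from tibble (dplyr re-exports it); summary() and head() are base R.

```r
library(tibble)

hotels <- tribble(
  ~island,        ~stars, ~price, ~nights,
  "Gran Canaria",      4,     82,    12.5,
  "Tenerife",          5,     95,    18.3,
  "Lanzarote",         4,    110,     9.8
)

glimpse(hotels)   # one line per column: name, type, first values
summary(hotels)   # min, quartiles, mean, max for numeric columns
head(hotels, 2)   # the first two rows
```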
Quantitative Methods in Tourism · Master in Tourism and Sustainable Development · TIDES · ULPGC