Basic statistics with R

Topic 1 · Day 3 · 2 hours

Christian González Martel

Department of Quantitative Methods in Economics and Management · ULPGC

Juan M. Hernández Guerra

Department of Quantitative Methods in Economics and Management · ULPGC

April 29, 2026

Outline

  • Univariate descriptives: mean, median, variance, quantiles.
  • Frequency tables and cross-tabulations.
  • Correlation and covariance.
  • Communicating results with nicely formatted summaries.

Hypothesis testing (t-test, chi-squared, regression inference) is covered in Module III with Juan — not here.

Setup · sample data

Run this once at the start of the session — every example below uses either hotels or tourists.

library(tibble)

hotels <- tibble(
  island = c("Gran Canaria", "Tenerife", "Lanzarote",
             "Fuerteventura", "La Palma"),
  stars  = c(4L, 5L, 4L, 3L, 3L),
  price  = c(82, 95, 110, 100, 78),
  rating = c(8.2, 9.1, 7.5, 6.9, 8.0),
  nights = c(12.5, 18.3, 9.8, 11.2, 6.4)
)

tourists <- tibble(
  origin_country = c("DE", "DE", "UK", "UK", "UK",
                     "ES", "ES", "FR", "DE", "UK"),
  island         = c("Gran Canaria", "Tenerife", "Lanzarote",
                     "Tenerife", "Lanzarote", "Gran Canaria",
                     "Gran Canaria", "Tenerife", "Gran Canaria",
                     "Lanzarote"),
  main_purpose   = c("leisure", "leisure", "leisure", "business",
                     "leisure", "leisure", "business", "leisure",
                     "leisure", "leisure")
)

Descriptives in one line

library(dplyr)

hotels |>
  summarise(
    n        = n(),
    mean     = mean(price, na.rm = TRUE),
    sd       = sd(price,   na.rm = TRUE),
    median   = median(price, na.rm = TRUE),
    q25      = quantile(price, 0.25, na.rm = TRUE),
    q75      = quantile(price, 0.75, na.rm = TRUE)
  )
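The outline lists variance, but the slide above reports only the standard deviation. Below is a minimal sketch of the usual spread measures. It copies the five Setup prices into a separate tibble, price_data, a name chosen here so that the session's hotels object is not overwritten. var() is the sample variance with denominator n - 1, sd() is its square root, and IQR() is q75 - q25 under quantile()'s default type 7. The q25_t6 column shows one alternative quantile definition, included for comparison.

```r
library(dplyr)
library(tibble)

# Copy of the price column from the Setup slide, under a new name
price_data <- tibble(price = c(82, 95, 110, 100, 78))

spread <- price_data |>
  summarise(
    variance = var(price),                       # sample variance, denominator n - 1
    sd       = sd(price),                        # square root of the sample variance
    iqr      = IQR(price),                       # q75 - q25 with the default type 7
    q25_t6   = quantile(price, 0.25, type = 6)   # same quantile, alternative definition
  )

spread
```

With these prices the mean is 93, so the variance is (121 + 4 + 289 + 49 + 225) / 4 = 172 and the IQR is 100 - 82 = 18. The type = 6 column gives 80 instead of 82. This is a reminder that quantiles of small samples depend on the definition chosen.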

Frequency tables

library(janitor)

tourists |>
  tabyl(origin_country) |>
  adorn_totals("row") |>
  adorn_pct_formatting()
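For comparison, base R produces the same counts and shares with table() and prop.table(). The result is a table object rather than a data frame, so it does not pipe into the adorn_*() helpers. The sketch copies origin_country from the Setup slide into a new tibble named visitors, a name chosen here for illustration.

```r
library(tibble)

# origin_country values copied from the Setup slide
visitors <- tibble(
  origin_country = c("DE", "DE", "UK", "UK", "UK",
                     "ES", "ES", "FR", "DE", "UK")
)

counts <- table(visitors$origin_country)  # counts per country, sorted alphabetically
shares <- prop.table(counts)              # each count divided by the total (10)

counts
round(100 * shares, 1)                    # percentages, as tabyl() reports them
```

Both approaches give DE 3, ES 2, FR 1 and UK 4 out of 10 tourists.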

Cross-tabs

tourists |>
  tabyl(island, main_purpose) |>
  adorn_percentages("row") |>
  adorn_pct_formatting()
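Row percentages divide each cell by its row total, so every island's row sums to 100%. One way to check this is to compute the same cross-tab in base R with prop.table(margin = 1). The sketch copies the two columns from the Setup slide into a tibble named trips, a name chosen here for illustration.

```r
library(tibble)

# island and main_purpose copied from the Setup slide
trips <- tibble(
  island       = c("Gran Canaria", "Tenerife", "Lanzarote", "Tenerife",
                   "Lanzarote", "Gran Canaria", "Gran Canaria", "Tenerife",
                   "Gran Canaria", "Lanzarote"),
  main_purpose = c("leisure", "leisure", "leisure", "business", "leisure",
                   "leisure", "business", "leisure", "leisure", "leisure")
)

xtab       <- with(trips, table(island, main_purpose))  # rows = island, cols = purpose
row_shares <- prop.table(xtab, margin = 1)              # divide each cell by its row total

round(100 * row_shares, 1)
rowSums(row_shares)                                     # every row sums to 1
```

Gran Canaria comes out at 25% business and 75% leisure, Tenerife at 33.3% and 66.7%, and Lanzarote at 0% and 100%.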

Correlation

hotels |>
  select(price, rating, nights) |>
  cor(use = "pairwise.complete.obs")
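The outline also lists covariance, which cov() computes with the same interface as cor(). Covariance carries the units of both variables, for example price units times rating points. Correlation divides it by the two standard deviations, which makes it unitless and confines it to [-1, 1]. The sketch below copies the numeric Setup columns into hotel_nums, a name chosen here for illustration, and checks that relationship with stats::cov2cor().

```r
library(tibble)

# Numeric columns copied from the Setup slide
hotel_nums <- tibble(
  price  = c(82, 95, 110, 100, 78),
  rating = c(8.2, 9.1, 7.5, 6.9, 8.0),
  nights = c(12.5, 18.3, 9.8, 11.2, 6.4)
)

cov_mat <- cov(hotel_nums)   # covariance matrix; diagonal holds the variances
cor_mat <- cor(hotel_nums)   # correlation matrix; diagonal is 1

# Correlation = covariance / (sd(x) * sd(y)), applied to every pair at once
all.equal(cor_mat, cov2cor(cov_mat))

cov_mat["price", "price"]    # equals var(price)
```

Because correlation removes units, it is the usual choice when comparing the strength of relationships across pairs of variables.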

Recap

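  • summarise() with n(), mean(), sd(), median() and quantile() covers most univariate reporting.
  • janitor::tabyl() returns a data frame of counts and percentages that the adorn_*() helpers can total and format, which makes it more presentable than base table().
  • cor() with use = "pairwise.complete.obs" computes each correlation from the rows where both variables are observed, so a single NA does not turn a whole row and column of the matrix into NA.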
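The difference between the default and pairwise handling is easiest to see with a missing value. The sketch uses a hypothetical copy of the Setup data, hotel_na, in which the second price is deliberately set to NA. With the default use = "everything", every correlation that involves price becomes NA. With pairwise deletion, price correlations use the four complete rows, and rating with nights still uses all five.

```r
library(tibble)

# Setup values with the second price set to NA (hypothetical missing value)
hotel_na <- tibble(
  price  = c(82, NA, 110, 100, 78),
  rating = c(8.2, 9.1, 7.5, 6.9, 8.0),
  nights = c(12.5, 18.3, 9.8, 11.2, 6.4)
)

cor_all  <- cor(hotel_na)                                 # default: NA propagates
cor_pair <- cor(hotel_na, use = "pairwise.complete.obs")  # NAs dropped pair by pair

cor_all
cor_pair
```

By contrast, use = "complete.obs" would drop row 2 from every pair, so rating with nights would also rest on four hotels. Because each pairwise entry can use a different subset of rows, a pairwise matrix is not guaranteed to be positive semidefinite.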

Hands-on

Using the hotels and tourists tibbles from the Setup slide:

  1. Summary table of mean price, mean rating and mean nights, grouped by stars.
  2. Cross-tabulation of origin_country × island with row percentages, and one sentence on a striking cell.
  3. Correlation matrix of price, rating and nights with one sentence on the strongest pair.

The graded version with real Eurostat data is in exercises/day3/exercise-template.R.