Replication Package – “Testing Zipf’s and Gibrat’s Laws in the Spanish University System: Evidence from a decade of institutional transformation”

Last update: 2026-03-04

1. Overview

This replication package contains the dataset and R code required to reproduce the descriptive statistics (Table 1), the main estimations (Zipf’s Law and Gibrat’s Law), and the additional disaggregated analyses by field of study, university type and modality.

2. Data source

The dataset is constructed from official student enrolment data published by the Spanish Ministry of Science, Innovation and Universities (Ministerio de Ciencia, Innovación y Universidades).

The variable “Valor” represents university size, measured as the number of enrolled students.

File provided:
- data/datos.xlsx

Sheet used by the code: “Rama Enseñanza”

3. Folder structure (keep this structure)

replication_package/
├── code/
│   └── run_all.R
├── data/
│   └── datos.xlsx
├── output/        (created automatically, or you can create it manually)
│   ├── tables/
│   └── figures/
└── README.txt

4. Software requirements

- R version: R 4.x or later
- Required R packages: readxl, dplyr, tibble, purrr, tidyr, openxlsx, ggplot2

If needed, install the packages with:

    install.packages(c("readxl", "dplyr", "tibble", "purrr", "tidyr", "openxlsx", "ggplot2"))

5. Working directory (IMPORTANT)

To ensure that outputs (tables and figures) are saved correctly, the working directory must be the root folder of the replication package (the folder that contains /code, /data and /output).

Option A (recommended): open the replication_package folder in RStudio as a Project. If the working directory is not already the package root, run:

    setwd("path/to/replication_package")

Option B: set the working directory manually before running the script:

    setwd("C:/…/replication_package")        # Windows example
    setwd("/Users/…/replication_package")    # macOS/Linux example

After setting the working directory, run the script located in code/run_all.R.

6. How to run

From the root directory of the replication package:

    source("code/run_all.R")

7. Output files

The script generates outputs that may include the following files:
- descriptive_statistics.xlsx (Table 1 descriptive statistics)
- zipf_resultados.csv
- gibrat_resultados.csv
- zipf_resultados_rama.csv
- gibrat_resultados_rama.csv
- zipf_resultados_tipo.csv
- gibrat_resultados_tipo.csv
- zipf_resultados_modalidad.csv
- gibrat_resultados_modalidad.csv
- figures created with ggplot2 (if saving is enabled in the script)

NOTE: If the script does not create the output subfolders automatically, please create them manually:
- output/
- output/tables/
- output/figures/

8. Notes on variables

Key variables used in the analysis:
- Curso: academic year
- Universidad: institution identifier/name
- Nivel: degree level (e.g., “Grado”, “Máster”)
- Rama de Enseñanza: field of study
- Tipo 1: modality (e.g., on-campus vs. online)
- Tipo 2: university type (e.g., public vs. private)
- Valor: university size, measured as the number of enrolled students

9. Contact

For questions about this replication package, please contact:
Juan Manuel Martín Álvarez
Email: juanmanuel.martin@unir.net
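
Optional pre-run check. The requirements in sections 4, 5 and 7 (packages installed, working directory at the package root, output folders present) can be verified before sourcing the script. The snippet below is only a minimal sketch and is not part of code/run_all.R; the helper names missing_packages() and ensure_output_dirs() are hypothetical and were introduced here for illustration. It assumes the folder layout described in section 3.

```r
# Illustrative pre-run check; NOT part of code/run_all.R.
# Assumes the working directory is the replication package root.

required_pkgs <- c("readxl", "dplyr", "tibble", "purrr",
                   "tidyr", "openxlsx", "ggplot2")

# Hypothetical helper: returns the entries of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: creates output/tables and output/figures under `root`
# if they do not exist yet, and returns their paths.
ensure_output_dirs <- function(root = ".") {
  dirs <- file.path(root, "output", c("tables", "figures"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  dirs
}

# Report any packages that still need to be installed.
missing <- missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
}

# Create the output folders only if the data file is where section 3 says
# it should be; otherwise the working directory is probably wrong.
if (file.exists(file.path("data", "datos.xlsx"))) {
  ensure_output_dirs(".")
} else {
  message("data/datos.xlsx not found; check the working directory (section 5).")
}
```

If the check prints no messages, the script can be run as described in section 6. The output folders are created only when the data file is found, so a wrong working directory does not leave stray folders behind.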