Comprehensive guide to the PISA data underlying EduStrat (Educational Stratification in PISA)
EduStrat is designed to study the intergenerational transmission of educational achievement: how parental characteristics (education, occupational status, and household wealth, captured through the ESCS index) relate to children's performance at age 15 in mathematics, reading, and science across more than 100 countries and economies.
The Programme for International Student Assessment (PISA) is an international survey that the Organisation for Economic Co-operation and Development (OECD) has coordinated since 2000, normally on a three-year cycle. PISA assesses the extent to which 15-year-old students near the end of compulsory education have acquired the knowledge and skills essential for full participation in modern societies.
PISA has been conducted in the following years, with each cycle focusing on one major domain while assessing all three:
| Year | Major Domain | Countries/Economies | Students Assessed | Status |
|---|---|---|---|---|
| 2000 | Reading | 43 | ~265,000 | Available in App |
| 2003 | Mathematics | 41 | ~276,000 | Available in App |
| 2006 | Science | 57 | ~398,000 | Available in App |
| 2009 | Reading | 65 | ~475,000 | Available in App |
| 2012 | Mathematics | 65 | ~510,000 | Available in App |
| 2015 | Science | 72 | ~540,000 | Available in App |
| 2018 | Reading | 79 | ~600,000 | Available in App |
| 2022 | Mathematics | 81 | ~690,000 | Available in App |
This application includes data from all eight PISA cycles between 2000 and 2022 (2000, 2003, 2006, 2009, 2012, 2015, 2018, 2022), covering 513 country-year combinations across 101+ unique countries/economies.
PISA mathematics assesses students' capacity to formulate, employ, and interpret mathematics in a variety of contexts. It includes reasoning mathematically and using mathematical concepts, procedures, facts, and tools to describe, explain, and predict phenomena.
PISA reading literacy assesses students' capacity to understand, use, evaluate, reflect on, and engage with texts in order to achieve goals, develop knowledge and potential, and participate in society.
PISA science literacy assesses the ability to engage with science-related issues and with the ideas of science, as a reflective citizen. It includes understanding natural phenomena, designing scientific enquiry, and interpreting evidence.
PISA targets students who are between 15 years 3 months and 16 years 2 months at the time of assessment, regardless of their grade level. This age range was chosen because students at this age are approaching the end of compulsory schooling in most OECD countries.
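The age window can be expressed as a simple check on age in completed months. The sketch below is illustrative only: the helper name `is_pisa_eligible` is hypothetical, and it covers nothing beyond the age bounds stated above.

```r
# Hypothetical helper: TRUE when an age, given in completed years and months,
# falls inside the window of 15 years 3 months to 16 years 2 months.
is_pisa_eligible <- function(years, months) {
  age_in_months <- years * 12 + months
  lower <- 15 * 12 + 3   # 183 months
  upper <- 16 * 12 + 2   # 194 months
  age_in_months >= lower & age_in_months <= upper
}

is_pisa_eligible(years = c(15, 15, 16, 16), months = c(2, 3, 2, 3))
# returns FALSE, TRUE, TRUE, FALSE
```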
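PISA employs a two-stage stratified sampling design: schools are sampled first, and students are then sampled within each selected school.
PISA provides several types of sampling weights to ensure representative estimates. The JSON data files carry a student-level weight in the stu_wgt field.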
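To illustrate why weights matter, the sketch below compares an unweighted and a weighted mean on made-up data. It assumes that `stu_wgt` can be read as the number of students in the population that each sampled student represents. It does not cover standard errors, which in PISA rely on replicate weights.

```r
# Toy data: three students with scores and student weights (values are made up).
students <- data.frame(
  math    = c(450, 500, 550),
  stu_wgt = c(100, 300, 600)
)

# Unweighted mean treats every sampled student equally.
unweighted <- mean(students$math)

# Weighted mean lets each student count in proportion to stu_wgt.
weighted <- weighted.mean(students$math, w = students$stu_wgt)

unweighted  # 500
weighted    # 525
```

The gap between the two figures shows how an estimate can shift when heavily weighted students score differently from the rest of the sample.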
EduStrat uses data processed through the learningtower R package (Wang et al., 2024), which provides harmonized, analysis-ready PISA data in a consistent format.
The learningtower package is available on CRAN:
install.packages("learningtower")
library(learningtower)
The package provides easy access to PISA data:
# Load all student data for a specific year
data_2018 <- load_student(2018)
# Keep a single country by filtering on the country column
data_usa <- subset(data_2018, country == "USA")
# Access codebook
codebook <- load_codebook()
If you use data from this application, please cite the learningtower package:
Wang, K., Yacobellis, P., Siregar, E., Romanes, S., Fitter, K., Dalla Riva, G. V., Cook, D., Tierney, N., Dingorkar, P., Sai Subramanian, S., & Chen, G. (2024). learningtower: OECD PISA datasets from 2000–2022 in an easy-to-use format (R package version 1.1.0). https://doi.org/10.32614/CRAN.package.learningtower
EduStrat pre-generates 513 country-year specific JSON files (e.g., USA_2018.json) for efficient progressive loading. Each chunk contains:
{
  "country": "USA",
  "year": 2018,
  "n_students": 4838,
  "data_quality": {
    "missing_math": 0,
    "missing_reading": 0,
    "missing_science": 0,
    "missing_escs": 0,
    "complete_cases": 4838
  },
  "students": [
    {
      "year": 2018,
      "country": "USA",
      "school_id": "USA_2018_84000001",
      "student_id": "USA_2018_84000250",
      "mother_educ": "ISCED 3A",
      "father_educ": "ISCED 3A",
      "gender": "male",
      "computer": "yes",
      "internet": "yes",
      "math": 534.751,
      "science": 488.669,
      "stu_wgt": 646.6246,
      "desk": "yes",
      "room": "no",
      "television": "3+",
      "computer_n": "1",
      "car": "3+",
      "book": "11-25",
      "wealth": 0.7125,
      "escs": -0.4811,
      "reading": 590.33
    }
  ]
}
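A chunk with this structure can be read back into R and cross-checked against its own summary block. The sketch below is one possible approach, not part of EduStrat itself: it assumes the jsonlite package is installed (the source does not mention it) and uses a trimmed-down, made-up chunk in place of a real file.

```r
library(jsonlite)  # assumed to be installed; not mentioned in the source

# A trimmed-down chunk following the structure shown above (values are made up).
chunk_json <- '{
  "country": "USA",
  "year": 2018,
  "n_students": 2,
  "data_quality": {"missing_math": 0, "missing_escs": 1, "complete_cases": 1},
  "students": [
    {"student_id": "USA_2018_1", "math": 534.751, "escs": -0.4811},
    {"student_id": "USA_2018_2", "math": 488.120, "escs": null}
  ]
}'

chunk <- fromJSON(chunk_json)   # the "students" array becomes a data frame
students <- chunk$students

# Cross-check the summary block against the student records.
stopifnot(nrow(students) == chunk$n_students)
stopifnot(sum(is.na(students$escs)) == chunk$data_quality$missing_escs)
```

For a file on disk, the same call would take a path instead of a string, for example `fromJSON("USA_2018.json")`.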
The application also includes a metadata.json file that catalogs all available data:
{
  "countries": ["ALB", "ARG", "AUS", ...],
  "years": [2000, 2003, 2006, 2009, 2012, 2015, 2018, 2022],
  "variables": {
    "math": "Mathematics achievement score",
    "reading": "Reading achievement score",
    "science": "Science achievement score",
    "escs": "PISA index of economic, social and cultural status",
    ...
  }
}
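The country and year lists can be combined to build candidate chunk file names that follow the `USA_2018.json` pattern. This is a minimal sketch: the helper `chunk_file` and the stand-in `metadata` list are hypothetical, and only the three country codes visible in the example are used.

```r
# Hypothetical helper: file name of the chunk for one country-year pair,
# following the pattern of the USA_2018.json example.
chunk_file <- function(country, year) {
  sprintf("%s_%d.json", country, as.integer(year))
}

# Stand-in for the parsed metadata (only a few entries, for illustration).
metadata <- list(
  countries = c("ALB", "ARG", "AUS"),
  years     = c(2000, 2003, 2006, 2009, 2012, 2015, 2018, 2022)
)

# Candidate chunk names as a country-by-year matrix. Not every pair
# necessarily exists as a file, since participation varies by cycle.
candidates <- outer(metadata$countries, metadata$years, chunk_file)
candidates[3, 7]  # "AUS_2018.json"
```

Because availability varies by year, a loader built on this idea would still need to handle country-year pairs for which no file exists.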
All variable names in the JSON data files are taken directly from the learningtower R package (Wang et al., 2024), with the sole exception that read is renamed to reading during chunk generation. The following variables are used by EduStrat in its analyses. The JSON data files also contain additional learningtower fields (wealth, computer, internet, desk, room, book, television, car, computer_n) that are not currently used by the application.
Note: PISA uses plausible values to account for measurement error. EduStrat uses the first plausible value (PV1) for each domain. Full analyses should consider all plausible values (5 in earlier cycles, 10 from PISA 2015 onward) and combine results using Rubin's rules.
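As a rough illustration of the pooling step, the sketch below applies Rubin's rules to estimates of the same statistic computed once per plausible value. The helper name `pool_rubin` and all input numbers are made up. The JSON chunks described in this guide carry a single score per domain, so the per-plausible-value inputs would have to come from the original PISA files.

```r
# Hypothetical helper: pool M estimates (one per plausible value) and their
# sampling variances using Rubin's rules.
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  stopifnot(m > 1, length(variances) == m)
  pooled  <- mean(estimates)   # pooled point estimate
  within  <- mean(variances)   # average sampling variance
  between <- var(estimates)    # variance across plausible values
  total   <- within + (1 + 1 / m) * between
  list(estimate = pooled, se = sqrt(total))
}

# Made-up estimates of the same statistic from five plausible values.
res <- pool_rubin(
  estimates = c(500, 502, 498, 501, 499),
  variances = c(4, 4, 4, 4, 4)
)
res$estimate  # 500
res$se^2      # 7 (= 4 + 1.2 * 2.5), up to floating-point rounding
```

The between-value term is what an analysis based on a single plausible value leaves out, so its standard errors tend to be too small.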
Note: EduStrat derives a parent_edu variable at runtime by taking the higher of mother_educ and father_educ (converted to years of schooling via ISCED mapping). This serves as the alternative predictor option in the application.
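The derivation can be sketched as a lookup followed by an element-wise maximum. Everything specific here is an assumption: the years assigned to each ISCED label are placeholders rather than EduStrat's actual mapping, the helper name `derive_parent_edu` is hypothetical, and falling back to the one available parent when the other is missing is only one possible way to handle gaps.

```r
# Hypothetical ISCED-to-years lookup; the values are placeholders, not the
# mapping EduStrat actually uses.
isced_years <- c(
  "ISCED 1"  = 6,
  "ISCED 2"  = 9,
  "ISCED 3A" = 12
)

derive_parent_edu <- function(mother_educ, father_educ) {
  mother_years <- unname(isced_years[as.character(mother_educ)])
  father_years <- unname(isced_years[as.character(father_educ)])
  # Higher of the two; falls back to the available parent if one is missing.
  pmax(mother_years, father_years, na.rm = TRUE)
}

derive_parent_edu(
  mother_educ = c("ISCED 3A", "ISCED 1", NA),
  father_educ = c("ISCED 2",  NA,        NA)
)
# returns 12, 6, NA
```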
The official source for all PISA data is the OECD PISA Data Portal:
Main Portal: https://www.oecd.org/pisa/data/
PISA data contain missing values.
This application uses complete-case analysis by default. Advanced users should consider multiple imputation for handling missing data.
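Complete-case analysis amounts to dropping every record with a missing value in any variable the model uses. The sketch below shows the idea on made-up records. Restricting the check to the variables of one model (here `math` and `escs`) is an assumption about how such a filter might be applied, not a description of EduStrat's code.

```r
# Toy records with gaps (values are made up).
students <- data.frame(
  math    = c(534.8, NA,    488.7, 512.3),
  reading = c(590.3, 501.2, 470.0, NA),
  escs    = c(-0.48, 0.12,  NA,    0.95)
)

# Complete-case analysis: keep rows with no missing value in the model variables.
model_vars <- c("math", "escs")
keep <- complete.cases(students[, model_vars])
analysis_data <- students[keep, ]

nrow(analysis_data)  # 2 of the 4 rows remain for a math ~ escs model
```

Half of the toy sample is lost here, which is the kind of attrition that motivates multiple imputation.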
The following 38 OECD countries are available (availability varies by year):
Australia (AUS)
Austria (AUT)
Belgium (BEL)
Canada (CAN)
Chile (CHL)
Colombia (COL)
Costa Rica (CRI)
Czech Republic (CZE)
Denmark (DNK)
Estonia (EST)
Finland (FIN)
France (FRA)
Germany (DEU)
Greece (GRC)
Hungary (HUN)
Iceland (ISL)
Ireland (IRL)
Israel (ISR)
Italy (ITA)
Japan (JPN)
Korea (KOR)
Latvia (LVA)
Lithuania (LTU)
Luxembourg (LUX)
Mexico (MEX)
Netherlands (NLD)
New Zealand (NZL)
Norway (NOR)
Poland (POL)
Portugal (PRT)
Slovak Republic (SVK)
Slovenia (SVN)
Spain (ESP)
Sweden (SWE)
Switzerland (CHE)
Turkey (TUR)
United Kingdom (GBR)
United States (USA)
An additional 60+ partner countries and economies are available, including:
Albania (ALB)
Argentina (ARG)
Brazil (BRA)
Bulgaria (BGR)
China (CHN)*
Croatia (HRV)
Hong Kong (HKG)
India (IND)*
Indonesia (IDN)
Jordan (JOR)
Kazakhstan (KAZ)
Macao (MAC)
Malaysia (MYS)
Peru (PER)
Qatar (QAT)
Romania (ROU)
Russia (RUS)
Serbia (SRB)
Singapore (SGP)
Chinese Taipei (TWN)
Thailand (THA)
Uruguay (URY)
Vietnam (VNM)
* Note: Countries marked with an asterisk participate through specific regions or provinces rather than nationally.