Data Sources

Comprehensive guide to the PISA data underlying the Educational Stratification in PISA tool

This application is designed to study the intergenerational transmission of educational achievement: specifically, how parental characteristics (education, occupational status, and household wealth, which PISA combines into the ESCS index) relate to children's academic performance at age 15 in mathematics, reading, and science across more than 100 countries and economies.

1. The PISA Programme

1.1 Programme Overview

The Programme for International Student Assessment (PISA) is a triennial international survey coordinated by the Organisation for Economic Co-operation and Development (OECD). The first assessment took place in 2000; the cycle scheduled for 2021 was postponed to 2022 because of the COVID-19 pandemic. PISA assesses the extent to which 15-year-old students near the end of compulsory education have acquired the knowledge and skills essential for full participation in modern societies.

Key Facts:

1.2 PISA Assessment Cycles

PISA has been conducted in the following years, with each cycle focusing on one major domain while assessing all three:

Year  Major domain  Countries/economies  Students assessed  Status
2000  Reading       43                   ~265,000           Available in App
2003  Mathematics   41                   ~276,000           Available in App
2006  Science       57                   ~398,000           Available in App
2009  Reading       65                   ~475,000           Available in App
2012  Mathematics   65                   ~510,000           Available in App
2015  Science       72                   ~540,000           Available in App
2018  Reading       79                   ~600,000           Available in App
2022  Mathematics   81                   ~690,000           Available in App

This application includes data from all eight PISA cycles between 2000 and 2022 (2000, 2003, 2006, 2009, 2012, 2015, 2018, and 2022), covering 513 country-year combinations across at least 101 unique countries/economies.

2. Assessment Framework

2.1 Mathematics Assessment

PISA mathematics assesses students' capacity to formulate, employ, and interpret mathematics in a variety of contexts. It includes reasoning mathematically and using mathematical concepts, procedures, facts, and tools to describe, explain, and predict phenomena.

Mathematics Competencies:

2.2 Reading Assessment

PISA reading literacy assesses students' capacity to understand, use, evaluate, reflect on, and engage with texts in order to achieve goals, develop knowledge and potential, and participate in society.

Reading Competencies:

2.3 Science Assessment

PISA science literacy assesses the ability to engage with science-related issues and with the ideas of science, as a reflective citizen. It includes understanding natural phenomena, designing scientific enquiry, and interpreting evidence.

Science Competencies:

3. Sampling Design

3.1 Target Population

PISA targets students aged between 15 years 3 months and 16 years 2 months at the time of assessment who are enrolled in grade 7 or higher, regardless of their exact grade level. This age range was chosen because, in most OECD countries, students of this age are approaching the end of compulsory schooling.
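The age rule above is expressed in completed months. The R sketch below checks it for a single birth date and test date. It is illustrative only: the function names are hypothetical, it does not check the grade 7 requirement, and in practice PISA operationalizes the rule as a birth-date window tied to each country's testing period rather than a per-student calculation.

```r
# Age in completed months between a birth date and a test date.
age_in_months <- function(birth, test) {
  b_lt <- as.POSIXlt(birth)
  t_lt <- as.POSIXlt(test)
  months <- (t_lt$year - b_lt$year) * 12 + (t_lt$mon - b_lt$mon)
  # Subtract one month when the current month has not yet been completed
  months - (t_lt$mday < b_lt$mday)
}

# TRUE when age is between 15 years 3 months and 16 years 2 months (inclusive)
is_pisa_age_eligible <- function(birth, test) {
  m <- age_in_months(birth, test)
  m >= 15 * 12 + 3 & m <= 16 * 12 + 2
}

is_pisa_age_eligible(as.Date("2002-06-15"), as.Date("2018-04-20"))  # 190 months
```

Both functions are vectorized, so they can be applied to whole columns of birth and test dates at once.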

3.2 Two-Stage Stratified Sampling

PISA employs a sophisticated two-stage stratified sampling design:

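  1. Stage 1 - School Sampling: Schools are sampled with probability proportional to size (PPS), where size is the estimated number of eligible 15-year-old students enrolled in the school. PISA standards require at least 150 schools per country (or all schools, where fewer exist); typically, 150-200 schools are selected.
  2. Stage 2 - Student Sampling: Within each selected school, a fixed number of students (the target cluster size: typically 35, or 42 for computer-based administration from 2015 onward) is randomly sampled with equal probability from the complete list of eligible 15-year-old students. In schools with fewer eligible students than the target, all of them are selected.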
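The two stages can be sketched in R as follows. This is a simplified illustration on a hypothetical sampling frame: the frame, the function name, and the use of each school's measure of size as its actual number of eligible students are all assumptions. It omits the explicit and implicit stratification, certainty selection of very large schools, and replacement schools that real PISA samples use. It also shows how the inverse of the two selection probabilities gives a base weight, the starting point for the sampling weights.

```r
set.seed(2018)

# Hypothetical sampling frame: one row per school with its measure of size
# (estimated number of eligible 15-year-olds)
frame <- data.frame(school_id = 1:1000,
                    mos = sample(20:400, 1000, replace = TRUE))

# Stage 1: systematic PPS selection. Each school occupies a stretch of the
# cumulative size scale; equally spaced points from a random start pick schools.
pps_systematic <- function(mos, n) {
  interval <- sum(mos) / n
  points <- runif(1, 0, interval) + interval * (0:(n - 1))
  findInterval(points, cumsum(mos)) + 1
}

n_schools <- 150
tcs <- 35  # target cluster size
schools <- frame[pps_systematic(frame$mos, n_schools), ]

# Stage 2: equal-probability sample of up to `tcs` students in each school
students <- lapply(schools$mos,
                   function(n_elig) sort(sample.int(n_elig, min(tcs, n_elig))))

# Base weight = 1 / (school selection probability * student selection probability)
p_school    <- n_schools * schools$mos / sum(frame$mos)
p_student   <- pmin(tcs, schools$mos) / schools$mos
base_weight <- 1 / (p_school * p_student)
```

In this toy frame no school is larger than the sampling interval, so none can be hit twice. PISA then adjusts base weights for school and student non-response and trims extreme values to produce the final student weight.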
Sampling Standards:

3.3 Sampling Weights

PISA provides several types of sampling weights to ensure representative estimates:
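Point estimates use the final student weight, while PISA computes standard errors with balanced repeated replication (BRR) using Fay's adjustment: 80 replicate weights and a Fay factor of 0.5. The R sketch below assumes the replicate weights are available as a matrix. The codebook in Section 6 lists only w_fstuwt and w_fsenwt for this application, so replicate weights would have to come from the official OECD files; the function name is hypothetical.

```r
# Weighted mean with a Fay-adjusted BRR standard error.
# y:       outcome (e.g., one plausible value)
# w_final: final student weights
# w_rep:   matrix of replicate weights, one column per replicate (80 in PISA)
# fay_k:   Fay factor (0.5 in PISA)
brr_weighted_mean <- function(y, w_final, w_rep, fay_k = 0.5) {
  est <- weighted.mean(y, w_final)
  est_rep <- apply(w_rep, 2, function(w) weighted.mean(y, w))
  n_rep <- ncol(w_rep)
  # Sampling variance: squared replicate deviations / (G * (1 - k)^2)
  sampling_var <- sum((est_rep - est)^2) / (n_rep * (1 - fay_k)^2)
  c(estimate = est, se = sqrt(sampling_var))
}
```

With PISA's values (G = 80, k = 0.5) the divisor is 80 x 0.25 = 20, so the sampling variance is the sum of squared replicate deviations divided by 20.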

4. The learningtower R Package

4.1 Package Overview

This application uses data processed through the learningtower R package (Vaughan et al., 2021), which provides harmonized, analysis-ready PISA data in a consistent format.

learningtower Package Benefits:

4.2 Package Installation

The learningtower package is available on CRAN:

install.packages("learningtower")
library(learningtower)

4.3 Data Access via learningtower

The package provides easy access to PISA data:

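# Load all student data for a specific year (downloads the file for that cycle)
data_2018 <- load_student("2018")

# load_student() selects by year only; subset countries afterwards
data_usa <- subset(data_2018, country == "USA")

# Country code-to-name lookup table bundled with the package
data(countrycode, package = "learningtower")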

4.4 Package Citation

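If you use data from this application, please cite the learningtower package. Because the package is updated as new PISA cycles are added, the citation for the version you have installed can be printed in R with citation("learningtower"):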

Vaughan, B., Stanke, L., Teng, T., Hyndman, R., & O'Hara-Wild, E. (2021). learningtower: OECD PISA datasets from 2000-2018 in an easy-to-use format (R package version 1.0.1). https://CRAN.R-project.org/package=learningtower

5. Data Structure in This Application

5.1 JSON Chunk Format

This application pre-generates one JSON file per country-year combination (513 files, e.g., USA_2018.json) so that data can be loaded progressively. Each chunk contains:

{
  "country": "USA",
  "year": 2018,
  "n_students": 4838,
  "data_quality": {
    "missing_math": 0,
    "missing_reading": 0,
    "missing_science": 0,
    "missing_escs": 0,
    "complete_cases": 4838
  },
  "students": [
    {
      "student_id": "USA_2018_00001",
      "math": 498.5,
      "reading": 505.2,
      "science": 502.9,
      "escs": 0.23,
      "gender": "male",
      "age": 15.5,
      ...
    }
  ]
}
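The data_quality block can be derived directly from the student records. The R sketch below builds the header fields of one chunk from a data frame. It is a hypothetical reconstruction, not the application's actual generator: it assumes that complete_cases counts students with none of math, reading, science, or escs missing, and that student IDs are numbered sequentially as in USA_2018_00001. The resulting list could then be serialized with a JSON library such as jsonlite.

```r
# Build the header fields of a country-year chunk from a student data frame.
chunk_header <- function(df, country, year,
                         fields = c("math", "reading", "science", "escs")) {
  n_missing <- colSums(is.na(df[fields]))
  list(
    country = country,
    year = year,
    n_students = nrow(df),
    data_quality = c(
      setNames(as.list(n_missing), paste0("missing_", fields)),
      list(complete_cases = sum(complete.cases(df[fields])))
    ),
    # IDs follow the pattern <country>_<year>_<5-digit sequence number>
    student_id = sprintf("%s_%d_%05d", country, as.integer(year), seq_len(nrow(df)))
  )
}

# Toy input: one student missing math, another missing reading
toy <- data.frame(math = c(498.5, NA, 480.1), reading = c(505.2, 470.3, NA),
                  science = c(502.9, 491.0, 466.4), escs = c(0.23, -0.41, 0.57))
hdr <- chunk_header(toy, "USA", 2018)
hdr$data_quality$complete_cases
```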

5.2 File Organization

5.3 Metadata File

The application also includes a metadata.json file that catalogs all available data:

{
  "countries": ["ALB", "ARG", "AUS", ...],
  "years": [2000, 2003, 2006, 2009, 2012, 2015, 2018, 2022],
  "variables": {
    "math": "Mathematics achievement score",
    "reading": "Reading achievement score",
    "science": "Science achievement score",
    "escs": "PISA index of economic, social and cultural status",
    ...
  }
}

6. Variable Codebook

6.1 Achievement Variables

math         Mathematics achievement score (plausible value 1)
reading      Reading achievement score (plausible value 1)
science      Science achievement score (plausible value 1)

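Note: PISA uses plausible values to account for measurement error in individual scores. For simplicity, this application uses the first plausible value (PV1) for each domain. A single plausible value gives unbiased point estimates but understates standard errors, so advanced analyses should use all plausible values (five per domain in 2000-2012, ten from 2015 onward) and combine the results.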
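When all plausible values are used, the analysis is run once per plausible value and the results are combined with Rubin's rules: the point estimate is the mean across plausible values, and the total error variance adds the average sampling variance to (1 + 1/M) times the between-plausible-value variance, where M is the number of plausible values. A minimal R sketch with a hypothetical function name and toy numbers:

```r
# Combine M plausible-value results with Rubin's rules.
# estimates:    one point estimate per plausible value
# sampling_var: sampling variance of each estimate (e.g., from BRR)
combine_pv <- function(estimates, sampling_var) {
  m <- length(estimates)
  stopifnot(m >= 2, length(sampling_var) == m)
  u_bar <- mean(sampling_var)        # average sampling variance
  b <- var(estimates)                # between-PV variance (denominator m - 1)
  total_var <- u_bar + (1 + 1 / m) * b
  c(estimate = mean(estimates), se = sqrt(total_var))
}

combine_pv(c(500, 502, 498, 501, 499), rep(4, 5))
# estimate = 500, se = sqrt(4 + 1.2 * 2.5) = sqrt(7)
```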

6.2 Socioeconomic Status Variables

escs         PISA index of economic, social and cultural status (standardized to mean 0, SD 1 across OECD countries)
wealth       Family wealth index derived from household possessions
books        Number of books at home (categorical: 0-10, 11-25, 26-100, 101-200, 201-500, more than 500)
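The relationship this application studies is often summarized by the socio-economic gradient: its slope (score points per one-unit increase in ESCS, i.e. one OECD standard deviation) and its strength (the percentage of score variance explained by ESCS). A minimal R sketch using weighted least squares, assuming a data frame with the math, escs, and w_fstuwt columns from this codebook; the function name and simulated data are hypothetical. Because it uses a single plausible value and no replicate weights, it yields point estimates only.

```r
# Weighted slope and strength of the ESCS-achievement relationship.
escs_gradient <- function(df, outcome = "math") {
  fit <- lm(reformulate("escs", response = outcome), data = df, weights = w_fstuwt)
  c(slope = unname(coef(fit)["escs"]),        # score points per unit of ESCS
    strength = summary(fit)$r.squared * 100)   # % of variance explained
}

# Simulated data for illustration only
set.seed(1)
n <- 2000
sim <- data.frame(escs = rnorm(n), w_fstuwt = runif(n, 20, 200))
sim$math <- 480 + 35 * sim$escs + rnorm(n, sd = 85)
escs_gradient(sim)
```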

6.3 Parental Education Variables

mother_educ  Mother's education level (ISCED classification)
father_educ  Father's education level (ISCED classification)
parent_edu   Highest parental education (years of schooling)

6.4 Demographic Variables

gender       Student gender (male/female)
age          Student age in years
computer     Has computer at home (yes/no)

6.5 Sampling Weights

w_fstuwt     Final student weight (use for most analyses)
w_fsenwt     Senate weight (rescaled so that each country contributes equally)
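Senate weights are the final student weights rescaled so that every country's weights sum to the same total (5,000 in the OECD files), which gives each country equal influence in pooled cross-country analyses. If only the final weight is at hand, an equivalent rescaling can be sketched in R as below; the function name is hypothetical, and for weighted estimates the constant only needs to be the same for all countries.

```r
# Rescale final student weights so each country's weights sum to `total`.
senate_weights <- function(w_fstuwt, country, total = 5000) {
  w_fstuwt * total / ave(w_fstuwt, country, FUN = sum)
}

w <- c(10, 30, 100, 100, 300)
ctry <- c("AUS", "AUS", "USA", "USA", "USA")
senate_weights(w, ctry)
# 1250 3750 1000 1000 3000
```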

6.6 Identification Variables

country      Country code (ISO 3166-1 alpha-3, e.g., USA, DEU, JPN)
year         PISA assessment year (2000-2022)
student_id   Unique student identifier
school_id    School identifier (for clustering standard errors)

7. Official OECD Data Access

7.1 OECD PISA Data Portal

The official source for all PISA data is the OECD PISA Data Portal:

Main Portal: https://www.oecd.org/pisa/data/

7.2 Cycle-Specific Data

7.3 Technical Documentation

8. Data Quality and Limitations

8.1 Strengths

8.2 Limitations

8.3 Missing Data

PISA data contain missing values due to:

This application uses complete-case analysis by default. Advanced users should consider multiple imputation methods for handling missing data.
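Complete-case analysis drops every student with a missing value on any variable used in an analysis. Because missingness differs across countries, it is worth checking how many students each country loses before comparing results; dropping cases without adjusting the weights roughly assumes that the missing values are unrelated to the outcomes. A minimal R sketch (function name and toy data hypothetical; variable names from the codebook above):

```r
# Keep complete cases and report the share of students dropped per country.
complete_case_filter <- function(df, vars = c("math", "reading", "science", "escs")) {
  keep <- complete.cases(df[vars])
  list(
    data = df[keep, , drop = FALSE],
    share_dropped = tapply(!keep, df$country, mean)
  )
}

toy <- data.frame(country = c("CHL", "CHL", "JPN", "JPN"),
                  math    = c(420, 455, 530, NA),
                  reading = c(440, 470, 510, 505),
                  science = c(430, 450, 525, 520),
                  escs    = c(-0.8, -0.2, 0.1, 0.3))
res <- complete_case_filter(toy)
res$share_dropped
```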

9. Countries Included in This Application

9.1 OECD Countries

The following 38 OECD countries are available (availability varies by year):

Australia (AUS)
Austria (AUT)
Belgium (BEL)
Canada (CAN)
Chile (CHL)
Colombia (COL)
Costa Rica (CRI)
Czech Republic (CZE)
Denmark (DNK)
Estonia (EST)
Finland (FIN)
France (FRA)
Germany (DEU)
Greece (GRC)
Hungary (HUN)
Iceland (ISL)
Ireland (IRL)
Israel (ISR)
Italy (ITA)
Japan (JPN)
Korea (KOR)
Latvia (LVA)
Lithuania (LTU)
Luxembourg (LUX)
Mexico (MEX)
Netherlands (NLD)
New Zealand (NZL)
Norway (NOR)
Poland (POL)
Portugal (PRT)
Slovak Republic (SVK)
Slovenia (SVN)
Spain (ESP)
Sweden (SWE)
Switzerland (CHE)
Turkey (TUR)
United Kingdom (GBR)
United States (USA)

9.2 Partner Countries/Economies

An additional 60+ partner countries and economies are available, including:

Albania (ALB)
Argentina (ARG)
Brazil (BRA)
Bulgaria (BGR)
China (CHN)*
Croatia (HRV)
Hong Kong (HKG)
India (IND)*
Indonesia (IDN)
Jordan (JOR)
Kazakhstan (KAZ)
Macao (MAC)
Malaysia (MYS)
Peru (PER)
Qatar (QAT)
Romania (ROU)
Russia (RUS)
Serbia (SRB)
Singapore (SGP)
Chinese Taipei (TWN)
Thailand (THA)
Uruguay (URY)
Vietnam (VNM)

* Note: Some countries participate through specific regions or provinces rather than nationally

10. References and Further Reading

10.1 Primary Sources

  1. OECD. (2023). PISA 2022 database. Organisation for Economic Co-operation and Development. https://www.oecd.org/pisa/data/
  2. OECD. (2019). PISA 2018 technical report. OECD Publishing.
  3. Vaughan, B., Stanke, L., Teng, T., Hyndman, R., & O'Hara-Wild, E. (2021). learningtower: OECD PISA datasets from 2000-2018 in an easy-to-use format (R package version 1.0.1). https://CRAN.R-project.org/package=learningtower

10.2 Methodological References

  1. OECD. (2009). PISA data analysis manual: SPSS (2nd ed.). OECD Publishing. https://doi.org/10.1787/9789264056275-en
  2. OECD. (2017). PISA 2015 assessment and analytical framework. OECD Publishing. https://doi.org/10.1787/9789264281820-en

10.3 Documentation in This Application
