OECD PISA Data Analysis

Educational Stratification in PISA

Explore how parental education, occupation, and wealth relate to student achievement at age 15. Analyze outcomes across 101 countries and 8 PISA cycles using survey-weighted methods.

📊

Rich Data

Country-year microdata with ESCS, achievement scores, and survey weights across 101 countries and 8 PISA cycles (2000-2022)

📈

Advanced Analytics

Survey-weighted regression, ESCS gradients, distributional measures, and variance decomposition aligned with OECD guidance

🗺️

Global Insights

Cross-national maps and time-series views to compare ESCS gradients, achievement gaps, and trends

💾

Publication Ready

Export reproducible tables, charts, and HTML reports with transparent data lineage

About PISA, ESCS, and Achievement

PISA is the OECD's triennial international assessment of 15-year-old students in reading, mathematics, and science. It measures how well students apply knowledge to real-world problems and supports cross-national comparison. Official site: oecd.org/pisa | Data portal: oecd.org/pisa/data

ESCS (Family Background)

ESCS measures family socioeconomic status through three components: parental education (ISCED levels), parental occupational status (ISEI/HISEI), and home possessions (wealth, books, educational resources). This tool uses the index to estimate how strongly family background predicts children's outcomes.

Achievement Outcomes

Achievement refers to PISA test scores in mathematics, reading, and science. The app uses the first plausible value (PV1) for each domain and applies sampling weights so estimates represent national student populations.

Learningtower Data Pipeline

We use the learningtower R package to access and harmonize OECD PISA microdata. The pipeline generates country-year JSON chunks that this web app loads on demand for interactive analysis.

Who This Tool Is For

Designed for researchers, students, educators, policy analysts, and journalists who need transparent access to PISA-based analysis of intergenerational transmission without writing bespoke code. It supports open, reproducible workflows by making the data pipeline, methods, and exports auditable.

What You Can Do

Compare ESCS achievement gradients across countries and years. Replicate intergenerational transmission analyses and inspect modeling choices. Generate charts, tables, and reports for publications or teaching.

Open Science Focus

All analyses are based on transparent, open-source code and a documented data pipeline so results can be replicated, extended, or adapted for specific cases.

Getting Started

Start in Data Configuration to choose countries and PISA cycles, select the achievement domain (math, reading, science), and choose the stratification variable (ESCS or parental education). Data load in country-year chunks and the analysis tabs update when loading completes.

Configure Data

Choose countries and years, then set the outcome (math, reading, science) and stratification variable (ESCS or parental education)

Load Data

Load selected data and follow progress as country-year files are fetched and merged in the browser

Explore & Analyze

Use the Overview, Distribution, Achievement Gap, Regression, Diagnostics, and Comparative tabs to study ESCS gradients and trends

Export Results

Export reproducible tables, charts, and HTML reports for papers, presentations, or policy briefs

Quick Links

OECD PISA Site | View Source Code

OECD PISA: The Programme for International Student Assessment (PISA) is a triennial assessment of 15-year-olds in reading, mathematics, and science. Official site: oecd.org/pisa | Data portal: oecd.org/pisa/data
ESCS: PISA's composite index of socioeconomic background built from parents' education, occupational status, and household resources, standardized across OECD countries
Data Pipeline: This tool uses the learningtower R package to harmonize OECD PISA microdata and generate country-year JSON chunks for on-demand analysis in the browser
Coverage: 101 countries, 8 PISA cycles (2000-2022), 513 country-year combinations

Data Configuration

Select countries and years, then choose the achievement domain (math, reading, science) and the stratification variable (ESCS or parental education). Data are sourced from the OECD PISA database via the learningtower R package and loaded on demand as country-year chunks.

Welcome! Select countries and years, choose your outcome and stratification settings, then click "Load Selected Data" below to begin.

Data Selection

Years

Countries

⚙️ Analysis Parameters

Outcome Variable

Stratification Variable

Sampling Weights

Following OECD technical standards

Control Variables (for regression models)

Gender (always included) | Year

🚀 Load Data

Click below to load the selected countries and years. Data will be fetched progressively (~5MB per country-year combination).

💡 Tip: Start with 1-3 countries to ensure fast loading

Intergenerational Transmission Overview

This dashboard examines the intergenerational transmission of educational achievement—how parental characteristics (education, occupational status, and household wealth, captured through the ESCS index) relate to children's academic performance at age 15. The analysis framework draws on social stratification theory (Bourdieu & Passeron, 1977; Coleman et al., 1966) to quantify how strongly family background predicts student outcomes across countries and over time.

Key Metrics Explained:
Mean Achievement: Survey-weighted average of test scores using OECD-recommended student weights (W_FSTUWT). Calculated as: μ = Σ(w_i × score_i) / Σ(w_i)
Dispersion Index (Gini): Measures spread of achievement scores from 0 (all students identical) to 1 (maximum dispersion). Higher values indicate greater variation in outcomes.
ESCS Gradient (β): The intergenerational transmission coefficient—shows how many score points of achievement are associated with a one-unit increase in family socioeconomic status. Larger values indicate stronger intergenerational transmission.
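
As a rough illustration of the first and third metrics, the sketch below computes a survey-weighted mean and a weighted least-squares slope in plain Python. The function names and toy values are hypothetical and are not taken from the app's source code; in real PISA data the weights would come from W_FSTUWT.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_w = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_w


def weighted_slope(x, y, weights):
    """Slope of a weighted least-squares line of y on x (the gradient)."""
    mx = weighted_mean(x, weights)
    my = weighted_mean(y, weights)
    cov = sum(w * (xi - mx) * (yi - my) for xi, yi, w in zip(x, y, weights))
    var = sum(w * (xi - mx) ** 2 for xi, w in zip(x, weights))
    return cov / var


# Hypothetical toy data: ESCS, score, and student weight for four students.
escs = [-1.0, 0.0, 1.0, 2.0]
score = [420.0, 460.0, 500.0, 540.0]
weight = [10.0, 20.0, 20.0, 10.0]

mean_score = weighted_mean(score, weight)       # weighted average score
gradient = weighted_slope(escs, score, weight)  # score points per ESCS unit
```

In this toy example every student lies exactly on the line score = 460 + 40 × ESCS, so the gradient is 40 score points per unit of ESCS regardless of the weights.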

Mean Achievement Score

Survey-weighted average

Dispersion Index (Gini)

0 = identical, 1 = max spread

ESCS Gradient (β)

Intergenerational transmission strength

Intergenerational Transmission by Country and Year

What this shows: Scatter plot of mean achievement scores versus ESCS gradients for each country-year combination. Each point represents one country in one year, with country labels displayed.
Interpretation: Points in the upper-right show high achievement and strong intergenerational transmission (family background strongly predicts outcomes). Points in the lower-left show lower achievement with weaker intergenerational transmission. This visualization helps identify which education systems show more or less mobility in educational outcomes.
Calculation: X-axis = survey-weighted mean score; Y-axis = ESCS gradient (how many score points per unit of family SES)

Data Source: OECD PISA via learningtower R package | Analysis methodology follows OECD (2023) technical standards

Score Distribution Analysis

Distribution analysis reveals patterns of educational achievement across selected countries using box plots, percentile analysis, and Lorenz curves.

Achievement Score Distributions by Country and Year

What this shows: Grouped box plots displaying the distribution of achievement scores for each country-year combination. Each box spans the interquartile range (IQR, 25th-75th percentiles), with the median marked inside the box. Whiskers extend to the most extreme scores within 1.5×IQR of the box edges.
Interpretation: Longer boxes (larger IQR) indicate greater within-country variation in achievement. Compare box positions (medians) across countries to see performance differences. Boxes for the same country across different years reveal temporal trends.
Calculation: Box plot quartiles computed from raw student-level scores, grouped by country and year.
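
A minimal sketch of the box plot statistics, assuming the common Tukey convention (whiskers end at the most extreme observations within 1.5×IQR of the quartiles) and the "inclusive" quantile method from Python's standard library. The app's exact quantile method is not stated here, so results may differ slightly; the scores are made up.

```python
from statistics import quantiles


def box_stats(scores):
    """Quartiles, whisker ends, and outliers for one country-year group."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest score inside the fence
        "whisker_high": max(inside),   # highest score inside the fence
        "outliers": [s for s in scores if s < low_fence or s > high_fence],
    }


# Hypothetical student scores for a single country-year.
stats = box_stats([300, 420, 450, 480, 500, 520, 550, 580, 760])
```

Here the IQR is 100 (450 to 550), so the fences sit at 300 and 700; the score of 760 falls outside and would be drawn as an individual outlier point.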

Achievement Percentiles by Country

What this shows: Line chart of score values at the 10th, 25th, 50th (median), 75th, and 90th percentiles for each country.
Interpretation: Steeper lines indicate greater within-country variation. Countries with higher lines across all percentiles have higher overall achievement. The gap between P90 and P10 shows the achievement range.
Calculation: Percentiles calculated using quantile function on sorted score distributions.

Lorenz Curve: Achievement Distribution

What this shows: Lorenz curves plot cumulative proportion of achievement against cumulative proportion of students (ranked from lowest to highest). The diagonal dashed line represents perfect equality (all students with identical scores).
Interpretation: Curves further from the diagonal indicate greater dispersion in achievement. The Gini coefficient equals twice the area between a country's curve and the diagonal.
Calculation: Students sorted by score; cumulative sums computed for population and achievement shares.
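
The calculation can be sketched as follows: build the Lorenz points from sorted scores, then obtain the Gini coefficient as one minus twice the area under the curve (trapezoid rule). This is an illustrative version with optional weights; the function names are hypothetical and the app's own implementation may differ.

```python
def lorenz_points(scores, weights=None):
    """Cumulative (population share, achievement share) pairs."""
    if weights is None:
        weights = [1.0] * len(scores)
    pairs = sorted(zip(scores, weights))          # rank students by score
    total_w = sum(w for _, w in pairs)
    total_s = sum(s * w for s, w in pairs)
    points = [(0.0, 0.0)]
    cum_w = 0.0
    cum_s = 0.0
    for s, w in pairs:
        cum_w += w
        cum_s += s * w
        points.append((cum_w / total_w, cum_s / total_s))
    return points


def gini_from_lorenz(points):
    """Gini = 1 - 2 * (area under the Lorenz curve)."""
    area = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        area += (p1 - p0) * (l0 + l1) / 2.0       # trapezoid between points
    return 1.0 - 2.0 * area


gini_equal = gini_from_lorenz(lorenz_points([500.0, 500.0, 500.0, 500.0]))
gini_unequal = gini_from_lorenz(lorenz_points([0.0, 0.0, 0.0, 100.0]))
```

Identical scores put every point on the diagonal (Gini of 0), while concentrating all achievement in one of four students gives 0.75. Because PISA scores cluster around a few hundred points with no values near zero, Gini values for real score distributions are typically small.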

Achievement Gap Decomposition

Decomposition of achievement gaps by socioeconomic status across and within countries (Reardon, 2011; Chmielewski, 2019).

Gap Measures Explained:
Achievement Gap (Q4-Q1): Difference in mean achievement between students in the top SES quartile (Q4) and bottom SES quartile (Q1). Calculated as: Gap = μ_Q4 - μ_Q1, using survey-weighted means for each quartile.
Variance Decomposition: Partitions total achievement variance into within-country and between-country components using ANOVA: σ²_total = σ²_within + σ²_between. Shows how much variation exists within vs. across nations.
Effect Size (Cohen's d): Standardized gap measure calculated as: d = (μ_Q4 - μ_Q1) / σ_pooled, where σ_pooled is the pooled standard deviation. Effect sizes: small (d ≈ 0.2), medium (d ≈ 0.5), large (d ≈ 0.8 or higher).
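
For a concrete sense of the gap and effect-size formulas, here is a small unweighted sketch. It assumes students have already been split into ESCS quartiles and uses the standard pooled standard deviation; the app applies survey weights to the quartile means, which this simplified version omits. The scores are invented.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(high, low):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / sqrt(pooled_var)


# Hypothetical scores for the top (Q4) and bottom (Q1) ESCS quartiles.
q4_scores = [460.0, 560.0, 660.0]
q1_scores = [360.0, 460.0, 560.0]

gap = mean(q4_scores) - mean(q1_scores)  # raw gap in score points
d = cohens_d(q4_scores, q1_scores)       # gap in pooled-SD units
```

Both groups have a standard deviation of 100 points, so the 100-point gap corresponds to d = 1.0, a large effect by the conventional thresholds.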

Gap Analysis Level:

Select analysis level to view gap statistics and visualizations at different granularities.

Regression Analysis

OLS, Fixed Effects, and Random Effects regression models examining the relationship between socioeconomic status and achievement, with survey-weighted estimation following OECD (2023) guidelines.

Model Specifications:
OLS (Pooled): y_i = β₀ + β₁X_i + ε_i. Pools all data, ignoring country clustering. Provides baseline estimate of SES → achievement relationship.
Fixed Effects (FE): y_it = α_i + β₁X_it + ε_it. Includes country fixed effects (dummy variables) to control for time-invariant country differences. Estimates within-country SES effects.
Random Effects (RE): y_it = (α + u_i) + β₁X_it + ε_it. Models country effects as random draws from a distribution. More efficient than FE if country effects are uncorrelated with X.
Hausman Test: Tests FE vs. RE. Null hypothesis: RE is consistent and efficient. If p < 0.05, prefer FE (country effects correlated with predictors).
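
To illustrate why pooled OLS and fixed effects can disagree, the sketch below estimates the slope both ways on a toy two-country dataset. The fixed-effects slope is computed with the within transformation (subtracting country means), which gives the same slope as including country dummies. The data and function names are hypothetical, and survey weights are omitted for brevity.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple least-squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def within_slope(x, y, groups):
    """Fixed-effects slope: demean x and y within each group, then run OLS."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for xi, yi, g in zip(x, y, groups):
        sums[g][0] += xi
        sums[g][1] += yi
        sums[g][2] += 1
    xd = [xi - sums[g][0] / sums[g][2] for xi, g in zip(x, groups)]
    yd = [yi - sums[g][1] / sums[g][2] for yi, g in zip(y, groups)]
    return ols_slope(xd, yd)


# Toy data: country B has both higher ESCS and a higher baseline score.
country = ["A", "A", "A", "B", "B", "B"]
escs = [-1.0, 0.0, 1.0, 0.0, 1.0, 2.0]
score = [430.0, 450.0, 470.0, 500.0, 520.0, 540.0]

pooled = ols_slope(escs, score)              # mixes within and between variation
fixed = within_slope(escs, score, country)   # within-country slope only
```

Within each country the slope is exactly 20 points per ESCS unit, but the pooled estimate is about 33.6 because it also absorbs the between-country difference in levels.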

Visualization Filter:

Select a specific country to view regression scatter plot and coefficient comparison for that country only, or choose "All Countries Combined" for pooled analysis.

Regression Tables: Show coefficients (β), standard errors (SE), t-statistics, p-values, and 95% confidence intervals for each model. Significance indicators: * p < 0.05, ** p < 0.01, *** p < 0.001. Model fit statistics include R² and adjusted R² (higher values indicate better fit) and AIC and BIC (lower values indicate better fit).

Scatter Plot with Fitted Regression Lines

What this shows: Raw data scatter plot overlaid with fitted regression lines from OLS, FE, and RE models. Each point represents one student's SES (X-axis) and achievement score (Y-axis).
Interpretation: Line slopes show the ESCS gradient (intergenerational transmission strength). Steeper slopes indicate stronger transmission (higher family SES → higher achievement). Compare line slopes across models to see how controlling for country effects changes the estimated relationship.

Coefficient Comparison Across Models

What this shows: Bar chart comparing the SES coefficient (β₁) across OLS, FE, and RE models. Error bars show 95% confidence intervals (±1.96 × SE).
Interpretation: Heavily overlapping confidence intervals suggest the models produce similar estimates, although overlap alone is not a formal test of equality. Large differences suggest country effects matter. FE typically produces smaller coefficients than OLS when countries with higher average SES also have higher average scores.

Model Diagnostics

Statistical tests and metrics to validate regression assumptions and choose the appropriate model specification. Select a country below to analyze its data independently.

Select Country

Select a country to view its regression diagnostics

1. Assumption Check Summary

Purpose: Quick overview of key OLS regression assumptions.
Why it matters: Violations can lead to biased coefficients, incorrect standard errors, and invalid p-values.
Interpretation: Pass = assumption met; Caution = minor concerns, results likely still valid; Concern = significant violation, interpret with care.

2. Model Fit Comparison

Purpose: Compare goodness-of-fit across OLS, Fixed Effects, and Random Effects models.
Key metrics:
• R²: Proportion of variance explained (0-1). Higher = better fit.
• R² Within: Variance explained within groups (countries). Key for panel models.
• R² Between: Variance explained between group means.
• AIC/BIC: Information criteria for model selection. Lower = better. BIC penalizes complexity more.
• ICC (ρ): Intraclass correlation. Proportion of variance at the country level. A high ICC suggests panel methods are needed.

3. Hausman Specification Test

Purpose: Determine whether Fixed Effects or Random Effects is more appropriate.
Null hypothesis (H₀): Country effects are uncorrelated with predictors (RE is consistent).
Alternative (H₁): Country effects correlate with predictors (only FE is consistent).
Interpretation:
• p < 0.05 → Reject H₀ → Use Fixed Effects (systematic differences exist).
• p ≥ 0.05 → Fail to reject → Random Effects is acceptable and more efficient.
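
For the single-coefficient case, the test statistic can be sketched as H = (β_FE − β_RE)² / (Var(β_FE) − Var(β_RE)), compared against a chi-square distribution with 1 degree of freedom. The version below is a simplified illustration with hypothetical inputs; with several predictors the full test uses coefficient vectors and covariance matrices, and the app's implementation may differ.

```python
from math import erfc, sqrt


def hausman_single(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can occur in finite samples; the statistic is then undefined.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # For 1 df, P(chi2 > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2)).
    p_value = erfc(sqrt(h / 2.0))
    return h, p_value


# Hypothetical estimates: FE slope 20 (variance 1.5), RE slope 23 (variance 0.5).
h_stat, p_value = hausman_single(20.0, 23.0, 1.5, 0.5)
prefer_fe = p_value < 0.05
```

With these made-up numbers H = 9, which corresponds to p of roughly 0.003, so the rule above would favor Fixed Effects.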

4. Residual Diagnostics

Purpose: Assess normality and distributional properties of model residuals.
Key metrics:
• Residual Mean: Should be ≈ 0. Non-zero suggests model misspecification.
• Skewness: Measures asymmetry. Should be ≈ 0 (±1 acceptable).
• Excess Kurtosis: Measures tail heaviness. Should be ≈ 0 (±2 acceptable).
• Jarque-Bera Test: Formal normality test. p < 0.05 rejects normality.
• Variance Ratio: Compares residual variance in upper vs lower fitted values. >1.5 suggests heteroscedasticity.
• Outliers (>3σ): Observations with unusually large residuals. Should be <1% of sample.
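
The skewness, excess kurtosis, and Jarque-Bera metrics can be sketched with moment-based formulas: JB = n/6 × (S² + K²/4), where S is skewness and K is excess kurtosis, compared against a chi-square distribution with 2 degrees of freedom. This is an illustrative, unweighted version; the function name and residual values are hypothetical.

```python
from math import exp


def jarque_bera(residuals):
    """Return (skewness, excess kurtosis, JB statistic, p-value)."""
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((r - m) ** 2 for r in residuals) / n   # second central moment
    m3 = sum((r - m) ** 3 for r in residuals) / n
    m4 = sum((r - m) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Survival function of a chi-square with 2 df is exp(-x / 2).
    p_value = exp(-jb / 2.0)
    return skew, excess_kurt, jb, p_value


# Hypothetical residuals: symmetric, so skewness is zero.
skew, excess_kurt, jb, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

The chi-square approximation is asymptotic, so a five-value example only demonstrates the arithmetic. With the large samples typical of PISA, even small departures from normality tend to produce p < 0.05.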

5. Influential Observations (Cook's Distance)

Purpose: Identify observations that disproportionately influence regression coefficients.
What it measures: Combined effect of an observation's leverage (unusual predictor values) and residual size.
Threshold: Cook's D > 4/n indicates potentially influential observations.
Interpretation:
• <5% influential points = typical, no action needed.
• 5-10% = moderate concern, check for data quality issues.
• >10% = investigate whether outliers are driving results.
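
A minimal sketch of Cook's distance for a one-predictor regression, using the textbook formula D_i = e_i² / (p × s²) × h_i / (1 − h_i)² with p = 2 estimated parameters. It is unweighted, the names are hypothetical, and the toy data are constructed so that one point has both high leverage and a sizeable residual.

```python
def cooks_distance(x, y):
    """Cook's D for each observation in a simple regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2                                          # intercept and slope
    s2 = sum(e ** 2 for e in resid) / (n - p)      # residual variance
    distances = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx         # leverage of this point
        distances.append(e ** 2 / (p * s2) * h / (1.0 - h) ** 2)
    return distances


# Toy data: the last observation sits far from the others on x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 30.0]

d = cooks_distance(x, y)
threshold = 4.0 / len(x)
flagged = [i for i, di in enumerate(d) if di > threshold]
```

Only the last observation exceeds the 4/n threshold here, with a Cook's D of about 13.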

6. Residual vs Fitted Plot (OLS)

Purpose: Visual check for linearity and homoscedasticity assumptions.
What to look for:
• Good: Random scatter around zero with constant spread.
• Fan shape: Variance increases with fitted values → heteroscedasticity.
• Curved pattern: Nonlinear relationship not captured by model.
• Clusters: May indicate omitted categorical variable.

Comparative Analysis

Cross-country and cross-year comparison of intergenerational transmission metrics, achievement gaps, and distributional measures.

Intergenerational Educational Stratification Map

What this shows: Choropleth world map displaying the ESCS gradient (intergenerational transmission strength) for each country. Colors range from blue (low gradients, higher educational mobility) to red (high gradients, stronger intergenerational transmission).
Interpretation: Darker red countries have stronger intergenerational transmission—children's achievement is more tightly linked to family background. Blue countries show higher educational mobility. Hover over countries to see exact gradient values, R², and sample size.
Calculation: For each country, run bivariate OLS: achievement = β₀ + β₁(ESCS) + ε. Map displays β₁ coefficient.

Temporal Trends: ESCS Gradient Over Time

What this shows: Line chart tracking how ESCS gradients (intergenerational transmission strength) change over PISA cycles for each country. Each line represents one country's trajectory across available years (2000-2022).
Interpretation: Rising lines indicate increasing intergenerational transmission (family background effects growing stronger). Falling lines show weakening transmission (improving educational mobility). Flat lines suggest stable patterns, and parallel lines indicate countries following similar trends. Only countries with at least 2 time points are shown.
Calculation: Separate regression for each country-year combination. Plot β₁ (ESCS coefficient) over time.

Cross-National Achievement Trends

What this shows: Grouped bar chart showing mean achievement scores by country and year. Bars are grouped by country with different colors representing different PISA cycles.
Interpretation: Compare heights within countries to see temporal changes in achievement. Compare across countries to identify high and low performers. Look for convergence or divergence patterns.
Calculation: Survey-weighted mean achievement for each country-year combination.

ESCS Achievement Gap Comparison

What this shows: Combined bar/line chart comparing Q4–Q1 ESCS achievement gaps across countries. Bars show raw score gaps (difference between highest and lowest family background quartiles); the line overlays standardized effect sizes (Cohen's d).
Interpretation: Larger gaps and higher effect sizes indicate stronger intergenerational transmission of educational achievement.

Variance Decomposition: Within vs. Between Countries

What this shows: Bar chart partitioning total achievement variance into within-country and between-country components. Displays percentages and the intraclass correlation coefficient (ICC = ρ).
Interpretation: High within-country variance suggests more variation in achievement exists within nations than across them. High between-country variance indicates large cross-national differences in achievement levels. ICC = σ²_between / (σ²_between + σ²_within) measures the proportion of variance attributable to country differences.
Calculation: One-way ANOVA decomposition: σ²_total = σ²_within + σ²_between. Uses country as grouping variable.
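
The decomposition can be sketched with sums of squares divided by the total number of students, so that the within and between parts add up exactly to the total variance. This is a simplified, unweighted illustration with hypothetical names and data; ANOVA-based variance-component estimators and survey-weighted versions would give somewhat different numbers.

```python
def variance_decomposition(scores_by_country):
    """Split total variance into within- and between-country parts."""
    all_scores = [s for scores in scores_by_country.values() for s in scores]
    n = len(all_scores)
    grand_mean = sum(all_scores) / n
    within_ss = 0.0
    between_ss = 0.0
    for scores in scores_by_country.values():
        country_mean = sum(scores) / len(scores)
        within_ss += sum((s - country_mean) ** 2 for s in scores)
        between_ss += len(scores) * (country_mean - grand_mean) ** 2
    within = within_ss / n
    between = between_ss / n
    icc = between / (between + within)   # share of variance between countries
    return within, between, icc


# Hypothetical scores for two countries.
within, between, icc = variance_decomposition({
    "A": [400.0, 500.0],
    "B": [500.0, 600.0],
})
```

In this example each component is 2500, the total variance is 5000, and the ICC is 0.5, meaning half of the variation lies between the two countries.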

Export Results

Export analysis results, visualizations, and datasets in various formats for publications, presentations, and further analysis.

📊 Summary Statistics

Export descriptive statistics, distributional measures, and achievement gaps in CSV format.

Includes: mean, SD, Gini, gradients, gaps, effect sizes

📈 Regression Models

Export regression coefficients, standard errors, and fit statistics for all models.

Includes: OLS, FE, RE with SE, t-stats, p-values, AIC/BIC

💾 Dataset

Export the currently loaded dataset with all variables included in your analysis.

Filtered by selected countries/years with survey weights

📸 Visualizations

Download all current visualizations as high-resolution PNG images.

Individual PNG files for each visualization

📄 Full Report

Generate a comprehensive HTML report with all analyses, tables, and visualizations.

Publication-ready report with embedded charts

💡 Tip: Load data and run analyses first. Export functions will use the most recent analysis results from each tab.

Documentation & Resources

Documentation on the PISA program, ESCS construction, methodology, data pipeline, citation guidelines, and open-source code.

📊 Data Sources

Primary Source:

OECD Programme for International Student Assessment (PISA).
oecd.org/pisa | PISA data portal

R Package:

We use learningtower to access and harmonize OECD PISA microdata and generate the country-year files used by this tool.
Vaughan, E., Fung, K., Cook, D., & Wickham, H. (2021).

📖 View Full Data Documentation

📐 Methodology

Comprehensive documentation of statistical methods, including:

Survey-weighted regression (OLS, FE, RE)
Variance decomposition analysis
Achievement gap calculations
Distributional measures (Gini coefficient)
Diagnostic procedures and assumptions

📖 View Methodology Guide

📰 Publication Article

Academic article introducing the application, with methodological choices and implementation details.

Abstract and research context
Formal model specifications
Validation and limitations

📖 Read the Article

📝 How to Cite

Citation guidelines for academic publications, presentations, and reports.

Schoenholzer, K. (2026).
Educational Stratification in PISA.
https://kevinschoenholzer.com/edustrat/

📖 View Citation Guidelines

💻 Source Code

Full source code is available on GitHub under an open-source license.

JavaScript ES6 modules
R data processing scripts
Plotly.js visualizations
Survey-weighted statistical functions

🔗 View on GitHub

🧰 Data Pipeline

R scripts for generating country-year chunks, metadata, and validation reports.

Chunk generation (JSON)
Metadata catalog creation
Quality checks and summaries

📖 View Pipeline Guide

📚 Key References

Social Stratification Theory

Bourdieu, P., & Passeron, J. C. (1977). Reproduction in Education, Society and Culture.
Coleman, J. S., et al. (1966). Equality of Educational Opportunity.

Achievement Gap Analysis

Reardon, S. F. (2011). The widening academic achievement gap between the rich and the poor. Community Investments, 23(2), 19-39.
Chmielewski, A. K. (2019). The global increase in the socioeconomic achievement gap, 1964 to 2015. American Sociological Review, 84(3), 517-544.

Technical Standards

OECD (2023). PISA 2022 Technical Report. Paris: OECD Publishing.