OECD PISA Data Analysis
Educational Stratification in PISA
Explore how parental education, occupation, and wealth relate to student achievement at age 15. Analyze outcomes across 101 countries and 8 PISA cycles using survey-weighted methods.
Rich Data
Country-year microdata with ESCS, achievement scores, and survey weights across 101 countries and 8 PISA cycles (2000-2022)
Advanced Analytics
Survey-weighted regression, ESCS gradients, distributional measures, and variance decomposition aligned with OECD guidance
Global Insights
Cross-national maps and time-series views to compare ESCS gradients, achievement gaps, and trends
Publication Ready
Export reproducible tables, charts, and HTML reports with transparent data lineage
About PISA, ESCS, and Achievement
PISA is the OECD's triennial international assessment of 15-year-old students in reading, mathematics, and science. It measures how well students apply knowledge to real-world problems and supports cross-national comparison. Official site: oecd.org/pisa | Data portal: oecd.org/pisa/data
ESCS (Family Background)
ESCS (the PISA index of economic, social and cultural status) measures family socioeconomic status through three components: parental education (ISCED levels), parental occupational status (ISEI/HISEI), and home possessions (wealth, books, educational resources). The app uses it as the family-background measure when estimating how strongly parental characteristics predict children's outcomes.
Achievement Outcomes
Achievement refers to PISA test scores in mathematics, reading, and science. The app uses the first plausible value (PV1) for each domain and applies sampling weights so estimates represent national student populations.
Learningtower Data Pipeline
We use the learningtower R package to access and harmonize OECD PISA microdata. The pipeline generates country-year JSON chunks that this web app loads on demand for interactive analysis.
Who This Tool Is For
Designed for researchers, students, educators, policy analysts, and journalists who need transparent access to PISA-based analysis of intergenerational transmission without writing bespoke code. It supports open, reproducible workflows by making the data pipeline, methods, and exports auditable.
What You Can Do
Compare ESCS achievement gradients across countries and years. Replicate intergenerational transmission analyses and inspect modeling choices. Generate charts, tables, and reports for publications or teaching.
Open Science Focus
All analyses are based on transparent, open-source code and a documented data pipeline so results can be replicated, extended, or adapted for specific cases.
Getting Started
Start in Data Configuration to choose countries and PISA cycles, select the achievement domain (math, reading, science), and choose the stratification variable (ESCS or parental education). Data load in country-year chunks and the analysis tabs update when loading completes.
Configure Data
Choose countries and years, then set the outcome (math, reading, science) and stratification variable (ESCS or parental education)
Load Data
Load selected data and follow progress as country-year files are fetched and merged in the browser
Explore & Analyze
Use the Overview, Distribution, Achievement Gap, Regression, Diagnostics, and Comparative tabs to study ESCS gradients and trends
Export Results
Export reproducible tables, charts, and HTML reports for papers, presentations, or policy briefs
Quick Links
ESCS: PISA's composite index of socioeconomic background built from parents' education, occupational status, and household resources, standardized across OECD countries
Data Pipeline: This tool uses the learningtower R package to harmonize OECD PISA microdata and generate country-year JSON chunks for on-demand analysis in the browser
Coverage: 101 countries, 8 PISA cycles (2000-2022), 513 country-year combinations
Select countries and years, then choose the achievement domain (math, reading, science) and the stratification variable (ESCS or parental education). Data are sourced from the OECD PISA database via the learningtower R package and loaded on demand as country-year chunks.
Data Selection
⚙️ Analysis Parameters
🚀 Load Data
Click below to load the selected countries and years. Data are fetched progressively (about 5 MB per country-year combination).
This dashboard examines the intergenerational transmission of educational achievement—how parental characteristics (education, occupational status, and household wealth, captured through the ESCS index) relate to children's academic performance at age 15. The analysis framework draws on social stratification theory (Bourdieu & Passeron, 1977; Coleman et al., 1966) to quantify how strongly family background predicts student outcomes across countries and over time.
Mean Achievement: Survey-weighted average of test scores using OECD-recommended student weights (W_FSTUWT). Calculated as: μ = Σ(w_i × score_i) / Σ(w_i)
Dispersion Index (Gini): Measures spread of achievement scores from 0 (all students identical) to 1 (maximum dispersion). Higher values indicate greater variation in outcomes.
ESCS Gradient (β): The intergenerational transmission coefficient—shows how many score points of achievement are associated with a one-unit increase in family socioeconomic status. Larger values indicate stronger intergenerational transmission.
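The mean achievement formula above, μ = Σ(w_i × score_i) / Σ(w_i), can be sketched in a few lines. This is a minimal illustration rather than the app's actual implementation; the function name `weighted_mean` and the toy scores and weights are hypothetical.

```python
def weighted_mean(scores, weights):
    """Return sum(w_i * score_i) / sum(w_i)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for s, w in zip(scores, weights)) / total_weight


# Hypothetical toy data: three students with their W_FSTUWT-style weights
scores = [400.0, 500.0, 600.0]
weights = [1.0, 1.0, 2.0]
print(weighted_mean(scores, weights))  # 525.0
```

The third student counts twice as much as the others, so the weighted mean (525) sits above the unweighted mean (500).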
Intergenerational Transmission by Country and Year
What this shows: Scatter plot of mean achievement scores versus ESCS gradients for each country-year combination.
Each point represents one country in one year, with country labels displayed.
Interpretation: Points in the upper-right show high achievement and strong intergenerational transmission (family background strongly predicts outcomes).
Points in the lower-left show lower achievement with weaker intergenerational transmission. This visualization helps identify which education systems show more or less mobility in educational outcomes.
Calculation: X-axis = survey-weighted mean score; Y-axis = ESCS gradient (how many score points per unit of family SES)
Distribution analysis reveals patterns of educational achievement across selected countries using box plots, percentile analysis, and Lorenz curves.
Achievement Score Distributions by Country and Year
What this shows: Grouped box plots displaying the distribution of achievement scores for each country-year combination.
Each box shows the interquartile range (IQR, 25th-75th percentiles), with the median marked inside the box. Whiskers extend to the most extreme observations within 1.5×IQR of the box edges.
Interpretation: Longer boxes (larger IQR) indicate greater within-country variation in achievement. Compare the positions of the median lines across countries to see performance differences.
Boxes for the same country across different years reveal temporal trends.
Calculation: Box plot quartiles computed from raw student-level scores, grouped by country and year.
Achievement Percentiles by Country
What this shows: Line chart of score values at the 10th, 25th, 50th (median), 75th, and 90th percentiles for each country.
Interpretation: Steeper lines indicate greater within-country variation. Countries with higher lines across all percentiles have higher overall achievement.
The gap between P90 and P10 shows the achievement range.
Calculation: Percentiles calculated using quantile function on sorted score distributions.
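The percentile calculation can be illustrated with a small quantile function. This sketch assumes linear interpolation between sorted values and ignores survey weights; the app's exact interpolation rule is not stated here, and the function name and scores are hypothetical.

```python
def percentile(sorted_scores, p):
    """Linear-interpolation quantile (0 <= p <= 1) on an already sorted list."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    position = (len(sorted_scores) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_scores) - 1)
    fraction = position - lower
    return sorted_scores[lower] + fraction * (sorted_scores[upper] - sorted_scores[lower])


# Hypothetical scores for one country
scores = sorted([520.0, 300.0, 700.0, 410.0, 600.0])
levels = {"P10": 0.10, "P25": 0.25, "P50": 0.50, "P75": 0.75, "P90": 0.90}
profile = {name: percentile(scores, p) for name, p in levels.items()}

# The P90-P10 difference summarizes the achievement range
p90_p10_range = profile["P90"] - profile["P10"]
print(profile, p90_p10_range)
```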
Lorenz Curve: Achievement Distribution
What this shows: Lorenz curves plot cumulative proportion of achievement against cumulative proportion of students (ranked from lowest to highest).
The diagonal dashed line represents perfect equality (all students with identical scores).
Interpretation: Curves further from the diagonal indicate greater dispersion in achievement. The Gini coefficient equals twice the area between a country's curve and the diagonal.
Calculation: Students sorted by score; cumulative sums computed for population and achievement shares.
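The Lorenz curve construction and its link to the Gini coefficient can be sketched as follows. This is an unweighted illustration under the assumption that the area is approximated with the trapezoid rule; the function names are hypothetical and the app may apply survey weights.

```python
def lorenz_curve(scores):
    """Return (population_shares, achievement_shares), each running from 0 to 1."""
    ordered = sorted(scores)
    total = sum(ordered)
    if not ordered or total <= 0:
        raise ValueError("need at least one score and a positive total")
    n = len(ordered)
    population, achievement = [0.0], [0.0]
    running = 0.0
    for rank, score in enumerate(ordered, start=1):
        running += score
        population.append(rank / n)         # cumulative share of students
        achievement.append(running / total)  # cumulative share of achievement
    return population, achievement


def gini(scores):
    """Gini = 1 - 2 * (area under the Lorenz curve), area by the trapezoid rule."""
    population, achievement = lorenz_curve(scores)
    area = 0.0
    for i in range(1, len(population)):
        width = population[i] - population[i - 1]
        area += width * (achievement[i] + achievement[i - 1]) / 2
    return 1 - 2 * area


print(gini([500.0, 500.0, 500.0, 500.0]))  # identical scores: Gini is (numerically) 0
print(gini([400.0, 500.0, 600.0]))
```

Because test scores are bounded and far from zero, Gini values for achievement are typically small compared with those for income or wealth.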
Decomposition of achievement gaps by socioeconomic status across and within countries (Reardon, 2011; Chmielewski, 2019).
Achievement Gap (Q4-Q1): Difference in mean achievement between students in the top SES quartile (Q4) and bottom SES quartile (Q1). Calculated as: Gap = μ_Q4 - μ_Q1, using survey-weighted means for each quartile.
Variance Decomposition: Partitions total achievement variance into within-country and between-country components using ANOVA: σ²_total = σ²_within + σ²_between. Shows how much variation exists within vs. across nations.
Effect Size (Cohen's d): Standardized gap measure calculated as: d = (μ_Q4 - μ_Q1) / σ_pooled, where σ_pooled is the pooled standard deviation. Effect sizes: small (d ≈ 0.2), medium (d ≈ 0.5), large (d ≈ 0.8 or higher).
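The Q4-Q1 gap and Cohen's d defined above can be sketched as below. The sketch assumes students have already been split into ESCS quartiles, and it pools the two quartile variances in proportion to their total weights; that pooling rule, the function names, and the toy data are assumptions for illustration, not a description of the app's code.

```python
from math import sqrt


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    mean = weighted_mean(values, weights)
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)


def gap_and_cohens_d(q1_scores, q1_weights, q4_scores, q4_weights):
    """Return (Gap, d) with Gap = mu_Q4 - mu_Q1 and d = Gap / sigma_pooled."""
    gap = weighted_mean(q4_scores, q4_weights) - weighted_mean(q1_scores, q1_weights)
    w1, w4 = sum(q1_weights), sum(q4_weights)
    # Assumption: pool the quartile variances in proportion to their total weights
    pooled_var = (w1 * weighted_variance(q1_scores, q1_weights)
                  + w4 * weighted_variance(q4_scores, q4_weights)) / (w1 + w4)
    return gap, gap / sqrt(pooled_var)


# Hypothetical bottom-quartile and top-quartile students
gap, d = gap_and_cohens_d([350.0, 420.0, 490.0], [1.0, 1.0, 1.0],
                          [450.0, 520.0, 590.0], [1.0, 1.0, 1.0])
print(gap, round(d, 2))
```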
Select analysis level to view gap statistics and visualizations at different granularities.
OLS, Fixed Effects, and Random Effects regression models examining the relationship between socioeconomic status and achievement, with survey-weighted estimation following OECD (2023) guidelines.
OLS (Pooled): y_i = β₀ + β₁X_i + ε_i. Pools all data, ignoring country clustering. Provides baseline estimate of SES → achievement relationship.
Fixed Effects (FE): y_it = α_i + β₁X_it + ε_it. Includes country fixed effects (dummy variables) to control for time-invariant country differences. Estimates within-country SES effects.
Random Effects (RE): y_it = (α + u_i) + β₁X_it + ε_it. Models country effects as random draws from a distribution. More efficient than FE if country effects are uncorrelated with X.
Hausman Test: Tests FE vs. RE. Null hypothesis: RE is consistent and efficient. If p < 0.05, prefer FE (country effects correlated with predictors).
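The difference between the pooled OLS and fixed-effects specifications above can be seen in a small example. With a single predictor, including country dummies gives the same slope as demeaning X and y within each country (the within transformation). The sketch below is unweighted, and the function names and data are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def within_slope(x, y, groups):
    """FE slope via the within transformation: demean by country, then OLS."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for xi, yi, g in zip(x, y, groups):
        sums[g][0] += xi
        sums[g][1] += yi
        sums[g][2] += 1
    x_dm = [xi - sums[g][0] / sums[g][2] for xi, g in zip(x, groups)]
    y_dm = [yi - sums[g][1] / sums[g][2] for yi, g in zip(y, groups)]
    return ols_slope(x_dm, y_dm)


# Two hypothetical countries: B has higher ESCS and higher scores overall,
# but the within-country slope is 10 points per ESCS unit in both.
escs = [-1.0, 0.0, 1.0, 1.0, 2.0, 3.0]
score = [400.0, 410.0, 420.0, 500.0, 510.0, 520.0]
country = ["A", "A", "A", "B", "B", "B"]

pooled = ols_slope(escs, score)               # mixes within- and between-country variation
fixed = within_slope(escs, score, country)    # within-country variation only
print(pooled, fixed)
```

Here the pooled slope is much larger than the within-country slope because the higher-ESCS country also scores higher on average.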
Select a specific country to view regression scatter plot and coefficient comparison for that country only, or choose "All Countries Combined" for pooled analysis.
Regression Tables: Show coefficients (β), standard errors (SE), t-statistics, p-values, and 95% confidence intervals for each model. Significance indicators: * p < 0.05, ** p < 0.01, *** p < 0.001. Model fit statistics include R², adjusted R², AIC, and BIC (lower values indicate better fit).
Scatter Plot with Fitted Regression Lines
What this shows: Raw data scatter plot overlaid with fitted regression lines from OLS, FE, and RE models.
Each point represents one student's SES (X-axis) and achievement score (Y-axis).
Interpretation: Line slopes show the ESCS gradient (intergenerational transmission strength). Steeper slopes indicate stronger transmission (higher family SES → higher achievement).
Compare line slopes across models to see how controlling for country effects changes the estimated relationship.
Coefficient Comparison Across Models
What this shows: Bar chart comparing the SES coefficient (β₁) across OLS, FE, and RE models.
Error bars show 95% confidence intervals (±1.96 × SE).
Interpretation: Overlapping confidence intervals suggest the models produce similar estimates, although overlap alone is not a formal test of equality.
Large differences suggest country effects matter. FE typically produces smaller coefficients than OLS if high-SES countries have higher scores.
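The 95% intervals behind the error bars follow directly from β ± 1.96 × SE. The sketch below computes them and applies a rough overlap check; the coefficients, standard errors, and function names are hypothetical.

```python
def confidence_interval(beta, se, z=1.96):
    """Return the (lower, upper) bounds of beta +/- z * se."""
    return beta - z * se, beta + z * se


def intervals_overlap(a, b):
    """True if the two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical OLS and FE estimates of the SES coefficient
ols_ci = confidence_interval(38.0, 2.0)
fe_ci = confidence_interval(31.0, 1.0)
print(ols_ci, fe_ci, intervals_overlap(ols_ci, fe_ci))
```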
Statistical tests and metrics to validate regression assumptions and choose the appropriate model specification. Select a country below to analyze its data independently.
1. Assumption Check Summary
Why it matters: Violations can lead to biased coefficients, incorrect standard errors, and invalid p-values.
Interpretation: Pass = assumption met; Caution = minor concerns, results likely still valid; Concern = significant violation, interpret with care.
2. Model Fit Comparison
Key metrics:
• R²: Proportion of variance explained (0-1). Higher = better fit.
• R² Within: Variance explained within groups (countries). Key for panel models.
• R² Between: Variance explained between group means.
• AIC/BIC: Information criteria for model selection. Lower = better. BIC penalizes complexity more.
• ICC (ρ): Intraclass correlation. Proportion of variance at country level. High ICC suggests panel methods needed.
3. Hausman Specification Test
Null hypothesis (H₀): Country effects are uncorrelated with predictors (RE is consistent).
Alternative (H₁): Country effects correlate with predictors (only FE is consistent).
Interpretation:
• p < 0.05 → Reject H₀ → Use Fixed Effects (systematic differences exist).
• p ≥ 0.05 → Fail to reject → Random Effects is acceptable and more efficient.
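For a model with a single predictor, the Hausman statistic reduces to H = (β_FE − β_RE)² / (SE_FE² − SE_RE²), compared against a chi-square distribution with 1 degree of freedom. The sketch below assumes this single-coefficient form; the function name and inputs are hypothetical, and the app may use the matrix version with more predictors.

```python
from math import erfc, sqrt


def hausman_single_coefficient(beta_fe, se_fe, beta_re, se_re):
    """Return (H, p_value), or None when the variance difference is not positive."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Can occur in finite samples; the statistic is then undefined
        return None
    h = (beta_fe - beta_re) ** 2 / var_diff
    # Upper tail of chi-square(1): P(X > h) = erfc(sqrt(h / 2))
    p_value = erfc(sqrt(h / 2))
    return h, p_value


# Hypothetical FE and RE estimates
h, p = hausman_single_coefficient(beta_fe=30.0, se_fe=1.5, beta_re=33.0, se_re=1.2)
decision = "Fixed Effects" if p < 0.05 else "Random Effects"
print(round(h, 2), decision)
```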
4. Residual Diagnostics
Key metrics:
• Residual Mean: Should be ≈ 0. Non-zero suggests model misspecification.
• Skewness: Measures asymmetry. Should be ≈ 0 (±1 acceptable).
• Excess Kurtosis: Measures tail heaviness. Should be ≈ 0 (±2 acceptable).
• Jarque-Bera Test: Formal normality test. p < 0.05 rejects normality.
• Variance Ratio: Compares residual variance in upper vs lower fitted values. >1.5 suggests heteroscedasticity.
• Outliers (>3σ): Observations with unusually large residuals. Should be <1% of sample.
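Several of the residual metrics listed above can be computed from central moments. The sketch assumes the standard Jarque-Bera form JB = n/6 × (S² + K²/4), with K the excess kurtosis and a chi-square(2) reference distribution; the function name and residuals are hypothetical, and the variance ratio is omitted because its exact split is not specified here.

```python
from math import exp, sqrt


def residual_diagnostics(residuals):
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    sd = sqrt(m2)
    return {
        "mean": mean,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "jarque_bera": jb,
        "jb_p_value": exp(-jb / 2),  # upper tail of chi-square(2)
        "outlier_share": sum(abs(c) > 3 * sd for c in centered) / n,
    }


# Hypothetical residuals; far too few for the asymptotic test to be meaningful
diag = residual_diagnostics([-2.0, -1.0, 0.0, 1.0, 2.0])
print(diag)
```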
5. Influential Observations (Cook's Distance)
What it measures: Combined effect of an observation's leverage (unusual predictor values) and residual size.
Threshold: Cook's D > 4/n indicates potentially influential observations.
Interpretation:
• <5% influential points = typical, no action needed.
• 5-10% = moderate concern, check for data quality issues.
• >10% = investigate whether outliers are driving results.
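Cook's distance for a bivariate regression can be sketched with the closed-form leverage h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)². The example below is unweighted and uses hypothetical function names and data; it flags observations with D > 4/n as described above.

```python
def cooks_distances(x, y):
    """Cook's D for each observation in a simple (intercept + slope) OLS fit."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx        # leverage
        distances.append(e ** 2 / (p * s2) * h / (1 - h) ** 2)
    return distances


def flag_influential(distances):
    threshold = 4 / len(distances)
    return [i for i, d in enumerate(distances) if d > threshold]


# Hypothetical data: the last point has an unusual x value and pulls the fitted line
x = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 30.0]
distances = cooks_distances(x, y)
flagged = flag_influential(distances)
print(flagged, len(flagged) / len(x))
```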
6. Residual vs Fitted Plot (OLS)
What to look for:
• Good: Random scatter around zero with constant spread.
• Fan shape: Variance increases with fitted values → heteroscedasticity.
• Curved pattern: Nonlinear relationship not captured by model.
• Clusters: May indicate omitted categorical variable.
Cross-country and cross-year comparison of intergenerational transmission metrics, achievement gaps, and distributional measures.
Intergenerational Educational Stratification Map
What this shows: Choropleth world map displaying the ESCS gradient (intergenerational transmission strength) for each country.
Colors range from blue (low gradients, higher educational mobility) to red (high gradients, stronger intergenerational transmission).
Interpretation: Darker red countries have stronger intergenerational transmission—children's achievement is more tightly linked to family background.
Blue countries show higher educational mobility. Hover over countries to see exact gradient values, R², and sample size.
Calculation: For each country, run bivariate OLS: achievement = β₀ + β₁(ESCS) + ε. Map displays β₁ coefficient.
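The bivariate regression for one country can be sketched with closed-form weighted least squares. Applying student weights here is an assumption based on the survey-weighted estimation described elsewhere in this tool; the function name and data are hypothetical.

```python
def weighted_ols(escs, scores, weights):
    """Return (beta0, beta1) for achievement = beta0 + beta1 * ESCS, weighted by w."""
    total_w = sum(weights)
    x_bar = sum(w * x for x, w in zip(escs, weights)) / total_w
    y_bar = sum(w * y for y, w in zip(scores, weights)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for x, w in zip(escs, weights))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for x, y, w in zip(escs, scores, weights))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical students lying exactly on the line score = 480 + 35 * ESCS
escs = [-1.0, 0.0, 0.5, 1.5]
scores = [445.0, 480.0, 497.5, 532.5]
weights = [2.0, 1.0, 1.0, 3.0]
beta0, beta1 = weighted_ols(escs, scores, weights)
print(beta0, beta1)  # beta1 is the gradient shown on the map
```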
Temporal Trends: ESCS Gradient Over Time
What this shows: Line chart tracking how ESCS gradients (intergenerational transmission strength) change over PISA cycles for each country.
Each line represents one country's trajectory across available years (2000-2022).
Interpretation: Rising lines indicate increasing intergenerational transmission (family background effects growing stronger).
Falling lines show weakening transmission (improving educational mobility). Flat lines suggest stable patterns.
Countries with at least 2 time points are shown.
Calculation: Separate regression for each country-year combination. Plot β₁ (ESCS coefficient) over time.
Cross-National Achievement Trends
What this shows: Grouped bar chart showing mean achievement scores by country and year.
Bars are grouped by country with different colors representing different PISA cycles.
Interpretation: Compare heights within countries to see temporal changes in achievement.
Compare across countries to identify high and low performers. Look for convergence or divergence patterns.
Calculation: Survey-weighted mean achievement for each country-year combination.
ESCS Achievement Gap Comparison
What this shows: Combined bar/line chart comparing Q4–Q1 ESCS achievement gaps across countries.
Bars show raw score gaps (difference between highest and lowest family background quartiles); the line overlays standardized effect sizes (Cohen's d).
Interpretation: Larger gaps and higher effect sizes indicate stronger intergenerational transmission of educational achievement.
Variance Decomposition: Within vs. Between Countries
What this shows: Bar chart partitioning total achievement variance into within-country and between-country components.
Displays percentages and the intraclass correlation coefficient (ICC = ρ).
Interpretation: High within-country variance suggests more variation in achievement exists within nations than across them.
High between-country variance indicates large cross-national differences in achievement levels.
ICC = σ²_between / (σ²_between + σ²_within) measures the proportion of variance attributable to country differences.
Calculation: One-way ANOVA decomposition: σ²_total = σ²_within + σ²_between. Uses country as grouping variable.
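The decomposition σ²_total = σ²_within + σ²_between and the resulting ICC can be sketched as follows. This unweighted version uses population (divide-by-N) variances so the two components add up exactly; the function name and data are hypothetical.

```python
from collections import defaultdict


def variance_decomposition(scores, countries):
    """Return (within, between, icc) with country as the grouping variable."""
    groups = defaultdict(list)
    for s, c in zip(scores, countries):
        groups[c].append(s)
    n = len(scores)
    grand_mean = sum(scores) / n
    # Between: spread of country means around the grand mean, weighted by group size
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values()) / n
    # Within: spread of students around their own country mean
    within = sum((s - sum(g) / len(g)) ** 2
                 for g in groups.values() for s in g) / n
    icc = between / (between + within)
    return within, between, icc


# Two hypothetical countries with three students each
scores = [350.0, 420.0, 490.0, 400.0, 470.0, 540.0]
countries = ["A", "A", "A", "B", "B", "B"]
within, between, icc = variance_decomposition(scores, countries)
print(round(within, 1), round(between, 1), round(icc, 3))
```

In this toy example most of the variance lies within countries, so the ICC is low (about 0.16).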
Export analysis results, visualizations, and datasets in various formats for publications, presentations, and further analysis.
📊 Summary Statistics
Export descriptive statistics, distributional measures, and achievement gaps in CSV format.
Includes: mean, SD, Gini, gradients, gaps, effect sizes
📈 Regression Models
Export regression coefficients, standard errors, and fit statistics for all models.
Includes: OLS, FE, RE with SE, t-stats, p-values, AIC/BIC
💾 Dataset
Export the currently loaded dataset with all variables included in your analysis.
Filtered by selected countries/years with survey weights
📸 Visualizations
Download all current visualizations as high-resolution PNG images.
Individual PNG files for each visualization
📄 Full Report
Generate a comprehensive HTML report with all analyses, tables, and visualizations.
Publication-ready report with embedded charts
Documentation on the PISA program, ESCS construction, methodology, data pipeline, citation guidelines, and open-source code.
📊 Data Sources
Primary Source:
OECD Programme for International Student Assessment (PISA).
oecd.org/pisa | PISA data portal: oecd.org/pisa/data
R Package:
We use learningtower to access and harmonize OECD PISA microdata and generate the country-year files used by this tool.
Vaughan, E., Fung, K., Cook, D., & Wickham, H. (2021).
📐 Methodology
Comprehensive documentation of statistical methods, including:
- Survey-weighted regression (OLS, FE, RE)
- Variance decomposition analysis
- Achievement gap calculations
- Distributional measures (Gini coefficient)
- Diagnostic procedures and assumptions
📰 Publication Article
Academic article introducing the application, with methodological choices and implementation details.
- Abstract and research context
- Formal model specifications
- Validation and limitations
📝 How to Cite
Citation guidelines for academic publications, presentations, and reports.
Educational Stratification in PISA.
https://kevinschoenholzer.com/edustrat/
💻 Source Code
Full source code is available on GitHub under an open-source license.
- JavaScript ES6 modules
- R data processing scripts
- Plotly.js visualizations
- Survey-weighted statistical functions
🧰 Data Pipeline
R scripts for generating country-year chunks, metadata, and validation reports.
- Chunk generation (JSON)
- Metadata catalog creation
- Quality checks and summaries
📚 Key References
- Bourdieu, P., & Passeron, J. C. (1977). Reproduction in Education, Society and Culture.
- Coleman, J. S., et al. (1966). Equality of Educational Opportunity.
- Reardon, S. F. (2011). The widening academic achievement gap between the rich and the poor. Community Investments, 23(2), 19-39.
- Chmielewski, A. K. (2019). The global increase in the socioeconomic achievement gap, 1964 to 2015. American Sociological Review, 84(3), 517-544.
- OECD (2023). PISA 2022 Technical Report. Paris: OECD Publishing.