Educational Stratification in PISA
Statistical Methods and Technical Documentation
This documentation describes the statistical methods used in the tool to study the intergenerational transmission of educational achievement—specifically, how parental characteristics (education, occupational status, and household wealth, captured through the ESCS index) relate to children's academic performance at age 15 in mathematics, reading, and science across more than 100 countries worldwide.
This application uses data from the Programme for International Student Assessment (PISA), an international survey coordinated by the Organisation for Economic Co-operation and Development (OECD). PISA assesses 15-year-old students' proficiency in reading, mathematics, and science. It has been conducted every three years since 2000, except that the cycle planned for 2021 was postponed to 2022 (OECD, 2019).
PISA employs a two-stage stratified sampling design. In the first stage, schools are sampled with probability proportional to size. In the second stage, students within schools are randomly selected. This design ensures representative samples of the target population in each participating country while maintaining statistical efficiency (OECD, 2017).
Data are accessed via the learningtower R package (Wang et al., 2024), which provides cleaned and harmonized PISA data from 2000-2022 in an easy-to-use format. The package standardizes variable names across assessment cycles and handles missing data codes according to OECD specifications.
The current implementation includes data from eight PISA cycles: 2000, 2003, 2006, 2009, 2012, 2015, 2018, and 2022.
PISA does not administer a single test to all students. Instead, students receive booklets containing different combinations of test items. To account for this design, PISA uses plausible values – a set of imputed values that represent the range of abilities a student might have demonstrated had they taken the full assessment (OECD, 2009).
Achievement scores in each domain are scaled to have a mean of 500 and a standard deviation of 100 across OECD countries in the first assessment cycle in which that domain was the major focus (reading in 2000, mathematics in 2003, science in 2006).
The ESCS index (economic, social and cultural status) is PISA's composite measure of family socioeconomic status and the primary variable for studying intergenerational transmission of educational achievement. It has three components: the highest parental education (PARED), the highest parental occupational status (HISEI), and home possessions (HOMEPOS).
These components are combined using principal components analysis and standardized to have an OECD mean of 0 and standard deviation of 1 (OECD, 2017). When analyzing ESCS gradients or quartile gaps, researchers are directly measuring the strength of intergenerational transmission—how strongly these parental characteristics predict their children's academic performance at age 15.
PISA provides multiple types of weights to account for the complex sampling design:
| Weight Type | Variable Name | Purpose |
|---|---|---|
| Final Student Weight | W_FSTUWT | Represents national student population; accounts for selection probability and nonresponse |
| Senate Weight | W_FSENWT | Equally weights countries; each country contributes equally regardless of population |
| Replicate Weights | W_FSTR1-W_FSTR80 | Used for balanced repeated replication (BRR) standard error estimation |
The default analysis uses final student weights (W_FSTUWT) following OECD technical standards (OECD, 2023).
All descriptive statistics account for sampling weights. The weighted mean is calculated as:
The weighted variance is:
Weighted quantiles are computed by sorting observations by achievement score and finding the value at which the cumulative sum of weights reaches the target percentile times the total weight.
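The weighted mean, variance, and quantile described above can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's own code: the function names are hypothetical, and the variance uses the simple normalization Σwᵢ(xᵢ − x̄)² / Σwᵢ, which is an assumption (bias-corrected normalizations such as the default in R's `cov.wt()` give slightly larger values).

```javascript
// Weighted mean: sum(w_i * x_i) / sum(w_i).
function weightedMean(x, w) {
  let sumW = 0;
  let sumWX = 0;
  for (let i = 0; i < x.length; i++) {
    sumW += w[i];
    sumWX += w[i] * x[i];
  }
  return sumWX / sumW;
}

// Weighted variance, normalized by the total weight (assumed normalization).
function weightedVariance(x, w) {
  const m = weightedMean(x, w);
  let sumW = 0;
  let ss = 0;
  for (let i = 0; i < x.length; i++) {
    sumW += w[i];
    ss += w[i] * (x[i] - m) ** 2;
  }
  return ss / sumW;
}

// Weighted quantile: sort by score, then return the first score at which
// the cumulative weight reaches p times the total weight.
function weightedQuantile(x, w, p) {
  const order = x.map((_, i) => i).sort((a, b) => x[a] - x[b]);
  const total = w.reduce((s, v) => s + v, 0);
  const target = p * total;
  let cumulative = 0;
  for (const i of order) {
    cumulative += w[i];
    if (cumulative >= target) return x[i];
  }
  return x[order[order.length - 1]];
}
```

For example, `weightedQuantile(scores, weights, 0.5)` returns a weighted median. No interpolation between adjacent scores is performed in this sketch.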
These measures summarize the spread of achievement distributions within countries. While not direct measures of intergenerational transmission, they provide context for understanding achievement variation and can be compared across countries and over time.
The Gini coefficient measures dispersion in achievement distribution, ranging from 0 (all students have identical scores) to 1 (maximum dispersion). For a sorted vector of achievement scores:
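One common closed form for scores sorted in ascending order, y₁ ≤ … ≤ yₙ, is G = 2·Σ i·yᵢ / (n·Σ yᵢ) − (n + 1)/n. The sketch below implements that form. It is an assumption about the formula intended here, it ignores sampling weights, and the function name is hypothetical.

```javascript
// Gini coefficient from the rank-weighted sum of sorted scores (unweighted sketch).
function giniCoefficient(scores) {
  const y = [...scores].sort((a, b) => a - b); // ascending copy; input is not mutated
  const n = y.length;
  let total = 0;
  let rankWeighted = 0;
  for (let i = 0; i < n; i++) {
    total += y[i];
    rankWeighted += (i + 1) * y[i]; // ranks run from 1 to n
  }
  return (2 * rankWeighted) / (n * total) - (n + 1) / n;
}
```

With this form the maximum for a finite sample is (n − 1)/n rather than exactly 1, reached when a single observation holds the entire total.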
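The P90/P10 ratio is the 90th percentile of achievement divided by the 10th percentile. A ratio of 1.5, for example, means that a student at the 90th percentile scores 50% higher than a student at the 10th percentile.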
The ESCS gradient (β) is the central statistic for quantifying intergenerational transmission of educational achievement. It measures how many score points of achievement are associated with a one-unit increase in family socioeconomic status—directly capturing how strongly parental education, occupation, and wealth predict children's academic outcomes. It is the slope from a weighted bivariate regression:
The weighted least squares estimator is:
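For a single predictor, the weighted least squares solution reduces to a closed form: the slope is the weighted covariance of ESCS and achievement divided by the weighted variance of ESCS, and the intercept follows from the weighted means. A minimal sketch, with a hypothetical function name and argument layout:

```javascript
// Weighted bivariate regression of achievement on ESCS.
// Returns the intercept and the slope (the ESCS gradient).
function escsGradient(escs, score, w) {
  let sumW = 0;
  let sumWX = 0;
  let sumWY = 0;
  for (let i = 0; i < escs.length; i++) {
    sumW += w[i];
    sumWX += w[i] * escs[i];
    sumWY += w[i] * score[i];
  }
  const xBar = sumWX / sumW;
  const yBar = sumWY / sumW;

  let sxy = 0; // weighted cross-product of deviations
  let sxx = 0; // weighted sum of squared ESCS deviations
  for (let i = 0; i < escs.length; i++) {
    sxy += w[i] * (escs[i] - xBar) * (score[i] - yBar);
    sxx += w[i] * (escs[i] - xBar) ** 2;
  }
  const slope = sxy / sxx;
  return { intercept: yBar - slope * xBar, slope };
}
```

A slope of 40 would mean that a one-unit increase in ESCS is associated with 40 more score points.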
The pooled ordinary least squares model treats all observations as independent:
The fixed effects model includes country-specific intercepts (αj) to control for all time-invariant country characteristics:
Country fixed effects are implemented using dummy variables with one country as the reference category.
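Dummy coding with a reference category can be sketched as below. The choice of the alphabetically first country as the reference, the `{ escs, country }` record shape, and the function name are all assumptions made for illustration.

```javascript
// Build a design matrix with an intercept, ESCS, and one dummy per
// non-reference country. The first country in sorted order is the reference.
function buildFixedEffectsDesign(observations) {
  const countries = [...new Set(observations.map((o) => o.country))].sort();
  const nonReference = countries.slice(1);
  const X = observations.map((o) => [
    1, // intercept (absorbs the reference country's effect)
    o.escs,
    ...nonReference.map((c) => (o.country === c ? 1 : 0)),
  ]);
  return { reference: countries[0], countries, X };
}
```

Each dummy coefficient is then interpreted as the difference in intercept between that country and the reference country.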
The random effects model assumes country intercepts are drawn from a distribution:
Random effects are estimated using quasi-demeaning (Wooldridge, 2010):
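In the standard quasi-demeaning transformation, each variable has a fraction θ of its group mean subtracted, where θ = 1 − √(σ²ε / (σ²ε + n·σ²α)). The sketch below assumes the variance components σ²ε (idiosyncratic) and σ²α (country intercepts) have already been estimated, and treats the number of students in a country as the group size n. Function names are hypothetical.

```javascript
// Quasi-demeaning weight for a group of size n.
// theta = 0 reproduces pooled OLS; theta close to 1 approaches the fixed effects estimator.
function quasiDemeanTheta(sigma2Eps, sigma2Alpha, n) {
  return 1 - Math.sqrt(sigma2Eps / (sigma2Eps + n * sigma2Alpha));
}

// Subtract theta times the group mean from every value in one group.
function quasiDemean(values, theta) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  return values.map((v) => v - theta * mean);
}
```

The same transformation is applied to the outcome, the predictors, and the intercept column before ordinary least squares is run on the transformed data.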
Model comparison uses information-theoretic criteria that balance goodness-of-fit with parsimony:
Interpretation: Lower AIC/BIC values indicate a better trade-off between goodness-of-fit and complexity. BIC penalizes model complexity more heavily than AIC (k·log(n) vs 2k) whenever n > e² ≈ 7.4, which holds for any realistic PISA sample, so BIC tends to favor more parsimonious models while AIC tends to select more complex ones (Burnham & Anderson, 2004).
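For a linear model with Gaussian errors, both criteria can be computed from the residual sum of squares. The sketch below assumes the maximized Gaussian log-likelihood, so that −2·log L = n·(log(2π) + log(RSS/n) + 1), and takes k as the number of estimated parameters. Whether k includes the error variance is a convention that differs between software packages; the function name is hypothetical.

```javascript
// AIC and BIC for a Gaussian linear model, given the residual sum of squares.
function informationCriteria(rss, n, k) {
  const minus2LogLik = n * (Math.log(2 * Math.PI) + Math.log(rss / n) + 1);
  return {
    aic: minus2LogLik + 2 * k,
    bic: minus2LogLik + k * Math.log(n),
  };
}
```

Only differences in AIC or BIC between models fitted to the same observations are meaningful; the absolute values are not.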
Regression assumptions are validated through graphical diagnostics:
Residual vs Fitted Plot: Detects violations of homoscedasticity (constant variance). Under the assumption ε ~ N(0, σ²), residuals should be randomly scattered around zero without systematic patterns. Fanning patterns indicate heteroscedasticity, which may require robust standard errors or variance-stabilizing transformations.
Q-Q Plot (Quantile-Quantile Plot): Tests residual normality by comparing sample quantiles against theoretical normal quantiles. The standardized residuals are computed as:
Points lying close to the 45-degree reference line indicate normally distributed residuals. Systematic deviations (S-curves, fat tails) suggest departures from normality. Theoretical quantiles are computed using the inverse normal CDF approximation (Beasley-Springer-Moro algorithm).
Hausman Specification Test: Tests whether random effects assumptions hold by comparing FE and RE estimates (Hausman, 1978):
If p < 0.05, reject random effects in favor of fixed effects.
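When only the ESCS coefficient is compared, the Hausman statistic reduces to a scalar: H = (β_FE − β_RE)² / (Var(β_FE) − Var(β_RE)), referred to a chi-square distribution with one degree of freedom. A minimal sketch under that single-coefficient assumption, using the 5% critical value of 3.841 instead of an exact p-value (names are hypothetical):

```javascript
// 5% critical value of the chi-square distribution with 1 degree of freedom.
const CHI2_CRITICAL_1DF_05 = 3.841;

// Scalar Hausman test comparing one fixed effects and one random effects coefficient.
function hausmanScalar(betaFE, betaRE, varFE, varRE) {
  const varDiff = varFE - varRE;
  if (varDiff <= 0) {
    // The statistic is undefined when the variance difference is not positive.
    return { statistic: NaN, rejectRandomEffects: null };
  }
  const statistic = (betaFE - betaRE) ** 2 / varDiff;
  return { statistic, rejectRandomEffects: statistic > CHI2_CRITICAL_1DF_05 };
}
```

A non-positive variance difference can occur in finite samples; the sketch reports it explicitly rather than returning a misleading statistic.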
The intraclass correlation coefficient (ICC) partitions total variance into within-country and between-country components:
The ICC represents the proportion of total variance in achievement that is due to differences between countries. An ICC of 0.15 means 15% of variance is between countries and 85% is within countries.
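As a rough illustration, the between-country share can be computed from a sum-of-squares decomposition. This sketch is descriptive and unweighted; it is not a mixed-model variance component estimate, and the input shape (an object mapping country codes to score arrays) and function name are assumptions.

```javascript
// Share of the total sum of squares that lies between countries.
function betweenCountryShare(scoresByCountry) {
  const groups = Object.values(scoresByCountry);
  const all = groups.flat();
  const grandMean = all.reduce((s, v) => s + v, 0) / all.length;

  let between = 0; // sum over countries of n_j * (mean_j - grand mean)^2
  let within = 0; // sum of squared deviations from each country's own mean
  for (const scores of groups) {
    const mean = scores.reduce((s, v) => s + v, 0) / scores.length;
    between += scores.length * (mean - grandMean) ** 2;
    for (const v of scores) within += (v - mean) ** 2;
  }
  return between / (between + within);
}
```

A value of 0 means all countries share the same mean; a value of 1 means students within each country all have identical scores.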
Quartile gaps provide an intuitive measure of intergenerational transmission by comparing achievement between students from different family backgrounds. The Q4–Q1 gap shows the score-point difference between children from the most advantaged (top ESCS quartile) versus least advantaged (bottom ESCS quartile) families:
Effect sizes are interpreted using Cohen's conventions: small (0.2), medium (0.5), large (0.8) (Cohen, 1988).
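The gap and a standardized effect size can be sketched as below, assuming students have already been assigned to ESCS quartiles. The standardization divides the gap by the square root of the average of the two group variances; that pooled-SD choice, the `{ scores, weights }` input shape, and the function names are assumptions.

```javascript
// Weighted mean of scores.
function groupMean(scores, weights) {
  let sumW = 0;
  let sumWX = 0;
  for (let i = 0; i < scores.length; i++) {
    sumW += weights[i];
    sumWX += weights[i] * scores[i];
  }
  return sumWX / sumW;
}

// Weighted variance of scores, normalized by the total weight.
function groupVariance(scores, weights) {
  const m = groupMean(scores, weights);
  let sumW = 0;
  let ss = 0;
  for (let i = 0; i < scores.length; i++) {
    sumW += weights[i];
    ss += weights[i] * (scores[i] - m) ** 2;
  }
  return ss / sumW;
}

// Q4 - Q1 gap in score points and the corresponding Cohen's d.
function quartileGap(q1, q4) {
  const gap = groupMean(q4.scores, q4.weights) - groupMean(q1.scores, q1.weights);
  const pooledSd = Math.sqrt(
    (groupVariance(q1.scores, q1.weights) + groupVariance(q4.scores, q4.weights)) / 2
  );
  return { gap, cohensD: gap / pooledSd };
}
```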
For most analyses, standard errors are computed using classical formulas with sampling weights. For robust inference, the application supports balanced repeated replication (BRR) using the 80 replicate weights provided by PISA:
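PISA's replicate weights are constructed with Fay's modification of BRR using a factor of 0.5, so the sampling variance of a statistic is the sum of squared deviations of the replicate estimates from the full-sample estimate, divided by G·(1 − k)², which equals 20 for G = 80 and k = 0.5. A minimal sketch, assuming the statistic has already been re-estimated once per replicate weight (the function name is hypothetical):

```javascript
// BRR standard error with Fay's adjustment.
// fullEstimate: statistic computed with the final student weight.
// replicateEstimates: the same statistic computed with each replicate weight.
function brrStandardError(fullEstimate, replicateEstimates, fayFactor = 0.5) {
  const G = replicateEstimates.length;
  const sumSq = replicateEstimates.reduce(
    (s, t) => s + (t - fullEstimate) ** 2,
    0
  );
  return Math.sqrt(sumSq / (G * (1 - fayFactor) ** 2));
}
```

The same function works for any statistic (a mean, a regression slope, a quartile gap), because only the estimates themselves enter the formula.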
The current implementation uses listwise deletion for missing values. Students with missing data on any variable in the analysis are excluded. This approach is appropriate when data are missing completely at random (MCAR) but may introduce bias if data are missing at random (MAR) or not at random (MNAR).
OECD recommends multiple imputation for handling missing data in PISA (OECD, 2009), which is not yet implemented in this tool.
All analyses account for sampling weights, which adjust for unequal selection probabilities and nonresponse. However, the current implementation does not account for clustering (students within schools) when computing standard errors, which may underestimate uncertainty. For publication-quality inference, cluster-robust standard errors or BRR should be used.
Achievement scores are measured with error. PISA addresses this through plausible values, but the current implementation uses only the first plausible value. Full uncertainty quantification requires analyzing all plausible values and combining results (Rubin, 1987).
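Combining results across plausible values follows Rubin's rules: the point estimate is the average of the per-value estimates, and the total variance is the average sampling variance plus (1 + 1/M) times the variance of the estimates across the M plausible values. A minimal sketch, assuming the statistic and its sampling variance have already been computed separately for each plausible value (the function name is hypothetical):

```javascript
// Combine M per-plausible-value estimates and their sampling variances.
function combinePlausibleValues(estimates, samplingVariances) {
  const M = estimates.length;
  const pooled = estimates.reduce((s, q) => s + q, 0) / M;
  const withinVar = samplingVariances.reduce((s, u) => s + u, 0) / M;
  // Between-imputation variance reflects measurement uncertainty.
  const betweenVar =
    estimates.reduce((s, q) => s + (q - pooled) ** 2, 0) / (M - 1);
  const totalVar = withinVar + (1 + 1 / M) * betweenVar;
  return { estimate: pooled, standardError: Math.sqrt(totalVar) };
}
```

Using only the first plausible value corresponds to dropping the between-imputation term, which is why the resulting standard errors are too small.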
While PISA attempts to maintain comparability across assessment cycles through linking procedures, changes in test design, sampling, and country participation may affect trend estimates. Caution is advised when interpreting changes over time, particularly for countries that altered their sampling procedures.
PISA results are representative of 15-year-old students enrolled in school in participating countries. Results do not generalize to:
Analyses are implemented in JavaScript using the following libraries:
Regression models use ridge-regularized normal equations to improve numerical stability when computing (X'X)⁻¹:
This prevents singular matrix errors when country fixed effects induce near-collinearity.
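The idea can be sketched as solving (X'X + λI)β = X'y by Gaussian elimination. The penalty value λ = 1e-8 used as the default here is an assumption, as are the function name and the choice of solver; the sketch is unweighted.

```javascript
// Solve the ridge-regularized normal equations (X'X + lambda*I) beta = X'y.
function ridgeSolve(X, y, lambda = 1e-8) {
  const n = X.length;
  const p = X[0].length;

  // Augmented matrix [X'X + lambda*I | X'y].
  const A = Array.from({ length: p }, () => new Array(p + 1).fill(0));
  for (let i = 0; i < n; i++) {
    for (let a = 0; a < p; a++) {
      A[a][p] += X[i][a] * y[i];
      for (let b = 0; b < p; b++) A[a][b] += X[i][a] * X[i][b];
    }
  }
  for (let a = 0; a < p; a++) A[a][a] += lambda;

  // Forward elimination with partial pivoting.
  for (let col = 0; col < p; col++) {
    let pivot = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
    }
    [A[col], A[pivot]] = [A[pivot], A[col]];
    for (let r = col + 1; r < p; r++) {
      const factor = A[r][col] / A[col][col];
      for (let c = col; c <= p; c++) A[r][c] -= factor * A[col][c];
    }
  }

  // Back substitution.
  const beta = new Array(p).fill(0);
  for (let r = p - 1; r >= 0; r--) {
    let s = A[r][p];
    for (let c = r + 1; c < p; c++) s -= A[r][c] * beta[c];
    beta[r] = s / A[r][r];
  }
  return beta;
}
```

With a very small λ the estimates are practically identical to ordinary least squares when X'X is well conditioned, while perfectly collinear columns still yield a finite solution.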
Statistical functions have been validated against R implementations using identical datasets. Key validation tests:
- Weighted statistics: weighted.mean() and cov.wt()
- Regression: lm() with weights
- Gini coefficient: ineq package