Methodology

Educational Stratification in PISA
Statistical Methods and Technical Documentation

This documentation describes the statistical methods used in the tool to study the intergenerational transmission of educational achievement—specifically, how parental characteristics (education, occupational status, and household wealth, captured through the ESCS index) relate to children's academic performance at age 15 in mathematics, reading, and science across more than 100 countries worldwide.

1. Data Sources

1.1 PISA Programme

This application uses data from the Programme for International Student Assessment (PISA), an international survey coordinated by the Organisation for Economic Co-operation and Development (OECD). PISA assesses 15-year-old students' proficiency in reading, mathematics, and science (OECD, 2019). It has run on a three-year cycle since 2000, except that the cycle planned for 2021 was postponed to 2022.

PISA employs a two-stage stratified sampling design. In the first stage, schools are sampled with probability proportional to size. In the second stage, students within schools are randomly selected. This design ensures representative samples of the target population in each participating country while maintaining statistical efficiency (OECD, 2017).

1.2 learningtower R Package

Data are accessed via the learningtower R package (Wang et al., 2024), which provides cleaned and harmonized PISA data from 2000-2022 in an easy-to-use format. The package standardizes variable names across assessment cycles and handles missing data codes according to OECD specifications.

1.3 Assessment Cycles

The current implementation includes data from eight PISA cycles:

PISA 2000
PISA 2003
PISA 2006
PISA 2009
PISA 2012
PISA 2015
PISA 2018
PISA 2022

2. Variables

2.1 Achievement Scores

PISA does not administer a single test to all students. Instead, students receive booklets containing different combinations of test items. To account for this design, PISA uses plausible values – a set of imputed values that represent the range of abilities a student might have demonstrated had they taken the full assessment (OECD, 2009).

Note: The current implementation uses the first plausible value for each domain for simplicity. Full analyses should use all plausible values and combine results using Rubin's rules (Rubin, 1987).
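As a rough illustration of what combining plausible values with Rubin's rules involves, the sketch below pools one estimate per plausible value. The function name `combinePlausibleValues` and the input numbers are hypothetical and are not part of the tool; the sketch assumes each plausible value has already produced a point estimate and a squared standard error.

```javascript
// Combine per-plausible-value results with Rubin's rules.
// estimates[m]         : point estimate obtained from plausible value m
// samplingVariances[m] : squared standard error of that estimate
function combinePlausibleValues(estimates, samplingVariances) {
  const M = estimates.length;
  if (M < 2 || samplingVariances.length !== M) {
    throw new Error("Need at least two estimates and one variance per estimate");
  }
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

  const pooled = mean(estimates);          // pooled point estimate
  const within = mean(samplingVariances);  // average sampling variance
  const between =                          // variance across plausible values
    estimates.reduce((acc, e) => acc + (e - pooled) ** 2, 0) / (M - 1);
  const totalVariance = within + (1 + 1 / M) * between;

  return { estimate: pooled, standardError: Math.sqrt(totalVariance) };
}

// Hypothetical ESCS gradients from five plausible values (illustrative only).
const pooledGradient = combinePlausibleValues(
  [38, 40, 42, 39, 41],
  [4, 4, 4, 4, 4]
);
console.log(pooledGradient);
```

The between-imputation term is what a single plausible value cannot capture, which is why standard errors based on the first plausible value alone tend to be too small.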

Achievement scores are reported on a scale set to an OECD mean of 500 and a standard deviation of 100 in the cycle in which the domain was first the major focus (reading in 2000, mathematics in 2003, science in 2006).

2.2 Economic, Social and Cultural Status (ESCS): Measuring Family Background

The ESCS index is PISA's composite measure of family socioeconomic status—the primary variable for studying intergenerational transmission of educational achievement. It captures how parental characteristics relate to children's outcomes through three components:

Highest parental occupation (HISEI): International Socio-Economic Index of Occupational Status, measuring the occupational prestige and economic rewards associated with parents' jobs. Based on the International Standard Classification of Occupations (ISCO).
Highest parental education (PARED): Years of schooling completed by the most educated parent, coded using ISCED (International Standard Classification of Education) levels.
Home possessions (HOMEPOS): An index of household wealth, cultural possessions (books, art), and home educational resources (desk, computer, internet access). This captures both material resources and cultural capital available to the student.

These components are combined using principal components analysis and standardized to have an OECD mean of 0 and standard deviation of 1 (OECD, 2017). ESCS gradients and quartile gaps therefore describe the strength of intergenerational transmission: how strongly these parental characteristics predict children's academic performance at age 15.

ESCS_i = PCA(HISEI_i, PARED_i, HOMEPOS_i)

where:
HISEI_i = highest parental occupation for student i
PARED_i = highest parental education for student i
HOMEPOS_i = home possessions index for student i

2.3 Sampling Weights

PISA provides multiple types of weights to account for the complex sampling design:

Weight Type	Variable Name	Purpose
Final Student Weight	`W_FSTUWT`	Represents national student population; accounts for selection probability and nonresponse
Senate Weight	`W_FSENWT`	Equally weights countries; each country contributes equally regardless of population
Replicate Weights	`W_FSTR1-W_FSTR80`	Used for balanced repeated replication (BRR) standard error estimation

The default analysis uses final student weights (W_FSTUWT) following OECD technical standards (OECD, 2023).

3. Statistical Methods

3.1 Weighted Descriptive Statistics

All descriptive statistics account for sampling weights. The weighted mean is calculated as:

μ̂_w = (Σ w_i · y_i) / (Σ w_i)

where:
y_i = achievement score for student i
w_i = sampling weight for student i
μ̂_w = weighted mean

The weighted variance is:

σ̂²_w = [Σ w_i · (y_i - μ̂_w)²] / (Σ w_i)

Weighted quantiles are computed by sorting observations by achievement score and finding the value at which the cumulative sum of weights reaches the target percentile times the total weight.
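The three weighted statistics above can be sketched in a few lines. This is a minimal illustration under the formulas as stated, not the tool's actual source; the function names and the toy data are assumptions.

```javascript
// Weighted mean: sum(w_i * y_i) / sum(w_i)
function weightedMean(values, weights) {
  let sumW = 0;
  let sumWY = 0;
  for (let i = 0; i < values.length; i++) {
    sumW += weights[i];
    sumWY += weights[i] * values[i];
  }
  return sumWY / sumW;
}

// Weighted variance: sum(w_i * (y_i - mean)^2) / sum(w_i)
function weightedVariance(values, weights) {
  const mu = weightedMean(values, weights);
  let sumW = 0;
  let sumSq = 0;
  for (let i = 0; i < values.length; i++) {
    sumW += weights[i];
    sumSq += weights[i] * (values[i] - mu) ** 2;
  }
  return sumSq / sumW;
}

// Weighted quantile: first sorted value whose cumulative weight
// reaches p times the total weight.
function weightedQuantile(values, weights, p) {
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const total = weights.reduce((a, b) => a + b, 0);
  let cumulative = 0;
  for (const i of order) {
    cumulative += weights[i];
    if (cumulative >= p * total) return values[i];
  }
  return values[order[order.length - 1]];
}

// Toy data: three students, the middle one representing twice as many peers.
const toyScores = [400, 500, 600];
const toyWeights = [1, 2, 1];
console.log(weightedMean(toyScores, toyWeights));          // 500
console.log(weightedVariance(toyScores, toyWeights));      // 5000
console.log(weightedQuantile(toyScores, toyWeights, 0.9)); // 600
```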

3.2 Distributional Measures

These measures summarize the spread of achievement distributions within countries. While not direct measures of intergenerational transmission, they provide context for understanding achievement variation and can be compared across countries and over time.

Gini Coefficient

The Gini coefficient measures dispersion in achievement distribution, ranging from 0 (all students have identical scores) to 1 (maximum dispersion). For a sorted vector of achievement scores:

G = [2 · Σ(i · y_i)] / (n · Σ y_i) - (n + 1) / n

where:
y_i = achievement scores sorted in ascending order
i = rank of observation (1, 2, ..., n)
n = sample size
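A minimal sketch of the rank-based Gini formula follows. It is unweighted, exactly as the formula is written, and the function name `giniCoefficient` is an assumption rather than the tool's own identifier.

```javascript
// Rank-based Gini coefficient for non-negative scores (unweighted).
function giniCoefficient(scores) {
  const sorted = [...scores].sort((a, b) => a - b); // ascending order
  const n = sorted.length;
  let rankWeighted = 0; // sum of i * y_i with ranks starting at 1
  let total = 0;        // sum of y_i
  sorted.forEach((y, idx) => {
    rankWeighted += (idx + 1) * y;
    total += y;
  });
  return (2 * rankWeighted) / (n * total) - (n + 1) / n;
}

console.log(giniCoefficient([500, 500, 500, 500])); // 0 (identical scores)
console.log(giniCoefficient([0, 0, 0, 100]));       // 0.75
```

With a finite sample the maximum of this estimator is (n - 1) / n rather than exactly 1, which is why the second example returns 0.75 for n = 4.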

Coefficient of Variation

CV = σ / μ where σ is standard deviation and μ is mean

Percentile Ratios

The P90/P10 ratio compares the 90th percentile to the 10th percentile:

P90/P10 = Q_0.90 / Q_0.10

3.3 ESCS Gradient: The Core Measure of Intergenerational Transmission

The ESCS gradient (β) is the central statistic for quantifying intergenerational transmission of educational achievement. It measures how many score points of achievement are associated with a one-unit increase in family socioeconomic status—directly capturing how strongly parental education, occupation, and wealth predict children's academic outcomes. It is the slope from a weighted bivariate regression:

Y_i = α + β · ESCS_i + ε_i

where:
Y_i = achievement score (math, reading, or science)
ESCS_i = family socioeconomic status index
β = ESCS gradient (intergenerational transmission coefficient)
ε_i = residual error

A larger β indicates stronger intergenerational transmission: children's achievement is more strongly predicted by their parents' characteristics.

The weighted least squares estimator is:

β̂ = [Σ w_i · (ESCS_i - ESCS̄_w) · (Y_i - Ȳ_w)] / [Σ w_i · (ESCS_i - ESCS̄_w)²]
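The estimator above translates almost directly into code. The sketch below is illustrative only: `escsGradient` is a hypothetical name, and the data are constructed to lie exactly on the line 500 + 40 · ESCS so that the result is easy to check.

```javascript
// Weighted bivariate regression slope (ESCS gradient) and intercept.
function escsGradient(escs, scores, weights) {
  const sumW = weights.reduce((a, b) => a + b, 0);
  const wMean = (xs) =>
    xs.reduce((acc, x, i) => acc + weights[i] * x, 0) / sumW;

  const meanEscs = wMean(escs);
  const meanScore = wMean(scores);

  let covariance = 0; // sum of w_i * (ESCS_i - mean) * (Y_i - mean)
  let escsSpread = 0; // sum of w_i * (ESCS_i - mean)^2
  for (let i = 0; i < escs.length; i++) {
    const dx = escs[i] - meanEscs;
    covariance += weights[i] * dx * (scores[i] - meanScore);
    escsSpread += weights[i] * dx * dx;
  }
  const slope = covariance / escsSpread;
  return { slope, intercept: meanScore - slope * meanEscs };
}

// Toy data on the exact line Y = 500 + 40 * ESCS.
const gradientFit = escsGradient(
  [-1, 0, 1, 2],
  [460, 500, 540, 580],
  [1, 2, 1, 3]
);
console.log(gradientFit); // slope close to 40, intercept close to 500
```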

3.4 Regression Models

Pooled OLS

The pooled ordinary least squares model treats all observations as independent:

Y_ij = β₀ + β₁·ESCS_ij + β₂·X_ij + ε_ij

where:
Y_ij = achievement for student i in country j
ESCS_ij = socioeconomic status
X_ij = vector of control variables (gender, parental education, etc.)
ε_ij ~ N(0, σ²)

Fixed Effects Model

The fixed effects model includes country-specific intercepts (α_j) to control for all time-invariant country characteristics:

Y_ijt = α_j + γ_t + β₁·ESCS_ijt + β₂·X_ijt + ε_ijt

where:
α_j = country fixed effects (j = 1, ..., J countries)
γ_t = year fixed effects (t = 1, ..., T years)

Country fixed effects are implemented using dummy variables with one country as the reference category.

Random Effects Model

The random effects model assumes country intercepts are drawn from a distribution:

Y_ij = β₀ + u_j + β₁·ESCS_ij + β₂·X_ij + ε_ij

where:
u_j ~ N(0, τ²) = random country effect
ε_ij ~ N(0, σ²) = individual error

Total variance: Var(Y) = τ² + σ²

Random effects are estimated using quasi-demeaning (Wooldridge, 2010):

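Y_ij* = Y_ij - θ · Ȳ_j

where:
θ = 1 - √[σ² / (σ² + n_j·τ²)]
Ȳ_j = mean achievement in country j
n_j = number of students in country j

The same transformation is applied to each regressor before the model is estimated on the transformed data.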
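To make the role of θ concrete, the sketch below computes it for given variance components and applies the transformation to one country's scores. The names `quasiDemeanTheta` and `quasiDemean` and the variance values are hypothetical, chosen for illustration.

```javascript
// Quasi-demeaning weight: theta = 1 - sqrt(sigma2 / (sigma2 + n_j * tau2))
function quasiDemeanTheta(sigma2, tau2, nj) {
  return 1 - Math.sqrt(sigma2 / (sigma2 + nj * tau2));
}

// Subtract theta times the country mean from each observation.
function quasiDemean(values, theta) {
  const groupMean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.map((v) => v - theta * groupMean);
}

// No between-country variance: theta = 0, which reduces to pooled OLS.
console.log(quasiDemeanTheta(8100, 0, 100)); // 0

// Illustrative variance components: sigma2 = 8100, tau2 = 900, n_j = 9.
const theta = quasiDemeanTheta(8100, 900, 9);
console.log(theta); // 1 - sqrt(0.5), roughly 0.293

// As n_j grows, theta approaches 1, the fixed-effects (full demeaning) case.
console.log(quasiDemeanTheta(8100, 900, 5000));

console.log(quasiDemean([480, 500, 520], theta));
```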

3.5 Model Selection and Diagnostics

Information Criteria

Model comparison uses information-theoretic criteria that balance goodness-of-fit with parsimony:

Akaike Information Criterion (AIC): AIC = n · log(RSS/n) + 2k
Bayesian Information Criterion (BIC): BIC = n · log(RSS/n) + k · log(n)

where:
n = number of observations
k = number of parameters (including intercept)
RSS = residual sum of squares

Interpretation: Lower AIC/BIC values indicate better model fit. BIC penalizes model complexity more heavily than AIC (k·log(n) vs 2k), making it more conservative for large samples. AIC tends to select more complex models, while BIC favors parsimony (Burnham & Anderson, 2004).
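The following sketch applies both criteria to two hypothetical models to show how they can disagree. The function name `informationCriteria` and the RSS values are made up for illustration.

```javascript
// AIC and BIC from the residual sum of squares (natural logarithm).
function informationCriteria(rss, n, k) {
  const fitTerm = n * Math.log(rss / n);
  return {
    aic: fitTerm + 2 * k,
    bic: fitTerm + k * Math.log(n),
  };
}

// Hypothetical comparison with n = 1000 observations:
// the larger model lowers RSS by 1% at the cost of three extra parameters.
const smallModel = informationCriteria(8000000, 1000, 2);
const largeModel = informationCriteria(7920000, 1000, 5);

console.log(smallModel, largeModel);
console.log("AIC prefers:", largeModel.aic < smallModel.aic ? "large" : "small");
console.log("BIC prefers:", largeModel.bic < smallModel.bic ? "large" : "small");
```

Here the improvement in fit (about 10 units of n · log(RSS/n)) exceeds the AIC penalty for three extra parameters (6) but not the BIC penalty (3 · log(1000), about 20.7), so AIC selects the larger model and BIC the smaller one.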

Residual Diagnostics

Regression assumptions are validated through graphical diagnostics:

Residual vs Fitted Plot: Detects violations of homoscedasticity (constant variance). Under the assumption ε ~ N(0, σ²), residuals should be randomly scattered around zero without systematic patterns. Fanning patterns indicate heteroscedasticity, which may require robust standard errors or variance-stabilizing transformations.

Q-Q Plot (Quantile-Quantile Plot): Tests residual normality by comparing sample quantiles against theoretical normal quantiles. The standardized residuals are computed as:

z_i = (e_i - ē) / SD(e)

where:
e_i = residual for observation i
ē = mean residual (should be ≈ 0)
SD(e) = standard deviation of residuals

Points lying close to the 45-degree reference line indicate normally distributed residuals. Systematic deviations (S-curves, fat tails) suggest departures from normality. Theoretical quantiles are computed using the inverse normal CDF approximation (Beasley-Springer-Moro algorithm).

Hausman Specification Test: Tests whether random effects assumptions hold by comparing FE and RE estimates (Hausman, 1978):

H = (β̂_FE - β̂_RE)' · [Var(β̂_FE) - Var(β̂_RE)]⁻¹ · (β̂_FE - β̂_RE) ~ χ²_k

where k = number of coefficients being compared

H₀: E(u_j | X) = 0 (RE is consistent)
H₁: E(u_j | X) ≠ 0 (only FE is consistent)

If p < 0.05, reject random effects in favor of fixed effects.
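For a single coefficient such as the ESCS gradient, the Hausman statistic reduces to a scalar that can be checked by hand. The sketch below covers only that one-coefficient case (k = 1) and compares against the 5% critical value of χ² with one degree of freedom (3.841) instead of computing a p-value; the function name and input numbers are hypothetical.

```javascript
// Scalar Hausman statistic for one coefficient (k = 1).
function hausmanScalar(betaFE, betaRE, varFE, varRE) {
  const varianceDiff = varFE - varRE;
  if (varianceDiff <= 0) {
    // In finite samples the difference can be non-positive;
    // the statistic is then not defined in this simple form.
    throw new Error("Var(FE) - Var(RE) must be positive");
  }
  const statistic = (betaFE - betaRE) ** 2 / varianceDiff;
  const CHI2_CRITICAL_DF1 = 3.841; // 5% critical value, chi-square with 1 df
  return { statistic, rejectRandomEffects: statistic > CHI2_CRITICAL_DF1 };
}

// Illustrative inputs: FE slope 38 (variance 1.0), RE slope 40 (variance 0.5).
const hausmanResult = hausmanScalar(38, 40, 1.0, 0.5);
console.log(hausmanResult); // statistic 8, so random effects is rejected
```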

3.6 Variance Decomposition

The intraclass correlation coefficient (ICC) partitions total variance into within-country and between-country components:

ICC (ρ) = τ² / (τ² + σ²)

where:
τ² = between-country variance
σ² = within-country variance

The ICC represents the proportion of total variance in achievement that is due to differences between countries. An ICC of 0.15 means 15% of variance is between countries and 85% is within countries.
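A short worked example of the ICC, using hypothetical variance components chosen to reproduce the 15% case mentioned above (the function name is an assumption):

```javascript
// Intraclass correlation: share of total variance lying between countries.
function intraclassCorrelation(betweenVariance, withinVariance) {
  return betweenVariance / (betweenVariance + withinVariance);
}

// Illustrative components: tau2 = 1500 (between), sigma2 = 8500 (within).
const icc = intraclassCorrelation(1500, 8500);
console.log(icc);     // 0.15
console.log(1 - icc); // 0.85, the within-country share
```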

3.7 ESCS Quartile Gaps: Family Background Differences in Achievement

Quartile gaps provide an intuitive measure of intergenerational transmission by comparing achievement between students from different family backgrounds. The Q4–Q1 gap shows the score-point difference between children from the most advantaged (top ESCS quartile) versus least advantaged (bottom ESCS quartile) families:

Gap (Q4-Q1) = Ȳ_Q4 - Ȳ_Q1
Effect Size (Cohen's d) = (Ȳ_Q4 - Ȳ_Q1) / SD_pooled

where:
Ȳ_Q4 = weighted mean for top ESCS quartile
Ȳ_Q1 = weighted mean for bottom ESCS quartile
SD_pooled = √[(SD_Q4² + SD_Q1²) / 2]

Effect sizes are interpreted using Cohen's conventions: small (0.2), medium (0.5), large (0.8) (Cohen, 1988).
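The gap and its effect size can be computed from four summary statistics. The sketch below assumes the quartile means and standard deviations have already been obtained with sampling weights; the function name and the numbers are hypothetical.

```javascript
// Q4-Q1 achievement gap and Cohen's d from quartile summaries.
function quartileGap(meanQ4, meanQ1, sdQ4, sdQ1) {
  const gap = meanQ4 - meanQ1;
  const sdPooled = Math.sqrt((sdQ4 ** 2 + sdQ1 ** 2) / 2);
  return { gap, sdPooled, cohensD: gap / sdPooled };
}

// Illustrative values: top quartile mean 550, bottom quartile mean 460,
// both with a standard deviation of 90 score points.
const gapExample = quartileGap(550, 460, 90, 90);
console.log(gapExample); // gap of 90 points, Cohen's d of 1
```

A d of 1 would count as large under Cohen's conventions, since it exceeds the 0.8 benchmark.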

3.8 Standard Errors

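For most analyses, standard errors are computed using classical formulas with sampling weights. For more robust inference, the application supports balanced repeated replication (BRR) using the 80 replicate weights provided by PISA. PISA constructs these weights with Fay's modification of BRR (Fay factor k = 0.5; OECD, 2009), so the sum of squared deviations is divided by R · (1 - k)² rather than by R:

SE(θ̂) = √[Σ_{r=1}^{R} (θ̂_r - θ̂)² / (R · (1 - k)²)]

where:
θ̂ = estimate using final weights
θ̂_r = estimate using replicate weight r
R = 80 replicate weights
k = 0.5 (Fay factor), so R · (1 - k)² = 20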
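A replication standard error can be sketched as follows. The Fay factor is passed explicitly: a value of 0 gives plain BRR (division by R), while 0.5 gives the Fay variant that the OECD documents for PISA replicate weights. The function name and the four toy replicate estimates are assumptions; a real PISA analysis would use 80 replicates.

```javascript
// Replication standard error with an explicit Fay factor.
// fayFactor = 0   -> plain BRR, divides by R
// fayFactor = 0.5 -> Fay's method, divides by R * (1 - 0.5)^2
function replicateStandardError(fullEstimate, replicateEstimates, fayFactor) {
  const R = replicateEstimates.length;
  const sumSq = replicateEstimates.reduce(
    (acc, est) => acc + (est - fullEstimate) ** 2,
    0
  );
  return Math.sqrt(sumSq / (R * (1 - fayFactor) ** 2));
}

// Toy example with four replicate estimates around a full-sample value of 40.
const replicateEstimates = [39, 41, 39, 41];
const sePlainBrr = replicateStandardError(40, replicateEstimates, 0);
const seFay = replicateStandardError(40, replicateEstimates, 0.5);
console.log(sePlainBrr); // 1
console.log(seFay);      // 2
```

With k = 0.5 the standard error is exactly twice the plain BRR value, so using the wrong divisor halves the reported uncertainty.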

4. Assumptions and Limitations

4.1 Causal Inference

Important: PISA is a cross-sectional observational study. All analyses report associations, not causal effects. The ESCS gradient measures how strongly parental characteristics (education, occupation, wealth) predict children's achievement at age 15, but does not prove that family background causes differences in achievement. Unmeasured confounders (e.g., genetic factors, peer effects, school quality, neighborhood characteristics) may explain observed associations. ESCS gradients should be interpreted as descriptive measures of intergenerational transmission patterns, not as causal estimates of the effect of changing family socioeconomic status.

4.2 Missing Data

The current implementation uses listwise deletion for missing values. Students with missing data on any variable in the analysis are excluded. This approach is appropriate when data are missing completely at random (MCAR) but may introduce bias if data are missing at random (MAR) or not at random (MNAR).

OECD recommends multiple imputation for handling missing data in PISA (OECD, 2009), which is not yet implemented in this tool.

4.3 Sampling Design

All analyses account for sampling weights, which adjust for unequal selection probabilities and nonresponse. However, the classical standard errors in the current implementation do not account for clustering (students sampled within schools), so they are likely to understate uncertainty. For publication-quality inference, cluster-robust standard errors or BRR should be used.

4.4 Measurement Error

Achievement scores are measured with error. PISA addresses this through plausible values, but the current implementation uses only the first plausible value. Full uncertainty quantification requires analyzing all plausible values and combining results (Rubin, 1987).

4.5 Comparability Across Cycles

While PISA attempts to maintain comparability across assessment cycles through linking procedures, changes in test design, sampling, and country participation may affect trend estimates. Caution is advised when interpreting changes over time, particularly for countries that altered their sampling procedures.

4.6 Generalizability

PISA results are representative of 15-year-old students enrolled in school in participating countries. Results do not generalize to:

Out-of-school youth (particularly relevant in countries with low enrollment rates)
Students younger or older than 15 years
Adult populations
Non-participating countries

5. Software Implementation

5.1 Statistical Libraries

Analyses are implemented in JavaScript using the following libraries:

jStat: Statistical distributions (t-distribution, χ² distribution) for p-values
simple-statistics: Basic statistical functions (mean, variance, quantiles)
Plotly.js: Interactive visualizations

5.2 Numerical Stability

Regression models use ridge-regularized normal equations to improve numerical stability when computing (X'X)⁻¹:

β̂_ridge = (X'WX + λI)⁻¹X'WY

where:
λ = 10⁻¹⁰ · trace(X'WX)

This prevents singular matrix errors when country fixed effects induce near-collinearity.
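The effect of such a small ridge term can be seen in a two-parameter case (intercept plus one predictor), where the regularized normal equations can be solved in closed form. This is a simplified sketch under that assumption, not the tool's general matrix solver; `ridgeWeightedFit` is a hypothetical name.

```javascript
// Ridge-regularized weighted least squares for y = b0 + b1 * x.
// Builds the 2x2 matrix X'WX, adds lambda = ridgeScale * trace(X'WX)
// to its diagonal, and solves the system with Cramer's rule.
function ridgeWeightedFit(x, y, w, ridgeScale = 1e-10) {
  let sw = 0, swx = 0, swxx = 0, swy = 0, swxy = 0;
  for (let i = 0; i < x.length; i++) {
    sw += w[i];
    swx += w[i] * x[i];
    swxx += w[i] * x[i] * x[i];
    swy += w[i] * y[i];
    swxy += w[i] * x[i] * y[i];
  }
  const lambda = ridgeScale * (sw + swxx); // trace of X'WX is sw + swxx
  const a11 = sw + lambda;
  const a12 = swx;
  const a22 = swxx + lambda;
  const det = a11 * a22 - a12 * a12;
  return {
    lambda,
    intercept: (a22 * swy - a12 * swxy) / det,
    slope: (a11 * swxy - a12 * swy) / det,
  };
}

// Toy data on the exact line Y = 500 + 40 * x.
const ridgeFit = ridgeWeightedFit(
  [-1, 0, 1, 2],
  [460, 500, 540, 580],
  [1, 2, 1, 3]
);
console.log(ridgeFit); // slope and intercept differ from 40 and 500 only negligibly
```

Because λ is scaled to 10⁻¹⁰ of the trace, the coefficients on well-conditioned data are essentially unchanged, while the diagonal shift keeps the matrix invertible when columns are nearly collinear.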

5.3 Validation

Statistical functions have been validated against R implementations using identical datasets. Key validation tests:

Weighted mean and variance match R's weighted.mean() and cov.wt()
Regression coefficients match R's lm() with weights
Gini coefficients match R's ineq package

6. References

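Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.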

Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46(6), 1251–1271.

Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161. https://doi.org/10.1111/j.1745-3984.1992.tb00371.x

OECD (2009). PISA Data Analysis Manual: SPSS, Second Edition. OECD Publishing, Paris. https://doi.org/10.1787/9789264056275-en

OECD (2017). PISA 2015 Technical Report. OECD Publishing, Paris. https://www.oecd.org/pisa/data/2015-technical-report/

OECD (2019). PISA 2018 Assessment and Analytical Framework. OECD Publishing, Paris. https://doi.org/10.1787/b25efab8-en

OECD (2023). PISA 2022 Technical Report. OECD Publishing, Paris. https://www.oecd.org/pisa/data/pisa2022technicalreport/

Reardon, S. F. (2011). The widening academic achievement gap between the rich and the poor: New evidence and possible explanations. In R. Murnane & G. Duncan (Eds.), Whither Opportunity? Rising Inequality, Schools, and Children's Life Chances (pp. 91–116). Russell Sage Foundation.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.

Wang, K., Yacobellis, P., Siregar, E., Romanes, S., Fitter, K., Dalla Riva, G. V., Cook, D., Tierney, N., Dingorkar, P., Sai Subramanian, S., & Chen, G. (2024). learningtower: OECD PISA datasets from 2000–2022 in an easy-to-use format (R package, Version 1.1.0). https://doi.org/10.32614/CRAN.package.learningtower

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

Wu, M. (2005). The role of plausible values in large-scale surveys. Studies in Educational Evaluation, 31(2–3), 114–128. https://doi.org/10.1016/j.stueduc.2005.05.005

Wuyts, C. (2024). The measurement of socio-economic status in PISA. OECD.
