Research question
The panel
The union and marriage “premiums” look large in Pooled OLS, but shrink sharply once Fixed Effects removes stable person traits — evidence of selection: higher-wage men sort into unions and marriage. Meanwhile the return to schooling cannot be estimated by FE at all, because education is fixed within a person over these years. Open the Multiverse tab to see the full distribution.
Descriptive statistics
Who identifies the effect? (switchers)
Wage trajectories (real people)
Distribution of log wage
Mean outcome by focal status & year
Where is the variation?
Reading this tab
Trajectories: each faint line is one man's wage path over up to 8 years. Panel data lets us watch the same person change — the engine of Fixed Effects.
Where is the variation? For a regressor to be identified by Fixed Effects, it must vary within people over time. A bar that is almost all “between” (like schooling) cannot be estimated by FE.
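The within/between split behind these bars can be sketched in a few lines. This is a minimal illustration, assuming a sum-of-squares decomposition around person means; the function name `variance_shares` and the six-row toy data are hypothetical and not taken from the lab's own code.

```python
from collections import defaultdict

def variance_shares(person_ids, values):
    """Split the total sum of squares of `values` into between- and within-person shares."""
    groups = defaultdict(list)
    for pid, v in zip(person_ids, values):
        groups[pid].append(v)
    grand_mean = sum(values) / len(values)
    # Between: each person's mean vs. the grand mean, weighted by that person's row count
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within: each observation vs. its own person's mean
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    total = between + within
    if total == 0:
        return {"between": 0.0, "within": 0.0}
    return {"between": between / total, "within": within / total}

# Toy data (made up): two people, three years each
ids = [1, 1, 1, 2, 2, 2]
educ = [12, 12, 12, 16, 16, 16]   # never changes within a person
union = [0, 1, 1, 0, 0, 1]        # both people switch status

educ_shares = variance_shares(ids, educ)
union_shares = variance_shares(ids, union)
print(educ_shares, union_shares)
```

In the toy data the schooling variable has a within share of zero, which is exactly the case Fixed Effects cannot handle, while the union indicator is mostly within variation.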
Which choices vary across the universe?
More forks
Specification curve
Each dot is one full analysis, sorted from smallest to largest estimate. The lower grid marks which choices that specification used. A wide, choice-driven curve means the result is fragile; a flat curve means it is robust.
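As a rough sketch of how such a curve is assembled, the snippet below sorts a handful of estimates and summarises their spread. It uses the five benchmark union-premium coefficients reported in this lab's verification table as a stand-in for the full multiverse; the dictionary layout and variable names are assumptions made for illustration.

```python
from statistics import median

# Benchmark union-premium coefficients by estimator (from the lab's verification table).
# A real curve would hold one entry per specification, with every analytic choice recorded.
specs = [
    {"estimator": "Pooled", "estimate": 0.167},
    {"estimator": "Between", "estimate": 0.257},
    {"estimator": "RE", "estimate": 0.102},
    {"estimator": "FE", "estimate": 0.083},
    {"estimator": "FD", "estimate": 0.043},
]

# The curve itself: specifications ordered from smallest to largest estimate
curve = sorted(specs, key=lambda s: s["estimate"])

estimates = [s["estimate"] for s in curve]
summary = {
    "median": median(estimates),
    "min": estimates[0],
    "max": estimates[-1],
    "spread": estimates[-1] - estimates[0],
}

for rank, s in enumerate(curve, start=1):
    print(rank, s["estimator"], s["estimate"])
print(summary)
```

Here the ordering is driven entirely by the estimator choice, which is the kind of choice-driven pattern the lower grid is meant to expose.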
Estimator
All five estimators, side by side
Hausman test: FE vs RE
⚙️ Under the hood — the equation being estimated
Regression output
Residual diagnostics
Do the model's residuals behave? Look for no pattern or funnel shape (left), an approximately symmetric bell (centre), and points on the line (right).
Residuals vs fitted. A flat, patternless band supports linearity and constant variance; curvature or a funnel signals misspecification or heteroskedasticity.
Residual distribution. Should be roughly symmetric and bell-shaped; the dashed curve is a fitted normal density for comparison.
Normal Q–Q. Points on the reference line indicate normally distributed residuals; departures in the tails indicate skew or heavy tails.
Reproduce this specification in code
The point-and-click specification, translated into runnable code for the field's standard tools — the bridge from exploration to a reproducible script.
R · plm
Stata
Python · linearmodels
Download
Multiverse results
Every specification in the most recent build, with estimates, standard errors and 95% intervals.
Figures
The specification curve (build it first on the Specification Curve tab).
Lab report
A self-contained HTML report: data summary, current specification, regression table, multiverse summary and figure.
The data: the NLSY wage panel
545 young men from the U.S. National Longitudinal Survey of Youth, each observed every year from 1980 to 1987 — a balanced panel of 4,360 person-years. The extract comes from Vella & Verbeek (1998) and is distributed in the wooldridge and plm R packages; this lab bundles a tidy copy.
| Variable | Meaning | Varies within person? |
|---|---|---|
| lwage | log hourly wage (the outcome) | yes |
| union | covered by a union contract | yes (men join/leave) |
| married | currently married | yes |
| educ | years of schooling | no — fixed 1980–87 |
| exper, expersq | labour-market experience & its square | yes |
| hours, poorhlth, industry, region | time-varying controls | yes |
| black, hisp | race / ethnicity | no |
The estimators
Each estimator sits at a different point on an identification spectrum: the more variation it discards to remove confounding, the fewer coefficients it can estimate.
| Estimator | What it uses | Identifying assumption | Removes time-invariant confounding? |
|---|---|---|---|
| Pooled OLS | all variation (within + between) | E[uᵢ \| xᵢₜ] = 0 | No |
| Between | person means only (cross-person) | E[uᵢ \| x̄ᵢ] = 0 | No |
| Random Effects | GLS weighting of within + between | E[uᵢ \| xᵢₜ] = 0 (RE exogeneity) | No |
| Fixed Effects (within) | within-person deviations | E[εᵢₜ \| xᵢₜ, uᵢ] = 0 | Yes |
| First Differences | year-to-year changes | E[Δεᵢₜ \| Δxᵢₜ] = 0 | Yes |
| Correlated RE (Mundlak) | RE + person-means of regressors | person-mean captures the correlation | Yes (focal coef = FE) |
| Two-way FE | within-person + year effects | E[εᵢₜ \| xᵢₜ, uᵢ, λₜ] = 0 | Yes (+ common shocks) |
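To make the contrast between the first and fourth rows concrete, here is a minimal sketch on made-up, noise-free data in which the high-wage person is also the one who is mostly unionised. The helper names `ols_slope` and `demean_by_person`, the two-person panel and the true effect of 0.1 are all assumptions chosen for illustration, not the lab's implementation.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_person(ids, values):
    """Subtract each person's own mean: the 'within' transformation."""
    groups = defaultdict(list)
    for pid, v in zip(ids, values):
        groups[pid].append(v)
    means = {pid: sum(g) / len(g) for pid, g in groups.items()}
    return [v - means[pid] for pid, v in zip(ids, values)]

# Toy panel: person A has a high stable wage level (u = 1.0) and is usually in a union;
# person B has u = 0.0 and is usually not. The true union effect is 0.1.
ids = ["A"] * 4 + ["B"] * 4
union = [1, 1, 0, 1, 0, 0, 1, 0]
person_effect = {"A": 1.0, "B": 0.0}
lwage = [person_effect[p] + 0.1 * x for p, x in zip(ids, union)]

pooled = ols_slope(union, lwage)
fixed_effects = ols_slope(demean_by_person(ids, union), demean_by_person(ids, lwage))
print(pooled, fixed_effects)
```

Pooled OLS mixes the stable wage gap between the two people into the union coefficient, while the within transformation removes it and recovers the effect built into the toy data.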
Why a multiverse?
A single regression hides a decision tree: which estimator, which controls, which sample, which standard errors. Each branch is defensible, yet they can give different answers. A specification curve (Simonsohn, Simmons & Nelson, 2020) or multiverse analysis (Steegen et al., 2016) reports the whole distribution of estimates instead of one cherry-picked number, making analytic flexibility — and its limits — visible.
Inference: from one CI to the whole curve
The lab offers three covariance estimators (classical i.i.d., heteroskedasticity-robust HC1, and cluster-robust by person, CR1), a nonparametric cluster bootstrap that block-resamples persons and therefore imposes no model on within-person error correlation, and a full Hausman test comparing Fixed and Random Effects jointly over their common coefficients. For the multiverse as a whole, a randomization test permutes the focal variable under the sharp null of no effect, recomputes the entire curve many times, and asks how often the null reproduces a median estimate (or count of significant results) as extreme as the one observed.
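The block-resampling idea behind the cluster bootstrap can be sketched as follows. This is an illustrative version under stated assumptions: a single regressor, a simulated panel of 30 people with a built-in effect of 0.1, a percentile interval, and hypothetical function names; the lab's own bootstrap may differ in all of these details.

```python
import random
from statistics import mean

def fe_slope(panel):
    """Within estimator for one regressor; `panel` is a list of persons, each a list of (x, y) rows."""
    sxy = sxx = 0.0
    for rows in panel:
        mx = mean(x for x, _ in rows)
        my = mean(y for _, y in rows)
        sxy += sum((x - mx) * (y - my) for x, y in rows)
        sxx += sum((x - mx) ** 2 for x, _ in rows)
    return sxy / sxx

def cluster_bootstrap(panel, reps=200, seed=0):
    """Percentile 95% interval from resampling whole persons with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Each draw keeps a person's full history together, preserving serial correlation
        resample = [rng.choice(panel) for _ in panel]
        draws.append(fe_slope(resample))
    draws.sort()
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]

# Simulated panel (made up): 30 people, 8 years, true effect 0.1
sim = random.Random(1)
panel = []
for i in range(30):
    u = sim.gauss(0, 1)  # stable person effect
    rows = []
    for t in range(8):
        x = (i + t) % 2  # every person switches status, so x varies within person
        rows.append((x, u + 0.1 * x + sim.gauss(0, 0.2)))
    panel.append(rows)

point = fe_slope(panel)
low, high = cluster_bootstrap(panel)
print(point, (low, high))
```

Resampling persons rather than person-years is the design choice that matters: it treats each man's eight observations as one dependent block.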
Verification — honest numerics
Every estimator is implemented from textbook formulas in plain JavaScript and independently reproduced in two languages: R (plm) and Python (linearmodels). Across all 1,440 specifications of the default multiverse, the pooled, between, fixed-effects and first-difference coefficients agree to better than 1×10⁻⁶ (R) and 5×10⁻⁷ (Python); random effects to 3×10⁻⁴. Benchmark focal coefficients (log wage on focal + experience, full sample):
| Effect | Pooled | Between | RE | FE | FD |
|---|---|---|---|---|---|
| Union premium | 0.167 | 0.257 | 0.102 | 0.083 | 0.043 |
| Marriage premium | 0.164 | 0.220 | 0.077 | 0.047 | 0.038 |
| Return to schooling | 0.102 | 0.099 | 0.103 | — | — |
The Hausman test in the Deep-Dive is the full joint test over all common time-varying coefficients (matching R's phtest), computed with classical covariance as the theory requires.
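For intuition only, the one-coefficient special case of the statistic can be written out directly. The lab's test is the joint version over all common coefficients; the scalar form below, the function name `hausman_scalar` and the input numbers are assumptions used purely for illustration.

```python
import math

def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """One-coefficient Hausman statistic and its chi-square(1) p-value.

    Uses classical standard errors, under which Var(FE) - Var(RE) should be positive.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    # For one degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical inputs, not results from the lab
h, p = hausman_scalar(b_fe=0.08, b_re=0.10, se_fe=0.020, se_re=0.018)
print(h, p)
```

A large statistic means the two estimators disagree by more than sampling noise allows, which points to a failure of the Random Effects exogeneity assumption.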
Glossary
- Within variation: How much a variable changes for the same person over time. Fixed Effects and First Differences use only this.
- Between variation: Differences in person averages across people. The Between estimator uses only this; Fixed Effects discards it.
- Time-invariant regressor: A variable that never changes within a person (e.g. years of schooling here). Within estimators cannot identify its effect.
- Selection on unobservables: When who "gets treated" (e.g. joins a union) is correlated with stable unobserved traits, biasing estimators that use between variation.
- θ (theta): The random-effects quasi-demeaning weight: θ=0 gives Pooled OLS, θ→1 approaches Fixed Effects.
- Hausman test: Compares Fixed- and Random-Effects coefficients; a significant difference favours Fixed Effects (RE's exogeneity assumption fails).
- Specification curve: The sorted set of estimates across all defensible analytic choices, plotted with the choices that produced each one.
- Researcher degrees of freedom: The many defensible choices in an analysis whose combination can move the headline result.
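The θ entry in the glossary can be made concrete with the standard balanced-panel formula θ = 1 − √(σ²ε / (σ²ε + T·σ²u)). The sketch below assumes that textbook form and uses hypothetical function names and inputs; it is not taken from the lab's source.

```python
import math

def re_theta(sigma2_u, sigma2_eps, T):
    """Quasi-demeaning weight for a balanced panel with T periods per person."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_u))

def quasi_demean(values, theta):
    """Subtract theta times the person mean from each of one person's observations."""
    m = sum(values) / len(values)
    return [v - theta * m for v in values]

# No person-level variance: theta = 0, data are left untouched (Pooled OLS)
theta_pooled = re_theta(sigma2_u=0.0, sigma2_eps=1.0, T=8)
# Person-level variance dominates: theta close to 1 (approaches Fixed Effects)
theta_near_fe = re_theta(sigma2_u=1e9, sigma2_eps=1.0, T=8)

print(theta_pooled, theta_near_fe)
print(quasi_demean([1.0, 3.0], 0.0), quasi_demean([1.0, 3.0], 1.0))
```

With θ = 1 the transformation is full demeaning, which is why Random Effects converges to the within estimator as the person effect accounts for more of the variance or as T grows.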
How to cite
If you use this lab in teaching or research, please cite it and the underlying data:
The estimators are verified against R's plm and Python's linearmodels; the data ships under the terms of the public-use NLSY extract distributed in the wooldridge and plm packages.
References
- Vella, F. & Verbeek, M. (1998). Whose wages do unions raise? Journal of Applied Econometrics, 13(2), 163–183. (source of the NLSY extract)
- Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69–85.
- Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46(6), 1251–1271.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
- Croissant, Y. & Millo, G. (2008). Panel data econometrics in R: the plm package. Journal of Statistical Software, 27(2).
- Simonsohn, U., Simmons, J. P. & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4, 1208–1214.
- Steegen, S., Tuerlinckx, F., Gelman, A. & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.