Econometrics 101

ImpactMojo · Econometrics 101 · www.impactmojo.in

ImpactMojo 101 Series · Free Forever

Econometrics
101

From Correlation to Credible Causation — a Foundational Course on Estimating Causal Effects for Development Practitioners in South Asia

Research-Backed · South Asia Focus · 100 Slides · Free Access

Agenda

What We Cover

01 · What Econometrics Is
02 · The Causal Question & the Counterfactual
03 · OLS Regression, Properly Understood
04 · Endogeneity & Omitted-Variable Bias
05 · Randomised Experiments / RCTs
06 · Instrumental Variables
07 · Difference-in-Differences
08 · Regression Discontinuity
09 · Panel Data & Fixed Effects
10 · Reading & Critiquing Results
11 · Tools & Further Reading

01

Section One

What Econometrics Is

Definition

Econometrics, defined

Econometrics is where economics, statistics and real-world data meet. Its central ambition is not just to describe the world but to estimate causal effects — what would happen if we changed something.

Econometrics

The application of statistical methods to economic and social data in order to test theories and, above all, to measure the causal effect of a policy, programme or treatment on an outcome of interest.

You do not need heavy mathematics to think like an econometrician. You need to ask, relentlessly: compared to what?

Three Ingredients

Economics + statistics + data

Economics

A theory of how people and markets behave — what to look for and why

Statistics

Tools to separate signal from noise and quantify uncertainty

Data

Surveys, censuses, admin records, experiments — the evidence itself

Take away any one ingredient and you get something less: theory without data is speculation; data without theory is pattern-hunting.

The Core Job

The question behind every study

Almost every econometric study is, at heart, answering one policy question: does X cause Y, and by how much?

01 · Does a cash transfer raise school attendance?
02 · Does a new road increase farm incomes?
03 · Does microcredit lift consumption?
04 · Does a midday meal cut child anaemia?

Each is a causal question. The whole discipline exists because answering it honestly is genuinely hard.

The Trap

Correlation is not causation

Two things moving together — districts with more bank branches having higher incomes — does not mean one caused the other. The link could run the other way, or a third factor could drive both.

Whenever two variables are correlated, at least four explanations are possible — and only one of them is 'X causes Y'.

— a working principle of causal inference

Why Naive Comparisons Fail

The comparison that fools you

Suppose villages with a microfinance branch have higher incomes than villages without one. Tempting conclusion: microfinance works. But branches were not placed at random — lenders chose more promising villages to begin with.

The income gap mixes the effect of microfinance with the pre-existing differences between the two kinds of village. Econometrics exists to pull these apart.

Description vs Causation

Two very different goals

Descriptive / predictive

Who is poor? Where is dropout highest? What will demand be next year? Correlations are enough here.

Causal

Will this programme reduce dropout? Here you must imagine a world without the programme — a counterfactual.

Most policy questions are causal. That is why this course spends most of its time on causal methods.

The Credibility Revolution

How the field changed

From the 1990s onwards, econometrics shifted from elaborate models toward research designs that mimic experiments — the 'credibility revolution'. The 2019 and 2021 Nobel prizes recognised RCTs and natural-experiment methods used heavily in development.

2019

Nobel: Banerjee, Duflo & Kremer for the experimental approach to poverty

Sveriges Riksbank Prize

2021

Nobel: Card, Angrist & Imbens for natural experiments & causal methods

Sveriges Riksbank Prize

02

Section Two

The Causal Question & the Counterfactual

The Big Idea

Compared to what?

A causal effect is always a comparison: the outcome with the treatment versus the outcome that would have occurred without it, for the very same unit, at the same time.

Counterfactual

What would have happened to a treated unit had it not been treated. It is never observed — which is the whole difficulty of causal inference.

Potential Outcomes

Two possible futures for each unit

The potential-outcomes framework imagines, for each person i, two outcomes:

Yᵢ(1)

outcome if person i receives the treatment

Yᵢ(0)

outcome if person i does NOT receive it

The individual causal effect is the difference Yᵢ(1) − Yᵢ(0). It is exactly what we want, and exactly what we can never see.

The Core Obstacle

The fundamental problem of causal inference

For any one person we observe only one outcome: either the treated state or the untreated state, never both. The other is forever counterfactual.

The fundamental problem of causal inference: we can never observe both Yᵢ(1) and Yᵢ(0) for the same unit. Causal inference is, in essence, a missing-data problem.

The Workaround

We estimate averages, not individuals

Since the individual effect is unknowable, we aim instead for the average treatment effect across a group — the mean of Y(1) minus the mean of Y(0) for a population.

Average Treatment Effect (ATE)

The average of the individual causal effects across all units in a population: what the treatment does on average, even though no single person's effect is observed.

The trick is finding a credible stand-in for the unobserved counterfactual outcome of the treated group.

The Naive Estimate

Why a simple group difference goes wrong

We compare treated people's outcomes with untreated people's outcomes. But the untreated are different people — their average Y(0) need not equal the treated group's Y(0).

Observed gap = True effect (ATT) + Selection bias
Selection bias = how treated & untreated differ in Y(0) before any treatment
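
The decomposition above can be written out in potential-outcomes notation. This is a sketch in which D is a label introduced here (it does not appear on the slides) for a treatment indicator: D = 1 for treated units and D = 0 otherwise.

```latex
\begin{aligned}
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{observed gap}}
  &= E[Y(1) \mid D=1] - E[Y(0) \mid D=0] \\
  &= \underbrace{E[Y(1) - Y(0) \mid D=1]}_{\text{true effect (ATT)}}
   + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}
\end{aligned}
```

The second line simply adds and subtracts E[Y(0) | D=1], the treated group's unobserved counterfactual mean. If treated and untreated units would have had the same average outcome without treatment, the selection-bias term is zero and the observed gap equals the ATT.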

Selection Bias

The villain of the whole course

Selection bias

The systematic difference in (untreated) outcomes between those who get a treatment and those who do not, arising because they are not comparable to begin with.

Healthier people exercise more; richer farmers adopt new seeds first; motivated students attend coaching. Compare them with everyone else and you measure who they were, not what the treatment did.

An Example

Do hospitals make people sicker?

People who visited a hospital last year report worse health than people who did not. Does the hospital harm them? Of course not — sick people go to hospital. The comparison is contaminated by who selects in.

This cartoon example is the same logic that wrecks naive evaluations of training, microcredit and health camps. Selection is everywhere.
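
The hospital story can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: a health score centred on 50, a rule that people with a score below 45 visit hospital, and a true benefit of 5 points per visit. None of these numbers come from real data.

```python
import random

random.seed(0)
TRUE_EFFECT = 5.0  # assumed: a hospital visit raises the health score by 5 points

visited_y0, home_y0 = [], []
for _ in range(20_000):
    y0 = random.gauss(50, 10)   # health score without a visit, Y(0)
    if y0 < 45:                 # assumed selection rule: the sick go to hospital
        visited_y0.append(y0)
    else:
        home_y0.append(y0)

def mean(values):
    return sum(values) / len(values)

# Visitors reveal Y(1) = Y(0) + effect; everyone else reveals Y(0).
visited_observed = [y0 + TRUE_EFFECT for y0 in visited_y0]
naive_gap = mean(visited_observed) - mean(home_y0)
selection_bias = mean(visited_y0) - mean(home_y0)

print(f"true effect     {TRUE_EFFECT:+.2f}")
print(f"selection bias  {selection_bias:+.2f}")
print(f"naive gap       {naive_gap:+.2f}")
```

In this setup the naive gap comes out negative even though every visit helps, because the selection-bias term is large and negative. The gap equals the true effect plus the selection bias, which is the same decomposition as on the earlier slide.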

The Strategy

Every method is a counterfactual strategy

The rest of this course is a toolkit of research designs, each a different way to construct a credible counterfactual — a comparison group that plausibly shows what would have happened anyway.

01 · RCTs: randomisation makes groups comparable
02 · IV: an external nudge mimics random assignment
03 · DiD: a comparison group tracks the trend
04 · RD: units just above/below a cutoff are alike

03

Section Three

OLS Regression, Properly Understood

The Workhorse

Regression: fitting a line through data

Ordinary Least Squares (OLS) finds the straight line that minimises the sum of squared vertical distances between the line and the data points. It is the workhorse of applied economics.

Y = α + βX + ε
outcome = intercept + slope × predictor + error
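
For a single predictor, the least-squares line has a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept makes the line pass through the two means. The sketch below applies it to six made-up data points; the function names and the schooling and wage figures are hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def ssr(x, y, intercept, slope):
    """Sum of squared vertical distances between the line and the points."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: years of schooling and monthly wage (thousand rupees).
schooling = [6, 8, 10, 12, 14, 16]
wage = [9, 11, 14, 15, 19, 22]

alpha, beta = ols_fit(schooling, wage)
best = ssr(schooling, wage, alpha, beta)
print(f"intercept {alpha:.3f}, slope {beta:.3f}, SSR {best:.3f}")

# Any other line fits worse: nudging the slope raises the sum of squared errors.
print(ssr(schooling, wage, alpha, beta + 0.1) > best)
```

The final comparison illustrates what "least squares" means: moving the slope or the intercept away from the fitted values can only increase the sum of squared errors.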

See It

A fitted regression line

Figure (illustrative): years of schooling vs monthly wage, with the OLS line of best fit

The slope says how much wage rises, on average, per extra year of schooling — in this data. Whether that slope is causal is a separate question entirely.

What Regression Really Is

The conditional expectation function

Conditional Expectation Function (CEF)

E[Y | X] — the average value of the outcome Y for each value of the predictor X. OLS gives the best straight-line approximation to this function.

Read a regression as a machine that answers: for units with this value of X, what is the average Y? Nothing more is guaranteed.

Reading a Coefficient

What a slope coefficient means

A coefficient β on X is the predicted change in Y associated with a one-unit increase in X, holding the other included variables fixed.

Two load-bearing words: 'associated' (not necessarily caused) and 'included' (only the variables you put in the model are held fixed — not the ones you left out).

Multiple Regression

Controlling for other variables

Adding controls — Y = α + βX + γZ + ε — lets β describe the X–Y link among units with the same Z. This is how we try to compare like with like.

But you can only control for what you measure. The variables you cannot observe — ability, motivation, soil quality — are precisely the ones that cause trouble.

Units & Interpretation

Levels, logs and dummies

Form	Coefficient reads as	Common use
Y on X (levels)	ΔY in Y-units per 1-unit ΔX	Most variables
log Y on X	approx. (100 × β)% change in Y per 1-unit ΔX	Wages, income
log Y on log X	elasticity: % ΔY per 1% ΔX	Demand, output
Y on a dummy (0/1)	gap in mean Y between the two groups	Treated vs control

Knowing the functional form tells you how to translate a coefficient into a sentence a programme officer understands.
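
The percentage reading of a log-outcome coefficient is an approximation that works well for small coefficients and drifts for large ones. A short sketch of the arithmetic follows; the function name and the three coefficient values are made up for illustration.

```python
import math

def pct_change_from_log_coef(beta):
    """Exact % change in Y for a one-unit rise in X when the outcome is log(Y)."""
    return 100 * (math.exp(beta) - 1)

for beta in (0.02, 0.08, 0.30):
    approx = 100 * beta
    exact = pct_change_from_log_coef(beta)
    print(f"beta={beta:.2f}  approx {approx:.1f}%  exact {exact:.2f}%")
```

With a coefficient of 0.08 the shortcut (8%) is close to the exact figure; at 0.30 the exact change is nearer 35% than 30%, so it is worth converting before reporting.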

When OLS Is Trustworthy

Gauss–Markov & BLUE

Under a set of assumptions — the Gauss–Markov conditions — OLS is BLUE: the Best Linear Unbiased Estimator. Among all linear unbiased methods, it has the smallest variance.

Best: lowest variance among linear unbiased estimators
Linear: it is a linear function of the data
Unbiased: on average it hits the true value
Estimator: a recipe for guessing the parameter

The Crucial Caveat

Unbiased ≠ causal

The key Gauss–Markov assumption is that the error term is uncorrelated with X (exogeneity). If something in the error — an omitted cause — correlates with X, OLS is biased and the coefficient is not the causal effect.

This single assumption is where most of applied econometrics lives or dies. The next section is entirely about how it breaks.

04

Section Four

Endogeneity & Omitted-Variable Bias

The Central Threat

Endogeneity, defined

Endogeneity

A situation where the explanatory variable X is correlated with the error term, that is, with whatever else affects Y but is not captured by the model. It makes the OLS coefficient biased and non-causal.

When X is exogenous, OLS recovers the causal effect. When X is endogenous, it does not. Diagnosing endogeneity is the practitioner's core skill.

Three Sources

Where endogeneity comes from

Omitted variables

A common cause of both X and Y is left out

Reverse causality

Y also affects X — the arrow runs both ways

Measurement error

X is recorded with noise, biasing its coefficient

All three break the exogeneity assumption. We take them one at a time.

OVB

Omitted-variable bias, illustrated

Figure (illustrative): wage vs schooling, the same data split by (omitted) ability

Within each ability group the schooling slope is gentle. Pooled, a steep line appears — because higher-ability people get both more schooling and higher wages. OLS credits schooling for ability's effect.
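
The same pattern can be reproduced in a simulation. This is a sketch with invented parameters: schooling truly raises the wage by 1 unit per year, ability raises it by 2, and abler people get more schooling. The variable names and coefficients are assumptions, not estimates from any dataset.

```python
import random

random.seed(2)
BETA, GAMMA = 1.0, 2.0   # assumed true effects of schooling and ability on wage

def ols_slope(x, y):
    """Slope from a one-variable least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

n = 20_000
ability = [random.gauss(0, 1) for _ in range(n)]
schooling = [10 + 1.5 * a + random.gauss(0, 1) for a in ability]
wage = [BETA * s + GAMMA * a + random.gauss(0, 1)
        for s, a in zip(schooling, ability)]

# Regression that omits ability: the slope absorbs part of ability's effect.
short_slope = ols_slope(schooling, wage)

# Omitted-variable-bias formula: true effect + GAMMA * (slope of ability on schooling).
predicted = BETA + GAMMA * ols_slope(schooling, ability)

print(f"true effect of schooling  {BETA:.2f}")
print(f"slope omitting ability    {short_slope:.2f}")
print(f"predicted by OVB formula  {predicted:.2f}")
```

The regression that leaves out ability reports a slope well above the true effect, and the size of the overshoot matches what the bias formula predicts.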

Sign of the Bias

Which way does OVB push?

The direction of omitted-variable bias depends on two signs: how the omitted variable Z relates to X, and how Z relates to Y.

Z → X	Z → Y	Bias on β
+	+	Upward (too big)
−	−	Upward (too big)
+	−	Downward (too small)
−	+	Downward (too small)

Even when you cannot fix OVB, you can often reason about its direction — and so bound how wrong your estimate might be.
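
The sign table follows from the standard omitted-variable-bias formula. A sketch, assuming the true model is linear in X and the omitted variable Z; the symbol π is introduced here for the slope from regressing Z on X.

```latex
\begin{aligned}
\text{True model:}\quad & Y = \alpha + \beta X + \gamma Z + \varepsilon \\
\text{Regression omitting } Z\text{:}\quad
  & \hat{\beta}_{\text{short}} \;\xrightarrow{\;p\;}\; \beta + \gamma\,\pi,
  \qquad \pi = \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)} \\
\text{Bias:}\quad & \gamma\,\pi
\end{aligned}
```

The bias is the product of two terms: how Z relates to Y (γ) and how Z relates to X (π). When they share a sign the product is positive and the estimate is pushed upward; when they differ it is pushed downward, which is exactly the four rows of the table.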

Reverse Causality

When the arrow runs both ways

Do more police cause more crime? Cross-section data often shows a positive correlation — because cities with more crime hire more police. Causation runs from Y to X.

More crime (Y) → leads cities to hire more police (X) → so X and Y correlate positively → even if police actually reduce crime

Measurement Error

Noise in X biases toward zero

When the predictor X is measured with random error — self-reported income, recalled expenditure — its coefficient is pulled toward zero. This is attenuation bias.

Counter-intuitive but important: messy measurement of X usually makes a real effect look smaller than it is, not larger. Noise in Y, by contrast, mainly inflates uncertainty.
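
Attenuation can be seen in a short simulation. This is a sketch under the classical measurement-error assumption (noise unrelated to the true value and to the outcome), with an invented true slope of 2 and noise as large as the signal. In that case theory predicts the slope is scaled by Var(X) / (Var(X) + Var(noise)) = 0.5.

```python
import random

random.seed(3)
BETA = 2.0   # assumed true slope

def ols_slope(x, y):
    """Slope from a one-variable least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

n = 20_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [BETA * x + random.gauss(0, 1) for x in x_true]

# Noise in X with the same variance as X itself (reliability ratio 0.5).
x_noisy = [x + random.gauss(0, 1) for x in x_true]
# Noise in Y instead, leaving X clean.
y_noisy = [v + random.gauss(0, 2) for v in y]

slope_clean = ols_slope(x_true, y)
slope_noisy_x = ols_slope(x_noisy, y)
slope_noisy_y = ols_slope(x_true, y_noisy)

print(f"clean data   {slope_clean:.2f}")
print(f"noise in X   {slope_noisy_x:.2f}")
print(f"noise in Y   {slope_noisy_y:.2f}")
```

Noise in X roughly halves the estimated slope, while noise in Y leaves it centred on the true value and only makes it less precise.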

The False Comfort of Controls

More controls is not a cure

It is tempting to believe that adding enough control variables removes bias. But you can only control for what you observe and measure. Unobservables — motivation, ability, local governance — remain in the error.

Worse, controlling for the wrong variable — something caused by the treatment, or a collider — can introduce bias. Controls are a scalpel, not a sledgehammer.

The Way Out

Design beats adjustment

Because we can never be sure we have controlled for every confounder, credible causal work relies on a research design that creates exogenous variation in X — variation unrelated to the unobservables.

The way to estimate a causal effect is not to control for everything, but to find variation in the treatment that is as good as random.

— the design-based view of econometrics

05

Section Five

Randomised Experiments / RCTs

The Gold Standard

Randomisation solves selection

In a randomised controlled trial (RCT), units are assigned to treatment or control by a coin flip. On average the two groups are identical in everything — observed and unobserved — except the treatment.

Because assignment is independent of potential outcomes, the control group is a credible counterfactual. Selection bias is designed away.

Why It Works

Balance in expectation

Randomisation does not make any two specific people identical. It makes the groups statistically equivalent, so their average Y(0) is the same. The control group's outcome stands in for the treated group's missing counterfactual.

Effect = mean Y(treated) − mean Y(control)
and with randomisation, selection bias ≈ 0
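
A simulated coin-flip assignment shows the difference in means recovering the effect. This is a sketch with invented numbers: a baseline outcome centred on 50 and a true effect of 5 for everyone.

```python
import random

random.seed(1)
TRUE_EFFECT = 5.0   # assumed effect of the programme on the outcome

treated, control = [], []
treated_y0, control_y0 = [], []
for _ in range(20_000):
    y0 = random.gauss(50, 10)       # outcome without the programme, Y(0)
    if random.random() < 0.5:       # coin flip, unrelated to y0
        treated.append(y0 + TRUE_EFFECT)
        treated_y0.append(y0)
    else:
        control.append(y0)
        control_y0.append(y0)

def mean(values):
    return sum(values) / len(values)

estimate = mean(treated) - mean(control)
selection_bias = mean(treated_y0) - mean(control_y0)

print(f"estimated effect  {estimate:.2f}")
print(f"selection bias    {selection_bias:.2f}")
```

Because assignment ignores the baseline outcome, the two groups' average Y(0) differ only by chance, so the selection-bias term is close to zero and the simple difference in means lands near the true effect.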

Check the Balance

A balance table

Figure (illustrative balance check): baseline characteristics, treatment vs control (should be similar)

Good randomisation produces near-identical groups at baseline. A balance table is the first thing to check in any RCT — it is the evidence that the design worked.

J-PAL

RCTs in development: J-PAL & the field

The Abdul Latif Jameel Poverty Action Lab (J-PAL), founded in 2003, popularised RCTs in development. Hundreds of trials — many in India — have tested deworming, remedial teaching, immunisation incentives, and more.

A landmark example: Pratham's 'Teaching at the Right Level' remedial-education model was refined and scaled through a sequence of RCTs across Indian states.

Growth

The rise of development RCTs

Figure (illustrative): cumulative development RCTs registered, stylised to show the trend rather than exact counts

The exact numbers here are illustrative, but the shape is real: development RCTs grew explosively after the mid-2000s.

Two Kinds of Validity

Internal vs external validity

Internal validity

Is the estimated effect causally correct for this sample? RCTs are strong here — their headline virtue.

External validity

Will it hold elsewhere — other states, scales, populations? RCTs are often weak here.

A perfectly clean trial in one district may not generalise. 'It worked in Rajasthan' is not 'it will work in Bihar'.

Real-World Wrinkles

What can still go wrong

Threat	What happens	Fix / response
Attrition	Treated & control drop out differently	Track everyone; bound effects
Spillovers	Control units affected by treatment	Randomise at cluster level
Non-compliance	Assigned but don't take treatment	Analyse by assignment (ITT)
Hawthorne effects	Being watched changes behaviour	Blinding where possible

Intention-to-treat (ITT) — analysing people by the group they were assigned to — preserves the randomisation even when compliance is imperfect.

Ethics

Is it ethical to randomise a benefit?

Randomise when there is genuine uncertainty about whether the programme works (equipoise)
Use waitlists or phased roll-outs so the control group eventually benefits
Never withhold a known, proven, life-saving treatment to run a trial
Secure informed consent and ethics-board (IRB) approval

Scarce budgets mean not everyone can be served at once anyway. A lottery for limited places can be both fair and a clean experiment.

06

Section Six

Instrumental Variables

When You Can't Randomise

Borrowing randomness from nature

Often you cannot run an experiment — the treatment already happened, or randomising is impossible. An instrumental variable (IV) finds a source of variation in X that is 'as good as random'.

Instrumental variable (instrument)

A variable Z that shifts the treatment X but affects the outcome Y only through X. It isolates the part of X that is unrelated to the confounders.

The Logic

Use only the exogenous part of X

Instrument Z (as-good-as-random) → shifts treatment X → X changes the outcome Y
Condition: Z affects Y ONLY through X

IV throws away the endogenous, confounded variation in X and keeps only the clean variation driven by Z. That clean slice yields a causal estimate.

Two Conditions

An instrument must satisfy BOTH

1. Relevance

Z must actually shift X — a real, strong first-stage relationship between instrument and treatment. Testable in the data.

2. Exclusion restriction

Z must affect Y only through X — no other pathway, no direct effect, uncorrelated with the error. Untestable; argued, not proven.

BOTH are required. Relevance you can check; the exclusion restriction you must defend with theory and institutional knowledge — it is where most IV claims succeed or fail.
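
With one instrument and one treatment, the IV estimate is the ratio Cov(Z, Y) / Cov(Z, X). The sketch below simulates data in which both conditions hold by construction; the confounder, the coefficients and the helper name are all invented for illustration.

```python
import random

random.seed(4)
BETA = 1.0   # assumed true causal effect of X on Y

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument, as good as random
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder

# Relevance: Z shifts X. The confounder u also shifts X, making X endogenous.
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# Exclusion: Z enters Y only through X. The confounder u affects Y directly.
y = [BETA * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols = cov(x, y) / cov(x, x)           # biased by the confounder
first_stage = cov(z, x) / cov(z, z)   # how strongly Z moves X
iv = cov(z, y) / cov(z, x)            # uses only the Z-driven variation in X

print(f"true effect  {BETA:.2f}")
print(f"OLS          {ols:.2f}")
print(f"first stage  {first_stage:.2f}")
print(f"IV           {iv:.2f}")
```

OLS overstates the effect because the confounder moves X and Y together; the IV ratio lands near the true value. The simulation guarantees exclusion by construction, which real data never do.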

A Classic Example

Rainfall as an instrument

To study whether economic downturns fuel conflict, researchers have used rainfall shocks as an instrument for agricultural income in rain-fed economies. Rain is plausibly random year to year.

Relevance: rainfall strongly affects farm income (first stage)
Exclusion: rainfall is argued to affect outcomes only via income — the part you must defend
Caveat: if rain also affects, say, mobility or disease directly, exclusion fails

Another Example

Distance, sibling sex mix & quarter of birth

Instrument (Z)	Treatment (X)	Exclusion argument
Distance to a school/college	Years of schooling	Distance affects wages only via schooling
Quarter of birth	Years of schooling	Birth-month is arbitrary, tied to school-start laws
Sex composition of first 2 kids	Having a 3rd child	Sex mix is random, shifts fertility

Each is clever — and each has been challenged on exclusion grounds. A good IV invites scrutiny of the one assumption you cannot test.

What IV Estimates

A local effect, for compliers

IV does not recover the average effect for everyone. It recovers the Local Average Treatment Effect (LATE) — the effect for the compliers, those whose treatment status is moved by the instrument.

So 'the IV estimate' answers a specific question: the effect on people the instrument actually nudged. Different instruments can give different — both correct — LATEs.
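
For the simplest case, a binary instrument Z and a binary treatment X, the LATE can be written as a ratio of two differences in means (the Wald estimator). This sketch assumes relevance and exclusion, and additionally that the instrument never pushes anyone away from treatment (no "defiers"), an assumption not spelled out on the slide.

```latex
\text{LATE}
  = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[X \mid Z=1] - E[X \mid Z=0]}
  = \frac{\text{effect of the instrument on the outcome}}{\text{share of compliers}}
```

The numerator is the intention-to-treat effect of the instrument; dividing by the share of units whose treatment status the instrument actually changed rescales it to an effect per complier.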

Weak Instruments

The danger of a weak first stage

If Z only weakly predicts X (a weak instrument), IV estimates become wildly imprecise and can be more biased than plain OLS — even tiny exclusion violations get amplified.

Rule of thumb: report the first-stage F-statistic; a common (rough) threshold is F > 10. A weak instrument can be worse than no instrument at all.

Reading IV Critically

Three questions for any IV study

Is it relevant? Is the first stage strong (high F)?
Is exclusion plausible? What is the story for 'only through X', and what would break it?
Whose effect is it? Who are the compliers — and do you care about them?

A persuasive IV paper spends most of its words defending the exclusion restriction, not running the regression.

07

Section Seven

Difference-in-Differences

The Idea

Before-and-after, with a comparison group

Difference-in-Differences (DiD) studies a policy that hits one group but not another. It compares the change in the treated group with the change in an untreated comparison group.

Difference-in-Differences

An estimator that subtracts the before–after change in a comparison group from the before–after change in the treated group, netting out both fixed group differences and common time trends.

The Two Differences

Why subtract twice?

Difference 1

Treated group: after − before (removes fixed traits of the group)

Difference 2

Comparison group: after − before (captures what would have happened anyway)

The DiD estimate is Difference 1 − Difference 2. The comparison group's change is the counterfactual trend for the treated group.
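
The two subtractions are plain arithmetic on four group means. A worked sketch with hypothetical numbers follows; the four means are invented for illustration.

```python
# Hypothetical group means of the outcome, before and after the policy.
means = {
    ("treated", "before"): 40.0,
    ("treated", "after"): 58.0,
    ("comparison", "before"): 30.0,
    ("comparison", "after"): 36.0,
}

# Difference 1: change in the treated group.
diff_treated = means[("treated", "after")] - means[("treated", "before")]
# Difference 2: change in the comparison group (what would have happened anyway).
diff_comparison = means[("comparison", "after")] - means[("comparison", "before")]

did = diff_treated - diff_comparison

# Counterfactual for the treated group: its starting level plus the comparison trend.
counterfactual = means[("treated", "before")] + diff_comparison

print(f"change in treated group     {diff_treated:.1f}")
print(f"change in comparison group  {diff_comparison:.1f}")
print(f"DiD estimate                {did:.1f}")
print(f"treated counterfactual      {counterfactual:.1f}")
```

Note that the two groups start at different levels (40 vs 30). That gap drops out of the calculation; only the difference in changes matters.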

See It

A difference-in-differences plot

Figure (illustrative): outcome over time, treated vs comparison, policy at 'After'

The DiD effect is the gap between the treated group's actual outcome (58) and its counterfactual (46) — about 12 points. The dashed red line is the assumed parallel trend.

The Key Assumption

Parallel trends — not equal levels

DiD is valid only if, absent the policy, the two groups would have moved in parallel — the same trend over time. The groups need NOT start at the same level.

Common error: thinking DiD requires the groups to be identical before treatment. It does not. It requires their trends to be parallel — a statement about slopes, not levels.

Defending It

How to support parallel trends

Plot pre-treatment trends: did the groups move together before the policy?
Run a placebo / event-study check on pre-periods
Choose a comparison group as similar as possible to the treated one
Be honest: parallel trends is an assumption, never fully provable

Parallel pre-trends do not prove parallel counterfactual trends — but their absence is a serious warning sign.

Natural Experiments

When policy creates the design

DiD shines with natural experiments — policies rolled out to some states/districts and not others, or at different times. The staggered roll-out supplies the treatment and comparison groups.

Indian examples: the phased district roll-out of NREGA (2006–08), or state-level reforms introduced in some states before others, are natural settings for DiD.

When It Breaks

Threats to a DiD design

Threat	What it does	Watch for
Diverging trends	Groups were drifting apart anyway	Non-parallel pre-trends
Other shocks	A second event hits only one group	Concurrent policies
Composition change	Who is in each group shifts over time	Migration, attrition
Anticipation	Behaviour changes before the policy	Pre-period jumps

Recent methods literature also warns that staggered roll-outs with two-way fixed effects can mislead if effects vary over time — use modern DiD estimators.

Reading DiD Critically

Questions for any DiD study

Did the authors show parallel pre-trends?
Is the comparison group genuinely comparable?
Could another shock have hit only one group at the same time?
With staggered timing, did they use an appropriate modern estimator?

DiD is powerful and intuitive — which is exactly why its one assumption deserves the hardest scrutiny.

08

Section Eight

Regression Discontinuity

The Idea

Assignment by an arbitrary cutoff

Many programmes use a threshold rule: a scholarship for scores above 60, a poverty scheme for those below a deprivation score. Regression Discontinuity (RD) exploits that sharp cutoff.

Regression Discontinuity (RD)

A design that compares units just above and just below a cutoff on a 'running variable'. Near the threshold, who lands on which side is essentially random, so the two sides are comparable.

Why It Works

Just-above ≈ just-below

A student scoring 59 and one scoring 61 are, in every meaningful way, alike — ability, background, motivation. Yet one gets the programme and the other does not. The cutoff manufactures a local experiment.

The jump in the outcome at the threshold — a discontinuity that nothing else can explain — is the causal effect of the programme.

See It

A jump at the cutoff

Figure (illustrative): outcome vs running variable, treatment assigned above the cutoff (50)

The vertical jump at the cutoff (50) — roughly 41 to 53 — is the estimated effect. The smooth slope on each side is the relationship that would hold without any jump.

Sharp vs Fuzzy

Two flavours of RD

Sharp RD

Crossing the cutoff perfectly determines treatment — everyone above is treated, everyone below is not.

Fuzzy RD

Crossing the cutoff only raises the probability of treatment. The jump in take-up is used like an instrument.

Fuzzy RD is essentially IV at the threshold: the cutoff instruments for actual treatment.

Local Validity

RD gives a LOCAL effect

RD estimates the effect only at the cutoff — for units near the threshold. It says little about people far from it.

Key caveat: the RD effect is local. A scholarship's effect for students scoring 59–61 may differ entirely from its effect for those scoring 90. Do not over-generalise the jump.

The Main Threat

Manipulation of the running variable

RD fails if people can precisely manipulate which side of the cutoff they land on — an examiner nudging a 59 to a 61, a household mis-reporting assets to qualify.

Diagnostic: check for bunching — a suspicious pile-up of cases just on the favourable side of the cutoff (a McCrary density test). Smoothness across the threshold is the credibility test.

Practical Choices

Bandwidth and functional form

Bandwidth: how wide a window around the cutoff to use — narrow is cleaner but noisier
Functional form: fit flexible curves each side; beware high-order polynomials that invent jumps
Covariate smoothness: other variables should NOT jump at the cutoff — a useful placebo check

Good RD work shows the estimate is robust to the bandwidth choice, not an artefact of one window.
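
A bandwidth robustness check can be sketched on simulated data. The example fits a separate straight line on each side of the cutoff within a window and reports the gap at the threshold for three window widths. Everything here is an assumption made for illustration: the cutoff of 50, the true jump of 12, the linear relationship, and the hand-picked bandwidths (applied work typically uses data-driven bandwidth selectors, which this sketch does not implement).

```python
import random

random.seed(5)
CUTOFF, TRUE_JUMP = 50.0, 12.0   # assumed values for this simulation

def fit_line(x, y):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Outcome rises smoothly with the score, plus a jump for units at or above the cutoff.
data = []
for _ in range(40_000):
    score = random.uniform(0, 100)
    treated = score >= CUTOFF
    outcome = 21 + 0.4 * score + TRUE_JUMP * treated + random.gauss(0, 5)
    data.append((score, outcome))

def rd_estimate(data, bandwidth):
    """Fit a line on each side within the window; return the gap at the cutoff."""
    # Centre the score at the cutoff so each intercept is the fitted value there.
    left = [(s - CUTOFF, y) for s, y in data if CUTOFF - bandwidth <= s < CUTOFF]
    right = [(s - CUTOFF, y) for s, y in data if CUTOFF <= s <= CUTOFF + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff

for bandwidth in (5, 10, 20):
    print(f"bandwidth {bandwidth:>2}: estimated jump {rd_estimate(data, bandwidth):.2f}")
```

If the estimate stays in the same neighbourhood as the window widens and narrows, the jump is unlikely to be an artefact of one bandwidth. Narrow windows use fewer observations, so their estimates bounce around more.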

Where RD Fits

RD in development practice

RD is ideal wherever eligibility hinges on a score or threshold: poverty-line targeting (BPL cutoffs, SECC deprivation scores), exam-based scholarships, population thresholds that trigger a facility or grant.

Because eligibility rules are everywhere in Indian welfare programmes, RD is often the most natural — and most credible — design available to an evaluator.

09

Section Nine

Panel Data & Fixed Effects

Following Units Over Time

What panel data buys you

Panel data tracks the same units — households, districts, firms — across multiple periods. This repeated observation lets us net out stable, unchanging differences between units.

Panel (longitudinal) data

Data on the same set of units observed at two or more points in time, combining a cross-section with a time dimension.

The Core Trick

Each unit becomes its own control

With fixed effects, we compare each unit to itself over time. Anything about the unit that stays constant — and so cannot explain changes — is swept out of the comparison.

Yᵢₜ = βXᵢₜ + αᵢ + δₜ + εᵢₜ
αᵢ = unit fixed effect  δₜ = time fixed effect

The Within Estimator

Fixed effects use within-unit variation

The fixed-effects (or within) estimator subtracts each unit's own average from every observation, so β is identified only from how a unit changes relative to itself over time.

Differences between units — rich vs poor district, fertile vs arid land — are discarded. Only the within-unit story remains, and that is what removes time-invariant confounders.
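
The within transformation is easy to carry out by hand. The sketch below simulates a panel in which a fixed, unobserved unit trait drives both X and Y, then compares pooled OLS with the demeaned (within) regression. The panel dimensions and coefficients are invented, and time effects are left out to keep the example short.

```python
import random

random.seed(6)
BETA = 0.5   # assumed true effect of X on Y

def ols_slope(x, y):
    """Slope from a one-variable least-squares regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Simulated panel: 500 units observed for 6 periods each.
panel = {}
for unit in range(500):
    trait = random.gauss(0, 2)          # fixed unit trait, never observed
    obs = []
    for _ in range(6):
        x = trait + random.gauss(0, 1)  # X is correlated with the trait
        y = BETA * x + 3 * trait + random.gauss(0, 1)
        obs.append((x, y))
    panel[unit] = obs

# Pooled OLS ignores the unit trait and is confounded by it.
all_x = [x for obs in panel.values() for x, _ in obs]
all_y = [y for obs in panel.values() for _, y in obs]
pooled = ols_slope(all_x, all_y)

# Within transformation: subtract each unit's own mean from its observations.
x_dm, y_dm = [], []
for obs in panel.values():
    mean_x = sum(x for x, _ in obs) / len(obs)
    mean_y = sum(y for _, y in obs) / len(obs)
    for x, y in obs:
        x_dm.append(x - mean_x)
        y_dm.append(y - mean_y)
within = ols_slope(x_dm, y_dm)

print(f"true effect   {BETA:.2f}")
print(f"pooled OLS    {pooled:.2f}")
print(f"within (FE)   {within:.2f}")
```

Demeaning wipes out the unit trait because it is the same in every period, so the within slope lands near the true effect while pooled OLS is far off.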

The Crucial Limit

Fixed effects remove only TIME-INVARIANT confounders

Unit fixed effects control for everything about a unit that is constant over time — geography, culture, fixed institutions — even if you never measured it.

Critical caveat: they do nothing about confounders that change over time. A district-specific shock that moves with the treatment will still bias β. Fixed effects are not a magic exogeneity machine.

Fixed vs Random Effects

Two ways to model the unit term

	Fixed effects	Random effects
Assumes	Unit term may correlate with X	Unit term uncorrelated with X (strong, often unrealistic)
Uses	Within-unit variation only	Within + between variation
Strength	Robust to time-invariant confounding	More efficient if its assumption holds
Prefer when	You fear omitted unit traits	Unit traits are plausibly unrelated to X

For causal work where you worry about unobserved unit traits, fixed effects is usually the safer default — it makes the weaker assumption.

First Differences

A close cousin: differencing

With two periods, first-differencing — regressing the change in Y on the change in X — removes the fixed unit term just as fixed effects do. (DiD is exactly this idea with a comparison group.)

Fixed effects, first differences and DiD are a family: all exploit repeated observation to subtract away stable, unobserved differences between units.
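
A short sketch of the two-period equivalence, on a hypothetical four-village panel with invented numbers. It assumes unit fixed effects only. The matching first-difference regression then has no constant; adding a constant would correspond to including a period effect.

```python
from statistics import mean

# Hypothetical two-period panel:
# (village, x_period1, x_period2, y_period1, y_period2)
villages = [
    ("v1", 1.0, 3.0, 12.0, 17.0),
    ("v2", 2.0, 2.5, 30.0, 30.5),
    ("v3", 4.0, 7.0, 8.0, 15.0),
    ("v4", 5.0, 4.0, 21.0, 18.5),
]


def slope_through_origin(xs, ys):
    """OLS slope with no constant term."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# First differences: regress the change in Y on the change in X.
dx = [x2 - x1 for _, x1, x2, _, _ in villages]
dy = [y2 - y1 for _, _, _, y1, y2 in villages]
beta_fd = slope_through_origin(dx, dy)

# Within estimator: demean each village's two observations, then regress.
x_demeaned, y_demeaned = [], []
for _, x1, x2, y1, y2 in villages:
    x_bar, y_bar = mean([x1, x2]), mean([y1, y2])
    x_demeaned += [x1 - x_bar, x2 - x_bar]
    y_demeaned += [y1 - y_bar, y2 - y_bar]
beta_fe = slope_through_origin(x_demeaned, y_demeaned)

print(f"first differences: {beta_fd:.4f}   fixed effects: {beta_fe:.4f}")
```

With exactly two periods the two estimates coincide, because each demeaned observation is just plus or minus half the change. With three or more periods they generally differ.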

Standard Errors

Why you must cluster

In panel data, a unit's observations are correlated across time: this year looks like last year. Ignoring that correlation typically makes standard errors too small and 'significance' spurious.

Fix: cluster the standard errors at the unit level (e.g. by district or village). Clustering acknowledges that observations within a group are not independent — honest uncertainty, not inflated confidence.
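
A rough sketch of why clustering matters, using simulated data and only the standard library. Everything here is an assumption made for illustration: 40 districts observed for 10 years, a regressor that is constant within a district, a persistent district shock in the error, and a cluster-robust formula with no small-sample correction. In practice you would request clustered errors from your statistics package rather than code them by hand.

```python
import random
from math import sqrt

random.seed(0)
N_DISTRICTS, N_YEARS = 40, 10

# Simulated rows of (district, x, y). The regressor and part of the error
# are shared by all years of a district, so observations are not independent.
rows = []
for d in range(N_DISTRICTS):
    x_d = random.gauss(0, 1)   # district-level regressor, constant over years
    u_d = random.gauss(0, 1)   # persistent district shock
    for _ in range(N_YEARS):
        error = u_d + random.gauss(0, 0.5)
        rows.append((d, x_d, 1.0 * x_d + error))

# One-regressor OLS with an intercept.
n = len(rows)
x_bar = sum(x for _, x, _ in rows) / n
y_bar = sum(y for _, _, y in rows) / n
sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
beta = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx
intercept = y_bar - beta * x_bar
resid = [y - intercept - beta * x for _, x, y in rows]

# Classical standard error: treats all 400 observations as independent.
sigma2 = sum(e * e for e in resid) / (n - 2)
naive_se = sqrt(sigma2 / sxx)

# Cluster-robust standard error: sum the scores (x - x_bar) * residual
# within each district first, then square and add across districts.
score_by_district = {}
for (d, x, _), e in zip(rows, resid):
    score_by_district[d] = score_by_district.get(d, 0.0) + (x - x_bar) * e
cluster_se = sqrt(sum(s * s for s in score_by_district.values())) / sxx

print(f"naive SE: {naive_se:.3f}   clustered SE: {cluster_se:.3f}")
```

Under this setup the clustered standard error comes out noticeably larger than the naive one. The effective sample is closer to 40 districts than to 400 district-years.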

Reading Panel Work

Questions for a fixed-effects study

Are both unit and time fixed effects included where needed?
Could a time-varying confounder still drive the result?
Are standard errors clustered at the right level?
Is β identified from credible within-unit variation, or a few odd cases?

Fixed effects buy a lot — but remember what they cannot buy: protection from confounders that move over time.

10

Section Ten

Reading & Critiquing Results

Uncertainty

Standard errors quantify sampling noise

Standard error

A measure of how much an estimate would vary across repeated random samples. It quantifies sampling uncertainty — how precisely the effect is pinned down — not whether the design is valid.

A small standard error says the number is precise. It says nothing about whether the number is right — a biased design gives precisely wrong answers.

Confidence Intervals

Report a range, not just a point

A coefficient of 0.12 is shorthand. The honest version is a confidence interval — say [0.04, 0.20] — the range of effects consistent with the data at, usually, 95% confidence.

If a 95% interval comfortably includes zero, the data cannot rule out 'no effect'. Always read the interval, not just the point estimate or the stars.
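
As a sketch of the arithmetic, a 95% interval under the usual normal approximation is the estimate plus or minus 1.96 standard errors. The standard errors below (0.041 and 0.080) are assumptions: the first is back-solved so that a coefficient of 0.12 gives roughly [0.04, 0.20], and the second shows a noisier study with the same point estimate.

```python
Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def confidence_interval(estimate, std_error, z=Z_95):
    """Return (lower, upper) for estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def rules_out_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


# Assumed standard errors, chosen for illustration only.
precise = confidence_interval(0.12, 0.041)
noisy = confidence_interval(0.12, 0.080)

for label, (low, high) in [("precise", precise), ("noisy", noisy)]:
    verdict = "rules out zero" if rules_out_zero(low, high) else "includes zero"
    print(f"{label}: [{low:.2f}, {high:.2f}] {verdict}")
```

Both studies report the same coefficient, but only the precise one excludes zero. That is why the interval, not the point estimate, carries the message.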

See It

Same point estimate, very different certainty

[Figure: three studies, each estimating a +4-point effect, shown with 95% intervals (illustrative)]

All three centre on +4, but only Study A rules out zero. Study C's interval spans negative values — its 'effect' is indistinguishable from no effect. The point estimate alone hides this.

Significance

What a p-value does — and doesn't — say

p-value

The probability of seeing an estimate at least this extreme if the true effect were zero. A small p-value means the result would be unlikely under 'no effect', nothing more. It is not the probability that the true effect is zero.

Statistically significant ≠ large, important, or causally valid. With a big sample a trivial effect can be 'significant'. Always ask: how big is the effect, and from a credible design?
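
A small sketch of the big-sample point, under stated assumptions. It takes a hypothetical difference in means of 0.01 standard deviations between two equal-sized arms, an outcome standard deviation of 1, and a normal approximation for the test. The same trivial effect is far from significant with 500 people per arm and highly significant with 500,000.

```python
from math import erfc, sqrt


def two_sided_p(estimate, std_error):
    """Two-sided p-value under a normal approximation: 2 * (1 - Phi(|z|))."""
    z = abs(estimate) / std_error
    return erfc(z / sqrt(2))


def se_difference_in_means(n_per_arm, outcome_sd=1.0):
    """Standard error of a difference in means with two equal-sized arms."""
    return outcome_sd * sqrt(2 / n_per_arm)


TRIVIAL_EFFECT = 0.01  # hypothetical effect, in standard-deviation units

p_small_sample = two_sided_p(TRIVIAL_EFFECT, se_difference_in_means(500))
p_large_sample = two_sided_p(TRIVIAL_EFFECT, se_difference_in_means(500_000))

print(f"n = 500 per arm:     p = {p_small_sample:.3f}")
print(f"n = 500,000 per arm: p = {p_large_sample:.2e}")
```

The effect size never changed, only the sample size did. A tiny p-value therefore says nothing about whether 0.01 standard deviations is worth paying for.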

Effect Size

Is the effect big enough to matter?

An effect can be real, precise and significant — yet too small to justify the cost. Always translate a coefficient into something a decision-maker feels: rupees, percentage points, children, school days.

Pair statistical significance with practical significance and a cost comparison. 'Significant' is the start of the conversation, not the end.

Robustness

Does the result survive poking?

Does the estimate hold across alternative specifications and control sets?
Does it survive different samples, sub-groups and outlier handling?
Are there placebo tests that should find nothing — and do?
Do the authors show the result is not knife-edge on one choice?

A finding that appears only under one precise specification is fragile. Credible results are robust ones.

p-hacking

The garden of forking paths

Try enough specifications, subgroups and outcomes and some will cross p < 0.05 by chance. Reporting only those — p-hacking — manufactures false findings that will not replicate.

Be wary of a lone 'significant' subgroup, an oddly specific specification, or many outcomes with one star. Ask what was tested but not reported.
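
The arithmetic behind the forking paths can be sketched as follows. It assumes 20 independent tests of outcomes with no true effect, where each p-value is uniformly distributed between 0 and 1. Both the count of 20 and the independence are simplifying assumptions. The chance that at least one test crosses p < 0.05 is then 1 − 0.95^20, which a quick simulation confirms.

```python
import random

ALPHA = 0.05
N_TESTS = 20  # hypothetical number of outcomes, subgroups or specifications

# Analytic chance of at least one false positive among independent null tests.
analytic = 1 - (1 - ALPHA) ** N_TESTS

# Simulation: under 'no effect', each p-value is a uniform draw on [0, 1].
random.seed(1)
N_SIMS = 10_000
hits = 0
for _ in range(N_SIMS):
    p_values = [random.random() for _ in range(N_TESTS)]
    if min(p_values) < ALPHA:
        hits += 1
simulated = hits / N_SIMS

print(f"analytic: {analytic:.3f}   simulated: {simulated:.3f}")
```

With 20 tries, roughly two studies in three would have at least one 'significant' result to report even though nothing real is going on.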

Pre-registration

Tie your hands in advance

A pre-analysis plan — specifying hypotheses, outcomes and methods before seeing the data — removes the freedom to fish. Registries (e.g. the AEA RCT Registry) make the commitment public.

Pre-registration cannot make a bad design good, but it makes an honest design credible — readers know the result was not cherry-picked after the fact.

External Validity

Will it travel?

A clean estimate is internally valid for its setting. Whether it generalises — to another state, scale, time or population — is external validity, a separate and often harder question.

Ask: what was the context and the sample? Would the mechanism plausibly work elsewhere? Scaling up can itself change the effect (general-equilibrium and implementation effects).

11

Section Eleven

Tools & Further Reading

Software

What practitioners actually use

Tool	Good for	Note
Stata	Applied micro-econometrics, panel, IV, RD	Industry standard; paid
R	Free, flexible, reproducible analysis & graphics	Rich causal-inference packages
Python (statsmodels, linearmodels)	Automation, large data, ML	Free, general-purpose
Excel / Sheets	Quick description, not inference	Fine to start; outgrow it

The software matters far less than the research design. A clean design in Excel beats a flawed one in Stata.

Method Cheat-Sheet

Which design for which situation?

If you can…	Use	Key assumption
Randomise treatment	RCT	Successful randomisation
Find an as-good-as-random nudge	IV	Relevance + exclusion
Compare a treated & untreated group over time	DiD	Parallel trends
Exploit a cutoff rule	RD	No manipulation at cutoff
Follow units over time	Panel fixed effects	Confounders time-invariant

Start from the variation you have, then pick the design — not the other way round.

The Essential Books

Angrist & Pischke

Mostly Harmless Econometrics — Angrist & Pischke (the design-based bible; RCT, IV, DiD, RD)
Mastering 'Metrics — Angrist & Pischke (gentler, intuitive introduction — start here)
Causal Inference: The Mixtape — Scott Cunningham (free online, code-rich)

If you read one book, read Mastering 'Metrics. It teaches five core tools (randomised trials, regression, IV, RD and DiD) with humour and real studies, and so covers most of the designs in this course.

Development Reading

Evidence in development

Poor Economics — Banerjee & Duflo (RCTs and the lives of the poor)
Running Randomized Evaluations — Glennerster & Takavarasha (a practical RCT field guide)
J-PAL & Innovations for Poverty Action (IPA) — policy briefs and evidence syntheses

Read the methods section of real studies critically — it is the fastest way to internalise the ideas in this deck.

The Takeaways

If you remember five things

Always ask 'compared to what?' — causation needs a counterfactual
Correlation is not causation — suspect selection and confounding first
Design beats adjustment — RCT, IV, DiD, RD, fixed effects each build a comparison
Every method has ONE load-bearing assumption — know it, and interrogate it
Read the standard errors and the robustness — precise is not the same as right

Where Next

Keep building

Econometrics is learned by doing. Take a real Indian dataset, pose a causal question, and ask which design its variation can support — then stress-test the assumption that design relies on.

Pair this deck with ImpactMojo's Data Literacy, Impact Evaluation and Exploratory Data Analysis 101 courses.

Econometrics 101 · Complete

Now go ask
'compared to what?'

Every credible causal claim rests on a counterfactual and one load-bearing assumption. You now know the five core designs and how to interrogate each one. Explore the rest of the ImpactMojo 101 Series, free forever.

CC BY-NC-ND 4.0·Free Forever·ImpactMojo 101 Series