Subject Pack . S7 . Interactive

Public Policy Evaluation

Policy evaluation vs programme evaluation, political economy awareness in design, implementation tracking and regulatory impact, and welfare-effect estimation approaches. Walk out with a policy evaluation design brief.

4 modules . ~3 hours . Interactive . India-context
Your Capstone

Policy Evaluation Design Brief

Walk in with a public policy or government scheme. Walk out with an evaluation design brief covering the policy-programme distinction, political economy mapping, implementation tracking, and welfare estimation approach.

Module 1 . ~25 min

Policy evaluation vs programme evaluation

A programme is a bounded intervention with defined beneficiaries, a budget, and a start/end date. A policy is a rule, incentive structure, or institutional arrangement that shapes behaviour across an entire population. Evaluating the two requires different logics.

Key differences

Dimension | Programme evaluation | Policy evaluation
Scope | Bounded: specific geographies, beneficiaries | Universal or near-universal: affects entire populations
Counterfactual | Non-participants or comparison group | Often no untreated group (universal policies); use pre-post, cross-state, or regression discontinuity
Implementation | Controlled by implementing agency | Mediated through bureaucracy, politics, and state capacity
Timeline | 2-5 years typical | Effects unfold over decades; political cycles interrupt
Data | Primary collection usually feasible | Often relies on administrative data (NSSO, PLFS, Census, SECC)

Indian policy evaluation landscape

India has a growing culture of policy evaluation but it remains patchy. NITI Aayog's Development Monitoring and Evaluation Office (DMEO) commissions evaluations of centrally sponsored schemes. The Programme Evaluation Organisation (PEO) has been folded into DMEO. State-level evaluation capacity varies enormously -- Kerala and Tamil Nadu have strong traditions; many states have none.

Worked example

PM-KISAN (a cash transfer of Rs 6,000 per year to farmer families) was launched nationally in February 2019. Randomisation was not possible because the scheme is a universal entitlement. Evaluations have used: (a) PLFS panel data comparing pre- and post-implementation periods, (b) regression discontinuity at the 2-hectare cutoff (under the initial design, which capped eligibility by landholding), and (c) difference-in-differences across states with faster vs slower DBT rollout. Each design answers a different question and has different limitations.
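To make design (b) concrete, here is a minimal sketch of a sharp regression discontinuity at a landholding cutoff. It is an illustration under stated assumptions, not the method of any particular PM-KISAN evaluation: the data are simulated, the names `fit_line` and `rd_estimate` are hypothetical, the bandwidth is arbitrary, and households at or below the cutoff are assumed to be the eligible side.

```python
import random

def fit_line(xs, ys):
    """Closed-form OLS for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_estimate(landholding, outcome, cutoff=2.0, bandwidth=0.5):
    """Sharp RD: outcome jump at the cutoff, eligible side minus ineligible side.

    Households at or below the cutoff are treated as eligible (an assumption
    about how the cap was applied). A separate line is fitted on each side
    within the bandwidth, with landholding centred at the cutoff so that each
    intercept is the predicted outcome at the cutoff itself.
    """
    eligible = [(x - cutoff, y) for x, y in zip(landholding, outcome)
                if cutoff - bandwidth <= x <= cutoff]
    ineligible = [(x - cutoff, y) for x, y in zip(landholding, outcome)
                  if cutoff < x <= cutoff + bandwidth]
    at_cutoff_eligible, _ = fit_line(*zip(*eligible))
    at_cutoff_ineligible, _ = fit_line(*zip(*ineligible))
    return at_cutoff_eligible - at_cutoff_ineligible

# Simulated data only: a jump of 2.0 units is built in at 2 hectares.
random.seed(7)
land = [random.uniform(0.0, 4.0) for _ in range(4000)]
spend = [10 + 1.5 * x + (2.0 if x <= 2.0 else 0.0) + random.gauss(0, 1)
         for x in land]
print(round(rd_estimate(land, spend), 2))  # should land near the built-in jump
```

The estimate only speaks to households close to the cutoff, and it is credible only if landholding records cannot be manipulated around 2 hectares. That is one of the "different limitations" the worked example refers to.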

Your Policy Scoping

Fill these for your policy/scheme. Answers flow into the capstone.

e.g., "PM-KISAN national cash transfer to farmer families, evaluating effect on input use and yields"
Who or what is the comparison, given universal coverage?
Self-check
The Right to Education Act (2009) mandated free education for all children aged 6-14. How would you design an evaluation of RTE's effect on learning outcomes?
(a) RCT with treatment and control schools
(b) Pre-post comparison using ASER data from 2006-2014, comparing trends across states with different implementation timing and intensity
(c) Survey of teachers asking if RTE improved learning
(d) Cannot be evaluated because it is universal
Answer: (b). Universal policies like RTE require pre-post or cross-state comparison designs. ASER provides comparable learning data from 2005 onwards, collected annually through 2014 -- a rare asset for policy evaluation. Variation in RTE implementation timing across states creates a natural experiment.
Module 2 . ~30 min

Political economy awareness in design

Policy evaluation is never politically neutral. The findings will be used, misused, or ignored depending on who commissioned the evaluation, when in the political cycle it is released, and which stakeholders benefit from the current policy.

The four political economy questions

The evaluator's independence

DMEO guidelines require evaluations to be conducted by "independent" agencies. In practice, agencies that produce unfavourable findings are less likely to receive future contracts, and this structural incentive biases evaluations towards positive findings. Document your quality assurance process explicitly: peer review by named experts, a pre-registered analysis plan, and data deposited in a public repository.

Your Political Economy Mapping

Map the political economy. These flow into your capstone.

Pre-registration, peer review, data deposit, etc.
Self-check
You are commissioned to evaluate a state government's flagship nutrition programme 6 months before state elections. What is the most important design decision?
(a) Use the most rigorous method possible (RCT)
(b) Pre-register the analysis plan and methodology before data collection, so findings cannot be selectively reported under political pressure
(c) Delay the evaluation until after elections
(d) Only report positive findings to maintain access
Answer: (b). Pre-registration protects both the evaluator and the funder. When the analysis plan is public before data collection, selective reporting becomes visible. This is the strongest independence safeguard available.
Module 3 . ~30 min

Implementation tracking + regulatory impact

Most policies fail not because the design is wrong but because implementation breaks down. Lant Pritchett's concept of "capability traps" explains why: India announces ambitious policies that exceed the implementation capacity of the state machinery. Implementation tracking is therefore often the most useful form of policy evaluation for improving outcomes.

What to track

Worked example

MGNREGA implementation tracking: The most informative evaluations of MGNREGA focus not on "did it reduce poverty?" (too distal, too many confounders) but on implementation quality: days provided vs days demanded (demand gap), wage payment delays (measured in days between muster roll closing and DBT credit), social audit coverage, and material-labour ratio. These process metrics directly diagnose implementation health and are actionable at the block level.
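The process metrics above reduce to simple arithmetic once work-spell records are available. The following sketch computes the demand gap and payment delays for one block. Everything in it is illustrative: the `WorkRecord` fields are hypothetical and do not mirror the MGNREGA MIS schema, the three records are made up, and the 15-day delay threshold is a parameter chosen for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class WorkRecord:
    """One household work spell (hypothetical fields, not the MIS schema)."""
    block: str
    days_demanded: int
    days_provided: int
    muster_closed: date   # date the muster roll was closed
    wage_credited: date   # date the DBT payment reached the account

def demand_gap(records):
    """Share of demanded person-days that were not provided."""
    demanded = sum(r.days_demanded for r in records)
    provided = sum(r.days_provided for r in records)
    return (demanded - provided) / demanded

def payment_delays(records):
    """Days between muster roll closing and DBT credit, one value per record."""
    return [(r.wage_credited - r.muster_closed).days for r in records]

def share_delayed(records, threshold_days=15):
    """Share of payments that took longer than threshold_days (assumed cutoff)."""
    delays = payment_delays(records)
    return sum(d > threshold_days for d in delays) / len(delays)

# Made-up records for a single block, for illustration only.
records = [
    WorkRecord("Block A", 20, 14, date(2023, 4, 7), date(2023, 4, 18)),
    WorkRecord("Block A", 30, 30, date(2023, 4, 14), date(2023, 5, 20)),
    WorkRecord("Block A", 10, 6, date(2023, 4, 21), date(2023, 5, 2)),
]

print(f"Demand gap: {demand_gap(records):.1%}")
print(f"Payment delays (days): {payment_delays(records)}")
print(f"Share delayed beyond 15 days: {share_delayed(records):.1%}")
```

Grouping the same calculations by block and month gives the kind of actionable, block-level picture the worked example describes.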

Your Implementation Tracking Design

Design the implementation analysis. These flow into your capstone.

e.g., DBT data, PFMS, MGNREGA MIS, PM-KISAN portal, SECC
Self-check
An evaluation of PM-KISAN finds: "The scheme has no significant effect on agricultural productivity." The government dismisses the finding. What might the evaluation have missed?
(a) The sample size was too small
(b) Implementation tracking -- if only 60% of eligible farmers actually receive payments, and payments are delayed by 3+ months, the "treatment" is not what the policy intended; the evaluation measured a diluted version
(c) The comparison group was poorly selected
(d) Agricultural productivity is not the right outcome
Answer: (b). Evaluating a policy's intent without measuring its implementation is evaluating a theory, not a reality. Implementation tracking (coverage, timeliness, amount) should always accompany impact estimation. A null effect may reflect implementation failure, not design failure.
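The dilution in this self-check can be put in numbers. The sketch below assumes that eligible farmers who receive no payment experience no effect, so the average effect across all eligible farmers (the intention-to-treat effect) is the effect on recipients scaled by coverage. The function names and the 5-point effect are hypothetical; only the 60% coverage figure comes from the scenario above. Payment delays are not modelled.

```python
def intention_to_treat(effect_on_recipients, coverage):
    """Average effect across all eligible units when only a share receive it.

    Assumes non-recipients experience zero effect (no spillovers).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return coverage * effect_on_recipients

def implied_effect_on_recipients(itt, coverage):
    """Scale an intention-to-treat estimate back up by coverage."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    return itt / coverage

# Hypothetical: payments raise input spending by 5 percentage points for
# farmers who actually receive them, but only 60% of eligible farmers do.
diluted = intention_to_treat(5.0, 0.60)
print(f"Effect averaged over all eligible farmers: {diluted:.1f} points")
print(f"Implied effect on recipients: {implied_effect_on_recipients(diluted, 0.60):.1f} points")
```

A study powered to detect the full effect may fail to detect the diluted one, which is why coverage figures belong next to every impact estimate.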
Module 4 . ~25 min

Welfare-effect estimation approaches

When you need to estimate the causal effect of a policy on welfare outcomes, you need a credible identification strategy. India's policy landscape offers several natural experiment opportunities.

Identification strategies for Indian policies

Strategy | When it works | Indian example
Regression discontinuity | Eligibility has a sharp cutoff (BPL score, age, income) | PMAY-G eligibility based on SECC deprivation score
Difference-in-differences | Policy rolled out at different times across states/districts | MGNREGA phased rollout (2006-2008) across districts
Instrumental variables | An instrument affects policy exposure but not outcomes directly | Rainfall as instrument for MGNREGA take-up (demand rises after drought)
Synthetic control | A single treated unit (one state) with many potential comparison units | Gujarat's industrial policy effect compared to synthetic Gujarat

Administrative data for welfare estimation

India's ecosystem of survey and administrative data has expanded enormously: NSSO/PLFS (labour), NFHS (health/nutrition), AIDIS (debt/assets), ASI (industry), UDISE+ (education). These datasets are free, national in coverage, and increasingly available as microdata. Use them.

The general equilibrium problem

Large-scale policies create general equilibrium effects that programme evaluations miss. MGNREGA raises rural wages even for non-participants. GST changes relative prices across sectors. PM-KISAN may affect land prices. These spillover effects can be larger than the direct effects, and capturing them requires market-level or macro-level analysis rather than household surveys alone.

Your Welfare Estimation Plan

Design the estimation strategy. These flow into your capstone.

Self-check
MGNREGA was rolled out in three phases: Phase 1 (200 poorest districts, Feb 2006), Phase 2 (130 districts, 2007), Phase 3 (remaining districts, 2008). What identification strategy does this phased rollout enable?
(a) Difference-in-differences -- Phase 1 districts as treatment, Phase 3 districts as control during 2006-2008, then Phase 3 becomes treated
(b) Regression discontinuity at the poverty cutoff
(c) Randomised controlled trial across phases
(d) Synthetic control for each phase
Answer: (a). MGNREGA's phased rollout is the most-used natural experiment in Indian policy evaluation. Phase 1 districts received the programme 2 years before Phase 3, creating a treatment-control window. Dozens of papers use this design (Imbert & Papp 2015, Zimmermann 2020, etc.).
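The phased-rollout comparison boils down to a two-by-two calculation: the change in Phase 1 districts minus the change in Phase 3 districts over the same window. The sketch below uses made-up district wage figures and a hypothetical `did_estimate` helper. It is the simplest version of the design, without the district fixed effects, covariates, or clustered standard errors a real analysis would need.

```python
from statistics import mean

def did_estimate(rows):
    """Two-by-two difference-in-differences.

    rows: list of (treated, post, outcome) tuples, where treated marks
    early-rollout (Phase 1) districts and post marks observations taken
    after Phase 1 started but before Phase 3 was covered.
    """
    def cell_mean(treated, post):
        return mean(y for t, p, y in rows if t == treated and p == post)

    change_treated = cell_mean(True, True) - cell_mean(True, False)
    change_control = cell_mean(False, True) - cell_mean(False, False)
    return change_treated - change_control

# Made-up daily casual wages (Rs) for illustration only.
rows = [
    (True, False, 50.0), (True, False, 52.0),    # Phase 1 districts, before
    (True, True, 58.0), (True, True, 60.0),      # Phase 1 districts, after
    (False, False, 60.0), (False, False, 62.0),  # Phase 3 districts, before
    (False, True, 63.0), (False, True, 65.0),    # Phase 3 districts, after
]

print(f"DiD estimate: {did_estimate(rows):.1f} Rs per day")
```

Phase 1 districts were selected because they were the poorest, so their wage levels differ from Phase 3 districts by construction. The design does not need equal levels, but it does need the two groups to have followed parallel trends in the absence of the programme, and that assumption should be checked against pre-2006 data.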
Capstone

Your Policy Evaluation Design Brief


Share this brief with a policy researcher before circulating. The most common blind spot is ignoring implementation quality when interpreting impact estimates.
