Subject Pack . S9 . Interactive

Governance Reform Evaluation

Institutional change measurement, citizen-experience metrics, before-after with structural break design, and the political economy of evaluation findings. Walk out with a governance evaluation design brief.

4 modules . ~3 hours . Interactive . India-context

Your Capstone

Governance Evaluation Design Brief

Walk in with a governance reform initiative. Walk out with an evaluation design brief covering institutional change measurement, citizen-experience metrics, research design, and political economy mapping.

Module 1 . ~25 min

Institutional change -- what can be measured

Governance reform aims to change how institutions function -- making them more transparent, accountable, responsive, or efficient. These qualities are inherently hard to measure because institutions are complex systems, not bounded interventions.

Four dimensions of institutional change

Dimension	What it captures	Example indicators	Data source
Rules	Formal rule changes (laws, orders, SOPs)	New SOP adopted, RTI compliance rate, file disposal timeline	Administrative records, gazette notifications
Practices	How rules are actually implemented	Average service delivery time, absenteeism rates, meeting frequency	Mystery client visits, direct observation, administrative data
Norms	Informal expectations and culture	Corruption perception, trust in institution, perceived responsiveness	Citizen surveys, staff surveys
Outcomes	End results for citizens	Service access, grievance resolution, satisfaction	Citizen surveys, service delivery data

Most governance evaluations measure rules and claim outcomes. The gap between "the SOP was changed" (rules) and "citizens experienced faster service" (outcomes) is where governance reforms succeed or fail. Measuring practices is the critical middle layer.

Worked example

Rajasthan's Bhamashah Yojana digitised social protection benefit delivery. The evaluation measured: (a) rules -- database created, Aadhaar-linked accounts opened (output), (b) practices -- time from application to benefit receipt, number of trips to office (measured via citizen survey), (c) norms -- corruption perception around benefit delivery (pre/post survey), (d) outcomes -- inclusion errors and exclusion errors in benefit targeting. The most impactful finding was that digitisation reduced the number of citizen-office trips from 4.2 to 1.8 on average, but exclusion errors increased for households without Aadhaar.
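
The targeting measures in (d) can be made concrete with a small calculation. The sketch below assumes one common convention, which is not stated in the evaluation itself: exclusion error is the share of eligible households that do not receive the benefit, and inclusion error is the share of recipients that are not eligible. The `Household` fields and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Household:
    eligible: bool      # meets the scheme's eligibility criteria
    receives: bool      # actually receives the benefit
    has_aadhaar: bool   # holds an Aadhaar number


def exclusion_error(households):
    """Share of eligible households that do not receive the benefit."""
    eligible = [h for h in households if h.eligible]
    return sum(not h.receives for h in eligible) / len(eligible)


def inclusion_error(households):
    """Share of benefit recipients that are not eligible."""
    recipients = [h for h in households if h.receives]
    return sum(not h.eligible for h in recipients) / len(recipients)


# Hypothetical survey records, not data from the Bhamashah evaluation.
sample = [
    Household(True, True, True),
    Household(True, True, True),
    Household(True, False, False),
    Household(True, False, False),
    Household(False, True, True),
    Household(False, False, True),
]

overall_exclusion = exclusion_error(sample)
overall_inclusion = inclusion_error(sample)

# Disaggregating by Aadhaar status shows which group bears the exclusion error.
no_aadhaar_exclusion = exclusion_error([h for h in sample if not h.has_aadhaar])
with_aadhaar_exclusion = exclusion_error([h for h in sample if h.has_aadhaar])
```

Reporting the disaggregated rates alongside the overall rate is what surfaces a pattern like the one in the worked example, where the average improves while one group is left out.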

Your Institutional Change Framing

Map the four dimensions for your reform. Answers flow into the capstone.

Governance reform name and context (e.g., "E-governance reform in land records (Bhoomi-type), 3 districts, Karnataka")

Rules dimension: what formal changes occurred?

Practices dimension: what observable behaviours should change?

Norms dimension: what cultural/informal changes are expected?

Outcomes dimension: what end results for citizens?

Self-check

A state government digitises its Public Distribution System (PDS). The evaluation reports: "100% of fair price shops now have electronic Point of Sale devices." Is this an outcome finding?

Yes -- full digitisation is the outcome

No -- device installation is a rules/infrastructure change (output); outcomes would be reduced diversion, fewer ghost beneficiaries, or faster service for citizens

Yes, if the devices are functional

Only if compared to a pre-digitisation baseline

Answer: the second option. Installing technology is infrastructure, not reform. The evaluation must measure what changed for citizens (service speed, accuracy, leakage reduction) and for the institution (transparency, accountability). Many e-governance evaluations stop at installation counts.

Module 2 . ~30 min

Citizen-experience metrics

Governance exists to serve citizens. The most credible governance evaluation evidence comes from measuring citizen experience directly -- not from administrative self-reports or expert scorecards.

Three approaches to citizen experience measurement

Citizen Report Cards (CRC) -- pioneered by the Public Affairs Centre, Bangalore. Large-sample surveys of citizens rating government services on accessibility, reliability, quality, responsiveness, and corruption. The Bangalore CRC has been running since 1994 and has demonstrably improved service delivery through public pressure.
Community Scorecards -- participatory tool where community members rate local service providers (PHC, school, panchayat office) and providers respond. More process-oriented; produces dialogue, not just data.
Mystery client / simulated citizen -- trained researchers visit government offices posing as citizens to measure actual service quality, wait times, and bribe demands. This is the most rigorous measure of frontline practice, but it is ethically and operationally complex.

Indian governance data sources

Source	What it provides	Frequency
Service delivery surveys (World Bank)	Provider absenteeism, infrastructure, service quality	Periodic (2003, 2010, 2019)
Governance Performance Index (NITI Aayog)	State-level governance quality across sectors	Irregular
DISHA dashboards	District-level scheme implementation data	Real-time
RTI compliance data	Response rates, timelines, appeals	Annual (CIC annual report)
CPGRAMS	Grievance registration and disposal	Real-time

The social desirability problem

Citizens in India often under-report negative experiences with government services, especially to unknown surveyors, because they fear retaliation or believe that reporting is futile. Use indirect questioning techniques: "Some people in this area say that obtaining a caste certificate takes 3 trips to the office and costs Rs 500 in unofficial fees. In your experience, is this more or less than what it actually takes?" This indirect framing tends to yield more honest responses than "Did you pay a bribe?"

Your Citizen Experience Measurement

Design the citizen-side measurement. These flow into your capstone.

Primary citizen experience method

Citizen Report Card -- large-sample survey
Community Scorecard -- participatory process
Mystery client -- simulated citizen visits
Mixed approach

Service quality dimensions you will measure

Citizen sample design

How will you handle social desirability bias?

Administrative data you will triangulate with

Self-check

You are evaluating a one-stop-centre reform in district offices. The government reports "average service time reduced from 7 days to 2 days." Your citizen survey finds "average service time is 5 days." Why might these differ?

The citizen survey sample is biased

The government measures from application receipt to file closure; citizens measure from first visit (including document preparation, re-visits for corrections) to actual receipt of service -- the citizen journey is longer than the administrative process

The government data is falsified

Citizens remember incorrectly

Answer: the second option. The citizen journey includes steps that administrative data does not capture: gathering documents, making initial inquiries, returning for corrections, waiting for notifications. Both measures are valid; they answer different questions. The evaluation should report both and explain the gap.
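
The two measures can be computed side by side from the same journey record. The sketch below is illustrative only: the event names and dates are hypothetical, chosen so that the administrative clock and the citizen clock reproduce the 2-day and 5-day figures in the question.

```python
from datetime import date

# Hypothetical journey for one applicant; event names are illustrative.
journey = {
    "first_visit": date(2024, 3, 1),           # citizen's first trip to the office
    "application_accepted": date(2024, 3, 3),  # accepted after a re-visit for corrections
    "file_closed": date(2024, 3, 5),           # administrative process ends
    "service_received": date(2024, 3, 6),      # citizen actually receives the service
}


def administrative_days(j):
    """Service time as the office records it: acceptance to file closure."""
    return (j["file_closed"] - j["application_accepted"]).days


def citizen_days(j):
    """Service time as the citizen experiences it: first visit to receipt."""
    return (j["service_received"] - j["first_visit"]).days


admin_time = administrative_days(journey)
citizen_time = citizen_days(journey)
unrecorded_gap = citizen_time - admin_time  # days invisible to administrative data
```

Capturing the first-visit and receipt dates in the citizen survey is what makes the gap reportable; administrative records alone only contain the middle two events.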

Module 3 . ~30 min

Before-after with structural break design

Most governance reforms are implemented universally within a jurisdiction. There is no control group. The most feasible design is before-after with structural break analysis -- testing whether the reform corresponds to a detectable change in the trend of governance outcomes.

How structural break designs work

Collect time-series data -- at least 8-10 pre-reform data points and 4-6 post-reform points. For governance, monthly or quarterly administrative data often provides this (e.g., RTI responses per month, grievance disposal rates, service delivery times from DISHA).
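Test for structural break -- use a Chow test when the break date is known in advance (the reform date), or a Bai-Perron test when break dates are unknown or there may be more than one, to determine whether the reform date corresponds to a statistically significant break in the trend.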
Control for confounders -- other events that coincided with the reform (elections, budget changes, personnel transfers). Document these explicitly.
Supplement with qualitative evidence -- interviews with officials and citizens to explain the mechanism. The quantitative analysis shows when things changed; the qualitative evidence shows why.
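
The break test in the second step can be sketched in a few lines. What follows is a minimal pure-Python Chow test for a known break date, assuming a simple linear trend in each segment. The function names and the monthly disposal-rate series are hypothetical. The classical Chow test also assumes independent errors with equal variance, which monthly administrative series may violate, so treat the result as indicative rather than conclusive.

```python
def fit_line_rss(t, y):
    """Residual sum of squares from an OLS fit of y = a + b * t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))


def chow_f(y, break_index):
    """Chow F statistic for a known break; y[:break_index] is pre-reform."""
    k = 2  # parameters per segment: intercept and slope
    t = list(range(len(y)))
    rss_pooled = fit_line_rss(t, y)
    rss_split = (fit_line_rss(t[:break_index], y[:break_index])
                 + fit_line_rss(t[break_index:], y[break_index:]))
    df_denominator = len(y) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / df_denominator)


# Hypothetical monthly grievance disposal rates (%): 10 pre-reform, 6 post-reform.
pre_reform = [60, 62, 59, 61, 63, 60, 62, 61, 63, 62]
post_reform = [74, 76, 75, 77, 76, 78]
series = pre_reform + post_reform

f_stat = chow_f(series, break_index=len(pre_reform))
# Compare f_stat with the F(2, n - 4) critical value; for n = 16 the 5% value is about 3.89.
```

With very short series the denominator degrees of freedom (n - 4 here) shrink towards zero, which is one way to see why the minimum number of pre- and post-reform points matters.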

Worked example

Andhra Pradesh's Mee Seva (citizen service centres) reform was evaluated using monthly data on service transactions from 2010-2016. The structural break test showed a significant increase in transaction volume at the Mee Seva launch date (2011), with a secondary break when rural centres opened (2013). Qualitative interviews with citizens confirmed that the primary driver was reduced travel cost (service available locally) rather than faster processing (where the improvement was marginal).

Your Research Design

Design the before-after analysis. These flow into your capstone.

Time-series data available (what indicator, how many pre/post points)

Reform implementation date (or phased dates)

Confounding events around the reform date

Qualitative component to explain the mechanism

Alternative design if time-series data is insufficient (e.g., cross-jurisdiction comparison, matched district design)

Self-check

You have 3 months of pre-reform data and 3 months of post-reform data on grievance disposal rates. Is this sufficient for a structural break analysis?

Yes -- 6 data points is enough

No -- structural break tests require at least 8-10 pre-reform points to establish the pre-existing trend; with only 3, you cannot distinguish reform effects from normal variation

Yes, if you use weekly data instead of monthly

Depends on the effect size

Answer: the second option. With only 3 pre-reform data points, you have no baseline trend to compare against. The post-reform data could simply reflect normal month-to-month variation. You need a longer time series or an alternative design (cross-jurisdiction comparison, for instance).

Module 4 . ~25 min

Political economy of evaluation findings

Governance evaluations are uniquely political. Unlike health or education evaluations, governance evaluations directly assess the performance of the political-administrative system. The people being evaluated are often the people who commissioned the evaluation.

Three political economy traps

The showcase trap -- the reform is implemented in one or two model districts where the collector is supportive. The evaluation covers these districts. Findings are positive. The government scales up. But model-district performance does not replicate because the enabling conditions (strong collector, extra budget, political attention) do not scale.
The attribution trap -- a new government claims credit for improvements that began under the previous government. Or the reverse: a new government discredits the previous government's reform. Evaluators must be clear about timelines and trends, not just snapshot comparisons.
The messenger trap -- the evaluator finds that the reform worsened outcomes for certain groups (e.g., digitisation excluded those without smartphones). Reporting this honestly risks losing future contracts or access. Pre-commit to transparency: pre-register, use independent peer review, share data publicly.

The "good enough" governance standard

Not all governance reforms need to reach ideal standards to be valuable. Merilee Grindle's concept of "good enough governance" suggests evaluating reforms against locally achievable benchmarks, not global best practice. If a district office in Jharkhand reduces service time from 30 days to 10 days, that is meaningful progress even if the Kerala benchmark is 3 days. Frame findings as progress relative to the starting point, not just distance from the ideal.
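
The two framings can be reported together. The sketch below uses the figures from the paragraph above (30 days, 10 days, 3 days); the function name and the "share of gap closed" measure are illustrative additions, not a standard drawn from the source.

```python
def progress_summary(baseline_days, current_days, benchmark_days):
    """Express one result as progress from baseline and as distance from a benchmark."""
    reduction = baseline_days - current_days
    return {
        # Progress relative to the starting point
        "pct_reduction_from_baseline": 100 * reduction / baseline_days,
        # Distance from the ideal
        "days_above_benchmark": current_days - benchmark_days,
        # How much of the baseline-to-benchmark gap has been closed
        "pct_of_gap_closed": 100 * reduction / (baseline_days - benchmark_days),
    }


summary = progress_summary(baseline_days=30, current_days=10, benchmark_days=3)
```

Here the office has cut service time by about two thirds and closed roughly three quarters of the gap to the benchmark, while still sitting 7 days above it. Presenting all three numbers avoids both overclaiming and dismissing real progress.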

Your Political Economy Mapping

Map the political economy of your evaluation. These flow into your capstone.

Key political stakeholders and their interests in the findings

Is there a showcase-district risk? How will you sample to avoid it?

Your independence safeguards

How will you frame findings for maximum policy utility?

Your honesty-test sentence

Self-check

The state government asks you to evaluate their e-governance reform. They suggest you study the 3 best-performing districts where the reform was piloted with extra support. What is the risk?

No risk -- best-performing districts show the reform's potential

Showcase-district bias -- findings will not represent typical implementation; insist on including average and below-average districts in the sample to show the full distribution of implementation quality

The sample is too small (only 3 districts)

The government is trying to influence the findings

Answer: the second option. Evaluation of showcase districts produces evidence about what is possible, not what is typical. For policy decisions about scaling, the government needs to know how the reform performs under normal conditions -- average bureaucratic capacity, standard budgets, typical political attention levels.
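
One way to build average and below-average districts into the sample is to stratify on a pre-reform performance measure and draw from each stratum. The sketch below is a minimal illustration: the district labels, the scores, and the choice of three equal strata are all hypothetical.

```python
import random


def stratified_district_sample(scores, per_stratum, seed=0):
    """Draw districts from the bottom, middle, and top thirds of a performance ranking."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    n = len(ranked)
    strata = {
        "below_average": ranked[: n // 3],
        "average": ranked[n // 3 : 2 * n // 3],
        "above_average": ranked[2 * n // 3 :],
    }
    rng = random.Random(seed)  # fixed seed so the draw can be documented and repeated
    return strata, {name: sorted(rng.sample(group, per_stratum))
                    for name, group in strata.items()}


# Hypothetical pre-reform performance scores for nine districts.
baseline_scores = {
    "D01": 42, "D02": 55, "D03": 61, "D04": 38, "D05": 70,
    "D06": 49, "D07": 66, "D08": 58, "D09": 45,
}

strata, selected = stratified_district_sample(baseline_scores, per_stratum=2)
```

Stratifying on a pre-reform score, rather than on post-reform results, keeps the sampling frame independent of the outcome being evaluated. Documenting the seed and the ranking lets reviewers confirm that districts were not hand-picked.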

Capstone

Your Governance Evaluation Design Brief

Click Build my brief to compile everything.

Done?

Share this brief with a governance specialist or an IAS/IPS officer before circulating. The most common blind spot is assuming that technology deployment equals institutional reform.
