Subject Pack . S4 . Interactive

Education Programme Evaluation

What "learning outcomes" actually means in the Indian context, how teacher practice connects to student outcomes, NEP 2020 evaluation opportunities, and the TaRL evidence pattern. Walk out with an education evaluation design brief.

4 modules . ~3 hours . Interactive . India-context
Your Capstone

Education Evaluation Design Brief

Walk in with an education programme. Walk out with an evaluation design brief -- learning outcome measurement, teacher-practice linkage, data plan, and analysis approach. Built automatically from your module answers.

Module 1 . ~25 min

What "learning outcomes" actually means

India has at least four different systems for measuring learning outcomes, and they do not agree with each other. Before designing an education evaluation, you must decide which definition of "learning" you are measuring, and why.

The four measurement systems

System | What it measures | Grades / ages | Strengths | Limitations
ASER | Foundational literacy and numeracy (can the child read a paragraph? do division?) | Ages 5-16 (household-based, not grade-based) | Simple, comparable across states/years, household sampling avoids school-selection bias | Floor-level measure; cannot differentiate above-basic proficiency
NAS | Curriculum-linked competencies across subjects | Grades 3, 5, 8, 10 | Government-administered, nationally representative, aligned to NCF | School-based (misses out-of-school children), item quality varies
Board exams | Subject mastery at secondary/higher secondary | Grades 10, 12 | High-stakes, institutionally embedded | Rote-focused, not comparable across boards, massive grade inflation
PISA (India participation from 2028) | Application of knowledge in reading, math, science | 15-year-olds | International comparability | India participated only once (2009, Himachal + Tamil Nadu); re-entering 2028

Choosing your outcome measure

Worked example

Pratham's Read India uses ASER-type assessments because their programme explicitly targets foundational literacy. The assessment takes 5 minutes per child, is administered one-on-one, and produces a clear categorical outcome (cannot read, letter level, word level, paragraph level, story level). This simplicity is a feature, not a limitation -- it matches the programme's theory of change precisely.
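Because an ASER-type result is an ordered category rather than a score, a natural summary is the share of children at or above a chosen level. The sketch below is illustrative only: the level names follow the worked example above, while the function names and the ten-child samples are hypothetical.

```python
from collections import Counter

# Reading levels in ascending order, as listed in the worked example.
LEVELS = ["cannot read", "letter", "word", "paragraph", "story"]

def level_shares(results):
    """Share of children at each level, reported in ascending level order."""
    counts = Counter(results)
    return {level: counts[level] / len(results) for level in LEVELS}

def share_at_or_above(results, threshold):
    """Share of children whose highest level is at or above `threshold`."""
    cutoff = LEVELS.index(threshold)
    return sum(LEVELS.index(r) >= cutoff for r in results) / len(results)

# Hypothetical data: highest level reached by the same 10 children.
baseline = ["cannot read", "cannot read", "letter", "letter", "letter",
            "word", "word", "word", "paragraph", "story"]
endline = ["letter", "letter", "word", "word", "paragraph",
           "paragraph", "paragraph", "story", "story", "story"]

base_share = share_at_or_above(baseline, "paragraph")
end_share = share_at_or_above(endline, "paragraph")
print(f"Paragraph or above: baseline {base_share:.0%}, endline {end_share:.0%}")
```

A rise between baseline and endline does not by itself separate the programme's contribution from normal schooling. That still requires a comparison group assessed with the same tool.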

Your Learning Outcome Definition

Fill these for your programme. Your answers flow into the capstone.

e.g., "FLN remedial programme, grades 1-3, 80 government schools, Madhya Pradesh"
How will you know the learning gain is due to your programme, not normal schooling?
Self-check
An ed-tech company reports: "Our app improved learning -- students completed 200% more modules." Why is this NOT a learning outcome finding?
(a) The sample size is too small
(b) Module completion measures engagement/usage, not whether the child actually learned the content
(c) The comparison group is missing
(d) The effect size is implausibly large
Answer: (b). Platform usage metrics (time-on-app, modules completed, levels cleared) are process/engagement measures, not learning outcomes. Learning outcomes require an assessment of what the child can actually do, administered independently of the platform.
Module 2 . ~30 min

Teacher practice vs student outcomes

Most education programmes in India work through teachers -- training them, providing materials, or restructuring classrooms. The evaluation question is whether changes in teacher practice lead to changes in student learning. This link is harder to establish than it appears.

The measurement gap

Teacher training programmes routinely measure: (a) training attendance, (b) teacher knowledge post-training, and (c) teacher satisfaction. None of these reliably predicts classroom behaviour change. The critical measure is what teachers actually do differently in the classroom after the training.

Classroom observation protocols

Protocol | What it captures | India use | Cost per observation
Stallings Snapshot | Time-on-task, activity type (10 snapshots of about 15 seconds each, spaced at regular intervals across a lesson) | World Bank India studies; robust for large samples | Rs 3,000-4,000
CLASS | Emotional support, classroom organisation, instructional support | Limited India use; needs extensive observer training | Rs 5,000-7,000
Teach (World Bank) | Time on task plus 9 elements of teaching quality; two 15-min observation segments | Growing India use (MP, Rajasthan studies) | Rs 3,500-5,000
Custom structured checklist | Programme-specific behaviours (e.g., "uses TLM," "groups students by level") | Most common in Indian NGO evaluations | Rs 2,000-3,000

The attribution chain

Teacher training changes teacher knowledge (maybe). Teacher knowledge changes teacher practice (sometimes). Teacher practice changes student learning (under specific conditions). Each link requires separate evidence. Most evaluations measure the first link and claim the third.

The Hawthorne problem

Teachers teach differently when observed. A single announced observation captures performance behaviour, not typical practice. Budget for at least two observations per teacher, at least one of them unannounced. Structured protocols such as the Stallings Snapshot address a separate problem: fixed-interval coding limits observer subjectivity, but it does not stop teachers from changing their behaviour when watched.
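One way to gauge how large the observation effect is in your own data is to compare each teacher's announced and unannounced visits. The sketch below assumes a hypothetical record layout (one row per visit, with a time-on-task share already computed from the observation protocol). The field names and values are invented for illustration.

```python
from statistics import mean

# Hypothetical visit records: share of coded snapshots spent on instruction.
observations = [
    {"teacher": "T1", "announced": True, "time_on_task": 0.80},
    {"teacher": "T1", "announced": False, "time_on_task": 0.60},
    {"teacher": "T2", "announced": True, "time_on_task": 0.70},
    {"teacher": "T2", "announced": False, "time_on_task": 0.65},
    {"teacher": "T3", "announced": True, "time_on_task": 0.75},
    {"teacher": "T3", "announced": False, "time_on_task": 0.50},
    {"teacher": "T4", "announced": True, "time_on_task": 0.90},  # no unannounced visit
]

def observation_gap(records):
    """Mean within-teacher gap: announced minus unannounced time-on-task.

    Teachers lacking either type of visit are skipped, because the gap
    is only defined when both visits exist for the same teacher.
    """
    by_teacher = {}
    for r in records:
        kind = "announced" if r["announced"] else "unannounced"
        by_teacher.setdefault(r["teacher"], {})[kind] = r["time_on_task"]
    gaps = [v["announced"] - v["unannounced"]
            for v in by_teacher.values()
            if "announced" in v and "unannounced" in v]
    return mean(gaps), len(gaps)

gap, n_teachers = observation_gap(observations)
print(f"Mean announced-minus-unannounced gap: {gap:.2f} (n = {n_teachers} teachers)")
```

A large positive gap suggests that announced visits overstate typical practice, in which case the unannounced visits are the safer basis for practice-change claims.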

Your Teacher Practice Measurement Plan

Design how you will measure teacher practice change. These flow into your capstone.

Self-check
After a teacher training programme, 95% of teachers report "high satisfaction" and post-test scores show a 30% knowledge gain. Can you conclude that classroom practice has changed?
(a) Yes -- knowledge gain predicts practice change
(b) No -- satisfaction and knowledge are necessary but not sufficient; classroom observation is needed to confirm practice change
(c) Yes, if the sample size is large enough
(d) Only if the training was more than 5 days
Answer: (b). Knowledge gain without classroom observation evidence is the most common over-claim in teacher training evaluations. Teachers may know the content but not change their practice. The World Bank's SABER-Teachers research shows that training duration and teacher knowledge are weakly correlated with actual practice change.
Module 3 . ~30 min

NEP 2020 evaluation opportunities

The National Education Policy 2020 is India's most ambitious education reform in decades. It creates both a demand for evaluation and a set of structural changes that evaluators must understand. Six years in (2026), implementation varies dramatically across states.

Key NEP reforms that create evaluation opportunities

State-level variation

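Education is on the Concurrent List, so the centre sets NEP policy while implementation is driven largely by the states. Kerala, Karnataka, and Madhya Pradesh are among the early movers. UP and Bihar lag on structural reforms but have active FLN missions. This variation creates natural experiments for evaluators: if two neighbouring states implement a reform at different times or intensities, and comparable pre-reform data exist for both, difference-in-differences designs become feasible. The design rests on the assumption that outcomes in the two states would have moved in parallel without the reform.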
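The arithmetic behind a two-state, two-period difference-in-differences estimate is short enough to sketch. Everything below is hypothetical: the district-level shares are invented, and a real analysis would add standard errors and a check of pre-reform trends.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, comparison_pre, comparison_post):
    """Change in the reform state minus change in the comparison state."""
    treated_change = mean(treated_post) - mean(treated_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return treated_change - comparison_change

# Hypothetical district-level shares of grade 3 children reading at grade level.
reform_state_pre = [0.30, 0.34, 0.26]
reform_state_post = [0.41, 0.46, 0.39]
neighbour_pre = [0.28, 0.31, 0.25]
neighbour_post = [0.33, 0.35, 0.31]

estimate = diff_in_diff(reform_state_pre, reform_state_post,
                        neighbour_pre, neighbour_post)
print(f"Difference-in-differences estimate: {estimate:.2f}")
```

The comparison state's change stands in for what would have happened in the reform state anyway, so the estimate is only as good as that assumption.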

Worked example

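Madhya Pradesh's CM-RISE schools: MP identified 9,600 government schools for intensive improvement under the CM-RISE banner. These schools receive additional resources, teacher training, and infrastructure. The remaining ~85,000 schools form a potential comparison pool. An evaluator could use a matched-comparison design (matching CM-RISE schools to non-CM-RISE schools on pre-intervention characteristics) to estimate the programme's effect on NAS scores. Two cautions apply. CM-RISE schools were selected rather than randomly assigned, so the estimate is credible only to the extent that the matching variables capture how schools were chosen. NAS also tests a sample of schools, so the evaluator must check that enough CM-RISE and comparison schools fall within that sample, or collect assessment data directly.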
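A matched-comparison design can be sketched as nearest-neighbour matching on standardised pre-intervention characteristics. The example below is a minimal illustration under stated assumptions: the five schools, the two covariates, and the outcome scores are all hypothetical, matching is done with replacement, and no real CM-RISE or NAS data are used.

```python
from math import dist
from statistics import mean, pstdev

COVARIATES = ["enrolment", "baseline"]  # hypothetical pre-intervention variables

# Hypothetical schools: `treated` marks programme schools.
schools = [
    {"id": "A", "treated": True, "enrolment": 400, "baseline": 50, "outcome": 58},
    {"id": "B", "treated": True, "enrolment": 200, "baseline": 40, "outcome": 47},
    {"id": "C", "treated": False, "enrolment": 410, "baseline": 49, "outcome": 53},
    {"id": "D", "treated": False, "enrolment": 190, "baseline": 41, "outcome": 44},
    {"id": "E", "treated": False, "enrolment": 300, "baseline": 45, "outcome": 48},
]

def standardise(rows, keys):
    """Map school id to covariates rescaled to mean 0, SD 1 across all schools."""
    centre = {k: mean([r[k] for r in rows]) for k in keys}
    spread = {k: pstdev([r[k] for r in rows]) for k in keys}  # assumes spread > 0
    return {r["id"]: [(r[k] - centre[k]) / spread[k] for k in keys] for r in rows}

def match_nearest(rows, keys=COVARIATES):
    """Pair each treated school with its closest comparison school."""
    z = standardise(rows, keys)
    pool = [r for r in rows if not r["treated"]]
    pairs = []
    for t in (r for r in rows if r["treated"]):
        nearest = min(pool, key=lambda c: dist(z[t["id"]], z[c["id"]]))
        pairs.append((t, nearest))
    return pairs

pairs = match_nearest(schools)
effect = mean([t["outcome"] - c["outcome"] for t, c in pairs])
print([(t["id"], c["id"]) for t, c in pairs], effect)
```

Standardising first keeps a large-valued covariate such as enrolment from dominating the distance. Matching only balances what is measured, so unobserved selection criteria remain a threat.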

Your NEP 2020 Evaluation Positioning

Position your evaluation relative to NEP reforms. These flow into your capstone.

Self-check
A state has implemented NIPUN Bharat's FLN mission in all government schools simultaneously. Can you use a randomised controlled trial to evaluate its effect?
(a) Yes -- randomly assign schools to treatment and control
(b) No -- universal rollout means no untreated group exists; use pre-post with historical comparison or cross-state comparison instead
(c) Yes -- randomly assign students within schools
(d) No -- RCTs cannot be used for government programmes
Answer: (b). When a policy is universally implemented, there is no within-state untreated group for randomisation. Your design options include: (a) pre-post comparison using ASER/NAS historical data, (b) cross-state comparison where implementation timing differs, or (c) regression discontinuity if eligibility has a cutoff.
Module 4 . ~25 min

The TaRL evidence pattern

Teaching at the Right Level (TaRL), developed by Pratham and evaluated in a series of randomised trials by J-PAL-affiliated researchers, is among the most rigorously evaluated and most widely replicated education interventions in low- and middle-income countries. Understanding its evidence pattern teaches you how a well-evaluated programme builds its evidence base over time.

The TaRL evaluation arc

  1. Proof of concept (2001-2005) -- Pratham's Balsakhi programme in Mumbai and Vadodara. Two RCTs by Banerjee, Cole, Duflo, and Linden. Effect: 0.14-0.28 SD on math and language. Established that non-formal, level-grouped instruction works.
  2. Government integration (2008-2014) -- Read India scaled through state governments. Evaluation showed that scale-up through existing government machinery reduced effects unless specific implementation conditions were met.
  3. Mechanism isolation (2015-2019) -- RCTs in Uttar Pradesh and Haryana isolated which components matter: level-grouping + dedicated instructional time + simple materials. Teacher training alone was insufficient.
  4. International replication (2016-present) -- TaRL replicated in Zambia, Botswana, Ghana, Côte d'Ivoire. Effects replicate when implementation fidelity is maintained. Pratham's CEO, Rukmini Banerji, received the 2021 Yidan Prize for Education Development.
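The effects in the arc above are reported in standard-deviation (SD) units, which lets results from different tests be compared. A minimal sketch of the calculation follows, assuming the convention of dividing the difference in means by the comparison group's SD. Other conventions (for example, a pooled SD) exist, and the scores here are hypothetical.

```python
from statistics import mean, stdev

def standardised_effect(treatment_scores, comparison_scores):
    """Difference in mean scores, expressed in comparison-group SD units."""
    difference = mean(treatment_scores) - mean(comparison_scores)
    return difference / stdev(comparison_scores)

# Hypothetical endline test scores (out of 100).
comparison = [30, 40, 50, 60, 70]
treatment = [33, 43, 53, 63, 73]

effect = standardised_effect(treatment, comparison)
print(f"Effect size: {effect:.2f} SD")
```

A three-point gain reads very differently depending on how spread out scores are, which is why SD units are preferred to raw score differences when comparing studies.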

What the TaRL evidence teaches about education evaluation

The replication lesson

TaRL shows that evidence accumulation is not linear. The first RCT proved the concept. Government scale-up evaluations showed what breaks. Component-isolation studies showed what matters. International replications showed what generalises. Design your evaluation as one node in this chain -- what specific evidence gap does it fill?

Your Evidence Positioning

Position your evaluation in the evidence landscape. These flow into your capstone.

Will you measure outcomes at endline only, or at 6/12 months post-intervention?
"This evaluation will tell us ___ and will NOT tell us ___."
Self-check
Your education programme shows a 0.4 SD effect on foundational numeracy at endline (immediately after a 45-day camp). Is this sufficient evidence for a government to adopt the model at scale?
(a) Yes -- 0.4 SD is a large effect in education
(b) No -- only RCT evidence counts for government adoption
(c) Not yet -- you need sustainability data (do gains persist at 6-12 months?) and implementation cost evidence before recommending scale-up
(d) Yes, if the sample size exceeds 1,000
Answer: (c). Short-burst education interventions often show impressive immediate effects that fade. TaRL evidence shows 30-50% fade-out within a year of camp-based models. Government scale-up decisions need: (a) sustainability data, (b) cost-per-child, and (c) implementation feasibility evidence alongside the effect size.
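The three quantities a scale-up decision needs can be put side by side with simple arithmetic. The sketch below starts from the 0.4 SD endline effect in the self-check. The follow-up effect, budget, and number of children are hypothetical figures chosen only to show the calculation.

```python
def fade_out_share(endline_effect, followup_effect):
    """Fraction of the endline effect that has disappeared by follow-up."""
    return 1 - followup_effect / endline_effect

def cost_per_child(total_cost, children_reached):
    """Average programme cost per child reached."""
    return total_cost / children_reached

endline_effect = 0.40        # SD, from the self-check scenario
followup_effect = 0.24       # SD at 12 months (hypothetical)
total_cost = 1_200_000       # Rs (hypothetical)
children_reached = 2_000     # hypothetical

fade = fade_out_share(endline_effect, followup_effect)
per_child = cost_per_child(total_cost, children_reached)
# Cost of each 0.1 SD of gain that is still present at follow-up.
cost_per_tenth_sd = per_child / (followup_effect / 0.1)

print(f"Fade-out: {fade:.0%}; cost per child: Rs {per_child:.0f}; "
      f"cost per sustained 0.1 SD: Rs {cost_per_tenth_sd:.0f}")
```

Dividing cost by the sustained effect rather than the endline effect gives decision-makers a more conservative figure, which is the relevant one for scale-up.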
Capstone

Your Education Evaluation Design Brief

You have completed the four modules. Click Build my brief to compile everything.

Done?

Before circulating this brief more widely, share it with a colleague who works in education. The most common blind spot is treating alignment to an assessment system as if it were learning outcome measurement.
