Subject Pack . S4 . Interactive

Education Programme Evaluation

What "learning outcomes" actually means in the Indian context, how teacher practice connects to student outcomes, NEP 2020 evaluation opportunities, and the TaRL evidence pattern. Walk out with an education evaluation design brief.

4 modules . ~3 hours . Interactive . India-context

Your Capstone

Education Evaluation Design Brief

Walk in with an education programme. Walk out with an evaluation design brief -- learning outcome measurement, teacher-practice linkage, data plan, and analysis approach. Built automatically from your module answers.

Module 1 . ~25 min

What "learning outcomes" actually means

India has at least four different systems for measuring learning outcomes, and they do not agree with each other. Before designing an education evaluation, you must decide which definition of "learning" you are measuring, and why.

The four measurement systems

System	What it measures	Grades	Strengths	Limitations
ASER	Foundational literacy and numeracy (can the child read a paragraph? do division?)	Children aged 5-16, any grade (household-based)	Simple, comparable across states/years, household-based avoids school-selection bias	Floor-level measure; cannot differentiate above-basic proficiency
NAS	Curriculum-linked competencies across subjects	3, 5, 8, 10	Government-administered, nationally representative, aligned to NCF	School-based (misses out-of-school children), item quality varies
Board exams	Subject mastery at secondary/higher secondary	10, 12	High-stakes, institutionally embedded	Rote-focused, not comparable across boards, massive grade inflation
PISA (India participation from 2028)	Application of knowledge in reading, math, science	15-year-olds	International comparability	India participated only once (2009, Himachal + Tamil Nadu); re-entering 2028

Choosing your outcome measure

Foundational literacy/numeracy programme (FLN)? ASER-type tools are the default. NIPUN Bharat's FLN assessment framework aligns to this. Use if your programme targets grades 1-3 under the FLN mission.
Curriculum-enrichment programme? Align to NAS competencies for the relevant grade. NAS runs only about once every three years, so its cycle is unlikely to match your baseline and endline -- you will need to build your own assessment aligned to NAS competencies.
21st-century skills or SEL programme? No government assessment captures this. You need a custom instrument. See the SEL Evaluation Practice Pack.
Ed-tech programme? Platform usage data is not a learning outcome. Measure actual learning with a test administered offline.

Worked example

Pratham's Read India uses ASER-type assessments because its programme explicitly targets foundational literacy. The assessment takes 5 minutes per child, is administered one-on-one, and produces a clear categorical outcome (cannot read, letter level, word level, paragraph level, story level). This simplicity is a feature, not a limitation -- it matches the programme's theory of change precisely.
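The categorical outcome described above can be summarised as a distribution across levels, or as the share of children at or above a chosen level. The sketch below is illustrative only: the level labels follow the worked example, while the function names and the sample data are hypothetical and not taken from any Pratham or ASER tool.

```python
from collections import Counter

# Ordinal reading levels from the worked example, lowest to highest.
LEVELS = ["cannot read", "letter", "word", "paragraph", "story"]

def level_shares(results):
    """Return the share of children at each reading level."""
    counts = Counter(results)
    total = len(results)
    return {level: counts.get(level, 0) / total for level in LEVELS}

def share_at_or_above(results, threshold):
    """Share of children at or above a given level, e.g. 'paragraph'."""
    cutoff = LEVELS.index(threshold)
    return sum(LEVELS.index(r) >= cutoff for r in results) / len(results)

# Hypothetical data: one entry per child, not real programme results.
baseline = ["cannot read", "letter", "letter", "word",
            "word", "paragraph", "letter", "story"]
endline = ["letter", "word", "word", "paragraph",
           "paragraph", "story", "word", "story"]

print(level_shares(baseline))
print(share_at_or_above(baseline, "paragraph"))
print(share_at_or_above(endline, "paragraph"))
```

Reporting the full distribution alongside a single threshold keeps movement at the bottom (for example, from "cannot read" to "letter") visible, which a threshold alone would hide.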

Your Learning Outcome Definition

Fill these for your programme. Your answers flow into the capstone.

Programme name and context (e.g., "FLN remedial programme, grades 1-3, 80 government schools, Madhya Pradesh")

Which measurement system aligns to your programme?

ASER-type -- foundational literacy/numeracy
NAS-aligned -- curriculum competencies
Custom assessment -- 21st century skills, SEL, critical thinking
Platform data + offline test -- ed-tech

Specific learning outcome you will measure (one sentence)

Assessment instrument you will use or build

Comparison/counterfactual strategy: how will you know the learning gain is due to your programme, not normal schooling?

Self-check

An ed-tech company reports: "Our app improved learning -- students completed 200% more modules." Why is this NOT a learning outcome finding?

A. The sample size is too small
B. Module completion measures engagement/usage, not whether the child actually learned the content
C. The comparison group is missing
D. The effect size is implausibly large

Answer: B. Platform usage metrics (time-on-app, modules completed, levels cleared) are process/engagement measures, not learning outcomes. Learning outcomes require an assessment of what the child can actually do, administered independently of the platform.

Module 2 . ~30 min

Teacher practice vs student outcomes

Most education programmes in India work through teachers -- training them, providing materials, or restructuring classrooms. The evaluation question is whether changes in teacher practice lead to changes in student learning. This link is harder to establish than it appears.

The measurement gap

Teacher training programmes routinely measure: (a) training attendance, (b) teacher knowledge post-training, and (c) teacher satisfaction. None of these reliably predicts classroom behaviour change. The critical measure is what teachers actually do differently in the classroom after the training.

Classroom observation protocols

Protocol	What it captures	India use	Cost per observation
Stallings Snapshot	Time-on-task, activity type (10 snapshots of about 15 seconds each, spaced across the lesson)	World Bank India studies; robust for large samples	Rs 3,000-4,000
CLASS	Emotional support, classroom organisation, instructional support	Limited India use; needs extensive observer training	Rs 5,000-7,000
Teach (World Bank)	10 elements of effective teaching; 20-min observation	Growing India use (MP, Rajasthan studies)	Rs 3,500-5,000
Custom structured checklist	Programme-specific behaviours (e.g., "uses TLM," "groups students by level")	Most common in Indian NGO evaluations	Rs 2,000-3,000

The attribution chain

Teacher training changes teacher knowledge (maybe). Teacher knowledge changes teacher practice (sometimes). Teacher practice changes student learning (under specific conditions). Each link requires separate evidence. Most evaluations measure the first link and claim the third.

The Hawthorne problem

Teachers teach differently when observed. A single announced observation captures performance behaviour, not typical practice. Budget for at least two observations per teacher, one of them unannounced. Fixed-interval coding, as in the Stallings Snapshot, reduces observer subjectivity, but it does not remove the observation effect itself; repeated and unannounced visits are what help with that.
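One way to make the observation effect visible is to compare time-on-task between announced and unannounced visits. The sketch below assumes each visit has been coded as a list of snapshot activity codes. The activity codes, field names, and data are all hypothetical; a real protocol defines its own coding categories.

```python
# Hypothetical activity codes counted as instructional time.
INSTRUCTIONAL = {"teaching", "practice", "discussion"}

def time_on_task(snapshots):
    """Share of coded snapshots spent on an instructional activity."""
    if not snapshots:
        raise ValueError("no snapshots coded for this visit")
    return sum(code in INSTRUCTIONAL for code in snapshots) / len(snapshots)

def mean_by_visit_type(visits):
    """Average time-on-task separately for each visit type."""
    by_type = {}
    for visit in visits:
        share = time_on_task(visit["snapshots"])
        by_type.setdefault(visit["type"], []).append(share)
    return {kind: sum(vals) / len(vals) for kind, vals in by_type.items()}

# Hypothetical data: two visits to the same teacher, ten snapshots each.
visits = [
    {"teacher": "T01", "type": "announced",
     "snapshots": ["teaching"] * 5 + ["practice"] * 3 + ["admin"] * 2},
    {"teacher": "T01", "type": "unannounced",
     "snapshots": ["teaching"] * 4 + ["practice"] * 2
                  + ["admin"] * 2 + ["off_task"] * 2},
]

means = mean_by_visit_type(visits)
gap = means["announced"] - means["unannounced"]
print(means, round(gap, 2))
```

A large positive gap suggests the announced visits are capturing performance behaviour; in that case the unannounced figure is the safer estimate of typical practice.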

Your Teacher Practice Measurement Plan

Design how you will measure teacher practice change. These flow into your capstone.

What is the teacher-facing intervention?

Specific classroom behaviours you expect to change (list 3-5)

Observation protocol

Stallings Snapshot -- time-on-task
Teach (World Bank) -- 10 elements
Custom structured checklist
CLASS -- emotional/instructional support

Number of teachers and observation rounds

How will you link teacher practice data to student outcome data?

Self-check

After a teacher training programme, 95% of teachers report "high satisfaction" and post-test scores show a 30% knowledge gain. Can you conclude that classroom practice has changed?

A. Yes -- knowledge gain predicts practice change
B. No -- satisfaction and knowledge are necessary but not sufficient; classroom observation is needed to confirm practice change
C. Yes, if the sample size is large enough
D. Only if the training was more than 5 days

Answer: B. Knowledge gain without classroom observation evidence is the most common over-claim in teacher training evaluations. Teachers may know the content but not change their practice. The World Bank's SABER-Teachers research shows that training duration and teacher knowledge are weakly correlated with actual practice change.

Module 3 . ~30 min

NEP 2020 evaluation opportunities

The National Education Policy 2020 is India's most ambitious education reform in decades. It creates both a demand for evaluation and a set of structural changes that evaluators must understand. Six years in (2026), implementation varies dramatically across states.

Key NEP reforms that create evaluation opportunities

NIPUN Bharat / FLN Mission -- every state is now implementing foundational literacy and numeracy programmes for grades 1-3. The mission has built-in assessment expectations (SAFAL, state-level FLN assessments). Evaluation opportunity: Do these state-level FLN programmes actually improve foundational skills compared to pre-NEP instruction?
5+3+3+4 restructuring -- the new curricular structure with Foundational (3-8 years), Preparatory (8-11), Middle (11-14), and Secondary (14-18) stages. States are implementing at different paces. Evaluation opportunity: How does restructuring affect learning continuity?
NCF 2023 rollout -- the new National Curriculum Framework emphasises competency-based learning and reduced content load. Evaluation opportunity: Are the new textbooks and pedagogy actually reducing rote learning?
PARAKH -- the new national assessment body replacing the old NAS model. Its mandate includes holistic, competency-based assessment. It is still being operationalised in 2026.

State-level variation

Education is on the Concurrent List, and in practice NEP implementation is driven by the states. Kerala, Karnataka, and Madhya Pradesh are among the early movers. UP and Bihar lag on structural reforms but have active FLN missions. This variation creates natural experiments for evaluators -- if two neighbouring states implement at different times or intensities, and comparable pre- and post-reform data exist for both, difference-in-differences designs become feasible.
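The difference-in-differences logic is simple arithmetic on four group means: the change in the reforming state minus the change in the comparison state. The sketch below uses hypothetical numbers and a hypothetical function name. It is only valid under the usual assumption that both states would have followed parallel trends without the reform.

```python
def diff_in_diff(treated_pre, treated_post, comparison_pre, comparison_post):
    """Difference-in-differences estimate from four group means."""
    treated_change = treated_post - treated_pre
    comparison_change = comparison_post - comparison_pre
    return treated_change - comparison_change

# Hypothetical figures: % of grade 3 children who can read a paragraph,
# before and after the reform, in an early-mover and a later-mover state.
estimate = diff_in_diff(
    treated_pre=30.0, treated_post=42.0,
    comparison_pre=28.0, comparison_post=33.0,
)
print(estimate)  # 12-point change minus 5-point change
```

A simple pre-post comparison in the early-mover state would have reported the full 12 points; the comparison state's 5-point change is the part attributed to everything other than the reform.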

Worked example

Madhya Pradesh's CM-RISE schools: MP identified 9,600 government schools for intensive improvement under the CM-RISE banner. These schools receive additional resources, teacher training, and infrastructure. The remaining ~85,000 schools form a pool of potential comparison schools, but because CM-RISE schools were selected rather than randomly assigned, a raw comparison would be biased. An evaluator could use a matched-comparison design (matching CM-RISE schools to non-CM-RISE schools on pre-intervention characteristics) to estimate the programme's effect on NAS scores.
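A matched-comparison design of this kind can be sketched as nearest-neighbour matching on a pre-intervention characteristic. The example below matches on a single hypothetical baseline score and uses made-up school records. A real study would match on several characteristics (for example enrolment, location, infrastructure), often through a propensity score, and would check balance after matching.

```python
def match_schools(treated, pool):
    """Greedy 1:1 nearest-neighbour match on baseline score, without replacement.

    Assumes the comparison pool is at least as large as the treated group.
    """
    available = list(pool)
    pairs = []
    for school in treated:
        best = min(available, key=lambda c: abs(c["baseline"] - school["baseline"]))
        available.remove(best)  # each comparison school is used once
        pairs.append((school, best))
    return pairs

def matched_difference(pairs):
    """Mean endline difference: treated school minus its matched comparison."""
    diffs = [t["endline"] - c["endline"] for t, c in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical school-level mean scores, not real CM-RISE or NAS data.
treated = [
    {"id": "T1", "baseline": 40, "endline": 52},
    {"id": "T2", "baseline": 55, "endline": 63},
]
pool = [
    {"id": "C1", "baseline": 41, "endline": 47},
    {"id": "C2", "baseline": 70, "endline": 74},
    {"id": "C3", "baseline": 54, "endline": 58},
    {"id": "C4", "baseline": 30, "endline": 35},
]

pairs = match_schools(treated, pool)
print([(t["id"], c["id"]) for t, c in pairs])
print(matched_difference(pairs))
```

Matching only removes bias from the characteristics you match on; anything unobserved that drove selection into the programme remains a threat to the estimate.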

Your NEP 2020 Evaluation Positioning

Position your evaluation relative to NEP reforms. These flow into your capstone.

Which NEP reform is most relevant to your programme?

How does your programme align with or diverge from the reform?

State context and implementation status

Government data you can leverage (UDISE+, NAS, state assessments)

Natural experiment or comparison opportunity

Self-check

A state has implemented NIPUN Bharat's FLN mission in all government schools simultaneously. Can you use a randomised controlled trial to evaluate its effect?

A. Yes -- randomly assign schools to treatment and control
B. No -- universal rollout means no untreated group exists; use pre-post with historical comparison or cross-state comparison instead
C. Yes -- randomly assign students within schools
D. No -- RCTs cannot be used for government programmes

Answer: B. When a policy is universally implemented, there is no within-state untreated group for randomisation. Your design options include: (a) pre-post comparison using ASER/NAS historical data, (b) cross-state comparison where implementation timing differs, or (c) regression discontinuity if eligibility has a cutoff.

Module 4 . ~25 min

The TaRL evidence pattern

Teaching at the Right Level (TaRL), developed by Pratham and rigorously evaluated by J-PAL, is among the most-replicated education interventions in the developing world. Understanding its evidence pattern teaches you how a well-evaluated programme builds its evidence base over time.

The TaRL evaluation arc

Proof of concept (2001-2005) -- Pratham's Balsakhi programme in Mumbai and Vadodara. Two RCTs by Banerjee, Cole, Duflo, and Linden. Effect: 0.14-0.28 SD on math and language. Established that non-formal, level-grouped instruction works.
Government integration (2008-2014) -- Read India scaled through state governments. Evaluation showed that scale-up through existing government machinery reduced effects unless specific implementation conditions were met.
Mechanism isolation (2015-2019) -- RCTs in Uttar Pradesh and Haryana isolated which components matter: level-grouping + dedicated instructional time + simple materials. Teacher training alone was insufficient.
International replication (2016-present) -- TaRL replicated in Zambia, Botswana, Ghana, Côte d'Ivoire. Effects replicate when implementation fidelity is maintained. Pratham's CEO Rukmini Banerji received the 2021 Yidan Prize for Education Development.
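The effects in this arc are quoted in standard deviations (SD). A minimal sketch of how such a figure is computed is shown below, using the difference in group means divided by the pooled standard deviation. The function name and scores are hypothetical, and some studies divide by the control-group SD instead, so check which convention a paper uses before comparing numbers.

```python
from math import sqrt
from statistics import mean, variance

def standardised_effect(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

# Hypothetical endline test scores, not data from any TaRL study.
treatment_scores = [12, 15, 14, 18, 16]
control_scores = [11, 14, 13, 17, 15]

effect = standardised_effect(treatment_scores, control_scores)
print(round(effect, 2))
```

Expressing the gap in SD units is what allows effects from different tests, grades, and countries to be placed on the same scale.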

What the TaRL evidence teaches about education evaluation

Floor effects matter. TaRL works because it targets children who are below grade level. If your assessment does not capture the bottom of the distribution (children who cannot read at all), you will miss the effect.
Implementation fidelity is the moderator. The same programme design produces 0.6 SD effects with high fidelity and 0.05 SD with low fidelity. Your evaluation must measure implementation quality, not just presence.
Sustainability vs short-term gains. Short-burst TaRL camps (30-45 days) show strong immediate effects that fade after 6-12 months. Continuous integration shows smaller but more sustained effects. Your evaluation timeline determines which you see.
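The floor-effect point above can be checked directly on pilot data: if a large share of children score the minimum, the test cannot register gains among the very children the programme targets. The sketch below is a rough screen only. The 15% flag threshold is an illustrative assumption, not a standard taken from the TaRL studies, and the scores are hypothetical.

```python
def floor_ceiling_shares(scores, min_score, max_score):
    """Share of children at the lowest and highest possible score."""
    n = len(scores)
    return {
        "floor": sum(s == min_score for s in scores) / n,
        "ceiling": sum(s == max_score for s in scores) / n,
    }

# Assumed rule of thumb for this sketch; choose your own threshold.
FLAG_THRESHOLD = 0.15

# Hypothetical pilot scores on a 0-10 grade-level test.
pilot_scores = [0, 0, 0, 0, 1, 2, 3, 0, 5, 0]

shares = floor_ceiling_shares(pilot_scores, min_score=0, max_score=10)
flags = [name for name, share in shares.items() if share > FLAG_THRESHOLD]
print(shares, flags)
```

In this made-up pilot most children score zero, so the instrument would need easier items (letter or number recognition, for instance) before it could detect movement at the bottom of the distribution.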

The replication lesson

TaRL shows that evidence accumulation is not linear. The first RCT proved the concept. Government scale-up evaluations showed what breaks. Component-isolation studies showed what matters. International replications showed what generalises. Design your evaluation as one node in this chain -- what specific evidence gap does it fill?

Your Evidence Positioning

Position your evaluation in the evidence landscape. These flow into your capstone.

What evidence gap does your evaluation fill?

How will you measure implementation fidelity?

What is your sustainability measurement plan? Will you measure outcomes at endline only, or at 6/12 months post-intervention?

Floor/ceiling effect risk and mitigation

Your honesty-test sentence: "This evaluation will tell us ___ and will NOT tell us ___."

Self-check

Your education programme shows a 0.4 SD effect on foundational numeracy at endline (immediately after a 45-day camp). Is this sufficient evidence for a government to adopt the model at scale?

A. Yes -- 0.4 SD is a large effect in education
B. No -- only RCT evidence counts for government adoption
C. Not yet -- you need sustainability data (do gains persist at 6-12 months?) and implementation cost evidence before recommending scale-up
D. Yes, if the sample size exceeds 1,000

Answer: C. Short-burst education interventions often show impressive immediate effects that fade. TaRL evidence shows 30-50% fade-out within a year of camp-based models. Government scale-up decisions need: (a) sustainability data, (b) cost-per-child, and (c) implementation feasibility evidence alongside the effect size.
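To see what the fade-out range implies for the 0.4 SD result in this question, the arithmetic can be sketched as below. It assumes, for illustration only, that fade-out applies proportionally to the endline effect; the function name is hypothetical.

```python
def effect_after_fade(endline_effect_sd, fade_share):
    """Effect remaining after a given proportional fade-out."""
    return endline_effect_sd * (1 - fade_share)

endline_effect = 0.4  # SD, from the self-check scenario

# Apply the 30-50% fade-out range quoted above.
remaining_low = effect_after_fade(endline_effect, 0.5)
remaining_high = effect_after_fade(endline_effect, 0.3)
print(round(remaining_low, 2), round(remaining_high, 2))
```

Under that assumption the effect a government would be buying at one year is roughly 0.2 to 0.28 SD, not 0.4 SD, which is why endline-only results are not enough for a scale-up decision.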

Capstone

Your Education Evaluation Design Brief

You have completed the four modules. Click Build my brief to compile everything.

Done?

Share this brief with a colleague who works in education before circulating it more widely. The most common blind spot is confusing assessment system alignment with learning outcome measurement.
