What "learning outcomes" actually means in the Indian context, how teacher practice connects to student outcomes, NEP 2020 evaluation opportunities, and the TaRL evidence pattern. Walk out with an education evaluation design brief.
4 modules · ~3 hours · Interactive · India context
Your Capstone
Education Evaluation Design Brief
Walk in with an education programme. Walk out with an evaluation design brief -- learning outcome measurement, teacher-practice linkage, data plan, and analysis approach. Built automatically from your module answers.
Module 1 · ~25 min
What "learning outcomes" actually means
India has at least four different systems for measuring learning outcomes, and they do not agree with each other. Before designing an education evaluation, you must decide which definition of "learning" you are measuring, and why.
The four measurement systems
System | What it measures | Grades | Strengths | Limitations
--- | --- | --- | --- | ---
ASER | Foundational literacy and numeracy (can the child read a paragraph? do division?) | 3-8 (household-based) | Simple; comparable across states and years; household-based, so it avoids school-selection bias | Tests only foundational skills (the highest reading task is a grade 2-level story); rural, sample-based survey, so it gives no school-level results
PISA (India participation from 2028) | Application of knowledge in reading, math, science | 15-year-olds | International comparability | India participated only once (2009, Himachal Pradesh + Tamil Nadu); re-entering 2028
Choosing your outcome measure
Foundational literacy/numeracy programme (FLN)? ASER-type tools are the default. NIPUN Bharat's FLN assessment framework aligns to this. Use if your programme targets grades 1-3 under the FLN mission.
Curriculum-enrichment programme? Align to NAS competencies for the relevant grade. Because NAS is administered only once every 3 years, you will need to build your own assessment aligned to NAS items.
21st-century skills or SEL programme? No government assessment captures this. You need a custom instrument. See the SEL Evaluation Practice Pack.
Ed-tech programme? Platform usage data is not a learning outcome. Measure actual learning with a test administered offline.
Worked example
Pratham's Read India uses ASER-type assessments because the programme explicitly targets foundational literacy. The assessment takes 5 minutes per child, is administered one-on-one, and produces a clear categorical outcome (cannot read, letter level, word level, paragraph level, story level). This simplicity is a feature, not a limitation: it matches the programme's theory of change precisely.
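The categorical outcome described above can be coded mechanically. A minimal Python sketch, assuming each child's record is a hypothetical dictionary of which reading tasks they passed. Because ASER-style testing is adaptive (a child who reads the story may never be shown single letters), the sketch credits the highest task passed rather than requiring every lower task.

```python
from collections import Counter

# Reading levels from lowest to highest, as listed in the worked example.
LEVELS = ["cannot read", "letter", "word", "paragraph", "story"]


def highest_level(passed: dict[str, bool]) -> str:
    """Return the highest reading level a child passed.

    `passed` is a hypothetical record such as {"word": True, "paragraph": False}.
    Untested levels are simply absent, which suits adaptive administration.
    """
    for level in reversed(LEVELS[1:]):
        if passed.get(level, False):
            return level
    return "cannot read"


def level_distribution(children: list[dict[str, bool]]) -> dict[str, float]:
    """Share of children at each level; the shares sum to 1."""
    counts = Counter(highest_level(child) for child in children)
    return {level: counts[level] / len(children) for level in LEVELS}


# Illustrative records for four children (not real data).
sample = [
    {"letter": True, "word": True, "paragraph": False},
    {"paragraph": True, "story": True},
    {"letter": False},
    {"letter": True, "word": False},
]
print(level_distribution(sample))
```

Reporting the full distribution across levels, rather than a single mean score, keeps the ordinal categories intact and shows where children move.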
Your Learning Outcome Definition
Fill these for your programme. Your answers flow into the capstone.
How will you know the learning gain is due to your programme, not normal schooling?
Self-check
An ed-tech company reports: "Our app improved learning -- students completed 200% more modules." Why is this NOT a learning outcome finding?
The sample size is too small
Module completion measures engagement/usage, not whether the child actually learned the content
The comparison group is missing
The effect size is implausibly large
Answer: Platform usage metrics (time-on-app, modules completed, levels cleared) are process/engagement measures, not learning outcomes. Learning outcomes require an assessment of what the child can actually do, administered independently of the platform.
Module 2 · ~30 min
Teacher practice vs student outcomes
Most education programmes in India work through teachers -- training them, providing materials, or restructuring classrooms. The evaluation question is whether changes in teacher practice lead to changes in student learning. This link is harder to establish than it appears.
The measurement gap
Teacher training programmes routinely measure (a) training attendance, (b) teacher knowledge post-training, and (c) teacher satisfaction. None of these reliably predicts classroom behaviour change. The critical measure is what teachers actually do differently in the classroom after the training.
Classroom observation protocols
Protocol | What it captures | India use | Cost per observation
--- | --- | --- | ---
Stallings Snapshot | Time-on-task and activity type, coded from 10 brief snapshots (about 15 seconds each) spaced evenly across the lesson | World Bank India studies; robust for large samples | Rs 3,000-4,000
CLASS | Emotional support, classroom organisation, instructional support | Limited India use; needs extensive observer training | Rs 5,000-7,000
Teach (World Bank) | 10 elements of effective teaching; 20-min observation | Growing India use (MP, Rajasthan studies) | Rs 3,500-5,000
Custom structured checklist | Programme-specific behaviours (e.g., "uses TLM (teaching-learning materials)," "groups students by level") | Most common in Indian NGO evaluations | Rs 2,000-3,000
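Whatever protocol is chosen, fixed-interval coding reduces to counting. A hedged sketch of a Stallings-style time-on-task summary in Python, assuming each lesson is stored as a list of activity codes, one per snapshot, plus a mapping from codes to three broad categories (learning activities, classroom management, teacher off-task). The code names are placeholders, not the official Stallings codebook.

```python
from collections import Counter

# Placeholder activity codes grouped into three broad categories.
# These names are illustrative, not the official Stallings codebook.
CATEGORY = {
    "reading_aloud": "learning",
    "lecture": "learning",
    "practice_drill": "learning",
    "copying": "learning",
    "verbal_instruction": "management",
    "discipline": "management",
    "social_interaction": "off_task",
    "teacher_out_of_room": "off_task",
}


def time_shares(snapshots: list[str]) -> dict[str, float]:
    """Share of one lesson's snapshots that fall in each category."""
    counts = Counter(CATEGORY[code] for code in snapshots)
    total = len(snapshots)
    return {cat: counts[cat] / total for cat in ("learning", "management", "off_task")}


# One hypothetical lesson observed at ten evenly spaced points.
lesson = (["lecture"] * 3 + ["practice_drill"] * 2 + ["copying"]
          + ["discipline"] * 2 + ["verbal_instruction"] + ["teacher_out_of_room"])
print(time_shares(lesson))
```

Averaging these shares across lessons and teachers gives the time-on-task figure usually reported; keeping the per-lesson values lets you compare visits for the same teacher.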
The attribution chain
Teacher training changes teacher knowledge (maybe). Teacher knowledge changes teacher practice (sometimes). Teacher practice changes student learning (under specific conditions). Each link requires separate evidence. Most evaluations measure the first link and claim the third.
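A stylised way to see why measuring the first link says little about the third: if each link carries only part of the intended change forward, the share that reaches students shrinks multiplicatively. The Python sketch below uses purely hypothetical rates; each rate is conditional (the fraction of teachers who passed the previous link and also pass this one), and real rates have to be estimated separately, link by link.

```python
def chain_shares(links: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Cumulative share of trained teachers whose change survives each link.

    Each rate is the (hypothetical) conditional fraction of teachers who
    passed the previous link and also pass this one.
    """
    share = 1.0
    out = []
    for name, rate in links:
        share *= rate
        out.append((name, share))
    return out


# Hypothetical rates, not estimates from any study.
for name, share in chain_shares([
    ("knowledge gain", 0.7),
    ("practice change", 0.5),
    ("change reaches student learning", 0.6),
]):
    print(f"{name}: {share:.0%}")
```

Even with fairly generous rates at each step, a large knowledge gain at the first link is compatible with a small effect at the third.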
The Hawthorne problem
Teachers teach differently when observed. A single announced observation captures performance behaviour, not typical practice. Budget for at least two observations per teacher, at least one of them unannounced. Fixed-interval coding, as in the Stallings Snapshot, reduces observer subjectivity, but it does not stop teachers from reacting to being observed; repeated and unannounced visits are what address that.
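With two visits per teacher, one announced and one unannounced, you can estimate how much of the observed practice is performance. A minimal Python sketch, assuming hypothetical per-teacher scores on the same scale for both visits (for example, time-on-task shares).

```python
from statistics import mean


def observation_gap(records: list[tuple[str, float, float]]) -> float:
    """Mean announced-minus-unannounced score across teachers.

    Each record is a hypothetical (teacher_id, announced_score,
    unannounced_score) tuple. A positive gap suggests teachers
    perform for announced visits.
    """
    return mean(announced - unannounced for _, announced, unannounced in records)


visits = [("T1", 0.80, 0.60), ("T2", 0.70, 0.65), ("T3", 0.90, 0.70)]
print(round(observation_gap(visits), 3))
```

When the gap is large, the unannounced score is the better estimate of typical practice and should anchor the analysis.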
Your Teacher Practice Measurement Plan
Design how you will measure teacher practice change. These flow into your capstone.
Self-check
After a teacher training programme, 95% of teachers report "high satisfaction" and post-test scores show a 30% knowledge gain. Can you conclude that classroom practice has changed?
Yes -- knowledge gain predicts practice change
No -- satisfaction and knowledge are necessary but not sufficient; classroom observation is needed to confirm practice change
Yes, if the sample size is large enough
Only if the training was more than 5 days
Answer: Knowledge gain without classroom observation evidence is the most common over-claim in teacher training evaluations. Teachers may know the content but not change their practice. The World Bank's SABER-Teachers research shows that training duration and teacher knowledge are weakly correlated with actual practice change.
Module 3 · ~30 min
NEP 2020 evaluation opportunities
The National Education Policy 2020 is India's most ambitious education reform in decades. It creates both a demand for evaluation and a set of structural changes that evaluators must understand. Six years in (2026), implementation varies dramatically across states.
Key NEP reforms that create evaluation opportunities
NIPUN Bharat / FLN Mission -- every state is now implementing foundational literacy and numeracy programmes for grades 1-3. The mission has built-in assessment expectations (SAFAL, state-level FLN assessments). Evaluation opportunity: Do these state-level FLN programmes actually improve foundational skills compared to pre-NEP instruction?
5+3+3+4 restructuring -- the new curricular structure with Foundational (3-8 years), Preparatory (8-11), Middle (11-14), and Secondary (14-18) stages. States are implementing at different paces. Evaluation opportunity: How does restructuring affect learning continuity?
NCF 2023 rollout -- the new National Curriculum Framework emphasises competency-based learning and reduced content load. Evaluation opportunity: Are the new textbooks and pedagogy actually reducing rote learning?
PARAKH -- the new national assessment body replacing the old NAS model. Its mandate includes holistic, competency-based assessment. Still operationalising in 2026.
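To study learning continuity across the 5+3+3+4 stages, assessment records first need a stage label. A small Python helper, assuming grades are stored as integers 1-12. Under NEP 2020 the Foundational stage also includes three pre-school years, which have no grade number here and would need their own code.

```python
def nep_stage(grade: int) -> str:
    """Return the NEP 2020 stage for school grades 1-12.

    Foundational = grades 1-2 (plus three pre-school years, not covered here),
    Preparatory = 3-5, Middle = 6-8, Secondary = 9-12.
    """
    if not 1 <= grade <= 12:
        raise ValueError(f"grade must be 1-12, got {grade}")
    if grade <= 2:
        return "Foundational"
    if grade <= 5:
        return "Preparatory"
    if grade <= 8:
        return "Middle"
    return "Secondary"


# Tag a few hypothetical assessment records by stage.
records = [{"child": "A", "grade": 2}, {"child": "B", "grade": 3}, {"child": "C", "grade": 9}]
for r in records:
    r["stage"] = nep_stage(r["grade"])
print(records)
```

Transitions of interest (for example, grade 2 to grade 3, the Foundational-to-Preparatory boundary) can then be selected by comparing consecutive stage labels for the same child.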
State-level variation
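NEP implementation rests largely with the states: education is on the Concurrent List, and states set the pace of most reforms. Kerala, Karnataka, and Madhya Pradesh are among the early movers. UP and Bihar lag on structural reforms but have active FLN missions. This variation creates natural experiments for evaluators: if two neighbouring states implement at different times, difference-in-differences designs become feasible, provided the two states' outcomes were trending similarly before the reform.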
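In its simplest form, the two-state comparison is a 2x2 difference-in-differences. A minimal Python sketch with hypothetical district mean scores; in practice you would usually estimate it by regression, with standard errors clustered at the level where implementation varies, and check pre-reform trends over several years.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """2x2 difference-in-differences on mean scores.

    The estimate is (change in treated group) minus (change in comparison
    group). It is only credible if both groups would have followed
    parallel trends without the reform.
    """
    return (mean(treat_post) - mean(treat_pre)) - (mean(comp_post) - mean(comp_pre))


# Hypothetical district mean scores, before and after the reform.
state_a_pre, state_a_post = [40, 42, 44], [50, 53, 53]  # implemented earlier
state_b_pre, state_b_post = [41, 43, 45], [46, 48, 50]  # implemented later
print(diff_in_diff(state_a_pre, state_a_post, state_b_pre, state_b_post))
```

Here the early-implementing state gains 10 points and the comparison state 5, so the estimated reform effect is the 5-point difference, not the raw 10-point gain.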
Worked example
Madhya Pradesh's CM-RISE schools: MP identified 9,600 government schools for intensive improvement under the CM-RISE banner. These schools receive additional resources, teacher training, and infrastructure. The remaining ~85,000 schools serve as a natural comparison group. An evaluator could use a matched-comparison design (matching CM-RISE schools to non-CM-RISE schools on pre-intervention characteristics) to estimate the programme's effect on learning outcomes. Because NAS is a sample survey, NAS scores will exist for only a subset of schools in both groups, so a purpose-built assessment in the matched schools may be needed.
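The matched-comparison design can be sketched as 1:1 nearest-neighbour matching. This Python version is illustrative only: the covariates, school records, and outcome are hypothetical, and a real analysis would also check covariate balance after matching and might match on propensity scores instead of raw distances.

```python
import math
from statistics import mean, pstdev


def att_nearest_neighbour(treated, comparison, covariates, outcome):
    """Average effect on treated schools via 1:1 nearest-neighbour matching.

    Covariates are standardised with pooled means and SDs; each treated
    school is matched (with replacement) to the closest comparison school
    by Euclidean distance, and outcome differences are averaged.
    Matching balances observed characteristics only.
    """
    pooled = treated + comparison
    scale = {}
    for k in covariates:
        values = [row[k] for row in pooled]
        scale[k] = (mean(values), pstdev(values) or 1.0)  # guard against zero SD

    def z(row):
        return [(row[k] - scale[k][0]) / scale[k][1] for k in covariates]

    comparison_z = [(z(row), row) for row in comparison]
    differences = []
    for row in treated:
        row_z = z(row)
        _, match = min(comparison_z, key=lambda pair: math.dist(row_z, pair[0]))
        differences.append(row[outcome] - match[outcome])
    return mean(differences)


# Hypothetical schools: pre-intervention score and enrolment, post-intervention score.
treated = [
    {"pre_score": 40, "enrolment": 200, "post_score": 50},
    {"pre_score": 60, "enrolment": 350, "post_score": 68},
]
comparison = [
    {"pre_score": 41, "enrolment": 210, "post_score": 45},
    {"pre_score": 59, "enrolment": 340, "post_score": 62},
    {"pre_score": 80, "enrolment": 500, "post_score": 75},
]
print(att_nearest_neighbour(treated, comparison, ["pre_score", "enrolment"], "post_score"))
```

Because CM-RISE schools were selected deliberately, any selection on characteristics not in the covariate list will still bias this estimate.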
Your NEP 2020 Evaluation Positioning
Position your evaluation relative to NEP reforms. These flow into your capstone.
Self-check
A state has implemented NIPUN Bharat's FLN mission in all government schools simultaneously. Can you use a randomised controlled trial to evaluate its effect?
Yes -- randomly assign schools to treatment and control
No -- universal rollout means no untreated group exists; use pre-post with historical comparison or cross-state comparison instead
Yes -- randomly assign students within schools
No -- RCTs cannot be used for government programmes
Answer: When a policy is universally implemented, there is no within-state untreated group for randomisation. Your design options include: (a) pre-post comparison using ASER/NAS historical data, (b) cross-state comparison where implementation timing differs, or (c) regression discontinuity if eligibility has a cutoff.
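Option (c) can be sketched as a local linear regression discontinuity. The Python below is a minimal illustration, assuming a hypothetical eligibility rule based on a running variable (for example, a school-level score or a child's age in months) with a sharp cutoff. The bandwidth is set by hand here; applied work usually selects it with a data-driven procedure and reports robustness to alternatives.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values on each side")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD: jump in the fitted outcome at the cutoff.

    Fits a separate straight line on each side within the bandwidth
    (uniform kernel). The running variable is centred at the cutoff, so
    each intercept is the predicted outcome exactly at the cutoff.
    Units at or above the cutoff are treated as eligible.
    """
    left = [(x - cutoff, y) for x, y in zip(running, outcome) if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome) if cutoff <= x <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("not enough observations within the bandwidth")
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical data with a built-in jump of 3 points at the cutoff.
running = list(range(-4, 4))
outcome = [2 + 0.5 * x + (3 if x >= 0 else 0) for x in running]
print(rd_estimate(running, outcome, cutoff=0, bandwidth=5))
```

The estimate applies only to units near the cutoff, which is worth stating explicitly in any policy conclusion.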
Module 4 · ~25 min
The TaRL evidence pattern
Teaching at the Right Level (TaRL), developed by Pratham and rigorously evaluated by J-PAL, is the most-replicated education intervention in the developing world. Understanding its evidence pattern teaches you how a well-evaluated programme builds its evidence base over time.
The TaRL evaluation arc
Proof of concept (2001-2005) -- Pratham's Balsakhi programme in Mumbai and Vadodara. Two RCTs by Banerjee, Cole, Duflo, and Linden. Effect: 0.14-0.28 SD on math and language. Established that non-formal, level-grouped instruction works.
Government integration (2008-2014) -- Read India scaled through state governments. Evaluation showed that scale-up through existing government machinery reduced effects unless specific implementation conditions were met.
Mechanism isolation (2015-2019) -- RCTs in Uttar Pradesh and Haryana isolated which components matter: level-grouping + dedicated instructional time + simple materials. Teacher training alone was insufficient.
International replication (2016-present) -- TaRL replicated in Zambia, Botswana, Ghana, Cote d'Ivoire. Effects replicate when implementation fidelity is maintained. Won the 2023 Yidan Prize.
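The effect sizes quoted in this arc (for example, 0.14-0.28 SD for Balsakhi) are standardised mean differences. One common convention divides the treatment-control difference by the control group's standard deviation, sketched below in Python with made-up scores. Studies differ in which SD they use (control group, baseline, or pooled), so check the convention before comparing numbers across papers.

```python
from statistics import mean, stdev


def standardised_effect(treatment, control):
    """Difference in means expressed in control-group standard deviations."""
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical endline test scores, not data from any TaRL study.
control = [40, 50, 60]
treatment = [45, 55, 65]
print(standardised_effect(treatment, control))
```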
What the TaRL evidence teaches about education evaluation
Floor effects matter. TaRL works because it targets children who are below grade level. If your assessment does not capture the bottom of the distribution (children who cannot read at all), you will miss the effect.
Implementation fidelity is the moderator. The same programme design produces 0.6 SD effects with high fidelity and 0.05 SD with low fidelity. Your evaluation must measure implementation quality, not just presence.
Sustainability vs short-term gains. Short-burst TaRL camps (30-45 days) show strong immediate effects that fade after 6-12 months. Continuous integration shows smaller but more sustained effects. Your evaluation timeline determines which you see.
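A simple pre-analysis check for the floor problem is to look at what share of baseline scores sit at the test's minimum (and maximum). The Python sketch below assumes a hypothetical 0-10 test; the 15% flag is an illustrative rule of thumb borrowed from general measurement practice, not a TaRL-specific standard.

```python
def floor_ceiling_shares(scores, min_score, max_score):
    """Return the share of children at the test's minimum and maximum score."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= min_score) / n
    at_ceiling = sum(1 for s in scores if s >= max_score) / n
    return at_floor, at_ceiling


# Hypothetical baseline scores on a 0-10 test.
baseline = [0, 0, 0, 2, 5, 7, 10, 10, 3, 4]
floor_share, ceiling_share = floor_ceiling_shares(baseline, 0, 10)
THRESHOLD = 0.15  # illustrative rule of thumb (an assumption)
if floor_share > THRESHOLD:
    print(f"{floor_share:.0%} of children score at the floor; add easier items")
if ceiling_share > THRESHOLD:
    print(f"{ceiling_share:.0%} of children score at the ceiling; add harder items")
```

Running this on pilot data, before the baseline, leaves time to add easier items such as letter and word tasks.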
The replication lesson
TaRL shows that evidence accumulation is not linear. The first RCT proved the concept. Government scale-up evaluations showed what breaks. Component-isolation studies showed what matters. International replications showed what generalises. Design your evaluation as one node in this chain -- what specific evidence gap does it fill?
Your Evidence Positioning
Position your evaluation in the evidence landscape. These flow into your capstone.
Will you measure outcomes at endline only, or at 6/12 months post-intervention?
"This evaluation will tell us ___ and will NOT tell us ___."
Self-check
Your education programme shows a 0.4 SD effect on foundational numeracy at endline (immediately after a 45-day camp). Is this sufficient evidence for a government to adopt the model at scale?
Yes -- 0.4 SD is a large effect in education
No -- only RCT evidence counts for government adoption
Not yet -- you need sustainability data (do gains persist at 6-12 months?) and implementation cost evidence before recommending scale-up
Yes, if the sample size exceeds 1,000
Answer: Short-burst education interventions often show impressive immediate effects that fade. TaRL evidence shows 30-50% fade-out within a year of camp-based models. Government scale-up decisions need: (a) sustainability data, (b) cost-per-child, and (c) implementation feasibility evidence alongside the effect size.
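Sustainability and cost evidence can be combined in a back-of-the-envelope calculation. A Python sketch with hypothetical numbers: it expresses the follow-up effect as a share of the endline effect and converts the persisting effect into SD per Rs 1,000 spent per child, a unit chosen here for illustration.

```python
def persistence(endline_sd: float, followup_sd: float) -> float:
    """Share of the endline effect still present at follow-up."""
    return followup_sd / endline_sd


def sd_per_1000_rupees(effect_sd: float, cost_per_child: float) -> float:
    """Effect (in SD) obtained per Rs 1,000 spent per child."""
    return effect_sd / cost_per_child * 1000


# Hypothetical figures: 0.4 SD at endline, 0.24 SD at 12 months, Rs 500 per child.
kept = persistence(0.4, 0.24)
print(f"persistence {kept:.0%}, fade-out {1 - kept:.0%}")
print(f"{sd_per_1000_rupees(0.24, 500):.2f} SD per Rs 1,000 per child at follow-up")
```

Basing cost-effectiveness on the persisting effect rather than the endline effect avoids overstating the value of short-burst models.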
Capstone
Your Education Evaluation Design Brief
Share this brief with a colleague who works in education before circulating. The most common blind spot is confusing assessment system alignment with learning outcome measurement.