Method Pack -- M6 -- Interactive

Reading & Critiquing an Evidence Paper

Most development professionals read research papers for conclusions. This pack teaches you to read for design -- so you can assess whether the conclusions are actually supported by the evidence presented.

4 modules -- ~100 min -- Interactive
Your Capstone

1-Page Critique of a Paper of Your Choice

A structured critique covering design, validity, statistical claims, and practical implications -- ready to share with colleagues or use in a journal club.

Module 1 -- ~25 min

Reading for design (RCT, quasi-experimental, qualitative)

Before evaluating findings, identify the research design. The design determines what claims the paper can and cannot make. Most over-claims come from using a weak design to support a strong claim.

The design hierarchy (for causal claims)

  1. Randomised Controlled Trial (RCT) -- random assignment to treatment/control. Strongest causal claim. Check: was randomisation actually random? Was there attrition? Were there spillovers?
  2. Quasi-experimental -- comparison group but no random assignment. Difference-in-differences, regression discontinuity, propensity score matching. Check: how was the comparison group selected? Could selection bias remain?
  3. Pre-post (no comparison) -- measures change over time in treatment group only. Cannot separate programme effect from time, maturation, or other changes. Check: does the paper acknowledge this limitation?
  4. Cross-sectional -- snapshot at one point. Can describe associations but not causation or change. Check: does the paper claim change without time-series data?
  5. Qualitative -- explores mechanisms, experiences, meaning. Not designed for causal claims. Check: does the paper stay within its design or over-reach?
Worked example -- Identifying design

Paper: "Impact of MGNREGA on rural livelihoods in Telangana" (hypothetical).

Method section says: "We surveyed 500 MGNREGA cardholders and compared their outcomes with 300 non-cardholders in the same mandals."

Design: Cross-sectional with comparison group. Not quasi-experimental because there is no pre-period and no method to address selection bias (people who get MGNREGA cards may differ systematically from those who do not). The paper can describe associations but cannot claim MGNREGA "caused" the differences.
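
One way to make the hierarchy concrete is to write it as a small decision rule. The Python sketch below is an illustrative simplification, not a standard classification tool: the `StudyFeatures` fields and both function names are hypothetical, and real papers often mix designs or need more judgment than five yes/no answers.

```python
from dataclasses import dataclass


@dataclass
class StudyFeatures:
    """Hypothetical checklist of facts pulled from a paper's method section."""
    random_assignment: bool      # units randomly assigned to treatment/control
    comparison_group: bool       # an untreated group is measured
    addresses_selection: bool    # e.g. DiD, RDD, or matching is used
    repeated_measures: bool      # outcomes measured before and after
    qualitative_only: bool = False


def classify_design(f: StudyFeatures) -> str:
    """Map features to the design labels in the hierarchy (a simplification)."""
    if f.qualitative_only:
        return "qualitative"
    if f.random_assignment and f.comparison_group:
        return "RCT"
    if f.comparison_group and f.addresses_selection:
        return "quasi-experimental"
    if f.repeated_measures and not f.comparison_group:
        return "pre-post"
    return "cross-sectional"


def supports_causal_claim(design: str) -> bool:
    """Only the top two rungs offer a credible counterfactual, and only if well executed."""
    return design in {"RCT", "quasi-experimental"}


# The MGNREGA worked example: a comparison group, but no baseline
# and no method to address selection bias.
mgnrega = StudyFeatures(
    random_assignment=False,
    comparison_group=True,
    addresses_selection=False,
    repeated_measures=False,
)
design = classify_design(mgnrega)
print(design, supports_causal_claim(design))
```

For the worked example this labels the study cross-sectional and flags that a causal claim is not supported, which matches the reading given in the text.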

Your Paper -- Design Identification

Choose a paper relevant to your work and identify its design.

Self-check
A paper titled "Impact of mid-day meals on school attendance" uses a pre-post design with no comparison group. Can it claim "impact"?
Yes -- pre-post measures change
No -- without a comparison group, the change could be due to other factors (time trends, policy changes, seasonal effects). It can document change but not attribute it.
Only if the sample is large enough
Only if the effect is very large
Correct. "Impact" implies causation, which requires a credible counterfactual (what would have happened without the programme). Pre-post designs cannot provide this. The attendance increase might be due to a new state policy, teacher recruitment, or seasonal patterns.
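
To see why the missing comparison group matters, the sketch below contrasts a pre-post change with a simple difference-in-differences calculation. All attendance figures are invented for illustration, the function names are hypothetical, and the difference-in-differences estimate is only credible if the two groups would have followed similar trends without the programme.

```python
# All attendance figures below are invented for illustration only.
treated_before, treated_after = 70.0, 78.0        # % attendance, schools with meals
comparison_before, comparison_after = 69.0, 75.0  # % attendance, schools without meals


def pre_post_change(before: float, after: float) -> float:
    """Change over time in one group; mixes programme effect with everything else."""
    return after - before


def diff_in_diff(t_before: float, t_after: float,
                 c_before: float, c_after: float) -> float:
    """Treated change minus comparison change (assumes parallel trends)."""
    return (t_after - t_before) - (c_after - c_before)


naive = pre_post_change(treated_before, treated_after)
did = diff_in_diff(treated_before, treated_after,
                   comparison_before, comparison_after)

print(f"Pre-post change: {naive:.1f} points")
print(f"Difference-in-differences: {did:.1f} points")
```

With these made-up numbers, most of the pre-post increase also appears in the schools without meals, so a pre-post paper would attribute to the programme a change that largely happened everywhere.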
Module 2 -- ~25 min

Assessing internal and external validity

Internal validity: can the observed effect be attributed to the intervention rather than to something else? Were there alternative explanations the design did not rule out?

External validity: do the findings generalise beyond the study population? Would the same intervention produce the same results in a different context?

Internal validity threats

External validity questions

The India generalisability problem

Many influential development studies are conducted in one or two Indian states. A study from Tamil Nadu may not generalise to Bihar -- governance capacity, social structures, and economic conditions differ enormously. When reading India-based papers, always check: which states? Urban or rural? Which population? The answer to "does it work?" is almost always "it depends on where and for whom."

Your Validity Assessment

Does the paper acknowledge its validity threats? How serious are they?
Self-check
An RCT of a livelihoods programme in Bangladesh shows strong effects. Your organisation wants to replicate it in Odisha. What is the primary concern?
Internal validity -- the RCT may have been poorly designed
External validity -- Bangladesh's context (NGO density, microfinance infrastructure, social norms) may differ significantly from Odisha
Statistical significance -- the sample may have been too small
The language barrier
Correct. A well-executed RCT has strong internal validity by design. The main question for replication is external validity: will the same intervention work in a different context with different implementing capacity, social norms, and market structures?
Module 3 -- ~25 min

Effect sizes and statistical claims

You do not need to be a statistician to critique statistical claims. You need to know three things: what the effect size means, what statistical significance actually tells you, and when to be suspicious.

Key concepts

Worked example -- Reading a results table

Paper reports: "Programme increased test scores by 0.12 SD (95% CI: 0.03-0.21, p=0.008, n=2,400)."

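Translation: The effect is small (0.12 SD) but statistically significant (p=0.008). The confidence interval excludes zero, though it runs from a negligible 0.03 SD to a more meaningful 0.21 SD. With 2,400 students, the study was probably adequately powered to detect an effect of this size. The effect is likely real but modest -- roughly equivalent to 1-2 months of additional learning, though such conversions vary by grade and context. Whether this justifies the programme cost depends on the per-student investment.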
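
A quick consistency check on a reported result is to back out the standard error from the confidence interval and see whether the implied p-value matches the one printed. The sketch below does this for the numbers in the worked example. It assumes a symmetric 95% interval and a normal approximation, which the hypothetical paper does not state, and the function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def check_reported_result(estimate: float, ci_low: float, ci_high: float,
                          z_crit: float = 1.96):
    """Recover SE, z-statistic, and two-sided p from an estimate and its 95% CI.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (ci_high - ci_low) / (2.0 * z_crit)
    z = estimate / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return se, z, p


# Numbers from the worked example: 0.12 SD, 95% CI 0.03 to 0.21.
se, z, p = check_reported_result(0.12, 0.03, 0.21)
print(f"SE ~ {se:.3f}, z ~ {z:.2f}, two-sided p ~ {p:.3f}")
```

Under these assumptions the implied p-value comes out at roughly 0.009, in line with the reported p=0.008 once rounding of the interval is allowed for. A large mismatch in a real paper would be a reason to look more closely at how the interval or p-value was computed.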

Your Statistical Assessment

Too many outcomes tested? Subgroup findings not pre-registered? Missing data not addressed?
Self-check
A paper tests 20 outcomes and finds 2 significant at p<0.05. Should you trust these 2 findings?
Yes -- p<0.05 is the standard
Be suspicious -- with 20 tests, you would expect about 1 false positive by chance even if nothing works. Two significant findings out of 20 could be entirely due to multiple testing.
Only if the effect sizes are large
Only if the sample is large
Correct. This is the multiple comparisons problem. If none of the 20 outcomes has a true effect, testing each at p<0.05 still yields about 1 false positive on average (0.05 x 20 = 1), and the chance of at least one is roughly 64% (1 - 0.95^20). The paper should apply a correction (Bonferroni, FDR) or pre-register a primary outcome. If it does neither, treat the "significant" findings with caution.
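
The arithmetic behind the multiple comparisons problem can be sketched in a few lines. The Python below computes the expected number of false positives, the chance of at least one, the Bonferroni threshold, and a basic Benjamini-Hochberg (FDR) check. It assumes the 20 tests are independent, and the list of p-values is invented for illustration: the self-check question does not report any.

```python
from typing import List


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of false positives if every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Flag which p-values survive the Benjamini-Hochberg FDR procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values: 2 of 20 fall below 0.05, as in the self-check.
p_values = [0.012, 0.041] + [round(0.06 + 0.05 * i, 3) for i in range(18)]

print(f"Expected false positives: {expected_false_positives(20):.2f}")
print(f"Chance of at least one:   {familywise_error_rate(20):.2f}")
print(f"Bonferroni threshold:     {bonferroni_threshold(20):.4f}")
print(f"Survive Bonferroni:       {sum(p <= bonferroni_threshold(20) for p in p_values)}")
print(f"Survive BH (FDR 5%):      {sum(benjamini_hochberg(p_values))}")
```

With these illustrative p-values, neither of the two nominally significant results survives either correction. Real outcomes are usually correlated, so treat the independence assumption as a rough guide, not an exact calculation.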
Module 4 -- ~25 min

Writing a 1-page critique

A good critique is not negative -- it is honest. It names what the paper does well, where the evidence is strong, and where the claims exceed the evidence. The goal: help your team decide whether and how to use this evidence.

1-page critique structure

  1. Citation and research question (2 lines)
  2. Design summary (3-4 lines) -- what method, what sample, what comparison
  3. Strengths (3-4 bullet points) -- what does this paper do well?
  4. Weaknesses (3-4 bullet points) -- what are the threats to validity?
  5. Claims vs. evidence (2-3 lines) -- do the conclusions match the design?
  6. Relevance to our context (2-3 lines) -- can we use this? What transfers? What does not?
  7. Bottom line (1 sentence) -- "This paper provides [strong/moderate/weak] evidence that [X] because [Y]."

Your 1-Page Critique

Self-check
Your bottom line reads: "The paper is bad because it uses a quasi-experimental design instead of an RCT." Is this a valid critique?
Yes -- RCTs are always better
No -- quasi-experimental designs are valid for many questions. Critique the execution, not the design choice. Ask whether the comparison group is credible given the design.
Depends on the budget
Only if the topic is important enough for an RCT
Correct. Design choice should match the question and constraints. Many important questions cannot be studied with RCTs (ethics, feasibility, cost). A well-executed difference-in-differences (DiD) or regression discontinuity design (RDD) can provide strong evidence. Critique the execution within the chosen design, not the choice itself.
Capstone

Your 1-Page Critique
