"How many households should we survey?" This is perhaps the most common—and most consequential—question in impact evaluation design. Get it wrong, and you risk either wasting resources on an oversized study or, worse, collecting data that can't detect your programme's effects even if they exist.
This guide explains the core concepts of sample size calculation without the jargon, and provides practical guidance for development practitioners designing evaluations.
The Core Problem
At its heart, sample size is about detecting real effects (signal) amid natural variation (noise). Your sample size determines your ability to distinguish signal from noise.
If your sample is too small, even real programme effects can get lost in the noise. You'll conclude "no significant effect" when the programme actually worked—a costly mistake for learning and accountability.
Four Key Concepts
Four quantities drive every sample size calculation: the minimum detectable effect size, the variance of the outcome, the desired statistical power, and the significance level. The intuition:
- Bigger effect → need fewer observations (signal is louder)
- More variance → need more observations (noise is louder)
- Higher desired power → need more observations (more certainty demanded)
- Lower significance level → need more observations (stricter threshold)
The Basic Formula
For a simplified two-group comparison, the per-group sample size can be calculated as:

n = 2(Zα + Zβ)² × σ² / δ²

where:
- n = sample size per group
- Zα = Z-score for the significance level (1.96 for two-sided α = 0.05)
- Zβ = Z-score for power (0.84 for 80% power)
- σ² = variance of the outcome
- δ = minimum detectable effect (MDE)
Key insight: The effect size (δ) is in the denominator and squared. This means that halving your expected effect size quadruples your required sample size.
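As a sketch, here is the formula in Python, using the standard library's NormalDist so the z-scores aren't hardcoded (the function name per_group_n is illustrative, not from any library):

```python
from math import ceil
from statistics import NormalDist

def per_group_n(sigma: float, delta: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-group mean comparison, per the formula above."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for two-sided alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Halving the detectable effect quadruples the required sample:
print(per_group_n(sigma=600, delta=200))  # 142 per group
print(per_group_n(sigma=600, delta=100))  # 566 per group (roughly 4x)
```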
Effect Size: The Hardest Part
Choosing an appropriate effect size is typically the most challenging part of sample size calculation. There are three approaches:
- Prior evidence: What have similar interventions achieved? (Caution: publication bias often inflates reported effects)
- Minimum meaningful effect: What's the smallest change that would matter given your programme's cost?
- Standardised conventions: Cohen's d of 0.2 SD is "small," 0.5 SD is "medium," 0.8 SD is "large." Most social programmes achieve 0.1-0.3 SD effects (see the conversion sketch below for what these mean in raw units)
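To make these conventions concrete, translate them into the outcome's own units. A small sketch, using the yield SD from the worked example below:

```python
# Translate standardised effect sizes (Cohen's d) into raw outcome units:
# raw effect = d * SD of the outcome.
baseline_sd = 600  # kg/hectare, taken from the worked example below

for d, label in [(0.2, "small"), (0.5, "medium"), (0.8, "large")]:
    print(f"d = {d} ({label}): {d * baseline_sd:.0f} kg/hectare")
# d = 0.2 (small): 120 kg/hectare
# d = 0.5 (medium): 300 kg/hectare
# d = 0.8 (large): 480 kg/hectare
```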
Cluster Randomisation Complications
In many development programmes, you can't randomly assign individuals—you must randomise at the village, school, or clinic level. This creates intra-cluster correlation (ICC), which dramatically increases required sample size.
The design effect captures this inflation. For equal-sized clusters, Design Effect = 1 + (m − 1) × ICC, where m is the number of observations per cluster; multiply your naïve sample size by this factor. The table shows how quickly it grows:
| ICC | Cluster size 10 | Cluster size 20 | Cluster size 50 |
|---|---|---|---|
| 0.01 | 1.09 | 1.19 | 1.49 |
| 0.05 | 1.45 | 1.95 | 3.45 |
| 0.10 | 1.90 | 2.90 | 5.90 |
| 0.20 | 2.80 | 4.80 | 10.80 |
The message: Cluster randomisation is expensive. More clusters with fewer observations per cluster is generally more efficient than fewer clusters with more observations each.
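To see why, compare two ways of spending the same 600 interviews. A minimal sketch, assuming equal-sized clusters (the effective sample size is the raw sample divided by the design effect):

```python
def design_effect(icc: float, cluster_size: int) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

# Same 600 interviews, same ICC of 0.05, two allocations:
for clusters, size in [(60, 10), (12, 50)]:
    deff = design_effect(icc=0.05, cluster_size=size)
    print(f"{clusters} clusters of {size}: DEFF = {deff:.2f}, "
          f"effective n = {600 / deff:.0f}")
# 60 clusters of 10: DEFF = 1.45, effective n = 414
# 12 clusters of 50: DEFF = 3.45, effective n = 174
```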
A Worked Example
Agricultural Extension Programme

- Baseline: 2,000 kg/hectare yield, SD 600 kg
- Expected effect: 10% increase (200 kg)
- Power: 80%; significance: 5% (two-sided)
- ICC: 0.05; households per village: 15
- Expected attrition: 10%

Working through the steps:

1. Naïve per-group n = 2 × (1.96 + 0.84)² × (600/200)² ≈ 142, i.e. 284 households in total
2. Design effect = 1 + (15 − 1) × 0.05 = 1.70, so 142 × 1.70 ≈ 241 per group
3. Attrition adjustment: 241 / 0.9 ≈ 268 per group
4. Rounding up to whole villages: 18 villages × 15 households = 270 per group

Final design: roughly 540 households across 36 villages, vs. 284 from the naïve formula ignoring clustering and attrition.
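The same arithmetic as a runnable sketch (the function name clustered_n is illustrative, not from any library):

```python
from math import ceil
from statistics import NormalDist

def clustered_n(sigma, delta, icc, cluster_size, attrition,
                alpha=0.05, power=0.80):
    """Per-arm sample size with design-effect and attrition adjustments."""
    z = NormalDist()
    naive = (2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
             * sigma ** 2 / delta ** 2)
    deff = 1 + (cluster_size - 1) * icc        # design effect
    adjusted = naive * deff / (1 - attrition)  # inflate for clustering, then attrition
    clusters = ceil(adjusted / cluster_size)   # round up to whole villages
    return clusters, clusters * cluster_size

clusters, households = clustered_n(sigma=600, delta=200, icc=0.05,
                                   cluster_size=15, attrition=0.10)
print(clusters, households)  # 18 villages and 270 households per arm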
Rules of Thumb: Sample Size Sanity Checks

Before committing to a design, run a few quick checks drawn from the points above:
- Is your assumed effect within the 0.1-0.3 SD range most social programmes achieve? Larger assumptions need justification
- In a cluster design, have you prioritised more clusters over more observations per cluster?
- Have you inflated the sample for expected attrition (10% in the worked example above)?
- Remember that halving the detectable effect quadruples the required sample, so an optimistic effect size assumption is the biggest single risk
What If You Can't Afford Adequate Power?
Sometimes the honest answer is that rigorous impact evaluation isn't feasible given resources. Options include:
- Pool resources across organisations or sites
- Change the question—a well-powered process evaluation may be better than an underpowered impact evaluation
- Focus on larger effects—can you intensify the intervention in fewer sites?
- Accept uncertainty—an underpowered study with transparent limitations is better than no evidence, if expectations are managed
"An underpowered study isn't just a waste of money—it's a waste of everyone's time, including your participants'. You've collected their data, disrupted their days, and learned nothing useful."
Tools and Resources
Several tools can help with sample size calculations:
- G*Power — Free software for various study designs
- Optimal Design — Specifically for cluster randomised trials
- PowerUp! — Excel-based, developed for education studies
- ImpactMojo Sample Size Lab — Interactive calculator with South Asian context and INR cost estimation (coming soon)
Getting sample size right is one of the most important decisions in evaluation design. It's worth investing the time to do it properly—before you're stuck with data that can't answer your questions.