Sample Size Matters: A Practical Guide for Development Practitioners

"How many households should we survey?" This is perhaps the most common—and most consequential—question in impact evaluation design. Get it wrong, and you risk either wasting resources on an oversized study or, worse, collecting data that can't detect your programme's effects even if they exist.

This guide explains the core concepts of sample size calculation without the jargon, and provides practical guidance for development practitioners designing evaluations.

The Core Problem

At its heart, sample size is about detecting real effects (signal) amid natural variation (noise). Your sample size determines your ability to distinguish signal from noise.

If your sample is too small, even real programme effects can get lost in the noise. You'll conclude "no significant effect" when the programme actually worked—a costly mistake for learning and accountability.

[Illustration 1: Signal vs noise. Sample size determines your ability to detect real effects.]

Four Key Concepts

Statistical Power
The probability that your study will detect an effect if one actually exists. Standard target: 80%.

Significance Level (α)
The probability of concluding there's an effect when there isn't (a false positive). Standard: 5% (α = 0.05).

Effect Size
How big a change your programme is expected to create. Larger effects are easier to detect; smaller effects need bigger samples.

Variance
How spread out the outcome is in your population. Higher variance makes effects harder to detect.

The intuition:

  • Bigger effect → need fewer observations (signal is louder)
  • More variance → need more observations (noise is louder)
  • Higher power wanted → need more observations (want more certainty)
  • Lower significance level → need more observations (stricter threshold)

The Basic Formula

For a simplified two-group comparison, sample size can be calculated as:

Sample Size Formula

n = 2 × [(Zα + Zβ)² × σ²] / δ²

where:

  • n = sample size per group
  • Zα = Z-score for the significance level (1.96 for a two-sided α = 0.05)
  • Zβ = Z-score for power (0.84 for 80% power)
  • σ² = variance of the outcome
  • δ = minimum detectable effect (MDE)

Key insight: The effect size (δ) is in the denominator and squared. This means that halving your expected effect size quadruples your required sample size.
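
A minimal Python sketch of this formula, assuming a two-sided test and using scipy's normal quantiles (the function name and defaults are illustrative, not from any particular package):

    from math import ceil
    from scipy.stats import norm

    def sample_size_per_group(delta, sigma, power=0.80, alpha=0.05):
        """Simplified two-group formula: n = 2 * (Z_alpha + Z_beta)^2 * sigma^2 / delta^2."""
        z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for a two-sided alpha = 0.05
        z_beta = norm.ppf(power)            # 0.84 for 80% power
        return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

    # Halving the minimum detectable effect roughly quadruples the required sample:
    print(sample_size_per_group(delta=200, sigma=600))  # ~142 per group
    print(sample_size_per_group(delta=100, sigma=600))  # ~566 per group (about 4x)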

Effect Size: The Hardest Part

Choosing an appropriate effect size is typically the most challenging part of sample size calculation. There are three approaches:

  1. Prior evidence: What have similar interventions achieved? (Caution: publication bias often inflates reported effects)
  2. Minimum meaningful effect: What's the smallest change that would matter given your programme's cost?
  3. Standardised conventions: Cohen's d of 0.2 SD is "small," 0.5 SD is "medium," 0.8 SD is "large." Most social programmes achieve 0.1-0.3 SD effects.
⚠️ Warning
Don't work backward from your budget ("we can afford 200 surveys, what effect can we detect?"). This often yields implausibly large MDEs that set up the evaluation to fail.
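
One way to see the problem is to invert the formula and ask what effect a fixed sample could actually detect. A rough sketch, under the same two-sided-test assumption (the helper name is illustrative):

    from math import sqrt
    from scipy.stats import norm

    def mde_in_sd(n_per_group, power=0.80, alpha=0.05):
        """Minimum detectable effect, in standard deviations, for a fixed per-group sample."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return z * sqrt(2 / n_per_group)

    # "We can afford 200 surveys" -> 100 per group:
    print(f"{mde_in_sd(100):.2f}")  # 0.40 SD, far above the 0.1-0.3 SD most programmes achieve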

Cluster Randomisation Complications

In many development programmes, you can't randomly assign individuals—you must randomise at the village, school, or clinic level. This creates intra-cluster correlation (ICC), which dramatically increases required sample size.

The Design Effect captures this inflation:

Design Effect
DE = 1 + (cluster size - 1) × ICC
ICC     Cluster size = 10   Cluster size = 20   Cluster size = 50
0.01    1.09                1.19                1.49
0.05    1.45                1.95                3.45
0.10    1.90                2.90                5.90
0.20    2.80                4.80                10.80

The message: Cluster randomisation is expensive. More clusters with fewer observations per cluster is generally more efficient than fewer clusters with more observations each.
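
A small sketch of that trade-off, assuming an ICC of 0.05 and the basic requirement of 142 households per group from the worked example below:

    from math import ceil

    def design_effect(cluster_size, icc):
        """Inflation factor for cluster randomisation: DE = 1 + (m - 1) * ICC."""
        return 1 + (cluster_size - 1) * icc

    # Same ICC, same basic requirement of 142 per group, two clustering choices:
    for m in (10, 50):
        n = ceil(142 * design_effect(m, icc=0.05))
        print(f"cluster size {m}: {n} households in {ceil(n / m)} clusters per group")
    # cluster size 10: 206 households in 21 clusters per group
    # cluster size 50: 490 households in 10 clusters per group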

[Illustration 2: The complete sample size calculation workflow]

A Worked Example

Agricultural Extension Programme

Baseline yield: 2,000 kg/hectare, SD 600 kg
Expected effect: 10% increase (200 kg)
Power: 80%; significance level: 5%
ICC: 0.05; households per village: 15; expected attrition: 10%

  1. Basic sample size: applying the formula gives 142 per group
  2. Adjust for clustering: DE = 1 + (15 - 1) × 0.05 = 1.70 → 142 × 1.70 ≈ 241 per group
  3. Adjust for attrition: 241 ÷ (1 - 0.10) ≈ 268 per group
  4. Convert to clusters: 268 ÷ 15 ≈ 18 villages per group

Required sample: 36 villages (18 per group), 540 households, versus 284 households in total from the naïve formula that ignores clustering and attrition.
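
The same four steps can be chained in a short script. This is a sketch under the example's assumptions; because it rounds up at every step, the intermediate counts land a household or two above the figures quoted here, but the final answer is still 18 villages per group:

    from math import ceil
    from scipy.stats import norm

    def clustered_sample(delta, sigma, cluster_size, icc, attrition,
                         power=0.80, alpha=0.05):
        """Basic n, then the clustering, attrition, and cluster-count adjustments."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        basic = ceil(2 * z ** 2 * sigma ** 2 / delta ** 2)        # step 1
        clustered = ceil(basic * (1 + (cluster_size - 1) * icc))  # step 2
        inflated = ceil(clustered / (1 - attrition))              # step 3
        clusters = ceil(inflated / cluster_size)                  # step 4
        return basic, clustered, inflated, clusters

    print(clustered_sample(delta=200, sigma=600, cluster_size=15,
                           icc=0.05, attrition=0.10))
    # (142, 242, 269, 18) -> 18 villages per group: 36 villages, 540 households overall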

Rules of Thumb

Sample Size Sanity Checks

  • Small effects (0.2 SD): plan for 400+ per group
  • Medium effects (0.5 SD): plan for 65+ per group
  • Cluster RCTs: 20+ clusters per arm minimum
  • Always inflate for attrition (typically 10-20%)
  • If the calculated sample is surprisingly small, double-check assumptions
  • If budget only allows small samples, consider qualitative methods instead
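
For the first two figures, the basic formula with σ = 1 (effects expressed in standard deviations) gives roughly:

    from math import ceil
    from scipy.stats import norm

    z = norm.ppf(0.975) + norm.ppf(0.80)   # two-sided alpha = 0.05, 80% power
    for d in (0.2, 0.5):
        print(d, ceil(2 * z ** 2 / d ** 2))
    # 0.2 -> 393 per group (hence "plan for 400+")
    # 0.5 -> 63 per group  (hence "65+")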

What If You Can't Afford Adequate Power?

Sometimes the honest answer is that rigorous impact evaluation isn't feasible given resources. Options include:

  • Pool resources across organisations or sites
  • Change the question—a well-powered process evaluation may be better than an underpowered impact evaluation
  • Focus on larger effects—can you intensify the intervention in fewer sites?
  • Accept uncertainty—an underpowered study with transparent limitations is better than no evidence, if expectations are managed
"An underpowered study isn't just a waste of money—it's a waste of everyone's time, including your participants'. You've collected their data, disrupted their days, and learned nothing useful."

Tools and Resources

Several tools can help with sample size calculations:

  • G*Power — Free software for various study designs
  • Optimal Design — Specifically for cluster randomised trials
  • PowerUp! — Excel-based, developed for education studies
  • ImpactMojo Sample Size Lab — Interactive calculator with South Asian context and INR cost estimation (coming soon)

Getting sample size right is one of the most important decisions in evaluation design. It's worth investing the time to do it properly—before you're stuck with data that can't answer your questions.