From the original USAID logical framework through OECD-DAC evaluation criteria, Theory of Change, the J-PAL randomista revolution, Most Significant Change and Outcome Harvesting, adaptive learning, contribution analysis, and the recent AI-for-evaluation turn — 18 nodes tracing how the field of MEL has actually evolved as practice, not just as doctrine.
18 nodes · 6 eras · ~55 years · CC BY-NC-SA 4.0
Era 01
The Logframe Era
1969 – 1989
USAID’s 1969 commission to Practical Concepts Inc. produced the Logical Framework. It became the dominant planning and evaluation grammar of international development for decades. The OECD DAC was created in 1971 to coordinate donor practice.
1969
USAID Logical Framework (Logframe)
Leon J. Rosenberg for Practical Concepts Inc. · commissioned by USAID, 1969
Argued
A 4×4 matrix structuring a project’s logic: rows for Goal, Purpose, Outputs, Activities; columns for Narrative, Verifiable Indicators, Means of Verification, and Important Assumptions. Forces designers to articulate the causal logic, indicators, evidence sources, and assumptions on a single page.
Mattered
Became the universal grammar of development planning. USAID, World Bank, EU, FCDO, GIZ, JICA — all adopted variants. The vocabulary (outputs/outcomes/impact, indicators, MOVs) traces directly to Rosenberg’s 1969 work.
Critique
Critics (Gasper, Mosse, Eyben) argue the logframe imposes spurious linearity on complex change processes, hides political contestation behind technical language, and centralises authority with donors. The Theory of Change movement (1995–) emerged partly as response.
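The 4×4 structure described above can be sketched as a simple data structure. The row and column names follow the 1969 matrix; the example cell entries are invented purely for illustration.

```python
# Rows and columns of the 1969 logframe matrix.
ROWS = ["Goal", "Purpose", "Outputs", "Activities"]
COLUMNS = [
    "Narrative",
    "Verifiable Indicators",
    "Means of Verification",
    "Important Assumptions",
]

# An empty logframe: one cell per row/column pair.
logframe = {row: {col: "" for col in COLUMNS} for row in ROWS}

# Filling the Purpose row with a hypothetical example:
# the claimed change, its indicator, the evidence source, and the assumption.
logframe["Purpose"]["Narrative"] = "Smallholder incomes rise in the project area"
logframe["Purpose"]["Verifiable Indicators"] = "Median household income +20% by year 3"
logframe["Purpose"]["Means of Verification"] = "Annual household survey"
logframe["Purpose"]["Important Assumptions"] = "Crop prices remain stable"

# The causal logic reads bottom-up: Activities -> Outputs -> Purpose -> Goal,
# each step conditional on its assumptions holding.
for row in reversed(ROWS):
    print(f"{row}: {logframe[row]['Narrative'] or '(to be completed)'}")
```

The point of the single-page discipline is visible here: every causal claim must name its indicator, its evidence source, and the assumption on which the next step up depends.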
OECD Development Assistance Committee (DAC) Formed
OECD · Replaced the Development Assistance Group of 1960 · 1971 reorganisation
Argued
A coordinating forum for OECD member donor countries. Sets common definitions of Official Development Assistance (ODA), measures aid flows, and develops standards for evaluation, transparency, and effectiveness.
Mattered
DAC defined the architecture within which donor MEL evolved: the 1991 evaluation principles, the 5 (now 6) DAC criteria, peer reviews, and the Paris Declaration follow from this institutional spine. The 0.7% ODA/GNI target is a DAC norm.
Critique
DAC’s membership is exclusively rich-country donors; recipient and Southern voice is structurally limited. China, India, Brazil and other major South-South cooperation actors operate outside the DAC framework, with different norms.
The DAC codified evaluation criteria; ToC entered development from US foundation work; the MDGs cemented results-based management as the global frame. The 1990s pivoted MEL from input-output accountability to outcome thinking.
1991
OECD-DAC Evaluation Criteria — The Original Five
OECD-DAC Network on Development Evaluation · Principles for Evaluation of Development Assistance, 1991
Argued
Five criteria for evaluating any development intervention: Relevance (does it address the right need?), Effectiveness (does it achieve its objectives?), Efficiency (does it use resources well?), Impact (does it produce broader change?), and Sustainability (do the results last?).
Mattered
The most-cited evaluation framework in history. Anchored evaluation training, ToRs, donor reporting, and academic research for three decades. Coherence was added in 2019, making six criteria.
Critique
Critics (Picciotto, Carden, Patton) argue the criteria are donor-centric, frame interventions as discrete projects rather than systems, and treat values as if they were technical. Real-world evaluations often score everything “moderately satisfactory” without saying anything actionable.
Theory of Change (ToC)
Carol Weiss, Aspen Institute Roundtable on Comprehensive Community Initiatives · New Approaches to Evaluating Community Initiatives, 1995
Argued
Complex social change initiatives need to articulate not just inputs and outputs, but the underlying causal hypotheses about why activities should produce the desired change. ToC makes assumptions explicit and testable, helping evaluators and implementers work with complexity rather than reduce it.
Mattered
By the early 2010s ToC had become standard donor language (DFID, USAID, Hewlett, Gates). Versions like outcomes mapping (IDRC), pathway analysis (Rockefeller), and ToC-as-process (Vogel, James) extended the basic frame.
Critique
Critics argue ToC has become as ritualised as logframes — produced once, never revisited, often consultant-led. Patton and others advocate “developmental evaluation” as the actual practice ToC implies. The gap between ToC-as-document and ToC-as-practice persists.
The 2000s saw a strong push for evaluation quality, donor harmonisation, and the politics of aid effectiveness. The Paris Declaration set principles; UNEG codified quality standards; the methods debate intensified between rigour-as-RCTs and rigour-as-fitness-for-purpose.
2005
Paris Declaration on Aid Effectiveness
OECD Development Assistance Committee · Paris High Level Forum · March 2005
Argued
Five principles: Ownership (recipient-led), Alignment (donors align with country systems), Harmonisation (donors coordinate), Managing for Results, and Mutual Accountability. Twelve indicators with targets set for 2010 tracked progress on each.
Mattered
Most ambitious attempt to reform donor practice in a generation. Drove sector-wide approaches (SWAps), country-led joint performance frameworks, and the explicit obligation to use country M&E systems. Successor conferences (Accra 2008, Busan 2011) extended the framework to civil society and South-South cooperation.
Critique
2010 evaluations found mixed compliance: alignment improved but harmonisation lagged; donor proliferation continued; Busan opened the door for non-DAC actors but quality concerns remained. The aid effectiveness agenda largely faded after 2015 as climate finance and SDGs took centre stage.
United Nations Evaluation Group (UNEG) · Norms (2005), Standards (2005), Code of Conduct (2008)
Argued
Codified what professional, independent, ethical, and quality evaluation looks like across the UN system. Norms cover purpose, principles (independence, impartiality, credibility, utility), and roles. Standards cover institutional framework, evaluator competencies, design, conduct, and reporting.
Mattered
Set the floor for evaluation quality across UN agencies, with influence on national VOPEs (voluntary organisations for professional evaluation), UNEG-RBM partnerships, and the EvalPartners movement. Most contemporary evaluation policy documents (governments, INGOs) trace lineage to UNEG.
Critique
Standards are clearer on independence and rigour than on equity, gender, decolonial frames, or community-led evaluation. UNEG has updated several times (2016, 2020) to address gender-responsive evaluation, equity-focused evaluation, and Made in Africa evaluation principles.
J-PAL, IPA, 3ie. RCTs became the “gold standard” for impact evaluation. Cochrane methods extended to development. The systematic-review and replication agenda took root. The methods debate sharpened: rigour-as-design vs rigour-as-context.
2003
J-PAL Founded — The Randomista Movement
Abhijit Banerjee, Esther Duflo, Sendhil Mullainathan · MIT, June 2003
Argued
Many development questions can be answered using randomised controlled trials — the gold standard from medicine. Random assignment of the intervention controls for selection effects; impact is estimated as the mean outcome in the treatment group minus the mean outcome in the control group. Build evidence policy by policy.
Mattered
Within a decade J-PAL had run over 1,000 RCTs across 80+ countries; the 2019 Nobel Memorial Prize in Economics went to Banerjee, Duflo, and Michael Kremer for the experimental approach. Reshaped how donors and governments make programming decisions. Built the evidence base on deworming, conditional cash transfers, microfinance, and teacher absenteeism.
Critique
Critics (Deaton, Ravallion, Pritchett, Reddy) argue RCTs answer narrow what-works-here questions but say little about why or whether results travel; the movement depoliticises development by avoiding macro and structural questions; selection of researchable questions itself reflects donor priorities.
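The estimator behind the RCT logic above is just a difference in means under random assignment. A minimal simulated sketch (all numbers invented for illustration):

```python
import random
import statistics

random.seed(42)

# Simulated eligible units with a baseline outcome (e.g. test scores).
baseline = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment: each unit has a 50% chance of receiving the intervention.
# Randomisation makes treatment and control comparable in expectation,
# which is what removes selection bias.
assignment = [random.random() < 0.5 for _ in baseline]

# Hypothetical true effect of +3 points for treated units, plus noise.
outcomes = [
    b + (3 if treated else 0) + random.gauss(0, 5)
    for b, treated in zip(baseline, assignment)
]

treat = [y for y, t in zip(outcomes, assignment) if t]
control = [y for y, t in zip(outcomes, assignment) if not t]

# The impact estimate: mean treatment outcome minus mean control outcome.
impact = statistics.mean(treat) - statistics.mean(control)
print(round(impact, 2))  # close to the simulated true effect of 3
```

The Deaton/Pritchett critique is visible in what the sketch does not contain: nothing here says why the effect arises or whether it would hold in a different population.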
3ie — International Initiative for Impact Evaluation
Howard White (founding ED) · Established with 17 founding members · February 2008
Argued
Funds high-quality impact evaluations and systematic reviews on development questions, with a methods-pluralist (not RCT-only) stance. Built infrastructure: the impact evaluation repository and evidence gap maps (EGMs).
Mattered
Funded ~250 impact evaluations and 75+ systematic reviews to date. Made evidence synthesis accessible to policymakers through gap maps; evidence-informed policy units in Bihar, Punjab, Andhra Pradesh, and South Africa drew heavily on 3ie’s outputs.
Critique
3ie’s influence has waned post-2018 with funding cuts; the focus on rigorous impact evidence sometimes excluded relevance to political and contextual decision-making. The wider evidence-informed policy movement has matured but faces “evidence fatigue” from over-supply of high-quality findings.
A counter-current to the RCT turn: methods designed for complex, emergent, contested change — PDIA, MSC, Outcome Harvesting, Contribution Analysis, Developmental Evaluation. The 2019 DAC criteria revision added Coherence — a quiet acknowledgement that interventions sit in systems.
2012
PDIA — Problem-Driven Iterative Adaptation
Matt Andrews, Lant Pritchett, Michael Woolcock · Harvard CID, working paper 2012; book 2017
Argued
Most failed development interventions fail because they import “best practice” solutions and force-fit them onto problems that are politically and contextually different. Better: define the problem locally, iterate rapidly with feedback loops, build broad agency for change, and accept that solutions emerge.
Mattered
Influenced USAID’s Collaborating, Learning, and Adapting (CLA) approach, DFID’s Adaptive Programming, the Doing Development Differently manifesto (2014), and the Building State Capability programme. Adaptive management is now mainstream in donor practice (at least rhetorically).
Critique
Critics ask whether adaptive practice is genuinely different from good consultancy; whether reporting requirements really allow iteration; and whether donors can resist the pull of pre-defined logframes when accountability pressures rise. Implementation often falls short of doctrine.
Most Significant Change (MSC) & Outcome Harvesting
Rick Davies (MSC) · Ricardo Wilson-Grau (Outcome Harvesting)
Argued
MSC: collect stories of change from beneficiaries, have stakeholders select the “most significant” ones through iterated review — surfacing what mattered to whom, not what donors expected. Outcome Harvesting: identify outcomes (changes in actor behaviour) that have already happened and trace back to what contributed.
Mattered
Both are now standard methods for advocacy, governance, and behaviour-change programmes where pre-set indicators miss the actual change. Adopted by Oxfam, ICCO, Hivos, Christian Aid, and many CSO networks; influential in policy advocacy MEL where attribution is structurally hard.
Critique
Both methods rely heavily on facilitation quality; results depend on whose voices are heard. Without rigorous sampling, “most significant” can collapse into “most articulate.” Outcome Harvesting’s contribution analysis can become post-hoc rationalisation if not held to evidentiary discipline.
DAC Criteria Revised — Coherence Added as Sixth Criterion
OECD-DAC Network on Development Evaluation · Better Criteria for Better Evaluation, December 2019
Argued
Coherence joined Relevance, Effectiveness, Efficiency, Impact, and Sustainability. Coherence asks: how well does the intervention fit with other interventions in country, sector, institution? Internal and external coherence both matter. Each criterion was also more clearly defined and aligned with the SDGs.
Mattered
First major revision in nearly 30 years. Formal recognition that interventions sit in systems, not vacuums; aligned with the rise of nexus thinking (humanitarian-development-peace nexus, water-energy-food nexus, climate-development).
Critique
Coherence remains the criterion evaluators find hardest to operationalise; many evaluations treat it perfunctorily. Climate justice, decolonial, and feminist evaluators argue the framework still privileges donor and intervention-centric evaluation, with values implicit and unexamined.
COVID forced remote MEL; mobile data, satellite imagery, and call-detail records became standard data sources; generative AI is reshaping report writing, qualitative analysis, and synthesis. The decolonial evaluation movement and the Made-in-Africa principles (2019–) push back on whose knowledge counts.
2020
COVID Forces Remote & Real-Time MEL
Global evaluation community · UNEG, EvalPartners, INTRAC, Better Evaluation hubs · March 2020 onwards
Argued
Field visits ceased overnight. Evaluators rapidly pivoted to phone surveys, mobile data collection, satellite imagery, call-detail records, online deliberative methods, and locally-led data collection. Real-time monitoring became a necessity.
Mattered
Permanently shifted MEL practice toward remote-first methods. The locally-led evaluation movement (Pact, Equal Access International) gained ground partly because international consultants couldn’t travel. Acceleration of the “evaluator at a distance” debate.
Critique
Phone surveys systematically under-sample women, rural respondents, people without phones; online methods exclude those without bandwidth. The “digital divide” became a measurement bias. Some pre-COVID gains in beneficiary engagement were reversed by remote-first defaults.
The AI-for-Evaluation Turn
Global evaluation community · Foundation models (GPT-4, Claude, Gemini) · 2023–
Argued
Large language models are now used in evaluation for: drafting and synthesising reports, qualitative coding (interview transcripts), evidence synthesis, indicator construction, and even evaluation design. Multiple INGOs, UN agencies, and consultancies have launched AI-assisted MEL pilots.
Mattered
Substantial productivity gains in report drafting and qualitative analysis. Made systematic review and large-N qualitative work feasible at smaller budgets. Opened access to evidence synthesis for under-resourced organisations and Southern researchers.
Critique
Hallucinations risk fabricated quotes; bias in training data is reproduced; the “efficiency” gain can substitute for relationship work and field judgment that AI can’t replicate. UNEG, ALNAP, and EvalPartners are developing guidance; the locally-led evaluation movement is wary that AI re-centralises authority with model-providers.
Decolonial Evaluation & Made-in-Africa Principles
African Evaluation Association (AfrEA) and allied movements · principles developed since 2007, formalised 2019–
Argued
Mainstream evaluation frameworks — logframes, DAC criteria, RCTs — embed Northern epistemologies that subordinate other ways of knowing. Decolonial evaluation centres community ownership, relational accountability, indigenous evidentiary norms, and reparative purposes. AfrEA’s Made-in-Africa Evaluation principles (since 2007, formalised 2019–) are an institutional anchor.
Mattered
Reshaping major donor practice (Hewlett, Ford, MacArthur on equitable evaluation; FCDO and USAID locally-led monitoring agendas). National VOPEs in the Global South are growing; Cuba, Brazil, India, Kenya, South Africa have active evaluation societies. Re-Imagining INGOs (2022) asks what evaluation looks like in a decentralised civil-society architecture.
Critique
Sceptics argue the principles risk becoming new orthodoxy; some equity claims are unfalsifiable; donor adoption can be performative. Practical implementation requires sustained investment in Southern evaluation capacity — a structural rather than methodological question.