Innovative M&E from the Sandbox and beyond

Jan 20, 2023

By Søren Vester Haldrup with support from Samuel Tran

UNDP has set up an M&E Sandbox to nurture and learn from new ways of doing monitoring and evaluation (M&E) that are coherent with the complex nature of the challenges facing the world today. This earlier blog post discusses the rationale and focus of the Sandbox.

There’s been overwhelming interest in the Sandbox since we launched the initiative. Already, people from over 100 different organizations have signed up, we’ve run several webinars and sessions, and had great conversations with outfits such as the International Development Innovation Alliance, MacArthur and Akin Fadeyi Foundations, USAID and SIDA, the Global Evaluation Initiative, and many more. The Bill and Melinda Gates Foundation has also provided generous grant support to the initiative (more on this soon!).

In this blog we are sharing a digest of some of the many useful and innovative monitoring, evaluation and learning resources and efforts that have come through the M&E Sandbox in 2022. A lot of these resources have been shared by our community in response to the overwhelmingly positive feedback from the launch of the Sandbox (please keep them coming!). We hope you find it useful.

We have grouped these efforts and resources under six broad questions:

How do we measure systems transformation?
How do we know if we are on track?
How do we rethink complexity and independence in evaluation?
Why, how and for whom do we measure?
How do we generate insights and learn?
How do we make decisions and adapt?

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

How do we measure systems transformation?

An increasing number of organizations, movements, and people aim to contribute to systems change or transformation. Some use these phrases synonymously while others distinguish between change (which can also be incremental) and wholesale transformation.

In order to measure systems change we must first define it. This is easier said than done, and we could easily spend this whole piece on a theoretical discussion of this one topic. Outfits such as the OECD Observatory of Public Sector Innovation (OPSI) and Catalyst2030 offer overview discussions for those interested. Forum for the Future provides a helpful distinction between ‘a system’ (a configuration of parts connected by a web of relationships towards a purpose), a ‘systems approach’ and ‘systemic change’.

The best starting point for measuring system change is to work with the definition and conceptual framework that makes sense to you, your stakeholders and your intent (i.e. what you are trying to achieve). Different frameworks focus on different things (i.e. elements, levels or functions of a system) and have (explicitly or implicitly) different assumptions about how change happens. In a recent piece, Donna Loveridge reviewed 28 different systems change frameworks to understand their similarities and differences. She identifies four categories of elements in these frameworks: parts of a system, characteristics of change, interventions, and outcomes/results. The Building Better Systems Green Paper identifies three levels of system change: micro (niche), meso (regime) and macro (landscape) and describes change through phases of alignment, disalignment, and realignment. It identifies four keys to unlock change: purpose, power, resources and relationships. Another popular and useful framework is the Water of Systems Change which, drawing on the logic of the iceberg model, operates with six conditions of systems change at three different levels. Australian Aid’s Market Development Facility has championed the Adopt-Adapt-Expand-Respond framework for systemic change (which includes a helpful discussion on the topic). See also the Springfield Centre’s pragmatic approach to measuring system change.

Source: FSG — The Water of Systems Change

The next step is to identify ways of measuring and tracking change in the different elements of the system. Nancy Latham from Learning for Action offers a practical guide to evaluating systems change in a human services system context, while Mark Cabaj proposes an inquiry framework for evaluating systems change results. The inquiry framework includes single, double, and triple loop learning, and identifies three types of systems change (changes in drivers of system behavior, changes in behaviors of system actors, and changes in the overall behavior of the system).

So far we haven’t found many (written up) examples of organizations actually tracking change in complex systems on a continuous basis, but there are some interesting ones. ACDI/VOCA has developed a Market Systems Diagnostic which analyzes changes in market structures and enterprise behaviors to understand how and whether the market system as a whole is changing to become more inclusive, competitive and resilient. Collaborate and New Philanthropy Capital have developed a maturity model to help track the development of ‘system conditions’ in Save the Children UK’s Early Learning Communities programme. In this model they look at key enablers which signal that the system is changing in ways that will lead to better outcomes for children in the longer term. Lund University is using a “layer model” as a way to track leverage and “ripple effects” of collaborative action in an innovation ecosystem. UNDP is incorporating a system perspective in its M&E activities in its portfolio on circular economy in Pasig City (Philippines). The Transformative Innovation Policy Consortium also provides some useful resources. Furthermore, Rohini Nilekani Philanthropies shares its experience in a 3-part series (Part 1: What are We Learning?, Part 2: How Do Systems Change?, Part 3: Our Role as Needle and Thread) and DESTA Research offers an example of the use of systems thinking and modelling for enhancing climate action. Lastly, the Small Foundation articulates a methodology for comparing conditions for systems change against entry points: Tracking systems change — difficult but necessary. If you are interested in this area you may also want to check out Clear Horizon’s e-course on Evaluating Systems Change and Place-based Approaches as well as the Esade Center for Social Impact’s Community of Practice on Impact Measurement.

As a third step, we may then interrogate why these changes have come about and whether we have contributed. There is an arsenal of different methods available for this, such as contribution analysis, process tracing, and Qualitative Comparative Analysis. In UNDP’s Strategic Innovation Unit we have used a combination of outcome harvesting and contribution tracing to analyze how we have influenced wider processes of (systems) change. Relatedly, a report from the Walton Family Foundation on causal pathways identifies, through five field-level experiments, various ways to strengthen the ability of social change agents to see more deeply into how change is happening and learn from it to inform their strategies (present and future).

How do we know if we are on track?

It’s difficult to know if we are on track when working on systems transformation. First, transforming a system is a long-term process that may take years or even decades to materialize. Second, we may not know exactly what such transformation will look like (though we might have a sense of the direction). Third, as we start out we will probably not know how to best support transformation (i.e. the solutions are unknown), so we need to begin to probe, learn and adapt. Fourth, because we need to continuously learn and adapt, we don’t want to simply look at the implementation of activities (which will need to evolve) as an indicator of progress. So how on earth do we then know if we are on track?

There are various ways of dealing with this challenge. For starters, we may focus on learning and adaptation as results in their own right. This makes sense because working to tackle complex problems requires learning and adaptation — so surely these things should be recognized as necessary intermediate results? For instance, we could articulate a set of learning questions and regularly check whether we are generating answers to these. There are some examples of this type of thinking. For instance, UNDP’s Innovation Facility (which is generously funded by the government of Denmark) has a learning agenda as well as a results framework which includes indicators such as the number of learning and reflection sessions. Another example is the UK funded Partnership to Engage, Reform and Learn (PERL) programme in Nigeria which had built-in learning explicitly from its inception and introduced a specific learning performance indicator. Tom Aston talked about the PERL experience in a past Sandbox webinar (watch it here), including the challenges associated with introducing a stronger learning focus and the difficulties of trying to bridge accountability and learning mechanisms. Lastly, UNFPA’s 3+5 Framework offers an interesting take on results-based management with a focus on learning and adaptation.

Second, we can check our progress against how well we adhere to key principles rather than whether we deliver a set of activities. This idea builds on Michael Quinn Patton’s concept of Principles-Focused Evaluation, though it can also be used in monitoring. The idea is that we use principles to ensure a consistent focus on achieving overall objectives, while maintaining operational flexibility in terms of how to get there. There are some examples of this approach being deployed. For instance, USAID has deployed it in the Dominican Republic, and UNDP Armenia has used this type of logic in their SDG Innovation Lab. Furthermore, we at the Strategic Innovation Unit, together with UNDP’s Green Commodities team, are currently piloting a principles-based approach to monitoring and reporting in a new collaboration with the Bill and Melinda Gates Foundation (more on this in the near future).

Third, we may use a theory (of change) based approach to track progress. Based on our ‘theory’, we can then look for early signals of change. In terms of theories of change, the Transformative Innovation Policy Consortium and EIT Climate-KIC offer a useful handbook for developing a transformative theory of change (ToC) as does Hivos in their stepwise approach to ToC thinking in practice (including the role of power and gender!). Furthermore, CausalMap has an online research tool for identifying and visualizing causal connections, Laudes Foundation shares its interactive ToC, and a 2021 article in the journal ‘Evaluation’ describes how to build a system-based theory of change using participatory systems mapping. Lastly, Duncan Green shares useful thoughts on ToCs and urges us “to distinguish between theories of change (how the system itself is changing, without our involvement) and theories of action (the small differences we can make, usually in alliance with others). If theories of change start by putting us at the centre of everything, that is a serious problem — we almost never are.”

When it comes to looking for early wins and signals of change, UNDP’s Green Commodities Programme uses a neat Self-Assessment Tool to help document systems change through effective collaborative action. Several UNDP offices are also experimenting with a new depth and breadth framework for capturing intermediate results (we discussed an early version of this framework here). Oxford Policy Management has used a governance assessment tool to document early results and their likelihood of influencing wider sustainable change in the context of climate change resilience efforts in South Asia. The Smallholder and Agri-SME Finance and Investment Network is using Network Health Surveys and Social Network Analysis to capture and visualize the levels of communication, coordination, and collaboration among impact networks — do also check out a Health Foundation-hosted discussion from earlier this year of how to capture the value in networks and communities of practice. When it comes to ‘reporting’ results and impact, the Small Foundation’s Impact Report 2021 provides a nice illustration of the use of ‘impact stories’, attention to learning, and moving beyond simple quantitative KPIs.

How do we rethink complexity and independence in evaluation?

Evaluation can be used in a broad sense to refer to any systematic process to judge merit, worth or significance by combining evidence and values. However, evaluation is often understood in more narrow terms as an activity that seeks to determine the merit or worth of an intervention. The OECD evaluation criteria focus on the relevance, impact, effectiveness, efficiency and sustainability of an intervention, but the usefulness and relevance of these criteria are being challenged in a world facing increasingly complex problems. In an interesting article, Juha Uitto, the Director of the Global Environment Facility’s Independent Evaluation Office, discusses evaluation in the Anthropocene, arguing that our approaches must be open to the full human and natural systems within which we and interventions operate.

This has given rise to a range of new evaluation approaches and principles that are more attuned to complex changing contexts and innovative ways of working. Rather than striving to find a single ‘gold standard’ method in complexity-aware evaluation, Tom Aston and Marina Apgar argue for bricolage and note that the ‘best’ method is dependent on what questions an evaluation asks, the attributes of the intervention being evaluated, and available designs linked to the intended uses of the evaluation. The authors proceed to outline a framework based on a review of 33 methods to support evaluators to be more intentional about bricolage and to combine the component parts of relevant methods more effectively. In a recent piece on futures-focused evaluation, Katri Vataja and Rose Thompson Coon from Sitra discuss how foresight and futures-thinking, combined with evaluative thinking, can be a powerful tool in work that aims for societal impact in a turbulent environment. Imago has developed an adaptive evaluation approach and is implementing it in India with the Self-employed Women’s Association (SEWA) Bharat and the Bill and Melinda Gates Foundation. Similarly, developmental evaluation is a popular evaluation approach which seeks to support innovation development to guide adaptation to emergent and dynamic realities in complex environments. It has been used by organizations such as UNFPA and the Open Government Partnership (read more here and watch this Sandbox webinar). A 2021 study of 10 years of Developmental Evaluation in USAID documents how USAID has incorporated this practice into the agency. For detailed case studies and practical tips from USAID efforts see here and here. Do also check out Clear Horizon’s e-course on developmental evaluation.

In many of the approaches discussed above, the role of evaluation changes from a point-in-time assessment to a continuous learning journey. This shift is not without its challenges. For evaluators, it entails a transition from fierce independence towards a more embedded way of working, but some fear this may result in less objective and ‘skewed’ findings. It can also be a difficult shift to make for people and organizations whose identities have been framed around independence and ‘arm’s-length’ evaluation principles.

Another important aspect of change in the evaluation field is the need to de-colonize the discipline and strengthen non-Western voices and mindsets. Peace Direct’s Time to Decolonise Aid provides a comprehensive overview of this challenge. Oxfam’s Sophie Walsh discusses ‘control-hungry’ approaches that ‘exacerbate imbalances of power’ between donors and partners. Lastly, we also recommend the summary of a recent conference session at Bond which discusses the colonial roots of established MEL approaches.

Why, how and for whom do we measure?

Measurement is a fundamental aspect of our monitoring, evaluation, and learning efforts. It is about the way we estimate or assess the extent, size, quality, value, or effect of something. However, measurement is not a value-free technical activity. How and for whom we measure matters greatly for our capacity to learn, our ability to adapt, and our awareness of whether we are on track with our (portfolio of) interventions that interact with complex systems.

The most recent M&E Sandbox session focused on measurement and featured insightful presentations on this topic from the MacArthur Foundation, Akin Fadeyi Foundation, and Search for Common Ground as well as practitioners John Mortimer and Roxanne Tandridge (watch the session and read the recap here). Building on that session, we have identified (at least) four themes worth considering when discussing measurement:

Why do we measure?
For whom do we measure?
What counts as evidence?
How to measure the things that are hard to measure?

Why do we measure? This is important to clarify. Do we do it to learn, to ensure accountability, for fundraising purposes, or all of the above? Often, measurement (and M&E in general) is done for accountability and compliance purposes to satisfy donors or other principals’ need for information rather than for the sake of learning. Accountability is important and sometimes the more traditional ‘logic models’ may be useful. In his piece “In defense of logic models”, Ian David Moss argues that these tools can be flexible and have been misunderstood. However, we may also want to measure things primarily for the purposes of learning and adaptation. This may require a bit of a different mindset. Ignited Word provides a useful set of practical ideas on how to do ‘results measurement on your own terms’, including shifting focus towards measuring what matters, drawing funders into learning conversations, and making the case for how doing less of what doesn’t matter to the organization frees up time and energy to go after more valuable results measurement. In this past Sandbox session we explored how different organizations are trying to shift to a learning focus, including Toby Lowe’s work on applying a Human Learning Systems approach.

For whom do we measure? Usually, we measure for donors and other ‘principals’ (remember the old ‘principal-agent problem’?). However, it may help to think about the ‘who’ along a continuum: from those furthest from the problem to those closest to the problem. The pendulum has long swung heavily towards (the extractive activity of) measuring for those furthest from the action: donors, funders, high-level decision makers, and CEOs (however, these actors may not make full use of the data — read more about evidence use below). If we are interested in measuring for learning, empowering local actors (not least women and marginalized communities) and accelerating change, it makes sense to also measure for and with those closest to the problem, such as local officials and communities. Colleagues at UNDP working on prevention of violent extremism make this point forcefully in No more data “blah blah” — this is about empowerment. In a similar vein, the Center for Global Development recently produced a report on new evidence tools for policy impact. A core theme in the report is the importance of shifting agenda-setting power and resources to those who best understand local policy contexts and priorities. A blog from Ignited Word echoes this shift, arguing for results measurement led by implementers rather than donors. Similarly, the Global Resilience Partnership has traced its own journey to resilience measurement that prioritizes information needed by subsistence farmers to navigate to a better future.

What counts as evidence? Quantitative data and standardized metrics and, generally, Western-centric ways of ‘knowing’ have long dominated the conversation around M&E (especially accountability-focused M&E). This mindset, however, is changing. IFAD and WFP have begun using Participatory Narrative Inquiry (PNI) to collect stories from stakeholders, who then have the opportunity to self-interpret the information. Likewise, FSG has long championed a mixed-methods approach to evaluating complexity, drawing on various tools such as social network analysis, reflective practice, and learning memos. The Center for Global Development advocates for more technology-based measures, thanks to “geocoded survey data, administrative data, remotely sensed data, low-cost remote surveys, and big data.” And as part of our Deep Demonstrations, we (the Strategic Innovation Unit) encourage approaches that are methodologically eclectic, that go beyond numbers to address learning needs, and inform decision making. In Yemen, UNDP’s work has included the use of micronarrative research, such as writing short stories, pictures, videos, and using voice messages via WhatsApp.

How to measure things that are hard to measure? Measuring things like culture and changes in mindsets has proven to be an enduring challenge for researchers and M&E practitioners. However, we are seeing interesting efforts to tackle this. One place to start may be recognizing that hard-to-measure things cannot be captured in a meaningful way through simple number-based KPIs. Rather, a more useful approach could be holistic measures (narratives, case studies) that combine quantitative and qualitative data. In this vein, organizations such as IM Sweden and You’ve Got This have been exploring the use of storytelling as an evaluation technique. However, the Mayors for Economic Growth programme reminds us that we should not overengineer things from the get-go: it can be more useful to begin with simple measures. Other useful practices include work by Rohini Nilekani Philanthropies — they have articulated a helpful typology to help funders capture impact — including those intangible and slowly emerging results. In healthcare contexts, John Mortimer has outlined the process of creating appropriate service design measures and Search for Common Ground is working on a Social Return on Investment (SROI) methodology where communities and other local stakeholders define the impacts that they are experiencing, rather than impacts being defined by distant programme designers and evaluators.

How do we generate insights and learn?

For most organizations, M&E tends to serve several purposes. One common (and useful) distinction is between learning and accountability — are we doing M&E to learn or to ensure accountability towards a funder or constituency? While both are important, many M&E frameworks and practices tend to be focused primarily on ensuring accountability. This INTRAC brief provides an overview of the learning vs accountability debate.

Our M&E Sandbox has a strong focus on learning because learning is crucial for navigating uncertainty and tackling complex problems. As we discussed in our first sandbox blog we often don’t know up front how to best help tackle wicked problems so we need to continuously learn and, in turn, adapt what we do based on learning. As noted above, we explored in a previous Sandbox session how different organizations are trying to shift their ways of working (including their M&E systems) to help them continuously learn. Read a summary of the session and watch the recording here.

Many organizations are investing heavily in building their muscle for learning. This is evident in the growth of job titles that include the word ‘learning’ and in the increasing use of learning partners and critical friends (rather than describing these as external evaluators or third-party monitoring functions). This trend aligns with the increasing popularity of developmental evaluation (see above). Foundations in particular seem to be embracing the use of learning partners — examples include MacArthur and Rockefeller.

A helpful starting point for learning is to articulate a learning agenda. A learning agenda can be as simple as a series of questions where we identify our knowledge gaps and learning needs. Articulating learning questions may sound obvious, but too often we jump to the process question of how to learn before reflecting on what we want to learn about. For us, learning questions provide the north star for MEL activities. In the Strategic Innovation Unit we have a learning agenda (a list of questions) that guides our M&E data collection, analysis and sensemaking. USAID offers guidance and tools on learning agendas as does the Evidence Collaborative. Even the United States Internal Revenue Service has a learning agenda!

Inclusive and collective learning: it is important to learn with others rather than alone because no single organization can understand or tackle an issue on its own. The Transparency and Accountability Initiative offers a useful overview and resources on how various funders can bring elements of participation and power-sharing into the design of their strategies while Bond provides a guide for effective consortia. In a similar vein, the Center for International Private Enterprise offers a tool for building and learning in coalitions — especially where there are inequities in the distribution of power and resources — and MacArthur and Shehu Musa Yar’Adua Foundations provide a case study on their use of a cohort model for anti-corruption work in Nigeria.

Another helpful practice is to use a structured approach for reflecting and strategically generating insights. This approach is about how we will learn, while the learning agenda is about what we will learn about. The Cynefin sense-making framework is one well-established approach. Many other organizations practice a form of sensemaking, such as the Centre for Public Impact, Agirre Lehendakaria Center and Red Associates. In UNDP we use a particular sensemaking methodology and we’ve published a guide that we hope others can find useful (check out our Sensemaking Workshop Preparation Guide and Facilitator Guide). You can read more about how our sensemaking protocol came about here.

Learning is about more than process though. A key element is to build a culture, mindsets, and practice around learning. Often, people may feel too busy to take time to reflect. They may also not be interested in or comfortable with challenging their own or others’ established ways of working and thinking. Gautam John discusses how Rohini Nilekani Philanthropies has sought to build a learning culture: emphasizing the importance of identifying learning as a core organizational value and desired behavior and demonstrating as well as systematizing it across the organization (in leadership style, communications, policies, etc.).

‘Live’ or ‘real-time’ learning is another helpful concept. Working alongside Health Equality Development Grantees for nine months, InnovationUnit.org has found that “some of the most valuable learning occurs during emergent or ‘live’ learning — where, as a group, participants identify and solve issues as they unfold, through carefully designed activities and drawing on their own experiences.” They describe this process in their health equalities storybook. Similarly, Pakistan’s Ehsaas Programme provides lessons about learning ‘live’ through implementation challenges: an important insight being that Ehsaas’ leadership’s commitment to strong feedback loops allowed teams to remain flexible and rapidly learn and evolve with each implementation challenge.

Technology can sometimes help increase the frequency and cost-effectiveness of rapid learning. Folktale offers a platform for stakeholders to contribute their stories, observations and experiences through a simple mobile video experience. In a new piece, Project Tech4Dev describe what iterative models look like in practice. They note that one promising iterative model is a reinforcement learning method known as ‘multi-armed bandits’ (MAB) — a strategy for uncovering the most effective programme option from a pool of many variations through continuous experimentation. UNDP’s Regional Innovation Center in Bangkok is also exploring the use of technology for rapid learning, specifically the use of artificial intelligence to improve sensemaking processes.
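To make the multi-armed bandit idea a little more concrete, here is a minimal sketch in Python. It is an illustration only and not the method used by Project Tech4Dev or any of the organizations mentioned: it assumes a simple ‘epsilon-greedy’ rule (one of several bandit strategies), and the three programme variations and their success rates are hypothetical numbers made up for the simulation.

```python
import random


def run_epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over hypothetical programme options.

    true_rates: assumed (made-up) success probability of each option.
    Returns how often each option was tried and its estimated success rate.
    """
    rng = random.Random(seed)  # seeded so that the simulation is repeatable
    n_arms = len(true_rates)
    pulls = [0] * n_arms       # times each option has been tried
    successes = [0] * n_arms   # successes observed for each option

    for _ in range(rounds):
        # Explore: pick a random option a small share of the time,
        # and always while some option has never been tried.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option with the best observed success rate.
            arm = max(range(n_arms), key=lambda a: successes[a] / pulls[a])

        # Simulated outcome for one participant (1 = success, 0 = no success).
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward

    estimates = [successes[a] / pulls[a] if pulls[a] else 0.0
                 for a in range(n_arms)]
    return pulls, estimates


if __name__ == "__main__":
    # Three hypothetical programme variations with assumed success rates.
    pulls, estimates = run_epsilon_greedy([0.10, 0.15, 0.35])
    for option, (n, est) in enumerate(zip(pulls, estimates)):
        print(f"option {option}: tried {n} times, estimated rate {est:.2f}")
```

The point of the sketch is the trade-off it encodes: most participants are steered towards the option that currently looks best, while a small share (here 10%) keeps testing the alternatives so that an early, unlucky estimate can still be corrected.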

Application of AI for sensemaking: identifying connections between different teams and projects

How do we make decisions and adapt based on what we learn?

Learning is not of much value unless it is used to improve and adapt practice. Adaptation and decision making are therefore critical links between learning and impact. They enable an organization to course correct towards desired outcomes in real time, as a ship might in response to changing weather conditions. One of the first steps toward becoming an agile organization (or team / project / coalition) is linking feedback loops with planning and implementation. Feedback loops may happen at various frequencies, which tend to depend on how complex or uncertain the operating context is. Many prefer semi-annual learning loops, though some may do it on a quarterly basis.

Implementation and adaptation through 6-monthly learning loops in a new UNDP-BMGF initiative

Much has already been written about adaptive management. Its early champions include the Overseas Development Institute (ODI) with its DFID/FCDO partnership, USAID’s Learning Lab, and the Danish MFA. A list of the relevant literature on its emergence can be found at BetterEvaluation. Tom Aston’s analysis is well worth a look for a helpful history and state of play on adaptive management, as well as a resource hub on the credibility of the practice. For a closer look at one organization’s journey, check out UNFPA’s piece about creating and using the A-Compass — an approach built in response to the agency’s challenges with results-based management.

One of the key challenges we (in UNDP) are facing at the moment is the practical operational aspects of adaptation. We understand the principle, but are still working through the kinks of integrating adaptation into programme governance, operations and day-to-day practice. For instance, UNDP Yemen is grappling with how existing IT systems for management and finance can allow for adaptation. One helpful resource for the more practical aspects of adaptive management is ODI’s reflections on three years of the LearnAdapt programme which includes over 20 guidance notes developed by DFID/FCDO on various topics like “when should I consider flexibility and/or adaptation in programming?” and “things to try” cards. Similarly, Abt Associates’ toolkit offers helpful templates and worksheets that address practical questions like entry points and organizational readiness, all part of its PILLAR approach to development — politically informed, locally led, and adaptive responses. Furthermore, DT Global provides helpful practical guidance on when to use adaptive management. This entails asking ourselves two questions. (1) Does the initiative in question need this approach? It requires “significant strategic and day-to-day management and therefore can be resource intensive.” Adaptive management is not a cure-all nor a good fit for all programs, so the guide offers a helpful diagnostic to determine where the specific initiative might land (see image below). (2) Is the team or organizational culture ready to accept the approach? If the values do not align on flexibility, experimentation, and responsiveness, then adaptive management may be ineffectual or even counterproductive.

Decision making is a key aspect of adaptation. But decision makers may not automatically use learning and evidence as they weigh what to do. In this connection, it is useful to consider how to maximize evidence uptake. Evidence uptake is about the extent to which an organization applies evidence and learning to inform decision making. To get started, Kadambari Anantram provides an overview of the challenges of evidence uptake in policy making, while Matthew Jukes offers ideas for how we can best make decisions without a conclusive evidence base. In his piece “The art of influencing: how to maximize impact in a complex, interconnected world” ICRC Diplomatic Adviser Nick Hawton takes a look at the lessons learned from the fields of communication and diplomacy and how to maximize the chances of influencing the decision-makers and power brokers of today.