How to do M&E when you’re working with complex problems

UNDP Strategic Innovation
13 min read · Feb 1, 2024


By Søren Vester Haldrup

We launched UNDP’s M&E Sandbox almost two years ago. Our vision was to create a space and vehicle to help nurture and learn from new ways of doing monitoring and evaluation (M&E) that are coherent with the complex nature of the challenges facing the world today. There’s been great interest in the Sandbox and strong support for its vision. The enthusiasm, openness, and curiosity that we’re seeing have been far more than we dared to hope for. Today, the Sandbox has over 700 members from close to 200 different organizations around the world and it’s still growing (join our community on LinkedIn here).

I’ve been fortunate to help build and lead this initiative. I’m now moving on to new adventures, so this feels like the right moment to reflect on what I have learnt over the past two years about how to do M&E differently when working with complex systems challenges.

What we’ve been up to over the last two years

We officially launched the Sandbox with a blog post that explained why we need a different type of M&E when working with complexity and high levels of uncertainty. The blog also outlined a rough but ambitious line of inquiry, posing questions such as: How do we measure change in complex real-world systems? How do we measure progress and results across a portfolio of interventions? How do we design and implement M&E frameworks and practices that allow us and others to continuously learn from, adapt, and accelerate our efforts to transform complex real-world systems?

Our work to answer these questions is ongoing and is taking place at several levels. First, we have built a community of people who are passionate and curious about how to do M&E differently. We convene regular conversations in which we collectively explore key questions based on practical examples from organizations in our community and beyond. Past topics include: using M&E as a vehicle for learning, measurement and complexity, progress tracking and reporting, how to measure systems change, and how to evaluate whether systems change is good. Second, we try to consolidate what we are learning in new resources, such as blogs providing an overview of the wealth of innovation in the M&E field or in-depth case studies (a series on ‘measuring systems change’ will be published in the weeks ahead). Third, we experiment directly with new approaches to M&E and help spark spin-offs in partnership with other organizations looking to rethink M&E. For instance, we are working with SIDA and UNDP colleagues in Bangladesh and North Macedonia to dynamically manage and adapt a portfolio of interventions based on learning generated through sensemaking. Importantly, we are also collaborating with the Bill and Melinda Gates Foundation on a range of activities around rethinking M&E.

Towards a blueprint for complexity-aligned M&E

These activities have generated a wealth of useful learning. In this blog I identify a handful of practices and frameworks that I think provide a solid blueprint for how to do M&E when you’re working in uncertain environments and with complex problems. I won’t attempt to summarize everything we’ve learned in the Sandbox these past years. Instead, I focus specifically on how to approach M&E if you are managing or otherwise involved in designing or implementing a portfolio of synergistic interventions intended to effect systems transformation (here’s more information about UNDP’s portfolio approach).

Start by clarifying what problem you’re dealing with and what you need from M&E

Clarify who your M&E activities are intended to serve and design your framework based on the type of problem you’re dealing with. Remember that M&E serves different purposes for different types of stakeholders (communities? funders? project managers?) and that M&E needs differ depending on how much uncertainty and complexity you’re dealing with (the Cynefin framework is a useful tool for distinguishing between different types of problems and their implications).

Here it is also important to be cognizant of power dynamics and of how M&E may enable and empower, or disenfranchise and limit, local actors working at the front lines of change (this links to a broader conversation on decolonizing M&E, which we explored in a recent Sandbox webinar).

M&E should help you do three things when you’re dealing with a lot of uncertainty and a high degree of complexity:

  1. Regularly learn and adapt: help you navigate uncertainty, improve your efforts, manage risks, and seize new opportunities by allowing you to make better decisions based on learning.
  2. Capture system-level change: help you understand what is happening in the wider system (context) that you operate in and seek to change. This includes learning about why the system is stuck, whether change is happening, and what role you might be playing in that change.
  3. Track and report on intermediate progress: help you get a sense, at regular intervals, of whether you are on track. Furthermore, most of you are accountable to one or more stakeholders, so M&E should also help you provide them with progress updates.

Your M&E framework should serve all three of these functions, not just one of them (i.e. not just reporting progress to a funder). It therefore needs different components that cater to these different functions.

How to build out the ‘learn and adapt’ function

Source: Roger Skaer

Start by convening the stakeholders who’ll be doing the learning and adapting, and collectively articulate a learning agenda: write down and collectively prioritize the questions you hold and would like to explore in your initiative (this may include gaps in your knowledge about causal relationships or how change may happen). Your questions may be specific (such as ‘what assumptions guide decision making among municipal planners in city X’) or broad. USAID offers guidance and tools on learning agendas, as does the Evidence Collaborative. I think the following three (fairly generic) questions are a useful starting point:

  • What are the dynamics (including trends and patterns) in our system, how is it changing, and what keeps it in place?
  • Can we observe early or weak signals of change (in disconnected, singular, one-off events) that might indicate changes in the system we are trying to transform?
  • Based on what we are learning, does our portfolio logic and composition (or Theory of Change) still look fit for purpose?
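
To make this a little more tangible, here is a minimal sketch (in Python, purely for illustration) of how a team might write down and prioritize its learning agenda. The structure, fields, and example questions are my own assumptions rather than any prescribed UNDP or USAID format:

```python
# Illustrative only: a minimal way a team might write down and prioritize a
# learning agenda. The fields and priority scheme are my own assumptions,
# not a prescribed UNDP or USAID format.
from dataclasses import dataclass, field

@dataclass
class LearningQuestion:
    question: str                 # the open question the team wants to explore
    priority: int                 # 1 = explore first; larger numbers = later
    evidence_sources: list = field(default_factory=list)  # where insight might come from

learning_agenda = [
    LearningQuestion(
        "What dynamics keep our system stuck, and how is it changing?",
        priority=1,
        evidence_sources=["stakeholder interviews", "trend data"],
    ),
    LearningQuestion(
        "Can we spot weak signals of change in one-off events?",
        priority=2,
        evidence_sources=["field reports", "media monitoring"],
    ),
    LearningQuestion(
        "Does our portfolio logic (theory of change) still look fit for purpose?",
        priority=1,
        evidence_sources=["sensemaking sessions"],
    ),
]

# Review the highest-priority questions first at each sensemaking session.
for q in sorted(learning_agenda, key=lambda item: item.priority):
    print(f"[P{q.priority}] {q.question}")
```

The point is not the tooling (a shared document works just as well) but the discipline of making the questions, their priority, and their likely evidence sources explicit.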

Next, collectively agree on the frequency of learning and schedule regular learning (sensemaking) sessions. I find UNDP’s sensemaking protocol a very helpful guide for how to run a sensemaking session. You may need frequent sessions if you are operating in a very uncertain and fast-changing context. Generally, it’s useful to convene different types of learning events at different frequencies. For instance, you may convene monthly reflection sessions with a core team of stakeholders that have a light-touch, project-management focus, where you ask questions like ‘are we doing things right?’ (single-loop learning). Every six months, you can convene in-depth strategic sensemaking sessions with a wider group of stakeholders where you dive deeper, revisiting your assumptions and broader learning questions, and explore what is happening in your operating context, what change you’re seeing, etc. (cf. double- and triple-loop learning). The visual below illustrates the frequency of the different reflection sessions deployed by UNDP and SIDA in North Macedonia.

Source: UNDP’s Strategic Innovation Unit

Following a learning event, you need to directly link learning to decision making. In UNDP we try to do this by introducing set procedures for how we capture insights and identify decisions. At (or immediately after) a sensemaking session we consolidate key insights into a set template and proceed to identify and make decisions (or propose them to higher-level decision makers). This step is crucial as it closes the loop between learning and adaptation. For more on adaptation I suggest you look at USAID’s Learning Lab, resources on BetterEvaluation, and Tom Aston’s analysis of the state of play on adaptive management.
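
For teams that like to keep this structured, here is a small, hypothetical sketch of what such an insight-to-decision record could look like. It is not UNDP’s actual template; the field names are simply an illustration of closing the loop between learning and adaptation:

```python
# Illustrative only: one hypothetical way to record the link between an insight
# surfaced at a sensemaking session and the decision it triggered. The field
# names are assumptions, not UNDP's actual template.
from dataclasses import dataclass
from datetime import date

@dataclass
class SensemakingRecord:
    insight: str        # what we learned
    implication: str    # what it means for the portfolio
    decision: str       # what we will change (or escalate)
    decision_owner: str # who acts on it
    review_by: date     # when we check whether the adaptation actually happened

record = SensemakingRecord(
    insight="Municipal planners rely on informal networks more than formal data.",
    implication="Our data-platform intervention may not reach key decision makers.",
    decision="Pilot a peer-exchange format alongside the platform.",
    decision_owner="Portfolio lead",
    review_by=date(2024, 6, 30),
)
print(record)
```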

How to measure system-level change

A diagram meant to portray the complexity of American strategy in Afghanistan. Source: New York Times

This is something we’ve covered several times in the Sandbox. However, it is an area where experimentation and learning are still fairly nascent. My thinking here is still evolving, but this is how I currently think it makes sense to approach it:

First, clarify why and for whom you are looking to measure systems-level change. Are you looking to do this primarily to generate insights that can inform what you or other stakeholders do next (e.g. shape future interventions) or is this an accountability exercise in demonstrating results to a funder? In my view, efforts to measure system dynamics are most useful when they are focused on helping an ecosystem of actors to come together, reflect on what is happening in a system, and use these insights to inform what they do next. Setting out to measure systems change because you need to demonstrate and report on your own results is a tall order, not least because system change may take many years to materialize (Laudes Foundation explained this well in our webinar on evaluating systems change) and because it can be very difficult to isolate your own contribution to change at a systems level (some would even say that it is impossible to demonstrate your impact in complex environments).

ACDI/VOCA’s Market System Diagnostics in Honduras and Lankelly Chase Foundation’s System Behaviours Framework are two great examples of learning-focused systems measurement efforts. For instance, ACDI/VOCA are using their approach as a learning tool to help themselves and local market system actors collectively understand the dynamics, drivers, and opportunities in the Honduran economy; the approach is not intended as a tool to capture whether specific interventions or stakeholders have contributed to systems change. We just released an in-depth case study of ACDI/VOCA’s approach and will soon release one on Lankelly Chase, so stay tuned!

Second, agree on an analytical framework that helps to address the needs of your users. Remember to include a human-centric focus, ensuring that your approach captures the real-world dynamics and felt experiences of people in the system you’re trying to understand. Different frameworks or approaches are designed for different purposes and can help you gain different types of insights about how a system functions and how it changes. Which framework you choose should be informed by what you want to learn about. Some approaches (e.g. stakeholder network analysis) focus on actors and their behaviours and relationships, while others pay more attention to the underpinning structural elements of a system (e.g. the iceberg model) or to causal connections (e.g. causal loop or connecting circles diagrams). Lastly, some frameworks try to combine structural and relational elements, such as the water of systems change or the four keys of systems change. ACDI/VOCA’s approach has a strong focus on the causal factors shaping the Honduran market system. However, these factors are not pre-defined but identified through an ‘inductive’ approach (a series of workshops with local stakeholders). In UNDP we are using a combination of the iceberg model (with a strong focus on structural factors) and more dynamic relational elements (stakeholder relationships and resource flows).

Third, develop ‘rubrics’ if you are looking to evaluate whether change in the system is good or bad. Measuring whether change is happening in a system is one thing; evaluating whether that change is ‘good’ or ‘bad’ is another. Here it’s crucial to try to shift the power: to critically consider whose vision for change you are looking at and the extent to which it reflects the needs, wants, and values of local communities, marginalized groups, and other actors living in the system. In this connection I find it useful to draw on the concept of rubrics: a rubric is a framework that sets out criteria and standards for different levels of performance and describes what performance would look like at each level. Our webinar on evaluating ‘good’ systems change provides an overview of how Laudes Foundation has developed and applied a rubrics-based approach to measuring systems change. In UNDP, we are piloting a similar approach with a number of country offices. In each case, our colleagues identify three systemic shifts that they are working towards and specify, for each of them, what the current situation (dominant practice) looks like and what the desired situation (emerging practice) would look like. This approach allows them to evaluate change in the system against a set of criteria.
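
To illustrate, here is a hedged sketch of what a simple rubric might look like, with three invented systemic shifts each described from ‘dominant practice’ (level 1) to ‘emerging practice’ (level 3). The shifts and descriptors are hypothetical and not drawn from Laudes Foundation or our country office pilots:

```python
# Illustrative only: a minimal rubric sketch. Each (hypothetical) systemic shift
# is described at three levels, from 'dominant practice' (the current situation)
# to 'emerging practice' (the desired situation). The shifts and descriptors are
# invented for illustration, not taken from Laudes Foundation or UNDP pilots.
rubric = {
    "Financing flows to green SMEs": {
        1: "Dominant practice: banks lend almost exclusively against collateral.",
        2: "A few lenders pilot cash-flow-based green lending products.",
        3: "Emerging practice: green lending is a mainstream, competitive product line.",
    },
    "Municipal planning uses citizen-generated data": {
        1: "Dominant practice: plans are drafted from administrative data only.",
        2: "Citizen data is collected but rarely shapes decisions.",
        3: "Emerging practice: citizen data routinely informs plan revisions.",
    },
    "Waste is treated as a resource": {
        1: "Dominant practice: mixed waste goes straight to landfill.",
        2: "Separate collection exists in some districts.",
        3: "Emerging practice: circular value chains absorb most waste streams.",
    },
}

def assess(shift: str, level: int) -> str:
    """Return the descriptor a group has agreed best matches the current situation."""
    return f"{shift} -> level {level}: {rubric[shift][level]}"

print(assess("Waste is treated as a resource", 2))
```

The value of a rubric lies less in the scores themselves than in the conversation about which descriptor best matches reality, and whose perspective that judgement reflects.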

Fourth, identify how you will measure the key elements of your framework, using a mix of data and drawing on ‘other ways of knowing’. We often tend to see a fairly strong bias towards quantitative measures and data. I think that quantitative information is very useful for measuring some things, while others have to be captured through other types of data and ways of knowing. I therefore recommend combining different types of qualitative and quantitative information (and indeed other ways of conveying information such as video, images, tastes, or smells). In this connection, I find great inspiration in the people-centric approach to measurement deployed by the Poverty and Human Development Monitoring Agency in Odisha, India.

Fifth, proceed to put your approach into practice, and repeat it at regular intervals. Use the first application as a ‘baseline’ against which you can track and discuss trends and change. Repeat the exercise every six months (or at whatever frequency makes sense to you). Use each application as an opportunity to collectively (amongst a range of stakeholders) reflect on what is going on and what this means for what you all do next. In this connection, don’t focus too much on single metrics, but ask what the variety of data as a whole is telling you about what is going on. I recommend you ask three key questions: 1) What are we seeing (in the data)? 2) As a whole, what is this telling us about (change in) the system? 3) What does this mean for what we do next?

Finally, analyze causes and contributions to change, as needed. Over time, you may spot early or mature changes in the system you are tracking. These changes provide opportunities for in-depth analysis of why change has happened and what contribution you (or other factors) may have made. There is a range of methods available for this, such as contribution analysis, process tracing, and Qualitative Comparative Analysis. AGRA is one of the organizations looking to adopt this approach: they seek to first monitor change in the system across a range of quantitative variables and subsequently analyze causes and contributions, drawing more heavily on qualitative data and analysis. Keep an eye out for our upcoming case study series, where we’ll take a deep dive into AGRA’s approach.

How to track and report on intermediate progress

Tracking and reporting on progress is challenging when you work on complex problems. Transformational (systems) change is a long-term process, it is fraught with uncertainty, and you rarely know up front how best to support change. This makes it hard to gauge whether you are making progress. For instance, you may not want to treat the implementation of activities as an indicator of progress, because you will be adapting those activities on a regular basis. Similarly, traditional quantitative KPIs (with a baseline, value, and target) may not be that helpful either. Our webinar on rethinking progress tracking and reporting provides a number of examples of how to deal with this.

Source: Ice Cube

A useful framework for tracking and reporting on progress should help you do three things: 1) enable you to check whether you are making progress; 2) give you signals and intel about progress that you can share with those you are accountable to (such as a funder or a board); 3) give you the flexibility, and the incentive, to adapt based on learning.

The challenge is to design an approach that serves all three functions. For instance, what progress metrics can you use if you don’t yet know how to help catalyze transformational change? What types of metrics will remain relevant even if activities and outputs change?

Fortunately, we’ve seen a number of innovations in how to deal with these challenges. I think the following two ideas are a useful place to start; they can of course be combined and adapted in a variety of ways.

First, recognize learning and adaptation as interim results. If you operate with a logframe, make sure such learning results are included (accountability for learning). For instance, an interim result may be that you are adapting your theory of change every six months based on what is happening in the wider context. This idea is gaining increasing recognition. For example, the UK-funded Partnership to Engage, Reform and Learn Programme in Nigeria had a specific learning indicator: it operated on a payment-by-results model and linked payments to this learning indicator (with a 20% weight). Similarly, several foundations are shifting reporting requirements to focus more on learning, stories, and adaptation, rather than on hitting targets on quantitative KPIs. Examples include BHP Foundation, Luminate, and Humanity United (learn about these examples in our webinar on progress tracking). While this may seem like an easy fix on paper, remember that it requires a larger organizational shift from a culture of compliance to a culture of learning. This doesn’t happen overnight.

Second, use ‘forward-looking’ metrics for ‘momentum’ rather than measures that only look back on whether you implemented what was in your workplan. BHP Foundation, for example, has shifted from retrospective and unstructured program evaluations towards a deeper, ‘forward-looking’ approach to program evaluation, structured around testing the most critical and uncertain issues, assumptions, and hypotheses about impact, scale, and sustainability. At UNDP we have introduced a new results framework for portfolios that work to effect systems transformation. This framework has ‘momentum’ as one of two categories of intermediate results (the other being learning). The idea is that momentum should be forward looking and allow for adaptation and emergence. For me, useful indicators of momentum include the degree of coherence across interventions in your portfolio, whether you are generating and seizing new opportunities and entry points, and how responsive your portfolio is to trends and changes in the system.
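
To make the momentum idea a little more concrete, here is one hypothetical way of capturing momentum signals per reporting period, loosely following the indicators I just listed. The scoring scale and fields are assumptions, not part of UNDP’s actual results framework:

```python
# Illustrative only: a hypothetical way to capture 'momentum' signals each
# reporting period, loosely following the indicators mentioned above (coherence,
# opportunities seized, responsiveness to system trends). The scale and field
# names are assumptions, not part of UNDP's actual results framework.
from dataclasses import dataclass

@dataclass
class MomentumSnapshot:
    period: str                   # e.g. a reporting half-year
    coherence_score: int          # 1-5 self-assessment of synergy across interventions
    opportunities_seized: int     # new entry points acted on this period
    adaptations_from_trends: int  # portfolio changes triggered by observed system trends

    def summary(self) -> str:
        return (
            f"{self.period}: coherence {self.coherence_score}/5, "
            f"{self.opportunities_seized} opportunities seized, "
            f"{self.adaptations_from_trends} trend-driven adaptations"
        )

print(MomentumSnapshot("2024-H1", coherence_score=4,
                       opportunities_seized=3,
                       adaptations_from_trends=2).summary())
```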

A variation of the momentum and learning ideas above is ‘principles-based’ monitoring and reporting. We are currently piloting this in a grant with the Bill and Melinda Gates Foundation. In a recent blog we explain what principles-based progress reporting entails.

Signing off…

The blueprint I have presented above constitutes my current take on how I would design and implement an M&E framework equipped to deal with complexity and systems challenges. What I have presented is tentative and incomplete. The ideas above are rarely my own: I have drawn inspiration from a wealth of people, organizations, and efforts around the world (as you can see in the many links and shout-outs), not least all those who have shared their thoughts and learning in the Sandbox. I am excited to see how our collective practice will evolve in the years ahead, and I look forward to contributing to it.

If you would like to join or partner with the M&E Sandbox please reach out to contact.sandbox@undp.org.

