Addressing the evaluation-of-sustainability-paradox: A relational rubric for evidencing sustainability prospectively

Feb 12, 2024

Florencia Guerzovich and Alix Wadeson

Source: https://blogs.ibo.org/2015/06/12/complex-interconnected-systems/

A key challenge for us Monitoring, Evaluation & Learning (MEL) folks is assessing whether and how the scale-up of complex interventions contributes to long-term sustainability and actual system change. Without a crystal ball, we have only a partial view when monitoring and evaluating programs. In complex governance systems, such results are emergent, depending on interactions and relationships among a multitude of actors. Uncertainty is rife. We do not know what will happen after the project and the final evaluation are over.

And yet, we are asked to make meaningful assessments about the sustainability of benefits, including scale-up. We have found that there is a lack of useful methods for doing this, coupled with unrealistic expectations of what sustainability should look like: for instance, the idealized vision that the work will keep being replicated and grow in the same form long after the program and its resources end, failing which it is deemed an unsustainable failure.

We are increasingly dealing with this tension: programs are often expected to perform, and be evaluated, against models of sustainability that dismiss causal analysis and complexity, even as the field proclaims trendy rhetoric about systems thinking and adaptive management. If MEL practitioners are expected to spot, sense-make, and learn from patterns, and to build knowledge at the organizational level to facilitate better decision-making and strategies, the current situation needs to change significantly.

What does it look like to produce more useful knowledge and learning for action?

This blog post briefly introduces an innovative approach: how we built and applied a relational rubric to square the circle between expectations and the reality of program sustainability. We call for more consensus and compromise on the conceptual framework for sustainability itself, as well as more relevant tools to evaluate it. Both need to be workable in a context where most organizations operate with limited budgets and timelines for their interventions and MEL.

Our work over the years, especially in the realm of inclusive governance and social accountability, has shown us that there are important insights and signals that can help us understand whether we are on track towards contributing to medium- and long-term results, once we shift our thinking and tools about sustainability and its many forms. To get there, we started by breaking down what sustainability actually means and looks like in the ‘real world’. Then we had to figure out how to capture it in a practical and transferable way, within the limits of resources, heavy demands on overstretched staff, and the capacities of partners and communities in contexts prone to instability, with frequent shocks and shifts.

A new way to evidence sustainability

The 2019 OECD Development Assistance Committee’s (OECD DAC) revamped evaluation criteria for assessing sustainability opened the door for developing innovative approaches to evaluation. The criteria acknowledge that such results are often emergent and should be monitored and evaluated with this in mind. They therefore emphasize a “more thoughtful” turn towards applying the criteria by assessing complex processes prospectively: that is, considering the likelihood of sustainable results in the future based on reasonable logic and evidence of signals available at the time of monitoring and evaluation. This is a positive step.

Unfortunately, unrealistic expectations for programs persist, while many traditional evaluative methods and tools still do not incentivize using a systems lens or thinking about the future potential for sustainability or the many forms it could take (i.e., what counts). A blind spot exists in which evaluations and research often fail to capture factors in the local system that could favor continuity of results and scale-up over time, even if they have not yet materialized. In fact, guidance about prospective MEL, and more generally about how to take time seriously in MEL, is rare (for earlier attempts, including with Tom Aston, see here and here).

To facilitate more productive MEL that accounts for prospective sustainability and the emergence of results in complex operating and governance contexts, we developed and tested an innovative operational approach — a sequential, relational rubric.

Rubrics seem to be a promising approach used by MEL practitioners looking at complex, systemic processes and portfolios (e.g. here, here and here). Rubrics are a form of qualitative scale that denotes levels of performance, supports assessment, explains what each standard means, and clarifies the reasoning behind an assessment. They “provide a harness but not a straitjacket for assessing complex change and they help stakeholders build a shared understanding of what success looks like” (Aston, 2021). Our rubric is grounded in systems thinking, co-production, and social learning theory, and it links with collective governance and social contract theory for development. Applying and iterating it over the past four years has helped us to better understand and communicate results about the causal processes, relationships, and steps involved in the scale-up of interventions, with an eye towards prospective sustainability. Because it is transferable and easy to apply, we believe it can also support others to do the same.
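
To make the idea concrete, here is a minimal sketch of how such a rubric might be represented in code. The level names, criteria, and structure below are our illustrative assumptions for this post, not the actual GPSA rubric described in the paper:

```python
from dataclasses import dataclass

@dataclass
class RubricLevel:
    """One level of the qualitative scale: a rank, a label, and the
    criteria that evidence must meet for an assessment to sit here."""
    rank: int
    label: str
    criteria: str

# Hypothetical levels, ordered from weakest to strongest signal of
# prospective sustainability (illustrative only, not the GPSA rubric).
LEVELS = [
    RubricLevel(1, "emerging", "Stakeholders are aware of the work and show initial interest."),
    RubricLevel(2, "relational", "Stakeholders interact repeatedly, deliberate, and begin to compromise."),
    RubricLevel(3, "coordinated", "Stakeholders take loosely coordinated action beyond the project."),
    RubricLevel(4, "embedded", "Other actors adopt, resource, or institutionalize the approach."),
]

@dataclass
class Assessment:
    """A rubric judgment is not a bare score: it records the assigned
    level, the reasoning behind it, and the evidence supporting it."""
    level: RubricLevel
    reasoning: str
    evidence: list[str]
```

The design point is the `Assessment` record: a rubric level is only meaningful alongside the reasoning and evidence that justify it, which is what lets stakeholders contest and build a shared understanding of the judgment.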

To document our learning and offer others in the field a detailed explanation, test evidence and concrete examples, we wrote a paper recently published by the World Bank’s Global Partnership for Social Accountability. The paper discusses at some length the context in which we developed the rubric to assess a complex portfolio of diverse projects (i.e. geographies, sectors, implementing partners) and many of the key choices we had to make, such as purposively embedding a relational and sequential approach into the rubric (see figure below).

Figure: Sustainability Relational Rubric Levels with Criteria

Relationships and Sequences to Assess Complex Causality: The pay-offs

While there are many ways one could develop a rubric, including adapting what we developed, two key elements were essential for us:

  1. Placing relationships front and center: The processes we are interested in are mediated by many actors working jointly on complex social and governance problems. The process brings together unique combinations of stakeholders, dynamics, norms, perspectives, and experiences, amongst other variables. These processes will therefore vary depending on the quality and frequency of interactions between stakeholders over time. This engagement may, or may not, support deliberation, compromise and, eventually, coordinated action.
  2. Opting for a sequential approach: A critical part of evidencing the likelihood of scale-up and prospective sustainability is to first understand, and then investigate, the concrete and sequential steps often involved in these processes. While the sequence is not absolute, framing it in steps helps us organize relevant actions and events in temporal order, identifying if and how scale-up is on track, with an eye towards prospective sustainability. Such sequencing can give project teams and evaluators significant leverage when concrete outcomes are still unknown, helping them causally trace complex change processes and produce plausible explanations (see the sketch after this list).
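
As a rough illustration of the sequential logic, the sketch below orders hypothetical relational steps in time and reports how far along a project is, and whether its progress is contiguous. The step names and evidence items are invented for illustration; actual step definitions come from the rubric in the paper:

```python
# Hypothetical relational steps, in the temporal order the rubric assumes.
SEQUENCE = ["awareness", "dialogue", "compromise", "coordinated_action", "uptake_by_others"]

def furthest_step(evidence_by_step: dict[str, list[str]]) -> tuple[str | None, bool]:
    """Return the furthest step with supporting evidence, and whether
    every earlier step also has evidence (i.e. progress is contiguous)."""
    with_evidence = [s for s in SEQUENCE if evidence_by_step.get(s)]
    if not with_evidence:
        return None, False
    reached = with_evidence[-1]
    prefix = SEQUENCE[: SEQUENCE.index(reached) + 1]
    return reached, all(evidence_by_step.get(s) for s in prefix)

# Example: evidence of coordinated action but no documented compromise
# flags a gap worth investigating, rather than straightforward progress.
project = {
    "awareness": ["baseline stakeholder survey"],
    "dialogue": ["minutes of multi-stakeholder forum"],
    "coordinated_action": ["joint municipal action plan"],
}
step, contiguous = furthest_step(project)  # ("coordinated_action", False)
```

The pay-off of this framing is that a gap in the sequence prompts a causal question (did compromise happen but go undocumented, or did it not happen?) rather than a flat judgment of success or failure.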

The dataset for our paper focused on the case of a complex portfolio of social accountability interventions. In the paper, we provide the methodological details and outline some of the challenges we faced, such as the lack of resources to collect new data to test the rubric, and how we overcame them (i.e. testing the rubric by retrofitting data from existing evaluations).

Here we want to highlight the pay-offs:

- We were able to better understand whether ‘absence of evidence about scale-up for sustainability’ meant ‘evidence of absence of scale-up for sustainability’ (i.e. scale-up for sustainability is not happening). Fit-for-purpose concepts and methods enabled us to distinguish the two possibilities and move forward. In this case, a focus on social learning and compromise (i.e. a ‘resonance pathway to scale’) made it possible to observe loosely coordinated scale-up processes at work in many (but not all) social accountability interventions and to identify tangible evidence of prospective sustainability.

- We were able to better track how these processes, the outcomes they generate, and the corresponding evidence often look qualitatively different from the original intervention design and its predictions for scale-up. The first diagram in the figure below illustrates the conventional assumption as a ‘scale-up transmission belt’: a multi-colored network replaces the black box and then produces bigger or more numerous multi-colored balls. Yet when the changes interventions seek to make are complex and contingent on many other actors in a system, each bringing their own circumstances and agendas, what happens inside the black box is critical to informing the expectations for, and assessment of, results. Inside the black box, the causal path towards scale-up is rarely linear, while the results to which it contributes are diverse, as illustrated in the second diagram in the figure below.

Coda

After the initial test outlined in the paper, Alix did a second round of testing of the rubric with another set of projects in the GPSA portfolio. Doing it again was worthwhile to test the iterated version of the rubric and see whether its value and relevance still held up. Perhaps unsurprisingly, but still reassuringly, the exercise was easier the second time, suggesting that repetition reduces the resources required to use the rubric and supports consistent, systematic MEL. Because the rubric format is transferable between different types of projects in the portfolio and over time, the second exercise not only yielded its own evidence and learning on sustainability; it could also be aggregated with the first test, building an evidence base across several years.
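
As a simple sketch of what that aggregation could look like, one might tally the rubric levels assigned in each testing round to watch the portfolio-level distribution shift over time. The level labels and counts here are made up for illustration:

```python
from collections import Counter

# Made-up level assignments for two hypothetical testing rounds.
round_1 = ["emerging", "relational", "relational", "coordinated"]
round_2 = ["relational", "coordinated", "coordinated", "embedded"]

portfolio_view = {"round_1": Counter(round_1), "round_2": Counter(round_2)}
# portfolio_view["round_2"]["coordinated"] == 2: in this illustrative
# portfolio, the distribution has shifted up the levels between rounds.
```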

The projects and respective evaluations in the sample that applied a prospective sustainability lens throughout (that is, in their design, concepts, reporting, monitoring, evaluation questions, and datasets) were better suited to using the rubric. While this may seem obvious, it underscores the importance of thinking this way from the outset of a project, not just at the time of final evaluation.

The long game

Overall, we think that when there is a commitment to operationalizing prospective sustainability, we can deliver more useful MEL systems, knowledge, and action. Our experience has taught us that relational, sequential rubrics can help in this aim. The social accountability case also suggests that doing so can generate insights about wider field and systemic dynamics, filling an important gap in the function of MEL at the organizational, program, and intervention levels.

The paper includes more recommendations. We hope you will dive deeper by reading it and considering whether it resonates with you and your work on MEL around sustainability. We also invite you to share your insights in the comments section or send us a message!
