Measuring to Learn

Jason Pearman
27 min read · Oct 4, 2024


Some lessons from stretching traditional monitoring and evaluation work so that a social program can learn, and respond, faster.

I’m currently on a mini-sabbatical from my role within one of the Government of Canada’s youth employment programs.

My team is responsible for a broad portfolio of activities aimed at helping the program better understand and grow its impact. Ultimately, we use what we learn from this work to develop new program design/implementation options and then assist the responsible teams to implement.

When I took on this role, demonstrating results, leveraging administrative data, and evidence-informed decision making were major priorities across the Canadian public sector. The term “impact measurement” was also gaining traction.

Over the years, we’ve learned a lot about monitoring and evaluation (M&E) and impact measurement, primarily by trying to adapt practices being pioneered by Climate KIC, UNDP, SITRA, Code for America, J-PAL, Cynefin Centre, and others to evaluate and improve programming in complex policy areas.

While M&E sandboxes, Delivery-Driven program streams, and Parallel Learning Systems may be beyond our current scope, I feel that after a few years of concerted effort we’ve arrived at a useful place: stretching more traditional M&E activities — primarily designed to support top-level reporting and accountability functions — so they can also support timely learning and more nuanced policy advice/program design.

Some of our experience may be useful to others working in organizations focused on social and environmental outcomes, so this blog post is intended to share some of what I covered during a deep dive with my team before logging off.

Disclaimer: M&E and impact measurement are not my areas of expertise, so treat what follows below as learner’s reflections. Apologies in advance to the experts in my network for missing important nuances that you no doubt explained more than once!

Figure: A visual representation of our team’s mandate. The “Measuring Impact” side reflects most of our M&E and impact measurement work, while the “Growing Impact” side reflects most of our R&D work¹. The people represent the youth the program aims to serve, and the platform is the collection of program interventions that ultimately have an impact on the lives of youth. It’s a lot messier in practice, but it helps us think about where we fit amongst all of the moving pieces needed to deliver a large government program.

Weaving Impact Measurement and Management into M&E

The reform of the youth employment program launched over 2018 and 2019, with the big policy shift being to increase the focus on youth who face the greatest challenges entering and staying in the labour market.

Recognizing that these youth encounter multiple barriers, the program supplemented traditional workforce development interventions — funding third-party organizations that provide tailored support to help youth gain the skills, abilities and experiences needed to get good jobs — with a series of new initiatives. These new initiatives were designed to respond to some of the more systemic challenges that these youth face, which if left unaddressed would limit the impact of the overall policy shift. Additionally, there was a renewed emphasis on collecting better quality outcomes data and increasing the program’s capacity for impact measurement so that officials would be equipped to provide stronger policy recommendations to elected leaders.

When I joined the program five years ago to help support the implementation of the reform agenda, I inherited the combination of the new systemic initiatives and the impact measurement work.

Admittedly, I came to “Impact Measurement” somewhat late. What attracted me to the program was primarily the opportunity to weave together a collection of R&D supports around a social policy area, such that stakeholders were in a better position to achieve breakthroughs (see the Missing Link). However, impact measurement has now become an essential part of our team’s mandate — an anchor that has allowed our public innovation team to integrate well and positively influence the program’s trajectory.

Ultimately, the goal for our impact measurement work is to answer the question: “what works, for which youth, and in what context?” Of course, we’re also focused on improving the program’s capacity for legislated M&E to ensure accountability for public spending. But to achieve the impact measurement goal, we’ve progressively shifted our standard M&E functions towards helping the program learn, in a nuanced way, faster.

Speed and nuance matter: they’re what allow program design and implementation decisions to be as responsive as possible so that the program can better meet the evolving needs of diverse youth populations².

It’s taken several phases of work, but we now have most of the key elements and infrastructure in place to begin realizing this goal. If you’re curious about the whole backstory, read the next four sections, otherwise, skip to the Impact Measurement and Management Stack.

Goal/Strategy: A clear purpose, and aligned resources based on demonstrated models and leading practices

Data: Coherent logic model, measurement framework, and associated outcomes data

Data Infrastructure: Robust database, data linking and data analytics tools

Capabilities: Combining basic in-house skills with knowing when and where to seek help

Outside game: Supporting external evidence pipelines to complement insights from administrative data

Protocol (under development): Established roles, procedures, and products across key teams

Feedback Loops: Injecting right-sized evidence and program design options at the right times

- This is the high-level view of the current impact measurement stack -

Backstory — The Evidence Gap

High-performing social impact programs rely on evidence-based investment strategies to effectively and efficiently deliver on their objectives. In Canada, federal programs over a certain size must be formally evaluated every five years, generating evidence that informs major policy and program decisions, including funding levels.

Fortunately, the program that we’re in has the right administrative data and data-linking authorities for incremental impact assessments — one of the stronger summative evaluation techniques for generating high-quality evidence on the impact of a program or service. In our case, this lets the department’s evaluation team compare the earnings of program participants against those of other youth with similar backgrounds and characteristics.
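
To make the comparison concrete, here is a minimal sketch of the matched-comparison idea, assuming a pandas DataFrame with hypothetical columns (age_band, region, education, participant, earnings_t2). The department’s actual incremental impact assessments rely on linked administrative data and far more rigorous methods; treat this as an illustration only.

```python
# Minimal, illustrative sketch of an incremental impact comparison.
# Column names (age_band, region, education, participant, earnings_t2)
# are hypothetical; the department's actual evaluation method and data differ.
import pandas as pd

def earnings_gap(df: pd.DataFrame) -> float:
    """Average follow-up earnings difference between participants and
    similar non-participants, matched exactly on a few observable traits."""
    traits = ["age_band", "region", "education"]

    # Mean follow-up earnings per cell of similar youth, split by participation.
    cells = (
        df.groupby(traits + ["participant"])["earnings_t2"]
          .mean()
          .unstack("participant")   # columns: False (comparison), True (participants)
          .dropna()                 # keep only cells where both groups are present
    )

    # Weight each cell's gap by the number of participants it contains.
    weights = (
        df[df["participant"]]
        .groupby(traits)
        .size()
        .reindex(cells.index, fill_value=0)
    )
    gaps = cells[True] - cells[False]
    return float((gaps * weights).sum() / weights.sum())
```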

The summative evaluation capabilities at the program’s disposal are quite rigorous and have definitely informed policy advice, but they occur every five years. In a fast-moving world with interconnected policy drivers, waiting five years for rigorous signals on what’s working, what isn’t, and where the positive deviants are simply isn’t fast enough.

There are also key performance indicators (KPIs) that are captured annually and publicly reported. These KPIs provide a quick snapshot of the youth served through the program and their immediate next steps: for example, the percentage of participants from equity-seeking communities, or how many participants found jobs or returned to school or training. These measures are important for informing operational decisions and provide officials with a coarse-grained understanding of whether the program is on track to meet basic implementation goals year over year. However, they may miss early signals on how a sector is evolving, or promising opportunities to shift strategy to improve outcomes.
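
As a toy illustration of why a single program-wide KPI is coarse-grained, here is a sketch that computes one indicator overall and then disaggregated. The column names (employed_at_exit, community) are invented for the example.

```python
# Illustrative only: a program-wide KPI vs. the same KPI disaggregated.
# Column names (employed_at_exit, community) are hypothetical.
import pandas as pd

def kpi_snapshot(df: pd.DataFrame) -> None:
    # The headline number reported annually.
    overall = df["employed_at_exit"].mean()
    print(f"Employment at exit (all participants): {overall:.1%}")

    # The same indicator by youth community often tells a different story
    # than the single program-wide figure.
    by_community = df.groupby("community")["employed_at_exit"].mean()
    print(by_community.sort_values().map("{:.1%}".format).to_string())
```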

Both the formal evaluation and KPIs are valuable for understanding, at a high level, who’s being touched by the program, some immediate results, and 5-year longitudinal results. However, on their own, these measures don’t provide enough timely information for teams to develop program or service adaptations that can optimally respond to new social, environmental, or economic shocks — or to shifting political direction.

This left a gap.

N.B. One way that I started to think about the M&E functions was that the annual KPI reporting was like the beginning of a multi-year journey where we counted our steps every year to make sure we were moving at the desired pace. The summative evaluation was then like a formal pause to assess how well we had reached our current position after five years of travel — ideally compared to a logical proxy. Between these two extremes, there was a lot of missing information. We didn’t have the insights needed to know which paths were the most efficient, how those paths were changing, which paths were optimal for different youth, or if there were radically different paths worth exploring.

The standard M&E tools at our disposal are good at yielding big, high-level insights and conclusions. However, our teams are increasingly being asked to provide nuanced insights and options for program adaptations to respond to what’s happening on the (non-linear) ground.

N.B. Policy, program, service, regulatory, evaluation, and operations staff are constantly searching for additional evidence to guide their work, but there is a growing consensus that the evidence pipeline that we need is rarely timely, is notoriously hard to operationalize, and is often too limited for triangulation. Neil Bouwer elegantly describes the challenge in Canada, here: Canada’s policy ecosystem is in need of major updates.

Backstory — Measuring to Learn

When I started, much of the discourse that I could find regarding Impact Measurement was focused on improving the ability to measure contributions to social, environmental, and economic outcomes using robust administrative data, new outcomes database technologies, and advanced analytics techniques. While these capacities are critical, they often seemed to further enhance funders’ existing M&E and accountability functions without addressing the evidence gap that I was becoming increasingly aware of.

Luckily I rediscovered a book that had been recommended to me a few years earlier. In it, the authors described a 5-level impact measurement maturity model. The lower levels focus more on measuring outputs, while the upper levels prioritize facilitating learning and adapting strategy and operations.

This book was incredibly useful. It provided clarity on where we were and what we needed to focus on to enhance the program’s measurement capabilities. It also set the bar: if we wanted to reach Level 5, we needed to close the gap in our existing measurement systems. We needed to build the capacity to continuously identify patterns in the outcomes data, get ongoing signals about changes in the policy landscape, look for interventions, adaptations, and organizational forms that are getting promising results, and use these insights to develop options to adjust program design and implementation to maximize outcomes. Essentially, Level 5 measurement is about learning.

This became an important heuristic for our team and, ultimately, the stated goal of our impact measurement work.

Backstory — Impact Measurement (and Management)

Reading Measuring and Improving Social Impacts and the associated a-ha moments happened in early 2020, about three to four months into the job. Then the COVID pandemic happened. Needless to say, the program’s need to learn quickly ramped up, alongside a larger portfolio of work for our team related to helping the youth employment stakeholder community adapt to the social and economic turmoil (see COVID recovery portfolio).

Given the increased urgency, and by that time an awareness that a few domestic and international public sector organizations were dabbling in this space, we commissioned an environmental scan of best-in-class and emerging impact measurement practices through the former MaRS Solution Lab. We were particularly interested in understanding the practices of organizations taking a portfolio approach to influence a system by orchestrating many interventions at the same time. One of the things that came out of this scan and the case studies was the notion of Impact Measurement and Management (emphasis mine):

Impact: Any effects arising from an intervention. This includes immediate short-term outcomes as well as broader and longer-term effects. These can be positive or negative, planned or unforeseen.

Impact Measurement: The process of trying to find out what effect an intervention is having on people, organizations or their external physical, economic, political or social environment.

Impact Management: The ongoing process of measuring those outcomes, in context, to reduce the negative and increase the positive.

- Excerpt from the MaRS report -

At this time there wasn’t a consensus definition for Impact Measurement across federal government teams interested in this space (beyond the 5-year Incremental Impact Assessments), but this report and the emphasis on Impact Measurement and Management gave us tangible examples of what it could look like if the program’s existing M&E work also supported learning and an ambitious pursuit of better outcomes.

This impact measurement approach also fit well with our actual context as a larger program with social and economic objectives, with multiple levers/interventions to advance policy goals (e.g. funding 3rd party service provision, funding for capacity building and program R&D, convening power, etc.).

Fig. An amazing breakdown of evaluation models/practices by Sebastian Lemire. Hat tip to Mark Harris who posted this a few weeks back, and Sam Rye who amplified it. I’m not exactly sure what train we’re on, but it’s crossing multiple Evaluation lines.

Backstory — Getting to the Impact Measurement and Management (IMM) starting line

Building our impact measurement and management (IMM) stack has been an iterative process over multiple years. A summary of the 5 primary phases of work is below so you get a sense of how we got to where we are now.

Though it was messy, each phase of the journey created tangible business value for the program and our leadership: these “wins” were critical to fuel the next round of work.

Phase 1 (Most of 2019) — Strengthening the evidence base by aligning the program’s outcomes measurement framework with international and domestic studies

During the program’s reform, our senior leadership was particularly interested in impact measurement. At that time, the Privy Council Office’s Impact and Innovation Unit (IIU) had a stream of work on building impact measurement capacity across federal government teams. Their services included access to technical experts (Impact Measurement Fellows) on temporary assignments with the federal public service, and a standing offer with Mission Measurement, an impact measurement intermediary and evidence aggregator working with government and philanthropy.

A project with the IIU was negotiated, which included commissioning Mission Measurement to add high-performing employment and workforce development interventions to their synthesized outcomes database: this database codifies the core elements of programs that have demonstrated the best outcomes for a specific policy objective, based on available literature and program evaluations.

With a clearer picture of the latest evidence, the responsible staff worked to adjust the program’s logic model and outcome indicators.

This investment was what really got the ball rolling in terms of improving how we use measurement to increase the impact of the program.

Phase 2 (Fall 2019 — Spring 2020) — Testing impact measurement opportunities given more rigorous outcomes data

Once the most relevant activities, outputs and outcomes for equivalent programs were clarified and integrated into the program’s measurement framework, the second part of the IIU/Mission Measurement project explored what could be done with this improved data.

Two use cases were tested:

  1. Outcomes Dashboards for Funding Recipients: Templates and processes were developed to create visual outcomes dashboards for a variety of internal users. The goal was to present relevant outcomes data in a way that was useful to policy staff looking for trends, operational staff managing funding agreements and monitoring for delivery challenges, and senior leadership communicating results. Sample dashboards were created for a few funding recipients, using the new outcomes/performance measurement framework to highlight core project elements and results to date.
  2. Rapid Review of Funding Applications: In parallel to the department’s standard assessment process for funding applications, we worked with Mission Measurement to conduct a “rapid review” of approximately 100 anonymized funding applications, coding them based on the ratio of planned activities aligned with the evidence base (High, Medium, and Limited alignment; a toy sketch of this kind of coding follows this list). The goal was to revisit these projects over time to compare the initial coding with actual outcomes.
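
For a concrete picture of the coding step in the second use case, here is a toy sketch. The real review was carried out by Mission Measurement against their synthesized outcomes database; the activity list and thresholds below are invented.

```python
# A toy version of the "rapid review" coding described above. The real
# review was done against Mission Measurement's synthesized outcomes
# database; the activity list and thresholds here are invented.

EVIDENCE_BACKED = {
    "paid work placement",
    "employer engagement",
    "wraparound supports",
    "mentorship",
    "credential attainment",
}

def alignment_rating(planned_activities: list[str]) -> str:
    """Bin an application by the share of planned activities that match
    the evidence base: High / Medium / Limited (illustrative thresholds)."""
    if not planned_activities:
        return "Limited"
    share = sum(a.lower() in EVIDENCE_BACKED for a in planned_activities) / len(planned_activities)
    if share >= 0.7:
        return "High"
    if share >= 0.4:
        return "Medium"
    return "Limited"

# Example: an anonymized application proposing three activities.
print(alignment_rating(["paid work placement", "mentorship", "resume workshop"]))  # Medium
```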

N.B. Both these use cases were done behind the scenes, and not part of the formal assessment or departmental recommendations process. Separating this part of the work from the formal departmental processes allowed the organization to maintain the integrity of established practice, while also exploring different options to realize policy objectives in a more effective and efficient way.

I joined the team midway through Phase 2.

Phase 3 (Summer 2020 to Spring 2021) — Iterating on the work to date, and growing in-house capabilities

Phase 3 is when things really took off. The first wave of the COVID pandemic forced a temporary pause of this stream of work, but as the pandemic dragged on it became clear that disaggregated outcomes data was critical to inform responsive policy. It also became evident that some shifts to established workforce development practices were getting better results.

Initially there was pressure on our team to double-down on the previous impact measurement pilots, but we managed to advocate for an environmental scan of leading impact measurement practices before committing to any next steps.

While our executives were concerned that slowing down to commission an environmental scan might cost us momentum and put the work at risk, by this stage we had built a high level of trust, and thanks to the Measuring and Improving Social Impacts book (see Backstory — Measuring to Learn, above) we had crafted a tagline to express an overarching goal for the impact measurement stream of program work: Uncovering “what works, for who, and in what context.”

Landing on this precise, yet aspirational, goal helped us to secure buy-in from leadership and clarified the kinds of questions we should be asking to chart the next round of the program’s impact measurement work, which we had started to describe as a “Strategy”. For example: How should we build out our in-house capacity if learning is core to our impact measurement goal? What sources of data (thick and big) can we bring together to draw out new insights about how the sector is or isn’t working for youth? What are the optimal moments to introduce new evidence into the program-policy cycle?

When the environmental scan results came in and we had the chance to review and workshop implementation scenarios with other teams, we had a lot more clarity about the compelling work to undertake (see Backstory — Impact Measurement (and Management), above).

By this time a number of other youth employment programs had also started to collect the new outcomes data from Phase 1. So we were also validating the quality of the new data and initiating a rebuild of an outcomes database to house the inter-departmental data for analysis and reporting (the old database had too much technical debt).

Phase 4 (Fall 2021 — Spring 2023): Laying foundation to measure impact across common programs, and boosting access to timely evidence

Given the work to date, we had a better understanding of what core elements needed to be in place to achieve the program’s impact measurement goals.

First, while the outcomes data collection improvements from Phase 1 were solid, they needed more depth and standardization.

Given the complexity of the task and our team’s other responsibilities, we engaged the CSPS Surge Team to help bring relevant youth employment programs through a curated process to address data limitations.

By bringing the right partners (program, policy, data stewards, Gender-based Analysis Plus experts, etc.) together in a purposeful and systematic way, within 10 months we were able to itemize data issues, prioritize which issues to tackle, build solutions that 12 youth employment programs could support, develop supports for program teams to implement (a detailed logic model breakdown, a data dictionary, technical assistance, etc.), and get senior-level approval and support to begin implementation³. This work was simply what was required to have high-quality admin. data for building a richer understanding of the impact of these programs between, and as part of, summative evaluations.
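
To give a flavour of one of those supports, here is a hypothetical data dictionary entry. The variable name, codes, and notes are illustrative, not the actual inter-departmental standard.

```python
# A hypothetical data dictionary entry of the kind produced for the
# participating youth employment programs. Field names and codes are
# illustrative, not the actual inter-departmental standard.
DATA_DICTIONARY = {
    "employment_status_12w": {
        "label": "Employment status 12 weeks after program exit",
        "logic_model_link": "Intermediate outcome: sustained employment",
        "type": "categorical",
        "allowed_values": {
            "EMP": "Employed (any hours)",
            "EDU": "Returned to education or training",
            "SEEK": "Not employed, actively seeking",
            "NLF": "Not in the labour force",
            "UNK": "Unknown / lost to follow-up",
        },
        "collected_by": ["funding recipient exit survey"],
        "gba_plus_note": "Report disaggregated by the standard identity variables.",
        "privacy": "Personal information; share only in de-identified form.",
    }
}
```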

Second, alongside getting our data right, we made a big push on building the database infrastructure needed to transfer, store and analyze this program data. By advancing the data, user requirements, data infrastructure, etc. in parallel, we were able to streamline a lot of the work and make sure that we were building towards impact measurement infrastructure that would meet both standard M&E needs as well as our more ambitious goals. (All of this may seem like an odd mix of work for policy and program-policy staff, but it’s a growing trend; see Moving Digital Upstream.)

Between the outcomes data and database work, we were making really great progress, but how it would all come together to help us make better policy recommendations was still very hypothetical.

To give us a chance to see what it would actually take to engage with our outcomes data for learning, we commissioned a small research project to help us evaluate the results of the “rapid review” of contribution agreement applications from Phase 2.

While we were never able to answer the question of whether outcomes could be predicted upfront, the project did uncover critical limitations in our data and data management systems that have since been addressed.

If you’re interested in leveraging your admin. data, engage with it early and often: you’ll get clear signals on what you need to adjust in order to realize your goals (e.g. changes to your data, your M&E infrastructure, your processes, your capabilities, or all of the above).

Third, to catch up to the leading and emerging impact measurement practices, we needed to find ways to pull in robust evidence (from both quantitative and qualitative sources) beyond the data collected through implementation of the program. We opted to test the utility of two approaches:

  • Portfolio Sensemaking: this pilot focused on using a collaborative approach to generate rigorous, timely, and actionable evidence derived from the experiences of a range of stakeholders.
  • Academic Partnerships: this pilot focused on working with one of the national social science granting bodies to increase the flow of information between academia, non-profits and the public sector — both to influence the research questions being explored, and to facilitate access to evidence by those who could use it.

Finally, we commissioned a pair of studies on Knowledge Mobilization (and Adoption) practices: one in a non-profit and the other in an employer context. These studies were to help us uncover options to support increased diffusion of stakeholder insights, as well as increase uptake by peer organizations.

This phase was incredibly busy for us. Fuelled by a two-year funding bump and senior executives who actively supported our work, we were determined not to waste the opportunity.

Fig. This image captures what our working hypothesis was at the time regarding what it would actually take to learn ‘what works, for who, and in what context’.
Fig. This image leads off Søren Vester Haldrup’s article summarizing his time leading the UNDP’s M&E Sandbox… This image also captures how our team was feeling at the end of Phase 4.

Phase 5 (Fall 2023 — ongoing) — Moving Past the Starting Line: Finding a Rhythm for a Level 5 Measurement/IMM Practice

Since Phase 1 we’ve had moments of Level 5 Measurement/IMM. For example, we’ve found evidence on new challenges emerging in the youth employment sector as well as promising adaptations; we’ve used admin. data for early signals on whether the program was reaching into specific youth communities; and we’ve taken what we’ve learned from many sources and injected new options into the program-policy cycle that are now being implemented.

All encouraging; however, these moments were ad hoc, not systematic.

But now with some key elements in place (a concrete goal, quality data, technical infrastructure, in-house capabilities, diverse streams of evidence, and multiple access points into the program design/implementation cycle), we are well-positioned to establish a systematic IMM practice for the program that operates continuously between 5-year summative evaluations.

This is the focus of Phase 5, and will hopefully result in an IMM protocol that specifies roles, procedures, products, and resourcing.

One crucial requirement for this protocol is that it integrates into existing work and aligns with current organizational expectations of the implicated teams. Without this, we’d need to convince enough internal stakeholders that working in this way better equips them to deliver on their core work, or we’d need to secure additional resources to run the new IMM capabilities on top of the existing M&E and research activities, as they are.

The importance of integration cannot be overstated. As with so many new public sector capabilities, until these institutions are better equipped to evolve, success will ultimately depend on our ability to embed the IMM activities within existing accountabilities, teams and workflows.

Even if we don’t fully land the IMM protocol during this phase, we now have a solid base of capabilities to explore other advanced impact measurement practices. For example, we’ve been discussing data linking opportunities with teams in our department that have access to national datasets, and teams at the national statistics office that are building new datasets to be used by researchers and policy staff. Our first attempt at linking data was fast, but didn’t meet our needs: the available data didn’t have sufficient granularity for what we were interested in. We are now exploring more granular microdata sets and are on track for another data linking project at some point next year⁴.

Impact Measurement and Management Stack: some foundations for impact measurement that informs responsive program design and delivery

Through Phases 1–5, we have stretched the program’s existing measurement, reporting, evaluation and research infrastructure to build a stack of supports that enable learning and the development of stronger policy recommendations, at pace. Here are the key elements, thus far:

Goal/Strategy

Our strategic intent is clear — uncovering “what works, for who, and in what context” — and we are mobilizing resources and activities to pursue that goal. Our strategy has been guided by demonstrated models and leading practices.

Data

We established a coherent logic model and associated measurement framework to generate the administrative data that serves both accountability and learning needs. For example, alongside KPIs and personal information needed for incremental impact assessments, the program now has standardized data that disaggregates youth communities and interventions, integrates data from different collection tools into a common record, includes user satisfaction indicators, and captures data related to the program’s R&D efforts and sectoral infrastructure investments.
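
As a rough sketch of the common-record idea (the field and tool names are invented, not the program’s actual schema), integrating data from different collection tools amounts to folding each source under one pseudonymous participant key:

```python
# Sketch of the "common record" idea: data captured by different
# collection tools is merged under one participant key so it can be
# analyzed together. All field and tool names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ParticipantRecord:
    participant_id: str                                   # pseudonymous key, not a name
    intake: dict = field(default_factory=dict)            # from the intake form
    service_events: list = field(default_factory=list)    # from case-management tools
    exit_survey: dict = field(default_factory=dict)       # outcomes + satisfaction

def merge_sources(intake_rows, event_rows, survey_rows):
    """Fold rows from three collection tools into one record per participant."""
    records: dict[str, ParticipantRecord] = {}
    def get(pid):
        return records.setdefault(pid, ParticipantRecord(participant_id=pid))
    for row in intake_rows:
        get(row["participant_id"]).intake.update(row)
    for row in event_rows:
        get(row["participant_id"]).service_events.append(row)
    for row in survey_rows:
        get(row["participant_id"]).exit_survey.update(row)
    return records
```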

Data Infrastructure

While the outcomes database project for multiple youth employment programs is still under development, we now have access to the necessary database tools for our program’s specific data. Time and resources permitting, these tools will allow us to investigate trends in anonymized datasets and link these to other federal government microdata.
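
As an illustration of the kind of analysis these tools are meant to enable, here is a sketch that joins a de-identified program extract to another microdata extract on a shared linkage key and pulls a simple trend. The files, keys, and columns are invented, and in practice this kind of linkage happens inside controlled environments under formal authorities.

```python
# Illustrative only. Real linkage happens inside controlled environments
# under formal data-sharing authorities; the files, keys, and columns
# here are invented for the sketch.
import pandas as pd

program = pd.read_parquet("program_outcomes_deidentified.parquet")      # hypothetical extract
external = pd.read_parquet("external_microdata_deidentified.parquet")   # hypothetical extract

# Join the two extracts on a shared pseudonymous key.
linked = program.merge(external, on="linkage_key", how="inner")

# Trend: employment rate at exit by cohort year and prior-earnings quartile.
trend = (
    linked
    .assign(prior_earnings_q=pd.qcut(linked["prior_earnings"], 4, labels=False))
    .groupby(["cohort_year", "prior_earnings_q"])["employed_at_exit"]
    .mean()
    .unstack("prior_earnings_q")
)
print(trend.round(2))
```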

Capabilities

Over the last four Phases, we’ve built internal capacity through recruitment and experiential learning. While we now have the skills needed to interrogate our data for insights and to turn those into program options, the most important competency we’ve developed is knowing when we need help, and who to ask.

Outside Game

Recognizing that the administrative and outcomes data generated by the program provides an incomplete picture of what improves or limits youth employment outcomes, we have worked hard to support the growth of external evidence pipelines.

Protocol

An IMM protocol is currently under development, but it will likely consist of a series of shared routines across teams, as well as templates that facilitate the transformation of fresh evidence into thoughtful and well delivered recommendations on program design and implementation.

Feedback loops

Our team is an in-house R&D and Impact Measurement unit, with a mandate to help the program understand and grow its impact. We’re fairly new: custom built to help drive the new initiatives launched as part of the program’s reform. Instead of operating like an innovation lab on the periphery of the program, or as a standalone R&D or experimentation service, we are responsible for a mix of standard program work as well as atypical work (the atypical work stemming from the program reform exercise, as well as work initiated by the team to explore new opportunities or respond to new challenges).

This position gives our team a seat at multiple tables that influence program design and implementation, allowing us to inject right-sized evidence and program design/implementation proposals at the right times.

Figure. Sketch of the program’s nascent IMM stack.

Big Lessons

Although we still have a way to go in moving from moments of Level 5 Measurement/IMM to a continuous practice, here are the key pieces of advice that resonate most when I describe our work to other teams in the public sector or granting foundations:

1. It’s work, but not rocket science: By making targeted adjustments to existing program activities, we have been able to build towards a program equipped for dynamic learning. When we look at the IMM stack, we’ve introduced little “net new” work. What we have done is maximize the value of existing requirements, roles, routines, and tools via a phased exploration of what’s possible. Figuring out what adjustments to make was hard; implementing those changes was harder. But we managed to get it done in a large bureaucracy while juggling many other responsibilities.

With some policy cover, marginal upfront investments, a little technical support, ambition, patience, and luck, I think getting to the Level-5 Measurement/IMM starting line is an achievable goal for most social impact programs. Moving from the starting line to a new core practice, however, will likely require tough internal conversations and realigning expectations on what M&E activities should be prioritized versus de-prioritized.

If in-house impact measurement is simply out of reach and you don’t have recurring summative evaluations, just focus on getting your data right. Then use impact measurement services that are becoming increasingly available (e.g. in Canada we have Statistics Canada SDLE), or explore partnerships with peer-teams, organizations, or networks that have the necessary capabilities (e.g. in-house data teams, data and evidence consultancies, data collectives, academia, etc.).

2. “Thoughtful Implementation” was the Strategy: Throughout this blog post, I’ve used the word “Strategy”, but we didn’t have an upfront strategy in the typical sense — no theory of change, road map, governance table, etc. Instead, we focused on a) disciplined application of theory in our context, b) treating our M&E files as a mini-portfolio, and c) continuously testing new options to pursue our ultimate goal.

Essentially the formula was: Praxis (x) Portfolio Design/Management (x) R&D. When I talk about the “Strategy” I’m describing the path we’ve taken, and the immediate next steps. To our senior leadership’s credit, they tolerated a strategy that evolved one phase at a time, though it definitely helped that we could articulate an objective, point to examples of what the final destination could look like, speak coherently about critical next steps, and show how those steps aligned with core work that needed to be done. At every phase, we also created tangible value.

If you or your team is building out an impact measurement strategy, do consider what leading teams are doing, but give yourself the flexibility to prototype some changes and make smart bets (see “trojan mice”) to inform your next steps. Just be sure to identify upcoming M&E or program lifecycle work and use those windows of opportunity to formalize what you’ve learned.

3. Don’t Skip the Portfolio Design/Management and R&D: In Phase 3, when we really started to build the IMM stack, we knew that there would be a lot of moving parts to manage if we wanted to stretch the program’s existing M&E activities towards more dynamic learning. For instance, disaggregating program outcomes data, updating the program’s data collection tools, conducting internal training, and delivering additional pilots to close known gaps.

Each of these individual efforts was interesting and important, but insufficient on its own; the real value would come from them working together. So we needed to bundle most of the work, as what we learned in one area triggered necessary changes in other areas. For example, a change in policy direction and insights from internal training and pilots drove updates to the program’s logic model. Likewise, changes to data collection tools to account for service delivery realities led to adjustments to the program’s outcomes measurement framework and outcomes data dictionary.

Taking a portfolio approach has allowed us to manage both the trees and the forest. It also streamlined our work, reduced duplication and dead-end tasks, and made it easier to explain our efforts to critical stakeholders.

Incorporating R&D into the portfolio was also critical. We continuously tested whether the work was leading to the production of new evidence, and explored opportunities to apply leading practices. This approach helped us stay focused on efforts that would yield a big return on our time and the program’s resources.

R&D is also a great risk mitigation tactic. You can rarely tell if something will work until you actually test it. Going back to the “Rapid Review” pilot, the small study during Phase 4 to test the predictive coding made visible a significant data collection challenge that we were able to resolve before a big funding call. Had we not addressed it when we did, it could have become a multi-year frustration for many teams⁵.

We’re all busy, and working this way requires additional focus and discipline, but it’s worth putting on your portfolio and R&D manager hat. This approach will save you significant time and resources, and could make the difference between realizing your impact measurement ambitions or falling short.

4. Talent matters: There’s been much discussion about re-imagining the composition of policy teams to include diverse skills and experience — digital, policy, program delivery, lived-experience, etc. This is now the mix for our team, and it has worked well in our context. For example, the policy expert and team members with lived experience help us think about the right questions to ask of the data. The digital expert helps us think through structuring and managing program data, and the program delivery expert helps us think about the data collection and anonymization processes. Digital capabilities are especially important as you’ll probably need to do some database work to set your data up in a way that meets your needs.

If you don’t have this mix of talent in-house right away, gather a group of seasoned practitioners around you. While they may not fully appreciate your context, their guidance should be enough to help you pass through the “literacy gate” so you know what to do next, and what to avoid.

The only other thing worth emphasizing is the skill of sharing evidence well. For us, this has looked like the ability to transform evidence into credible options, recommendations, or advice (i.e. viable, desirable, and feasible — think UX/HCD), communicating critical concepts over time in digestible chunks, and seeing and seizing windows of opportunity to present ideas. It’s also critical to build thought partnerships with stakeholders, so they are willing to consider your input or engage in discussions about it.

5. The quality of your evidence will have an impact on the quality of your policy advice: When this work began, our focus was on how to improve the use of the program’s outcomes data to inform policy advice. While critical, that data alone is insufficient. The program’s admin. data provides an incomplete slice of what’s producing or limiting outcomes. So it’s also critical to complement evidence derived from program and service delivery with evidence from other sources.

I know most public servants and others guiding investments into social, environmental, and economic outcomes are already drawing from many sources of evidence, but much of this happens episodically, typically at the beginning of a policy or program-design cycle. What we’ve learned is the importance of having access to multiple evidence pipelines — qualitative and quantitative — that cover various time horizons (immediate and longitudinal), are intersectional, and span different levels of the system (service-user, institutions, networks).

You won’t have all of the evidence when it’s time to make a decision or provide recommendations, nor will the evidence you do have be conclusive. But if you strive to be in constant “detective-mode”, when the moment arrives, you’ll be equipped with quality source material.

N.B. If you’ve read Moving Digital Upstream, you’ll notice some overlap between how we’ve built up the program’s digital and measurement capabilities.

Fig. A range of impact measurement models visualized in the excellent Griffith Centre for Systems Innovation blog post: Now we are all measuring impact — but is anything changing. Similar to the Evaluation Metro image, we’ve been trying to stretch the program from one into multiple impact measurement paradigms.

Getting to Better Outcomes

Our IMM work exists to help the program learn well, and learn continuously, so that program teams are equipped to make stronger policy recommendations on demand. All of this work also yields better annual reporting and stronger accountability for public investment through the traditional M&E channels. It’s a win-win dynamic that has allowed us to stretch the foundations of existing measurement and research infrastructure so that an IMM practice can emerge.

If we can maintain this momentum the program should be well-positioned to leverage measurement to achieve better outcomes, more quickly.

Looking to the future, though we will need to remain clear-eyed on the benefits and pitfalls of impact measurement (you really should read Now we are all measuring impact — but is anything changing), the work to date opens up some exciting possibilities.

Given the program’s investment in system-level interventions, I’m particularly interested in exploring outcomes frameworks that align well with system change models like the Berkana Two Loop, Field Catalysts, Mission-Oriented Innovation, Slow Lane Movements, Doughnut Economics, etc. I’m also curious about finding the right M&E blend for public institutions engaging in systemic investment. The team is especially excited about using the growing data linking capabilities to explore how social mobility and other socio-economic factors influence outcomes.

We’re always on the lookout for inspiration, so if you know of public sector teams working in this space, let me know! (If you’re looking for inspiration, you really should check out the UNDP M&E Sandbox.)

P.S. If you have any questions based on our experience, feel free to reach out. Building these kinds of capabilities within public institutions and social purpose organizations is important, so happy to chat.

P.P.S. Special thanks to the following individuals, in alphabetical order, who have shaped how I see M&E, learning infrastructure, and evidence-informed decision making: Brock Auerbach-Lynn, Georges Awad, Gina Bell, Neil Bouwer, Kelly Campbell, Ian Capstick, Marianna Connors, Patrick Cyr, Glennys Egan, Christopher Duff, Donyah Farhat, Angie Fleming, Reuben Ford, Jenilee Forgie, Ashraful Hasan, Vince Hopkins, Jennifer Jackson, Cory Jansson, Indy Johar, Mary Kay Lamarche, Jean-Noé Landry, Myra Latendresse-Drapeau, Pam Lefaive, Michael Lenczner, Katharina Magdalena Wolff, Tom Mitchell, Karen Myers, Max Palamar, Pinja Parkkonen, Anil Patel, Giulio Quaggiotto, Vinod Rajasekaran, Andrew Reddin, Diane Roussin, Alex Ryan, Sarah Schulman, Nick Scott, Jason Sukhram, Supriya Syal, Anne White, Rachel Wernick.

[1] I was able to provide an overview of our R&D work, and R&D in social mission contexts more broadly, at an event hosted by The Australian Centre for Social Innovation: https://vimeo.com/817134115.

See here for the whole event, including Geoff Mulgan’s keynote: https://www.tacsi.org.au/our-work/hero-initiative/introducing-future-social-r-and-d-in-australia

[2] It’s not just youth whose needs are evolving — the social, economic, environmental, and technological landscape is more dynamic than it has been in the past. Public institutions need to adapt to keep pace. For Canadian public servants, if you haven’t yet, check out this Cascade Institute presentation on the polycrisis at the DM breakfast. If you want to dive deeper, the Santa Fe Institute, Dark Matter Labs, and the Long Now Foundation are great places to go to better understand the new reality that the public sector is being called to help societies navigate.

[3] For anyone reading this and getting nightmare flashbacks of clunky whole-of-government committees with multiple levels of governance, terms of reference, many meetings, and hours of executive prep for each meeting: we were simply too busy for a “business-as-usual” approach. This work was initiated and led at the working level (though we did provide updates to a senior executive table that meets quarterly). Most of the working group meetings focused on tackling challenges and giving the Surge team the source material to work asynchronously with our team’s support. The group figured out an inter-departmental Teams service — before it was standard issue — to facilitate document sharing and maintain online discussions that didn’t vanish when the meeting ended. There was a lot of Miro (virtual whiteboard) usage to support planning and product design, and online survey tools to capture ideas during meetings. Additionally, our core team of analysts with backgrounds in policy, digital, and lived experience brought their unique perspectives and constantly tapped into their networks to inform the work.

Collectively, these approaches allowed us to systematically engage a wide range of stakeholders, encourage active participation, ensure a shared understanding amongst working group members, and build and iterate on solutions in record time.

[4] Data linking holds enormous potential to inform policy and program design, but we’ve been learning that teams must be able to navigate the constraints. For example, you need to be clear about which policy questions are worth asking — you can’t go fishing in a sea of data without some goals. Second, you need to understand the strengths and limitations of the datasets available for linking, as this will directly impact which policy questions can be explored. Lastly, the complexity of projects and bandwidth of the technical teams handling the data linking can vary widely; some projects can be completed in weeks, while others may take months or longer.

[5] I also love this story because it illustrates how the benefits and outcomes of R&D are often non-linear: we started this Rapid Review pilot with a specific goal, but along the way we uncovered valuable insights that proved critical later on. In this case, identifying a significant limitation in the program’s data collection approach well before it surfaced a year later during the normal M&E cycle. This gave us enough runway to address the issue before it became a major problem.
