AI Case Study: Analyzing Canvassing Conversations with Fair Count in Mississippi

Kate Gage
Cooperative Impact Lab
10 min read · Mar 14, 2024

By: Jeremy Blanchard and Han Wang, with Kate Gage, Oluwakemi Oso of Cooperative Impact Lab, and Thomas Whitaker and Jeanine Abrams McLean of Fair Count

In the fall and winter of 2023/2024, Cooperative Impact Lab worked with a cohort of 13 organizations to support experimentation with AI tools ahead of and after the 2023 election. This post is part of a series of AI Case Studies documenting that work, highlighting lessons, best practices, and recommendations for organizations — especially those that organize and campaign — as they consider incorporating AI into their work.

Read our first AI blog post for more learnings: Unleashing AI’s Potential in Campaigns and Organizing: Lessons from the Front Lines

Background

Fair Count is a nonpartisan organization founded by Stacey Abrams that focuses on promoting voter education, fair and accurate counts in the United States Census, and initiatives that promote equity and social justice. Fair Count was a member of Cooperative Impact Lab’s Generative AI Cohort in 2023–24 and received coaching and support to experiment with integrating generative AI into its strategies and practices. After trainings on the basics of generative AI tools and technologies, and discussions of the challenges and opportunities within Fair Count’s model, CIL and Fair Count collaboratively identified the following project as one with the potential to leverage AI to improve Fair Count’s ability to achieve its organizational goals.

Problem definition: Get Out The Vote (GOTV) canvassers learn a lot of information in their brief interactions with potential voters, but current canvassing platforms (in this case, MiniVAN) only collect basic details like a voting-likelihood score. There’s a lot more qualitative information that Fair Count and other canvassing organizations could be collecting and learning from; however, it’s hard to collect and analyze in a way that is convenient for canvassers and staff.

Can AI enable the collection of insights from canvassers’ conversations with voters to inform organizing strategy?

Opportunity: Advancements in AI allow for the collection of unstructured data in the form of transcribed canvasser voice notes, and for quick analysis of this information to provide insights into voters’ opinions, plans, and current sentiments. Before generative AI tools, an audio/text transcript solution like this would’ve taken months to develop and would only have been able to answer a narrow range of questions (e.g., sentiment analysis) about the canvassers’ transcripts. Now it is possible to use existing tools to ask almost any question of these large data sets.

Research & Planning

Through the CIL AI Cohort, a coach from AI Impact Lab, Jeremy Blanchard, met with Fair Count to understand their needs and design a solution. Fair Count came in with a relatively high level of understanding of AI tools and had already envisioned a possible pathway forward. CIL and AI Impact Lab provided expertise in generative AI and overall guidance to help them create a proof-of-concept.

Research: CIL researched several software solutions for each step of the process: recording audio files, transcribing them, collecting the transcripts, filtering the relevant transcripts, and analyzing the transcripts using generative AI. We aligned on the tools we thought would work best for this proof-of-concept and helped Fair Count connect the systems for an initial trial run with their canvassers.

Solution & Impact

Workflow Diagram of Prototype. Credit: Fair Count and Jeremy Blanchard

Proof-of-concept Workflow:

  • Recording: Collect short 30-to-90-second voice recordings from the canvasser after each door, responding to prompts like: “Please summarize what the person said about what is motivating them to vote or not vote. What were their emotions like?” Tool Used: Native voice-note app on the canvasser’s phone
  • Transcription & organization: Transcribe the audio files and collect them in a spreadsheet, along with the county the canvasser was in, to allow for regional analysis. Tool Used: Fireflies.ai
  • Analysis: Use Claude or ChatGPT to ask free-form questions about the gathered data to learn about motivations for voting or not voting. Our team selected Claude for this test due to its policy of not using prompts and user conversations to train its model. (A sketch of these steps in code follows below.)
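
To make the workflow concrete, here is a minimal sketch of the transcription-and-organization step in Python. It is illustrative rather than Fair Count’s actual setup: the pilot used Fireflies.ai plus manual collection, while this sketch swaps in OpenAI’s Whisper API as a stand-in transcriber, and the folder names and county mapping are hypothetical.

    # Illustrative sketch: transcribe canvasser voice memos and collect them
    # in a spreadsheet with county labels. Swaps OpenAI's Whisper API in for
    # Fireflies.ai; folder names and the county mapping are hypothetical.
    import csv
    from pathlib import Path

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    RECORDINGS = {  # hypothetical folders of voice memos, one per county
        "recordings/forrest": "Forrest",
        "recordings/coahoma": "Coahoma",
    }

    with open("transcripts.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["county", "file", "transcript"])
        for folder, county in RECORDINGS.items():
            for audio_path in Path(folder).glob("*.m4a"):
                with open(audio_path, "rb") as audio:
                    result = client.audio.transcriptions.create(
                        model="whisper-1", file=audio
                    )
                writer.writerow([county, audio_path.name, result.text])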

Testing

The Fair Count team tested this over a few different canvassing days, with eight different canvassers, and collected 120 voice memos. Most canvassers were willing to try this new step in their workflow, but some newer canvassers needed help with the additional step since they were still learning the basics. Most canvassers appreciated using the simple voice recording app that came with their phone, but some people didn’t like switching between MiniVAN and the voice recorder app after each door.

Analysis questions

Using an AI model for this analysis allowed us to include all of the collected voice-memo data without significant data clean-up and to develop our queries in natural language, lowering the barriers for this project. If we had used one of the dedicated tools or user interfaces, we might have been limited to specific predetermined reports or required more structured or cleaned data.

Here are some sample test questions used for this experiment (a sketch of how they can be posed to Claude follows the list):

  • Summarize how people are feeling about the election.
  • What are some reasons people are not voting (or hesitant to vote) in the upcoming election?
  • What ideas would help encourage people to vote based on the information provided?
  • Has anybody brought up issues related to barriers that made it difficult for them to vote?
  • What are the differences in concerns about voting by county?
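
As a minimal sketch of how a question like these could be posed, the snippet below loads the transcripts.csv from the collection sketch, labels each transcript with its county, and sends everything to Claude through Anthropic’s Python SDK. The model name and file layout are illustrative assumptions, not Fair Count’s exact setup.

    # Illustrative sketch: paste county-labeled transcripts into the context
    # window and ask one of the sample questions. Assumes the transcripts.csv
    # produced by the collection sketch; the model name is illustrative.
    import csv

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    with open("transcripts.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    context = "\n\n".join(
        f"[{row['county']} County] {row['transcript']}" for row in rows
    )
    question = "What are the differences in concerns about voting by county?"

    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Here are canvasser voice-memo transcripts:\n\n"
                       f"{context}\n\n{question}",
        }],
    )
    print(message.content[0].text)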

Results:

Question 1: Can Claude (Anthropic’s generative AI tool) make sense of the data?

Yes! It was able to analyze the transcripts and make sense of the information. From the text uploaded, Claude was able to provide:

  • County-level breakdowns of where people lived
  • A basic sentiment analysis of each county
  • An estimated turnout prediction based on those numbers.

Sample Prompt Questions. Credit: Fair Count and Jeremy Blanchard

Question 2: Could Claude help identify where vote pledges might be inflated?

Yes! Based on the analyzed data, Claude highlighted Forrest County as a place where self-reported vote intention might be overstated. It contrasted this with Coahoma County, where people used more “make a plan” language around voting. Interestingly, Forrest County did end up as one of the lowest-turnout counties in 2023, with just 34% turnout.

Sample Prompt Questions. Credit: Fair Count and Jeremy Blanchard

Assessment & Insights

What worked & what we learned

  • It was a pretty impressive qualitative analysis!
  • Getting comfortable with back-and-forth conversations with Claude during analysis was critical. Asking follow-up questions, or asking it to explain why it drew a particular conclusion from the data, was helpful for gaining more insight (a sketch of this pattern follows below).
  • This approach may help solve a big problem of data/story collection in organizing (not just canvassing).
  • The tech was relatively simple (no coding involved at all).
  • Based on our tests, the analysis accurately matched what was in the raw transcript data. We also did a rough quantitative analysis and confirmed that the numerical summaries it gave were close to accurate.
  • The prompts for analysis referred only to the context window of transcript data, which likely limited hallucinations, but this needs further investigation.
  • The Fair Count Team was very interested in asking questions about trends in the data based on geography (and differences between regions). This ability would allow them to adjust their canvassing scripts and regional campaign messaging based on the sentiments that are showing up in a particular area.
  • Claude gave a more nuanced, succinct, and engaging analysis than ChatGPT.

Sample Analysis. Credit: Fair Count
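
To illustrate the back-and-forth pattern mentioned above, here is a hedged sketch of a two-turn exchange: the model’s first answer is appended to the message history, and it is then asked to point to the transcripts behind its conclusion. It reuses the county-labeled context string from the earlier analysis sketch; the model name is illustrative.

    # Illustrative sketch of the back-and-forth analysis pattern: ask an
    # initial question, then ask Claude to explain its reasoning. Assumes
    # `context` is the county-labeled transcript text built earlier.
    import anthropic

    client = anthropic.Anthropic()
    context = "..."  # county-labeled transcripts from the analysis sketch

    history = [{
        "role": "user",
        "content": f"Transcripts:\n\n{context}\n\n"
                   "Summarize how people are feeling about the election.",
    }]
    first = client.messages.create(
        model="claude-3-opus-20240229", max_tokens=1024, messages=history
    )
    history.append({"role": "assistant", "content": first.content[0].text})
    history.append({
        "role": "user",
        "content": "Which specific transcripts led you to that conclusion?",
    })
    followup = client.messages.create(
        model="claude-3-opus-20240229", max_tokens=1024, messages=history
    )
    print(followup.content[0].text)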

What needs improvement

  • Audio transcripts were not linked to individuals, so the observations are generalized and not connected to a specific voter.
  • It was challenging to coordinate uniform styles of audio capture across canvassers to ensure that the style and tone of conversation summaries were similar. More detailed scripts/training were needed to ensure that direct quotes from voters were provided in a similar way.
  • Our tests indicate that everything is in place for the insights to be very useful, but we need more data to work with to confirm that hypothesis.
  • Fireflies.ai was a useful tool for quickly transcribing audio files, but the transcripts lost some accuracy relative to the raw audio.
  • Figuring out how to make the analysis actionable depends on who the end user is. Is this a daily dashboard for executive staff? A chatbot for organizers? A tool for the data team to use in providing recommendations? A better definition of the end user will help refine the process.
  • The process for collecting audio memos is currently pretty clunky. Our solution was basic and required manual intervention from Fair Count staff at many points.
  • Further study is needed to ensure data privacy and security before proceeding with further development of similar tools.

Future Possibilities

This proof of concept of a pipeline from unstructured field data to analysis using generative AI went better than we expected. We were all impressed by the wide range of questions we could ask about the data and the level of accuracy and detail in the AI’s responses.

A modest investment of engineering hours in developing this solution and automating the steps that are currently manual would allow Fair Count to test this with thousands of recordings instead of hundreds. We’d then be able to see what kind of analysis the team could do after each day/week of canvassing and how that might shape the canvassing strategy that’s needed in each region as the election cycle progresses.

Highly interesting features for the next iteration:

  • Automate the currently manual steps (e.g., gathering the audio files from each canvasser).
  • Connect the audio recording to specific voter data to gain access to demographic information like age, gender, race, and vote propensity for deeper analysis.
  • Create a custom interface to ask questions about the data and improve the voice notes.
  • Include example questions so people learn what kinds of questions they might want to ask.
  • Filter based on what prompts the canvassers were asked to speak about in their voice recording.
  • To make the responses more accurate, include ways to filter the data by geography and voter demographics, creating the ability to segment the responses and reduce the amount of “noise” in the data (a sketch of this segmentation follows this list).
  • Using AI tools inside spreadsheets, like Claude for Sheets or GPT for Sheets and Docs, to do some initial analysis and data cleaning on each response before providing the full spreadsheet for analysis might increase confidence in the results.
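
As a sketch of the segmentation idea above, and assuming the transcripts.csv schema from the collection sketch, each county’s transcripts could be sliced out and analyzed on their own:

    # Illustrative sketch: segment transcripts by county so each slice can
    # be analyzed separately, reducing cross-region "noise". Assumes the
    # transcripts.csv schema from the collection sketch.
    import csv
    from collections import defaultdict

    by_county = defaultdict(list)
    with open("transcripts.csv", newline="") as f:
        for row in csv.DictReader(f):
            by_county[row["county"]].append(row["transcript"])

    for county, transcripts in by_county.items():
        # Each slice could be sent to the model on its own, e.g. to compare
        # concerns across counties without mixing regions together.
        print(f"{county}: {len(transcripts)} transcripts")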

Additional feature ideas discussed:

  • Have canvassers hold a two-way conversation with an AI bot while recording the voice debrief, so the bot can ask follow-up questions that help evoke important parts of the story.
  • Develop detailed training processes and materials to prepare canvassers for this new tool.

Possible future applications

  • A “soft report” that organizers complete as part of their day or week could be an interesting way to use this analysis. We could use it to learn how canvassers are experiencing their work (instead of the current voter-focused approach). What if organizers responded to a three-question, audio-focused prompt daily or weekly?
  • Help trainers design a training agenda for what they want to say to organizers and coach canvassers on improved conversational tactics during door-to-door outreach.
  • Communication recommendations overall — social media posts, email copy, scripts, talking points based on what we’re learning from voters at the door.
  • In a 501c4, PAC, or campaign context, you could test messaging about candidates, policy positions, and other messaging at the door with voters and learn quickly about their reactions.

Note: Claude struggled with some of the analysis, especially when it involved mathematical calculations, and it seemed to extrapolate from small sample sizes. This functionality will likely improve as the models improve and as larger data sets become available. Some issues we encountered:

  • The rankings were sometimes incorrect, resulting in improper weighting of responses.
  • Percentages were incorrect.

With larger datasets, or in situations where exact numbers matter, we recommend experimenting further with prompt structure and tooling for quantitative analysis. The analysis might be more reliable using in-spreadsheet LLM tools, such as Claude for Sheets, GPT for Sheets and Docs, or GPT-4.
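
One pattern worth experimenting with, sketched below under the same assumed transcripts.csv layout: have the model label each transcript individually, then do the counting and percentages in ordinary code rather than asking the model for aggregate math. The model name and prompt wording are illustrative.

    # Illustrative sketch: classify each transcript one at a time, then
    # compute counts and percentages deterministically in code instead of
    # trusting the model's arithmetic over the whole data set.
    import csv
    from collections import Counter

    import anthropic

    client = anthropic.Anthropic()
    counts = Counter()

    with open("transcripts.csv", newline="") as f:
        for row in csv.DictReader(f):
            msg = client.messages.create(
                model="claude-3-opus-20240229",
                max_tokens=10,
                messages=[{
                    "role": "user",
                    "content": "Answer with exactly one label, positive or "
                               "negative/neutral, for the voter sentiment in "
                               "this transcript:\n\n" + row["transcript"],
                }],
            )
            counts[msg.content[0].text.strip().lower()] += 1

    total = sum(counts.values())
    for label, n in counts.items():
        print(f"{label}: {n} ({100 * n / total:.0f}%)")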

Note: We did a manual sentiment analysis on a couple of the counties to check whether the AI model was hallucinating in the response it gave above.

  • Harrison — 7 positive, 8 negative/neutral (AI was off by 3 in the negative/neutral category compared to human analysis)
  • Forrest — 20 positive, 22 negative/neutral (AI off by 2 for both categories)

We recommend manual checks for this type of analysis to gauge the model’s accuracy. These counts weren’t off by enough to deeply impact the analysis, but it was important to know the difference, especially when making decisions based on the results.

Thank you to our partners at Fair Count, Trestle Collaborative, AI Impact Lab, Shamash Global, and Zinc Labs for their work and support on this project.

Please contact CIL at ai@cooperativeimpactlab.org or cooperativeimpactlab.org with questions or any follow-up.
