How to Build a Great Meeting Summarizer App with Indexify

Diptanu Choudhury
Tensorlake AI
Jun 4, 2024 · 4 min read

Last week, we introduced Indexify, a new open-source project for building scalable and reliable pipelines that extract structured data and embeddings from unstructured data, such as documents, videos, and audio. Imagine being able to instantly transcribe, summarize, and deliver insights from meetings to enhance collaboration and streamline team decision-making. This is a prime example of where Indexify shines.

Although LLMs and Automatic Speech Recognition (ASR) models have made basic transcription accessible, summarization and question-answering for speech-based data still pose numerous systems engineering challenges for developers. A typical workflow for summarizing meetings involves:

  1. Gathering meeting recordings.
  2. Transcribing these recordings.
  3. Summarizing the transcriptions.
  4. Generating final notes from the summarized transcriptions.
  5. Making the summaries and transcripts searchable.
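Before looking at how Indexify restructures this, the sequential workflow above can be sketched in a few lines of Python. Every function here is a stand-in (the names transcribe, summarize, and index are hypothetical, not an Indexify API); in practice each step would call an ASR model, an LLM, and a vector store:

```python
# Stand-in steps: each would be a model or storage call in production.
def transcribe(recording: str) -> str:
    return f"transcript of {recording}"      # stand-in for an ASR model

def summarize(transcript: str) -> str:
    return f"summary: {transcript}"          # stand-in for an LLM call

def index(text: str, store: list) -> None:
    store.append(text)                       # stand-in for a vector index

def process_meetings(recordings):
    store, notes = [], []
    for rec in recordings:       # 1. gather recordings
        t = transcribe(rec)      # 2. transcribe
        s = summarize(t)         # 3. summarize
        notes.append(s)          # 4. generate notes
        index(t, store)          # 5. make transcript searchable
        index(s, store)          #    ... and the summary too
    return notes, store

notes, store = process_meetings(["standup.mp3", "planning.mp3"])
print(len(notes), len(store))  # 2 4
```

Each meeting flows through every step before the next one starts, which is exactly the coupling the rest of this post takes apart.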

This sounds straightforward; however, there are a few considerations to address before we get there.

Naive Architecture for Building a Meeting Summarizer

Typically, LLM frameworks suggest writing applications where the data flow is sequential.

  1. Ingest
  2. Process and Extract
  3. Retrieve

While it’s easy to prototype this in a notebook, it doesn’t map to production workloads for the following reasons:

  1. Ingestion is throttled by resources consumed by extraction or limited by the resources available on a single machine.
  2. Extraction on a single machine is limited when running multiple models in parallel (speech transcriptions, diarization, embedding, summarization).
  3. Retrieval can be slow during extraction as it is compute-heavy.

Production-Ready Meeting Summarizer

To create a production-ready version of this application, we require the following —

  1. Concurrent Ingestion: Hundreds of meetings may end at the same time, and we need to ingest them all without added latency.
  2. Parallel Extraction: Ensure extraction finishes within a given Service Level Agreement, regardless of how many meetings finish simultaneously.
  3. Concurrent Retrieval: This requires a dedicated API reading various data stores.
  4. Compute-Bound Workload Offloading: Extractions must be offloaded to a real-time batch processing engine so that I/O-bound tasks such as ingestion and retrieval are not throttled.

Indexify makes building production-ready applications easier by —

  1. Running extraction workflows asynchronously using a built-in real-time batch processing engine.
  2. Distributing the extraction tasks in a cluster.
  3. Enabling infinite ingestion scalability by leveraging blob stores and having dedicated processes to handle I/O.
  4. Ensuring retrieval happens in dedicated processes, reads directly from storage, and is never blocked by extraction workloads.
  5. Offering plug and play functionality with various models for handling any data type or extraction in a pipeline.

Ingestion Pipeline — Declarative Extraction Graphs

With Indexify, you would create a pipeline capable of handling these steps. We refer to these pipelines as Extraction Graphs. An Extraction Graph consists of one or more Extractors that transform unstructured data using models or other algorithms and then pass the processed data to another extractor or directly to a storage system.

name: 'meeting_notes_processor'
extraction_policies:
  - extractor: 'tensorlake/asrdiarization'
    name: 'diarizer'
    input_params:
      batch_size: 24
  - extractor: 'tensorlake/chunk-extractor'
    name: 'transcription-chunks'
    input_params:
      chunk_size: 1000
      overlap: 100
    content_source: 'diarizer'
  - extractor: 'tensorlake/summarization'
    name: 'summarizer'
    content_source: 'diarizer'
  - extractor: 'tensorlake/arctic'
    name: 'transcription_index'
    content_source: 'transcription-chunks'
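For illustration, the chunk_size and overlap parameters of the transcription-chunks stage behave roughly like the toy windowing function below, assuming character-based chunks; the real chunk-extractor may split on tokens or sentence boundaries instead:

```python
def chunk(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split text into fixed-size windows where each window repeats
    the last `overlap` characters of the previous one."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk("x" * 2500, chunk_size=1000, overlap=100)
print([len(c) for c in chunks])  # [1000, 1000, 700]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.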

Once you have a declarative definition of the pipeline, you set it up once; Indexify then waits for new meeting recordings and runs extraction on them as they arrive.

from indexify import IndexifyClient, ExtractionGraph

client = IndexifyClient()

def create_extraction_graph():
    with open("graph.yaml", "r") as file:
        extraction_graph_spec = file.read()
    extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
    client.create_extraction_graph(extraction_graph)

create_extraction_graph()

Ingestion

From there, you can upload files to the Extraction Graph, and the graph will transcribe the audio, chunk the transcripts, and embed the chunks so that they can be searched.

content_id = client.upload_file("meeting_notes_processor", "interview.mp3")

Ingestion of new meeting recordings is never blocked by the throughput or latency of extraction. Indexify scales horizontally to keep up with any ingestion volume. The scheduler reliably orchestrates the extraction graph on available hardware and writes the extracted information to storage.

Auto Scaling

It is possible to auto-scale the ingestion servers and the extractor clusters by looking at metrics related to ingestion and the pending extraction tasks in the scheduler. That allows you to complete extraction and summarization of meetings within a fixed time period.
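As a sketch, such an autoscaling policy could be a simple control loop over the pending-task count. The per-replica throughput and SLA numbers below are made-up parameters; a real deployment would read the backlog from the scheduler's metrics and drive a cloud autoscaler:

```python
import math

def desired_extractor_replicas(pending_tasks: int,
                               tasks_per_replica_per_min: int = 20,
                               sla_minutes: int = 10,
                               min_replicas: int = 1,
                               max_replicas: int = 50) -> int:
    """Size the extractor pool so the current backlog drains within the SLA."""
    capacity_per_replica = tasks_per_replica_per_min * sla_minutes
    needed = math.ceil(pending_tasks / capacity_per_replica)
    return max(min_replicas, min(needed, max_replicas))

print(desired_extractor_replicas(0))       # 1  (idle: keep a warm minimum)
print(desired_extractor_replicas(1200))    # 6
print(desired_extractor_replicas(100000))  # 50 (capped at max_replicas)
```

Sizing against the backlog rather than CPU usage is what ties scaling directly to the extraction SLA.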

Retrieval

This is the part closest to your users and the presentation layer. Using the retrieval APIs, your application can retrieve the extracted data from Indexify.

Retrieval has two modes —

  1. Search — You can search embedding-based indexes when a pipeline emits embeddings in one of its steps. In the extraction graph we created above, embeddings are produced at the last stage, transcription_index. You can run a k-NN search against this index to get the most relevant chunks of the transcription.
  2. Get Extracted Content — You can get the raw transcripts of the meetings from the diarizer stage and the summaries from the summarizer stage.
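To make the search mode concrete without quoting version-specific client calls, here is a toy k-NN search over chunk embeddings using cosine similarity; the 2-d vectors are stand-ins for the embeddings the transcription_index stage would produce:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_search(query_vec, index, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

# Toy 2-d "embeddings" paired with transcript chunks.
index = [
    ([1.0, 0.0], "chunk about budget"),
    ([0.9, 0.1], "chunk about hiring budget"),
    ([0.0, 1.0], "chunk about release dates"),
]
print(knn_search([1.0, 0.05], index, k=2))
# ['chunk about budget', 'chunk about hiring budget']
```

A production index would use an approximate-nearest-neighbor structure rather than a full sort, but the ranking semantics are the same.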

Takeaways

In this post, we showed how to create reliable, production-ready pipelines for a data-intensive application by:

  1. Defining a declarative extraction graph that processes speech with ASR, LLM, and embedding models.
  2. Distributing data processing across a compute cluster that can auto-scale.
  3. Leveraging Indexify as a retriever for applications built with any LLM or application framework, such as Spring Boot or React.

Code: https://github.com/tensorlakeai/indexify/blob/main/docs/docs/examples/asrdiarization_rag.ipynb

Website: https://getindexify.ai

Discord Community: https://discord.gg/4eS5yqr9kR


Diptanu Choudhury
Tensorlake AI

Founder of Tensorlake AI. Previously Engineering at - Netflix, Hashicorp and Facebook AI.