Introducing Gecko, an Open-Source Solution for Effective Annotation of Conversations

By Golan Levy
Published in Gong Tech Blog
Jul 7, 2021

Automatic speech recognition (ASR) is at the core of what we do at Gong. Our Revenue Intelligence platform empowers sales leaders to close more deals and manage their pipeline better: we capture customer interactions, analyze what was said, and deliver data-driven insights. Building upon the tremendous advances ASR has made in the past decade, we can now process human conversations with ML and NLP algorithms more easily and effectively than ever before. However, most of these algorithms still rely on huge amounts of annotated data. That’s why we are proud to present Gecko, a new open-source tool we developed at Gong for annotating human conversations.

Meet Gecko

Gecko (github.com/gong-io/gecko) is an open-source tool for the annotation of the linguistic content of conversations. It can be used for segmentation, diarization, and transcription. With Gecko, you can create and perfect audio-based datasets, compare the results of multiple models simultaneously, and highlight differences between transcriptions. Gecko is a standalone web-based JavaScript application that runs on both desktop and mobile devices without requiring a server, making it easy to use and update.

Gecko’s User Interface

The Gecko interface, which was designed to be clean yet interactive, integrates media player and editing capabilities. The main view features a waveform display of the audio file, plus a video player if a video file was uploaded. The segmentation and speaker identification are overlaid on the waveform and color-coded. If a transcript was uploaded, it is synced with the audio so that the word currently heard in the playback is highlighted. You can zoom in and out of the waveform display and use the auto-center button to automatically center the waveform on the section currently playing. The Segment Labeling box shows the list of labels and lets you add more.

How to Use Gecko

To use Gecko, you’ll need an audio or video file and one or more files with annotations of segments. Gecko supports various file formats (such as the ones usually generated by speech-recognition frameworks): RTTM, CTM, JSON, SRT and TSV. You can upload several annotation files simultaneously in order to compare multiple models.
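To make the idea of a segment annotation file concrete, here is a minimal Python sketch that reads RTTM, one of the formats listed above, into a list of speaker segments. It assumes the conventional RTTM field order (type, file ID, channel, onset, duration, orthography, speaker type, speaker name, and so on). The Segment class and the parse_rttm function are hypothetical names used for illustration only; they are not part of Gecko, and this is not how Gecko itself loads files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A single speaker turn (hypothetical helper type, not Gecko's own)."""
    start: float    # seconds from the beginning of the recording
    end: float      # seconds from the beginning of the recording
    speaker: str


def parse_rttm(text: str) -> List[Segment]:
    """Parse SPEAKER lines of an RTTM file into segments sorted by start time.

    Assumes the conventional RTTM layout, where field 4 is the turn onset,
    field 5 is the turn duration, and field 8 is the speaker name.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset = float(fields[3])
        duration = float(fields[4])
        segments.append(Segment(onset, onset + duration, fields[7]))
    return sorted(segments, key=lambda seg: seg.start)


if __name__ == "__main__":
    sample = (
        "SPEAKER call1 1 3.00 1.00 <NA> <NA> customer <NA> <NA>\n"
        "SPEAKER call1 1 0.50 2.25 <NA> <NA> agent <NA> <NA>\n"
    )
    for seg in parse_rttm(sample):
        print(f"{seg.start:.2f}-{seg.end:.2f} {seg.speaker}")
```

Each line of such a file becomes one color-coded segment on the waveform, which is why uploading several files side by side is a natural way to compare models.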

What You Can Do with Gecko

Label segments
You can use Gecko for a variety of segmentation tasks, including Voice Activity Detection (VAD), diarization, and speaker identification. With Gecko, you can label the speaker in each segment (automatically color-coded) or label a segment as a sound event such as music, cross-talk, or whatever else you’d like. You can also set start and end times for segments and add or delete segments.

Compare annotations
To compare models easily (for example, the ground truth against the output of a diarization system, or the results of several diarization algorithms against each other), you can upload multiple annotations to Gecko. You can then edit the input annotations, including speaker segments and words in the transcripts. And you can always use the “undo” feature to go back.

Refine automatic transcripts
Instead of manually transcribing audio from scratch, with Gecko you can save time by starting from the output of an ASR system, refining it, and labeling the dataset. Gecko highlights the word heard in the playback and allows you to edit it to improve the quality of the transcription.

Compare transcripts
Gecko makes it easy to compare two different transcripts by presenting the differences between them in a table and identifying insertions, deletions, and substitutions, as well as discrepancies, which you can search or hide if irrelevant. With both transcripts in front of you, you can listen to the audio and identify the correct transcript or enter the correct text if neither is right. Gecko can also generate a report showcasing the comparison.
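For readers who want a feel for what such a comparison involves, here is a rough Python sketch that aligns two transcripts word by word and labels each difference as an insertion, deletion, or substitution. It uses the standard library's difflib.SequenceMatcher, which is a stand-in chosen for brevity: it is not a minimum-edit-distance alignment, and it is not necessarily the method Gecko uses. The function name diff_transcripts is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Map difflib's opcode tags onto the usual transcript-comparison terms.
LABELS = {
    "replace": "substitution",
    "delete": "deletion",
    "insert": "insertion",
}


def diff_transcripts(reference: str, hypothesis: str) -> List[Tuple[str, str, str]]:
    """Return (kind, reference words, hypothesis words) for each difference.

    Both transcripts are split on whitespace; matching stretches are skipped.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        rows.append(
            (LABELS[tag], " ".join(ref_words[i1:i2]), " ".join(hyp_words[j1:j2]))
        )
    return rows


if __name__ == "__main__":
    for kind, ref, hyp in diff_transcripts(
        "we can close the deal today", "we can lose the deal"
    ):
        print(f"{kind:13} | {ref:10} | {hyp}")
```

A table of rows like these is the kind of view that lets an annotator jump straight to the places where two transcripts disagree, then listen to the audio to decide which one is right.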

Read video and subtitle files
Gecko can read video files, as well as read and generate SRT files, the standard format for subtitles, which is supported by most video players.
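As an illustration of how simple the SRT format is, here is a short Python sketch that turns a list of timed text cues into SRT text. It follows the common SRT layout: a running index, a line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line between cues. The function names and the (start, end, text) tuple shape are assumptions made for this example and do not reflect Gecko's own code.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) cues as SRT text.

    Cues are numbered from 1 in the order given and separated by blank lines.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.5, 2.75, "Hi, thanks for joining."), (3.0, 4.0, "Happy to be here.")]))
```

Because each cue is just a time range plus text, an annotated and corrected transcript maps onto subtitles almost directly.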

See Gecko in Action

Check out this video we created to present Gecko’s core capabilities:

Wrapping Up

We want to share Gecko and help serve an ever-growing community of enthusiastic users. After all, we know that many others need annotated conversations, and one of our company’s leading operating principles is to create raving fans anywhere and everywhere. That is why Gecko is already available at https://gong-io.github.io/gecko, and we are continually updating it and enhancing its capabilities. For example, since its launch in September 2019 at INTERSPEECH 2019 in Graz, Austria, we’ve added support for annotation of videos and subtitle creation. Our goal is to continue making Gecko richer to better serve professionals across industry and academia. Try it out and let us know how you’re using it. We’d love to hear from you!

If you use Gecko for your published work, please cite:

@inproceedings{Gecko2019,
Author = {Golan Levy, Raquel Sitman, Ido Amir, Eduard Golshtein, Ran Mochary, Eilon Reshef, Roi Reichart, Omri Allouche},
Title = {GECKO - A Tool for Effective Annotation of Human Conversations},
Booktitle = {20th Annual Conference of the International Speech Communication Association, Interspeech 2019},
Year = {2019},
Month = {September},
Address = {Herzliya, Israel},
Url = {https://github.com/gong-io/gecko/blob/master/docs/gecko_interspeech_2019_paper.pdf}
}
