Hivemind Release — Data Quality for Text Annotations and more!

Riaz Karim
Published in Hivemind
Dec 12, 2019

We’re delighted to announce our final release of 2019. It adds key new features that extend Hivemind’s text annotation capabilities and integrate Hivemind tasks more seamlessly with your internal workflows.

Data Quality for Text Annotation

When annotating text, you can increase the quality and usability of the collected data by having multiple Contributors independently work on the same source material and combining their results. In this release, we’ve introduced three new methods to aggregate these results:

Union: Combine all the annotations from all Contributors.

Intersection: Return only the annotations that are present in every Contributor’s results.

Frequency: Return the number of times each annotation occurs across all Contributors’ results, optionally keeping only annotations that meet a minimum number of occurrences.
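To make the three methods concrete, here is a minimal Python sketch of how they could behave. It is an illustration only, not Hivemind’s implementation: the function names, the `min_occurrence` parameter, and the representation of an annotation as a `(start, end, label)` tuple are all assumptions, as is the choice to count each Contributor at most once per annotation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set

# Assumption: each Contributor's result is a collection of hashable annotations,
# e.g. (start_offset, end_offset, label) tuples.
Annotation = Hashable


def union(results: List[Iterable[Annotation]]) -> Set[Annotation]:
    """Combine all the annotations from all Contributors."""
    combined: Set[Annotation] = set()
    for result in results:
        combined |= set(result)
    return combined


def intersection(results: List[Iterable[Annotation]]) -> Set[Annotation]:
    """Keep only the annotations present in every Contributor's result."""
    if not results:
        return set()
    return set.intersection(*(set(result) for result in results))


def frequency(
    results: List[Iterable[Annotation]], min_occurrence: int = 1
) -> Dict[Annotation, int]:
    """Count how many Contributors made each annotation, with an optional minimum."""
    counts: Counter = Counter()
    for result in results:
        counts.update(set(result))  # a Contributor counts at most once per annotation
    return {ann: n for ann, n in counts.items() if n >= min_occurrence}
```

With three Contributors who all mark the span `(0, 9, "ORG")` and each add at most one other span of their own, `intersection` returns just the shared span, `union` returns every span, and `frequency(results, min_occurrence=2)` returns the shared span with a count of 3.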

Advanced agreement options for text annotation

You can also apply additional parameters to these data quality methods to redefine what it means for annotations made by different Contributors to agree with each other. For example, you may want to ignore case when comparing annotations, or treat variations of the same entity (e.g. Microsoft vs. Microsoft Corp.) as a match by applying a string-similarity algorithm such as Jaro-Winkler.
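As a rough illustration of this kind of relaxed comparison, the sketch below implements the standard Jaro-Winkler similarity in plain Python and wraps it in a hypothetical `annotations_agree` helper. The helper name, its `ignore_case` and `min_similarity` parameters, and the 0.9 threshold used in the example are assumptions for illustration, not Hivemind’s actual options or defaults.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def annotations_agree(a: str, b: str, ignore_case: bool = True,
                      min_similarity: float = 1.0) -> bool:
    """Hypothetical agreement check: exact match unless min_similarity < 1."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    if min_similarity >= 1.0:
        return a == b
    return jaro_winkler(a, b) >= min_similarity
```

Under these assumptions, `annotations_agree("Microsoft", "Microsoft Corp.", min_similarity=0.9)` is `True` because the two strings score 0.92, while the same call with the default `min_similarity` requires an exact, case-insensitive match and returns `False`.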

Further details are available in the Hivemind Docs.

Instance Annotations

We’re always looking for new ways to help you integrate Hivemind into your own offline workflows. This release takes a further step in that direction by introducing annotations for instances.

The new instance annotation feature lets you attach custom key:value pairs to an instance when you create it, and update the values at any time afterwards. One way to use this feature to reduce complexity and storage costs is to mark instances that have been processed offline with an annotation saying so, directly on Hivemind. In practice, this could eliminate the need to maintain a local database of instances.
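The workflow described above can be modelled locally as follows. This sketch does not call the Hivemind API and does not reflect its schema: the `Instance` class, the `pending` helper, and the `processed_offline` key are hypothetical names chosen only to show how a key:value annotation can stand in for a separate tracking database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """Local stand-in for a Hivemind instance (hypothetical, not the API schema)."""
    instance_id: int
    annotations: Dict[str, str] = field(default_factory=dict)


def pending(instances: List[Instance], key: str = "processed_offline") -> List[Instance]:
    """Return the instances whose annotation does not yet mark them as processed."""
    return [inst for inst in instances if inst.annotations.get(key) != "true"]


# Attach the annotation on creation...
instances = [
    Instance(1, {"processed_offline": "false"}),
    Instance(2, {"processed_offline": "false"}),
]

# ...and update its value once the offline step has run.
instances[0].annotations["processed_offline"] = "true"

remaining_ids = [inst.instance_id for inst in pending(instances)]
```

Because the processing state lives on the instance itself, the list of work still to do can be derived from the annotations alone; here `remaining_ids` contains only instance 2.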

This feature is currently only available via the API, and we’re excited to build tooling around it once we hear more from you about how it can be even more useful to your workflows.

API endpoints for the instance annotations feature

Other Improvements

In this release, we’ve improved the way Hivemind handles special characters and formatting in text annotation inputs. Contributors can now easily select the intended characters in bodies of text without having to use their keyboard. This is one example of how we’re constantly working to streamline Contributors’ work and improve the collection process.

Additionally, for tasks that use on-platform agreement checking, the CSV and Excel file downloads now include each instance’s agreement score (e.g. 2/3 Contributors agreeing on an answer).
