The adjudication process in collaborative annotation

The next step to create high-quality training data, efficiently

Photo by Wesley Tingey on Unsplash

Labeling internally

A common shortfall of supervised systems is the availability of labeled data. Where can I find suitable datasets? Is there enough data? How biased is it? Is the label selection aligned with my needs?

A team of subject-matter experts

The most common way to lay the groundwork for data quality is to build a team of subject-matter experts (SMEs). They should be trained iteratively to adhere to the set of annotation guidelines defined for your project.

Fig. 1. In this example, each document is annotated by a single annotator
Fig. 2. In this example, each document is annotated by two annotators

Inter-Annotator Agreement

Estimating annotation quality is often difficult: it requires a standard against which to validate your data, and this standard might itself be ambiguous or complex. For example, labeling parts of speech is a well-defined task, whereas classifying question pairs is ambiguous.
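For classification-style tasks, a common way to quantify agreement between two annotators is Cohen's kappa. The snippet below is a minimal sketch using scikit-learn; the labels are invented for illustration.

```python
# Pairwise inter-annotator agreement (IAA) with Cohen's kappa.
# The labels below are invented for illustration; scikit-learn is assumed to be installed.
from sklearn.metrics import cohen_kappa_score

# Document-level labels assigned independently by two SMEs.
labels_sme_a = ["positive", "negative", "neutral", "positive", "negative"]
labels_sme_b = ["positive", "negative", "positive", "positive", "negative"]

kappa = cohen_kappa_score(labels_sme_a, labels_sme_b)
print(f"Cohen's kappa between SME A and SME B: {kappa:.2f}")
```

Kappa corrects for chance agreement, so it is usually more informative than raw percent agreement when the label distribution is skewed.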

Adjudication

If multiple SMEs work on the same data, the result is multiple annotation versions. We define adjudication as the process of resolving inconsistencies among these versions before one version is promoted to the gold standard. This process can be manual, semi-automatic, or fully automatic.

Manual

The manual case involves creating a single version that integrates the annotations of all SMEs, with divergences represented explicitly. Reviewers then resolve these differences. Such a process is recommended especially at the beginning, with the whole team resolving conflicts together to ensure the guidelines are well understood.

Fig. 3. The merged version contains the versions of both annotators, SME A and SME B. The reviewer or adjudicator resolves the conflicts (marked in red) to produce the final version.
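To make the merge concrete, here is a minimal sketch of how a merged version could be represented: labels the SMEs agree on are accepted directly, and disagreements are collected for the reviewer. The data structures and labels are assumptions for illustration, not the format of any particular tool.

```python
# Sketch of a merged annotation version: agreements are accepted automatically,
# disagreements are flagged for the reviewer/adjudicator.
# The dictionaries below are invented for illustration.
annotations_sme_a = {"doc1": "spam", "doc2": "ham", "doc3": "spam"}
annotations_sme_b = {"doc1": "spam", "doc2": "spam", "doc3": "spam"}

merged = {}
conflicts = []
for doc_id in annotations_sme_a:
    label_a = annotations_sme_a[doc_id]
    label_b = annotations_sme_b[doc_id]
    if label_a == label_b:
        merged[doc_id] = label_a  # both SMEs agree: promote directly
    else:
        merged[doc_id] = {"SME A": label_a, "SME B": label_b}  # keep both versions
        conflicts.append(doc_id)  # the reviewer resolves these manually

print("Conflicts to adjudicate:", conflicts)  # e.g. ['doc2']
```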

Automatic

If our team is large enough and the annotation tasks are not too ambiguous, we can partly or fully automate adjudication using IAA.

Fig. 4. IAA calculations for an annotation project. Vega has the best IAA average (76.29%).
Fig. 5. Automatic adjudication based on the IAA for each annotation task. In this example, SME A has the highest IAA for task A and SME B for task B. The result is the annotations for task A by SME A plus the annotations for task B by SME B.
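A minimal sketch of this strategy, assuming we already have per-task IAA averages: for each task, keep the annotations of the SME with the highest IAA. All scores and annotations below are invented for illustration.

```python
# Automatic adjudication sketch: for each annotation task, select the version
# produced by the SME with the highest average IAA for that task.
# All scores and annotations below are invented for illustration.
iaa_by_task = {
    "task_A": {"SME A": 0.82, "SME B": 0.74},
    "task_B": {"SME A": 0.69, "SME B": 0.79},
}
annotations = {
    "task_A": {"SME A": ["entity: drug"], "SME B": ["entity: drug", "entity: dose"]},
    "task_B": {"SME A": ["relation: treats"], "SME B": ["relation: causes"]},
}

gold_standard = {}
for task, scores in iaa_by_task.items():
    best_sme = max(scores, key=scores.get)  # SME with the highest IAA for this task
    gold_standard[task] = annotations[task][best_sme]

print(gold_standard)
# {'task_A': ['entity: drug'], 'task_B': ['relation: causes']}
```

A semi-automatic variant would only auto-accept tasks where the IAA exceeds some threshold and route the rest to a reviewer.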

Conclusion

In-house labeling projects make sense: you can control quality better, and your data is easier to maintain. Depending on the complexity of the domain, organizing a team of SMEs within your organization can be cost-effective.

Written by

Co-founder @tagtog_net . Building web interfaces to train #AI #NLP machines. Agile enthusiast. Living in Poland, Tricity. Father.
