A short introduction to tagtog, text annotation made easy

🍃tagtog
Jun 12 · 3 min read

By Jorge Campos

The challenges of Machine Learning (ML) start with collecting training data. First, labeled datasets are scarce. Second, the increasing complexity and changing nature of linguistic nuances in fields such as the humanities, healthcare, or finance require constant input and verification from subject-matter experts (SMEs). In the context of natural language processing (NLP), this knowledge comes in the form of text annotations.

tagtog is a collaborative text annotation platform to find, create, and maintain NLP datasets efficiently. It is available in the Cloud and On-Premises.

Collaborations between data analytics/AI professionals and SMEs often fail. This is partly due to the lack of accessible tools that would let SMEs participate in Named Entity Recognition (NER) or text classification tasks. To bridge this gap, tagtog was designed as a collaborative annotation platform with an easy-to-use interface.

Three entities annotated; two of the annotations overlap

Creating training data on tagtog is as simple as highlighting text. In addition, you can create relations between entities, attach attributes to them, or classify the whole document. Annotations can be made either manually or automatically.

Automatic annotations reduce the effort required to produce labeled datasets. There are two methods available:

- Dictionaries: import or create collections of terms and extend them during the annotation tasks (see the sketch after this list).

- ML: tagtog learns continuously from your annotations to generate precise predictions out of the box. If preferred, an external ML model can be plugged into the platform. SMEs review the ML predictions, creating a continuous learning loop that trains the model and keeps it up to date.
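To make the dictionary idea concrete, here is a minimal sketch of dictionary-based pre-annotation: terms from a user-provided list are matched against the text and turned into entity spans. The sample dictionary, names, and matching logic are purely illustrative and not tagtog's internal implementation.

```python
import re

# Illustrative dictionary of terms mapped to an entity type.
# In tagtog, such a collection would be imported or extended while annotating.
gene_dictionary = {"BRCA1": "gene", "TP53": "gene", "EGFR": "gene"}

def pre_annotate(text, dictionary):
    """Return (start, end, term, label) spans for every dictionary hit."""
    annotations = []
    for term, label in dictionary.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            annotations.append((match.start(), match.end(), term, label))
    return sorted(annotations)

text = "Mutations in BRCA1 and TP53 are frequent in this cohort."
for start, end, term, label in pre_annotate(text, gene_dictionary):
    print(f"{label}: '{term}' at [{start}, {end})")
```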

To quickly bootstrap annotation projects, tagtog supports several file formats natively. This enriches the annotation experience, eliminates unnecessary parsing steps, and lets users annotate directly on PDFs, or import PubMed articles, HTML, CSV, source code, or even Markdown files. For tighter integration, an API is available to import files and annotations, export annotations and metrics, and search.
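As a rough illustration of the API, the sketch below uploads a PDF to a project with Python's requests library. The endpoint path, parameter names, and credentials shown here are assumptions for the sake of the example; check the documentation (http://docs.tagtog.net) for the exact API details.

```python
import requests

# Illustrative upload of a document through the tagtog API.
# Endpoint and parameter names are assumptions; verify them in the docs.
API_URL = "https://www.tagtog.net/-api/documents/v1"
AUTH = requests.auth.HTTPBasicAuth("your_username", "your_password")

params = {
    "owner": "your_username",   # project owner
    "project": "your_project",  # project to import the file into
    "output": "ann.json",       # ask for the annotations back as ann.json
}

with open("paper.pdf", "rb") as pdf:
    response = requests.post(
        API_URL,
        params=params,
        auth=AUTH,
        files={"files": pdf},   # multipart upload of the document
    )

print(response.status_code, response.text)
```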

Malicious code flagged on tagtog

To track annotation projects and data quality, tagtog measures the progress of the project members along with their agreement with other annotators (Inter-Annotator Agreement). You can spot biases, unbalanced classes, or oversampled data simply by checking the distribution of your annotations.

Inter-annotator agreement matrix, showing the scores between pairs of users. For example, Vega and Joao agree in 87% of the cases.
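To give an intuition for such a matrix, here is a toy sketch that computes simple pairwise percentage agreement over a handful of shared document labels. The annotator names and labels are made up, and tagtog computes its own IAA scores; this only illustrates the idea of scoring every pair of users.

```python
from itertools import combinations

# Toy data: each annotator's label for the same five documents.
# Names and labels are made up; tagtog's own IAA metric may differ.
annotations = {
    "Vega":  ["pos", "neg", "pos", "pos", "neg"],
    "Joao":  ["pos", "neg", "pos", "neg", "neg"],
    "Jorge": ["pos", "pos", "pos", "neg", "neg"],
}

def percentage_agreement(a, b):
    """Fraction of documents where two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# One agreement score per pair of users, as in the matrix above.
for user_a, user_b in combinations(annotations, 2):
    score = percentage_agreement(annotations[user_a], annotations[user_b])
    print(f"{user_a} vs {user_b}: {score:.0%}")
```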

I hope this helped. Please let me know if you have any questions or feedback. You can find more tutorials about this text annotation tool here or by following our blog.

Documentation: http://docs.tagtog.net

At 🍃tagtog.net we aim to democratize text analytics.

👏 👏 👏 if you liked the post and want to share it with others!

