Annotating Images & Markdown

Oct 13 · 3 min read

Label images and text in their original context

By Juan Miguel Cejuela

A recent tagtog update added support for annotating Markdown files, so images, nested lists, and code blocks are now fully supported. This opens up many new possibilities for annotation. Let’s focus on three.

Image 1: Annotating News with Images & Markdown on

1. Annotate in Context 👁 (+ it just looks better 💅)

Annotating text in its original context is not the same as annotating whatever unformatted text your parser strips out. For one thing, text in images and even the visual style convey meaning. Are we capturing these nuances in NLP systems right now?

What would you prefer to annotate? This?

NLP has different sources of bias: 1. The selection of the training data. 2. The biases of the annotators. 3. The inductive bias of the model. 4. How the task is designed overall. @eurnlp #eurnlp

Or this?

Image 2: Tweet in its original context; better suited for annotation

How about this? Note that some of the text’s meaning is lost without its accompanying image (Image 3).

Image 3: The full meaning of the tweet’s text is lost without the image

To be clear, tagtog doesn't yet show tweets in their original formatting. We will soon have a dedicated presentation mode for tweets; you can follow our updates on Twitter at @tagtog_net 🐦. However, nicely formatted news articles, like the one in this post's header (Image 1), are already possible.

2. Label Images 🖼

Many image annotation tasks only involve assigning concrete labels to images. Often the labels are binary values, enumerations, or free strings. This is possible on tagtog with document labels, which in this case apply to the images.
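To make the three kinds of labels concrete, here is a minimal sketch in Python. It is not tagtog's data model or API; the class names, the label names (`contains_person`, `weather`, `notes`), and the file `street.jpg` are all hypothetical and only illustrate what a binary, an enumeration, and a free-string label on an image document could look like.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class Weather(Enum):
    """Hypothetical enumeration label: one value out of a fixed set."""
    SUNNY = "sunny"
    CLOUDY = "cloudy"
    RAINY = "rainy"


# A label value is binary (bool), an enumeration member, or a free string.
LabelValue = Union[bool, Weather, str]


@dataclass
class ImageDocument:
    """A Markdown document that embeds an image, plus its document-level labels."""
    markdown: str
    labels: Dict[str, LabelValue]


def validate(doc: ImageDocument, schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the labels fit the schema."""
    problems = []
    for name, expected in schema.items():
        if name not in doc.labels:
            problems.append(f"missing label: {name}")
        elif not isinstance(doc.labels[name], expected):
            problems.append(f"wrong type for {name}")
    return problems


# Hypothetical schema: one label of each kind.
SCHEMA = {"contains_person": bool, "weather": Weather, "notes": str}

doc = ImageDocument(
    markdown="![street scene](street.jpg)",
    labels={"contains_person": True, "weather": Weather.RAINY, "notes": "blurry"},
)

print(validate(doc, SCHEMA))  # prints []
```

The point of the sketch is only that the labels attach to the whole document, not to a span of text inside it.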

Moreover, several other NLP tasks are concerned with matching text to images. Take for instance the NLVR dataset on tagtog (Image 4), taken from the original NLVR dataset. In this dataset, the task is to determine whether a caption (e.g. "There is a box with a blue circle, a black circle and a black square") correctly describes the presented image: true or false. These types of annotations are now supported on tagtog.

Image 4: Labeling images and the text associated with an image.
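The caption task above can be sketched as a tiny Python record: an image, a caption, and a true/false judgment. This is only an illustration under stated assumptions. The file name `box.png`, the class `CaptionExample`, and both helper functions are hypothetical, and nothing here reflects how tagtog or the NLVR dataset actually store their data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CaptionExample:
    image_path: str  # hypothetical path to the image
    caption: str     # the statement to verify against the image
    is_true: bool    # annotator's judgment: does the caption describe the image?


def to_markdown(example: CaptionExample) -> str:
    """Render image and caption together so the annotator sees both in context."""
    return f"![]({example.image_path})\n\n{example.caption}\n"


def agreement(a: List[bool], b: List[bool]) -> float:
    """Fraction of examples on which two annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


example = CaptionExample(
    image_path="box.png",
    caption="There is a box with a blue circle, a black circle and a black square",
    is_true=True,
)
print(to_markdown(example))

# Two hypothetical annotators disagree on one of four examples.
print(agreement([True, False, True, True], [True, True, True, True]))  # prints 0.75
```

Rendering the pair as Markdown keeps the image and its caption in one document, which is what lets a single true/false document label cover both.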

3. Annotate Markdown Documentation 🅜📝

With more and more documentation being written in Markdown (from READMEs, to comments, to even entire theses and books), it makes sense to label this vast amount of data and put it to use.

CommonMark Logo
CommonMark, the de facto specification for Markdown.

How about annotating security flaws in code shared on Stack Overflow? (Image 5).

Image 5: Finding security threats in code posted on Stack Overflow.
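As a rough idea of what preparing such a post could involve, the sketch below locates fenced code blocks in a Markdown string and records their character offsets. It is a simplification, not a CommonMark-compliant parser and not part of tagtog: it only handles plain three-backtick fences, and the names `CodeBlock` and `find_code_blocks` as well as the sample post are made up for illustration.

```python
import re
from typing import List, NamedTuple

FENCE_MARK = "`" * 3  # three backticks, built this way to keep the example readable


class CodeBlock(NamedTuple):
    language: str  # info string after the opening fence, may be empty
    code: str      # the block's content, without the fence lines
    start: int     # character offset where the code begins in the document
    end: int       # character offset just past the end of the code


# Opening fence with optional info string, lazy body, closing fence on its own line.
FENCE = re.compile(
    rf"^{FENCE_MARK}([^\n`]*)\n(.*?)^{FENCE_MARK}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_code_blocks(markdown: str) -> List[CodeBlock]:
    """Return every simple fenced code block found in the Markdown text."""
    return [
        CodeBlock(m.group(1).strip(), m.group(2), m.start(2), m.end(2))
        for m in FENCE.finditer(markdown)
    ]


# Hypothetical answer with one snippet that builds SQL by string concatenation.
sample = "\n".join([
    "You can build the query like this:",
    "",
    FENCE_MARK + "python",
    "query = 'SELECT * FROM users WHERE id = ' + user_id",
    FENCE_MARK,
    "",
    "Hope that helps.",
])

for block in find_code_blocks(sample):
    print(block.language, block.start, block.end)
    print(sample[block.start:block.end], end="")
```

Keeping offsets into the untouched document, rather than copying the code out, means any label on a risky line still points back to where it sits in the original post.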

We will review the many possibilities for annotating code (and verbatim-like-styled text) in a future post.

In the meantime, hope you liked this one!

What would you annotate with markdown? 🤔

Need training data for #NLP? Find & create it for free on: 🍃tagtog

Are you on Twitter? 🐦Follow @tagtog_net🐦

👏👏👏 Clap if you like the post, and want to share it with others! 🧡💚

