From Corpus to Multi-Label Classification

#3 Stakeholder Alignment

Bryan
3 min read · Nov 26, 2023

In this article, we introduce the Entity-Aspect framework and align with stakeholders on which labels to use when annotating part of the corpus. This step is crucial for training an effective classification model. Our focus is on how to prepare this framework and use it in a working session aimed at label selection.

Photo by Ashkan Forouzani on Unsplash

Prep for working session

For a productive working session, you’ll need to prepare:

  1. A sample of 10 to 20 unannotated records from your corpus.
  2. A proposed set of labels, complete with definitions.
  3. Annotated examples demonstrating the labels in use: at least 3 to 5 in total, though you may want to collect 3 per label.
  4. The labels organized into an Entity-Aspect framework, if applicable to your scenario.

Entity-Aspect framework

The Entity-Aspect framework is particularly useful if your labels pertain to multi-faceted subjects such as products or services. It’s a conceptual model that organizes content around entities (like objects, people, places) and their aspects (attributes or features). The framework adapts to various scenarios, from focusing solely on the aspects of a single entity to comparing attributes across multiple entities, which can reveal trends and patterns. For instance, an entity we care about that is mentioned frequently in the data could be “Pressure Cooker”, and its aspects might be “Price”, “Controls”, or “Auto On/Off”, to name a few.
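To make the idea concrete, here is a minimal sketch of how an Entity-Aspect framework could be held in code and flattened into labels for a multi-label classifier. The entity and aspects come from the pressure cooker example above; the `FRAMEWORK` dictionary, the `flatten_labels` helper, and the `#` separator (borrowed from the ENTITY#ATTRIBUTE style used in SemEval-2016 Task 5) are illustrative assumptions, not a prescribed format.

```python
# Hypothetical representation: each entity maps to its list of aspects.
FRAMEWORK = {
    "Pressure Cooker": ["Price", "Controls", "Auto On/Off"],
}


def flatten_labels(framework, sep="#"):
    """Return one 'Entity<sep>Aspect' label per entity-aspect pair."""
    return [
        f"{entity}{sep}{aspect}"
        for entity, aspects in framework.items()
        for aspect in aspects
    ]


labels = flatten_labels(FRAMEWORK)
print(labels)
# ['Pressure Cooker#Price', 'Pressure Cooker#Controls', 'Pressure Cooker#Auto On/Off']
```

Keeping the framework as a nested structure and deriving the flat labels from it means a new entity or aspect only has to be added in one place.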

Partial examples of Entity-Aspect frameworks related to a hotel with a restaurant (left) and laptops and cameras (right) are below. These examples are based on SemEval-2014 Task 4 and SemEval-2016 Task 5.

Partial examples of Entity-Aspect frameworks for a hotel with a restaurant (left) and for consumer electronics (right). You can view the complete lists here.

Depending on the breadth or complexity of your scenario and the available data, you may even want to introduce a hierarchy into your Entity-Aspect framework, with multiple aspect levels. For instance, in the consumer electronics example we could have something like:

Once you have a proposed Entity-Aspect framework, use it to categorize a sample of your data to ensure it’s viable.
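One way to run this viability check is sketched below, assuming you have hand-labeled a small sample. The record IDs, the sample annotations, and the `coverage_report` helper are all hypothetical. The report highlights labels that never appear, records that received no label, and labels used in the sample that are missing from the proposed set, each of which is worth raising in the working session.

```python
from collections import Counter

# Proposed labels (hypothetical, following the pressure cooker example).
PROPOSED_LABELS = [
    "Pressure Cooker#Price",
    "Pressure Cooker#Controls",
    "Pressure Cooker#Auto On/Off",
]

# Hypothetical hand-labeled sample: record id -> labels assigned to it.
sample = {
    "r1": ["Pressure Cooker#Price"],
    "r2": ["Pressure Cooker#Controls", "Pressure Cooker#Price"],
    "r3": [],
}


def coverage_report(annotations, labels):
    """Summarize how well a label set covers a sample of annotated records."""
    counts = Counter(
        label for record_labels in annotations.values() for label in record_labels
    )
    return {
        # How often each label was used in the sample.
        "counts": dict(counts),
        # Proposed labels that never appear (possibly too narrow or redundant).
        "unused": [label for label in labels if counts[label] == 0],
        # Records that no label fits (possibly a missing label).
        "unlabeled": [rid for rid, ls in annotations.items() if not ls],
        # Labels used in the sample but absent from the proposed set.
        "unknown": sorted(set(counts) - set(labels)),
    }


report = coverage_report(sample, PROPOSED_LABELS)
print(report["unused"])     # ['Pressure Cooker#Auto On/Off']
print(report["unlabeled"])  # ['r3']
```

With only 10 to 20 records, an unused label is a prompt for discussion rather than evidence that the label should be dropped.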

Additional benefits of Entity-Aspect framework

Using this framework offers several advantages:

  1. Organizes existing labels efficiently.
  2. Easily extends to new products, services, or entities.
  3. Fosters a common vocabulary within your organization.
  4. Identifies missing but logical labels.
  5. Enhances downstream analytics through additional data filtering.
  6. Facilitates Aspect-Based Sentiment Analysis, a more granular form of sentiment analysis.

Working session with stakeholders

At this point we should have all the pieces of the puzzle that we need. To initiate the working session, have the PM recap the project’s background and clarify the session’s goal: to agree on a provisional set of labels and definitions for annotating a substantial portion of the corpus. Present the latest clusters/labels identified by BERTopic, demonstrate the labels in use, and address any questions or concerns. Introduce and discuss the Entity-Aspect framework, gathering feedback on simple versus Entity-Aspect labels. Finally, have stakeholders annotate a sample set of records using one of the two label sets and discuss any discrepancies to achieve alignment on an authoritative annotation approach.
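To focus the discrepancy discussion, it can help to list only the records where stakeholders disagreed. The sketch below assumes each stakeholder's annotations are collected as a mapping from record ID to a set of labels; the annotator names, record IDs, and the `find_discrepancies` helper are hypothetical. It is a simple comparison for guiding the conversation, not a formal agreement metric.

```python
def find_discrepancies(annotations_by_annotator):
    """Return records whose label sets differ between annotators.

    annotations_by_annotator: {annotator: {record_id: set_of_labels}}
    """
    record_ids = set().union(
        *(records.keys() for records in annotations_by_annotator.values())
    )
    discrepancies = {}
    for rid in sorted(record_ids):
        # A record an annotator skipped counts as an empty label set.
        label_sets = [
            frozenset(records.get(rid, ()))
            for records in annotations_by_annotator.values()
        ]
        if len(set(label_sets)) > 1:
            agreed = frozenset.intersection(*label_sets)
            contested = frozenset.union(*label_sets) - agreed
            discrepancies[rid] = {
                "agreed": sorted(agreed),
                "contested": sorted(contested),
            }
    return discrepancies


# Hypothetical annotations from two stakeholders.
votes = {
    "PM": {"r1": {"Price"}, "r2": {"Controls"}},
    "Analyst": {"r1": {"Price"}, "r2": {"Controls", "Auto On/Off"}},
}

to_discuss = find_discrepancies(votes)
print(to_discuss)
# {'r2': {'agreed': ['Controls'], 'contested': ['Auto On/Off']}}
```

Contested labels often point to a definition that needs tightening, which is exactly the kind of alignment the session is meant to produce.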

Next, we will dive into the annotation process, including how to calculate Inter-Annotator Agreement.
