The Validation Process

Published in Synesis One

Oct 22, 2022

Overview

Synesis One crowdsources the domain-specific knowledge needed to power the Mind Expression natural language processing engine. Through our train2earn app, users (called builders) draw on their tacit knowledge and creativity to write ‘utterances’ (the technical term in linguistics): the different ways a customer might phrase a request, such as booking a flight. Not every utterance becomes part of the AI’s mental map, however. Each one must first pass quality control. That task falls to our validators, who accept or reject the utterances supplied by builders. Below, we explain what happens to utterances once they enter the system and how they are evaluated, in the hope that this helps builders improve their validation rates (and earnings!).

The Process

Validators ask four fundamental questions when evaluating an utterance:

(1) Is the relation correct?

(2) Is the utterance relevant to the domain and the topic subject?

(3) Is the utterance unique in terms of sentence structure?

(4) Is the utterance natural-sounding?

These questions are hierarchical rules of thumb that can be expressed as a decision flowchart (Figure 1). They are ordered by importance. In particular, placing the utterance in the correct category (general, specific, or entailment) takes precedence over whether it sounds natural: an awkward-sounding sentence with the correct relation is acceptable, while a perfectly worded sentence with the wrong relation is not. For more on the different kinds of utterances, see our data creation guidelines.

Figure 1: Decision Tree Diagram for Utterance Validation
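Because the questions are applied in a fixed order, the decision flow can be sketched as a sequence of checks where the first failure becomes the rejection reason. This is a minimal sketch, not Synesis One's actual tooling: the `Utterance` class, the `validate` function, and the toy lambda checks are all hypothetical stand-ins for what are, in practice, human judgments. How strict each check should be (for example, how awkward a sentence must be before it fails naturalness) is left to the validator.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Utterance:
    text: str
    relation: str  # category chosen by the builder: "general", "specific" or "entailment"


# Hypothetical: each check returns True if the utterance passes that criterion.
Check = Callable[[Utterance], bool]


def validate(utt: Utterance, checks: list[tuple[str, Check]]) -> Optional[str]:
    """Run checks in priority order and return the name of the first failed
    criterion (the rejection reason), or None if every check passes."""
    for name, passes in checks:
        if not passes(utt):
            return name
    return None


# Toy checks in the order of the four questions (illustrative only).
checks = [
    ("relation", lambda u: u.relation == "general"),        # assume this topic expects General
    ("relevancy", lambda u: "program" in u.text.lower()),   # crude stand-in for topic relevance
    ("pattern diversity", lambda u: True),                  # no other submissions in this toy
    ("naturalness", lambda u: u.text[:1].isupper()),        # crude stand-in for naturalness
]

print(validate(Utterance("Tell me about your program.", "general"), checks))
print(validate(Utterance("tell me about your program", "specific"), checks))
```

The first call passes every check and prints `None`. The second fails both relation and naturalness, but only `relation` is reported, because the higher-priority question is asked first.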

Rejection Criteria

1. Relation issues

Utterances with an incorrect relation (i.e., filed under the wrong category) should be rejected, for example an utterance that is more specific than the topic subject but is placed under the General category. This is the most important of the four criteria. Note: some utterances may fall under both the specific and entailment categories.

2. Relevancy issues

Utterances that are irrelevant to the topic subject should be rejected. An utterance can be irrelevant in two main ways: incorrect intent, or misinterpretation of the topic subject or domain.

Example of incorrect intent (too context-free):

Example of incorrect intent (too context-limited):

Example of misinterpretation of the topic subject or domain:

3. Pattern diversity issues

Utterances that lack pattern diversity relative to other submitted utterances should be rejected. If several repetitive utterances pass all other criteria, only the first one submitted with a given sentence pattern is accepted (first come, first served). More general utterances are preferred; for example, “I want my food now” is preferred to “I want my eggs now”.
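The first-come, first-served rule can be sketched as a pass over submissions in arrival order that keeps only the first utterance for each sentence pattern. Deciding what counts as "the same pattern" is a validator's judgment, so the pattern key below is a deliberately crude, hypothetical stand-in: `SLOT_WORDS`, `pattern_of`, and `first_come_first_served` are illustrative names, not part of the train2earn app. The sketch also does not encode the preference for more general wording; it only shows the ordering rule.

```python
from __future__ import annotations

import re

# Hypothetical slot lexicon: words treated as interchangeable fillers when
# comparing sentence patterns. Purely illustrative.
SLOT_WORDS = {"food", "eggs", "pancakes", "coffee"}


def pattern_of(text: str) -> str:
    """Reduce an utterance to a crude pattern key: lowercase words, with any
    slot word replaced by a placeholder and punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join("<SLOT>" if w in SLOT_WORDS else w for w in words)


def first_come_first_served(submissions: list[str]) -> tuple[list[str], list[str]]:
    """Given utterances in submission order (assumed to pass all other
    criteria), accept the first utterance of each pattern and reject repeats."""
    seen: set[str] = set()
    accepted: list[str] = []
    rejected: list[str] = []
    for text in submissions:
        key = pattern_of(text)
        if key in seen:
            rejected.append(text)
        else:
            seen.add(key)
            accepted.append(text)
    return accepted, rejected


accepted, rejected = first_come_first_served([
    "I want my food now",
    "I want my eggs now",
    "Can I get my food now?",
])
print(accepted)  # ['I want my food now', 'Can I get my food now?']
print(rejected)  # ['I want my eggs now']
```

Here “I want my eggs now” reduces to the same pattern as the earlier “I want my food now” and is rejected, while “Can I get my food now?” has a different structure and is accepted.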

4. Naturalness issues

Utterances with atypical or non-standard grammar or spelling, or other “unnatural” qualities, should be rejected. The exception is chatspeak: the kind of informal language you would expect in conversations on social media platforms. Note that this criterion has the lowest priority.

Validation Rubric

To summarize, our validators review each utterance based on the following rubric:

Each utterance is reviewed by multiple validators, none of whom knows how the others voted on a given submission. The goal is to make the process as objective as possible. Let’s go through a couple of examples to show how the rubric is applied.

Example topic subject:
What is your program about?