Super Model: Kata Entity

Published in Kata.ai Tech Blog, Apr 22, 2019

One of the biggest hurdles in NL Studio on Kata Platform today is the small number of pre-built entities available to Users. Even when we provide a general model that is easy to use, Users still need to train it with their own data, and many Users don’t have a large, good-quality dataset to train their NL. Our ‘inherit’ feature, which allows Users to duplicate a previously trained model, is still limited to several entities. To tackle this problem, we provide a pre-built model that we call Super Model.

Super Model is a high-quality pre-built model, maintained by Team Kata, that provides a general-purpose NL without requiring any training data from Users. This feature aims to help our Users build their own NL easily and quickly.

Currently, our Super Model offers Kata Entity, a general entity tagger that can categorise words in a sentence into several entities. The entities that we currently provide are person, location, phone, email, number, datetime, currency, and units (area, length, duration, temperature, volume, weight). We think these entities are quite general since their semantics do not change much across domains. Kata Entity is now automatically added to every new project using NL Bahasa Indonesia.

For the dataset, we use the NER annotator dataset, which consists of thirteen tags. There are 12,597 sentences in total, which we split into train, development, and test sets with a ratio of 80:10:10, respectively. The evaluation is carried out using the exact-match CoNLL evaluation scheme, in which a predicted entity is correct only if it exactly matches the corresponding gold-standard entity in the data. Standard precision, recall, and F1-score metrics are then calculated from the exact-match counts. Precision is the percentage of named entities found by Kata Entity that are correct. Recall is the percentage of named entities in the corpus that are found by Kata Entity. F1-score is the harmonic mean of precision and recall.
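To make the exact-match scheme concrete, here is a minimal sketch in Python. It is not Team Kata's evaluation code. It assumes the tags are encoded in BIO format (for example B-person, I-person, O), which the description above does not specify, and the function names extract_spans and exact_match_scores are hypothetical.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (BIO is an assumption)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        # An open entity ends on "O", on a new "B-", or on a tag with another label.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            spans.add((start, i, label))
            start, label = None, None
        # Any non-"O" tag with no open entity starts a new one.
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return spans


def exact_match_scores(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    correct = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        # A prediction counts only if boundaries and label both match.
        correct += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only: the predicted person entity is one token too short.
gold = [["B-person", "I-person", "O", "B-location"]]
pred = [["B-person", "O", "O", "B-location"]]
print(exact_match_scores(gold, pred))
```

In this made-up example, the truncated person entity earns no partial credit, so only the location entity counts as correct: one correct entity out of two predicted and two gold entities.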

Using Kata Entity, Users can automatically label entities in sentences with a general entity tagger without providing any training data. Furthermore, Users also benefit from regular model updates, which aim to improve the model’s accuracy. The updates are carried out by Team Kata, either by feeding in new datasets or by updating the model’s algorithm.

To maintain the quality and performance of our model, Kata Entity has a limitation in its training process. The training data that Users provide to the model will not be used for training immediately, because our team needs to check the data before adding it to our model. That said, we still encourage Users to correct their data if they find any errors, as these corrections would be a great improvement for our model in the next scheduled update. As our model improves, every User will also get better prediction results.

In the future, Team Kata is going to build more Super Models for different tasks to further help our Users build their NL.

Written by Made Nindyatama