Introducing Wisely

Your tool for text analysis and data extraction

Toufic Yammine
Empathic Labs
Aug 12, 2020


Wisely’s Logo (a purple owl on a green background)

Hello reader,

We are Cheryl Sarrouh and Toufic Gerges Yammine, Computer and Communication Engineering students at the University of Saint-Joseph Beirut. For our final year project, we have chosen to work on a subject in collaboration with the HumanTech institute of HEIA Fribourg, Switzerland.

Throughout this article we will be displaying the main components of our project for all of you to read and enjoy.

Hopefully, we will leave you with something to reflect upon.

The Problem with Textual Data

Humans, being the most advanced species on earth, have created a remarkably complex and diverse way to communicate and share information. With roughly 6,500 languages in use, this information, which exists mainly in textual form, is largely unexploitable in its raw state.

Text is highly unstructured by nature, and we use it in every interaction. By speaking, tweeting and sending messages on various online platforms, we generate a huge amount of data daily, of which only 21% is structured and can be used to produce significant, actionable insights.

Only 21% of the textual data we emit is exploitable.

Most of the technologies we use today rely on verbal interactions with machines. Siri, Alexa, Google Assistant, everything from voice recognition to machine translation, spell check and voice text messaging, can be hugely improved with a proper exploitation of the textual data we mentioned earlier.

Major Machine Assistants

However, within a company, intelligence and value come from data analysis and machine learning. Defining a working model requires several steps: clean, format, annotate and store the data; train and evaluate the model; and finally predict outcomes on new data. Much of this work can be simplified and streamlined, and this project aims at building tools that do exactly that.

Natural Language Processing

The science that deals with structuring the textual data we previously mentioned is Natural Language Processing, better known as NLP.

NLP is a branch of Artificial Intelligence whose purpose is to analyze, structure and find meaning in text and speech. NLP deals with subjects such as machine translation, sentiment analysis and information extraction from text. There are many sub-tasks in NLP; the two addressed in this project are Named Entity Recognition (NER) and Natural Language Understanding (NLU).

Thus, Our Solution

A platform that makes textual analytics techniques accessible to everyone has become essential. This is where Wisely comes in.

In a nutshell, Wisely provides two of the most used sub-tasks of Natural Language Processing: Named Entity Recognition and Natural Language Understanding. Using our platform, a non-technical user can import their own dataset, run the necessary processing and export the results for future use. The results are the dataset entries joined with the entities found in each line for NER, or with the entities and intents in the case of NLU.

This article aims to give you a better understanding of how Wisely works by walking through its implementation details from every angle.

Data Import

Our first task was to prepare the datasets uploaded by the user for treatment.

The user specifies the dataset’s name, language, description and type.

Adding a Dataset

We currently support two types: WhatsApp conversations and dialogues such as debates. Each imported dataset is saved in an editable table under the “My datasets” tab.

My Datasets Tab

Each WhatsApp line starts with the date, then the username, then the message we need to extract. Users can import their own WhatsApp conversations by selecting a chat and clicking “Export Chat”.

WhatsApp Format

Debate lines, on the other hand, start with the name of the speaker, followed by the message.

Debate Format
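As a rough illustration, a parser for these two line formats could look like the sketch below. Note that the exact WhatsApp timestamp layout varies by phone and locale; the pattern here assumes the common `date, time - name: message` variant and is not Wisely’s actual implementation.

```python
import re

# One common WhatsApp export layout: "12/08/2020, 14:30 - Alice: Hello there"
# (locale-dependent; assumed here for illustration only)
WHATSAPP_LINE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - "
    r"(?P<user>[^:]+): (?P<message>.*)$"
)

def parse_whatsapp(line):
    """Extract date, time, username and message from a WhatsApp export line."""
    m = WHATSAPP_LINE.match(line)
    return m.groupdict() if m else None

def parse_debate(line):
    """Debate lines are simply 'Speaker: message'."""
    speaker, sep, message = line.partition(":")
    if not sep:
        return None
    return {"speaker": speaker.strip(), "message": message.strip()}

print(parse_whatsapp("12/08/2020, 14:30 - Alice: Hello there"))
print(parse_debate("Moderator: Welcome to tonight's debate."))
```

Lines that do not match (for example WhatsApp system messages) simply yield `None` and can be skipped during import.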

Named Entity Recognition

NER, also known as entity identification, entity chunking or entity extraction, is a sub-task of information extraction that seeks to locate and classify named entities mentioned in unstructured text into predefined categories.

The entities we support on the platform are Person, Organization, Location, Date and Other.

NER Result

We provide automatic annotation for our users using a basic model. For this we chose the spaCy framework, which multiple comparative studies found to be the easiest to use, the most efficient, and the fastest at producing results among the available frameworks.

SpaCy Logo
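Under the hood, spaCy’s pre-trained models emit fine-grained labels such as PERSON, ORG, GPE, LOC and DATE, so a platform exposing the five categories above needs to fold those labels into its own scheme. A minimal sketch of such a mapping follows; this is our guess at the approach, not Wisely’s actual code.

```python
# spaCy label -> platform category (assumed mapping; anything else becomes "Other")
SPACY_TO_WISELY = {
    "PERSON": "Person",
    "ORG": "Organization",
    "GPE": "Location",   # geopolitical entities: countries, cities, states
    "LOC": "Location",   # non-GPE locations: mountains, bodies of water
    "DATE": "Date",
}

def to_wisely_category(spacy_label):
    """Map a spaCy entity label onto one of the five platform categories."""
    return SPACY_TO_WISELY.get(spacy_label, "Other")

# With spaCy installed, annotating a sentence would then look like:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp("Cheryl studies in Beirut.")
#   [(ent.text, to_wisely_category(ent.label_)) for ent in doc.ents]
print(to_wisely_category("GPE"))    # Location
print(to_wisely_category("MONEY"))  # Other
```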

The user also has the option of improving the annotation by manually highlighting certain words that the basic model might have missed. The model will therefore be trained to detect these new entities for future annotations thus improving its accuracy.
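For a manual correction to improve the model, the highlighted word has to be turned into training data. In the spaCy version current at the time (v2), a training example is a text paired with character-offset entity spans; the helper below is a hypothetical sketch of that conversion, not Wisely’s code.

```python
def make_training_example(text, highlighted, label):
    """Turn a user-highlighted word into a spaCy v2 training example:
    (text, {"entities": [(start_char, end_char, LABEL)]})."""
    start = text.find(highlighted)
    if start == -1:
        raise ValueError(f"{highlighted!r} not found in text")
    end = start + len(highlighted)
    return (text, {"entities": [(start, end, label)]})

example = make_training_example(
    "Wisely was built at HEIA Fribourg.", "HEIA Fribourg", "ORG"
)
print(example)
```

Batches of such examples can then be fed back into the model’s update loop to teach it the entities it missed.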

Natural Language Understanding

NLU is a subtopic of natural-language processing in Artificial Intelligence that deals with machine reading comprehension. There is considerable commercial interest in the field because of its application to automated reasoning, question answering, archiving, and large-scale content analysis.

To know more about NLU, we would recommend this article written by our very own Jacky Casas.

For every data entry, the intent is identified and the intent’s relevant entities are highlighted. The user can correct the intent, thus improving the model’s accuracy.

NLU Results

Like NER, the first time the user uses our NLU functionality, they will be using the basic model. Each model in NLU is defined by its list of intents.

NLU Models

The user adds intents; each intent has a name, a language, a description and a list of expressions, which is essential for the model to be able to detect it.

Adding an intent
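Behind the scenes, an intent and its expressions have to end up in a format the NLU engine can train on. In the Rasa NLU JSON format of the time (Rasa 1.x), each expression becomes a `common_examples` entry; the sketch below uses illustrative values, not Wisely’s actual export code.

```python
import json

def intent_to_rasa_examples(intent_name, expressions):
    """Convert an intent's expression list into Rasa 1.x NLU training data."""
    return {
        "rasa_nlu_data": {
            "common_examples": [
                {"text": expr, "intent": intent_name, "entities": []}
                for expr in expressions
            ]
        }
    }

data = intent_to_rasa_examples("greet", ["hello", "hi there", "good morning"])
print(json.dumps(data, indent=2))
```

The more expressions a user supplies per intent, the more training examples the model receives, which is why expression lists matter so much for recognition quality.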

We have imported some intents into the platform as a base for the user, and they are saved in the “List of intents” in the “NLU” tab.

NLU Intents

For this functionality we decided to use Rasa, an open-source AI framework that can be easily customized. Rasa can also run on our own servers, in our own environment, without going through a third party. This ensures that the data stays within the platform: users’ data never has to pass through external servers, which is both convenient and privacy-friendly.

RASA Logo

Data Export

Finally, the user exports the results that are stored in a dataset containing the original phrase, the entities in the case of NER, and the intents and their entities in the case of NLU.

The training data is exported in JSON format and the original dataset can be exported (if the user wishes to re-save it) in “txt” format.
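To make the export concrete, a single result line could be serialized roughly as follows. The field names here are illustrative, not Wisely’s exact schema.

```python
import json

# Hypothetical export records: entities only for NER,
# intent plus entities for NLU.
ner_record = {
    "text": "Cheryl lives in Beirut.",
    "entities": [
        {"value": "Cheryl", "category": "Person"},
        {"value": "Beirut", "category": "Location"},
    ],
}
nlu_record = {
    "text": "Book me a table for tonight.",
    "intent": "book_table",
    "entities": [{"value": "tonight", "category": "Date"}],
}

# Training data is exported as JSON, one record per dataset entry.
exported = json.dumps([ner_record, nlu_record], indent=2)
roundtrip = json.loads(exported)
print(roundtrip[0]["entities"][0]["value"])  # Cheryl
```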

What’s to Come

This project is a modest approach to the topic of pipelining an AI process. However, Wisely’s journey is not over yet.

While the platform already offers useful functionality, the field of NLP is broad, and we have addressed only two of its sub-tasks: NER and NLU. Future applications such as sentiment analysis, machine translation and more remain uncovered.

For NER, only the five basic entity categories (Person, Location, Organization, Date and Other) are currently available on the platform, and users cannot yet add their own entity categories to retrain models and annotate with them.

For NLU, the results depend largely on each intent’s list of expressions and on how many expressions users provide: the more the platform is used for NLU, the better the models and the intent recognition will get.

Finally, data on our platform is mainly imported, converted to JSON and exported as JSON. This part could also be extended to support more formats, such as CSV and BSON.

Before We Leave You …

We would like to thank everyone involved in this project for an enriching and eye-opening experience. Working on Wisely, supervised by supportive tutors and instructors, will forever stay with us, leaving a positive mark on both our professional and personal lives.

We hope you enjoyed this article, and that it helped you grasp the idea of Wisely and how you would use it for your upcoming projects.

Best of Luck,

Cheryl Sarrouh and Toufic Yammine.

Demonstration Video

Special thanks

Prof. Elena Mugellini
Prof. Omar Abou Khaled
Jacky Casas
Karl Daher

HEIA-FR
HumanTech
Empathic Labs
University of Saint-Joseph Beirut
