Talk on robojournalism — Introduction to TecnoNews

https://www.youtube.com/watch?v=nt8gVxvLq9g&feature=youtu.be&t=6801

Talk given in June 2018 at Universidad Abierta Interamericana (UAI), https://uai.edu.ar/, at the first Journey of Artificial Intelligence and Natural Language Processing (JAINLP).

Media coverage of news produces a huge amount of information. Those interested in media cannot afford to read everything that is being produced around breaking news or the main topics in today’s press. Processing tons of news by hand in a timely manner is prohibitively costly.

Our product is called SentiLecto, a Natural Language Understanding engine applied to read news in Spanish and Portuguese press (and soon in English press) from all around the world in order to understand text the way native speakers do by discovering actionable insights among tons of news. …


SentiLecto is NaturalTech’s NLU engine. This Aspect-based Sentiment Analysis solution yields a highly fine-grained representation of the sentiment values involved in each opinion. Unlike other approaches, this solution can deal with polarity shifting in the same sentence (‘I like chocolate but I hate strawberry ice-cream’), within embedded clauses (‘Norwegians, who are an aggressive people, export the exquisite herring’), or even onto the very same word (‘Somebody who wasted a chance to do something’ means that person did something bad about something good). SentiLecto better represents the premise whereby the entities involved in the opinion are syntactically mapped onto SVO (subject-verb-object) slots for their sentiment assignments: ‘Mary hates John’ (2 entities but only the object has a negative presentation) vs. …
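To picture what a per-slot sentiment assignment could look like, here is a minimal sketch. It is not SentiLecto's output format: the class names, the -1/0/+1 polarity scale and the neutral value given to subjects are all assumptions made for illustration, and the clauses are annotated by hand rather than parsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotSentiment:
    """Sentiment attached to one syntactic slot (hypothetical structure)."""
    text: str
    polarity: int  # assumed scale: -1 negative, 0 neutral, +1 positive


@dataclass
class ClauseOpinion:
    """One subject-verb-object clause with a separate polarity per entity slot."""
    subject: SlotSentiment
    verb: str
    obj: Optional[SlotSentiment] = None


def negative_entities(clause: ClauseOpinion) -> list[str]:
    """Return the entities in the clause that are presented negatively."""
    slots = [clause.subject] + ([clause.obj] if clause.obj else [])
    return [slot.text for slot in slots if slot.polarity < 0]


# Hand-annotated example from the text: only the object is presented negatively.
mary_hates_john = ClauseOpinion(
    subject=SlotSentiment("Mary", 0),
    verb="hates",
    obj=SlotSentiment("John", -1),
)

# Polarity shift inside one sentence: two clauses, each with its own object polarity.
ice_cream = [
    ClauseOpinion(SlotSentiment("I", 0), "like", SlotSentiment("chocolate", +1)),
    ClauseOpinion(SlotSentiment("I", 0), "hate", SlotSentiment("strawberry ice-cream", -1)),
]
```

Keeping one polarity per slot, instead of one score per sentence, is what lets a single sentence carry both a positive and a negative opinion.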


Perhaps you have heard the term ‘sentiment analysis’ and wondered what it is all about. Perhaps you have a vague idea of what it means and would like to learn a little bit more in depth. If that is the case, please read on.


Nowadays, the amount of text produced around the web is enormous, so much so that analyzing it strategically for different purposes, ranging from business analytics in various industries to political campaigns, becomes a difficult, if not impossible, task. …


In this new post, we are going to talk about how SentiLecto compares to two well-known NLU engines for text in Portuguese, Google Natural Language and spaCy, across three basic syntax tasks: subject extraction, object extraction and role extraction in passive-voice sentences. We chose these two products because they are among the very few that offer adequate full parsing for Portuguese.


We created a battery of sentences for each task, taken from the BBC News web portal in Portuguese. We hand-checked every single output and awarded marks even when the answer was not the same as the expected output, as long as it was plausible. …
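The lenient marking described above can be sketched as a small scoring helper. This is an illustration only: the verdict labels ("exact", "plausible", "wrong") and the function name are assumptions, and the verdicts themselves would still come from the manual check.

```python
from collections import Counter

# Verdicts that earn a mark under the lenient scheme (label names are assumed).
ACCEPTED = {"exact", "plausible"}


def task_scores(judgments: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of sentences per task whose output was marked exact or plausible."""
    total, passed = Counter(), Counter()
    for task, verdict in judgments:
        total[task] += 1
        if verdict in ACCEPTED:
            passed[task] += 1
    return {task: passed[task] / total[task] for task in total}
```

Each item in `judgments` pairs a task name with the verdict given to one sentence, so the same list can hold the results for all three tasks.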


Many news articles are published on a daily basis. And yet, in hindsight, some of them are not particularly relevant and quickly fade away, whereas others are of utmost importance. Moreover, some pieces build on previously published ones, forming relationships that might be entangled and complex. Information is power, so how do we separate the wheat from the chaff in a way that is painless and quick?


At NaturalTech, we’ve pondered how our cutting-edge technology might help us in this quest. We wanted to show you which articles from different media outlets are related, in the sense that they cover the same event, and how relevant the event is, judging by how much repercussion it has in the media. …


If you go to EntretenimientoBit, a news blog or augmented newsroom where 200 quality posts drawn from 300 sources are published per day, you will be able to witness TecnoNews’ powerful rewriting algorithm in action. The articles that it publishes are created by merging and enriching different coverages from various media outlets. How does it do it? The purpose of this blog post is to give you an overall idea of how it works.

News generation algorithm

The pipeline of the algorithm starts with a news text, which we’ll call T0. When it enters the system, it is categorized, tagged and geolocalized. With this information, T0 is checked against the existing news articles in the database to see whether any stored article covers the same event. If the news article doesn’t appear to be related to any stored article, it is labeled as trivial (although this can change if a new article with the same coverage enters the pipeline and the process is triggered again). We use several criteria to determine if two articles refer to the same event: the title, the content (syntactic cues, synonyms, etc.), …
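The flow of the ingestion step described above can be sketched as follows. This is not the TecnoNews implementation: shared tags plus word overlap between titles stand in for the real criteria, and the names `Article`, `title_overlap`, `ingest`, the `"related"` label and the 0.5 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    tags: set[str] = field(default_factory=set)
    label: str = "trivial"  # stays trivial until a related article is found


def title_overlap(a: str, b: str) -> float:
    """Jaccard overlap between lowercased title words (assumed stand-in criterion)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def ingest(t0: Article, store: list[Article], threshold: float = 0.5) -> list[Article]:
    """Compare T0 with the stored articles, relabel any matches, then store T0."""
    related = [
        article for article in store
        if article.tags & t0.tags
        and title_overlap(article.title, t0.title) >= threshold
    ]
    if related:
        t0.label = "related"
        for article in related:
            # A previously trivial article changes status when a match arrives.
            article.label = "related"
    store.append(t0)
    return related
```

The relabeling loop mirrors the point made in the text: an article first stored as trivial stops being trivial once a later article with the same coverage enters the pipeline.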


In this post, we’ll make a qualitative analysis of the output of four Natural Language Understanding APIs: IBM Watson, Google, MeaningCloud and SentiLecto. We will make comparisons across different tasks: complex syntactic phenomena, named-entity recognition and sentiment analysis. Our results show how SentiLecto, NaturalTech’s Natural Language Understanding engine, consistently produces quality text analytic output that outperforms the other APIs.


No matter what industry you are in, written text is a crucial component of all aspects of life. To learn more about how you can leverage SentiLecto and earn an edge over your competitors, contact us.

If you want to know how SentiLecto performed in basic syntax operations, read our previous post. Those metrics are important because the resolution of more complex operations depends on the resolution of simpler phenomena. …


Knowledge graph generated from tons of related news at SentiLecto project

TecnoNews is a module that makes it possible to identify related texts by using SentiLecto’s ability to identify and normalize different references to the same facts. It uses the linguistic analysis performed by SentiLecto to link and connect texts at a fine-grained level within large collections of texts (news, business communications, etc.).

SentiLecto is being used to automatically generate this newsroom https://entretenimientobit.com with more than 200 quality posts (including images) from 150 feeds on a daily basis. Every post is enriched with new information, reports and knowledge graphs.

For example: Robojournalism for breaking news: report on terrorist attack in New Zealand, generated by TecnoNews powered by SentiLecto https://natural.do



Today, we are going to compare how some of the most popular NLU APIs perform when confronted with basic syntactic challenges. Why is this important? Being capable of resolving the syntax of a sentence is usually the step that precedes more complex tasks, such as topic modelling and sentiment analysis. In other words, poor performance in this task is usually a strong indicator of less-than-stellar performance in tasks that take syntactic roles as the input for their analysis.

Entry point to SentiLecto NLU API at http://dev.natural.do/docs

We are going to compare SentiLecto (powered by NaturalTech), IBM’s Watson, MeaningCloud and Google’s APIs across three basic syntactic tasks: subject extraction, object extraction, and syntactic role recognition in sentences that are written in the passive voice. …



Keywords: LegalTech, Information Extraction, text mining, statistical methods, n-grams, named-entity recognition (NER), big data, legal documents

1. NLP and LegalTech

The legal industry is another vertical where NLP technologies have been flourishing in recent years. This vertical is also known as LegalTech. The areas of growth in LegalTech focus on:

· Providing tools or a marketplace to connect clients with lawyers

· Providing tools for consumers and businesses to complete legal matters by themselves

· Data and contract analytics for e-discovery of insightful relationships

· Automation of legal writing or other substantive aspects of legal practice

Technological applications in contract management, e-discovery, and other high-volume areas are standardizing, automating, and ‘productizing’ what were once labor-intensive tasks performed by lawyers at law firms. …

About

NaturalTech

NaturalTech is a technology company with a specific goal: make computers understand natural language the way native speakers do.
