Published in Analytics Vidhya

Concepts Extraction: How I learned to stop worrying and love multilingual data

How to make text data from different languages understandable

Let’s dive in!

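import json

def get_data(path):
    # Load the corpus from a JSON file; it is indexed by language code
    # below (e.g. data['ko'][0] is the first Korean article)
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)

data = get_data('articles.json')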

First steps with the API

pip3 install PyNeutralNews
from collections import Counter
from PyNeutralNews import Client

client = Client("<email>", "<password>")

def get_concepts(text, lang=None):
    # Run the semantic analysis and request the concept titles
    res = client.nlp.semantic_analysis(text, lang, concepts_properties=["titles.en", "titles.auto"])
    concepts = Counter()
    for concept in res.concepts:
        titles = concept.properties["titles"]
        # Prefer the English title; otherwise use the title in the
        # language reported by the API (res.lang)
        title = titles.get("en") or titles[res.lang]
        concepts[title] += concept.weight
    return concepts

>>> get_concepts(data['ko'][0], 'ko')
Counter({'United States': 0.16974641714195385,
'President (government title)': 0.11700783103451891,
'Justice Party (South Korea)': 0.11516047368626284,
'Facebook': 0.0977350464727538,
'Diplomacy': 0.09208318129951651,
'Washington, D.C.': 0.09057665492040855,
'Mass media': 0.07374152731976248,
'German reunification': 0.06379742089362807,
'Literature': 0.06115040296658773,
'Ambassador': 0.06047045207677463,
'Human': 0.058530592187832624})
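
Since the result is a Counter keyed by concept title, the strongest concepts of an article can be read off directly. Here is a minimal sketch, using a few weights copied (and rounded) from the output above; the helper name top_concepts is made up for illustration and is not part of PyNeutralNews.

```python
from collections import Counter

# Sample weights copied (rounded) from the Korean article output above
concepts = Counter({
    "United States": 0.1697,
    "President (government title)": 0.1170,
    "Justice Party (South Korea)": 0.1152,
    "Facebook": 0.0977,
    "Human": 0.0585,
})

def top_concepts(concepts, k=3):
    """Return the k highest-weighted concept titles, heaviest first.

    Hypothetical helper: it only relies on Counter.most_common.
    """
    return [title for title, _ in concepts.most_common(k)]

top3 = top_concepts(concepts)
print(top3)
```

Because the titles come back in English whenever one is available, the same call works unchanged on articles written in any of the corpus languages.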
def get_all_concepts(corpus):
    # Map each concept title to (number of articles mentioning it, summed weight)
    concepts = {}
    for lang, articles in corpus.items():
        print('get concepts from', lang)
        for i, article in enumerate(articles):
            if (i + 1) % 10 == 0:
                print(i + 1, '/', len(articles))
                # Stops before the 10th article of each language;
                # remove the break to process the full corpus
                break
            res = get_concepts(article, lang)
            for concept, weight in res.items():
                if concept not in concepts:
                    concepts[concept] = (0, 0)
                c, w = concepts[concept]
                concepts[concept] = (c + 1, w + weight)
    return concepts

concepts = get_all_concepts(data)
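
The dictionary built by get_all_concepts stores, for each concept, a pair of (number of articles, summed weight). One thing this allows is ranking concepts by their average weight instead of by raw occurrence. Below is a rough sketch of that idea; the helper average_weights and the sample values are hypothetical and only mimic the shape of the real aggregate.

```python
def average_weights(concepts):
    """Map each concept to its mean weight over the articles mentioning it.

    Expects values shaped like (occurrences, summed_weight).
    """
    return {
        title: total / count
        for title, (count, total) in concepts.items()
        if count  # skip concepts that were never counted
    }

# Hypothetical aggregate, same shape as the output of get_all_concepts
sample = {
    "United States": (4, 0.6),
    "Facebook": (2, 0.1),
    "Diplomacy": (1, 0.09),
}

avg = average_weights(sample)
# Concept titles ordered from highest to lowest average weight
ranked = sorted(avg, key=avg.get, reverse=True)
print(ranked)
```

A concept that appears rarely but dominates the articles it appears in ranks high here, which the occurrence count alone would hide.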
import matplotlib.pyplot as plt
import numpy as np

# Map each concept to its occurrence count, most frequent first
concepts_occ = {k: v[0] for k, v in sorted(concepts.items(), key=lambda item: item[1][0])[::-1]}

def plot_concepts(concepts, limit=10):
    # Bar chart of the first `limit` concepts (fewer if the dict is smaller)
    titles = list(concepts.keys())[:limit]
    heights = list(concepts.values())[:limit]
    fig, ax = plt.subplots(figsize=(20, 10))
    ax.bar(np.arange(len(titles)), height=heights, tick_label=titles)

plot_concepts(concepts_occ)

What’s next?
