Wine and Data Problem

Arunedwin Prasath
Jul 20, 2017 · 4 min read

I cannot find a better quote to start with:

“Data is a precious thing and will last longer than the systems themselves.” - Tim Berners-Lee, father of the World Wide Web.

Data explosion and the need for organizing data:

“Experts now predict that 40 zettabytes of data will be in existence by 2020.” Forty zettabytes will raise new questions about how all that data is going to be organized and utilized. Data is captured by every system across the enterprise and then goes to the vineyard, sorry, the graveyard. “Humankind has stored more than 295 billion gigabytes (or 295 exabytes) of data since 1986, according to a new report based on research by scientists at the University of Southern California” (source: the web).

Why compare wine making to a data problem:

We make a juicy grape extract, store it in a vineyard cellar, and taste it years later. The quality of the wine grows by the year, and years later it becomes precious. But this is not the case with today’s enterprise data. We store the data in our data centers and preserve it, but we never use this gold reserve. This old, preserved data speaks of past strategies and can help enterprises make decisions. The same success story can be brought out and reused again.

Today the internet produces a whole lot of open data. Intelligence is shared in a new paradigm using the World Wide Web. Many enterprises try to tap this information and use it for analytical insights. We scrape a lot of unstructured data from the web and do a lot of processing over it, yet it is hard to make meaning out of the data. How to link the data is the question that quickly comes to everyone’s mind, and the effort we put in goes to waste. I have seen many enterprises attack this problem with different approaches, but they have failed. So the base problem we are going to solve today is how to unify the open data available on the web and use it for insights.

Data Unification:

Data unification means making all possible meaningful links so that the data is ready to be explored. The end of the road is to convert a variety of data into usable insights. It is one of the problems that companies have been trying to solve for 30 years, still without proven success, because we are not able to come up with all possible connections between the data. We can solve this semantically.

How do we link two different unstructured data sources:
We have taken two slices of data from Wikipedia to explain our problem. One is the page of Narendra Modi ji and the other is the wiki page of India. We all know that there is definitely a link between Modi ji and India. Let’s see how we use a graph to solve our problem. Run an NLP analyzer over the text data and extract RDF (Resource Description Framework) structured data, which is nothing but a graph. There are a couple of open source tools to do this, like Apache Any23.

Converting text to RDF:

Slice of content from the Wikipedia Modi page (weblink: https://en.wikipedia.org/wiki/Narendra_Modi):
Narendra Damodardas Modi (Gujarati: [ˈnəɾeːnd̪rə d̪aːmoːd̪əɾˈd̪aːs ˈmoːd̪iː], born 17 September 1950) is an Indian politician who is the 14th and current Prime Minister of India, in office since May 2014. He was the Chief Minister of Gujarat from 2001 to 2014 and is the Member of Parliament for Varanasi. Modi, a member of the Bharatiya Janata Party (BJP), is a Hindu nationalist and member of the right-wing Rashtriya Swayamsevak Sangh.

Slice of content from the Wikipedia India page (weblink: https://en.wikipedia.org/wiki/India):
In the 2014 general election, the BJP became the first political party since 1984 to win a majority and govern without the support of other parties.[168] The Prime Minister of India is Narendra Modi, who was formerly Chief Minister of Gujarat.

Converted RDF:

# The namespace IRI for the something: prefix is assumed here for illustration.
@prefix something: <http://something.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://something.org/narendra-modi> something:fullNameAs "Narendra Modi"@en .
<http://something.org/narendra-modi> something:born "17th September 1950" .
<http://something.org/narendra-modi> something:country <http://something.org/india> .
<http://something.org/narendra-modi> something:occupation "Prime Minister of India" .
<http://something.org/narendra-modi> something:occupation "Chief Minister of Gujarat" .
<http://something.org/narendra-modi> something:member "Bharatiya Janata Party" .
<http://something.org/narendra-modi> something:cardinalOrder 14 .

<http://something.org/india> rdfs:label "India" .
<http://something.org/india> something:hasPrimeMinister <http://something.org/narendra-modi> .
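
To see why shared IRIs unify the two pages, the triples above can be held as plain Python tuples. This is only a rough sketch and not a real RDF store: prefixed names are kept as ordinary strings, and the split into `modi_page` and `india_page` is my assumption about which page each triple came from.

```python
# Triples as (subject, predicate, object) tuples; IRIs and prefixed names are plain strings.
MODI = "http://something.org/narendra-modi"
INDIA = "http://something.org/india"

# Assumed to come from the Narendra Modi page.
modi_page = {
    (MODI, "something:fullNameAs", "Narendra Modi"),
    (MODI, "something:born", "17th September 1950"),
    (MODI, "something:country", INDIA),
    (MODI, "something:occupation", "Prime Minister of India"),
    (MODI, "something:occupation", "Chief Minister of Gujarat"),
    (MODI, "something:member", "Bharatiya Janata Party"),
    (MODI, "something:cardinalOrder", 14),
}

# Assumed to come from the India page.
india_page = {
    (INDIA, "rdfs:label", "India"),
    (INDIA, "something:hasPrimeMinister", MODI),
}

# Unification is a set union: identical IRIs refer to the same node.
graph = modi_page | india_page


def nodes(triples):
    """Return every IRI that appears as a subject or as an object."""
    found = set()
    for s, _, o in triples:
        found.add(s)
        if isinstance(o, str) and o.startswith("http://"):
            found.add(o)
    return found


# IRIs mentioned by both pages are the points where the two sources join.
shared = nodes(modi_page) & nodes(india_page)
```

Because both pages name the same two IRIs, the merged graph is connected without any extra matching step.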

How did we process the text data:
We are creating IRIs out of entities and relating two entities using properties, which makes the data unification possible. I could keep creating triples for the rest, but this is more than enough to explain the problem. We are able to strip information from sentences and start answering questions like:
Who is the prime minister of India?
When was Narendra Modi born?
What is the cardinal order of Mr. Modi ji among the prime ministers of India?
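
The step of creating IRIs out of entities and relating them with properties can be sketched as follows. The `mint_iri` and `relate` helpers and the slug rule (lowercase, hyphens between words) are assumptions made for illustration, and `http://something.org/` is just the placeholder base used in the example triples.

```python
import re

BASE = "http://something.org/"  # placeholder namespace from the example triples


def mint_iri(entity_name):
    """Hypothetical helper: turn an entity name into an IRI by slugging it."""
    slug = re.sub(r"[^a-z0-9]+", "-", entity_name.lower()).strip("-")
    return BASE + slug


def relate(subject_name, prop, object_name):
    """Link two named entities with a property, giving one triple."""
    return (mint_iri(subject_name), prop, mint_iri(object_name))


triples = [
    relate("India", "something:hasPrimeMinister", "Narendra Modi"),
    relate("Narendra Modi", "something:country", "India"),
    (mint_iri("Narendra Modi"), "something:born", "17th September 1950"),
    (mint_iri("Narendra Modi"), "something:cardinalOrder", 14),
]


def objects(subject_iri, prop):
    """Answer 'what is the <prop> of <subject>?' by scanning the triples."""
    return [o for s, p, o in triples if s == subject_iri and p == prop]


# The three questions above become simple lookups.
prime_minister = objects(mint_iri("India"), "something:hasPrimeMinister")
born = objects(mint_iri("Narendra Modi"), "something:born")
order = objects(mint_iri("Narendra Modi"), "something:cardinalOrder")
```

A real pipeline would need entity resolution so that "Modi" and "Narendra Modi" mint the same IRI; the slug rule here does not attempt that.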

How do we create questions:
We have done the BEAST part; now for the beauty part. We can create SPARQL queries. I am not going to explain more about SPARQL here (please feel free to read this to understand SPARQL: https://en.wikipedia.org/wiki/SPARQL).
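# The namespace IRI for the something: prefix is assumed for illustration.
PREFIX something: <http://something.org/>
SELECT ?x WHERE {
  ?s something:occupation "Prime Minister of India" .
  ?s something:fullNameAs ?x .
}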

This query returns the name of the prime minister of India, and I could keep writing queries like this all day. To organize the graph we use ontologies: data dictionaries that explain each entity. Keep adding relations, because semantics is a schemaless world bound by W3C standards; for web-based RDF, Schema.org stands as a standard.
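
How a query like the one above finds its answer can be sketched in a few lines of Python. This is a toy matcher for triple patterns with shared variables, written under the assumption that the graph is a small in-memory list; it is not how a real SPARQL engine is implemented, and the function names are my own.

```python
MODI = "http://something.org/narendra-modi"
INDIA = "http://something.org/india"

GRAPH = [
    (MODI, "something:fullNameAs", "Narendra Modi"),
    (MODI, "something:occupation", "Prime Minister of India"),
    (MODI, "something:occupation", "Chief Minister of Gujarat"),
    (INDIA, "rdfs:label", "India"),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on failure."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant does not match
    return result


def query(patterns, graph):
    """Evaluate the patterns one by one, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = query(
    [
        ("?s", "something:occupation", "Prime Minister of India"),
        ("?s", "something:fullNameAs", "?x"),
    ],
    GRAPH,
)
names = [row["?x"] for row in rows]
```

The shared variable `?s` is what joins the two patterns: only subjects that satisfy the first pattern are tried against the second.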

What we achieved:

We have linked two different pieces of unstructured data from the web, made meaningful links, and pulled out insights using SPARQL. I would call this a base camp for artificial intelligence, because text queries can be converted to SPARQL queries far more easily than they could ever be converted to SQL queries. Using SPARQL, data can be extracted back from the graph store. We can start using the decayed data that we crawled, stored, and left sitting there for years.
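
One very simple way to picture the text-to-SPARQL step is a template lookup. The sketch below is an assumption of mine, not a real question-answering system: the two question shapes, the `TEMPLATES` table, and the `to_sparql` function are hypothetical, and the property names are the ones from the example triples.

```python
import re

# Hypothetical templates: each pairs a question shape with a SPARQL skeleton.
# Doubled braces are literal braces once str.format has run.
TEMPLATES = [
    (
        re.compile(r"who is the prime minister of (?P<country>[\w ]+)\?", re.IGNORECASE),
        'SELECT ?x {{ ?c rdfs:label "{country}" . ?c something:hasPrimeMinister ?x }}',
    ),
    (
        re.compile(r"when was (?P<name>[\w ]+) born\?", re.IGNORECASE),
        'SELECT ?x {{ ?s something:fullNameAs "{name}"@en . ?s something:born ?x }}',
    ),
]


def to_sparql(question):
    """Return a SPARQL string for a recognized question shape, else None."""
    for pattern, skeleton in TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if match:
            return skeleton.format(**match.groupdict())
    return None


pm_query = to_sparql("Who is the prime minister of India?")
born_query = to_sparql("When was Narendra Modi born?")
```

The mapping stays short because a question of the form "the X of Y" lines up with a triple pattern almost word for word; free-form questions would need far more than two regular expressions.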
