A quick summary of my (and many other) talks at KGC-2019
This week, the School of Professional Studies at Columbia University’s Morningside campus in New York City hosted the Knowledge Graph Conference. I was invited to speak about my recent research work, and I met various researchers and practitioners working on knowledge graphs. Considering that this was a two-day conference, I was surprised by the diverse group of speakers and attendees the organizers were able to gather.
Let’s talk about the talks…
I really enjoyed some of the talks, learned what other people in the field are doing, and got some ideas for extending my research work. I won’t go into much detail; these are my personal highlights and takeaways. Let’s start with my talk and another one from my colleague from AccentureLabs.
Staying continuously compliant
My talk was about using advances in Natural Language Understanding and Knowledge Graphs to create a semantically connected knowledge base of regulatory press releases, and to alert businesses about future actions and possible mistakes so that they can stay continuously compliant.
- An army of people with very specific domain knowledge is continuously working to keep businesses compliant
- A data model can represent an event’s actors and their relations
- Event-specific information can be captured using domain-specific entity extractors and semantic role labeling methods
- Advances in NLP and KGs can be leveraged to semantically connect domain-specific facts with event-specific information
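To make the data-model idea a bit more concrete, here is a minimal sketch. It is not the model from my talk: the `Event` class, the role names, and the example sentence are all hypothetical, and the roles are typed in by hand where a real pipeline would get them from entity extractors and a semantic role labeler.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A regulatory event with its actors, keyed by semantic role."""
    event_id: str
    event_type: str                            # e.g. "fine", "recall"
    roles: dict = field(default_factory=dict)  # role name -> entity

    def to_triples(self):
        """Flatten the event into (subject, predicate, object) triples."""
        triples = [(self.event_id, "type", self.event_type)]
        for role, entity in self.roles.items():
            triples.append((self.event_id, role, entity))
        return triples


# Hypothetical extractor output for the sentence
# "Regulator X fined Bank Y for late reporting."
event = Event(
    event_id="event:1",
    event_type="fine",
    roles={"agent": "Regulator X", "patient": "Bank Y", "cause": "late reporting"},
)

for triple in event.to_triples():
    print(triple)
```

Once events are flattened into triples like this, they can be linked to domain facts about the same entities in the graph.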
Knowledge Graph for Customer 360
Colin Puri from AccentureLabs gave his talk with Joe Pindel of Pitney Bowes. He spoke about our recent collaboration with Pitney Bowes on Intelligent Customer Service using Knowledge Graph. It was a great example of AccentureLabs doing co-research and working with clients.
- A Customer 360 knowledge graph gives a much more holistic view: know a little more about complaints and promotions, or connect patrons to the most relevant service provider ASAP
- Knowledge graphs help us understand complaint context and guide us to better customer interactions, which helps lower the wait time on a support call
Wikidata is not about Facts
Denny Vrandečić, the founder of Wikidata, also gave a great talk about Wikidata: how it works and what it is for. I also enjoyed talking to him about various topics during the breaks. A very approachable person. My takeaways from his talk:
- No matter what language users are editing Wikidata content in, the result should remain the same in all languages
- Wikidata links more than 4000 databases, and more and more databases are connecting to it
- We don’t need to understand language all the time — we can extract information even without understanding the language — a reason for optimism
- Knowledge graphs give us a very connected multilingual world
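The first takeaway is easier to see with a toy example. The sketch below is my own illustration, not Wikidata’s storage format or API: statements are kept on language-independent identifiers, and labels are looked up per language only when rendering. The identifiers follow Wikidata’s Q/P style, but the labels are typed in by hand for illustration.

```python
# Statements use language-independent identifiers only.
STATEMENTS = [("Q42", "P31", "Q5")]

# Labels live separately, one per language (hand-typed for this sketch).
LABELS = {
    "Q42": {"en": "Douglas Adams", "de": "Douglas Adams"},
    "P31": {"en": "instance of", "de": "ist ein(e)"},
    "Q5": {"en": "human", "de": "Mensch"},
}


def render(statement, lang):
    """Render one statement using labels in the requested language."""
    return tuple(LABELS[term][lang] for term in statement)


# The same underlying statement, shown to an English and a German reader.
print(render(STATEMENTS[0], "en"))
print(render(STATEMENTS[0], "de"))
```

Because an edit changes the identifier-level statement, every language sees the same result.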
In the end, it is just a mapping problem
Dieter Fensel from Onlim talked about the importance of mapping for getting higher accuracy in a knowledge graph. He focused on data quality: data is important.
- We need both correct and incorrect examples: when using NLP for knowledge graph creation, both successful and unsuccessful dialogues are important
- Garbage in, garbage out
- For more accuracy: 95% to 99% of the knowledge is created using mapping
- Evaluating the knowledge graph for correctness and completeness is also important
- Just getting the knowledge is not enough; we also need to deploy it, and deployment is going to be use-case specific
- In the end: it is just a mapping problem
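A rough sketch of what "just a mapping problem" can look like in code. This is only my illustration, not the tooling from the talk: the field names, the target property names, and the `map_record` helper are all hypothetical.

```python
# Hypothetical mapping from source field names to target-vocabulary properties.
FIELD_MAPPING = {
    "hotel_name": "name",
    "tel": "telephone",
    "addr": "address",
}


def map_record(record_id, record, mapping):
    """Turn one source record into triples; report fields with no mapping."""
    triples, unmapped = [], []
    for source_field, value in record.items():
        if source_field in mapping:
            triples.append((record_id, mapping[source_field], value))
        else:
            unmapped.append(source_field)
    return triples, unmapped


# A made-up source record; "stars" has no mapping yet.
record = {"hotel_name": "Hotel Example", "tel": "+1 555 0100", "stars": 4}
triples, unmapped = map_record("hotel:1", record, FIELD_MAPPING)
print(triples)
print(unmapped)
```

The list of unmapped fields doubles as a crude completeness check: anything that shows up there is knowledge the graph is silently dropping.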
Tom Plasterer from AstraZeneca talked about the need for FAIR data. No matter what kind of data we work with, it should be FAIR.
FAIR data is Findable, Accessible, Interoperable, and Reusable.
We can’t keep gathering the same data again and again in various ways.
Knowledge Graph != Product Graph:
Every product has a story
Subhabrata Mukherjee spoke about the Amazon product graph, various methods they developed (Joint Relation Inference with Dual Attention, Distantly Supervised Knowledge Extraction from Product Profile), challenges, and future research directions. I really liked the use of the Joint Relation Inference with Dual Attention work to understand which relations should get more prominence: “Steven Spielberg the Director should have more prominence in results than Steven Spielberg the Actor”.
- Just extracting the knowledge isn’t enough; we gotta clean it too
- Char CNN gives higher recall than bi-directional LSTM in OpenTagger
- Joint reading over the manual knowledge graph and OpenIE extractions; only relations are learned
- Labeled data is always the bottleneck. We need to put more effort into unsupervised and active-learning-based approaches
Deep models also need humans
Alfio Gliozzo from IBMResearch talked about various ongoing research efforts in the direction of relation extraction and correction. My takeaways:
- Extracting relations is difficult. In general, recall is very low
- Unary relations can be a solution. Combining unary and binary relations improves recall
- Given the public portion of PermId, can you recognize the private portion of PermId?
- Word analogies can be used for relation extraction, as there is an implicit relation in each word analogy
- We also need to improve our relations and correct them manually. Deep models are not the solution all the time
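The word-analogy point can be sketched with vector offsets. This is not the method from the talk, just a toy illustration of the idea: if two word pairs share a relation, the difference between their vectors should point in a similar direction. The 2-D vectors below are made up by hand; a real system would use learned word embeddings.

```python
import math

# Hand-made 2-D toy vectors (assumption: real embeddings would be learned).
VECTORS = {
    "paris": (1.0, 3.0), "france": (1.0, 1.0),
    "rome": (2.0, 3.1), "italy": (2.0, 1.0),
    "berlin": (3.0, 2.9), "germany": (3.0, 1.0),
    "apple": (5.0, 1.0), "fruit": (6.0, 1.0),
}


def offset(a, b):
    """Vector difference a - b, the 'direction' of the implicit relation."""
    return tuple(x - y for x, y in zip(VECTORS[a], VECTORS[b]))


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relation_score(seed_pairs, candidate):
    """Compare a candidate pair's offset with the mean offset of seed pairs."""
    offsets = [offset(a, b) for a, b in seed_pairs]
    mean = tuple(sum(dim) / len(offsets) for dim in zip(*offsets))
    return cosine(mean, offset(*candidate))


# Seed pairs that share an implicit "capital of" relation.
seeds = [("paris", "france"), ("rome", "italy")]
print(relation_score(seeds, ("berlin", "germany")))  # close to 1: same direction
print(relation_score(seeds, ("apple", "fruit")))     # close to 0: different relation
```

A high score only suggests a candidate relation; as the last takeaway says, someone still has to review and correct what comes out.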
Being realistic is difficult. Where we are and where we want to go
Another great presentation by Joshua Shinavier from Uber. He didn’t go very technical but shared a handful of practical lessons he learned managing tons of data at Uber. He talked about the hype cycle and the various graphs working in synergy as a single Uber Knowledge Graph.
- It is a path built on messy data: use and promote standards
- We are not all ontologists: semantic web matters
- Controlled vocabularies and the metadata graph work in sync, which gives compound relationships
- Time spent on understanding and modeling the data can help us scale faster in the long run
- Knowledge graphs: static graph, real-time graph, analytics graph (with graph edge embedding), metadata graph, algebraic property graphs
One person’s public data is another person’s private data
Dean Allemang of WorkingOntology spoke about the sensitivity of data: how the way the same data is indexed can take it from highly public to highly confidential. What’s the difference between a phone book and a reverse phone book?
- A private dataset and a public dataset can help us control how we want to use the data
- FIBO is more like metadata: it doesn’t say a lot about any specific data, but gives us the attributes of the data
- A data model can help us extend the data in a sustainable way
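The phone book question is really a question about indexing, and a few lines of code show it. This is my own toy illustration with made-up names and numbers: the data is identical in both structures, only the key changes.

```python
from collections import defaultdict

# A made-up phone book: look up a number by name.
phone_book = {
    "Alice Example": "555-0100",
    "Bob Example": "555-0101",
    "Carol Example": "555-0100",
}


def invert(index):
    """Re-index the same data by number instead of by name."""
    reverse = defaultdict(list)
    for name, number in index.items():
        reverse[number].append(name)
    return dict(reverse)


# The reverse phone book answers "who is behind this number?"
reverse_phone_book = invert(phone_book)
print(reverse_phone_book["555-0100"])
```

Nothing new was collected to build the reverse index, yet it answers a much more sensitive question, which is why the same data can feel public in one form and private in another.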
In my opinion, it was an excellent conference. Kudos to Thomas Deely, Francois Scharffe and others for making it possible. It had a great lineup of speakers covering topics such as NLP, KG, semantics, data modeling, and various knowledge graph vendors. I am waiting for the videos to go online.
Have you booked your ticket for next year yet? You also get great shots of the New York skyline.