[Header image: Persian art by Ibrahim Jabbar-Beik]

NATURAL LANGUAGE PROCESSING (NLP) WEEKLY NEWSLETTER

NLP News Cypher | 09.20.20

EMNLP and Graphs 😵

Quantum Stat
Sep 21 · 5 min read

☝ Persian art is pretty. Welcome back for another week of the Cypher. Yesterday, we made another weekly update to the Big Bad NLP Database and the Super Duper NLP Repo. We added 10 datasets and 6 new notebooks. This update was a good one since we added PyTorch Geometric notebooks for graph neural networks, in case you all are feeling a bit adventurous. 🙈
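For a taste of what those PyTorch Geometric notebooks revolve around, here is a minimal sketch of a two-layer graph convolutional network on the Cora citation graph; the dataset choice and hyperparameters are purely illustrative:

```python
# Minimal sketch of a graph neural network in PyTorch Geometric:
# a two-layer GCN classifying nodes of the Cora citation graph.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

data = Planetoid(root="data/Cora", name="Cora")[0]  # a single graph with node features and labels

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(data.num_node_features, 16, int(data.y.max()) + 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(50):  # short demo training loop on the training mask
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```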

BTW, if you enjoy this newsletter please share it or give it a 👏👏!

Detour: I’ve been experimenting with ONNX Runtime inference for BERT question answering. Latency improves significantly with ONNX; running on “okish” cloud CPUs, it currently lands in the 170–240 ms range. Here’s the demo:
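The demo itself isn’t embedded here, but a minimal sketch of the export-and-run path with ONNX Runtime looks roughly like the following; the SQuAD checkpoint, opset version, and file names are illustrative assumptions rather than the demo’s exact setup:

```python
# Minimal sketch: export a Hugging Face BERT QA model to ONNX and run it on CPU
# with ONNX Runtime. Checkpoint, opset, and file names are illustrative.
import torch
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "bert-large-uncased-whole-word-masking-finetuned-squad"  # example SQuAD checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name, return_dict=False)
model.eval()

# Export with a dummy (question, context) pair so ONNX records the graph.
dummy = tokenizer("Who are you?", "I am a newsletter reader.", return_tensors="pt")
axes = {k: {0: "batch", 1: "seq"} for k in
        ["input_ids", "attention_mask", "token_type_ids", "start_logits", "end_logits"]}
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "bert_qa.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["start_logits", "end_logits"],
    dynamic_axes=axes,
    opset_version=12,
)

# CPU inference: feed numpy tensors, take the best start/end span, decode it.
session = ort.InferenceSession("bert_qa.onnx")
enc = tokenizer("What improves latency?",
                "Running BERT through ONNX Runtime improves latency on CPUs.",
                return_tensors="np")
start_logits, end_logits = session.run(
    None, {k: enc[k] for k in ["input_ids", "attention_mask", "token_type_ids"]})
start, end = int(start_logits.argmax()), int(end_logits.argmax())
print(tokenizer.decode(enc["input_ids"][0][start:end + 1]))
```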

FYI, several EMNLP-accepted papers were circulating this week ahead of the November conference. Before we go there, here’s a quick appetizer from the paper “Message Passing for Hyper-Relational Knowledge Graphs”, which compares the traditional knowledge triple with a hyper-relational graph.

[Figure from the paper: a traditional knowledge triple vs. a hyper-relational graph]

Use the Force LUKE (preprint not out yet 😥)


GNN Resources

I found this thread from Petar Veličković (DeepMind) highlighting top graph neural network resources. Enjoy:

NeurIPS Fun n’ Games:

This Week

Dialog Ranking Pretrained Transformers

TensorFlow Lite and NLP

Indonesian NLU Benchmark

CoDEx

RECOApy for Speech Preprocessing

Survey on the ‘X-Formers’

Dataset of the Week: ASSET

Dialog Ranking Pretrained Transformers

Another one accepted at EMNLP from Microsoft Research: using transformers (GPT-2) to figure out whether a reply to a comment is likely to get engagement or not. Pretty interesting, huh? Their dialog ranking models were trained on 133M pairs of human feedback data from Reddit.

So what does it really do? Here’s an example from their demo: for the statement “I love NLP!”, the reply “Here’s a free textbook (URL) in case anyone needs it.” is more likely to be up-voted than “Me too!” (meaning the former gets a higher ranking score).

Additionally, their Colab allows you to run several models at once to distinguish (a minimal scoring sketch follows the list):

updown... which gets more upvotes?

width... which gets more direct replies?

depth... which gets a longer follow-up thread?
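If you want to score replies outside the Colab, here is a minimal sketch using Hugging Face Transformers. The microsoft/DialogRPT-updown model id, the “<|endoftext|>” context/reply separator, and the sigmoid readout are assumptions based on the released checkpoints, so verify them against the repo:

```python
# Minimal sketch: rank two candidate replies to the same context with the
# "updown" head. Model id and input format are assumptions; see the repo.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "microsoft/DialogRPT-updown"  # assumed id; -width and -depth variants are analogous
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

def score(context: str, reply: str) -> float:
    # The ranker reads the pair as "context <|endoftext|> reply" and emits one logit.
    ids = tokenizer.encode(context + "<|endoftext|>" + reply, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids)[0]
    return torch.sigmoid(logits).item()

print(score("I love NLP!", "Here's a free textbook (URL) in case anyone needs it."))
print(score("I love NLP!", "Me too!"))
```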

Colab of the Week

Thank you to author Xiang Gao for forwarding this; you can also find it in the Super Duper Repo ✌…

GitHub:

Paper:

TensorFlow Lite and NLP

From their blog post this past week: TensorFlow Lite now has new NLP features, including new pre-trained NLP models and better support for converting TensorFlow NLP models to the TensorFlow Lite format.

FYI, their TF Lite Task Library has 3 NLP APIs:

  • NLClassifier: classifies the input text into a set of known categories.
  • BertNLClassifier: text classification optimized for BERT-family models.
  • BertQuestionAnswerer: answers questions based on the content of a given passage with BERT-family models.

Keep in mind these are models that run natively on the phone (i.e. no internet connection to a cloud server required).
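The post also highlights better conversion support for NLP models. As a rough, generic illustration (not code from the blog post itself), converting a SavedModel to TF Lite looks roughly like this; the model path is hypothetical:

```python
# Minimal sketch, assuming a TensorFlow NLP model saved in SavedModel format.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("my_nlp_model/")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
# Fall back to select TF ops for layers without a built-in TF Lite kernel.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```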

Indonesian NLU Benchmark

Check out the new Indonesian NLU benchmark. It includes a BERT-based model, IndoBERT, and its ALBERT-based alternative, IndoBERT-lite. In addition, the benchmark includes datasets for 12 downstream tasks covering single-sentence classification, single-sentence sequence tagging, sentence-pair classification, and sentence-pair sequence labeling.

And finally, a large corpus for language modeling containing 4 billion words (250M sentences)🔥🔥.
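If the checkpoints are published on the Hugging Face Hub, loading IndoBERT should look roughly like the sketch below; the indobenchmark/indobert-base-p1 model id is an assumption to verify against the benchmark’s repo:

```python
# Minimal sketch of loading IndoBERT with Hugging Face Transformers;
# the model id is an assumption to verify against the release.
import torch
from transformers import AutoTokenizer, AutoModel

name = "indobenchmark/indobert-base-p1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

inputs = tokenizer("Saya suka NLP!", return_tensors="pt")  # "I love NLP!"
with torch.no_grad():
    hidden = model(**inputs)[0]  # last hidden states, shape (1, seq_len, hidden_dim)
print(hidden.shape)
```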

Paper:

CoDEx

More from EMNLP 😎:

“CoDEx offers three rich knowledge graph datasets that contain positive and hard negative triples, entity types, entity and relation descriptions, and Wikipedia page extracts for entities.”

In addition, they also provide pretrained models that can be used with the LibKGE library for link prediction and triple classification tasks.

The total data dump contains 1,156,222 triples.
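For the pretrained models, a minimal sketch along the lines of LibKGE’s documented checkpoint-loading pattern could look like this; the checkpoint filename and the entity/relation indices are hypothetical, so check the CoDEx and LibKGE docs for the real names:

```python
# Minimal sketch following LibKGE's checkpoint-loading pattern; the checkpoint
# filename and the entity/relation indices are hypothetical.
import torch
from kge.model import KgeModel
from kge.util.io import load_checkpoint

checkpoint = load_checkpoint("codex-m-rescal.pt")  # hypothetical pretrained checkpoint
model = KgeModel.create_from(checkpoint)

# Link prediction: score all candidate objects for (subject, relation, ?) queries.
s = torch.tensor([0, 1], dtype=torch.long)  # subject entity indices
p = torch.tensor([0, 0], dtype=torch.long)  # relation indices
scores = model.score_sp(s, p)               # shape: (num_queries, num_entities)
print(torch.argmax(scores, dim=-1))         # highest-scoring object per query
```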

GitHub:

RECOApy for Speech Preprocessing

RECOApy is a new library that gives developers a UI for recording and phonetically transcribing data for speech apps, along with grapheme-to-phoneme conversion. Currently, the library supports transcription in eight languages: Czech, English, French, German, Italian, Polish, Romanian and Spanish.

GitHub:

Survey on the ‘X-Formers’

“X-formers” is the umbrella term the Google authors use for the wave of memory-efficient Transformer variants (e.g. Longformer and Reformer) that came on the scene in 2020. In this paper, the authors give a holistic view of these architectures, their techniques, and current trends.

Paper:

Dataset of the Week: ASSET

What is it?

A dataset for tuning and evaluating automatic sentence simplification models. ASSET consists of 23,590 human simplifications of the 2,359 original sentences from TurkCorpus (10 simplifications per sentence).
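If ASSET is mirrored on the Hugging Face Hub (an assumption, as are the dataset id, config, split, and field names below), loading it could look like this:

```python
# Minimal sketch, assuming an "asset" dataset id with a "simplification" config;
# verify the id, splits, and field names against the official release.
from datasets import load_dataset

asset = load_dataset("asset", "simplification", split="validation")
example = asset[0]
print(example["original"])         # source sentence from TurkCorpus
print(example["simplifications"])  # list of human-written simplifications
```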

Sample:

[Image: an original sentence alongside its human simplifications]

Where is it?

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter:
