Additional Ideas for Making Stronger NER Formatting Models
In the recent PyData Israel workshop, we learned how to use RNN and CNN networks with word embeddings to build a Named Entity Recognition (NER) model that automatically bolds, italicizes, and underlines your text.
If you didn’t catch the workshop, check out the amazing slides and repo by Uri Goren below.
Repo: urigoren/nlp_ner_workshop (github.com)
Now that we’ve trained our baseline model, here are some areas you can explore on your own time to improve it.
1. Replace Pretrained embeddings with Contextual Embeddings such as BERT or ELMo
The Big-&-Extending-Repository-of-Transformers: PyTorch pretrained models for Google's BERT, OpenAI GPT & GPT-2… (github.com)
2. Combine Embeddings with Character-Level CNNs or RNNs for handling unseen words
In a single gist, Andrej Karpathy did something truly impressive. In a little over 100 lines of Python - without… (eli.thegreenplace.net)
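To make the character-level idea concrete, here is a minimal, dependency-free sketch of a character CNN: a 1D convolution over character vectors followed by max-over-time pooling, concatenated onto the word embedding. The function name `char_cnn`, the dimensions, and the randomly initialized weights are all assumptions for illustration; in a real model these parameters are learned and you would use your deep learning framework's layers instead.

```python
import random

def char_cnn(word, char_vectors, filters, width=3):
    """Return one max-pooled convolution activation per filter for `word`."""
    dim = len(filters[0][0])
    zero = [0.0] * dim
    pad = [zero] * (width - 1)
    # Pad both ends so short words still produce at least one full window.
    chars = pad + [char_vectors.get(ch, zero) for ch in word] + pad
    features = []
    for filt in filters:
        activations = []
        for start in range(len(chars) - width + 1):
            window = chars[start:start + width]
            activations.append(sum(
                weight * value
                for filt_row, char_row in zip(filt, window)
                for weight, value in zip(filt_row, char_row)
            ))
        features.append(max(activations))  # max-over-time pooling
    return features

# Hypothetical, untrained parameters; a real model learns these during training.
rng = random.Random(0)
CHAR_DIM, NUM_FILTERS, WIDTH = 4, 5, 3
char_vectors = {ch: [rng.uniform(-1, 1) for _ in range(CHAR_DIM)]
                for ch in "abcdefghijklmnopqrstuvwxyz"}
filters = [[[rng.uniform(-1, 1) for _ in range(CHAR_DIM)] for _ in range(WIDTH)]
           for _ in range(NUM_FILTERS)]

# An out-of-vocabulary word gets a zero word vector but still has character features.
word_vector = [0.0] * 8
token_vector = word_vector + char_cnn("underlined", char_vectors, filters, WIDTH)
```

Because the character features depend only on spelling, a word that never appeared in the embedding vocabulary still gets a non-trivial representation.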
3. Combine Linguistic Features with your Embeddings
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency… (spacy.io)
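As a rough illustration of combining linguistic features with embeddings, the sketch below appends a few handcrafted orthographic flags to a token's vector. This is a stand-in, not spaCy itself: a library like spaCy can supply richer features such as POS tags, while the helpers `shape_features` and `add_features` and the 3-dimensional embedding here are hypothetical and kept dependency-free.

```python
import string

def shape_features(token):
    """Encode a few orthographic cues as a list of 0.0/1.0 flags."""
    return [
        float(token.istitle()),   # Capitalized, e.g. "Israel"
        float(token.isupper()),   # All caps, e.g. "NER"
        float(token.isdigit()),   # Digits only, e.g. "2019"
        float(any(ch in string.punctuation for ch in token)),  # e.g. "e-mail"
    ]

def add_features(embedding, token):
    """Concatenate the handcrafted flags onto the token's embedding."""
    return list(embedding) + shape_features(token)

# Hypothetical 3-dimensional embedding for illustration.
vector = add_features([0.12, -0.40, 0.33], "Israel")
```

The network's input dimension grows by the number of added flags, so the first layer of the model has to be sized accordingly.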
4. Add Self-Attention Mechanisms to your RNN
Based on Chiu and Nichols (2016), this implementation achieves an F1 score of 90%+ on CoNLL 2003 news data. (towardsdatascience.com)
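For intuition on self-attention, here is a minimal pure-Python sketch of scaled dot-product self-attention applied to a sequence of RNN hidden states. It is a simplification under stated assumptions: the hidden states are used directly as queries, keys, and values, whereas a real attention layer learns separate projection matrices, and the toy `hidden_states` values are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(hidden_states):
    """Scaled dot-product self-attention with queries = keys = values."""
    dim = len(hidden_states[0])
    contexts, all_weights = [], []
    for query in hidden_states:
        # Score this token against every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in hidden_states]
        weights = softmax(scores)
        # The context vector is the attention-weighted average of all states.
        context = [sum(w * value[i] for w, value in zip(weights, hidden_states))
                   for i in range(dim)]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights

# Toy hidden states for a three-token sentence (hypothetical values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
contexts, weights = self_attention(hidden_states)
```

Each context vector can then be concatenated with (or added to) the corresponding hidden state before the tagging layer, letting every token draw on similar tokens anywhere in the sentence.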
5. Add Beam Search To Your Decoder
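Beam search keeps the few best partial tag sequences at every step instead of committing to the single best tag per token. The sketch below assumes your model yields per-token tag log-probabilities and that you have some tag-to-tag transition scores; the function `beam_search`, the tag names, and all numbers are hypothetical.

```python
import math

def beam_search(emissions, transitions, beam_width=3):
    """Return the best-scoring tag sequence found with a fixed-width beam.

    emissions: one dict per token mapping tag -> log-probability.
    transitions: dict mapping (previous_tag, tag) -> log-score; missing pairs score 0.
    """
    beams = [([], 0.0)]  # (tags so far, cumulative log-score)
    for token_scores in emissions:
        candidates = []
        for tags, score in beams:
            prev = tags[-1] if tags else None
            for tag, emit in token_scores.items():
                trans = transitions.get((prev, tag), 0.0)
                candidates.append((tags + [tag], score + emit + trans))
        # Keep only the beam_width highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Hypothetical scores for a three-token sentence with tags O (plain) and B (bold).
emissions = [
    {"O": math.log(0.6), "B": math.log(0.4)},
    {"O": math.log(0.3), "B": math.log(0.7)},
    {"O": math.log(0.8), "B": math.log(0.2)},
]
transitions = {("O", "B"): math.log(0.5), ("B", "B"): math.log(0.9)}

best_tags, best_score = beam_search(emissions, transitions, beam_width=2)
greedy_tags, greedy_score = beam_search(emissions, transitions, beam_width=1)
```

With these toy numbers, a beam of width 2 recovers a higher-scoring sequence than the greedy decoder (width 1), because the initially weaker first tag leads to a better overall path.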
6. Try annotating more data
TLDR: This post walks through how to deploy Doccano on Azure Web Apps in order to collaboratively annotate text data… (towardsdatascience.com)
These should provide some great next steps for your journey into NLP.
Additionally, if the field interests you, check out the following posts:
As previously highlighted in my Beyond Word Embeddings Series, 2019 is going to be an exciting year for natural… (medium.com)
This series will review the pros and cons of word embeddings and demonstrate how to incorporate more complex semantic… (towardsdatascience.com)
A primer on neural NLP model architecture and word representation. (towardsdatascience.com)
If you have any questions, comments, or topics you would like me to discuss feel free to follow me on Twitter.
About the Author
Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies and computational medicine. As an Open Source Engineer at Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.