Additional Ideas for Making Stronger NER Formatting Models

In the recent PyData Israel workshop, we learned how to use RNNs and CNNs with word embeddings to build a Named Entity Recognition model that automatically bolds, italicizes, and underlines your text.

If you didn’t catch the workshop, check out the amazing slides and repo by Uri Goren below.

Now that we’ve trained our baseline model, here are some areas you can explore on your own time to improve it.

1. Replace Pretrained embeddings with Contextual Embeddings such as BERT or ELMo

2. Combine Embeddings with Character-Level CNNs or RNNs for handling unseen words

3. Combine Linguistic Features with your Embeddings

4. Add Self-Attention Mechanisms to your RNN

5. Add Beam Search To Your Decoder

6. Try annotating more data
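To make idea 3 a bit more concrete, here is a minimal sketch of hand-crafted token features (word shape, casing, affixes) that could sit alongside an embedding. The function names `word_shape`, `token_features`, and `feature_vector`, as well as the particular feature set, are illustrative assumptions and not part of the workshop code.

```python
import re

def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'PyData' -> 'XxXxx'."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"\d", "d", shape)
    # Squash runs of the same symbol to length two so long words share a shape.
    return re.sub(r"(.)\1+", r"\1\1", shape)

def token_features(token):
    """Hypothetical feature set; extend it with POS tags, gazetteers, etc."""
    return {
        "lower": token.lower(),
        "shape": word_shape(token),
        "is_title": token.istitle(),
        "is_upper": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "prefix3": token[:3].lower(),
        "suffix3": token[-3:].lower(),
    }

def feature_vector(token):
    """Numeric slice of the features, ready to concatenate with an embedding."""
    f = token_features(token)
    return [float(f["is_title"]), float(f["is_upper"]), float(f["has_digit"])]

# Assumed 4-dimensional embedding, made up purely for illustration.
embedding = [0.12, -0.40, 0.33, 0.05]
combined = embedding + feature_vector("BERT")  # 7 values fed to the tagger
```

Because these features depend only on the surface form, they are still available for words that never appeared in the embedding vocabulary, which is also the motivation behind idea 2.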

These should provide some great next steps for your journey into NLP.
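For idea 5, the following is a rough sketch of beam search over per-token tag scores. It assumes the model outputs a log-probability for each tag at each token and that a BIO-style tagging scheme is used; the tag names (`O`, `B-BOLD`, `I-BOLD`), the `bio_transition` rule, and the example scores are all hypothetical, so adapt them to whatever your decoder actually emits.

```python
import math

def beam_search(emissions, transition, beam_width=3):
    """Return (best tag sequence, cumulative log score).

    emissions: list of {tag: log_prob} dicts, one per token.
    transition: function (prev_tag, tag) -> log score; prev_tag is None at the start.
    """
    beams = [([], 0.0)]  # (tags so far, cumulative log score)
    for scores in emissions:
        candidates = []
        for tags, total in beams:
            prev = tags[-1] if tags else None
            for tag, log_p in scores.items():
                new_score = total + log_p + transition(prev, tag)
                candidates.append((tags + [tag], new_score))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

def bio_transition(prev, tag):
    """Assumed constraint: I-BOLD may only continue a B-BOLD or I-BOLD span."""
    if tag == "I-BOLD" and prev not in ("B-BOLD", "I-BOLD"):
        return -math.inf
    return 0.0

# Made-up scores for a two-token sentence.
emissions = [
    {"O": math.log(0.6), "B-BOLD": math.log(0.3), "I-BOLD": math.log(0.1)},
    {"O": math.log(0.2), "B-BOLD": math.log(0.1), "I-BOLD": math.log(0.7)},
]
best_tags, best_score = beam_search(emissions, bio_transition, beam_width=2)
```

In this toy example, picking the top tag per token independently would yield `O` followed by `I-BOLD`, which the assumed constraint forbids. Keeping two hypotheses alive lets the decoder recover the valid `B-BOLD`, `I-BOLD` sequence instead.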

Additionally, if the field interests you, check out the following posts:

If you have any questions, comments, or topics you would like me to discuss, feel free to follow me on Twitter.

About the Author
Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.