Writing Code for Natural Language Processing Research

Hady Elsahar
Nov 5, 2018 · 9 min read

“Code is available upon request”, “the authors promise…”, a repo that only contains model.py… we all know the story, no need to explain further.

Beeeeeeeeeeeeeeeeep….

On the first day of #EMNLP2018, Joel Grus, Matt Gardner, and Mark Neumann from the Allen Institute for AI presented what was arguably the tutorial that attracted the most interest of all.
I like this type of tutorial, which discusses open issues in the research community, much more than a catwalk of SOTA models for a specific task. A similar tutorial was given at IJCNLP 2017 on how to make a good poster and how to give an understandable presentation.

In this tutorial, they summarize their best practices and lessons learned while writing code for NLP research and developing the AllenNLP toolkit.

Disclaimer: I am trying here to quote Joel, Matt, and Mark as much as possible. There are a few points where I may not be entirely accurate (sorry for that). Most importantly, there are a few other points that I don't entirely agree with, yet I see their point, especially the advice in the first part of the tutorial about code duplication when prototyping. But for now I leave my opinions aside.

Start:


Tutorial Part 1: Prototyping

1) Writing code quickly

  • Get a baseline running first: start from a reusable, modular library (AllenNLP, Fairseq, Sockeye, Tensor2Tensor…) or from a paper implementation whose code is easy to read and run (good luck finding that…).
  • Sometimes, however, you need to start from scratch if nothing out there fits your needs.
  • In this phase, don't over-engineer your code and DON'T try to reduce code duplication: copy your code first and refactor later. This guarantees that you will have something running quickly.

This still obliges you to write readable code that you can understand and maintain. Here are some useful tips for that:

  • Write SHAPE COMMENTS on tensors for easier debugging later (see the sketch below).
  • Write code for people, not machines: write long comments describing any non-obvious logic.
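A minimal sketch of what shape comments look like in practice (the dimensions here are hypothetical, not from the tutorial slides):

```python
import torch

embeddings = torch.randn(32, 20, 128)   # (batch_size, seq_len, hidden_dim)
mask = torch.ones(32, 20)                # (batch_size, seq_len)

# Masked mean over the sequence dimension.
summed = (embeddings * mask.unsqueeze(-1)).sum(dim=1)   # (batch_size, hidden_dim)
lengths = mask.sum(dim=1, keepdim=True)                  # (batch_size, 1)
sentence_repr = summed / lengths                          # (batch_size, hidden_dim)
```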

Minimal Testing:

  • Do minimal testing (but not zero testing): “If you write tests that check experimental behavior, this is a waste of time, because that behavior is likely to change later.”
  • However, do write tests for data preparation code. It is used to preprocess data and generate batches, and it is independent of model adjustments, so it is worth checking that it works correctly (e.g. write tests to make sure the code for reading and evaluating on the SQuAD dataset behaves as expected); see the sketch after this list.
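A minimal sketch of this idea (not from the tutorial): test data-prep code such as a simple batching helper rather than experimental model behavior.

```python
def batch_examples(examples, batch_size):
    """Group a list of examples into batches of at most `batch_size`."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

def test_batch_examples():
    examples = list(range(10))
    batches = batch_examples(examples, batch_size=4)
    # Every batch has the expected size, except possibly the last one.
    assert [len(b) for b in batches] == [4, 4, 2]
    # No example is dropped or duplicated.
    assert [x for b in batches for x in b] == examples
```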

Hard-code only the parts you are not focusing on.

  • This makes controlled experiments much easier (for you, and for people reusing your code later (if any…)); see the sketch below.
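A minimal sketch of the idea, with hypothetical names and values: expose what you are studying as parameters and hard-code what you are not.

```python
EMBEDDING_DIM = 300   # hard-coded: not the focus of this experiment
MAX_SEQ_LEN = 128     # hard-coded: fixed across all runs

def build_encoder(encoder_type: str, dropout: float):
    """Only the parts under study (encoder type, dropout) are configurable."""
    ...
```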

2) Running experiments:

  • The easiest way to do that is to put your experiments' results in a spreadsheet.
    Note which version of the code was used to run which experiment, using version control (see the sketch below).
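One way to tie results to code versions is to record the current git commit next to each result row; a minimal sketch (the CSV layout is just an example):

```python
import csv
import subprocess

def current_commit() -> str:
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()

def log_result(path: str, experiment: str, metric: float) -> None:
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([experiment, metric, current_commit()])

# Usage:
# log_result("results.csv", "lstm_baseline", 0.842)
```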

Controlled experiments: test only one thing at a time (do ablation tests)

  • Don’t run experiments with many moving parts.
  • Change one thing at a time while keeping everything else constant. This is important to control experiments and to show what actually caused the performance improvements.

How to write controlled experiments:

  • Make everything controllable a parameter of the model.
  • Load those parameters from a configuration file or a run script (see the sketch below).
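A minimal sketch of loading model hyperparameters from a configuration file, so each experiment changes exactly one thing (the JSON layout and names here are hypothetical):

```python
import json
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, vocab_size: int, embedding_dim: int, hidden_dim: int, dropout: float):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.encoder = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.out = nn.Linear(hidden_dim, 2)

def build_model(config_path: str) -> Classifier:
    with open(config_path) as f:
        cfg = json.load(f)
    # e.g. {"model": {"vocab_size": 20000, "embedding_dim": 300, "hidden_dim": 128, "dropout": 0.2}}
    return Classifier(**cfg["model"])
```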

3) Analyzing Model Performance

Here is a list of some useful metrics to visualize (see the sketch below):
1) Loss, accuracy
2) Gradients: mean, std, actual update values
3) Parameters: mean, std
4) Activations: log problematic activations
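A minimal sketch of logging gradient and parameter statistics with TensorBoard (`torch.utils.tensorboard`); the tag names and log directory are arbitrary:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/debug")

def log_stats(model, step: int) -> None:
    for name, param in model.named_parameters():
        writer.add_scalar(f"param_mean/{name}", param.data.mean().item(), step)
        writer.add_scalar(f"param_std/{name}", param.data.std().item(), step)
        if param.grad is not None:
            writer.add_scalar(f"grad_mean/{name}", param.grad.mean().item(), step)
            writer.add_scalar(f"grad_std/{name}", param.grad.std().item(), step)

# Call log_stats(model, step) after loss.backward() in the training loop.
```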

Look at your data:

  • Don't rely on print statements at the end of training.
  • Instead, save model checkpoints, then write a script that, given a checkpoint, runs some queries against your model (see the sketch after this list).
  • It is even better to put that in a web demo: this makes it a lot easier to debug the model and interact with it visually. Moreover, you can show some of the model's internals (e.g. the attention matrix) alongside each example in your web demo.
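A minimal sketch of a "query the checkpoint" script; `build_model`, `encode`, and `decode_prediction` are hypothetical helpers that would come from your own project:

```python
import sys
import torch

def main(checkpoint_path: str) -> None:
    model = build_model("config.json")                     # hypothetical helper
    model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    model.eval()
    for line in sys.stdin:                                  # type queries interactively
        inputs = encode(line.strip())                       # hypothetical helper
        with torch.no_grad():
            output = model(inputs)
        print(decode_prediction(output))                    # hypothetical helper

if __name__ == "__main__":
    main(sys.argv[1])
```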

Build your data processing so that you can read from a file, but also so that your models can run without labels, i.e. the model doesn't crash when it cannot compute a loss. That way the same code runs for training and for the demo; see the sketch below.
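A minimal sketch of a forward pass with optional labels, so the same model code serves both training and a label-free demo (shapes and names are hypothetical):

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_labels)

    def forward(self, encodings: torch.Tensor, labels: torch.Tensor = None) -> dict:
        logits = self.proj(encodings)                 # (batch, num_labels)
        output = {"logits": logits, "predictions": logits.argmax(dim=-1)}
        if labels is not None:                        # only compute the loss when labels exist
            output["loss"] = nn.functional.cross_entropy(logits, labels)
        return output
```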


Tutorial Part 2: Developing Good Processes


The second part of the tutorial was about developing a good process around your experiments so they are re-runnable anywhere.

  1. Use source control: “Hopefully you do that already.”
  2. Code reviews:
    * Find bugs for you.
    * Force you to make your code readable.
    * Clear code lets the review become a discussion of the model itself, not of the code.
    * Prevent publishing code and only later finding bugs in it that make your results incorrect. This happens, by the way, and can force you to retract an accepted paper; check this out.
  3. Continuous integration (build automation)
  • Continuous integration: always be merging into a branch.
  • Build automation: always be running the tests on each merge.

4. Testing your code (revisiting testing)

  • A unit test is an automated check that a part of your code works correctly.
  • If you are prototyping, what should you write tests for?
    * Test the basics: the forward pass and things that don't rely directly on experimental results.
    * e.g. assert that the batch has the size you expect.
    * e.g. all the words in the batch are in the vocabulary.
  • If you are making a reusable library, what should you write tests for? EVERYTHING:
    * that models can train, save, and load;
    * that backpropagation actually computes gradients.
  • Test fixtures: running tests on large datasets every time you merge is big and slow. Instead, keep a tiny amount of data in the repo and run the tests on that.
  • Use your knowledge to write clever tests: attention is hard to test because it depends on learned parameters, but you can hack the test: if you set all attention weights to be equal, the output should equal the average of the input vectors (see the sketch below).
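A minimal sketch of the "uniform attention equals averaging" test idea (the pooling function here is a stand-in, not the tutorial's code):

```python
import torch

def attention_pool(values: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Weighted sum of `values` (batch, seq_len, dim) with `weights` (batch, seq_len)."""
    return (values * weights.unsqueeze(-1)).sum(dim=1)

def test_uniform_attention_is_averaging():
    values = torch.randn(4, 7, 16)             # (batch, seq_len, dim)
    uniform = torch.full((4, 7), 1.0 / 7)       # all attention weights equal
    pooled = attention_pool(values, uniform)
    assert torch.allclose(pooled, values.mean(dim=1), atol=1e-6)
```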

Tutorial Part 3: Writing code for Reusable Components

Some abstractions (an abstraction here simply means a generic class for a family of components) have proven useful, but some haven't.

So you don't have to abstract everything; rather, find the right amount of abstraction as a compromise between reusability and the time spent writing abstractions. Abstractions pay off for components that:
* Will be reused a lot: e.g. training a model, mapping words to ids, summarizing a sequence into a single tensor.
* Have many variations: turning a character or a word into a tensor, turning a tensor into a sequence of tensors, summarizing a sequence of tensors into a single tensor (attention, averaging, concatenation, sum, etc.).
* Reflect our higher-level thinking (text, tags, labels, …); this allows you to keep the model itself as abstract as possible.

Best practices from AllenNLP

  • Models are extensions of torch.nn.Module, i.e. they reuse the model abstraction from PyTorch.
    * Vocabulary: all_tokens, token2id, id2token, etc.
    * Instances: the examples in the dataset, used to create the vocabulary.
    * Instances contain Fields (source text, target text, etc.); see the simplified sketch below.
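A minimal, plain-Python sketch of the Vocabulary / Instance / Field idea (a simplification, not the actual AllenNLP API):

```python
class Vocabulary:
    def __init__(self):
        self.token2id, self.id2token = {}, []

    def add(self, token: str) -> int:
        if token not in self.token2id:
            self.token2id[token] = len(self.id2token)
            self.id2token.append(token)
        return self.token2id[token]

class Instance:
    """One example, made of named fields (e.g. source text, target text, label)."""
    def __init__(self, fields: dict):
        self.fields = fields

# Building the vocabulary from all instances:
instances = [Instance({"text": ["a", "great", "movie"], "label": "pos"})]
vocab = Vocabulary()
for inst in instances:
    for tok in inst.fields["text"]:
        vocab.add(tok)
```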

and some more abstractions…
* TokenIndexer
* DatasetReader
* DataIterator: the basic iterator shuffles batches
* BucketIterator: groups instances with similar lengths (see the sketch below)
* Tokenizer
* TokenEmbedder
* Two different abstractions for RNNs (Seq2Seq, Seq2Vec)
* Attention
* MatrixAttention
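A minimal sketch of the BucketIterator idea (not the AllenNLP implementation): sort instances by length so each batch contains similarly sized sequences, which minimizes padding.

```python
import random

def bucket_batches(instances, batch_size, length_key=len):
    ordered = sorted(instances, key=length_key)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.shuffle(batches)          # shuffle batches, not individual instances
    return batches

# Usage with token lists of varying length:
# batches = bucket_batches(tokenized_sentences, batch_size=32)
```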

Declarative Syntax:

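A minimal sketch of the declarative-configuration idea: components are named in a config by a "type" key and built from a registry. This is a simplification, not AllenNLP's actual configuration machinery.

```python
REGISTRY = {}

def register(name):
    def wrapper(cls):
        REGISTRY[name] = cls
        return cls
    return wrapper

@register("boe")
class BagOfEmbeddingsEncoder:
    def __init__(self, embedding_dim: int):
        self.embedding_dim = embedding_dim

def build(config: dict):
    kwargs = dict(config)
    cls = REGISTRY[kwargs.pop("type")]   # look the component up by name
    return cls(**kwargs)

encoder = build({"type": "boe", "embedding_dim": 300})
```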

Model archives:

A model archive can be used to evaluate on the test set or to build a demo.

Creating demos:

There is still a lot that hasn't been figured out yet in AllenNLP.

Joel ends his part of the tutorial by walking through a use case that shows the differences between building models from scratch in NumPy (don't do that), in PyTorch, and in AllenNLP.

Tutorial Part 4: How to share your code


Use Docker containers:

There's a quick Docker tutorial in the slides with all the important commands you will probably need.

Releasing your code:

Use file cache:

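A minimal sketch of the file-cache idea (in the spirit of AllenNLP's cached_path, but not its implementation): download a URL once, keyed by its hash, and reuse the local copy afterwards. The cache location is hypothetical.

```python
import hashlib
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.my_project_cache")   # hypothetical location

def cached_path(url: str) -> str:
    os.makedirs(CACHE_DIR, exist_ok=True)
    local = os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())
    if not os.path.exists(local):            # only download on a cache miss
        urllib.request.urlretrieve(url, local)
    return local

# embeddings_file = cached_path("https://example.com/glove.6B.zip")
```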

Use a Python environment: for this you can rely on virtualenv or Anaconda. Create a new virtual environment for each project and export it to a requirements.txt file, which anybody reusing your code can install from.

The tutorial ends here..

Oh, speaking of slides, here you are…
