ML Resources — July 5

Suhas Pai
Aggregate Intellect
2 min read · Jul 6, 2021

Tsimpoukelli et al. show a way to adapt few-shot learning from language models to the multimodal setting, enabling strong few-shot/zero-shot performance on tasks like visual question answering. They introduce a method called Frozen, in which images are encoded into the word embedding space of a large pre-trained language model and the model learns to generate image captions. While the weights of the language model remain frozen, gradients are propagated through it so that the image encoder is trained.
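A minimal PyTorch sketch of the idea, assuming GPT-2 stands in for the paper's much larger frozen language model and a small CNN stands in for its NF-ResNet image encoder (all names here are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class FrozenStyleCaptioner(nn.Module):
    def __init__(self, n_prefix=2):
        super().__init__()
        self.lm = GPT2LMHeadModel.from_pretrained("gpt2")
        for p in self.lm.parameters():
            p.requires_grad = False  # the language model stays frozen
        d_model = self.lm.config.n_embd
        self.n_prefix = n_prefix
        # Toy encoder: maps an image to n_prefix vectors living in the
        # LM's word-embedding space (the "visual prefix").
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_prefix * d_model),
        )

    def forward(self, images, caption_ids):
        b = images.size(0)
        prefix = self.image_encoder(images).view(b, self.n_prefix, -1)
        tok_emb = self.lm.transformer.wte(caption_ids)  # frozen embeddings
        inputs = torch.cat([prefix, tok_emb], dim=1)
        # Compute the captioning loss only on the text positions.
        labels = torch.cat(
            [caption_ids.new_full((b, self.n_prefix), -100), caption_ids],
            dim=1,
        )
        return self.lm(inputs_embeds=inputs, labels=labels).loss
```

Calling `.backward()` on this loss updates only the image encoder: the frozen transformer still passes gradients through to the visual prefix.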

Muralidhar et al. present a series of anti-patterns in modern MLOps practices in the financial domain. Here is a summary of their anti-pattern list:

  1. Peek-A-Boo Anti-Pattern — When a dataset's date of availability lags the date at which the data was actually measured.
  2. Temporal Leakage Anti-Pattern — When train-test splits in a forecasting task are not done sequentially.
  3. Oversampling Leakage Anti-Pattern — When minority class oversampling is performed before splitting into training and test sets.
  4. Hyperparameter Leakage Anti-Pattern — For example, when training and test sets are normalized together, thus leaking the statistics of the test set.

More anti-patterns are detailed in the paper.
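The three leakage patterns above are straightforward to avoid in code. A minimal scikit-learn sketch (assuming imbalanced-learn is available for the oversampling step):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

X, y = np.random.randn(1000, 8), np.random.randint(0, 2, 1000)

# Temporal Leakage: in a forecasting task, split sequentially so the
# test fold always comes after the training fold in time.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    X_tr, X_te, y_tr, y_te = X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Oversampling Leakage: split first, then oversample only the training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_tr, y_tr = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

# Normalization leakage: fit the scaler's statistics on the training set alone.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
```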

Prompt engineering in few-shot learning settings is non-trivial. Logan IV et al. demonstrate an efficient way to perform prompt-based fine-tuning, which includes using null prompts (the input followed directly by the mask token, with no hand-written pattern text) and fine-tuning only the bias parameters of the model.
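A rough sketch of both ideas with Hugging Face transformers, assuming a RoBERTa model (the verbalizer words "great"/"terrible" are illustrative):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Bias-only tuning: freeze every parameter except the bias terms.
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

# A null prompt: the input followed directly by the mask token,
# with no hand-engineered pattern text around it.
text = "a gripping, beautifully shot film"
inputs = tokenizer(f"{text} {tokenizer.mask_token}", return_tensors="pt")
logits = model(**inputs).logits

# Score the candidate label words at the mask position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
great_id = tokenizer(" great", add_special_tokens=False)["input_ids"][0]
terrible_id = tokenizer(" terrible", add_special_tokens=False)["input_ids"][0]
print(logits[0, mask_pos, [great_id, terrible_id]])
```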

When evaluating a language model, experiments are typically based on a single checkpoint of a pre-trained model. However, the model's performance can vary with the weight initialization and the order in which training data was seen. Sellam et al. have released a set of 25 BERT-base checkpoints, all trained with the same hyperparameters but differing in initialization and data shuffling.
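A sketch of how one might use them, assuming the checkpoints are mirrored on the Hugging Face Hub under names like `google/multiberts-seed_0` and that they share the standard BERT uncased vocabulary:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tok("the movie was [MASK] good", return_tensors="pt")

losses = []
for seed in range(25):
    model = BertForMaskedLM.from_pretrained(f"google/multiberts-seed_{seed}")
    with torch.no_grad():
        # Toy stand-in for a real task metric: MLM loss on one sentence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    losses.append(loss.item())

losses = torch.tensor(losses)
# Report the spread across seeds rather than a single point estimate.
print(f"loss across seeds: {losses.mean():.3f} +/- {losses.std():.3f}")
```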

Few-shot learning in language models is highly unstable. Zhao et al. show that the instability is due to majority label bias, recency bias, and common token bias. To counteract this, they propose calibrating the model's predictions: measure the model's bias towards each answer on a content-free input like 'N/A' or the empty string, then fit a correction to the output probabilities so that each answer becomes equally likely for that input.
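A minimal numpy sketch of that calibration step; the diagonal correction below is the form the paper instantiates, and the probabilities here are made up:

```python
import numpy as np

def fit_calibration(p_cf):
    """Given label probabilities for a content-free input (e.g. 'N/A'),
    build W = diag(1 / p_cf) so that input maps to a uniform distribution."""
    return np.diag(1.0 / np.asarray(p_cf))

def calibrated(p, W):
    q = W @ np.asarray(p)
    return q / q.sum()  # renormalize the corrected scores

p_cf = np.array([0.7, 0.3])        # model is biased towards the first label
W = fit_calibration(p_cf)
print(calibrated(p_cf, W))         # -> [0.5 0.5]: the bias is removed
print(calibrated([0.6, 0.4], W))   # a real prediction, re-weighted
```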

Aggregate Intellect

Aggregate Intellect is a Global Marketplace where ML Developers Connect, Collaborate, and Build. Connect with peers & experts at https://ai.science or join our Slack Community.

  • Check out the user generated Recipes that provide step-by-step, bite-sized guides on how to do various tasks
  • Join our ML Product Challenges to build AI-based products for a chance to win cash prizes
  • Connect with peers & experts through the ML Discussion Groups or Expert Office Hours
