We upgrade to relative positions, present a bi-directional relative encoding, and discuss the pros and cons of letting the model learn it all for you


This is Part II of the two-part series “Master Positional Encoding.” If you would like to know more about the intuition and basics of positional encoding, please see my first post.

Whereas the first article discussed the meaning of the fixed sinusoidal absolute positional encodings, this article will focus on…
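As a taste of what the relative scheme involves, here is a minimal NumPy sketch of a clipped relative-position index matrix in the spirit of Shaw et al.; the clipping and indexing choices below are illustrative assumptions, not the article’s exact code:

    import numpy as np

    def relative_position_matrix(seq_len: int, max_distance: int):
        """Return a (seq_len, seq_len) matrix of clipped relative distances.

        Entry (i, j) is j - i, clipped to [-max_distance, max_distance] and
        shifted to be a non-negative index into a learned embedding table.
        """
        positions = np.arange(seq_len)
        rel = positions[None, :] - positions[:, None]    # rel[i, j] = j - i
        rel = np.clip(rel, -max_distance, max_distance)  # bi-directional clipping
        return rel + max_distance                        # indices in [0, 2 * max_distance]

    print(relative_position_matrix(seq_len=4, max_distance=2))

Each entry can then index a learned relative-position embedding, so attention depends on how far apart two tokens are rather than where each sits absolutely.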


Hands-on Tutorials

We present a “derivation” of the fixed positional encoding that powers Transformers, helping you get a full intuitive understanding.


This is Part I of two posts on positional encoding (UPDATE: Part II is now available!):

  • Part I: the intuition and “derivation” of the fixed sinusoidal positional encoding (sketched below).
  • Part II: how do we, and how should we, actually inject positional information into an attention model (or any other model…
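For the impatient, the encoding Part I derives is the standard sinusoidal scheme from “Attention Is All You Need”; a minimal NumPy sketch (assuming an even d_model):

    import numpy as np

    def sinusoidal_encoding(max_len: int, d_model: int):
        """Fixed sinusoidal positional encoding (d_model assumed even).

        PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
        PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
        """
        positions = np.arange(max_len)[:, None]   # shape (max_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]  # shape (1, d_model // 2)
        angles = positions / (10000 ** (dims / d_model))
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)              # even dimensions get sine
        pe[:, 1::2] = np.cos(angles)              # odd dimensions get cosine
        return pe

    print(sinusoidal_encoding(max_len=6, d_model=8).shape)  # (6, 8)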


Hands-on Tutorials

For anyone who’s completely lost on how graphs can be used to compute derivatives, or who just wants to know how TensorFlow works at a fundamental level, this is your guide.


Deep learning is so popular and so widespread these days that it’s easy to ask, “Why did it take…
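As a preview of the punchline, here is TensorFlow 2’s public autodiff API recording a tiny graph and walking it backwards; this is standard tf.GradientTape usage, not code from the article:

    import tensorflow as tf

    x = tf.Variable(3.0)

    # The tape records every op applied to watched variables, building the
    # computation graph that reverse-mode autodiff later traverses backwards.
    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x  # y = x^2 + 2x

    # dy/dx = 2x + 2, so at x = 3 the gradient is 8
    print(tape.gradient(y, x).numpy())  # 8.0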


Making Sense of Big Data

A detailed guide to how a simple GPU-optimized layer substitution can offer 2x-10x speedups, with little to no loss in performance

Adapt your softmax like this camel adapted to the desert!

The goal of this post is to explain and provide a TensorFlow 2.0+ implementation of the adaptive softmax, outlined in Reference [1]:

Just by switching your softmax to an adaptive softmax, you can easily achieve anywhere from 2x-10x speedups in both training and inference. …
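The post builds the TensorFlow 2 version from scratch; purely for a sense of the interface, PyTorch ships the same idea as a built-in, nn.AdaptiveLogSoftmaxWithLoss (the vocabulary size and cutoffs below are arbitrary example values, not the article’s):

    import torch
    import torch.nn as nn

    vocab_size, hidden_dim = 50_000, 512

    # Frequent words live in a small, full-dimensional "head" cluster; rare
    # words are pushed into cheaper tail clusters, which is the source of
    # the speedup. The cutoffs here are illustrative assumptions.
    adaptive = nn.AdaptiveLogSoftmaxWithLoss(
        in_features=hidden_dim,
        n_classes=vocab_size,
        cutoffs=[2_000, 10_000],
    )

    hidden = torch.randn(32, hidden_dim)           # a batch of hidden states
    targets = torch.randint(0, vocab_size, (32,))  # target word ids
    output, loss = adaptive(hidden, targets)
    print(loss.item())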


Hands-on Tutorials

We offer a hands-on guide to the art of tuning locality-sensitive hashing, with applications to the tasks of document comparison and vector similarity.


For an accompanying guided Jupyter notebook, with explanations and ready-to-run code, please see the following repository:

Inspiration for this post was drawn from the excellent book “Mining of Massive Datasets” by Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman, available for free online at:

To motivate you…
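To give a flavor of what there is to tune, here is a from-scratch sketch of the MinHash banding trick at the heart of LSH; the hash construction, band counts, and toy documents are all illustrative assumptions:

    import random
    from collections import defaultdict

    random.seed(0)
    NUM_HASHES, BANDS = 20, 10  # rows per band r = NUM_HASHES // BANDS = 2
    MASKS = [random.getrandbits(64) for _ in range(NUM_HASHES)]

    def minhash_signature(shingles):
        """One min per hash function; the agreement rate estimates Jaccard similarity."""
        return tuple(min(hash(s) ^ m for s in shingles) for m in MASKS)

    def lsh_buckets(docs):
        """Documents agreeing on every row of any single band share a bucket."""
        buckets = defaultdict(set)
        r = NUM_HASHES // BANDS
        for name, shingles in docs.items():
            sig = minhash_signature(shingles)
            for b in range(BANDS):
                buckets[(b, sig[b * r:(b + 1) * r])].add(name)
        return buckets

    docs = {
        "a": {"the cat sat", "cat sat on", "sat on the", "on the mat"},
        "b": {"the cat sat", "cat sat on", "sat on the", "on the rug"},
        "c": {"dogs run fast", "run fast now", "fast now ok"},
    }
    pairs = {tuple(sorted(names)) for names in lsh_buckets(docs).values() if len(names) > 1}
    print(pairs)  # near-duplicates "a" and "b" should almost surely appear; "c" should not

More rows per band makes collisions stricter (fewer false positives); more bands makes them more permissive (fewer false negatives). That trade-off is exactly what gets tuned.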


Getting Started

An in-depth dive into the inner workings of the SentencePiece tokenizer, why it’s so powerful, and why it should be your go-to tokenizer. You might just start caring about tokenization.


I’ll be the first to admit that learning about tokenization schemes can be boring, if not downright painful. Often, when training a natural language model, choosing and implementing a tokenization scheme is just one more added layer of complexity. It can complicate your production pipelines, kill the mood with a poorly…
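For the curious, here is the standard sentencepiece Python API end to end; the corpus file name and vocabulary size are placeholder choices, not values from the article:

    import sentencepiece as spm

    # Train a small unigram model on a plain-text corpus (one sentence per line).
    # "corpus.txt" and vocab_size=8000 are placeholder assumptions.
    spm.SentencePieceTrainer.train(
        input="corpus.txt",
        model_prefix="tokenizer",
        vocab_size=8000,
        model_type="unigram",  # SentencePiece's default algorithm
    )

    sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
    pieces = sp.encode("Tokenization can be painless.", out_type=str)  # subword strings
    ids = sp.encode("Tokenization can be painless.", out_type=int)     # vocabulary ids
    print(pieces)
    print(sp.decode(ids))  # losslessly round-trips back to the original text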

Jonathan Kernes

I’m a physics PhD recently taken over by the allure of AI. When I’m not at the computer you might catch me listening to hip hop or watching basketball.
