In the previous article, we discussed the optimizations and heuristics used by two models, Sparse Transformers and Longformer, to overcome the quadratic time and space complexity (in sequence length) of Transformer models. With the optimizations described in the papers for these models, the authors could not only match the performance of standard Transformers in less space and time…