KantzulingMulti-head vs Multi-query vs Grouped-query attentionTransformer is a model architecture introduces by Attention Is All You Need, it is now widely used in large language models (LLMs). As you…May 18May 18
KantzulingCPython GILCPython interpreter has a global interperter lock (GIL), which allows only one thread to run at a time. GIL can potentially be a…May 11May 11