Bowen Li

Make choices on model parameters based on the Llama 3.1 and Mistral Large 2 news
Model size isn't everything. (Jul 31)
Quantization tech of LLMs: GGUF
We can use GGUF to offload any layer of the LLM to the CPU, which lets us use both the CPU and GPU when we don't have enough VRAM. (Jul 29)
Quantization tech of LLMs
Learning notes and practice with LLM quantization techniques. (Jul 28)
The three-pass approach to reading a paper
How to efficiently find the papers most related to your work. (Jul 13)