Pinned · Published in TDS Archive · Run Mixtral-8x7B on Consumer Hardware with Expert Offloading · Finding the right trade-off between memory usage and inference speed · Jan 11, 2024
Published in Data Science Collective · Qwen3-VL Fine-Tuning on Your Computer · Model review, GPU requirements, and code explained step by step · Oct 23
LoRA Done Right: Recommendations for Near Full Fine-Tuning Performance · Care about the learning rate, not the alpha, and rank=1 for RL?? · Oct 1
Published in Data Science Collective · NVFP4: Same Accuracy with 2.3x Higher Throughput for 4-Bit LLMs · How to quantize LLMs with NVFP4 · Aug 27
RAG with Qwen3 Embedding and Qwen3 Reranker · How to use embedding and reranker models to efficiently retrieve only the most relevant chunks or documents given a user query · Jun 26
No Verifier? No Problem: Reinforcement Learning with Reference Probabilities · RLPR: Extrapolating RLVR to General Domains without Verifiers · Jun 25