Pinned | Published in Towards Data Science | How to Evaluate Multilingual LLMs With Global-MMLU | Evaluation of language-specific LLM accuracy on the global Massive Multitask Language Understanding benchmark in Python | 3d ago
Pinned | Published in Towards Data Science | Improved RAG Document Processing With Markdown | How to read and convert PDFs to Markdown for better RAG results with LLMs | Nov 196
Pinned | Published in Towards Data Science | How to Use Hybrid Search for Better LLM RAG Retrieval | Building an advanced local LLM RAG pipeline by combining dense embeddings with BM25 | Aug 114
Pinned | Published in Towards Data Science | How to Use Re-Ranking for Better LLM RAG Retrieval | Building an advanced local LLM RAG pipeline with two-step retrieval using open-source bi-encoders and cross-encoders | May 25
Published in Towards Data Science | How to Create a RAG Evaluation Dataset From Documents | Automatically create domain-specific datasets in any language using LLMs | Nov 36
Published in Towards Data Science | Revisiting Karpathy’s “State of Computer Vision and AI” | Looking back at AI progress since the 2012 blog post “The state of Computer Vision and AI: we are really, really far away” | Oct 189
Published in Towards Data Science | How to Use HyDE for Better LLM RAG Retrieval | Building an advanced local LLM RAG pipeline with hypothetical document embeddings | Oct 41
Published in Towards Data Science | How to Improve LLM Responses With Better Sampling Parameters | A deep dive into stochastic decoding with temperature, top_p, top_k, and min_p | Sep 27
Published in Towards Data Science | How to Reduce Embedding Size and Increase RAG Retrieval Speed | Flexible text embedding with Matryoshka Representation Learning (MRL) | May 26
Published in Towards Data Science | Safeguard Your LLM Chatbot With Llama Guard 2 | How to apply content moderation to your LLM’s inputs and outputs for a more responsible AI system | May 13