Run Mxbai Rerank v2 with Infinity (published in Towards AI, Mar 18)
Achieve SOTA quality and 4x faster re-ranking with a simple proxy, enhancing LLM accuracy in your applications.
Ollama with API Key & LiteLLM Proxy (Feb 11)
Ollama is a popular application for serving LLM models locally. Unfortunately, it doesn't support setting an API key, so if you…
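One common workaround the teaser hints at is putting LiteLLM's proxy in front of Ollama, since the proxy can enforce a key. A minimal config sketch follows; the model name, port, and key value are illustrative assumptions, not the article's exact setup:

```yaml
# config.yaml for the LiteLLM proxy (start with: litellm --config config.yaml)
model_list:
  - model_name: llama3            # name clients will request (assumption)
    litellm_params:
      model: ollama/llama3        # route to the local Ollama instance
      api_base: http://localhost:11434

general_settings:
  master_key: sk-change-me        # clients must now send this as a Bearer token
```

With this in place, requests to the proxy without `Authorization: Bearer sk-change-me` are rejected, while Ollama itself stays unauthenticated behind it.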
Serve text-embeddings-inference (Jan 21)
How to run Hugging Face Text Embeddings Inference (TEI) to serve embedding and reranking models.
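TEI is typically started from its official container image. A minimal sketch, assuming the CPU image and a small open embedding model (the model ID, tag, and port are placeholders, not necessarily what the article uses):

```shell
# Run TEI on port 8080, caching model weights in ./data (paths are illustrative)
docker run --rm -p 8080:80 -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id BAAI/bge-small-en-v1.5

# Query the /embed endpoint once the server is up
curl -s http://localhost:8080/embed \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is vector search?"}'
```

Reranking models are served the same way by passing a reranker as `--model-id` and querying the `/rerank` endpoint instead.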
Infinity serving ModernBERT model (Jan 7)
Not long ago I discovered a great project, Infinity, to run embedding and reranking models locally. Fortunately, these models are typically…
ArgoCD App with OCI Helm Repo and Kustomize (Dec 17, 2024)
Deploy ArgoCD with Kustomize and Helm enabled. Create an OCI repository with a Secret manifest. Full deployment of an external-dns example.
Wren AI in Kubernetes: Text-to-SQL (published in Wren AI, Jul 11, 2024)
This is a community-contributed blog post from Damien Berezenko.
WrenAI Text-to-SQL: API — the good stuff (Jun 25, 2024)
The WrenAI application translates natural-language text to SQL to query structured data via chat. It features both a UI and an API, free and simple to use.
The easiest way to convert a model to GGUF and Quantize (Jun 18, 2024)
docker run ghcr.io/ggerganov/llama.cpp. The models will be in the same folder with a .bin extension. That's it!
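The one-liner above can be expanded into a command sketch. This assumes the image's `full` variant, which bundles the conversion and quantization tools; the mount path, output names, and quantization type are illustrative:

```shell
# Convert a Hugging Face model directory to GGML/GGUF format
# (assumption: /path/to/models holds the downloaded model files)
docker run --rm -v /path/to/models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --convert /models

# Quantize the converted model to a smaller 4-bit variant (file names are placeholders)
docker run --rm -v /path/to/models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/model-f16.bin /models/model-q4_0.bin q4_0
```

Both outputs land in the mounted folder, which matches the teaser's note that the converted models appear in the same directory.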
Training Models and Leveraging General Models, Finetuning (Jun 11, 2024)
This article is a product of my own research and synthesis of knowledge about various tools for AI bot development, sourced from online…