<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Alden Do Rosario on Medium]]></title>
        <description><![CDATA[Stories by Alden Do Rosario on Medium]]></description>
        <link>https://medium.com/@aldendorosario?source=rss-841a3b5d13d5------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*d-NreRTsd_u3enyKgyJ_4A.png</url>
            <title>Stories by Alden Do Rosario on Medium</title>
            <link>https://medium.com/@aldendorosario?source=rss-841a3b5d13d5------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 15 May 2026 08:40:58 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@aldendorosario/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[No, PageIndex Will Not “Kill” RAG, But It Is Indeed Excellent In Some Cases]]></title>
            <link>https://medium.com/@aldendorosario/no-pageindex-will-not-kill-rag-but-it-is-indeed-excellent-in-some-cases-11bc67473145?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/11bc67473145</guid>
            <category><![CDATA[vector-database]]></category>
            <category><![CDATA[rags]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Sat, 31 Jan 2026 13:27:29 GMT</pubDate>
            <atom:updated>2026-01-31T13:27:29.824Z</atom:updated>
            <content:encoded><![CDATA[<p><em>An independent benchmark revealing when tree-based RAG outperforms vector RAG — and when it can’t even be used</em></p><p>A <a href="https://x.com/_avichawla/status/1882365488498393262">viral tweet</a> recently claimed that <a href="https://github.com/VectifyAI/PageIndex">PageIndex</a>, a new open-source “reasoning-based RAG” system, achieved 98.7% accuracy on a financial benchmark without vector databases, chunking, or similarity search.</p><p>The AI community took notice. Some called it a “RAG killer.”</p><p>I spent the past week trying to benchmark PageIndex against leading RAG providers. The results tell a more nuanced story — and reveal a fundamental limitation that no one is talking about.</p><h3>What Is PageIndex?</h3><p><a href="https://github.com/VectifyAI/PageIndex">PageIndex</a> by VectifyAI takes a fundamentally different approach to document retrieval. Instead of the standard chunk-embed-retrieve pipeline, it:</p><ol><li>Builds a <strong>hierarchical tree index</strong> (like a semantic table of contents)</li><li>Uses <strong>LLM reasoning</strong> to navigate the tree and find relevant sections</li><li>Extracts content from identified sections for answer generation</li></ol><p>The idea is compelling: similarity search finds <em>similar</em> content, but reasoning finds <em>relevant</em> content.</p><p>When a question asks for a certification <em>date</em>, similarity search might return a certifications <em>table</em> — related but useless. Tree-based reasoning can navigate to the timeline section instead.</p><p>VectifyAI’s Mafin 2.5, powered by PageIndex, <a href="https://github.com/VectifyAI/Mafin2.5-FinanceBench">achieved 98.7% accuracy on FinanceBench</a>. 
But FinanceBench tests single-document question answering — each question targets a specific financial report.</p><blockquote>The question is: <strong>what happens when you have 1000 documents?</strong></blockquote><h3>The Scalability Problem</h3><p>Here’s what I confirmed through testing: <strong>PageIndex’s tree-based approach cannot practically scale to multi-document scenarios.</strong></p><p>It’s great for single-document use cases (for example, financial documents), but falls short on large multi-document knowledge bases.</p><p>In my testing using Google’s SimpleQA-Verified dataset (a 1,000-question benchmark dataset that spans about 2,795 documents), building the index ran into major scalability problems.</p><p>As a result, I had to fall back to standard vector search — the same approach PageIndex claims to replace.</p><p>PageIndex’s team has been transparent about this. In a <a href="https://x.com/PageIndexAI/status/2016913668114698341?s=20">public exchange</a> on X (Twitter), their official account noted that PageIndex is currently designed for single long-document question answering, and that multiple documents (more than 5) are supported via other customized techniques.</p><p>They also acknowledged that the open-source version uses a sequential indexing process intended more as a proof of concept than an enterprise-ready system.</p><h3>The Benchmark: 100 Questions, 1000 Documents</h3><p>To evaluate PageIndex in a multi-document scenario, I tested what actually happens at scale: FAISS vector retrieval (the fallback when tree indices aren’t available) followed by GPT-5.1 answer generation.</p><p>I compared this against three commercial RAG providers, all answering the same 100 questions from <a href="https://huggingface.co/datasets/google/simpleqa-verified">SimpleQA-Verified</a> across 2,795 source documents:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lJuwFsy132_g3Ra8xrqzLw.png" /></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/810/1*GNgXHyeMWKjf0_gnSUzIXg.png" /></figure><h3>Scoring</h3><blockquote>Quality = (correct - 4 x incorrect) / total</blockquote><p>The 4x penalty for incorrect answers reflects a design choice that favors precision over recall: a confident wrong answer costs four times as much as a correct answer earns.</p><p>This benefits conservative systems that abstain when uncertain. With a different penalty ratio, rankings would change.</p><p><strong>Note:</strong> These results are based on a 100-question sample from SimpleQA-Verified, not the full 1,000-question benchmark. With this sample size, the rankings should be treated as directional indicators rather than statistically definitive. The difference between adjacent providers (e.g., CustomGPT at 0.78 vs PageIndex at 0.69) may not be statistically significant at n=100.</p><p>When the pipeline does answer, it achieves <strong>96.4% accuracy</strong> (81 out of 84 attempted).</p><h3>The Core Trade-off</h3><p>These results reveal a fundamental trade-off in PageIndex’s design:</p><h3>Single-Document: Designed to Excel</h3><p>PageIndex is built for single-document deep analysis. When it can use tree-based reasoning on a known document, the structural navigation genuinely finds information that similarity search misses. PageIndex’s own FinanceBench results demonstrate this capability.</p><h3>Multi-Document: Falls Back to Standard RAG</h3><p>When PageIndex faces hundreds or thousands of documents, it can’t build tree indices fast enough. 
It falls back to FAISS vector search — and performs like any other vector RAG system, without the structural reasoning that makes it special.</p><p>This is the core insight: <strong>PageIndex’s strength (tree reasoning) is exactly the thing that can’t scale.</strong></p><h3>Where PageIndex Genuinely Excels</h3><p>Despite the multi-document limitations, PageIndex’s approach has real value:</p><p><strong>Single-document deep analysis.</strong> For financial reports, legal filings, technical manuals — any scenario where you know which document to search — tree-based reasoning navigates complex structure better than chunk-based similarity search.</p><p><strong>Structured documents.</strong> Documents with natural hierarchy (sections, subsections, numbered items) play to PageIndex’s strengths. The tree index mirrors the document’s own structure.</p><p><strong>Auditability.</strong> Every retrieval decision is traceable — which tree nodes were considered, which were selected, and why. This matters for compliance-heavy domains.</p><p><strong>Principled abstention.</strong> PageIndex says “I don’t know” rather than guessing wrong — a valuable property for high-stakes applications.</p><h3>The Honest Takeaway</h3><p>PageIndex will not “kill” RAG. Its core technology (tree reasoning) can’t scale to the multi-document retrieval scenarios where most RAG systems operate.</p><p>But PageIndex <em>is</em> excellent in some cases. For high-stakes, single-document analysis — legal review, financial due diligence, regulatory compliance — the combination of structural reasoning and principled abstention is genuinely valuable.</p><p>The real future likely involves <strong>hybrid approaches</strong>: vector retrieval for document discovery, tree-based reasoning for precise extraction within top candidates. PageIndex has demonstrated that LLM reasoning over document structure can outperform similarity search for within-document retrieval. 
That’s a meaningful contribution.</p><p>Not a RAG killer. But a valuable tool for specific, high-stakes use cases.</p><h3>Methodology &amp; Reproducibility</h3><p>Full benchmark code, data, and results are published at: <a href="https://github.com/adorosario/pageindex-rag-benchmark"><strong>github.com/adorosario/pageindex-rag-benchmark</strong></a></p><h3>Technical Details</h3><ul><li><strong>Questions</strong>: 100 from <a href="https://huggingface.co/datasets/google/simpleqa-verified">SimpleQA-Verified</a> (factual, single-answer)</li><li><strong>Documents</strong>: 2,795 indexed in FAISS (text-embedding-3-small, 81,868 chunks)</li><li><strong>Answer model</strong>: GPT-5.1 (temperature=0) for PageIndex fallback pipeline; each commercial provider uses its native model</li><li><strong>Judge</strong>: GPT-4.1-mini using the <a href="https://github.com/openai/simple-evals">simple-evals grader template</a></li><li><strong>Scoring</strong>: Quality = (correct - 4 x incorrect) / total (penalty_ratio=4.0)</li></ul><p><strong>Note</strong>: The 4x penalty ratio is a design choice that favors precision-oriented systems. Rankings change under different penalty ratios:</p><p>At 1x penalty (no extra punishment for wrong answers), OpenAI RAG (0.81) would rank 3rd ahead of PageIndex (0.78).</p><h3>Limitations</h3><ul><li><strong>Sample size</strong>: This benchmark uses 100 questions from the 1,000-question SimpleQA-Verified dataset. 
Results are directional indicators, not statistically definitive — differences between adjacent providers may not be significant at this sample size</li><li><strong>Each provider uses its own answer model</strong>, making this an end-to-end provider comparison rather than a retrieval-only comparison</li><li><strong>I designed the benchmark methodology</strong> and selected the providers, scoring formula, and penalty ratio using techniques from OpenAI’s recent “Why Language Models Hallucinate” paper <a href="https://arxiv.org/abs/2509.04664">https://arxiv.org/abs/2509.04664</a> — PS: Read the <a href="https://openai.com/index/why-language-models-hallucinate/">blog post</a>, it’s a must-read.</li></ul><h3>Disclosure</h3><p>I am the Founder of <a href="https://customgpt.ai">CustomGPT.ai</a>, one of the evaluated providers. I selected the providers, designed the benchmark methodology, and ran all evaluations. Full audit data is published for transparency so readers can verify the results independently.</p><p>While there may be a perception of bias, this work is also grounded in three years of experience (and 10,000+ paid customers) dealing with RAG.</p><p><em>Alden Do Rosario is Founder of </em><a href="https://customgpt.ai"><em>CustomGPT.ai</em></a><em>. The full benchmark code and results are available at </em><a href="https://github.com/adorosario/pageindex-rag-benchmark"><em>github.com/adorosario/pageindex-rag-benchmark</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=11bc67473145" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why Your AI Agent Fails at Document Analysis (And What We Built Instead)]]></title>
            <link>https://ai.plainenglish.io/why-your-ai-agent-fails-at-document-analysis-and-what-we-built-instead-73addfc59147?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/73addfc59147</guid>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[enterprise-software]]></category>
            <category><![CDATA[document-management]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Thu, 08 Jan 2026 19:03:52 GMT</pubDate>
            <atom:updated>2026-01-16T18:17:22.525Z</atom:updated>
<content:encoded><![CDATA[<p>I’ve watched hundreds of enterprise teams try to use ChatGPT and Claude for document analysis.</p><p>They upload a contract. They get a summary. They’re impressed.</p><p>Then they ask a real question: “Does this match our standard terms?”</p><p>And the AI has nothing useful to say.</p><p>This isn’t a failure of the AI. It’s a failure of approach. Generic AI gives generic answers because it doesn’t have your context.</p><p>After building AI solutions for enterprises over the past three years at CustomGPT.ai, we finally closed this gap.</p><p>We call it <a href="https://customgpt.ai/document-analyst/?utm_source=medium&amp;utm_medium=social&amp;utm_campaign=document-analyst-launch-jan26&amp;utm_content=founder-post"><strong>Document Analyst</strong></a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xcP0VlkJVG2S6RxqjHndgQ.png" /><figcaption>Analyzing photos against building code regulations</figcaption></figure><h3><strong>The Summarization Trap</strong></h3><p>Here’s what generic AI does well: summarize, extract key points, categorize information. If you upload a contract, it’ll tell you the parties involved, the key dates, the main obligations.</p><p>That’s useful. It’s also table stakes.</p><p>Here’s what generic AI can’t do:</p><ul><li>Check if a vendor agreement matches your standard terms</li><li>Flag compliance issues against YOUR regulatory guidelines</li><li>Compare a resume against YOUR hiring criteria</li></ul><p>Ask ChatGPT any of these questions and you get… silence. Or worse, a generic answer that sounds helpful but isn’t. Maybe even a hallucination.</p><p>Why? Because ChatGPT doesn’t know your standard terms. It doesn’t know your regulatory guidelines. It doesn’t know your hiring criteria.</p><p>The AI is doing exactly what it’s designed to do: analyze the document in front of it. 
The problem is that real document analysis requires context that exists outside that document.</p><p>A compliance team uploads marketing materials. ChatGPT flags generic issues — “Consider checking industry regulations.” But it doesn’t know your specific regulatory framework. It doesn’t know that the FTC changed rules last quarter. It doesn’t know your internal approval checklist.</p><p>So after the AI “helps,” the team still has to do the real work. Pull up the guidelines. Cross-reference manually. Ask three colleagues for context. Spend half a day on what should take minutes.</p><p>That’s the summarization trap. The AI looks helpful. The work doesn’t actually get done faster.</p><h3><strong>Retrieval vs. Analysis</strong></h3><p>If you’re building enterprise AI, you’ve probably implemented RAG (Retrieval-Augmented Generation). Your AI can search your knowledge base and pull relevant information.</p><p>That’s necessary. It’s not sufficient.</p><p>Retrieval answers: “<em>What do we know about X?</em>”</p><p>Analysis answers: “<em>How does this new document compare to what we know?</em>”</p><p>The difference is subtle but critical. Retrieval finds information. Analysis reasons across information.</p><p>When your legal team uploads a vendor contract, they don’t want to search your knowledge base. They want the AI to automatically compare that contract against your templates, your standards, your past agreements — and tell them what’s different, what’s missing, what’s concerning.</p><p>That requires the AI to hold two things in context simultaneously: the uploaded document AND your institutional knowledge. Then reason across both.</p><p>The breakthrough we made wasn’t better summarization. It was recognizing that the uploaded document isn’t the only input. Your institutional knowledge is the lens through which that document should be analyzed.</p><p>The same contract uploaded to ChatGPT and to your Document Analyst-enabled agent produces completely different outputs. 
ChatGPT gives you a summary anyone could get. Your agent gives you insights specific to your business.</p><p>Generic AI gives generic answers. <strong>Your agent gives YOUR answers.</strong></p><h3><strong>How Document Analyst Works</strong></h3><p>We designed Document Analyst to be simple. In your agent’s settings, toggle on the Document Analyst action. That’s it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zryif1siVq71XokUlbuh3w.png" /><figcaption>1-click enable the document analyst.</figcaption></figure><p>Your users now see an attachment icon in chat. They can upload files — contracts, transcripts, resumes, schematics, photos, spreadsheets. Any text or image file.</p><p>Here’s what happens next:</p><p>The agent treats the uploaded file as temporary context. Then it analyzes that file against your permanent knowledge base — the documents, data, and guidelines you’ve already uploaded to train your agent.</p><p>The result: insights that combine the new information with everything your organization knows.</p><p>Need to compare multiple documents? Upload several. The agent can analyze them side-by-side, spot differences, find patterns.</p><p>And critically: every answer comes with citations. You’re never blindly trusting the AI. You can trace every claim back to the source — whether that’s the uploaded document or your knowledge base. This is non-negotiable for enterprise use.</p><p>Under the hood, Document Analyst reasons across both contexts, finding connections, spotting gaps, and surfacing insights that matter specifically to your business. But you don’t need to understand the architecture. You just need to know it works.</p><h3><strong>What Teams Are Actually Doing With It</strong></h3><p>We launched Document Analyst in beta to a group of power users. Here’s what they’re doing:</p><h4><strong>Compliance Pre-Check</strong></h4><p>Compliance teams drop in marketing materials before they go live. 
The AI flags issues against their regulatory guidelines instantly. Human review still happens, but the first pass that used to take hours now takes minutes.</p><h4><strong>Contract Review</strong></h4><p>Legal uploads vendor agreements and checks them against standard templates. The AI spots what’s missing, what’s different, what’s concerning — before anyone signs. “Clause 4.2 conflicts with your standard NDA” is more useful than “This is a vendor agreement.”</p><h4><strong>Candidate Screening</strong></h4><p>HR compares batches of resumes against job requirements using their own hiring criteria. Hours of screening work, automated.</p><h4><strong>Technical Support</strong></h4><p>Engineers drop in spec sheets or incident reports and cross-reference against SOPs and troubleshooting guides. No more digging through folders to find the relevant documentation.</p><p>The pattern is consistent: tasks that used to require assembling context from multiple sources, multiple colleagues, multiple systems — now happen in a single conversation with your AI agent.</p><p>Hours of work. Minutes to complete.</p><h3><strong>What This Means for Enterprise AI</strong></h3><p>We’re at an inflection point in how enterprises use AI.</p><p>The first wave was “AI that knows everything”: ChatGPT, Claude, tools that have broad knowledge but <strong>no specific context</strong>.</p><p>The second wave is “AI that knows <strong>YOUR</strong> everything”: agents that combine general intelligence with <strong>institutional knowledge</strong>.</p><p>Document Analyst is one capability in this second wave. The ability to analyze new information through the lens of what your organization already knows.</p><p>This isn’t about replacing human judgment. The compliance team still reviews the flagged issues. Legal still makes the call on contract terms. HR still interviews candidates.</p><p>What changes is the <strong>prep work</strong>. The context-gathering. 
The “let me check with three colleagues before I can start” phase. That’s what gets compressed.</p><p>Your team spent years building institutional knowledge. Policies, templates, guidelines, best practices, tribal knowledge captured in documents. That knowledge has always been valuable. Now it’s also accessible, for every document someone uploads, for every analysis someone needs.</p><p>We’re building more capabilities like this. Document Analyst is the first. There’s more coming. Lots more.</p><h3><strong>Try It</strong></h3><p>Document Analyst is available now on Premium and Enterprise plans. It works across all deployment types, including API, so you can build document analysis into your own workflows.</p><p>To enable it: go to your agent’s Actions page and switch the toggle. Your users will see an attachment icon in chat. Upload documents, ask questions, get answers grounded in your business data.</p><p>I’d genuinely love to hear what you analyze first. What’s the document workflow that’s been stuck on “manual review”? What’s the analysis that requires three colleagues to start?</p><p>Enable <a href="https://customgpt.ai/document-analyst/?utm_source=medium&amp;utm_medium=social&amp;utm_campaign=document-analyst-launch-jan26&amp;utm_content=founder-post"><strong>Document Analyst</strong></a> and let me know.</p><hr><p><em>Alden Do Rosario is the Founder &amp; CEO of CustomGPT.ai, named one of the “Top 7 Emerging Leaders in Generative AI” by GAI Insights. 
He previously served as CTO at Chitika.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=73addfc59147" width="1" height="1" alt=""><hr><p><a href="https://ai.plainenglish.io/why-your-ai-agent-fails-at-document-analysis-and-what-we-built-instead-73addfc59147">Why Your AI Agent Fails at Document Analysis (And What We Built Instead)</a> was originally published in <a href="https://ai.plainenglish.io">Artificial Intelligence in Plain English</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[“$500-an-Hour” Is an Endangered Species — How To Use AI To Modernize Your Consulting Practice]]></title>
            <link>https://medium.com/@aldendorosario/500-an-hour-is-an-endangered-species-how-to-use-ai-to-modernize-your-consulting-practice-b8aa6438325e?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/b8aa6438325e</guid>
            <category><![CDATA[saas]]></category>
            <category><![CDATA[consulting]]></category>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[customgpt]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Mon, 04 Aug 2025 18:17:55 GMT</pubDate>
            <atom:updated>2025-08-04T18:17:55.437Z</atom:updated>
<content:encoded><![CDATA[<h3>“$500-an-Hour” Is an Endangered Species — How To Use AI To Modernize Your Consulting Practice</h3><p><em>“If ChatGPT can serve fast, honest advice whenever someone wants it, why pay me for a 2‑hour billing block?”</em></p><p>My neighbor literally came to my door yesterday asking me this blunt question. He is a respected consultant with thousands of C‑suite clients and decades of credibility.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/940/1*LqVTUKNvx-5GJpN9K2vfNA.png" /></figure><p>Billing hourly feels safe, until the AI runs 24/7. And right now, an AI-powered advisory industry is scaling faster than ever.</p><p>Executives at <em>over 70%</em> of consulting firms now report using AI in client work. And global revenue in AI consulting is expected to grow from about <strong>$11 billion in 2025</strong> to nearly <strong>$90 billion by 2035</strong>.</p><p>If a consultant you hired spent the weekend quietly converting their face-to-face practice into an AI coach, without losing value (or even while adding insight), would you notice?</p><p>Here’s a five‑step blueprint <em>for non‑tech founders, coaches, and consultants</em> who want to turn their knowledge into an “always‑on” AI advisor, and escape the $500‑per‑hour treadmill.</p><h3>Step 1: Extract What You Know — Transform It into Evergreen Products</h3><p>Your unique know‑how is hiding in your PPTs, PDFs, email templates, recorded calls, and workshop worksheets. Serialize it once, and you can reuse it 1,000 times without losing clarity.</p><p>Ask yourself:</p><ul><li>Which deliverables do you hand out at every engagement?</li><li>Which worksheets or frameworks get used over and over?</li></ul><p>These become the fuel for your AI product.</p><p>You don’t have to code. Today’s platforms let you package your proprietary content into a <strong>custom GPT</strong>, no development team needed. 
Just upload, define its service-level tone, and train it smartly:</p><ul><li>Clear, structured prompts to walk users through complex ideas? <strong>On</strong>.</li><li>In‑line citations so your assistant says “According to slide 10 of my IP” instead of hallucinating? <strong>Built in</strong>.</li><li>Clear brand and tone of voice to mimic your unique style? <strong>Easy</strong>.</li></ul><p>Once you do this, your value isn’t measured in hours, it lives in that GPT you’ve built.</p><h3>Step 2: Create a “Thinking Agent” — Choose a Reasoning-Capable Model</h3><p>What does “reasoning-capable” mean? Your assistant does more than spit out chunks of your PDF, it can guide clients using your logic.</p><p>Most consultants want models that:</p><ul><li>Know <em>why</em> they teach something (not just <em>what</em>)</li><li>Tell the user <em>how to think</em>, not only <em>what to do</em></li><li>Cite your material so the user knows the source</li></ul><p>ChatGPT now offers GPT‑powered agents tuned for reasoning tasks, not just writing. You feed in your assessments, frameworks, PDFs, or slide decks. The model builds vector embeddings, connects similar meanings, and gives you consistent guidance. You can even provide it custom instructions, so that it can mimic your particular style, tone-of-voice and methodology.</p><p>Think of it like a digital co‑pilot: instead of you walking the client through a 60-minute call, the AI prompts them with follow‑up questions, suggests next steps, and spots blindspots, all in your voice.</p><h3>Step 3: Run a Client Beta — Ask One Crucial Question</h3><p>Let your top‑tier client group (2–3 people) test the agent. Then ask them:</p><p>“Did the AI coach make you <em>faster</em>, <em>clearer</em>, or <em>richer</em> in just one session?”</p><p>If yes, you have <strong><em>product‑market fit</em></strong>.</p><p>Don’t over‑survey or over-think. 
You’re looking for <em>yes‑or‑no conviction</em>, not polite feedback forms.</p><h3>Step 4: Flip on a Sustainable Revenue Model — Monthly Subscriptions, $/client, $49–$499/month</h3><p>Once feedback is positive, you’re ready to monetize. The subscription model is not speculation; it’s structurally more stable than hourly or one‑off packages because:</p><ul><li>Clients pay for <strong>access</strong>, not calendar blocks</li><li>Churn <em>drops</em> the longer someone stays</li><li>You can bundle premium human time as an upsell, AI as your funnel</li></ul><blockquote><strong>Ninja tip</strong>: Have a general-purpose custom GPT embedded on your website with free advice. This is your lead magnet.</blockquote><p><strong><em>Real‑world example</em></strong><em>:</em> An accountant launched an AI advisor with access at $89/month and offered one-on-one VIP hours on top. Within a few months, their recurring SaaS business hit a <strong>7‑figure valuation (based on 6-figure ARR)</strong>.</p><p>Still skeptical? 
Research shows clients in subscription relationships report stronger progress because they <em>feel accountable</em>, and your content is getting consumed weekly, not occasionally.</p><h3>Step 5: Weaponize the “Data Flywheel” — Your Agent Learns Every Week</h3><p>This is the real magic: every conversation powers better performance.</p><p>When users ask questions:</p><ol><li>Their prompts tell you which concepts get used the most.</li><li>Their feedback reveals where the bot stumbles (“I meant X, not Y”).</li><li>Their usage signals which framework needs rewrites or expansion.</li></ol><p>You log and categorize these interactions — then iterate:</p><ul><li>Improve phrasing</li><li>Add clarification articles or examples</li><li>Update your slide deck, spreadsheet, or workbook</li><li>Retrain or refresh your assistant’s memory</li></ul><p>This is the “data flywheel”: each iteration of learning makes the agent more accurate, so clients trust it, which leads to more use, and the cycle repeats.</p><p>It’s what turned Amazon’s recommendation system into a self‑improving loop, and you can build a micro‑version of that with your own content.</p><p>See my previous article below on how to do this.</p><p><a href="https://medium.com/operations-research-bit/ai-chat-logs-the-hidden-goldmine-your-company-hasnt-discovered-yet-2749b14b7e91">AI Chat Logs — The Hidden Goldmine Your Company Hasn’t Discovered Yet.</a></p><h3>Risk &amp; Trust: Don’t DIY — Purchase Platforms That Handle Security and Updates</h3><p><strong>Don’t neglect compliance</strong>; AI chat logs can be sensitive:</p><ul><li>Use services with built‑in logging that meet SOC 2 Type 2 or ISO standards</li><li>Add system prompts to prevent injection or misuse</li><li>Version your knowledge base so if you add data, you can always revert or audit content</li></ul><p>Use platforms that abstract away the compliance layer so you can focus on content.</p><h3>Why This Strategy Works</h3><ol><li><strong>Speed to 
market</strong> ‑ You package your existing IP and ship in days — not months of coding.</li><li><strong>Equity, not hours</strong> ‑ You build content once; it pays you forever.</li><li><strong>Higher price elasticity</strong> ‑ Clients happily pay $199/month plus your VIP coaching upsell.</li><li><strong>Scalable defensibility</strong> ‑ Your content becomes the moat; AI is the multiplier.</li><li><strong>Sustainable value</strong> ‑ The more clients use it, the smarter it becomes, unlike linear hourly labor.</li></ol><h3>So — What Should You Do Tomorrow?</h3><ol><li>Pick one asset — say, “How I run my quarterly strategy workshop”</li><li>Transform it into a simple chat flow: checklist, decision branch, reflection prompt</li><li>Implement in <a href="https://openai.com/index/introducing-gpts/">ChatGPT</a> or <a href="https://customgpt.ai/">CustomGPT.ai</a> and have 2–3 clients try it as a helper to your live call</li><li>Ask them: <em>Did it help you make progress in half the time, or save $1K in human hours?</em></li><li>If they say yes, re-build the custom GPT in a business-grade, compliant platform and embed it behind a paywall. (PS: You might need a Fiverr developer to do this part)</li><li>Observe the chat logs and update the agent weekly based on patterns</li></ol><p><strong>For the non‑technical leader</strong>: Think of this as copy‑and‑paste from your brain into a box that runs itself. Soon, every thought you’d teach once becomes a question you answer once, and a tool your clients use daily.</p><p>If you’re still billing by the hour in 2025 — when rivals are running agent-powered advice on demand — you’re not just behind. You’re forfeiting future value.</p><p>So here’s a handshake: start today, beta test this week, and choose your first monthly‑access product by next Monday. Then, watch it grow above your highest billing threshold, in perpetuity.</p><p>Giving this a thought? 
Drop a comment below.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Dear Support Departments, Please Stop Measuring Success by Ticket Volume — It’s a Vanity Metric]]></title>
            <link>https://medium.com/@aldendorosario/dear-support-departments-please-stop-measuring-success-by-ticket-volume-its-a-vanity-metric-9638fd3e133e?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/9638fd3e133e</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[customer-success]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Mon, 30 Jun 2025 16:42:31 GMT</pubDate>
            <atom:updated>2025-06-30T16:42:31.507Z</atom:updated>
<content:encoded><![CDATA[<h3>Dear Support Departments, Please Stop Measuring Success by Ticket Volume — It’s a Vanity Metric</h3><p>Every day, support leaders celebrate soaring ticket counts and tidy backlog burndowns as if those charts proved they were winning.</p><p>They’re not.</p><p>Ticket-volume dashboards tell you how many problems customers had to report — not how many you actually solved, nor how easy (or painful) the experience felt.</p><p>In fact, chasing volume targets rewards bad product design, punishes real customer empathy, and silently drains revenue you could be using to scale.</p><p>Below is a hard-look audit of the “ticket-volume cult,” the collateral damage it causes, and the metrics that truly predict loyalty (and profit).</p><p>And most importantly, what to do to drive those metrics.</p><h3>The Ticket-Volume Trap</h3><p>Ticket-count graphs look authoritative because they move every hour and spit out big round numbers.</p><p>But call any KPI a “workload thermometer” long enough and you’ll forget it was never meant to be a thermometer for <em>success</em>.</p><p>Even Geckoboard, which popularized the metric, lists “staff planning”, <strong>not customer happiness</strong>, as the primary use case for volume charts.</p><h3>Five Things Ticket Counts Never Tell You</h3><ol><li><strong>First-Contact Resolution (FCR).</strong> Customers only “feel” a fix when the <em>first</em> interaction works; FCR drives CSAT upward while volume remains silent on the topic.</li><li><strong>Customer Effort.</strong> Gartner’s Customer Effort Score (CES) research shows high-effort experiences create 96% <em>more</em> churn, even if they produce identical ticket totals.</li><li><strong>Sentiment &amp; Word-of-Mouth.</strong> Net Promoter Score (NPS) can fall while volume stays flat, proving that ticket counts miss the emotional side of service.</li><li><strong>Product Quality Signals.</strong> Spikes in password-reset tickets inflate “good” volume stats while
masking a UX flaw your engineers should fix.</li><li><strong>Silent Suffering.</strong> Up to 81% of customers try to self-solve before logging a ticket; their frustration disappears from your chart entirely.</li></ol><h3>Metrics That Actually Matter</h3><ul><li><strong>First-Contact Resolution (FCR)</strong>: Cuts repeat contacts and correlates directly with CSAT.</li><li><strong>Customer Effort Score (CES)</strong>: A stronger predictor of future spend than NPS or CSAT.</li><li><strong>Deflection Rate</strong>: Modern AI assistants can deflect 20–90% of incoming tickets without hurting satisfaction. (Source: <a href="https://customgpt.ai/ticket-deflection/">customgpt.ai</a>)</li><li><strong>Agent Quality Index (AQI)</strong>: Gen-AI quality-assurance tools now score every interaction for accuracy and empathy, not just a random 1% of calls. (Source: <a href="https://www.mckinsey.com/capabilities/operations/our-insights/operations-blog/ai-mastery-in-customer-care-raising-the-bar-for-quality-assurance">mckinsey.com</a>)</li><li><strong>Customer-Performance Indicators (CPIs)</strong>: HBR’s term for metrics customers themselves value (<a href="https://medium.com/towards-artificial-intelligence/during-off-hours-the-ai-takes-over-customer-service-21d649416d1b">speed</a>, clarity, empowerment).</li></ul><h3>Enter AI for L0 Support: The Fastest Way Off the Treadmill</h3><p>Forward-thinking teams let AI handle Level 0 inquiries 24/7 (resetting passwords, answering questions from knowledge bases, explaining features) so humans can focus on exceptions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*29hnb9RB9508k6-21GgcXw.png" /></figure><p>Most customers already expect AI to resolve routine issues, and companies <a href="https://medium.com/ai-in-plain-english/how-i-reduced-support-ticket-volume-by-90-for-my-saas-405a0d0cbc3c">deploying AI</a> are now reporting huge cost-to-serve reductions.</p><p><a
href="https://blog.cubed.run/dear-support-departments-please-let-ai-do-90-of-your-job-037086f46427">Dear Support Departments, Please Let AI Do 90% Of Your Job</a></p><h3>Breaking the Ticket-Volume Addiction</h3><ol><li><strong>Stop Worshipping the Graph.</strong> Hide volume from exec dashboards for 30 days; watch how conversations change.</li><li><strong>Swap Inputs for Outcomes.</strong> Replace “tickets closed” with FCR, CES, and revenue retained.</li><li><strong>Deploy an AI Assistant As L0 Support.</strong> Deflect the low-value flood so your new metrics don’t get drowned in noise.</li><li><strong>Run a 90-Day Pilot.</strong> Pick a chronic issue queue, roll out AI deflection, and benchmark FCR + CES before/after.</li><li><strong>Reward Empathy, Not Speed.</strong> Tie bonuses to CES and quality-assurance scores, not handle-time.</li></ol><p><a href="https://medium.com/the-generator/ai-in-customer-service-is-freaking-people-out-heres-how-to-sway-your-team-c892431f4c59">AI in Customer Service Is Freaking People Out — Here’s How to Sway Your Team …</a></p><p>Measuring support by ticket volume is like judging a hospital by the number of incoming ambulances: the chart may look impressive, but it’s the opposite of success.</p><p>The future of support belongs to teams that minimise <em>customer effort</em>, maximise <em>first-contact resolution</em>, and weaponise AI as a tireless L0 shield.</p><p>Retire the vanity metrics, and let your dashboards reflect the metrics your customers care about most.</p><p><a href="https://ai.plainenglish.io/the-absolute-lowest-hanging-fruit-in-generative-ai-is-ff0226da302b">The Absolute Lowest Hanging Fruit In Generative AI Is …</a></p><p><em>Your Looker report may never forgive you, but your customers will.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Improve your efficiency in meetings with this simple Manus prompt.]]></title>
            <link>https://medium.com/prompt-prompts/improve-your-efficiency-in-meetings-with-this-simple-manus-prompt-e25382270a3b?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/e25382270a3b</guid>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[prompt]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[prompt-engineering]]></category>
            <category><![CDATA[manus]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Tue, 27 May 2025 16:57:01 GMT</pubDate>
            <atom:updated>2025-05-27T21:52:07.523Z</atom:updated>
<content:encoded><![CDATA[<p>There used to be a time when people would walk into meetings (over Zoom or in-person) with zero preparation.</p><p>And then the meeting goes down this low-fidelity path of silly questions like “How is your dog?” or “What’s the weather like where you are?”</p><p>And even worse: since everyone is so blatantly unprepared, nothing real gets done in the meeting, and the real questions go unasked.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SivcyVr2FARkBJncmmNmYw.png" /><figcaption>Image Credit : <a href="https://medium.com/u/841a3b5d13d5">Alden Do Rosario</a></figcaption></figure><p><strong>Manus AI to the rescue.</strong></p><h3>But wait — what is Manus AI?</h3><p>Manus AI is a groundbreaking general-purpose AI agent. Unlike traditional AI chatbots that merely respond to prompts, Manus is designed to autonomously execute complex tasks across various domains, effectively acting as a digital assistant that can think, plan, and act independently.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4MqCkm00EJ_f6RZl2IG1PQ.png" /><figcaption>Image Credit : <a href="https://medium.com/u/841a3b5d13d5">Alden Do Rosario</a></figcaption></figure><p>Manus AI has garnered attention for its ability to handle tasks ranging from content creation and data analysis to workflow optimization and travel planning.</p><p>It operates by breaking down high-level instructions into manageable subtasks, executing them without the need for continuous human oversight.</p><p>In the context of meetings, Manus can prepare agendas, anticipate questions, and even generate comprehensive reports, ensuring that sessions are productive and focused.</p><h3>How to use Manus to prepare for the meeting</h3><p>To prepare for your meeting, just give Manus some context about the attendee and what the meeting is about, and let it get to work.</p><p>For example, in the prompt below, a candidate is coming in for an interview.
So it makes sense to run some deep research over the resume, combined with the job description.</p><pre>[NAME] is coming in for an interview for this job [ROLE] that I have attached.<br><br>I need you to do a full-fledged HUMINT exercise on him. <br><br>Scour the web and whatever sources you have access to and give me a full HUMINT report on him. <br><br>Cross-check the resume, dig deep, do what it takes.<br><br>Don&#39;t ask questions. Do whatever is needed. Use your judgement.</pre><p>Here is the type of report you can expect:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kjfL8b5Lj9Sv-8O_WUJ4Jg.png" /><figcaption>Image Credit : <a href="https://medium.com/u/841a3b5d13d5">Alden Do Rosario</a></figcaption></figure><p><strong>Why this works</strong>: The fidelity of the meeting will increase 2–5X because you will be much more educated about the person. Instead of vague small talk like “What did you do in college?”, you can say “Hey, I saw you scored 21 points against X — let’s talk about that game” or “Dude, tell me about why you got suspended in junior year”.</p><h3>Can this be used for sales meetings too?</h3><p>Absolutely — doing this for sales meetings is bound to improve your close rate. Here is a sample prompt for sales meetings (please modify it to your own situation):</p><pre>I have a meeting with the attendees shown below - it&#39;s a 30-min Zoom call.<br><br>Here are the attendees:<br>```<br>Copy-paste attendees from your calendar invite. <br>```<br><br>Here is the agenda:<br>```<br>Include agenda of the meeting (typically copy-pasted from calendar invite or email) <br>```<br><br>Here is my goal for this meeting: <br>```<br>Clearly state the goal of the meeting (like close the deal, set up a demo, etc) <br>```<br><br>I want you to deeply research the participants and then prepare me with the anticipated questions they will have based on their roles.
Give me a full report of each participant, what he/she does and the questions I should anticipate from them.</pre><p>I’ve been using this approach to prepare for sales meetings recently, and the attendees truly appreciate the preparation and deep research.</p><p>And best of all: someone else (the AI!) did all the hard work, something that would previously have taken me ages to do (which means I would have skipped it!)</p><blockquote>One key thing to remember: If the meeting is critical, do spend a few minutes cross-checking the facts (it’s AI, after all, and it can hallucinate sometimes!)</blockquote><h3>What is PROMPT PROMPTS?</h3><p>It’s a new publication established as a library for useful AI prompts. You’ll get fast, no-nonsense guides and tips from a range of prompt engineers.</p><p>Keep up with our newest AI resources — <strong>follow the publication now!</strong></p><p><a href="https://medium.com/prompt-prompts">Prompt Prompts</a></p><h4>Write for PROMPT PROMPTS</h4><p>If you have the AI skills and want to contribute to <strong>Prompt Prompts</strong>, you can email us to be added to our list of writers: <strong>PromptPrompts@jimtheaiwhisperer.com</strong></p><p>You’ll be sent a template to follow. <strong>Prompt Prompts</strong> has high standards focused on clarity and brevity. It’s in the name: we deliver prompts <em>fast</em>!</p><hr><p><a href="https://medium.com/prompt-prompts/improve-your-efficiency-in-meetings-with-this-simple-manus-prompt-e25382270a3b">Improve your efficiency in meetings with this simple Manus prompt.</a> was originally published in <a href="https://medium.com/prompt-prompts">Prompt Prompts</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Focus on Bing — Not Google — For AEO, Here Is Why …]]></title>
            <link>https://medium.com/@aldendorosario/focus-on-bing-not-google-for-aeo-here-is-why-42d5ab325e5a?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/42d5ab325e5a</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[analytics]]></category>
            <category><![CDATA[seo]]></category>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[bing]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Fri, 07 Mar 2025 15:47:37 GMT</pubDate>
            <atom:updated>2025-03-07T15:47:37.509Z</atom:updated>
<content:encoded><![CDATA[<h3>Focus on Bing — Not Google — For AEO, Here Is Why …</h3><p>A few days ago, I published my article <a href="https://medium.com/@aldendorosario/chatgpt-clicks-convert-6-8x-higher-than-google-organic-a457203cfc52?source=friends_link&amp;sk=4b7cd4108c2a89e4cd14e246ba8b889b">“ChatGPT Clicks Convert 6.8X Higher Than Google Organic”</a> here on Medium, and the #1 question flooding my inbox has been:</p><blockquote><strong>How do I optimize my website for AEO (AI Engine Optimization)?</strong></blockquote><p>If you haven’t read the data-backed article yet, do that first. The findings are eye-opening.</p><p><a href="https://medium.com/@aldendorosario/chatgpt-clicks-convert-6-8x-higher-than-google-organic-a457203cfc52">ChatGPT Clicks Convert 6.8X Higher Than Google Organic</a></p><p>But if you want my one-word ADHD response to the AEO question: <strong>Bing</strong>.</p><blockquote>Now, just to be clear — I’m NOT an SEO or AEO specialist. There are far more experienced folks than me in that arena.</blockquote><p>But having analyzed the conversion data and seen the writing on the wall, here’s my perspective:</p><h3>1. Focus on Bing, Not Google</h3><p>Look, I know this sounds counterintuitive. Google has dominated search for so long that “Google it” became a verb.
But here’s the reality:</p><ul><li><strong>Bing powers most of the AI answer engines</strong> — it powers ChatGPT and Microsoft Copilot, and influences Perplexity too.</li><li><strong>Google still holds massive market share</strong>, but extracting growth from Google is like trying to squeeze water from a rock at this point.</li></ul><p>While everyone else is battling for tiny gains in the Google ecosystem — <em>all while Google is rapidly changing its algorithm and juicing every dollar from its paid ads</em> — there’s significantly less competition in optimizing for Bing, which now has outsized importance thanks to its role in powering AI responses.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*h8ZOI2t_gx73OeRu" /></figure><h3>How to Actually Optimize for Bing (Actionable Steps):</h3><ol><li><strong>Exact Keyword Usage</strong>: Unlike Google, Bing places greater emphasis on exact-match keywords. Include your target keywords in titles, H1/H2 tags, meta descriptions, and within the first 100 words of content.</li><li><strong>Focus on High-Quality Backlinks</strong>: Bing prioritizes quality over quantity. Aim for backlinks from older, established domains and authoritative sites (.edu, .gov) rather than chasing volume.</li><li><strong>Optimize for Social Signals</strong>: Unlike Google, Bing actually considers social media signals as ranking factors. Promote your content actively across social platforms to improve engagement and visibility. (<em>PS: The one exception: lately, Google seems to love Reddit</em>)</li><li><strong>Submit Your Sitemap</strong>: Use Bing Webmaster Tools to submit your sitemap directly. This helps Bing crawl and index your content more effectively. (Yeah, I know this sounds basic — but it’s the least you can do)</li><li><strong>Consider Meta Keywords</strong>: While Google ignores meta keywords, Bing reportedly still gives them some weight. Include relevant, targeted keywords in your meta tags.</li></ol><p>This isn’t just theory.
Our data shows that clicks from ChatGPT (which has ties to Bing’s index) convert at 6.8X the rate of Google organic traffic. That’s not a minor improvement — it’s a major difference, for the reasons covered below.</p><h3>2. Focus on Conversions, Not Traffic</h3><p>The old SEO playbook was simple: more traffic = more success. That’s outdated thinking.</p><p>In our analysis, ChatGPT drove just 4% of our traffic but accounted for 22% of conversions. Think about that — a traffic source that represents a tiny fraction of visitors but delivers over one-fifth of your business outcomes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*vdEICL_NoUZnsB3j" /></figure><p>And here is the scariest part: Answer engines like ChatGPT and Perplexity are <strong><em>just getting started</em></strong> — but they are growing 44% month-over-month. Most of the world hasn’t even heard about these engines yet. In fact, ChatGPT with free search was introduced just a month ago.</p><h3>Conversion-Focused Actions to Take Today:</h3><ol><li><strong>Track AI Source Conversions</strong>: ChatGPT uses <strong>utm_source=chatgpt.com</strong> in its outgoing clicks — start tracking those in GA4. Perplexity sends the <strong>Referer</strong> header (here is <a href="https://x.com/randomjohnnyh/status/1894522132429734367">my conversation</a> with Perplexity co-founder Johnny Ho)</li><li><strong>Optimize Landing Pages for AI Visitors</strong>: Create dedicated landing pages that acknowledge the context visitors already have from their AI conversation (e.g., “As you may have discovered in ChatGPT…”). Since ChatGPT sends the utm_source=chatgpt.com param, you could potentially customize the user experience based on it.</li><li><strong>Create Content Assets for AI Citations</strong>: Develop high-value resources specifically designed to be cited by AI engines — comprehensive guides, original research, and <strong>definitive answers to common questions</strong>.
While questions like “How” and “What” are seeing declining organic Google search traffic, the content that answers them is exactly what feeds the AI engines. Do a content gap analysis to see which authoritative questions are missing from your website content.</li></ol><p>This is the quality vs. quantity debate settled with hard data. I’d rather have 100 highly qualified visitors than 10,000 casual browsers any day.</p><h3>3. Focus on Chat, Not Pageviews</h3><p>The conversational nature of AI answer engines fundamentally changes user expectations. When someone clicks through from ChatGPT to your site, they’ve already had a conversation about your topic. They’re primed, informed, and ready to engage.</p><h3>Why You Need a Custom Chatbot on Your Website:</h3><p>Today’s visitors are becoming increasingly “AI-first” in their thinking. They’re comfortable with conversational interfaces and often prefer them to traditional page navigation.</p><blockquote><strong>Remember: AI is the new UI</strong></blockquote><p>Here’s why implementing a custom chatbot on your site is no longer optional:</p><ol><li><strong>Match User Expectations</strong>: Visitors coming from ChatGPT or similar tools are already in a conversational mindset. A custom chatbot provides continuity in their experience.</li><li><strong>Instant Information Access</strong>: Traditional website navigation forces users to hunt through page hierarchies. A custom chatbot lets them simply ask for what they need.</li><li><strong>Personalized Content Delivery</strong>: AI chatbots can tailor responses based on user context, delivering exactly what each visitor needs without overwhelming them.</li><li><strong>Enhanced Data Collection</strong>: Chatbot conversations provide invaluable insights into what your visitors actually want to know — information you can use to improve your content strategy.
The chat logs are a <a href="https://medium.com/operations-research-bit/ai-chat-logs-the-hidden-goldmine-your-company-hasnt-discovered-yet-2749b14b7e91">goldmine of information</a> to inform your conversion and content strategy.</li><li><strong>24/7 Engagement</strong>: A well-trained chatbot provides immediate responses at any hour — and in any language — keeping potential customers engaged when your team is offline. (Tip: This is doubly true if your customers are more active during off hours — think divorce law or plumbing services)</li></ol><blockquote><strong>Tools like ChatGPT or onsite custom chatbots are pre-cooking and pre-selling the user so that they arrive ready to buy.</strong></blockquote><h3>Actionable Steps for Chat Optimization:</h3><ol><li><strong>Implement a Custom-Trained AI Chatbot</strong>: Use no-code platforms to create a custom chatbot specifically trained on your content.</li><li><strong>Provide Your Bot with Starter Questions</strong>: Use actual customer inquiries to seed the starter questions you show users. You can also easily customize your bot in plain English to act in a certain way by providing it with a custom persona.</li><li><strong>Create Conversational Paths to Conversion</strong>: Design chat flows that naturally guide users toward key conversion points. Using plain English, you can instruct the bot to end responses with a path to your conversion action (e.g. “Sign up for our 7-day free trial” or “Contact us and ask about this month’s special offer”)</li><li><strong>Monitor and Improve</strong>: Regularly review chatbot interactions to identify gaps in knowledge or opportunities for improved responses.</li></ol><p>Remember: Antiquated page structures and traditional site navigation are increasingly barriers to conversion for AI-native users.
A conversational interface removes these friction points.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/957/0*pyPC-lQJeKj5IUGW" /><figcaption>Example conversational agent on a documentation website</figcaption></figure><h3>What This Means for Your Strategy (Actionable Takeaways)</h3><p>If you’re a digital marketer or business owner right now, consider this a rare opportunity. While your competitors continue throwing resources at saturated Google optimization strategies, you can:</p><ol><li><strong>Audit Your Bing Performance Today</strong>: Use Bing Webmaster Tools to assess your current visibility and identify quick optimization opportunities.</li><li><strong>Structure Content for AI Readability</strong>: Implement clear headings, FAQs, structured data, and summary sections to make your content AI-friendly. AND: Answer those <strong>questions</strong> related to your niche.</li><li><strong>Set Up AI Traffic Tracking</strong>: Create conversion tracking and custom reports in GA4 to track traffic and conversions from ChatGPT, Perplexity, and other AI sources.</li><li><strong>Implement a Custom Chatbot Strategy</strong>: Don’t just add a generic antiquated chatbot like Intercom — develop a strategy for how a custom conversational interface can enhance your specific user journey.</li><li><strong>Develop an AEO Plan</strong>: Work with your SEO team or agency to create a specific strategy that prioritizes AEO alongside (not instead of) Google.</li></ol><p>The shift to AI-driven answers isn’t just coming — it’s already here. And based on our data, the conversion potential is too significant to ignore. The websites that adapt fastest will capture disproportionate value in this new ecosystem.</p><p>I’d love to hear if you’ve noticed similar patterns with AEO-driven conversions or if you’ve implemented any of these strategies.
Drop your experiences in the comments!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[ChatGPT Clicks Convert 6.8X Higher Than Google Organic]]></title>
            <link>https://ai.plainenglish.io/chatgpt-clicks-convert-6-8x-higher-than-google-organic-a457203cfc52?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/a457203cfc52</guid>
            <category><![CDATA[seo]]></category>
            <category><![CDATA[conversion-optimization]]></category>
            <category><![CDATA[marketing]]></category>
            <category><![CDATA[chatgpt]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Tue, 25 Feb 2025 19:53:05 GMT</pubDate>
            <atom:updated>2025-03-17T01:10:58.123Z</atom:updated>
<content:encoded><![CDATA[<p>Here’s the deal: I recently dug into some data in GA4 for our website and found that while Google Organic brings in more traffic, ChatGPT clicks convert way better — <strong>6.8X better</strong> for free trial conversions, to be exact.</p><p>If you’re in the trenches of SEO and digital marketing, you know that every conversion counts. Let’s break this down.</p><h3>Digging into GA4</h3><p>I was checking our traffic numbers in GA4 when something caught my eye. Despite having a fraction of the total visitors, ChatGPT clicks were outperforming Google Organic in converting users into free trials.</p><blockquote>Just to confirm: A free trial activation on our website (<a href="https://customgpt.ai/">customgpt.ai</a>) requires a <strong>work email</strong> AND <strong>credit card</strong> on our platform — so these are not accidental clicks.</blockquote><p>And yes, we’re talking about a 6.8X conversion multiplier here.</p><p>In this post, I’m going to show you the raw data, dive into each stage of the user funnel, and explain why this matters for your SEO and CRO strategy.</p><h3>The Traditional SEO Playbook</h3><p>For years, we’ve leaned heavily on Google Organic to drive traffic. And given the huge market share Google commands, you probably still should.</p><p>The idea is simple: more clicks equal more conversions. But what if quality beats quantity every time?</p><p>But now, there is ChatGPT (and Perplexity!)</p><p>Instead of a broad net, it delivers high-intent traffic. Fewer clicks, but better-qualified leads.
This post lays out the numbers behind that claim.</p><p>From a common-sense perspective, it makes perfect sense — instead of the traditional 10 blue links, the user is coming to your website highly qualified and probably warmed up and pre-sold on your product and services.</p><p>So it makes intuitive sense that ChatGPT clicks would be more valuable than Google Organic.</p><h3>Data Collection &amp; Methodology</h3><p>I based this analysis on three core metrics:</p><ul><li><strong>Total First-Time Active Users:</strong> The initial touchpoints.</li><li><strong>Total Sign-Ups (Email Required):</strong> The next step in the funnel. [Side note: We require <strong>work email</strong> — so this leaves out fraudulent Gmail addresses]</li><li><strong>Total Free Trials (Credit Card Required):</strong> The ultimate conversion goal. [Side note: The credit card is collected using Stripe hosted checkout]</li></ul><p>We also looked at two conversion ratios:</p><ul><li><strong>Free Signup Conversion Ratio</strong></li><li><strong>Free Trial Conversion Ratio</strong></li></ul><p>Data was gathered using GA4 and represents the last couple of days.</p><blockquote>Quick tip: This analysis is now possible because ChatGPT tags its outgoing clicks with a <strong>utm_source=chatgpt.com</strong> parameter that can be tracked in GA4.
So when your website shows up in the ChatGPT citations, it will have that param appended to its link.</blockquote><h3>Data Overview</h3><p>Take a look at <a href="https://docs.google.com/spreadsheets/d/1SRkUEqlwqZPfjBE2Gk1coyLWsziXWWIAYmsprQ0xCHo/edit?usp=sharing">this table</a> for a quick snapshot:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/970/1*O4EdzvMFF1KHDvT8AVvtAw.png" /></figure><blockquote><strong>In short, while ChatGPT drove just 4% of the traffic, it accounted for 22% of the conversions.</strong></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*AJTPEBNKcLi5Ce49" /></figure><p><strong><em>Important note: This traffic distribution is on a log scale.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*1qpa5xMlwza0gObE" /></figure><p>Here is the kicker: ChatGPT conversions are 6.8X higher.</p><h3>Detailed Funnel Analysis</h3><p>Let’s slice and dice the funnel:</p><ul><li><strong>First-Time Active Users:</strong> Google Organic brings in 7,079 users, but ChatGPT delivers just 306.</li><li><strong>Sign-Ups:</strong> Google Organic gets 54 sign-ups compared to ChatGPT’s 13.</li><li><strong>Free Trials:</strong> Out of those, only 17 from Google Organic become free trials, versus 5 from ChatGPT.</li></ul><p>Now calculate the conversion ratios: ChatGPT users are far more likely to sign up and convert into free trials — even though a credit card is required.</p><p>The drop-off is far less severe in the ChatGPT funnel, which means these users are high-quality prospects.
That’s why I’m using a log scale on the funnel chart — it really lets you see the massive drop-offs and the small, yet significant, ChatGPT cohort.</p><h3>Conversion Ratio Deep Dive</h3><p>Let’s drill down:</p><ul><li><strong>Free Signup Conversion:</strong> Google Organic converts at 0.76% while ChatGPT converts at 4.25%.</li><li><strong>Free Trial Conversion:</strong> The real star — Google Organic converts at 0.24%, but ChatGPT clocks in at 1.63%.</li></ul><p>To be clear: ChatGPT’s free trial conversion is 6.8 times higher than that of Google Organic. It’s not just a fluke; these numbers suggest that ChatGPT clicks come with higher intent. Whether it’s the conversational context or the quality of the referral, something is clearly working in its favor.</p><h3>SEO &amp; Technical Implications</h3><h4>Rethinking Quality vs. Quantity</h4><p>This data challenges the conventional wisdom of chasing massive volumes. Instead, focus on quality traffic that converts.</p><p>Our natural instinct (and that of our bosses) is to chase the raw traffic numbers. It might be time to look carefully at down-funnel conversions as well.</p><h4>Algorithmic Differences</h4><p>Traditional search engine algorithms aim for volume, but AI-driven platforms like ChatGPT are built to provide highly relevant, context-driven results.</p><p>That difference could be why we see such a high conversion rate.</p><h4>User Experience Matters</h4><p>The landing page experience and the alignment of content with user intent are critical. If you can harness the intent behind ChatGPT clicks, you can optimize your funnel to maximize conversions.</p><blockquote><strong>Disclaimer</strong>: Our product is very tech-oriented, so it might get an added boost from the tech-savvy early-adopter ChatGPT crowd.
If you are selling pyjamas to grandmas, you won’t see similar results.</blockquote><h3>Actionable Insights &amp; Recommendations</h3><p>Here’s what you can do:</p><ul><li><strong>First thing:</strong> Dig into GA4 and see if you can start tracking your incoming traffic from chatgpt.com</li><li><strong>Get your tracking right:</strong> If you haven’t already done so, make sure that all intermediate events in your conversion funnel are being tracked in GA4.</li><li><strong>Optimize for ChatGPT:</strong> While you don’t need to rush with both feet into AEO (or GEO or AISEO or whatever it is called these days!) — do keep an eye out for this changing phenomenon — at least when it comes to conversions. [<em>Side note: If you haven’t already done so, please pay attention to your Bing rankings</em>]</li></ul><h3>Caveats &amp; Considerations</h3><p>And here comes the list of caveats — because it’s not all milk and honey — a couple of things to keep in mind:</p><ul><li><strong>Sample Size:</strong> ChatGPT traffic numbers are smaller. While the conversion ratios are impressive, broader tests over time are necessary. I did run a statistical significance test, but clearly more data is needed (both on our own website — and other websites)</li><li><strong>Your Results Will Vary:</strong> It’s very much possible that due to the nature of our product, we get a different type of visitor from ChatGPT than we do with Google Organic. Your results will vary drastically depending on what you are selling.</li><li><strong>External Factors:</strong> Various factors like market trends, seasonal effects, or even changes in the algorithms could impact these results. There are literally a million factors at play here — so please consider your own specific conditions.</li><li><strong>No Predictions:</strong> If you notice, I am NOT making any predictions about SEO or market share here (that’s up to pundits to argue over!)
— even though I personally feel ChatGPT, Grok, and Perplexity could be nipping at Google’s market share, it’s a fruitless exercise to place bets on that.</li></ul><p>Got questions or want to share your own experience? Drop a comment below. Let’s keep the conversation going!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a457203cfc52" width="1" height="1" alt=""><hr><p><a href="https://ai.plainenglish.io/chatgpt-clicks-convert-6-8x-higher-than-google-organic-a457203cfc52">ChatGPT Clicks Convert 6.8X Higher Than Google Organic</a> was originally published in <a href="https://ai.plainenglish.io">Artificial Intelligence in Plain English</a> on
Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deepseek for Entrepreneurs: The Reasoning Per Dollar Is All You Need]]></title>
            <link>https://medium.com/@aldendorosario/deepseek-for-entrepreneurs-the-reasoning-per-dollar-is-all-you-need-23df90e8d8bb?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/23df90e8d8bb</guid>
            <category><![CDATA[deepseek-r1]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[deepseek]]></category>
            <category><![CDATA[entrepreneurship]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Mon, 03 Feb 2025 21:25:22 GMT</pubDate>
            <atom:updated>2025-02-03T21:25:22.341Z</atom:updated>
            <content:encoded><![CDATA[<h3>Deepseek for Entrepreneurs: The Reasoning Per Dollar Is All You Need</h3><p>Here’s the deal:</p><p>Every once in a while, a new technology comes along that completely changes the game for entrepreneurs.</p><p>It’s not just about making things a little faster or cheaper — it’s about unlocking opportunities that were previously impossible.</p><p>One of my favorite books of all time is <strong>On ENTREPRENEURSHIP and IMPACT</strong> by Desh Deshpande (<a href="https://heyzine.com/flip-book/9ed16c66cd.html">you can view it here</a>) — it’s a beautiful book for entrepreneurs (because entrepreneurs are busy, and it covers each chapter in 2 pages).</p><p>Here is the relevant section:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/670/0*Czm5k7UliwsvZ41V" /><figcaption>Credit: <a href="https://www.deshpandefoundation.org/2023/11/06/desh-book/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=desh-book">Deshpande Foundation</a></figcaption></figure><p>Deepseek (or similar models like OpenAI’s o3) is one of those technologies. Why? Because it dramatically improves what I like to call <em>“</em><strong><em>reasoning per dollar.</em></strong><em>”</em></p><p>Some others call this “<strong>bending the cost curve</strong>” — you get my point though.</p><h3>What’s “Reasoning Per Dollar”?</h3><p>Think about it this way: How much reasoning power can you get for every dollar you spend?</p><p>This matters because, as entrepreneurs, we’re constantly building things — tools, apps, services — that solve problems.</p><p>But some ideas never make it off the ground because they’re either too expensive to execute or the tech isn’t smart enough to do the job well.</p><p>Deepseek changes that. It makes advanced reasoning affordable.</p><p>Here’s an example: Imagine a “lawyer agent” that reviews contracts for you.
Before now, this kind of tool would either cost a fortune to run or be so limited in reasoning ability that it wasn’t worth using. With Deepseek, tools like this become not only possible but practical.</p><h3>My Personal Experience</h3><p>I’ve been working on an AI-powered research writer for three years now. The goal? To create content that’s better than what 99% of humans can write — and certainly better than anything AI has produced so far.</p><blockquote><strong>Hot news</strong>: OpenAI released a similar “<a href="https://openai.com/index/introducing-deep-research/">deep research</a>” agent today. You can do a <a href="https://researcher.customgpt.ai/">side-by-side comparison</a> if you are into deep researchers.</blockquote><p>The journey hasn’t been easy. At first, the output was mediocre. Over time, it got better — “OK” turned into “Really Good.” But it still wasn’t <em>great.</em></p><p>The missing piece? A reasoning model that could deeply understand and synthesize research sources.</p><p>Enter Deepseek. With its high-reasoning capabilities at a reasonable cost, my tool can now deliver results I couldn’t have dreamed of before. And since the tool takes 30+ minutes to run per task (yes, it’s computationally heavy), affordability is a game-changer here.</p><h3>The Big Question You Should Be Asking</h3><p>Here’s what I want every entrepreneur to think about: <em>How does this new capability change what I do?</em></p><p>It’s like 2010 all over again when smartphones hit the mainstream.
Back then, we asked ourselves: <em>What happens when people aren’t chained to their desktops anymore?</em> Entire industries were born from that shift.</p><p>Now we’re facing a similar moment: <em>What happens when reasoning power becomes both advanced and affordable?</em> The possibilities are endless — but only if you’re paying attention.</p><h3>Addressing Concerns About “China”</h3><p>Let’s clear something up: Every time someone mentions Deepseek, there’s this knee-jerk reaction about its ties to China. I get it — people are cautious about where their data is hosted and who has access to it.</p><p>But here’s the thing: There are plenty of hosting options available that mitigate these risks. I’ve even put together a quick graphic to explain how this works because education is key here.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/0*VMURca1O3CYhr7nl" /></figure><p>To be clear: I have no affiliation with Deepseek — I’m just sharing insights because I think this tech has huge potential for entrepreneurs.</p><p>You should most likely do your own self-assessment — here is a helpful guide I put together for that:</p><p><a href="https://pub.towardsai.net/deepseek-r1-is-it-right-for-you-a-practical-self-assessment-for-businesses-and-individuals-fc0021d85b60">DeepSeek R1: Is It Right For You? (A Practical Self‑Assessment for Businesses and Individuals)</a></p><h3>Final Thoughts</h3><p>Technologies like Deepseek and o3 are opening up new opportunities. They let us build smarter tools, more efficient services, and entirely new applications that were once out of reach.</p><p>So ask yourself: <strong><em>How can this new capability redefine what I do?</em></strong></p><p>Because if you don’t, someone else will — and they’ll be the ones shaping the future while you’re stuck playing catch-up.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=23df90e8d8bb" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[RAG vs. CAG: Can Cache-Augmented Generation Really Replace Retrieval?]]></title>
            <link>https://pub.towardsai.net/rag-vs-cag-can-cache-augmented-generation-really-replace-retrieval-9078fdbcba2f?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/9078fdbcba2f</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[benchmark]]></category>
            <category><![CDATA[rags]]></category>
            <category><![CDATA[ai]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Thu, 30 Jan 2025 16:58:46 GMT</pubDate>
            <atom:updated>2025-02-05T12:06:30.160Z</atom:updated>
            <content:encoded><![CDATA[<h3>RAG vs. CAG: Can Cache-Augmented Generation Really Replace Retrieval?</h3><p>A recent <a href="https://venturebeat.com/ai/beyond-rag-how-cache-augmented-generation-reduces-latency-complexity-for-smaller-workloads/">VentureBeat article</a> highlights a new Cache-Augmented Generation (CAG) method that promises no retrieval overhead and even better performance than Retrieval-Augmented Generation (RAG).</p><p>Sounds too good to be true?</p><p>We decided to find out by running our own tests on KV-Cache (a popular CAG implementation) versus RAG.</p><p>Below are our insights on what happens when you apply these methods to real workloads.</p><h3>1. Setting the Stage: RAG vs. KV-Cache (CAG)</h3><h3>RAG</h3><p><strong>What It Is<br></strong>A Retrieval-Augmented Generation approach that uses a retriever to find relevant documents, then passes them to a large language model for final answers.</p><p><strong>Where It Shines</strong></p><ul><li>Handles larger or frequently updated datasets without loading everything at once.</li><li>Avoids massive prompts, which can lead to truncation or context overload.</li></ul><p><strong>Key Limitations</strong></p><ul><li>Adds a retrieval step, which can be slower.</li><li>Often relies on external APIs or indexing overhead.</li></ul><h3>KV-Cache (CAG)</h3><p><strong>What It Is<br></strong>A method that aims for near-zero retrieval time by loading all documents directly into the model’s context window. In principle, it cuts out the retriever entirely.</p><p><strong>Note:</strong> In our benchmarks, we used a <strong>“No Cache”</strong> version of KV-Cache because the model was too large to run locally. Instead, we mimicked the same behavior via an API (OpenRouter) by feeding all documents each time.
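</p><p>Operationally, the only difference between the two setups is what goes into the prompt. A toy sketch of the two request shapes (the word-overlap retriever below is an illustrative stand-in of my own, not the actual benchmark code):</p>

```python
def overlap_retriever(question: str, documents: list[str]) -> list[str]:
    """Toy relevance ranking: count words shared with the question."""
    q_words = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)

def build_prompt(question: str, documents: list[str],
                 use_rag: bool = True, top_k: int = 5) -> str:
    """RAG keeps only the top_k retrieved documents; the 'No Cache' CAG
    setup stuffs every document into the prompt on every single call."""
    if use_rag:
        context = overlap_retriever(question, documents)[:top_k]
    else:
        context = documents  # everything, every time
    return "\n\n".join(context) + f"\n\nQuestion: {question}"
```

<p>At 500 documents, the stuff-everything path is what produces requests of hundreds of thousands of tokens, as the error log below shows; the RAG path stays bounded regardless of corpus size.</p><p>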
We’re not comparing retrieval speed here, since KV-Cache would obviously win if run locally on a suitable setup.</p><p><strong>Where It Shines</strong></p><ul><li>If your entire knowledge base easily fits in the model’s context, you get almost instant answers (no retrieval step).</li><li>Best for stable datasets that rarely change.</li></ul><p><strong>Key Limitations</strong></p><ul><li><strong>Context Size:</strong> If you exceed the model’s capacity, you must truncate or compress, killing accuracy.</li><li><strong>Local Requirement:</strong> Real caching needs control over memory, meaning you must run the model on your own infrastructure.</li><li><strong>Frequent Updates:</strong> Reloading the entire knowledge base into context is impractical for dynamic data.</li></ul><h3>2. The BIG BUT (and We Cannot Lie)</h3><p>Long-context LLMs (like Google Gemini or Claude with hundreds of thousands of tokens) are emerging, making CAG more appealing for some workloads.</p><p><strong>But there’s a big condition:</strong></p><ul><li><strong>You must run the model locally </strong>and have access to its memory to enable caching. Many high-powered LLMs are hosted with limited context lengths, and you obviously can’t access their memory for user-level manipulation via an API.</li><li><strong>Once your dataset crosses a threshold,</strong> you might exceed the context window. If that happens, the method can break entirely or force you to truncate vital info, tanking accuracy.</li></ul><p>This snippet from one error log says it all:</p><blockquote>“error”:{“message”:”This endpoint’s maximum context length is 131072 tokens. However, you requested about 719291 tokens…”}</blockquote><p><strong>Translation:</strong> You’re out of luck unless you compress or chunk your data, which can sharply reduce performance.</p><h3>3.
Our Benchmark Setup</h3><p>We used the <strong>HotpotQA</strong> dataset (known for multi-hop QA) and ran our tests on the <strong>meta-llama/llama-3.1-8b-instruct</strong> model. We posed <strong>50 questions</strong> each to two knowledge sizes — <strong>50 documents and 500 documents</strong> — to see how each method performs at different scales.</p><p>Because we used an API (OpenRouter) for KV-Cache, there was no actual “cache” or local memory optimization happening; we simply passed all documents in each request.</p><ul><li><strong>top_k=5</strong> for RAG, and <strong>no top_k</strong> for KV-Cache (it loads everything).</li><li><strong>No retrieval time comparison:</strong> Our focus is on semantic accuracy, since KV-Cache would trivially have zero retrieval overhead if it were truly caching locally.</li></ul><h3>4. Results</h3><p>Our benchmark tests on the HotpotQA dataset revealed interesting insights into the performance of RAG and KV-Cache (CAG) under different knowledge sizes.</p><p>Below are the key findings:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*hE0-p4w6ooekGHBf" /><figcaption>Figure 1: Average semantic similarity scores for KV-Cache (No Cache) and RAG across knowledge sizes (k=50 and k=500). Tests were conducted on the HotpotQA dataset using the meta-llama/llama-3.1-8b-instruct model, with 50 questions per knowledge size.
KV-Cache used an API (OpenRouter) without local caching, while RAG employed top_k=5 for retrieval.</figcaption></figure><h3>Key Takeaways</h3><ul><li>KV-Cache Struggles with Scale: As the dataset grows, KV-Cache faces context size limits, which require prompt truncation or compression.</li><li>RAG Handles Complexity: RAG’s retrieval mechanism ensures only relevant documents are used, avoiding context overload and maintaining accuracy.</li></ul><p><strong>The Bottom Line</strong></p><p>While KV-Cache shines with small, stable datasets, RAG proves more robust for larger, dynamic knowledge bases, making it a better fit for real-world, enterprise-level tasks.</p><h3>5. KV-Cache (CAG): Pros &amp; Cons</h3><p>CAG can appear unbeatable in early or small-scale tests (e.g. ~50 documents). But scaling up to <strong>500+ documents</strong> reveals some crucial issues:</p><h3>Context Overflow</h3><p>When you exceed the model’s max context window, you risk prompt truncation or outright token-limit errors. <strong>Vital information gets cut, and accuracy suffers.</strong></p><h3>Local Hardware</h3><p>To truly leverage KV-Cache, you need <strong>direct access</strong> to the model’s memory. If you rely on a hosted or API-driven model, there’s no way to manage caching yourself.</p><h3>Frequent Updates</h3><p>Every time your data changes, you have to <strong>rebuild the entire cache</strong>. This overhead can <strong>undermine the supposed “instant” advantage</strong> that KV-Cache promises.</p><h3>6. Quizzing Time: Score Wars — Why ‘Rosie Mac’ is the Winner</h3><p>Not all scores tell the full story. When evaluating model responses, similarity metrics compare generated answers to a reference text. But what happens when one answer is more detailed than the reference? Does it get rewarded — or penalized? 
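</p><p>Mechanically, the penalty comes from vector geometry: extra tokens pull an answer’s representation away from a terse reference. A bag-of-words cosine makes the effect visible (this is only an illustrative stand-in, not the embedding metric used in the benchmark):</p>

```python
import math
import re
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over raw word counts: a crude stand-in for
    dense-embedding similarity, but the length penalty works the same way."""
    ca = Counter(re.findall(r"[a-z]+", a.lower()))
    cb = Counter(re.findall(r"[a-z]+", b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

reference = "Rosie Mac"
short_answer = "Rosie Mac."
long_answer = ("Rosie Mac was the body double for Emilia Clarke in her "
               "portrayal of Daenerys Targaryen in Game of Thrones.")

print(bow_cosine(reference, short_answer))  # 1.0: exact match once punctuation is stripped
print(bow_cosine(reference, long_answer))   # ~0.29: the extra words dilute the score
```

<p>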
Let’s look at a real example from our benchmark.</p><h3>The Question:</h3><p><strong>Q:</strong> <em>Who was the body double for Emilia Clarke playing Daenerys Targaryen in Game of Thrones?</em></p><h3>Two Correct Answers:</h3><h4>Answer A</h4><p><em>“Rosie Mac was the body double for Emilia Clarke in her portrayal of Daenerys Targaryen in Game of Thrones.”</em></p><h4>Answer B</h4><p><em>“Rosie Mac.”</em></p><p>Which one do you think scored higher on our similarity metric? Most people might assume the more detailed answer (<strong>A</strong>) wins. But here are the actual scores:</p><ul><li><strong>Answer A:</strong> 0.60526</li><li><strong>Answer B:</strong> 0.98361</li></ul><p>Yes, the shorter <em>“Rosie Mac.”</em> received the higher score. Why? Because the <strong>ground truth</strong> reference answer was simply <em>“Rosie Mac”</em> — so the more detailed response introduced extra words that lowered the alignment score.</p><p>This doesn’t mean longer answers are worse — often, they provide better context. But it highlights why similarity metrics should be interpreted with caution, especially in nuanced or multi-hop reasoning tasks. Our overall results remain valid, but it’s important to <strong>look beyond raw scores</strong> to gain a comprehensive, unbiased perspective on how these models truly perform.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*D8XQwhbR3KgiFszGEV2s0A.jpeg" /><figcaption>Image Credit: <a href="https://www.linkedin.com/in/kevin-michael-schindler/">Kevin Michael Schindler</a></figcaption></figure><h3>7. Final Thoughts: No Free Lunch</h3><p>Yes, Cache-Augmented Generation can truly offer zero retrieval overhead — <strong>if</strong> your entire knowledge base and context can fit comfortably in your local LLM. 
But for many <strong>enterprise or multi-hop tasks</strong>, that’s a big <strong>“if.”</strong></p><p>If your data is <strong>large or updates frequently</strong>, RAG approaches like CustomGPT.ai may remain the <strong>more robust and flexible</strong> choice.</p><h3>8. Frequently Asked Questions</h3><h4><strong>What is Retrieval-Augmented Generation (RAG)?</strong></h4><p>It’s a technique that fetches external documents at inference time to enrich a model’s responses, allowing you to handle bigger or changing data sets without overloading the model’s context.</p><h4><strong>How did you measure semantic similarity?</strong></h4><p>We used a BERTScore model (“all-MiniLM-L6-v2”) to compare generated answers with ground-truth references.</p><h4><strong>What does “No Cache” KV-Cache mean in your diagrams?</strong></h4><p>It indicates we didn’t run an actual local caching mechanism. Instead, we replicated the effect by passing all documents via an API request each time, so we could compare its semantic accuracy without focusing on speed.</p><h4><strong>Why was HotpotQA used?</strong></h4><p>HotpotQA requires retrieving multiple documents to answer a single question, making it ideal for testing retrieval methods like RAG and highlighting KV-Cache’s limitations with large knowledge bases.</p><h4><strong>When is multi-hop retrieval needed?</strong></h4><p>When no single document contains the full answer — common in research, legal analysis, and complex reasoning tasks requiring fact linking.</p><h3>Learn More</h3><ul><li><strong>CustomGPT.ai:</strong><a href="https://customgpt.ai/"> RAG-as-a-Service for large or dynamic data</a></li><li><strong>VentureBeat Article:</strong><a href="https://venturebeat.com/ai/beyond-rag-how-cache-augmented-generation-reduces-latency-complexity-for-smaller-workloads/"> How Cache-Augmented Generation Reduces Latency &amp; Complexity</a></li><li><strong>CAG Official Repo:</strong><a href="https://github.com/hhhuang/CAG"> CAG 
Original</a></li><li><strong>Our Modified Fork (With Results):</strong><a href="https://github.com/KevinMichaelSchindler/CAG"> CAG Fork</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9078fdbcba2f" width="1" height="1" alt=""><hr><p><a href="https://pub.towardsai.net/rag-vs-cag-can-cache-augmented-generation-really-replace-retrieval-9078fdbcba2f">RAG vs. CAG: Can Cache-Augmented Generation Really Replace Retrieval?</a> was originally published in <a href="https://pub.towardsai.net">Towards AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[DeepSeek R1: Is It Right For You? (A Practical Self‑Assessment for Businesses and Individuals)]]></title>
            <link>https://pub.towardsai.net/deepseek-r1-is-it-right-for-you-a-practical-self-assessment-for-businesses-and-individuals-fc0021d85b60?source=rss-841a3b5d13d5------2</link>
            <guid isPermaLink="false">https://medium.com/p/fc0021d85b60</guid>
            <category><![CDATA[deepseek]]></category>
            <category><![CDATA[deepseek-r1]]></category>
            <dc:creator><![CDATA[Alden Do Rosario]]></dc:creator>
            <pubDate>Mon, 27 Jan 2025 15:59:08 GMT</pubDate>
            <atom:updated>2025-01-28T12:05:24.424Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>DeepSeek R1: Is It Right For You? (A Practical Self‑Assessment for Businesses and Individuals)</strong></h3><p>Deepseek just turned the AI world upside down with its new R1 model. It’s all over the news, so I won’t repeat it here. But the fears, too, are justified — as <a href="https://medium.com/the-generator/deepseek-hidden-china-political-bias-5d838bbf3ef9">laid</a> out nicely by <a href="https://medium.com/u/b020b149fc2d">Jim the AI Whisperer</a>.</p><p><a href="https://medium.com/the-generator/deepseek-hidden-china-political-bias-5d838bbf3ef9">AI as political warfare: DeepSeek AI advances the global interests of the Communist Party of China</a></p><p>For example, here is part of the model’s system instructions (laden with Chinese government interference) extracted by Jim.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Qey9kr1NXfVjaJ7F" /><figcaption>Credit: <a href="https://medium.com/u/b020b149fc2d">Jim the AI Whisperer</a></figcaption></figure><p>If you’re weighing whether to adopt DeepSeek R1 for personal or business objectives, it’s worthwhile to assess its strengths and concerns against your own needs, values, and requirements.</p><p>Below is a self‑assessment framework you can use to determine whether DeepSeek R1 might be right for you.</p><h3>1. Use Case Clarity: What Do You Want from an AI?</h3><ol><li><strong>Content Generation &amp; Creative Work</strong></li></ol><ul><li><strong>Potential Benefits:</strong> DeepSeek R1’s advanced language modeling capabilities are reportedly strong. If your goal is to produce compelling marketing copy, draft articles, or generate creative content, you might find an upside in the model’s robust generative power.</li><li><strong>Potential Drawbacks:</strong> Some content categories may be censored or skewed by built‑in political or ideological constraints.
If you rely on producing uncensored analysis or commentary (especially on certain social or political topics), you might find the outputs constrained.</li><li><strong>Key Questions:</strong> Are you looking for an AI that can handle politically sensitive or regulated topics? Or do you primarily need general, apolitical copy?</li></ul><p><strong>2. Customer Support &amp; Chatbot Integration</strong></p><ul><li><strong>Potential Benefits:</strong> A model with strong, human‑like conversation skills can offer 24/7 customer assistance, handle routine inquiries, and streamline support.</li><li><strong>Potential Drawbacks:</strong> If your brand or organization operates globally, you might worry about built‑in filters or ideological stances that clash with your corporate values, especially if customers raise sensitive questions.</li><li><strong>Key Questions:</strong> Will you be comfortable with possible “official” or heavily filtered replies from your chatbot? Are you willing to monitor or post‑process the AI’s responses to keep them aligned with your business policies?</li></ul><p><strong>3. Business Intelligence &amp; Research</strong></p><ul><li><strong>Potential Benefits:</strong> DeepSeek’s underlying engine appears to handle advanced reasoning tasks well, potentially excelling in summarizing business data, generating data insights, or providing research assistance.</li><li><strong>Potential Drawbacks:</strong> If certain lines of inquiry are restricted by the model’s core alignment, your research may be unintentionally narrowed — particularly on controversial or geopolitically sensitive areas.</li><li><strong>Key Questions:</strong> Does your team rely on neutral, unfiltered access to analyses (especially on internationally relevant topics)? Do you need an AI unencumbered by potential ideological stances?</li></ul><h3>2. Data Governance &amp; Privacy: How Is Your Data Handled?</h3><p><strong>1. 
Data Sharing &amp; Sovereignty</strong></p><ul><li><strong>Potential Benefits:</strong> DeepSeek R1 may offer attractive hosting options or local‑running instances that keep data in your environment. This is appealing for teams that want more direct control over the AI.</li><li><strong>Potential Drawbacks:</strong> If the model’s underlying infrastructure is physically or contractually tied to government data centers (as hinted by some references), there are open questions about the ultimate destination or usage of your prompts and data.</li><li><strong>Key Questions:</strong> Do you have strict data sovereignty requirements or compliance obligations that could conflict with DeepSeek’s data storage or transfer policies? Does your organization have the legal capacity to evaluate the model’s compliance with international privacy standards?</li></ul><p><strong>2. Security &amp; Encryption</strong></p><ul><li><strong>Potential Benefits:</strong> Advanced security protocols might be in place — there are mentions of quantum encryption or sophisticated channels. If you can verify this, it may be a plus for high‑sensitivity use cases.</li><li><strong>Potential Drawbacks:</strong> Security claims still need to be audited or validated by independent experts. If your sector is highly regulated (finance, healthcare, government), you need a clear chain of trust for your data.</li><li><strong>Key Questions:</strong> Are you comfortable accepting vendor claims of security at face value, or can you audit them? Does your IT security framework allow for third‑party solutions with unknown overseas ties?</li></ul><h3>3. Content Authenticity &amp; Ethical Considerations</h3><p><strong>1. Risk of Political or Ideological Bias</strong></p><ul><li><strong>Potential Benefits:</strong> Not all usage scenarios will trigger political or ideological issues. 
For straightforward tasks (like summarizing internal documents or providing coding help), the system’s underlying censorship might never surface.</li><li><strong>Potential Drawbacks:</strong> If you or your audience values unfettered discussion of any topic, embedded content restrictions could undermine trust. Moreover, bias in the model’s worldview might seep in subtle but important ways into brand messaging or public‑facing content.</li><li><strong>Key Questions:</strong> Does your brand want to maintain a neutral or independent stance? Could your customers perceive biased or censored outputs as negative, damaging your reputation?</li></ul><p><strong>2. Regulatory and Reputational Exposure</strong></p><ul><li><strong>Potential Benefits:</strong> If you’re operating in a domain where the Chinese market is strategic, using a model aligned with its policies might keep you compliant with local regulations.</li><li><strong>Potential Drawbacks:</strong> If your home market or internal policies strongly oppose censorship, or if you must uphold strict standards of freedom of expression, adopting a model with visible or hidden ideological constraints can bring negative PR.</li><li><strong>Key Questions:</strong> Will your stakeholders or clients question your choice if the AI is discovered to have internal “red lines”? Are you comfortable explaining these constraints in board meetings or to the public?</li></ul><h3>4. Technical Versatility &amp; Adaptability</h3><p><strong>1. Integration &amp; Customization</strong></p><ul><li><strong>Potential Benefits:</strong> If DeepSeek R1’s architecture supports customization or fine‑tuning for your in‑house tasks, you may gain advanced capabilities for specialized domains like finance, manufacturing, or biotech.</li><li><strong>Potential Drawbacks:</strong> The ability to “jailbreak” or circumvent censorship reveals an internal conflict in the model’s architecture.
This raises questions about reliability: might your integrated solution unexpectedly refuse or alter outputs under certain prompts?</li><li><strong>Key Questions:</strong> How flexible is DeepSeek’s model? Does the vendor share documentation that details the model’s constraints or “break points”? Will you have a fallback method if you need a second model for sensitive tasks?</li></ul><p><strong>2. Scalability &amp; Performance</strong></p><ul><li><strong>Potential Benefits:</strong> Early reports suggest strong performance on reasoning tasks and the potential for quick model updates, at a fraction of the cost.</li><li><strong>Potential Drawbacks:</strong> The official update cycle references “biweekly retraining with new data.” This continuous feed might help keep it fresh, but it might also unpredictably alter its alignment or responses.</li><li><strong>Key Questions:</strong> Do you need a stable environment for your AI integration (with predictable update cadences)? Or do you benefit from an aggressively updated system?</li></ul><h3>5. Cultural, Ethical, and Operational Alignment</h3><p><strong>Cultural Fit:</strong></p><ul><li>If your personal or organizational values prioritize open discourse, you’ll want to scrutinize any hidden constraints. However, if your day‑to‑day usage does not brush against politically sensitive topics, these constraints may never surface.</li></ul><p><strong>Governance &amp; Risk Appetite:</strong></p><ul><li>Some organizations have robust compliance frameworks requiring thorough vendor risk assessments. If that’s you, factor in the possibility of state or party influence on the model’s outputs.</li></ul><p><strong>Legal Environment:</strong></p><ul><li>The question of potential oversight or intervention by non‑domestic authorities can raise compliance red flags. For personal use, it’s often less of a concern — though still worth considering if you’re using it on sensitive topics.</li></ul><h3>6.
Decision Summary: Is DeepSeek R1 Right for You?</h3><ul><li><strong>Personal/Hobbyist Use:</strong><br>If you’re casually experimenting with AI to write short stories or answer day‑to‑day queries and you have no inclination to discuss sensitive socio‑political topics, DeepSeek R1 might offer advanced capabilities at little or no cost. You’ll want to keep an eye on its data handling if you share private information, though.</li><li><strong>Small Business / Startups:</strong><br>If you want to quickly embed AI in customer support, marketing, or research, DeepSeek R1’s performance could be compelling. However, consider your brand image and the possibility that certain user queries might be “restricted.” Backlash could arise if your customers discover ideologically censored content.</li><li><strong>Enterprise / Regulated Sectors:</strong><br>For enterprise usage, especially in sectors like finance, healthcare, defense, or media, you’ll want to conduct due diligence. Validate that the model’s data pipeline meets your privacy/security standards, and weigh the reputational risk if a foreign government’s ideological constraints are built in. Some organizations may decide the risks are too high.</li><li><strong>Public Sector / Government:</strong><br>Governments that require AI for public‑facing or internal use will look closely at sovereignty, data compliance, and ideological neutrality. If independence from external political influence is critical, you’ll likely seek a different AI model or a more transparent vendor partnership.</li></ul><h3>7. Action Items and Next Steps</h3><ol><li><strong>Pilot in a Sandbox</strong><br>Test DeepSeek R1 on your own data in a controlled environment. Evaluate censorship triggers and bias in typical use cases before integrating widely.</li><li><strong>Conduct a Thorough Risk Assessment</strong><br>Include legal, security, and compliance teams in your pilot. 
Identify how potential constraints or hidden influences could harm brand perception or customer trust.</li><li><strong>Explore Alternative or Supplementary Models</strong><br>You might decide to use different models for different tasks — one for general creative tasks and another for sensitive topics. This multi‑model approach can help mitigate the risk posed by ideological filters. Consider a distilled model (like <a href="https://openrouter.ai/deepseek/deepseek-r1-distill-llama-70b">DeepSeek R1 Distill Llama 70B</a>) that seems to have less censorship and interference built in.</li><li><strong>Monitor Vendor Roadmap</strong><br>If DeepSeek’s leadership can provide clarity on data governance or the possibility of a more neutral “international edition,” it could change your risk profile. Also <strong>monitor other vendors</strong> that may adopt the techniques DeepSeek introduced to build new models.</li><li><strong>Develop Clear Usage Policies</strong><br>If you adopt DeepSeek R1, craft internal policies about when and how it’s used, including clear disclaimers for public‑facing interactions.</li></ol><h3>Final Thoughts and Takeaways</h3><p><strong>My thoughts on DeepSeek:</strong> the R1 model is <em>shockingly</em> good in response quality and planning — right up until you peek behind the curtain and see the system instructions saturated with overt government influence.</p><p>For most businesses outside of China, those censorship and propaganda concerns alone are <strong>likely deal-breakers</strong>.</p><p>Yet, there’s a silver lining: the <strong>technical innovations</strong> powering DeepSeek’s strong reasoning will almost certainly appear in other models — <em>minus</em> the political constraints.</p><p>Moreover, DeepSeek’s <a href="https://openrouter.ai/deepseek/deepseek-r1-distill-llama-70b"><strong>distilled version</strong></a> (which leverages LLaMA) is reportedly free from many of these problematic instructions and still offers robust 
capabilities. (Hint: searching for “tank man” in this model provides a clear response — whereas it is blocked in the original model.)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/889/0*Ttp8m4sEnvoZ_aX9" /></figure><p><em>Have you tested DeepSeek R1 yet, or are you planning to? Share your experiences below. Let’s keep the conversation going.</em></p><hr><p><a href="https://pub.towardsai.net/deepseek-r1-is-it-right-for-you-a-practical-self-assessment-for-businesses-and-individuals-fc0021d85b60">DeepSeek R1 : Is It Right For You? (A Practical Self‑Assessment for Businesses and Individuals)</a> was originally published in <a href="https://pub.towardsai.net">Towards AI</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>