Stories by Mauro Garcia on Medium

CFS-R: Conditional Field Reconstruction

Mauro Garcia — Wed, 13 May 2026 17:21:29 GMT

Selects relevant candidates by asking which memories, combined, best rebuild the query itself.

The failure mode that started this

CFS fixed paraphrase pile-up by subtracting coverage. But there is a second failure mode underneath, and subtraction does not touch it.

Try a multi-hop question against cosine top-K:

Query: "Why did we end up over budget on the kitchen?"

Top 5:
1. "The kitchen budget was about $40k."                          ← topical, not causal
2. "Kitchen costs ran high this quarter."                        ← topical, not causal
3. "We talked about the kitchen renovation again."               ← topical, not causal
4. "Quartz countertops came in at $9k over quote."               ← actual cause
5. "Cabinets were delayed and the installer charged rush fees."  ← actual cause

The top three resemble the query. The bottom two answer it. Cosine ranks by “looks like the query,” but the query is a question and the answer is often a set of partial evidence whose combination matches the query’s content. The individually-most-similar memory is topical filler. The useful memories sit lower because each carries only part of the answer.

This is partial-evidence dilution, and it shows up constantly in long conversational histories. A user asks “why,” “how,” “what happened with,” “what did we decide” — and the answer is not one perfect memory. It’s a span of evidence.

CFS handles paraphrase clusters. CFS-R handles the next problem: which memories, together, reconstruct what the query is asking for?

The core idea

CFS treats selected memories as points that emit coverage fields. Once a region is represented, nearby repeats become more expensive. CFS-R treats candidates as basis vectors and tries to express the query as a nonnegative combination of them. Vectors that receive nonzero weight are doing work explaining the query. Vectors that receive little or no weight are redundant given everyone else.

In linear algebra terms, CFS-R solves:

a* = argmin over a ≥ 0 of ‖z_q − Cᵀa‖² + λ‖a‖²

where z_q is the unit-normalized query (a d-dimensional vector), C is the matrix that stacks the unit-normalized candidate embeddings as rows (P rows by d columns), a is the nonnegative coefficient vector of length P, and λ ≥ 0 is the ridge weight.

The nonnegativity constraint is the load-bearing piece. Without it, the solver can use negative weights to cancel candidates against each other, which may be mathematically valid but is semantically useless for retrieval.

A memory should not score well because it helps erase another memory. CFS-R constrains every coefficient to be positive or zero: a_i ≥ 0. That means each surviving coefficient has a real retrieval meaning: this memory adds evidence toward reconstructing the query direction.

The candidates with the largest positive coefficients are the ones doing the most reconstructive work. That ranking, blended with raw cosine relevance, becomes the CFS-R leg in the hybrid retriever.
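To make the nonnegativity point concrete, here is a minimal sketch on a made-up 2-D example. The vectors are toy values rather than real embeddings, and the ridge weight is set to zero to keep the arithmetic easy to follow. It compares an unconstrained least-squares fit with a nonnegative fit obtained by projected gradient descent.

```python
import numpy as np

# Toy 2-D setup (made-up vectors, not real embeddings).
z_q = np.array([1.0, 0.0])             # query direction
C = np.array([
    [0.8, 0.6],                         # c0: mostly on-topic, with an off-topic component
    [0.0, 1.0],                         # c1: purely off-topic
])

# Unconstrained least squares: solve C^T a ~= z_q with no sign restriction.
a_free, *_ = np.linalg.lstsq(C.T, z_q, rcond=None)

# Nonnegative fit via projected gradient descent (ridge weight 0 in this toy).
G = C @ C.T                             # Gram matrix of the candidates
b = C @ z_q                             # cosine of each candidate with the query
step = 1.0 / np.linalg.eigvalsh(G).max()
a_nonneg = np.zeros(len(C))
for _ in range(200):
    a_nonneg = np.maximum(0.0, a_nonneg - step * (G @ a_nonneg - b))

print("unconstrained:", a_free)
print("nonnegative:  ", a_nonneg)
```

In this toy case the unconstrained fit gives the off-topic vector a negative weight (about -0.75) purely to cancel the off-topic component of c0, while the nonnegative fit sets it to zero and keeps only c0 (weight about 0.8).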

Why CFS-R is not MMR

MMR asks: is this candidate relevant, but not too similar to what I already picked? That is pairwise diversity. CFS-R asks: does this candidate positively contribute to reconstructing the query? That is evidence reconstruction.

MMR is a greedy relevance-diversity tradeoff. It penalizes each candidate by similarity to already-selected items, one pick at a time. CFS-R solves a joint optimization over the entire candidate pool — candidate interaction is baked into the coefficient solution instead of being unrolled greedily.

The short version: MMR diversifies. CFS subtracts. CFS-R reconstructs.

The math

Given a unit-normalized query vector z_q and a candidate matrix C, where each row of C is a unit-normalized candidate embedding, CFS-R first builds a candidate pool using a cosine prefilter:

r_i = z_q · c_i, pool = the P candidates with the largest r_i

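Projected gradient descent is enough for the solve: one matrix-vector product per step, and the projection just clips negatives to zero. With G = CCᵀ and b = C z_q precomputed, and η a step size that absorbs constant factors:

a ← max(0, a − η · (Ga + λa − b))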

Each candidate’s final CFS-R score blends its normalized coefficient ã_i with its normalized cosine r̃_i:

score_i = w · ã_i + (1 − w) · r̃_i

The current locked setup uses w = 0.90, λ = 0.005, P = 150, and 80 iterations. The solver is small. The idea is the important part: CFS-R does not rank candidates by isolated similarity. It ranks them by positive contribution to reconstructing the query.
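The three steps above (prefilter, nonnegative solve, blend) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the reference implementation. The min-max normalization, the step-size rule, and the reading that w weights the coefficient term are my assumptions, and `cfsr_scores`, `_minmax`, and `unit` are hypothetical names. The toy data mirrors the "two partial answers versus one topical filler" situation with made-up vectors.

```python
import numpy as np

def _minmax(x):
    """Rescale to [0, 1]. Assumed normalization; the article does not name one."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def cfsr_scores(z_q, Z, pool_size=150, lam=0.005, w=0.90, iters=80):
    """Return (pool_indices, scores) for a unit query z_q and unit rows of Z."""
    cos = Z @ z_q
    pool = np.argsort(-cos)[:pool_size]          # cosine prefilter: top-P candidates
    C, r = Z[pool], cos[pool]

    G = C @ C.T + lam * np.eye(len(pool))        # Gram matrix plus ridge term
    step = 1.0 / np.linalg.eigvalsh(G).max()     # assumed step size: 1 / largest eigenvalue
    a = np.zeros(len(pool))
    for _ in range(iters):
        grad = G @ a - r                         # one matrix-vector product per step
        a = np.maximum(0.0, a - step * grad)     # projection: clip negatives to zero

    # Assumption: w weights the reconstruction coefficient, (1 - w) the cosine.
    return pool, w * _minmax(a) + (1.0 - w) * _minmax(r)

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Made-up 5-D example: one filler at cosine ~0.70, two partial answers at ~0.60.
z_q = unit([1, 1, 0, 0, 0])
Z = np.stack([
    unit([0.495, 0.495, 0, 0, 0.714]),   # 0: topical filler
    unit([0.8485, 0, 0.5292, 0, 0]),     # 1: partial answer A
    unit([0, 0.8485, 0, 0.5292, 0]),     # 2: partial answer B
])
pool, scores = cfsr_scores(z_q, Z)
ranking = pool[np.argsort(-scores)]
print(ranking)
```

On this toy input, cosine alone would put the filler first, while the blended score puts both partial answers ahead of it, because each receives a larger reconstruction coefficient than the filler does.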

Why “Conditional Field Reconstruction”

CFS-R is a sibling operator, not a bolt-on:

CFS      = Conditional Field Subtraction
CFS-R    = Conditional Field Reconstruction

They share the same field intuition — retrieval should care about the shape of the evidence set, not each candidate’s independent score — but the mechanism is different. CFS is subtractive and sequential: pick one, deform the scoring field, pick the next, deform again. CFS-R is reconstructive and joint: solve for the whole coefficient vector at once.

Greedy subtraction is useful when paraphrase redundancy is the main leak. Joint reconstruction is useful when the answer requires complementary partial evidence. They are not redundant. They solve different failures.

How CFS-R behaves

Dense paraphrase cluster. CFS-R will assign weight across near-collinear candidates — not ideal if pure diversification is the goal. But coefficients compete inside the pool and the cosine prior helps reorder things; CFS-R doesn’t collapse as badly as plain cosine. In practice I don’t run it alone.

Complementary partial answers. The regime CFS-R was built for. Two candidates each carry half the answer at cosine 0.60; a topical filler scores 0.70; cosine picks the filler. CFS-R sees that the partial-answer memories explain dimensions the filler does not, gives them positive reconstruction mass, and moves them up.

Cosine top-1 is sufficient. No harm. The top result reconstructs the query well, receives a large coefficient, stays high.

Adversarial inputs. Not a truth validator — a wrong candidate that’s nearly identical to the query embedding can still fool it. But on LoCoMo adversarial, CFS-R outperformed every other operator I tested, which suggests reconstruction helps when the adversarial issue is a broader similarity trap rather than a single planted near-duplicate.

Temporal questions. The weakest spot. CFS-R reconstructs semantic evidence, not chronology. The weighted fusion mostly closes the temporal gap, but pure temporal reasoning likely needs a time-aware variant.

CFS-R as part of a hybrid retriever

Standalone CFS-R is one signal, and single signals are brittle. The deployable shape is RRF over three legs, with the CFS-R leg taking a larger pool and an explicit weight:

RRF(cosine top-10, BM25 top-10, CFS-R top-50), CFS-R weighted 3x → final top-10

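The weighted RRF score:

score(c) = Σ over r ∈ {cos, BM25, CFS-R} of w_r / (60 + rank_r(c)), with w_cos = w_BM25 = 1 and w_CFS-R = 3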

Earlier configs used a smaller or equally-weighted CFS-R leg. The sweep showed CFS-R wants more influence in the fusion stack — not infinite (pure CFS-R underperforms the hybrid on LoCoMo), but a CFS-R-dominant RRF stack gives the best early-rank quality, which is exactly what you want in a memory retriever.
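As a rough illustration of the fusion step, here is a small weighted-RRF sketch. The function name `weighted_rrf`, the memory ids, and the rank constant k = 60 (borrowed from the RRF formula in the CFS article) are assumptions for the example, not details confirmed for the CFS-R configuration.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60, top_n=10):
    """Fuse ranked id lists. rankings: {leg: [ids, best first]}, weights: {leg: float}."""
    scores = defaultdict(float)
    for leg, ranked in rankings.items():
        for rank, cand in enumerate(ranked, start=1):
            scores[cand] += weights.get(leg, 1.0) / (k + rank)
    # Sort by fused score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]

# Hypothetical rankings from the three legs.
rankings = {
    "cosine": ["m1", "m2", "m3"],
    "bm25":   ["m2", "m1", "m4"],
    "cfsr":   ["m5", "m3", "m1"],
}
weights = {"cosine": 1.0, "bm25": 1.0, "cfsr": 3.0}
fused = weighted_rrf(rankings, weights)
print(fused)
```

With the 3x weight, a candidate that only the CFS-R leg ranks first (m5) ends up ahead of one that both cosine and BM25 rank in their top two (m2), which is the kind of influence the weighting is meant to give that leg.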

Does it work? Empirical results

I evaluated CFS-R on LoCoMo (1,982 questions, same setup as the CFS evaluation), holding cosine and BM25 fixed and varying only the third leg.

baseline cosine top-10:           NDCG@10 0.5123, Recall@10 0.6924
rrf(cos, BM25):                   NDCG@10 0.5196, Recall@10 0.6989
rrf(cos, BM25, MMR tuned):        NDCG@10 0.5330, Recall@10 0.7228
rrf(cos, BM25, CFS-long):         NDCG@10 0.5362, Recall@10 0.7295
rrf(cos, BM25, CFS-R top50 w3):   NDCG@10 0.5447, Recall@10 0.7303

Against tuned MMR: +1.17 pp NDCG@10 (95% CI [+0.66, +1.69], p < 0.001). Against CFS-long: +0.85 pp NDCG@10 (95% CI [+0.33, +1.35], p = 0.0006). Against baseline cosine: +3.24 pp NDCG@10, +3.79 pp Recall@10.
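For readers who want to reproduce the metric columns on their own data, here is a minimal sketch of Recall@10 and NDCG@10. It assumes binary relevance (a retrieved item is either gold or not), which the article does not state explicitly. The function names are hypothetical.

```python
import math

def recall_at_k(ranked, gold, k=10):
    """Fraction of gold ids that appear in the top k."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & set(gold)) / len(gold)

def ndcg_at_k(ranked, gold, k=10):
    """Binary-relevance NDCG: gain 1 for a gold id, discounted by log2(rank + 1)."""
    gold = set(gold)
    dcg = sum(1.0 / math.log2(i + 2) for i, c in enumerate(ranked[:k]) if c in gold)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(gold), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Tiny example: the single gold item sits at rank 2.
example_ndcg = ndcg_at_k(["a", "b", "c"], {"b"})
example_recall = recall_at_k(["a", "b", "c"], {"b"})
```

A "pp" difference is then just the difference of two such averages times 100, for example (0.5447 − 0.5330) × 100 = 1.17 pp for the tuned-MMR comparison.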

The sweep wasn’t fragile: the top configurations clustered tightly between 0.5441 and 0.5447 NDCG@10, which suggests the operator sits on a stable plateau rather than depending on a single magic hyperparameter.

The category breakdown is where the conceptual difference shows up:

                       single-hop  multi-hop  temporal  open-dom  adversarial
tuned MMR              0.3479      0.6377     0.2938    0.6144    0.4705
CFS-long               0.3615      0.6376     0.2959    0.6157    0.4734
CFS-R top50 w3         0.3646      0.6344     0.2948    0.6209    0.5018

The adversarial column is the result that matters: +3.13 pp over tuned MMR, +2.84 pp over CFS-long. If the adversarial problem were only pairwise diversity, MMR should be very hard to beat, but it isn’t. That supports the main claim: long-memory retrieval is not just about avoiding similar chunks. It is about reconstructing the evidence behind the query. Temporal is no longer a glaring weakness either: CFS-long still leads slightly (0.2959 vs. 0.2948), but CFS-R has closed most of the gap while keeping the adversarial gains.

When to use CFS-R

CFS-R earns its place when answers are combinations rather than single hits: “why,” “how,” “what happened with,” “what changed,” “what did we decide.” It fits long conversational corpora with repeated topics, cases where cosine over-selects topical filler, cases where MMR helps but feels too shallow or pairwise, and cases where adversarial or misleading-neighbor retrieval matters.

It’s especially useful when the top few context slots matter. CFS-R improves NDCG@5 strongly, which means it improves the part of the ranking most likely to make it into the model’s actual context window.

When not to use CFS-R

Purely timestamp-sensitive tasks: CFS-R reconstructs semantics, not chronology. Tiny candidate pools, sparse non-redundant corpora, or workloads where exact lexical lookup dominates. Anywhere you actually need a truth validator rather than a retriever.

And, like CFS, it isn’t meant to be deployed alone. The deployable form is RRF(cosine, BM25, CFS-R). In the v1 setup the CFS-R leg gets extra weight because the reconstruction signal is strong enough to deserve more influence.

Closing

CFS-R is a retrieval operator. Like CFS, it adds a small amount of code on top of a standard retriever. The shift is conceptual: don’t rank candidates only by how much they resemble the query; rank them by how much they help reconstruct it.

Cosine answers: what looks closest?
MMR answers: what is close but not too redundant?
CFS answers: what region has not been covered?
CFS-R answers: what memories, together, rebuild the query field?

That last framing is the one that matters for long memory. When people ask questions about their history, they usually aren’t asking for a single nearest sentence. They’re asking for the evidence behind a situation, a decision, a change, a conflict, or a pattern. That evidence is distributed. CFS-R is built for that.

MMR diversifies. CFS subtracts. CFS-R reconstructs. That is the operator.

Reference repo: https://gist.github.com/M-Garcia22/542a9a38d93aae1b5cf21fc604253718

CFS: Conditional Field Subtraction

Mauro Garcia — Wed, 06 May 2026 18:47:19 GMT

Selects relevant candidates by penalizing regions already covered by previous picks.

The failure mode that started this

Run cosine top-K over a long conversation and watch what comes back. You’ll see something like:

Query: "What did I say about the kitchen renovation budget?"

Top 5:
1. "We decided to spend around $40k on the kitchen."     ← good
2. "I told her our kitchen budget was about $40k."       ← repeat
3. "Yeah, $40k is what we're putting toward the kitchen." ← repeat
4. "The kitchen budget came out to roughly forty grand."  ← repeat
5. "Cabinets are going to eat half the budget."          ← new info

Four of your five context slots got eaten by the same fact reworded four times. The piece of evidence that would actually help you reason, that cabinets are half the spend, is buried at #5, and if your top-K is 4, it’s gone.

This is paraphrase pile-up, and it’s one of the most expensive failure modes in conversational memory retrieval because it burns limited top-K slots on redundant evidence. It happens because cosine similarity can over-reward repeated surface forms and nearby paraphrases of the same fact.

Conditional Field Subtraction (CFS) is an operator designed for the broader version of this failure mode: redundant-neighborhood collapse, where a retriever over-selects from one dense region and misses useful evidence elsewhere.

The core idea

Most retrieval scoring asks: how relevant is this candidate to the query?

CFS asks a different question: given what I’ve already selected, what shape of evidence is still under-covered, and which relevant candidate best fits that gap?

The intuition is geometric. Imagine your selected memories as points in embedding space. Around each selected point, draw a soft sphere (a Gaussian field) representing “the region this memory already covers.” Sum those fields. The result is a coverage field over the embedding space. Wherever the field is high, you’ve already got that area covered. Wherever it’s low, you have a gap.

CFS prefers the next candidate that is (a) relevant to the query and (b) sitting in a less-covered region of the candidate space.

The math

CFS is greedy. At each step t, given the already-selected set S_{t-1}, it picks the next candidate by:

c_t = argmax over c ∉ S_{t-1} of [ log f(c) − β · Σ over s ∈ S_{t-1} of K_σ(c, s) ]

Three components, each doing real work:

1. log f(c) — log-relevance to the query.

f(c) = cos(z_q, z_c) is standard cosine similarity. Taking the log is the change. Why? Because the penalty term is going to subtract from this, and we want the penalty to scale multiplicatively in the original space:

exp(log f(c) − β · Σ K_σ) = f(c) · exp(−β · Σ K_σ)

A high-relevance candidate doesn’t get linearly slaughtered by a fixed redundancy discount. The penalty scales with how much you’ve already covered.

2. Σ K_σ(c, s): the summed kernel field.

K_σ is a Gaussian: K_σ(c, s) = exp(−‖c − s‖² / (2σ²)).

The sum over selected items is the heart of the operator. Compare this to MMR (Carbonell & Goldstein 1998), which uses max:

MMR: max over s ∈ S of sim(c, s)

Max is local. A candidate sitting in a tight cluster of five paraphrases can receive roughly the same redundancy signal whether one or all five paraphrases are already selected. After you’ve picked one, MMR can still fall back toward relevance among the remaining paraphrases, which is exactly what caused the duplicate pressure in the first place.

Sum is cumulative. Every selected neighbor in the cluster adds to the penalty. The first paraphrase is mildly redundant; the fourth is much more expensive. Dense clusters tend to break apart instead of consuming the whole top-K.

3. β and σ — the two knobs.

- σ is the bandwidth. Small σ → narrow Gaussians → only near-duplicates penalize each other. Large σ → wide Gaussians → loosely related items also push each other apart. I use σ = 0.40 in unit-norm embedding space.

- β is the penalty strength. β = 0 recovers pure relevance. β → ∞ recovers pure dispersion. I use β = 0.20.

These two parameters define a coverage field, not a similarity threshold. That’s the conceptual shift.
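Putting the three components together, the greedy rule can be sketched as below. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `cfs_select`, the clipping of non-positive cosines before the log, and the toy vectors are mine. σ = 0.40 and β = 0.20 are the values quoted above.

```python
import numpy as np

def cfs_select(z_q, Z, k=5, beta=0.20, sigma=0.40, eps=1e-9):
    """Greedy CFS over unit-normalized rows of Z; returns selected row indices."""
    # log f(c); clipping is an assumption to keep the log defined when cos <= 0.
    rel = np.log(np.clip(Z @ z_q, eps, None))
    field = np.zeros(len(Z))                 # summed Gaussian coverage at each candidate
    selected = []
    for _ in range(min(k, len(Z))):
        score = rel - beta * field
        if selected:
            score[selected] = -np.inf        # never re-pick a selected candidate
        pick = int(np.argmax(score))
        selected.append(pick)
        sq_dist = np.sum((Z - Z[pick]) ** 2, axis=1)
        field += np.exp(-sq_dist / (2.0 * sigma ** 2))   # add the new pick's kernel
    return selected

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Made-up 3-D example: two near-duplicate paraphrases and one distinct memory.
z_q = unit([1, 0, 0])
Z = np.stack([
    unit([0.90, 0.436, 0.0]),    # 0: paraphrase, cosine ~0.90
    unit([0.89, 0.456, 0.0]),    # 1: near-duplicate of 0, cosine ~0.89
    unit([0.85, 0.0, 0.527]),    # 2: new information, cosine ~0.85
])
cosine_top2 = np.argsort(-(Z @ z_q))[:2].tolist()
cfs_top2 = cfs_select(z_q, Z, k=2)
print(cosine_top2, cfs_top2)
```

On this toy input, cosine top-2 returns both paraphrases, while the greedy rule keeps the first paraphrase and then prefers the distinct memory, because the near-duplicate sits almost at the peak of the first pick's field.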

Why “Conditional Field Subtraction”

The name is descriptive, not poetic.

Field: the summed Gaussian kernel Σ_s K_σ(·, s) is literally a scalar field over embedding space — a smooth function that says how covered each region is.
Subtraction: we subtract this field from the relevance score before maximizing.
Conditional: the field is conditioned on the current selection S_{t−1}, which means the score of a candidate depends on what’s already been picked. There is no fixed score per candidate. The score changes as the selection evolves.

That last point matters. CFS is not a reranker over a fixed score. It is a sequential decision rule where each pick reshapes the field for the next pick. This is what makes it useful on dense redundant neighborhoods: the field deepens around the first selected item, pushing nearby repeats down for every subsequent decision.

How CFS behaves

A few intuitions worth internalizing:

On a dense paraphrase cluster: CFS tends to pick one strong representative, then penalizes nearby repeats. The first item may look great; by the third pick, the cluster is being pushed down by an aggregated kernel sum.

On unrelated candidates: K_σ falls off exponentially with distance. Two unrelated memories barely interact. The penalty is local.

On marginally related candidates: This is the interesting case. A candidate that’s near but not inside a cluster gets a moderate penalty. It will lose to a more-distant alternative if both have similar relevance, but it will beat a clearly-redundant candidate. This is the regime where CFS earns its keep.

On the first selection: S_0 = ∅, so the penalty is zero, and CFS picks pure max relevance. The diversification only kicks in from the second pick onward.

On adversarial inputs: A candidate sitting at the centroid of a tight cluster gets a brutal penalty, even if it’s the most relevant single item. This is sometimes wrong — if the cluster is tight because everyone agrees on a fact and that fact is the answer, CFS will pick a representative and move on. That’s usually correct for memory retrieval (you only need the fact once) but occasionally costs you on questions that need redundant confirmation.
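The dense-cluster behavior follows from the sum-versus-max distinction, which is easy to check numerically. The sketch below uses hypothetical cosines between one candidate and the already-selected members of its cluster, and relies on the identity ‖c − s‖² = 2 − 2·cos(c, s) for unit vectors. The helper name `gaussian_from_cos` is made up for the example.

```python
import numpy as np

SIGMA = 0.40  # bandwidth quoted in the article

def gaussian_from_cos(cos_sim, sigma=SIGMA):
    """K_sigma for unit vectors, using ||c - s||^2 = 2 - 2*cos(c, s)."""
    return np.exp(-(2.0 - 2.0 * np.asarray(cos_sim)) / (2.0 * sigma ** 2))

# Hypothetical cosines between a candidate and already-selected cluster members.
one_selected = [0.95]
three_selected = [0.95, 0.94, 0.95]

# MMR-style redundancy signal: the maximum similarity to any selected item.
mmr_1, mmr_3 = max(one_selected), max(three_selected)

# CFS-style redundancy signal: the summed Gaussian field.
cfs_1 = float(gaussian_from_cos(one_selected).sum())
cfs_3 = float(gaussian_from_cos(three_selected).sum())

print(mmr_1, mmr_3)
print(round(cfs_1, 3), round(cfs_3, 3))
```

The max-based signal is the same (0.95) whether one or three cluster members are already selected, while the summed field grows from roughly 0.73 to roughly 2.15, so each additional selected neighbor makes the next paraphrase more expensive.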

CFS as part of a hybrid retriever

Standalone CFS is interesting but not the deployable form. The deployable form is rrf(cosine, BM25, CFS) — Reciprocal Rank Fusion over three independent rankings:

score(c) = Σ over r ∈ {cos, BM25, CFS} of 1 / (60 + rank_r(c))

Each leg covers a different failure mode:

Cosine captures semantic match.
BM25 captures lexical/exact-term match (named entities, dates, numbers — things embeddings smear).
CFS captures coverage pressure: among relevant candidates, which ones occupy regions not already covered by the current selection?

The three rankings disagree often, which is exactly what RRF wants. RRF is order-only — it doesn’t care about score scales — so the log in CFS doesn't fight the sigmoid in normalized BM25 or the raw similarity in cosine.

This shape is what I evaluated, and it’s what I’d recommend deploying.

Does it work? — empirical results

I evaluated the deployable form of the operator, rrf(cosine, BM25, CFS), on LoCoMo raw-turn retrieval: 1,982 questions across 10 long conversations, using text-embedding-3-small embeddings.

The comparison included plain cosine, cosine+BM25 RRF, perturbed-cosine third-vote controls, and a reproduction of mem0’s published additive-fusion scoring algorithm.

Results on retrieval ranking:

baseline cosine top-K: NDCG@10 0.5123, Recall@10 0.6924
mem0 additive fusion: NDCG@10 0.4903, Recall@10 0.6625
rrf(cosine, BM25): NDCG@10 0.5196, Recall@10 0.6989
rrf(cosine, cos2, BM25): NDCG@10 0.5278, Recall@10 0.7060
rrf(cosine, BM25, CFS): NDCG@10 0.5311, Recall@10 0.7168

Against mem0’s additive fusion, rrf(cosine, BM25, CFS) improves retrieval ranking by +4.08 pp NDCG@10 and +5.43 pp Recall@10. Against rrf(cosine, BM25), adding CFS contributes +1.15 pp NDCG@10 and +1.79 pp Recall@10.

The honest ablation matters: not all of that lift is pure diversity. A perturbed-cosine third vote recovers part of the gain. But CFS still adds +1.08 pp Recall@10 above the perturbed-cosine control, which is the clearest sign that it is surfacing useful candidates the cosine-correlated controls miss.

To test the paraphrase-pile-up mechanism more directly, I isolated the single-hop queries where the failure mode was actually present. The harmful-pile-up subset contains 128 LoCoMo single-hop queries where cosine top-10 missed at least one gold item ranked 11–50 by cosine, while the retrieved top-10 contained a redundant non-gold pair with similarity ≥ 0.75. In that subset, CFS improved evidence-cluster recall over every cosine-correlated control: +1.81 pp over duplicate cosine, +1.67 pp over cos2 ε=0.30, +1.51 pp over cos2 ε=0.50, and +2.66 pp over cos2 ε=1.00, all with bootstrap confidence intervals excluding zero. That is the mechanism this article is about: CFS is not merely producing another semantic vote; it is retrieving more distinct supporting evidence clusters when cosine pile-up is actually happening.
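The subset definition can be read as a simple predicate over one query's cosine ranking. The sketch below is my interpretation of the criteria as stated, with hypothetical names (`is_harmful_pileup`, `ranked`, `gold`). It assumes "similarity" means cosine similarity between unit-normalized embeddings, and uses made-up data.

```python
import numpy as np
from itertools import combinations

def is_harmful_pileup(ranked, gold, Z, top_k=10, tail_end=50, sim_threshold=0.75):
    """ranked: row indices best-first by cosine; gold: set of gold row indices;
    Z: unit-normalized embedding matrix."""
    top, tail = ranked[:top_k], ranked[top_k:tail_end]
    missed_gold = any(i in gold for i in tail)       # a gold item ranked 11-50
    non_gold = [i for i in top if i not in gold]
    redundant_pair = any(                            # a redundant non-gold pair in the top-k
        float(Z[i] @ Z[j]) >= sim_threshold for i, j in combinations(non_gold, 2)
    )
    return missed_gold and redundant_pair

# Made-up data: 12 orthogonal unit vectors, then make rows 0 and 1 identical.
Z_demo = np.eye(12)
Z_demo[1] = Z_demo[0]
ranked_demo = list(range(12))
print(is_harmful_pileup(ranked_demo, {10}, Z_demo))
```

Here the gold item sits at rank 11 and the top-10 contains a duplicated non-gold pair, so the query would fall into the subset.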

I also checked whether the improvement was just a ranking-leg budget artifact. With the ranking-leg budget equalized, long-CFS beat same-budget long-cos2 controls across ε=0.30, ε=0.50, and ε=1.00 on both NDCG@10 and Recall@10, again with bootstrap confidence intervals excluding zero. That closes the obvious “third vote mechanics” explanation: CFS is doing operator-specific coverage work inside the RRF stack.
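The article does not spell out the bootstrap procedure, so the following is only a generic sketch of one common choice: a paired percentile bootstrap over per-query metric differences. The function name `paired_bootstrap_ci` is hypothetical, and the data is synthetic, generated only to exercise the code; it is not LoCoMo output.

```python
import numpy as np

def paired_bootstrap_ci(metric_a, metric_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(metric_a - metric_b), resampling queries with replacement."""
    diff = np.asarray(metric_a, dtype=float) - np.asarray(metric_b, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)

# Synthetic per-query scores: system A is system B plus a small lift and noise.
rng = np.random.default_rng(1)
scores_b = rng.uniform(0.0, 1.0, size=500)
scores_a = scores_b + 0.02 + rng.normal(0.0, 0.05, size=500)

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(scores_a, scores_b)
print(mean_diff, ci_lo, ci_hi)
```

"Confidence interval excluding zero" then means the lower bound stays above zero (or the upper bound below it) when the same queries are resampled for both systems.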

By category, the full rrf(cosine, BM25, CFS) stack wins 4 of 5 categories against mem0’s additive fusion: single-hop, multi-hop, temporal, and open-domain. The diversity-specific advantage over the perturbed-cosine control is clearest on temporal and adversarial questions, while CFS gives up ground on single-hop questions where multiple paraphrases of the same fact may actually be useful supporting evidence.

When to use CFS

CFS earns its place when two conditions hold:

Your candidate pool has redundant neighborhoods. Conversational data, customer support tickets, meeting transcripts, anywhere humans say similar things multiple ways or revisit the same facts from slightly different angles. If your corpus is technical documentation with mostly disjoint chunks, CFS may add noise without much benefit.
Your top-K budget is tight. If you can afford K=50, paraphrase pile-up is annoying but recoverable. If you’re squeezing into K=5 because of context budget or latency, every redundant slot is expensive and CFS pays for itself.

It’s a small operator. It adds maybe 30 lines of code on top of a standard retriever. The only state it needs is the embedding matrix you should already have.

When not to use CFS

A few situations where CFS is the wrong tool:

Questions that need redundant confirmation. If your downstream needs to count or aggregate (“how many times did the user mention X?”), CFS will give you one mention and call it done.
Some hard temporal queries. When time-adjacent memories are individually weak but collectively form an answer, breaking the cluster can destroy the signal. In the newer LoCoMo ablation, though, CFS helped temporal questions overall, so this is a trade-off rather than a blanket failure.
Sparse corpora with no real redundancy. The penalty term is approximately zero everywhere, you’ve added compute for no benefit, and the log-transform on relevance is mildly harmful at the margin.
As a single-leg retriever. Standalone CFS without cosine or BM25 alongside is brittle. The diversity term needs a strong relevance signal underneath it; if the underlying ranking is weak, diversifying weak candidates produces diverse garbage.

Closing

CFS is a small, named operator. It modifies an idea from 1998 in two specific ways and applies it to a recurring failure mode in modern RAG systems: redundant-neighborhood collapse. It works best when the retriever needs coverage across distinct evidence, and predictably less well when redundancy itself is useful signal.

What I think is worth taking from this isn’t the operator itself — it’s the framing: score candidates by the shape of what’s missing, not by similarity to the query alone. That framing generalizes beyond CFS and beyond memory retrieval. Wherever you’re filling a fixed-size context window with the most relevant items, you’re probably leaving signal on the table by ignoring what’s already there.

Conditional Field Subtraction is one way to stop ignoring it.

Reference Repo: https://gist.github.com/M-Garcia22/ff4ec80f5a08ca2fd9234bcc35804d1c

Real-Time Font Management in Expo

Mauro Garcia — Fri, 29 Nov 2024 13:31:29 GMT

I’ve cooked up a package that takes the headache out of font management in React Native and Expo.


Lazy loading images with React Native

Mauro Garcia — Sun, 29 Sep 2024 13:16:13 GMT

Lazy loading images has been a staple in app development for years, boosting app performance and saving bandwidth by loading only the…
