I’m a big fan of opportunity analysis. I’ve seen far too many organizations underinvest in opportunity analysis and thus waste enormous — and avoidable — effort on experiments that produce negative or neutral results.
When I’m helping organizations address search challenges, I advocate strongly for more robust opportunity analysis to size problems and estimate the return on investment for proposed solutions to them.
But different search challenges call for different kinds of opportunity analysis. And sometimes opportunity analysis is so hard that it isn’t a realistic option. This post briefly explores the landscape of opportunity analysis for search.
Opportunity analysis for search can be trivial, doable, or intractable.
Trivial. When opportunity analysis is trivial, you should just do it. Very often, a few hours — or even minutes — of running SQL queries against your logs can provide an upper bound that is sufficient to deprioritize a project and move on to something else.
Doable. When opportunity analysis is doable but non-trivial, it’s still worthwhile. But you’ll want to simplify the analysis to maximize your return on invested effort. These rough calculations can be difficult for analysts and engineers who are not used to cutting corners.
Intractable. When opportunity analysis is intractable, you have to use your instincts — and whatever evidence you have — to decide whether it’s worth gambling on an experiment. If you decide to gamble, make sure to invest the minimum effort needed to confirm or reject your hypothesis.
Here are some examples to illustrate the differences between problems for which opportunity analysis is trivial, doable, or intractable.
- Proposed Project: Build landing pages for top queries.
Opportunity Analysis: Compute the fraction of traffic that comes from the top 100 queries.
- Proposed Project: Improve tokenization of model-number search queries.
Opportunity Analysis: Compute the fraction of traffic that comes from search queries containing both a letter and a number.
These analyses aren’t perfect; indeed, they are far from it. But each one requires only a few lines of SQL to produce a robust estimate from a modest amount of data (e.g., one day of logs).
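As a rough illustration, here is a minimal sketch of both analyses, using Python's built-in `sqlite3` module against an in-memory table. The table name `search_log`, its single `query` column, and the toy data are all assumptions made for the example; a real log will have its own schema and SQL dialect. The `GLOB` patterns are SQLite-specific, so another database would need its own pattern or regex syntax.

```python
import sqlite3


def build_demo_log():
    """Create a tiny, made-up search log: one row per search event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_log (query TEXT)")
    queries = (
        ["iphone"] * 5
        + ["laptop"] * 3
        + ["hp 15-dy2021nr", "rtx3080"]
    )
    conn.executemany(
        "INSERT INTO search_log (query) VALUES (?)", [(q,) for q in queries]
    )
    return conn


def top_query_share(conn, n):
    """Fraction of search traffic that comes from the n most frequent queries."""
    sql = """
        SELECT SUM(cnt) * 1.0 / (SELECT COUNT(*) FROM search_log)
        FROM (
            SELECT COUNT(*) AS cnt
            FROM search_log
            GROUP BY query
            ORDER BY cnt DESC
            LIMIT ?
        )
    """
    return conn.execute(sql, (n,)).fetchone()[0]


def letter_and_digit_share(conn):
    """Fraction of traffic whose query contains both a letter and a digit."""
    # In SQLite a boolean expression evaluates to 0 or 1, so AVG gives a fraction.
    sql = """
        SELECT AVG(query GLOB '*[0-9]*' AND query GLOB '*[A-Za-z]*')
        FROM search_log
    """
    return conn.execute(sql).fetchone()[0]


conn = build_demo_log()
# The post suggests the top 100 queries; the toy log is tiny, so use 2 here.
print("top-2 query share:", top_query_share(conn, 2))
print("letter+digit share:", letter_and_digit_share(conn))
```

Either number is an upper bound on the traffic the project could affect, which is often all you need to decide whether to keep going.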
- Proposed Project: Improve spelling correction.
Opportunity Analysis: Take representative samples of search queries that do and don’t trigger spelling correction. For each, compare the spelling correction results to those from a best-of-breed web search engine, either programmatically or through crowdsourcing. Treat the web search results as ground truth to estimate coverage and accuracy.
- Proposed Project: Improve recall using synonym expansion.
Opportunity Analysis: Take a representative sample of search queries that return low or no results. For each query, expand it using head tokens whose cosine similarity with the query in a pre-trained word embedding space is at least 0.8, and see if doing so at least doubles the number of results. Use human judgements on a sample to estimate the impact on precision.
These kinds of analyses are significantly more involved than the previous ones and they may have even lower fidelity. But they still take a lot less time and effort than developing the potential improvements and testing them online.
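To make the synonym-expansion analysis more concrete, here is a minimal sketch of the recall half of it. Everything in it is a stand-in: the two-dimensional `EMBEDDINGS` are made up rather than pre-trained, `count_results` is a toy substitute for a call to your search engine, and representing a query by the mean of its token vectors is just one simple choice. It also assumes that a zero-result query counts as helped if expansion returns at least one result.

```python
import math

# Made-up 2-d vectors standing in for a pre-trained word embedding space.
EMBEDDINGS = {
    "sofa": [1.0, 0.1],
    "couch": [0.9, 0.2],
    "lamp": [0.0, 1.0],
}

# Toy "index": each document is a set of tokens.
DOCS = [
    {"sofa", "leather"},
    {"sofa", "grey"},
    {"couch", "cover"},
    {"lamp", "desk"},
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def query_vector(query, embeddings):
    """Mean of the vectors of the query tokens that have an embedding."""
    vecs = [embeddings[t] for t in query.split() if t in embeddings]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def expansions(query, head_tokens, embeddings, threshold=0.8):
    """Head tokens, not already in the query, that are close to the query."""
    qv = query_vector(query, embeddings)
    if qv is None:
        return []
    seen = set(query.split())
    return [
        t for t in head_tokens
        if t not in seen and t in embeddings
        and cosine(qv, embeddings[t]) >= threshold
    ]


def count_results(terms, docs):
    """Toy retrieval: number of documents containing any of the terms."""
    return sum(1 for d in docs if any(t in d for t in terms))


def recall_opportunity(queries, head_tokens, embeddings, docs, threshold=0.8):
    """Fraction of sampled queries whose result count at least doubles."""
    helped = 0
    for q in queries:
        base_terms = q.split()
        base = count_results(base_terms, docs)
        extra = expansions(q, head_tokens, embeddings, threshold)
        expanded = count_results(base_terms + extra, docs)
        # Assumption: a zero-result query is helped by any result at all.
        if expanded >= max(1, 2 * base):
            helped += 1
    return helped / len(queries) if queries else 0.0


low_result_sample = ["couch", "lamp"]
print(recall_opportunity(low_result_sample, ["sofa", "lamp"], EMBEDDINGS, DOCS))
```

This only estimates how often expansion adds results. The precision side still needs human judgements on a sample of the expanded result sets.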
- Proposed Project: Support natural language search queries.
- Proposed Project: Improve ranking by using a neural network model.
As the above examples illustrate, sometimes there’s no practical way to perform an opportunity analysis for a search project. In these cases, you have to decide whether to invest in a minimum viable experiment.
Opportunity analysis is a great way to invest a small effort upfront to avoid wasting far more effort on low-impact experiments. Many search problems are amenable to opportunity analysis, but the effort and fidelity of those analyses will vary. When opportunity analysis isn’t a realistic option, you have to decide whether or not to gamble on a minimal experiment. But invest in opportunity analysis when you can.
And don’t be a perfectionist about it. Rough, quick estimates are invaluable.