Does your site have a poor search experience? Here’s what to do.
If you haven’t optimized your site for search accuracy, you should. This improvement is innovative and innovation is good!
If you’re reading this blog, you know that there is so much more that goes into a good UX than just visual design. Performance and accuracy are big parts of the picture. And as UXers, it is our responsibility to boldly go where no one has gone before.
Why Does It Matter?
- The top 14 most common queries account for 10% of all search activity
- Just making these top 14 perform well will improve the search experience for 10% of all searches
- If half of our site’s users use search primarily (as opposed to navigation), then 10% x 50% = roughly a 5% improvement in the overall site experience
- Little improvements of 5% and 3% here and there add up quickly!
- Optimizing results for the top 42 queries will cover 20% of all search activity!
- If you are optimizing a support search experience, optimizing performance will greatly reduce support calls and allow support to respond more quickly to more difficult problems, provide more troubleshooting content, engage with the community, and share information with the UX team :).
- If you are optimizing a corporate site search experience, if potential customers can’t find the information they need, your competitors are only a click away!
- This first-hand troubleshooting knowledge will help us identify needs, prioritize our backlog, and solve the problems that will make the most difference to the most customers
- Decrease customer time and frustration spent trying to find information and troubleshooting
- SAVE MONEY
Search Analytics and UX, Unite!
By combining search analytics and UX, you can find out what is happening and why, creating a powerful decision-making engine for your company! Search analytics allows UX to focus on the problem areas during customer interviews and contextual inquiries.
There is a wonderful book that helped me on this journey, Rosenfeld’s Search Analytics for Your Site. This blog will show you what should be in a search optimization plan, as well as the various steps to take. For my own journey, I flowed out what the current search experience was versus what it should be, spent a lot of time in our search reports, identified ways to boost particular metadata, created synonym files, and worked with our support engineers to identify top content for each product and the top tasks of installation, upgrading, and troubleshooting.
So, let’s begin at the beginning.
How Does Your Search Work Now?
You can’t figure out how to improve things if you don’t have a benchmark of how search works now. To create a benchmark, identify the top tasks your users want to do. That means visiting your search logs and talking with your support engineers or other customer experts to understand the top tasks and searches, as well as the optimal search experience. Then make some flowcharts, test, and crunch numbers. For my own journey, I identified the top tasks as mentioned, analyzed our search reports to determine what the top searches were, then did a lot of manual testing across our main products to assess where we were. I also created Google Analytics dashboards for continual monitoring. I then worked with our support engineers to determine the best results for each product, main task, and top search query.
- First, make sure that you have reasonable results, which Rosenfeld defines as “results that don’t seem like they were selected by a crazy person”. :-D
- Perform relevancy testing
- Perform precision testing
- Visualize most frequent searches
- Set goals for at least the top 14 queries
- Don’t ignore zero result searches
- Look for patterns, usage of booleans
- Clean up search results
- Session, audience, and cluster analysis
Relevancy testing reveals outliers that are problematic and helps track overall search system performance over time.
- Delete queries that do not have obvious results from your relevancy test
- Start with queries that have “right answers”
- Test each query by recording where the best match ranked among search results
- Did it make the #1 result? The top 5 critical results?
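Here’s a minimal sketch of that bookkeeping in Python. The queries, URLs, and the `relevancy_report` helper are made up for illustration; you would plug in your own “right answer” queries and the ranked results your search engine actually returns.

```python
def best_match_rank(results, best_match):
    """Return the 1-based position of best_match in results, or None if absent."""
    for position, url in enumerate(results, start=1):
        if url == best_match:
            return position
    return None


def relevancy_report(test_cases, cutoff=5):
    """Summarize how often the best match ranks #1 and within the top `cutoff`."""
    ranks = {
        query: best_match_rank(case["results"], case["best"])
        for query, case in test_cases.items()
    }
    total = len(ranks)
    first = sum(1 for rank in ranks.values() if rank == 1)
    top = sum(1 for rank in ranks.values() if rank is not None and rank <= cutoff)
    return {
        "ranks": ranks,
        "pct_first": 100 * first / total,
        "pct_top": 100 * top / total,
    }


# Hypothetical test set: each query has one known best match.
test_cases = {
    "install guide": {"best": "/docs/install", "results": ["/docs/install", "/blog/news"]},
    "ssl error": {"best": "/kb/ssl", "results": ["/blog/a", "/blog/b", "/kb/ssl"]},
    "upgrade": {"best": "/docs/upgrade", "results": ["/a", "/b", "/c", "/d", "/e", "/f"]},
    "license": {"best": "/docs/license", "results": ["/x", "/y", "/z", "/w", "/v", "/docs/license"]},
}
report = relevancy_report(test_cases)
```

Run the same test set after each release and compare `pct_first` and `pct_top` to see whether you are moving in the right direction.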
Limitations to relevancy testing:
- Leaves out many queries that don’t have a right answer — queries that might be common and important
- Relies on guessing what would be right for searchers, so it is a highly subjective measure
Precision testing measures the number of relevant search results divided by the total number of search results, which tells you how many of the results are good ones. Look at the precision of the top 5 results: these are the critical ones that a searcher would likely scan before giving up.
Rate each result on this scale:
- Relevant (r): the result is completely relevant
- Near (n): the result is not a perfect match, but it is clearly reasonable for it to be ranked highly
- Misplaced (m): a reasonable result, but it shouldn’t be ranked so highly
- Irrelevant (i): the result has no apparent relation to the query
Then score precision against three standards:
- Strict: only relevant results are counted
- Loose: both relevant and near results are counted
- Permissive: relevant, near, and misplaced results are counted
For example, if 2 of your top 5 search results are r’s (relevant), 2 are n’s (near), and 1 is an m (misplaced), then you have 40% strict precision, 80% loose precision, and 100% permissive precision.
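If you’d rather not do this arithmetic by hand for every query, a small helper can do it. This is just a sketch: the `precision` function and the one-letter rating codes are my own shorthand for the r/n/m/i ratings described above.

```python
# Which ratings count as "good" under each precision standard.
ACCEPTED = {
    "strict": {"r"},
    "loose": {"r", "n"},
    "permissive": {"r", "n", "m"},
}


def precision(ratings, standard="strict"):
    """Percentage of rated results that count as good under the given standard."""
    if not ratings:
        return 0.0
    good = sum(1 for rating in ratings if rating in ACCEPTED[standard])
    return 100 * good / len(ratings)


# Ratings for the top 5 results of one query: 2 relevant, 2 near, 1 misplaced.
top5 = ["r", "n", "r", "m", "n"]
scores = {standard: precision(top5, standard) for standard in ACCEPTED}
# scores == {"strict": 40.0, "loose": 80.0, "permissive": 100.0}
```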
Test where you are now versus where you are with each subsequent cycle or release.
Visualize Most Frequent Searches
Create a word cloud of frequent queries (there are a lot of tools out there; choose your favorite) to help everyone understand what’s most important to your users.
IP address and date/time stamps allow you to determine when a session started and ended. Then you can look at the queries during that session to see how the user’s search phrases change and whether your search engine and keywords align with their terminology and expectations. You can also learn how narrow queries need to be before they succeed in retrieving useful content, or before the searcher gives up (and either leaves your site forever or, if they have a relationship with your company, calls support). Your search logs need to capture:
- IP address — sort by IP address first then by time to identify the beginning/ending of a session
- date/time stamp
- Need total values that disregard upper/lowercase and strip terms out of phrases to get clear view of what most frequently searched terms are
- Need to know how many unique queries there are
- Need to account for stemming, so install and installation and installing would be counted in the same total
- # of results
- whether a result was clicked or whether the user exited the site without clicking
These are also quite helpful:
- browser or app that sent the query — get client metrics and recognize robot crawlers
- page the user was on when they searched
- categories and subcategories user filtered on
Many search engines treat the loading of a new search result page as the re-execution of the same query. If you are using pagination and the same query happens within a short time frame, it is likely just the user clicking to a new search page. Another good reason to switch to lazy loading!
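To make the session idea concrete, here is a rough Python sketch that groups log entries into sessions by IP address and time, and collapses consecutive identical queries within a session (the likely pagination reloads). The 30-minute inactivity gap, the tuple layout, and the `sessionize` name are all my assumptions; adjust them to whatever your log actually contains. Keep in mind that an IP address is only a rough stand-in for a person.

```python
from datetime import datetime, timedelta
from itertools import groupby


def sessionize(entries, gap=timedelta(minutes=30)):
    """Group (ip, timestamp, query) entries into (ip, [queries]) sessions.

    A new session starts when the same IP has been idle longer than `gap`.
    """
    sessions = []
    ordered = sorted(entries, key=lambda entry: (entry[0], entry[1]))
    for ip, group in groupby(ordered, key=lambda entry: entry[0]):
        current, last_time = [], None
        for _, timestamp, query in group:
            if last_time is not None and timestamp - last_time > gap:
                sessions.append((ip, current))
                current = []
            # Skip a query identical to the previous one in this session:
            # it is most likely the user paging through the same results.
            if not current or current[-1] != query:
                current.append(query)
            last_time = timestamp
        sessions.append((ip, current))
    return sessions


# Hypothetical log rows.
log = [
    ("10.0.0.2", datetime(2024, 1, 1, 9, 0), "ssl"),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 0), "install"),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 1), "install"),  # page 2 of the same query
    ("10.0.0.1", datetime(2024, 1, 1, 9, 3), "install linux"),
    ("10.0.0.1", datetime(2024, 1, 1, 14, 0), "upgrade"),
]
sessions = sessionize(log)
```

In this example the first visitor ends up with two sessions: one where "install" was narrowed to "install linux", and a separate afternoon session for "upgrade".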
As mentioned, optimizing for the top 14 queries covers 10% of your search activity, while optimizing for the top 42 covers 20%.
Set thresholds for failure
- Top x should average y% on strict matches
- Top x should average y% on loose matches
- Top x should average y% on permissive matches
- Determine how often a best match shows up in the top 5 search results for a particular query. If it doesn’t show up, consider it a failure.
- If possible, determine how many results a user clicks into for each query — this is Selection Rate.
- If a result way down in the search result list is being clicked frequently, it should be moved up. Your users are telling you that it is the best match! It could also be tied into the “best bets” discussed later.
Don’t Ignore Zero Result Searches
To determine what content is missing and how to make sure that the existing content is easily findable, you need to ask the questions:
- Is there no content on this topic? Should there be?
- Is the relevant content mistitled?
- Is the relevant content poorly written?
- Does the relevant content terminology align with user terminology?
- Are we using jargon that customers aren’t familiar with?
- Try variants of each search term to make sure that there are no relevant documents. If there aren’t, create new content to fill this need.
Analyzing Search Results
- Pattern analysis: What patterns emerge when you “play” with the data? Can we use those patterns to determine what types of metadata and content are most important to our users?
- Are there surprises? Outliers?
During this process, you’ll:
- sample the content to get a sense of what is there
- group things that seem to go together
- sort things to find patterns
- iterate on the pattern analysis until you are satisfied with the results
No single pattern is the “right” one!
- Clean up your search results — remove empty queries, repeat queries, and internal testing results. This is detailed later.
- Analyze interesting queries — names, product IDs, dates, error messages, log file lines, etc.
- How do sessions that include these types of queries change throughout the session? This can give you an understanding of what the user was looking for and, if you determine that it is important, allow you to optimize your search engine for that query.
- Boolean operator analysis: How many search phrases use AND? OR? NOT? Wildcards?
- Failure analysis: What can we learn from searches that return no results? How can we fix those problems and improve search performance? Failure analysis is detailed later in this document.
- When analyzing sessions, are there signs at the beginning of the session that these will be failures?
- Session analysis: What happens during a search session? How do searchers’ needs and understanding of the content change as they search?
- Audience analysis: How might we discover the differences between audience segments and their information needs? How might we better address these differing needs?
- Cluster analysis: Some queries may have just one term, not enough to guess what the user was looking for. However, by analyzing other search phrases that contain that term, we can identify “best bets” and display those first to the user. (In the SearchCount.exe program I’ve created, we can do this!)
- Use your existing keywords from your most frequent queries and keyword analysis. Identify best bets for these keywords. Optimize your search.
- Which best bets come first? Again, we look to our data. If people searching a specific product name are three times more likely to want to download files and read a user guide rather than read news & events, then the download and user guide “best bets” should come first.
- Auto-complete keywords from most frequent queries and keyword analysis
- Identify metadata attributes and content types
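The cluster analysis idea above is easy to prototype. The sketch below is not the SearchCount.exe program I mentioned, just a small Python illustration with invented query counts: given a one-word query, it finds the longer queries that contain that word so you can see what people most likely meant.

```python
from collections import Counter


def related_queries(term, query_counts, top_n=3):
    """Return the most frequent multi-word queries that contain `term`."""
    term = term.lower()
    related = Counter()
    for query, count in query_counts.items():
        words = query.lower().split()
        if term in words and len(words) > 1:
            related[" ".join(words)] += count
    return related.most_common(top_n)


# Hypothetical counts from a search report.
query_counts = {
    "ssl": 120,
    "SSL certificate install": 40,
    "renew ssl certificate": 25,
    "ssl error 525": 30,
    "password reset": 80,
}
candidates = related_queries("ssl", query_counts)
# candidates == [("ssl certificate install", 40), ("ssl error 525", 30),
#                ("renew ssl certificate", 25)]
```

The longer queries at the top of the list are good candidates for “best bets” when someone searches the bare term.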
Card Sorting and Beyond
Perform card sorting with internal and external users to create and validate your content categories. See where you can logically group them. Perform statistical analysis on categories.
Export your results from your analytics application or use this Perl script: www.rosenfeldmedia.com/books/downloads/searchanalytics/loganalyzer.txt to parse them from your server log. Disclaimer: I didn’t use the Perl script.
- Create two columns in your spreadsheet — one for count and one for unique queries. Import or paste your data. Make sure that your total count for each term includes upper and lowercase values, such as SSL and ssl. Since the open-source search engine I was using didn’t group things regardless of case, I created a quick program to group terms case-insensitively. This will give you a much more accurate view of your top searches.
- Use this spreadsheet for analyzing or create your own: http://rosenfeldmedia.com/books/searchanalytics/blog/free_ms_excel_template_for_ana/
- Rank: How each query ranks in terms of frequency
- Percent: The percentage of overall search activity that each unique query is responsible for (out of all of your site’s search activity)
- Cumulative Percent: The running total of the percentages for this query and every query ranked above it.
- Count: How often each unique query was searched.
- Unique Query: The query itself.
- If possible, provide a live link to the unique query for easy testing.
- Average number of terms per query
- Other information
- Copy and paste your count and unique queries into the spreadsheet
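If you prefer code to a spreadsheet, the same columns can be computed in a few lines of Python. This is a sketch under my own assumptions (a plain list of raw query strings, and a helper I’ve called `frequency_table`). It groups queries case-insensitively, as described above, but it does not do stemming.

```python
from collections import Counter


def frequency_table(raw_queries):
    """Build rank, count, percent, and cumulative percent for each unique query."""
    # Lowercase and collapse whitespace so "SSL", "ssl", and "Ssl " count together;
    # empty queries are dropped.
    counts = Counter(" ".join(q.lower().split()) for q in raw_queries if q.strip())
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        percent = 100 * count / total
        cumulative += percent
        rows.append({
            "rank": rank,
            "query": query,
            "count": count,
            "percent": round(percent, 1),
            "cumulative": round(cumulative, 1),
        })
    return rows


# Hypothetical raw queries pulled from a search log.
raw = ["SSL", "ssl", "Ssl ", "install", "Install", "upgrade", "", "api key", "ssl", "install"]
table = frequency_table(raw)
```

Reading down the cumulative column tells you how many top queries you need to optimize to cover 10% or 20% of your search activity.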
Look for Patterns
- Tonal patterns: Are users using jargon? Are we using jargon? Are users using abbreviations? Does user language match our expectations and the terminology we use in our content?
- Give your content authors a list of common name abbreviations to use when tagging relevant pages
- Synonym patterns: Understand how your users are searching for your content and determine how you need to change your content to meet their expectations
- Look for queries that seem synonymous and group them
- For each group, tally the individual queries’ percentages to determine a cumulative percentage for that cluster
- When creating a set of metadata, you may need to determine “preferred terms”, variants that are most appropriate to use.
- Time-based patterns: Do searches vary depending on whether it’s a weekday or a weekend? This might not be as relevant to us, but we can find out!
- Question patterns: Hunt for patterns that seem to describe question types that searchers seem to be asking
- Categorize the kind of question or need the query represents, such as: Task, Product/Service, Version only, SP only, Platform, Database, Programming language, Protocol, Security, Content types, Error
- Categorize the gaps in your searchers’ knowledge
- When new categories emerge as you move through the data, return to previous queries to make sure that these categories work for earlier queries as well
- You may find that you can merge similar needs, such as platform and database
- Add the percentages associated with each need or question type (add up all x’s in the column). This helps you see what need or question type is most important to our users.
- Answer patterns: Look for patterns that describe the kinds of answers — in terms of content types — that searchers may hope to find
- Review your common queries and query groups, but this time ask a different question: “What kind of content would searchers want when they searched this term?”
- Review your content types to ensure that they align with users’ expectations. We are doing this to supplement the card sorting categorization we already performed.
- Determine what types of content are connected to common queries. This will help you make better decisions about which content types to address first in a content migration plan.
- Kristina Halvorson’s Content Strategy for the Web is a good source for creating a content strategy
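For the synonym patterns step in the list above (grouping synonymous queries and tallying their percentages), a quick script saves a lot of manual adding. The groups and percentages below are invented, and `group_percentages` is just a name I picked; the preferred term for each group is whichever variant you decide to standardize on.

```python
def group_percentages(query_percent, synonym_groups):
    """Sum the share of search activity for each group of synonymous queries."""
    totals = {}
    for preferred, variants in synonym_groups.items():
        members = {preferred.lower(), *(variant.lower() for variant in variants)}
        share = sum(pct for query, pct in query_percent.items() if query.lower() in members)
        totals[preferred] = round(share, 2)
    return totals


# Hypothetical percent-of-all-searches per query, and hand-built synonym groups.
query_percent = {
    "ssl": 2.0,
    "secure sockets layer": 0.5,
    "ssl cert": 0.75,
    "login": 1.5,
    "sign in": 1.25,
}
synonym_groups = {
    "ssl": ["secure sockets layer"],
    "login": ["sign in", "log in"],
}
cluster_totals = group_percentages(query_percent, synonym_groups)
# cluster_totals == {"ssl": 2.5, "login": 2.75}
```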
Finding Patterns in the Long Tail
After optimizing the “short head” results, you can turn to the “long tail”. The long tail is made up of the many queries that are each searched only a few times. Companies such as Amazon have made their fortunes addressing it by providing books and products that fall into this “long tail”.
- What content types do you see in this long tail? What percentage of queries can be grouped into these content types?
- Are there typos? Identify common typos and provide the expected results, like Google does. Depending on your search engine, you may need to create synonym files to group common typos with their correct spelling, or abbreviations and acronyms with the full names.
- Implement spellchecking
Anti-Pattern Analysis: Surprises and Outliers
If you have queries that don’t seem to fit into any sort of pattern, study them closely. One may be the clue to another pattern. Or it may be truly strange and worthy of investigation. You can take a random sample from the long tail to see if there are many similar queries.
Don’t Leave Users Hanging With No Search Results
If users perform a query that doesn’t return any results, try removing words from the query and searching again. If there are related most frequent searches and best bets, suggest those as a way for the user to keep going.
- Allow users to expand search results if there were too few
- Allow users to narrow search results if there are too many
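Here is one simple way the “remove words and try again” fallback could look. Treat it as a sketch: dropping the last word first is my own simplification (your engine may have smarter options), and `toy_search` with its tiny index stands in for whatever search call you really have.

```python
def relaxed_search(query, search, min_terms=1):
    """Retry a failed query with trailing words removed until something matches.

    Returns (query_that_matched, results), or (None, []) if nothing matched.
    """
    terms = query.split()
    while len(terms) >= min_terms:
        candidate = " ".join(terms)
        results = search(candidate)
        if results:
            return candidate, results
        terms = terms[:-1]  # drop the last word and try again
    return None, []


# Stand-in for a real search engine call (hypothetical index).
INDEX = {
    "install": ["/docs/install"],
    "install linux": ["/docs/install-linux"],
}


def toy_search(query):
    return INDEX.get(query.lower(), [])


matched, results = relaxed_search("install linux arm64 nightly", toy_search)
# matched == "install linux", results == ["/docs/install-linux"]
```

When you show the partial matches, tell the user which query you actually ran so they understand why the results are broader than what they typed.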
Design Search Results For Synonyms, Acronyms, Variants, and Common Misspellings
Make sure your search is looking for:
- product or service name synonyms
- acronyms and full names (SSL and Secure Sockets Layer)
- Does your company have a list of acronyms? I’ll bet they do! You can also get this from product documentation, glossaries, and indices.
- terms and common misspellings (look through your data to identify more common misspellings)
- search strings included in phrases with no spaces
- proper nouns (names of objects)
- error messages
Analyze User Behavior For Each Data Category
Users have different needs when looking for documents, articles, downloads, announcements, and other information. By analyzing the top results in each category, we can determine best bets for each category.
Clean Up Our Search Results
- Eliminate Search Log Junk
- Remove empty queries — these can skew response time and most frequently searched numbers
- Remove repeat queries: if the same query repeats with only a paging parameter changed, the user is usually just navigating to the next page of results, so count it once. If the query and all parameters are identical, it is usually junk and should be removed
- Remove Internal Testing From Search Reports
Failures are very important. They tell you what your searchers are trying to do and what they expect to be on your site.
Zero Result Queries:
- Instead of showing that there were no results and thus giving your customer a dead end, provide partial matches by removing some of the user’s original keywords.
- Match only categories or facets. Do not use the keywords.
- Display top search results in each content area for the products/services the customer has access to.
Queries That Fail to Retrieve Useful Results
- Our searchers are not willing to scroll and click through long lists of search results, especially if they are not logically ordered
- They are impatient, unforgiving, and lazy: they want the right results at the top of the first page. The first page should contain as many relevant results as possible. Anything else should be considered a failure.
Is There A Content Disconnect?
As a company grows and acquires other companies, merging and aligning content titles and metadata is often neglected. If that’s the case for your company, your content managers will have to do a lot of cleanup, and you should implement a content management plan for future acquisitions, including:
- Standardize titling guidelines and communicate them to all teams
- Use standard terminology across products
- Create, revise, and align the metadata used to describe the content
- Show content writers the kinds of terms users are using to search, how those terms differ from the language in the documents, and how that disconnect leads to their content not being found.
Monitor Important Queries
Focus on queries that are relevant to your business and monitor them for both volume and a rate of growth. Ultimately, this can save a lot of money on conventional product and market analysis.
Analyze Queries That Cause Users To Immediately Exit Your Site
Search exits are when searchers leave the site after searching, but without clicking on any results.
- Identify queries where users exit without clicking any results — these queries are failing our users (and our company!)
- Review the search exits on a regular basis to identify any new queries where content needs to be honed or modified.
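A quick way to find these is to compute an exit rate per query. The sketch below assumes your log records, for each search, whether any result was clicked; the `exit_rates` helper, the sample rows, and the minimum of two searches per query are my own choices for illustration.

```python
from collections import defaultdict


def exit_rates(searches, min_searches=2):
    """Rank queries by the share of searches that ended without a click.

    `searches` is an iterable of (query, clicked) pairs. Queries seen fewer
    than `min_searches` times are left out to avoid noisy rates.
    """
    totals, exits = defaultdict(int), defaultdict(int)
    for query, clicked in searches:
        key = query.lower().strip()
        totals[key] += 1
        if not clicked:
            exits[key] += 1
    rates = {q: exits[q] / totals[q] for q in totals if totals[q] >= min_searches}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical search log rows: (query, did the user click a result?)
searches = [
    ("vpn setup", False),
    ("vpn setup", False),
    ("VPN setup", True),
    ("install", True),
    ("install", True),
    ("rare query", False),
]
worst_first = exit_rates(searches)
```

Queries at the top of `worst_first` are the ones to review first for missing or poorly titled content.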
Remove Internal Testing From Search Reports
To remove internal testing results which will skew search reports, each internal tester should begin their testing with:
<first name> startsearchtest
and end with
<first name> endsearchtest
Using these particular phrases, “startsearchtest” and “endsearchtest”, allows the people analyzing the search results to search for these values (which would rarely be used by any other internal or external user), then remove any search terms logged between the two markers. In a search report sorted by date/time, each tester’s queries sit between that tester’s start and end markers.
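Stripping the test queries out can then be automated. This sketch assumes the list of queries is already sorted by date/time and that testers are not interleaved with real users (for example, because you filtered the log to the tester’s IP address first); `strip_test_queries` is a name I made up.

```python
def strip_test_queries(queries):
    """Drop the marker queries and everything logged between them."""
    kept, testing = [], False
    for query in queries:
        words = query.lower().split()
        last_word = words[-1] if words else ""
        if last_word == "startsearchtest":
            testing = True
        elif last_word == "endsearchtest":
            testing = False
        elif not testing:
            kept.append(query)
    return kept


# Hypothetical time-sorted queries, including one internal test run.
log = [
    "ssl error",
    "pat startsearchtest",
    "asdf",
    "test 123",
    "pat endsearchtest",
    "install guide",
]
cleaned = strip_test_queries(log)
# cleaned == ["ssl error", "install guide"]
```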
Strengthen Our Personas
In addition to the customer interviews and usability testing from which your UX team and company derive information on user needs, goals, pain points, and other useful information, search analytics provides you with user intent: what users are searching on, what problems they’re encountering with your products and services, and what answers and content they expect to find. You can then strengthen your personas with this information on their common needs and tasks.
Another way to strengthen your personas is to use search analytics to segment your audience, such as including common search queries for each persona to illustrate common information needs and interests.
Segment Our Audience
Your users are not all alike. Let’s use your data to figure out the differences.
If you have information that will let you determine a unique identity for the user (as in an authenticated site), then you can get even more valuable information:
- if the user is a first-time user
- how often the user visits
- where the customer is in the product & service lifecycle
- where the user is located. How do these regions map to your sales territory? To your developer campaigns?
- language — you can break down the audience by specific language code.
You could then determine different needs across cultures and geographic areas and optimize for them. You never know what valuable information you will find unless you start analyzing! Oh, the places we’ll go! :-D
Ratings and Relevancy
If you don’t already have a ratings system in place for your content, you should see about implementing one. You could go with a thumbs up/thumbs down approach or a five star approach. Not only will this give you feedback on your content quality (be sure to include a comments box when someone provides a rating), but will also show you exactly which content is most important to the majority of your users.
For 1–3 stars, it’s probable that something needs to be improved. But what is it? Only the user can tell you, so you need to provide a way for them to do so. For a thumbs-down, ask why the article doesn’t meet their needs. What can your company do to improve their experience?
This will also help you to show relevancy to your customers, as in “5 out of 6 users found this article relevant” or “90% relevancy”. The success of this will depend on the targeted audience. If it is a finite set like application developers, API developers, integration developers, and system engineers, for example, then this could be more helpful than if you are dealing with all customers and potential customers visiting your corporate site, or like Google, the entire world.
Measure Users’ Expectations
You can also gauge users’ expectations by simply asking them. Providing a link on your site to an always-open simple questionnaire will allow you to gain even more information on what customers need and want from your company.
Measure Product Improvement
If certain queries, such as searches on particular errors, decrease over time, that tells you that your products and services are improving! Or at least, that your articles, community, documentation, and other content have provided workarounds and your users have been able to find and successfully implement them!
Identify Desire Paths (Where Do Users Want To Go From Here?)
Are users expecting to be able to find related information on a particular page? For example, if they’re looking at a user guide, provide them with related KB articles, related blogs and community posts, etc.
Put Your Metadata Through the Query Test
- Choose a manageable number of most frequent queries, say, 25.
- See if those queries have synonymous metadata values.
- Are there any queries that do not have similar metadata terms? These could be potential gaps in your metadata vocabulary that you need to fill.
- Measure your frequent queries’ relevance and precision.
- If your metadata keywords don’t perform as well as the frequent queries that they are synonyms of, you should review the keywords and possibly replace terms.
Use Reverse Lookup to Identify New and Problematic Terms
- Start with a small list of important documents, say 20. These could be the most popular documents, the most frequently searched, or the ones that, after your analysis, you decide are most important to your customers’ needs.
- List queries that retrieve those important documents. If you don’t have an analytics application that can handle clickstream analysis and site search analytics (SSA), you have to be willing to parse data and crunch numbers manually.
- Determine if there are metadata that correspond to those queries.
Track Metadata Trends
Review queries on a regular basis to identify new queries that are becoming more popular. Is there a new problem that people have begun encountering? A new business need that is becoming ubiquitous? Use these terms to develop new keywords.
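One lightweight way to spot rising queries is to compare counts between two reporting periods. This is only a sketch: the period counts are invented, and the `rising_queries` helper and its minimum-count cutoff are my own assumptions, not something from a particular analytics tool.

```python
def rising_queries(previous, current, min_count=5):
    """List queries whose count grew between two periods, fastest growth first.

    Each row is (query, previous_count, current_count, growth), where growth
    is the fractional increase. Queries that did not exist in the previous
    period get a growth of infinity.
    """
    rising = []
    for query, now in current.items():
        before = previous.get(query, 0)
        if now >= min_count and now > before:
            growth = (now - before) / before if before else float("inf")
            rising.append((query, before, now, growth))
    return sorted(rising, key=lambda row: row[3], reverse=True)


# Hypothetical query counts for last month and this month.
last_month = {"ssl": 100, "install": 50, "error 525": 2}
this_month = {"ssl": 110, "install": 40, "error 525": 30, "passkey": 12, "misc": 1}
trending = rising_queries(last_month, this_month)
```

Here "passkey" is brand new and "error 525" jumped from 2 to 30 searches, so both are worth a look as possible new keywords or emerging problems.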
Standardize on Content Guidelines
Make sure that your writers are using standard keyword tagging, style, and titling of documents to ensure that their content is easily findable. Standardize on other types of content, such as blog posts, community posts, KB articles, downloads, etc.
Which Content Should Be Removed?
By now, you’ve probably already found that a large percentage of your content is redundant or outdated, and is not being accessed at all or only by a few people. Often this content is too old or does not have a clear owner. If important queries don’t find this content, it is probably not relevant.
Get The Whole Picture — Who Knows What?
Look at the people in your company. Who knows what in each department?
- Most departments interact with users to some extent and all have useful information. That information is even more useful when it is combined. Dig for more information with your support, marketing, sales, professional services, R&D, product managers, and whoever else should be involved in making a great and optimized user experience.
Find out more about how each department works:
- What types of data do they own?
- What sorts of tools and approaches do they use to learn from their data?
- How do they interact with customers, what types of data do they collect, and how can you use it? For example, Marketing interviews customers, product managers interact with customers to understand needs and pain points, and professional services and support interact with customers on a daily basis.
- What types of insights do they use this data for?
- Which decision makers use this data?
- Who’s doing the actual work? What are their backgrounds? Can they and will they collaborate with you?
Developing a complete picture of your research needs will help you to:
- eliminate redundancies
- identify gaps and correct imbalances in the tools, methods, and data that currently drive your decision making
- help everyone understand your organization’s challenges together and act on them together. Otherwise, you will be executing incomplete solutions to incomplete problems.
- Show the relationship between these different research inputs. For example, draw a line between “frequent search queries” and “regular task analysis”.
What Gets in the Way of SSA?
- Lack of awareness
- Technical hurdles — IT people are too busy to parse log files. May need developer’s help in writing ad hoc queries and creating pivot tables from data
- Political hurdles
- Legal hurdles
- Lack of data
- Lack of analytics tools
Integrating Site Search Analytics and Optimization Into Your Process
By seeing SSA as part of your ongoing work, such as 5% of a normal week, rather than a one-off project, you will be able to continually improve your users’ experience and ensure that it keeps up with evolutions in your users’ needs.
SSA is scalable. You can spend just 15 minutes per month looking over simple reports:
- most frequent queries list
- null results query list
You’ll always get something useful out of your analysis. If you increase the time you spend on tuning by 15 minutes each month, you will see continual improvement.
Review your search reports and test frequent queries. If your search engine doesn’t group upper and lowercase identical queries, create a program or have a developer create one that does. Input these queries and get a complete picture of what phrases included the same search term. Check whether the results are expected. If not, you may need to review your search engine’s ranking algorithm.
Once you’ve gotten this far, you’re in good shape and your users should be very happy! I’ll be adding some sample charts (search flow, ES boosting, search results, and overall performance) to show how I optimized search in the next day or so.
Further resources to parse out and analyze information:
- Perl script for parsing: http://rosenfeldmedia.com/books/searchanalytics/content/code_samples/
- Spreadsheet for analyzing: http://rosenfeldmedia.com/books/searchanalytics/blog/free_ms_excel_template_for_ana/
Recommended KPI and web analytics resources
- The Big Book of Key Performance Indicators (Eric Peterson)
- Web Analytics: An Hour A Day (Avinash Kaushik)
- Spending Quality Time with Your Search Log. UIE. Spool. http://www.uie.com/articles/time_search
- Search Analytics’ companion web site: http://rosenfeldmedia.com/books/searchanalytics/
- Diagrams from book: www.flickr.com/photos/rosenfeldmedia/sets/
- Zipf Distribution
- Other links in the book that might be interesting:
- Google Analytics — how to make it return null reports: http://cutroni.com/blog/2009/09/08/tracking-zero-result-searches-in-google-analytics
- Difference between KPIs and metrics: http://visualrevenue.com/blog/2008/02/difference-between-kpi-and-metric.html
- CMU KPI: www.planning.cmich.edu/kpis.shtml