Everything changes extremely fast nowadays, and it is very important to follow new trends and understand them. One of the buzzwords I want to discuss today is Elasticsearch. We will look at what it is, its main advantages, some statistics, success stories, and books.
What is Elasticsearch?
Elasticsearch is an open-source, broadly distributable, readily scalable, enterprise-grade search engine based on Lucene. Earlier versions were released under the Apache License 2.0; starting with version 7.11, Elastic switched to other licenses. It is Java-based and designed to operate in near real time. It can index and search documents in diverse formats. It was designed for distributed environments, providing flexibility and scalability. Today, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
Elasticsearch achieves fast search responses because, instead of searching the text directly, it searches an index. This is like looking up a keyword in the index at the back of a book rather than reading every word on every page. Elasticsearch can scale to thousands of servers and accommodate petabytes of data. This enormous capacity results directly from its elaborate, distributed architecture.
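Here is a toy inverted index in Python that makes the book-index analogy concrete. Each term maps to the IDs of the documents that contain it, so a query only looks up its own terms instead of scanning every document. This is a minimal sketch with several simplifying assumptions: a regex tokenizer, no relevance scoring, and in-memory sets. Lucene's real index is far more elaborate, and the function names are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (a crude stand-in for an analyzer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs containing it (its "postings").
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: intersect the postings of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search library",
    3: "Solr is also built on Lucene",
}
index = build_index(docs)
print(sorted(search(index, "built lucene")))  # [1, 3]
```

The query touches only two postings lists, regardless of how many documents the collection contains.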
Elasticsearch serves many different use cases as well, for example “classical” full-text search, an analytics store, autocompletion, spell checking, an alerting engine, and a general-purpose document store.
Advantages of using Elasticsearch
- Built on top of Lucene
Elasticsearch is built on top of Lucene, a full-featured information retrieval library, so it inherits some of the most powerful full-text search capabilities of any open-source product.
Lucene is also already familiar to many developers, which is another plus.
- Full-text search
Elasticsearch implements many full-text features, such as customizable tokenization (splitting text into words), customizable stemming, faceted search, etc.
- Fuzzy Searching
Fuzzy search tolerates spelling errors: you can find what you are looking for even if your query contains a typo.
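Fuzzy matching is typically based on edit distance, which counts how many single-character insertions, deletions, or substitutions turn one word into another. The sketch below does three things:

- computes the classic Levenshtein distance,
- filters a small, made-up vocabulary against a misspelled term,
- builds a request body for Elasticsearch's `match` query with `fuzziness` set to `AUTO`.

The field name `title` is a hypothetical example. By default, Elasticsearch also counts swapping two adjacent characters as a single edit, which this plain Levenshtein sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    # Keep every vocabulary word within max_edits of the (possibly misspelled) term.
    return sorted(word for word in vocabulary if levenshtein(term, word) <= max_edits)


vocabulary = ["elasticsearch", "lucene", "solr", "kibana"]
print(fuzzy_matches("lucine", vocabulary))  # ['lucene']

# Request body for a match query with fuzziness; "title" is a hypothetical field.
fuzzy_query = {
    "query": {
        "match": {
            "title": {"query": "lucine", "fuzziness": "AUTO"}
        }
    }
}
```

With `AUTO`, Elasticsearch scales the allowed number of edits with term length, so short terms must match more exactly than long ones.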
- Autocompletion & Instant Search
Search results or suggestions appear while the user is typing. This can mean simple suggestions of, e.g., existing tags, predicting a search based on search history, or running a completely new search on every keystroke. Google made this feature popular: type “elasticsearch” and Google suggests “elasticsearch benefits” or “elasticsearch success stories”.
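Elasticsearch provides dedicated tools for this, such as the completion suggester and the `search_as_you_type` field type. The underlying idea can be sketched as a prefix lookup over a sorted list of known terms. First, binary-search to the first candidate; then collect terms while they still share the prefix. This is a simplified illustration, not how Elasticsearch implements suggestions, and the term list and the `suggest` function are hypothetical.

```python
import bisect


def suggest(sorted_terms: list[str], prefix: str, limit: int = 5) -> list[str]:
    # Binary-search to the first term >= prefix, then scan while terms still start with it.
    start = bisect.bisect_left(sorted_terms, prefix)
    results: list[str] = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(results) == limit:
            break
        results.append(term)
    return results


terms = sorted([
    "elasticsearch benefits",
    "elasticsearch success stories",
    "elasticsearch tutorial",
    "elastic cloud",
    "kibana",
])
print(suggest(terms, "elasticsearch b"))  # ['elasticsearch benefits']
```

Calling `suggest` again after each keystroke narrows the suggestions as the prefix grows.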
- Document-oriented
Elasticsearch is document-oriented. It stores complex real-world entities as structured JSON documents and indexes all fields by default, which makes every field searchable and fast to query.
Elasticsearch can execute complex queries extremely fast. It also caches frequently used filters (structured query clauses that narrow down the result set), so a filter is executed once and later requests containing the same filter reuse the cached result. Skipping the repeated execution improves speed.
Development teams also favor Elasticsearch because it is distributed by nature and scales horizontally: you can add nodes to extend resources, and the load is balanced across the nodes in the cluster.
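In the query DSL, the distinction between scored queries and cacheable filters is expressed with a `bool` query. Clauses under `must` contribute to the relevance score. Clauses under `filter` only include or exclude documents and are eligible for caching. The sketch below builds such a request body as a Python dict; the field names `title`, `category`, and `price` are hypothetical.

```python
import json


def build_search(text: str, category: str, max_price: float) -> dict:
    """Build a bool query: scored full-text match plus cacheable filters."""
    return {
        "query": {
            "bool": {
                # Scored: affects how results are ranked.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no matching, no scoring, results can be cached.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }


body = build_search("gaming laptop", "laptops", 1500)
print(json.dumps(body, indent=2))
```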
- Structured search
Elasticsearch is schema-free: it accepts JSON documents, tries to detect their data structure (a feature called dynamic mapping), indexes the data, and makes it searchable.
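The structure detection described above is called dynamic mapping. The function below is a simplified sketch of Elasticsearch's default rules for new fields:

- booleans, integers, and floating-point numbers get matching numeric or boolean types,
- objects become object fields with their own sub-properties,
- date-like strings become `date` fields,
- other strings become `text` with a `keyword` sub-field.

It is an approximation for illustration only. It ignores arrays, `null` values, numeric detection, and most date formats, and the example document is made up.

```python
from datetime import datetime


def infer_field_type(value) -> dict:
    """Approximate Elasticsearch's default dynamic mapping for one JSON value."""
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        # JSON objects become object fields with their own sub-properties.
        return {"properties": {key: infer_field_type(val) for key, val in value.items()}}
    if isinstance(value, str):
        try:
            # Crude stand-in for date detection: only accepts YYYY-MM-DD.
            datetime.strptime(value, "%Y-%m-%d")
            return {"type": "date"}
        except ValueError:
            # Other strings: full-text "text" field plus an exact-match "keyword" sub-field.
            return {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    raise TypeError(f"value not handled by this sketch: {value!r}")


# A made-up product document.
doc = {
    "name": "Laptop A",
    "ram_gb": 16,
    "price": 999.99,
    "in_stock": True,
    "released": "2020-01-15",
    "specs": {"cpu": "quad-core"},
}
mapping = {"properties": {key: infer_field_type(val) for key, val in doc.items()}}
print(mapping["properties"]["ram_gb"])  # {'type': 'long'}
```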
- Data durability
Elasticsearch records every change in transaction logs (the translog) kept on multiple nodes in the cluster, which minimizes the chance of data loss.
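A transaction log relies on write-ahead logging: each operation is recorded durably first, so the in-memory state can be rebuilt by replaying the log after a crash. The toy class below illustrates that idea with a JSON-lines file. It is not Elasticsearch's translog format, and `ToyTranslog` is a hypothetical name. In Elasticsearch, each shard copy keeps its own translog, and replicas on other nodes provide the redundancy.

```python
import json
import os
import tempfile


class ToyTranslog:
    """Append-only log: every write is recorded (and fsynced) before it counts as done."""

    def __init__(self, path: str):
        self.path = path

    def append(self, op: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk

    def replay(self) -> dict:
        # Rebuild the document state from the log, e.g. after a crash.
        state: dict = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                op = json.loads(line)
                if op["op"] == "index":
                    state[op["id"]] = op["doc"]
                elif op["op"] == "delete":
                    state.pop(op["id"], None)
        return state


log_path = os.path.join(tempfile.mkdtemp(), "translog.jsonl")
log = ToyTranslog(log_path)
log.append({"op": "index", "id": "1", "doc": {"title": "laptop"}})
log.append({"op": "index", "id": "2", "doc": {"title": "monitor"}})
log.append({"op": "delete", "id": "1"})

# A fresh instance (as after a restart) recovers the state from disk.
recovered = ToyTranslog(log_path).replay()
print(recovered)  # {'2': {'title': 'monitor'}}
```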
- Query Fine Tuning
Elasticsearch has a powerful JSON-based query DSL that lets development teams construct complex queries and fine-tune them to get the most precise results from a search. It also provides ways to rank and group results.
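As an example of fine-tuning, the request body below uses a `multi_match` query that does three things:

- searches two fields at once,
- boosts matches in the title with the `^` syntax,
- requires most of the query terms to match via `minimum_should_match`.

The field names and the boost value of 3 are assumptions for illustration; the right values depend on your data.

```python
import json


def tuned_query(text: str, title_boost: int = 3) -> dict:
    """Build a search body that weights title matches more than description matches."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "title^3" multiplies the score contribution of the title field by 3.
                "fields": [f"title^{title_boost}", "description"],
                "type": "best_fields",
                # At least 75% of the query terms must match within a field.
                "minimum_should_match": "75%",
            }
        },
        "size": 10,  # return the top 10 hits
    }


print(json.dumps(tuned_query("usb c docking station"), indent=2))
```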
- RESTful API
Elasticsearch is API-driven: actions can be performed through a simple RESTful API.
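Because the API is plain HTTP with JSON bodies, any HTTP client works. The sketch below uses only Python's standard library to build a search request for a hypothetical `products` index. It assumes a local development node at `http://localhost:9200` without authentication. Recent versions enable security by default, so a real cluster will usually also need credentials and TLS. The request is only built here; sending it requires a running cluster.

```python
import json
import urllib.request

# Assumption: a local, unsecured development node.
ES_URL = "http://localhost:9200"


def search_request(index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request with a JSON body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("products", {"query": {"match": {"title": "laptop"}}})
print(req.get_method(), req.full_url)

# To actually send it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

Official client libraries wrap exactly this kind of request, which is why integrating from almost any language is straightforward.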
- Distributed approach
Indices can be divided into shards, and each shard can have any number of replicas. New documents are routed to shards automatically when they are indexed, and shards are rebalanced automatically when nodes join or leave the cluster.
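Routing a document to a shard can be sketched with a simplified form of the routing formula, `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. Elasticsearch uses the Murmur3 hash, but this sketch swaps in MD5 from Python's standard library. As a result, the shard numbers it produces will not match a real cluster. The sketch also shows why the number of primary shards cannot simply be changed on an existing index: a different divisor sends most existing documents to different shards.

```python
import hashlib


def route(doc_id: str, num_primary_shards: int) -> int:
    # Simplified routing: shard = hash(_routing) % num_primary_shards.
    # MD5 stands in for Elasticsearch's Murmur3 hash (an assumption for this sketch).
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:4], "big")
    return hash_value % num_primary_shards


# Route 1,000 document IDs across 5 primary shards and count documents per shard.
shards = [route(f"doc-{i}", 5) for i in range(1000)]
counts = [shards.count(s) for s in range(5)]
print(counts)

# How many documents would land on a different shard with 6 primary shards instead of 5?
moved = sum(route(f"doc-{i}", 5) != route(f"doc-{i}", 6) for i in range(1000))
print(moved)
```

A good hash spreads documents roughly evenly, and the same ID always lands on the same shard as long as the shard count stays fixed.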
- Use of faceting
Faceted search goes beyond a typical text search: it lets users apply a number of filters to the information and can even provide a classification system based on the data. This organizes the search results better and helps users determine what information they need to examine.
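In current Elasticsearch versions, facets are built with aggregations, which replaced the older facets API. The sketch below first computes facet counts locally over a few made-up products to show what a facet is. It then builds the equivalent request body with two `terms` aggregations. The index fields (`brand.keyword`, `screen_inches`) are hypothetical; `brand.keyword` assumes the default dynamic mapping for strings.

```python
from collections import Counter

# Made-up sample documents, standing in for the hits of a "laptop" search.
products = [
    {"name": "Laptop A", "brand": "Acme", "screen_inches": 13},
    {"name": "Laptop B", "brand": "Acme", "screen_inches": 15},
    {"name": "Laptop C", "brand": "Globex", "screen_inches": 13},
]


def facet_counts(docs: list[dict], field: str) -> dict:
    # Count the documents for each distinct value of a field,
    # i.e. the buckets a terms aggregation returns.
    return dict(Counter(doc[field] for doc in docs))


print(facet_counts(products, "brand"))          # {'Acme': 2, 'Globex': 1}
print(facet_counts(products, "screen_inches"))  # {13: 2, 15: 1}

# Equivalent request body: "size": 0 skips the hits and returns only the buckets.
facet_request = {
    "query": {"match": {"name": "laptop"}},
    "size": 0,
    "aggs": {
        "by_brand": {"terms": {"field": "brand.keyword"}},
        "by_screen": {"terms": {"field": "screen_inches"}},
    },
}
```

When a user clicks a bucket (for example "Acme"), the application can add it as a `term` filter and rerun the search with the same aggregations.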
Often you have multiple customers or users with separate collections of documents, and a user must never be able to search documents that do not belong to them. This tends to lead to a design where every user has their own index, which often ends up with too many indices. One larger Elasticsearch index, with every query filtered by user, is actually the better choice.
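There are two common ways to apply this single-index design. Both start by storing an owner field on every document. Then either:

- add a mandatory filter on that field to every query, or
- let Elasticsearch apply the filter through a filtered alias.

The sketch below builds both request bodies. The `user_id` field (assumed to be mapped as `keyword`), the `documents` index, and the alias naming scheme are hypothetical.

```python
def scoped_query(user_id: str, user_query: dict) -> dict:
    # Wrap any user query so that it can only match that user's documents.
    # The term filter runs in filter context, so it does not change scoring.
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


def filtered_alias_action(index: str, user_id: str) -> dict:
    # Body for POST /_aliases: an alias whose searches are always
    # restricted to one user's documents (alias naming is hypothetical).
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"{index}-user-{user_id}",
                    "filter": {"term": {"user_id": user_id}},
                }
            }
        ]
    }


body = scoped_query("42", {"match": {"content": "invoice"}})
alias_body = filtered_alias_action("documents", "42")
```

Neither approach is an access-control mechanism on its own. The application, or Elasticsearch's security features, must still ensure that users cannot send unfiltered queries.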
Success stories: facts and statistics
Elasticsearch has been adopted by some major brands, such as Tesco, LinkedIn, Foursquare, Facebook, Netflix, Dell, eBay, Wikipedia, The Guardian, The New York Times, Salesforce, Docker, Orange, Groupon, Eventbrite, and many others.
Let’s look at some of them to see their results:
Dell case study
Dell implemented Elasticsearch to support e-commerce search for 60+ countries in 21+ languages. The search team at Dell has 30 members. They have seen search grow in importance as consumer shopping expectations became more focused on instant gratification. The goal of delivering exactly the result a consumer is looking to buy keeps them continually innovating and expanding the platform’s relevancy and personalization capabilities.
Several years ago, Dell’s commerce search platform was experiencing aging pains: it was not responsive, and its search engine did not support multi-tenancy, cloud readiness, etc. It wasn’t horizontally scalable either, and creating and maintaining indices was challenging. The moment was right to modernize the search platform and meet the needs of contemporary e-commerce. Dell evaluated Solr, Google Search Appliance, and other search engines, but ultimately settled on Elasticsearch. Multi-tenancy, ease of scaling, relevancy of results, aggregation queries, and being open source were the key reasons for choosing Elasticsearch.
Dell has deployed two Elasticsearch clusters on Windows servers in Dell data centers, and the search platform is based on the .NET Framework. One is a search cluster that powers the search experience on Dell.com; the other is an analytics cluster used to track search-related user activity on the site. The analytics cluster makes it possible to deliver crowd-sourced, usage-influenced search results and also provides great insight into how the search platform is used.
The Dell search cluster contains an extremely comprehensive data set, as it indexes everything on Dell.com: over 27 million documents, including all the products that can be purchased on the site, all the downloadable drivers for these products, troubleshooting articles, knowledge-base documents, product manuals, videos and video metadata, etc. The product documents include all the information related to that particular product: the product title, its description, the image link, keywords, meta information for its technical specifications (RAM size, processor type, resolution, etc.), stock status (so Dell knows how many days it will take to ship the product), pricing information, department category, etc.
The Dell analytics cluster, which currently holds more than 1 billion documents, indexes every click on Dell.com that comes from a search. Dell uses this data to analyze the top-performing queries, the top-performing categories, and various other metrics in order to make actionable, dynamic improvements to the site.
Also, to deliver accurate search results in all languages, Dell created extensive linguistic pipelines for each language. The pipelines use Elasticsearch’s language analyzers, stopword removal, spell check, synonym matching, stemming, and other features to make the query more accurate. Dell also added a final step at the end of its linguistic pipelines, called a catch-all influencer, which is essentially an offline aggregator that helps identify the entities in the query the customer entered. This aggregator runs across multiple systems, such as the content management system and master lookup tables in various databases. Depending on what the customer typed in the search bar, it maps the product category to the product category code, the manufacturer name to the manufacturer code, and so on. These inputs, enriched with analytics and customer identification data, are then passed to a probability engine that helps Dell rewrite the final query. This context significantly helps Dell understand what users expect when they perform a search.
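A per-language pipeline of this kind maps naturally onto a custom analyzer. The index-settings body below is a sketch for English only. It chains lowercasing, stopword removal, synonyms, and stemming. It is not Dell's configuration: the analyzer name, filter names, and synonym list are hypothetical. The catch-all influencer and probability engine are Dell's own systems and are not modeled here.

```python
import json

# Index settings (body for PUT /<index>) defining a custom English analyzer.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                "product_synonyms": {
                    "type": "synonym",
                    "synonyms": ["notebook, laptop", "monitor, display"],
                },
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "product_text_en": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Filters run in order: lowercase, drop stopwords,
                    # expand synonyms, then reduce words to their stems.
                    "filter": ["lowercase", "english_stop", "product_synonyms", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "product_text_en"}
        }
    },
}

print(json.dumps(index_settings, indent=2))
```

A production setup would define one such analyzer per language and pick the right one per field or per index.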
Thanks to the real-time nature of Elasticsearch and its powerful aggregations, Dell introduced a new feature called the virtual assistant. It gives shoppers an interactive way to refine their search before clicking the search button by previewing their results: “If I type the term ‘laptop’, I can see that there are refiners available to narrow down my search, one of them being screen size, another being the processor type, and so on.”
Because Elasticsearch supports the creation of multiple indices, Dell Search Engineering was able to deliver more features based on it. For example, Dell built an experimentation engine on its existing framework that lets it easily test new features on a specific percentage of users and measure the impact before rolling them out to the entire deployment. This gives Dell a solid working hypothesis about how often users get relevant results, and more relevant results increase the probability that users buy the products they searched for.
As a result of the switch to Elasticsearch, Dell has seen increases in revenue per visit, click-through rate, average order value, conversion, and customer satisfaction score. Elasticsearch also lets Dell ensure that the right people have the right access and permissions to its clusters in a live, customer-facing environment. By migrating to Elasticsearch, Dell reduced the number of servers it needed by 25–30%.
The Guardian case study
The Guardian wanted to help revitalize the newspaper industry with real-time readership data. Its challenge was to ensure that web content is properly presented and exposed to its 5 million readers.
The Guardian’s in-house analytics system enables users across the company, including editors, journalists, the search optimization team, and developers, to see in real time exactly how readers are interacting with the content. In a news environment that changes every minute, real-time visibility is invaluable. The Guardian uses the data to ensure that content gets exposure at the right time, on the proper social media platforms, with the right headlines. Elasticsearch gave The Guardian the freedom to build a very powerful analytics system in-house, processing 40 million documents per day to deliver real-time visibility of site traffic across the organization. Today, a large portion of The Guardian’s business relies on Elasticsearch to understand how its content is being consumed.
The use cases for Elasticsearch at The Guardian are varied. The visibility afforded by the analytics system is used to see:

- how many hits each content item receives;
- which headlines and content generate more traffic;
- where traffic is being referred from;
- which social media platforms to promote specific content on, and when, to gain maximum exposure;
- which links to offer readers to click on next.

Engineers even use Elasticsearch to diagnose website performance issues by searching through events.
For The Guardian, responding to change in real time is critical. A significant portion of its content receives a lot of traffic in a very short time. In those circumstances, the team needs to be able to respond at the peak, so it needs the information right away. Waiting until the end of the day to see what is happening would be too late, and Elasticsearch provides this real-time visibility.
Elasticsearch helps The Guardian leverage real-time analytics: for example, it can easily query 360 million documents, see traffic for all content as it happens, and gain insight into how updates affect site traffic. It also gives the entire organization real-time insight into audience engagement, democratizes analytics access for more than 500 users, and encourages a culture of exploration and innovation among all employees.
By using Elasticsearch, The Guardian drives more page views, because the data helps it improve content, headlines, and promotion in a variety of ways, which contributes to the site’s success.
Just as important, it empowers the team to get more involved and take a proactive approach to improving the site and its content. It also enhances the user experience on the company’s website by providing readers with more content that meets their demands.
And of course, it improves site performance: the team can track how any change affects site performance, diagnose issues, and keep the site up and running at peak performance.
Docker case study
Docker faced the challenge of delivering high-performance search across a continuously growing database without overloading operational resources. Its IT department decided to use Elasticsearch to make it easy to find the right container for running distributed applications. Elasticsearch now helps Docker deliver a scalable, seamless, and highly available search and discovery experience to the growing Docker community.
Having decided to move to Elasticsearch, Docker evaluated the available options for hosted Elasticsearch against a variety of criteria: location, number of indices, available resources, high-availability options, and price. Elastic Cloud was the best option.
Consistent performance and reliability were key concerns for Docker, and Elastic Cloud’s dedicated Elasticsearch clusters were a good fit for two key reasons. First, Elastic Cloud’s hosting model, based on dedicated clusters with reserved memory and CPU, assured Docker that its application would perform consistently. Second, Elastic Cloud’s high-availability options assured Docker that its search database would remain available even in the event of a full data center outage.
Moving to Elasticsearch in production delivered the performance gains Docker was looking for. Load dropped, and search latency and throughput improved massively. Additionally, Docker was able to greatly improve search result quality by using Elasticsearch’s field boosting and function score queries to promote more popular and relevant search results.
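Field boosting and function scores can be combined in one request. The sketch below does three things:

- boosts the `name` field,
- uses `function_score` with `field_value_factor` to multiply the text relevance by a logarithm of a popularity field,
- reproduces that arithmetic locally.

The field names (`name`, `description`, `pull_count`) are hypothetical, not Docker's actual schema. With `modifier` set to `log1p`, Elasticsearch computes log10(1 + factor × value). The `multiply` boost mode then multiplies that value by the query score.

```python
import math


def popularity_boosted_query(text: str) -> dict:
    """Text relevance (with a boosted name field) scaled by a popularity signal."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        # Matches in "name" count twice as much as in "description".
                        "fields": ["name^2", "description"],
                    }
                },
                "field_value_factor": {
                    "field": "pull_count",
                    "modifier": "log1p",  # log10(1 + factor * value)
                    "factor": 1.0,
                    "missing": 0,  # treat documents without the field as 0
                },
                "boost_mode": "multiply",  # final score = query score * function value
            }
        }
    }


def combined_score(query_score: float, pull_count: float, factor: float = 1.0) -> float:
    # Local re-computation of the scoring formula configured above.
    return query_score * math.log10(1 + factor * pull_count)


print(combined_score(2.0, 99))    # 4.0
print(combined_score(2.0, 9999))  # 8.0
```

The logarithm keeps very popular items from completely drowning out textual relevance. Note that a document with a popularity of 0 ends up with a final score of 0 under `multiply`, so another `boost_mode` such as `sum` may suit some data better.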
With its new infrastructure, Docker is able to serve better search results faster. For Docker this is critical: a tool built around providing power and convenience must also have supporting services with those characteristics.
With Elasticsearch, Docker found a way to easily and cost-effectively scale a search application to meet growing data volumes, ensure an excellent search and discovery experience, and manage operational complexity.
Orange case study
Orange had a problem: every single vertical engine ran on its own set of hardware, including redundancy. This was expensive and difficult to manage, and even worse, the complexity increased every time a new vertical engine or feature had to be deployed. After living with this complexity for years, Orange realized it needed to reduce the number of different technologies and improve its ability to add new features quickly.
Orange quickly settled on Elasticsearch, mainly because of its consistent, comprehensive API and the fact that it was designed from the beginning to be elastic. Elasticity was one of the key requirements for the migration, because Orange was selecting the technology on which all of its future vertical engines and features would be built.
Moving away from the legacy interface was easy because JSON parsing and HTTP clients for Elasticsearch are easy to develop in almost every language. Moreover, Elastic provides client libraries for mainstream languages, which simplify interaction with Elasticsearch even further by hiding the low-level JSON parsing and HTTP interactions.
Orange is currently experimenting with Elasticsearch for more of its internal tools. For example, it is developing a tool to analyse the readability or “quality” of the 1.2 billion URLs in its French web database, to determine whether the URLs are readable and get a sense of the overall quality of each domain or host.
Today, Orange has 3 clusters, the biggest holding 50 million documents on 20 virtual machines. The primary size of these indices is 150 GB, and Orange can process hundreds of requests per second with latencies under 200 ms, all while running on VMs rather than dedicated hardware.
Now that Orange can deploy new Elasticsearch indices and clusters, it is easier for its internal teams to create new vertical engines and features and to handle more data.
7 Elasticsearch books to read
Anyone just starting with Elasticsearch needs to know what it is, how it works, and why to use it. Elasticsearch Essentials condenses all of this into 240 pages of introductory lessons and exercises. You’ll then move on to custom data modeling for handling intense queries over a search database. As Elasticsearch is best learned through practice, the author offers a nice mix of theory and practice in each chapter.
Elasticsearch in Action helps beginners with the core concepts and quickly pushes beyond them into more advanced situations, teaching through live examples. Along the way you’ll learn about batch searching and indexing results to optimize response times. Elasticsearch in Action teaches you how to write applications that deliver professional-quality search. As you read, you’ll learn to add basic search features to any application, enhance search results with predictive analysis and relevancy ranking, and use saved data from prior searches to give users a custom experience.
Here’s another beginner-friendly book that requires absolutely no prior knowledge to get started.
With this book you will learn Elasticsearch basics such as data indexing, analysis, and dynamic mapping, and how to query and filter for more accurate and precise search results.
You will also learn how to monitor and manage Elasticsearch clusters and troubleshoot problems that arise, and how to configure and create an Elasticsearch index. Further topics include:

- using the Elasticsearch query DSL to make all kinds of queries;
- using filters efficiently and precisely without loss of performance;
- implementing autocompletion;
- highlighting data;
- searching geographical information for better results;
- and much more.
By studying how the Elasticsearch engine stores data, you can learn a lot about search indexing and optimization. You’ll learn best practices for mapping strategies and how to handle document metadata for different search queries. You’ll find out how to use analysis and analyzers to organize and retrieve search results more intelligently, so that every search query is met with relevant results. You’ll explore the anatomy of an Elasticsearch cluster and learn how to set up configurations that give you optimal availability and scalability. What is also good about this book is that you’ll find real-world solutions to help you improve indexing performance, as well as tips and guidance on safety so you can back up and restore data.
This book starts with design patterns for a new server running Elasticsearch. You will learn how to create a custom search engine for an e-commerce store and how to generate auto-populated search results like Google. You will also discover the power of Elasticsearch by implementing it in a variety of real-world scenarios, and learn not only how to generate accurate search results but also how to improve search quality so that results are relevant. You will find out how to generate real-time visualizations of your data using techniques such as time graphs, pie charts, and stacked graphs. You will also see how to widen the scope of matches using analyzer techniques such as lowercasing, stemming, and synonym matching.
To build scalable websites and work on big data projects, you’ll need higher-level Elasticsearch experience. That’s why Mastering Elasticsearch is a valuable resource for developers with a deep interest in Elasticsearch applications. You will learn about the design and architecture of Apache Lucene and Elasticsearch to fully understand how this great search engine works.
You will learn how to design, configure, and distribute your index, gain a deep understanding of the workings behind it, and explore advanced features. The book is easy to read, and its detailed examples help you understand and use the sophisticated features of Elasticsearch.
The Elasticsearch Cookbook contains 130+ different recipes for common setups, pitfalls, and basic extensions you can build on top of the Elasticsearch API.