How Search Engines Use Graphs

Atman Naik
9 min read · Jan 5, 2022


Figure 1 How Search Engines Use Graphs

Graph-based search is a newer approach to data and digital asset management, pioneered by Facebook and Google.

Graph-based search returns relevant information that you may not have explicitly asked for. It delivers useful, targeted results and lets you quickly zero in on the most interesting data points.

The key to this advanced search capability is that, from the very first query, a graph-based search engine takes the entire structure of the available data into account. And because graph systems understand how data is related, they return richer and more accurate results.

Think of a graph-based search as a “conversation” with your data, rather than a series of searches. Search and discovery, rather than search and retrieval.

In this “Graph Databases in the Enterprise” series, we explore the most effective and efficient uses of graph technologies in the world’s leading organizations. Earlier installments examined fraud detection, real-time search engines, core data management, network and IT operations, and identity and access management (IAM).

Figure 2 Challenges

Challenges in Graph-Based Search

As with any advanced technology, graph-based search faces challenges. Here are some of the biggest:

Size and Connection of Asset Metadata

The usefulness of a digital asset grows with the richness of the metadata that describes the asset and its relationships to other assets. However, every piece of added metadata also makes the assets more complex to manage and search.

Real-Time Query Functionality

The power of a graph-based search system lies in its ability to search and retrieve data in real time. However, querying complex, highly connected data in real time is a major challenge.

A Growing Number of Data Nodes

With the rapid growth in the size of assets and their associated metadata, your application needs to be able to accommodate both current and future needs.

Figure 3 Use of Graph Database

Why Use a Graph Database for Graph-Based Search?

Graph-based search would not be possible without a graph database to power it.
In short, graph-based search is smart: you can ask more precise and useful questions and get back the most relevant and meaningful information, whereas traditional keyword-based search tends to return random, unrefined, and low-quality results.
With graph-based search, you can easily query all of your connected data in real time, then focus on the answers you get and launch a new real-time search based on what you have learned.

A graph database makes advanced search and discovery possible because:

  • Businesses can model their data as it naturally occurs and run searches based on its inherent structure. A graph database provides both the data model and the query language to support that natural structure.
  • Users get fast, accurate search results in real time. With a graph database, rich and varied metadata can be attached to every piece of content for quick search and retrieval.
  • Data architects and developers can easily change their data and its structure and add a variety of new data. The built-in flexibility of the graph database model allows search capabilities to evolve very quickly.

In contrast, information stored in relational databases is much harder to adapt to future change: if you want to add new types of content or make structural changes, you are forced to redefine the relational model in a way that the graph model does not require.
When working with connected data, the graph model is much more flexible and can be up to 1,000 times faster than a relational database.
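To make the idea of searching by relationships concrete, here is a minimal sketch in plain Python. It does not use a real graph database; the asset names, the GRAPH dictionary, and the related() helper are all hypothetical and exist only for illustration. The sketch finds everything connected to a starting asset within a given number of hops, which is roughly the kind of question a graph query answers directly.

```python
from collections import deque

# Hypothetical asset graph: each node maps to the nodes it is directly related to.
GRAPH = {
    "photo:beach": ["tag:summer", "author:alice"],
    "tag:summer": ["photo:beach", "video:surf"],
    "author:alice": ["photo:beach", "article:travel"],
    "video:surf": ["tag:summer"],
    "article:travel": ["author:alice"],
}

def related(graph, start, max_hops):
    """Return {node: hops} for every node within max_hops of start (breadth-first)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]  # the starting node is not a result
    return seen

print(related(GRAPH, "photo:beach", 2))
```

Starting from the beach photo, one hop reaches its tag and its author; a second hop surfaces the surf video and the travel article, which the user never asked for by keyword. A production system would express this as a query in the graph database's own query language rather than hand-written traversal code.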

Figure 4 Example of Google and Facebook

Example of Google and Facebook

In their early days, both Facebook and Google offered a basic “keyword” search, in which users typed a word or phrase and got back a list of all the results that contained those keywords.
This method relies on plain pattern matching, and many users found it tedious to reword their search terms over and over until the right result turned up.
Facebook’s database of people and Google’s database of information have one thing in common: they are both built using graph technology. In recent years, Google and Facebook realized that they could make far better use of their vast stores of searchable content, and each introduced graph-based search tools to take advantage of these marketing opportunities.
Recognizing the limitations of keyword search, Google launched its “Knowledge Graph” in 2012 and Facebook followed with its “Graph Search” service in 2013, both of which give users more contextual information in their searches.
As a result of these new services, both businesses have seen tremendous growth in consumer engagement and, in turn, commercial success.
Following in the footsteps of giants like Facebook, Google and adidas, newer startups like Glowbl and Decibel, along with many others, have also developed graph-based search tools to uncover new business insights, launch new products and services, and attract new customers.

Figure 5 PageRank Algorithm

PageRank Algorithm

PageRank (PR) is an algorithm used by Google Search to rank web pages in its search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of web pages. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
It is not the only algorithm Google uses to order search results, but it was the first algorithm the company used, and it is the best known.
The measure described above is typically not applied to multigraphs, in which the same pair of nodes can be joined by more than one edge.

Algorithm

The PageRank algorithm outputs a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. Several research papers assume that the distribution is evenly divided among all documents in the collection at the beginning of the computation. The PageRank computation requires several passes through the collection, called “iterations”, to adjust the approximate PageRank values so that they more closely reflect the theoretical true value.

Simplified Algorithm

Assume a small universe of four web pages: A, B, C, and D. Links from a page to itself are ignored, and multiple outbound links from one page to another are treated as a single link. PageRank is initialized to the same value for all pages. In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time, so each page in this example would have an initial value of 1. However, later versions of PageRank, as well as the rest of this section, assume a probability distribution between 0 and 1. So the initial value of each page in this example is 0.25.
The PageRank transferred from a given page to the targets of its outbound links in the next iteration is divided equally among all of its outbound links.
If the only links in the system were from pages B, C, and D to A, each link would transfer 0.25 PageRank to A in the next iteration, for a total of 0.75.

PR(A)=PR(B)+PR(C)+PR(D)

Suppose instead that page B has links to pages C and A, page C has a link to page A, and page D has links to all three pages. Thus, in the first iteration, page B would transfer half of its existing value, or 0.125, to page A and the other half, or 0.125, to page C. Page C would transfer all of its existing value, 0.25, to the only page it links to, A. Since D has three outbound links, it would transfer one third of its existing value, or approximately 0.083, to A. At the end of this iteration, page A will have a PageRank of about 0.458.

PR(A)=[PR(B)/2]+[PR(C)/1]+[PR(D)/3]

In other words, the PageRank conferred by an outbound link is equal to the document’s own PageRank score divided by the number of outbound links L( ).

PR(A)=[PR(B)/L(B)]+[PR(C)/L(C)]+[PR(D)/L(D)]
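The arithmetic of this four-page example can be checked with a few lines of Python. This is only a sketch of a single iteration of the simplified rule; the names links and simplified_step are chosen here for illustration.

```python
# Outbound links for the four-page example (B -> C, A; C -> A; D -> A, B, C).
links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}

def simplified_step(links, ranks):
    """One iteration: each page splits its current rank equally across its outbound links."""
    new_ranks = {page: 0.0 for page in links}
    for page, targets in links.items():
        for target in targets:
            new_ranks[target] += ranks[page] / len(targets)
    return new_ranks

ranks = {page: 0.25 for page in links}  # every page starts at 0.25
after_one = simplified_step(links, ranks)
print(round(after_one["A"], 3))  # 0.458
```

Note that page A has no outbound links in this example, so the rank it held before the step is not passed on and the values no longer sum to 1 afterwards. The simplified rule ignores this; fuller formulations handle such pages explicitly.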

In the general case, the PageRank value for any page u can be expressed as:

Figure 6 PageRank Value

PR(u) = Σ [PR(v)/L(v)], summed over all pages v in the set Bu

That is, the PageRank value of page u depends on the PageRank value of each page v in the set Bu (the set of all pages that link to page u), divided by the number L(v) of outbound links from page v. The algorithm also includes a damping factor that scales down the PageRank passed along at each step. It works like an income tax that the government levies on a person even though the government itself pays that person.
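The damping idea described above is commonly written as PR(u) = (1 - d)/N + d × Σ [PR(v)/L(v)], where N is the number of pages and d is the damping factor. The sketch below iterates that formula on the four-page example. Two things in it are assumptions rather than anything stated in this article: the value d = 0.85, which is a commonly quoted default, and the choice to spread the rank of pages with no outbound links evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the damped PageRank formula; links maps each page to its outbound links."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}  # start from an even distribution
    for _ in range(iterations):
        # Assumption: rank held by pages with no outbound links is shared by all pages.
        dangling = sum(ranks[p] for p in pages if not links[p])
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return ranks

links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

With these assumptions the values keep summing to 1 across iterations, page A (which every other page links to) ends up with the highest score, and page D (which nothing links to) ends up with the lowest.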

Figure 7 Google Search Algorithm

Google Search Algorithm

Google is the world’s most popular search engine. It typically holds more than 90% of the market, which translates into about 3.5 billion searches on its site every day. Although notoriously secretive about how its algorithm works, Google does provide a high-level overview of how it prioritizes websites on the results page.
New websites are created every day. Google can find these pages by following links from existing content it has already crawled, or when a website owner submits their sitemap directly. Updates to existing content can also be submitted to Google by asking it to recrawl a specific URL. This is done through Google Search Console.

While Google does not specify how often sites are crawled, any new content that is linked from existing content will eventually be discovered.
Once the web crawlers have gathered enough information, they bring it back to Google for indexing.
Indexing begins with analyzing website data, which includes written content, images, videos, and the technical structure of the site. Google looks for positive and negative ranking signals, such as keywords and the freshness of the content, to try to understand what each crawled page is about.
Google’s search index contains billions of pages and over 100,000,000 gigabytes of data. To organize this information, Google uses a machine learning algorithm called RankBrain and a knowledge base called Knowledge Graph. Together, these help Google serve users the most relevant content possible. Once indexing is complete, Google moves on to ranking.
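The crawl-then-index flow described above can be illustrated with a toy example. Everything here is hypothetical: the WEB dictionary stands in for the real web, the .example addresses are placeholders, and crawl() and build_index() are simplified helpers, not a description of Google's actual systems. The sketch discovers pages by following links from a known starting page and then builds a simple inverted index from words to pages.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outbound links).
WEB = {
    "a.example": ("graph search basics", ["b.example", "c.example"]),
    "b.example": ("pagerank and graph ranking", ["c.example"]),
    "c.example": ("search console tips", []),
    "d.example": ("orphan page nobody links to", []),
}

def crawl(web, seeds):
    """Discover pages breadth-first by following links from already-known pages."""
    found, queue = [], deque(seeds)
    seen = set(seeds)
    while queue:
        url = queue.popleft()
        found.append(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found

def build_index(web, urls):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = {}
    for url in urls:
        for word in web[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

crawled = crawl(WEB, ["a.example"])
index = build_index(WEB, crawled)
print(crawled)
print(sorted(index["graph"]))
```

Because no crawled page links to d.example, the crawler never finds it from the single starting page. Adding it to the seed list, which loosely mirrors an owner submitting a sitemap, is what makes it discoverable.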

Everything up to this point happens in the background, before a user ever interacts with Google Search. Ranking is the step that happens in response to what the user is searching for. Google considers five major factors when someone searches:

  • Query meaning: This determines the intent behind an end user’s query. Google uses it to work out exactly what a person is looking for when they search. Each query is analyzed using complex language models built on previous searches and usage behavior.
  • Web page relevance: Once Google has determined the intent of a user’s search query, it reviews the content of the indexed web pages to determine which are most relevant. The main driver of this is keyword analysis. The keywords on a website should match Google’s understanding of the user’s query.
  • Content quality: With keywords matched, Google goes a step further and evaluates the quality of the content on the matching web pages. This helps it decide which results come first, based on the authority of a particular website as well as its PageRank and freshness.
  • Web page usability: Google prioritizes websites that are easy to use. Usability covers everything from site speed to responsiveness.
  • Additional context and settings: This step tailors results to the user’s previous searches and to specific settings within the Google platform.
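As a rough illustration of how several of these factors could be combined, the sketch below scores pages with a weighted sum of keyword relevance, content quality, and usability. It is not Google's method: the PAGES data, the WEIGHTS values, and the relevance() and rank() helpers are all invented for this example, and query meaning and user context are left out entirely. It also assumes a non-empty query.

```python
# Hypothetical per-page signals, each already scaled to the range 0..1.
PAGES = {
    "a.example": {"text": "graph search tutorial", "quality": 0.9, "usability": 0.6},
    "b.example": {"text": "graph database pricing", "quality": 0.5, "usability": 0.9},
    "c.example": {"text": "cooking recipes", "quality": 0.8, "usability": 0.8},
}

# Illustrative weights only; Google's real weighting is not public.
WEIGHTS = {"relevance": 0.5, "quality": 0.3, "usability": 0.2}

def relevance(query, text):
    """Fraction of query words that appear in the page text."""
    words = query.lower().split()
    page_words = set(text.lower().split())
    return sum(w in page_words for w in words) / len(words)

def rank(query, pages):
    """Return URLs with non-zero relevance, best weighted score first."""
    scored = []
    for url, page in pages.items():
        rel = relevance(query, page["text"])
        if rel == 0:
            continue  # pages that match no query word are dropped
        score = (WEIGHTS["relevance"] * rel
                 + WEIGHTS["quality"] * page["quality"]
                 + WEIGHTS["usability"] * page["usability"])
        scored.append((score, url))
    return [url for score, url in sorted(scored, reverse=True)]

print(rank("graph search", PAGES))
```

For the query "graph search", a.example matches both words and ranks first, b.example matches one word and ranks second, and c.example matches nothing and is left out despite its high quality and usability scores.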

Figure 8 Google Search Result

Conclusion

For businesses with large catalogs of products, content, or digital assets, graph-based search offers a better way to make this data available to users, as the corporate giants Google and Facebook have clearly demonstrated.
The potential business uses of graph-based search are nearly endless: customer support sites, product catalogs, content sites, and social media, to name a few.
Graph-based search offers many competitive advantages, including better customer engagement, more targeted content, and increased revenue opportunities.
Businesses that harness the power of graph-based search today will be well ahead of their peers tomorrow.

If you have any suggestions or questions, please let us know in the comments below.

Blog by ~

Atman Naik || Aslaan Mulla || Sahil Mohite || Omkar More || Tejas Bhat
