Amazon Kendra | Build your custom Search Engine using Machine Learning

Vaibhav Malpani
Vaibhav Malpani’s Blog
4 min read · Jan 6, 2020

In an enterprise, you have lots of files holding a lot of information in formats like ‘PPT’, ‘PDF’, ‘TXT’, etc., but it is very hard to find what you need quickly when it matters most. Even if you do find the right file, you still have to browse through many pages to locate exactly what you were looking for. Also, digging through folders and files is not as natural as simply asking a question, right? (This will make sense as we move along.)

One who is trying to find data from a chunk of files

Today we will learn how to extract information from a chunk of files just by asking questions in natural language.

To get started, you will need:

  1. A lot of files (if you are an enterprise, you already have them :D)
  2. An AWS account with access to Amazon Kendra

For this demo, I did not have enough files, so I created them. I took a lot of random words, extracted the Wikipedia content related to each one, and saved each result in a separate text file using a short Python 3 script.
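A minimal sketch of such a script follows. It is not the author's original code: it assumes the public MediaWiki API with the TextExtracts module (`prop=extracts`, `explaintext=1`), and the topic list, output folder, and User-Agent string are hypothetical placeholders. Only the standard library is used; `fetch_extract` needs network access, while the parsing and saving helpers work offline.

```python
import json
import os
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str) -> str:
    """Build a MediaWiki API URL that asks for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",   # TextExtracts module
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the canonical page
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_extract(payload: dict) -> str:
    """Return the extract from an API response, or '' if the page is missing."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("extract", "")
    return ""


def fetch_extract(title: str) -> str:
    """Download the plain-text extract for `title` (needs network access)."""
    request = urllib.request.Request(
        build_extract_url(title),
        headers={"User-Agent": "kendra-demo-script/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_extract(json.load(response))


def safe_filename(title: str) -> str:
    """Turn a page title into a file name such as 'Mount_Everest.txt'."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_") + ".txt"


def save_text(title: str, text: str, out_dir: str) -> str:
    """Write `text` to `<out_dir>/<safe title>.txt` and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(title))
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    # Hypothetical topic list; swap in your own random words.
    for word in ["Apple", "Banana", "Mount Everest"]:
        text = fetch_extract(word)
        if text:
            print("saved", save_text(word, text, "wiki_files"))
```

Keeping one topic per file mirrors the setup described here: each text file becomes a separate document once it is indexed.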

Problem statement

We will treat the files extracted above as our chunk of files and build a search engine over them using Amazon Kendra, a highly accurate, easy-to-use enterprise search service powered by machine learning.

Steps:

  1. Create Index
  2. Add Data Sources
  3. Test and Deploy

1. Create Index:

  • Give the index a name that helps you recognize what data it contains.
  • For the IAM role, you can select an existing role or choose ‘Create a new role’.
  • Encryption requirements vary from team to team; in this case, I am not selecting the encryption option.
  • Click ‘Create’ when you are done with the above steps. Creating the index can take up to 30 minutes.
Screenshot for ‘create index‘
After the Index is created
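If you prefer scripting this step, here is a minimal sketch using boto3's Kendra client (`create_index` and `describe_index`). It assumes boto3 is installed and configured with credentials, and that you already have an IAM role ARN that Kendra can assume; the index name and ARN are hypothetical placeholders. It passes `Edition="DEVELOPER_EDITION"`, which may not match the options the console offered when this post was written. The client is passed in as a parameter so the polling logic can be tried without AWS.

```python
import time


def create_index_and_wait(kendra, name, role_arn, poll_seconds=60, timeout_seconds=3600):
    """Create a Kendra index, then poll until it is ACTIVE and return its ID."""
    # No ServerSideEncryptionConfiguration is passed, mirroring the choice of
    # not selecting the encryption option in the console.
    response = kendra.create_index(
        Name=name,
        RoleArn=role_arn,
        Edition="DEVELOPER_EDITION",
    )
    index_id = response["Id"]
    deadline = time.monotonic() + timeout_seconds  # creation can take ~30 minutes
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return index_id
        if status == "FAILED":
            raise RuntimeError(f"index {index_id} failed to create")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {index_id} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    import boto3

    client = boto3.client("kendra")
    index_id = create_index_and_wait(
        client,
        name="wiki-files-index",                                    # hypothetical
        role_arn="arn:aws:iam::123456789012:role/KendraIndexRole",  # hypothetical
    )
    print("index ready:", index_id)
```

Polling `describe_index` replaces watching the console until the index status turns active.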

2. Add Data Sources:

Data can be imported from an S3 bucket, SharePoint, or Amazon RDS. For our demo, I uploaded the wiki files to an S3 bucket.
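Uploading the generated files can also be scripted. This is a minimal sketch using boto3's S3 client method `upload_file(Filename, Bucket, Key)`; the local folder and bucket name are hypothetical placeholders, and it assumes the bucket already exists and your credentials can write to it. The suffix filter defaults to `.txt` because that is what this demo produced; Kendra can also index other formats such as PDF and PPT.

```python
import os


def upload_folder(s3, folder, bucket, prefix="", suffixes=(".txt",)):
    """Upload matching files from `folder` to `bucket`; return the S3 keys used."""
    keys = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.endswith(suffixes):
            key = prefix + name
            s3.upload_file(path, bucket, key)  # (Filename, Bucket, Key)
            keys.append(key)
    return keys


if __name__ == "__main__":
    import boto3

    uploaded = upload_folder(
        boto3.client("s3"),
        folder="wiki_files",             # hypothetical local folder
        bucket="my-kendra-demo-bucket",  # hypothetical bucket name
    )
    print(f"uploaded {len(uploaded)} files")
```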

Data source connectors
  • Click ‘Add connector’ below the S3 bucket option.
  • Give your data source a name.
  • On the next screen, click ‘Browse S3’ and select the bucket that contains your data files.
  • For our demo, we set the ‘Sync run schedule’ frequency to ‘on-demand’. You can choose a different frequency depending on your use case.
  • Finally, select ‘Create’.
  • After that, a window with a ‘Sync now’ button shows up. Click ‘Sync now’ to start syncing the data from S3 to Amazon Kendra.
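The same data source setup can be sketched with boto3's Kendra client: `create_data_source` with `Type="S3"`, then `describe_data_source` to wait until it is active, then `start_data_source_sync_job` as the equivalent of ‘Sync now’. Leaving out the optional `Schedule` argument corresponds to the on-demand choice above. The index ID, bucket, source name, and role ARN are hypothetical placeholders, and the role is assumed to have read access to the bucket.

```python
import time


def create_s3_source_and_sync(kendra, index_id, name, bucket, role_arn, poll_seconds=30):
    """Create an S3 data source with no schedule (on demand), then start one sync."""
    response = kendra.create_data_source(
        Name=name,
        IndexId=index_id,
        Type="S3",
        Configuration={"S3Configuration": {"BucketName": bucket}},
        RoleArn=role_arn,
        # No Schedule argument: the source only syncs when a job is started.
    )
    source_id = response["Id"]
    # Wait for the data source to finish creating before requesting a sync.
    while True:
        status = kendra.describe_data_source(Id=source_id, IndexId=index_id)["Status"]
        if status == "ACTIVE":
            break
        if status == "FAILED":
            raise RuntimeError(f"data source {source_id} failed to create")
        time.sleep(poll_seconds)
    # Equivalent of clicking 'Sync now' in the console.
    job = kendra.start_data_source_sync_job(Id=source_id, IndexId=index_id)
    return source_id, job["ExecutionId"]


if __name__ == "__main__":
    import boto3

    source_id, execution_id = create_s3_source_and_sync(
        boto3.client("kendra"),
        index_id="YOUR-INDEX-ID",                                   # hypothetical
        name="wiki-files-s3",                                       # hypothetical
        bucket="my-kendra-demo-bucket",                             # hypothetical
        role_arn="arn:aws:iam::123456789012:role/KendraS3Role",     # hypothetical
    )
    print("sync started:", source_id, execution_id)
```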

3. Test and Deploy:

Click ‘Search console’ on the index dashboard.

We can start with a simple keyword search, such as ‘apple’.

As you can see, Kendra easily finds the files that contain the word ‘apple’.

Let’s see how the search handles more complex queries.

Notice how the $265 billion is shown in a larger font.

As you can see from the above result, Amazon Kendra understands natural-language questions and returns an answer that is easy to understand, not just a list of matching files.
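The search console is one way to query the index; the same searches can be run with boto3's Kendra `query` call. This minimal sketch simplifies each result into its type, title, and excerpt. It also assumes that the span the console renders in a larger font corresponds to a highlight marked `TopAnswer` in the result's `DocumentExcerpt`; treat that mapping as an assumption. The index ID and questions are hypothetical placeholders.

```python
def top_answer(item):
    """Return the excerpt span flagged as the top answer, or None if there is none."""
    excerpt = item.get("DocumentExcerpt", {})
    text = excerpt.get("Text", "")
    for highlight in excerpt.get("Highlights", []):
        if highlight.get("TopAnswer"):
            return text[highlight["BeginOffset"]:highlight["EndOffset"]]
    return None


def search(kendra, index_id, question, max_items=5):
    """Run a Kendra query and return simplified result dicts."""
    response = kendra.query(IndexId=index_id, QueryText=question)
    results = []
    for item in response.get("ResultItems", [])[:max_items]:
        results.append({
            "type": item.get("Type"),  # e.g. ANSWER, QUESTION_ANSWER, DOCUMENT
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "top_answer": top_answer(item),
        })
    return results


if __name__ == "__main__":
    import boto3

    client = boto3.client("kendra")
    # Hypothetical queries: one keyword search, one natural-language question.
    for question in ["apple", "What was Apple's revenue?"]:
        print("Q:", question)
        for result in search(client, "YOUR-INDEX-ID", question):
            print(" ", result["type"], "|", result["title"], "|", result["top_answer"])
```

Keyword queries typically come back as document results, while questions can also produce an answer-type result with a highlighted span.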

Pricing:

Free tier:

The free tier covers up to 30 days of Developer Edition usage (coming soon at the time of writing), starting from when you create your first index.

Pricing table:

Customer story:

When materials scientists at 3M lead new research, they need access to relevant prior research that is buried in the many patents stored in their huge knowledge base. Finding the right information is often exhausting (but not exhaustive) and time-consuming. To deal with this issue, they decided to use Amazon Kendra.

If you liked this post, please clap for it; follow me if you want more such posts!

Google Developer Expert for Google Cloud. Python Developer. Cloud Evangelist.