Unofficial Handbook

Watson Discovery: Creating and Populating a Collection

Ingestion, crawlers, and connectors

By bakagaijin
Published in Technology Hits
8 min read, Apr 9, 2022
Photo by Mark Fletcher-Brown on Unsplash

Disclaimer: This is my own work and does not represent the viewpoint of my present employer or any past employer.


In this chapter, we will talk about getting documents into Watson Discovery.

Ingestion

Ingestion is the process of bringing documents into the system to build a searchable collection.

When you create your collection, you select the data source and method used to populate the collection. Once the collection has been created, the method and data source cannot be changed.

This means that each collection is tied to exactly one data source. If you want to pull from a different source, you need to create another collection.

Screenshot by the Author

This screenshot shows the data sources available for Plus-type accounts at the time of writing.

There are several ways to ingest documents into your collection:

  • Uploading documents using the web-based Tooling (User Interface).
  • Creating and running one of the built-in crawlers or connectors.
  • Developing and running a custom crawler using the Discovery API (requires coding).
  • Reusing data from an existing collection.
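To make the custom crawler option more concrete, here is a minimal sketch in Python. It is not an official or built-in crawler: it simply walks a local folder and hands each matching file to an upload function. The names find_documents, crawl, make_discovery_uploader, and DEFAULT_EXTENSIONS are hypothetical, and the extension list is an assumption. The uploader assumes the ibm-watson Python SDK is installed and uses its DiscoveryV2 add_document call; the version date, API key, service URL, project ID, and collection ID are placeholders you would supply yourself.

```python
import mimetypes
from pathlib import Path
from typing import Callable, Iterable, Iterator, List

# Extensions treated as ingestible in this sketch (an assumption; check the
# file types your Discovery plan actually supports).
DEFAULT_EXTENSIONS = {".pdf", ".docx", ".html", ".json", ".txt"}


def find_documents(root: Path,
                   extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> Iterator[Path]:
    """Yield files under root whose extension is in extensions, in sorted order."""
    wanted = {ext.lower() for ext in extensions}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


def crawl(root: Path, upload: Callable[[Path, str], None]) -> List[Path]:
    """Call upload(path, content_type) for each document; return the uploaded paths."""
    uploaded = []
    for path in find_documents(root):
        # Fall back to a generic type when the extension is not recognized.
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        upload(path, content_type)
        uploaded.append(path)
    return uploaded


def make_discovery_uploader(api_key: str, service_url: str,
                            project_id: str, collection_id: str):
    """Build an upload callable backed by the ibm-watson SDK (imported lazily)."""
    from ibm_watson import DiscoveryV2
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # The version date is an assumption; use the one your instance expects.
    discovery = DiscoveryV2(version="2020-08-30",
                            authenticator=IAMAuthenticator(api_key))
    discovery.set_service_url(service_url)

    def upload(path: Path, content_type: str) -> None:
        with path.open("rb") as handle:
            discovery.add_document(
                project_id=project_id,
                collection_id=collection_id,
                file=handle,
                filename=path.name,
                file_content_type=content_type,
            )

    return upload
```

A run would look something like crawl(Path("./docs"), make_discovery_uploader(...)). Passing the upload function in as a parameter keeps the folder-walking logic testable without a live Discovery instance.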

Let’s look at each of these in more detail.

Uploading Documents

Uploading is the most direct way to get documents from your laptop or workstation into Discovery in the cloud.

If you choose to create a collection from uploaded documents, you will see a form for naming the collection, setting its language, and so on. This form is similar for every collection you create.

Screenshot by the Author
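If you would rather script this step than fill in the form, the same basic settings (a name and a language) can be sent to the Discovery v2 API. The sketch below only builds the HTTP request and does not send it. The function name build_create_collection_request is hypothetical, and the version date, bearer-token authentication, and field names are my assumptions about the v2 "create collection" endpoint, so check them against the API reference for your instance before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_create_collection_request(service_url: str, project_id: str, token: str,
                                    name: str, language: str = "en",
                                    description: str = "") -> urllib.request.Request:
    """Build (but do not send) a request that creates a Discovery v2 collection."""
    # The version date is an assumption; use the one your instance expects.
    query = urllib.parse.urlencode({"version": "2020-08-30"})
    url = (f"{service_url.rstrip('/')}/v2/projects/"
           f"{urllib.parse.quote(project_id)}/collections?{query}")
    body = json.dumps({
        "name": name,
        "description": description,
        "language": language,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumes you already hold an IAM bearer token.
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it is then a matter of passing the request to urllib.request.urlopen and reading the JSON response, which should describe the new collection.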

If you check the box marked Apply FAQ extraction, Discovery will scan the document…


bakagaijin

Master Inventor and AI Architect. Grew up in Japan, World Traveler. Former Navy Linguist. Interests include Music, Writing, Tech, Travel, and a Better World.