Unofficial Handbook
Watson Discovery: Creating and Populating a Collection
Ingestion, crawlers, and connectors
--
Disclaimer: This is my own work and does not represent the viewpoint of my present employer or any past employer.
This chapter covers how to get documents into Watson Discovery.
Ingestion
Ingestion is the process of taking documents into the system in order to build a searchable collection.
When you create your collection, you select the data source and method used to populate the collection. Once the collection has been created, the method and data source cannot be changed.
This means that each collection is tied to exactly one data source for its whole lifetime.
This screenshot shows the data sources currently available for Plus-type accounts.
There are several ways to ingest documents into your collection:
- Uploading documents using the web-based Tooling (User Interface).
- Creating and running one of the built-in crawlers or connectors.
- Developing and running a custom crawler using the Discovery API (requires coding).
- Reusing data from an existing collection.
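To give a feel for the custom-crawler option, here is a minimal sketch in Python. It only shows the crawling half: walking a local folder and handing each matching file to an upload function. The names `find_documents`, `crawl`, and `DEFAULT_EXTENSIONS` are my own, and the `upload` callable is a placeholder for whatever Discovery API call or SDK method you use to add a document; none of them come from the Discovery documentation.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical default; adjust to the file types your plan actually supports.
DEFAULT_EXTENSIONS = (".pdf", ".docx", ".html", ".json")


def find_documents(root: str, extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> List[Path]:
    """Return every file under `root` whose suffix is in `extensions`, sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


def crawl(
    root: str,
    upload: Callable[[Path], None],
    extensions: Iterable[str] = DEFAULT_EXTENSIONS,
) -> List[Path]:
    """Call `upload` once per matching file and return the paths that failed.

    `upload` is an assumption: plug in your own function that sends one
    file to your collection through the Discovery API.
    """
    failed: List[Path] = []
    for path in find_documents(root, extensions):
        try:
            upload(path)
        except Exception:
            # Keep going so one bad file does not stop the whole crawl.
            failed.append(path)
    return failed
```

Calling `crawl("./docs", my_upload)` would try every matching file under `./docs` and return the ones that could not be uploaded, so they can be retried later. Keeping the upload step as a plain function argument makes the crawler easy to test without touching a real Discovery instance.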
Let’s look at each of these in more detail.
Uploading Documents
Uploading is the best way to get documents from your laptop or workstation into Discovery on the cloud.
If you choose to create a collection from uploaded documents, you will see a form for naming the collection, setting its language, and so on. This form is similar for every collection you create.
If you check the box marked Apply FAQ extraction, Discovery will scan the document…