Indexing data from Experience Edge in Coveo with the new GraphQL Connector

Árvai Mihály
11 min read · Aug 2, 2023

Today I’m bringing a post about indexing data from Experience Edge in Coveo via the newly available GraphQL connector. This post only demonstrates a very basic setup; it’s a proof of concept for me. The connector is still in beta, so you can expect further improvements. If you are new to Coveo, I highly recommend taking some courses on levelup.coveo.com.

The GraphQL API Source

What is the difference between the Generic REST API source and the new GraphQL Source?

The newly introduced source is a layer on top of the existing Generic REST API connector. That means if you already have experience with the Generic REST API connector, you will find that the configuration is basically the same for both connectors.

But the GraphQL Source solves one problem that the Generic REST API Source cannot. The Generic REST API Source supports paging when you are retrieving data from an API, but only via query string parameters, not via the payload content. A GraphQL endpoint expects the paging parameters in the request body, not in the query string.

Basically, you can use both of the sources for indexing content from Experience Edge, but if your result set is huge and you need paging for retrieving, then the GraphQL Connector will be your friend.

The other issue solved in the GraphQL Source is that the Generic REST API Connector cannot handle errors that are returned with an HTTP 200 status code. If there is an error in the GraphQL query, the error response is still returned with a 200 status code. You can see in the screenshot that this is handled in the new connector:

Error Handling for errors that are wrapped in 200 Code.
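To make the "error inside a 200" case concrete, here is a minimal Python sketch of the kind of check a client has to do. It relies only on the GraphQL convention that failures are listed in a top-level errors array of the response body. The function name and the sample error message are made up for illustration, and this is not how the Coveo connector is implemented internally.

```python
import json


def graphql_errors(status_code: int, body: str) -> list[str]:
    """Return error messages for a GraphQL response, even when HTTP says 200."""
    if status_code != 200:
        return [f"HTTP {status_code}"]
    payload = json.loads(body)
    # GraphQL reports query problems in a top-level "errors" array.
    return [e.get("message", "unknown error") for e in payload.get("errors") or []]


# Hypothetical responses: both would arrive with HTTP status 200.
ok_body = json.dumps({"data": {"pageOne": {"total": 2, "results": []}}})
bad_body = json.dumps({"errors": [{"message": "Cannot query field 'totl'"}]})

print(graphql_errors(200, ok_body))   # []
print(graphql_errors(200, bad_body))  # ["Cannot query field 'totl'"]
```

A client that only looks at the status code would treat both responses as successful, which is the gap described above.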

Why not Coveo For Sitecore module?

A detailed and pretty good article can be found here and it describes why you shouldn’t use Coveo For Sitecore.

In a nutshell: if you are using Experience Edge with Sitecore XM 10.2+ or XM Cloud, you are publishing to Edge and not to the web database. The connector module is deeply integrated with Sitecore XP and depends mainly on web publishing. In the case of XM Cloud, an auto-update can break the integration, so you probably want to keep your customizations minimal there.

Why Coveo and not leveraging Search from Experience Edge?

There is a Solr-based search available in Experience Edge, but it is very limited and it is not designed for content/site search. If you want to provide a good search experience for your visitors, you can use an advanced search product like Coveo or Sitecore Search. Thanks to the composable world, you are free to choose your search solution and plug it into your existing solution.

What are the main integration options for Coveo with XM Cloud/Experience Edge?

There are other ways to index your Sitecore-based content with Coveo in case of a headless architecture.

Sitemap Source

The Coveo Sitemap Connector enables you to index and search the content of websites using sitemaps. It supports indexing dynamic content by periodically crawling the sitemaps to update the indexed content. You can configure various options such as metadata indexing, crawling frequency, relevance rules, and filters to customize the search experience. It allows you to consolidate content from multiple websites or domains into a single search interface, providing a unified search experience across different sources.

The Coveo Sitemap Connector works by parsing and processing the XML sitemaps provided by websites. It extracts the URLs and associated metadata from the sitemaps, then crawls and indexes the corresponding web pages, allowing users to search and retrieve relevant content through the Coveo search platform.

The Sitemap connector has many great features, but the one that I really loved is that it can read additional metadata from the sitemap.xml. The Sitemap Connector utilizes metadata provided in XML files to enrich search results, allowing for contextually relevant information and snippets to be displayed alongside search listings, aiding users in quickly finding the desired content.

Example:

<url>
  <loc>http://example.com/about/</loc>
  <lastmod>2015-02-10T13:47:23+00:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.00</priority>
  <coveo:metadata>
    <casenumber>18467</casenumber>
    <companyname>
      <![CDATA[
      Company XYZ Inc. <USA>
      ]]>
    </companyname>
  </coveo:metadata>
</url>
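As a rough illustration of what reading that metadata involves, here is a small Python sketch that parses a sitemap and collects the children of coveo:metadata per URL. The urlset wrapper, the coveo namespace URI, and the function name are my assumptions for the sake of a runnable example (check the Coveo documentation for the exact namespace declaration), and this is not the connector's actual code.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
COVEO_NS = "http://www.coveo.com/schemas/metadata"  # assumption, verify in the Coveo docs

SITEMAP = f"""<urlset xmlns="{SITEMAP_NS}" xmlns:coveo="{COVEO_NS}">
  <url>
    <loc>http://example.com/about/</loc>
    <coveo:metadata>
      <casenumber>18467</casenumber>
      <companyname><![CDATA[ Company XYZ Inc. <USA> ]]></companyname>
    </coveo:metadata>
  </url>
</urlset>"""


def read_metadata(sitemap_xml: str) -> dict[str, dict[str, str]]:
    """Map each <loc> to the name/value pairs found under <coveo:metadata>."""
    root = ET.fromstring(sitemap_xml)
    result: dict[str, dict[str, str]] = {}
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        fields: dict[str, str] = {}
        meta = url.find(f"{{{COVEO_NS}}}metadata")
        if meta is not None:
            for child in meta:
                # Drop the "{namespace}" prefix ElementTree puts on tag names.
                name = child.tag.rsplit("}", 1)[-1]
                fields[name] = (child.text or "").strip()
        result[loc] = fields
    return result


print(read_metadata(SITEMAP))
```

The CDATA section lets the company name carry characters such as < and > without breaking the XML.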

Generic REST API Source or GraphQL Source

The REST API/GraphQL sources in Coveo enable integration with external systems and data sources through web services. It allows developers to retrieve and index data from REST/GraphQL APIs, making it searchable and accessible within the Coveo unified search platform. With both of the sources, you can easily connect Coveo with various applications, databases, or services, providing a unified search experience across multiple data sources in your organization. It offers flexibility and extensibility for developers to leverage the power of REST/GraphQL APIs and bring valuable data into Coveo’s intelligent search capabilities.

Push API

The Push API can still be an option, and the Coveo For Sitecore connector uses it too. However, it is probably the most complex option from an integration point of view.

How to configure the GraphQL Source with Experience Edge?

Preparing GraphQL Query

I think most cases can be handled by the Sitemap connector from Coveo’s point of view, but there might be scenarios that would be problematic with the Sitemap source, e.g. some data does not have pages, or the pages are not public.
In my example, I use some simple data that I want to index. They are simple blog pages with the following fields:

  • Author — Droptree, points to the Author template that has some other fields like a headshot, name, and Twitter handle of the author
  • Headline — Single Line Text
  • Description — Multi-Line Text
  • Content — RTE
  • Cover Image — Image
  • Tags — Treelist, points to a set of predefined tags that categorizes the content.
  • Publication Date — Date field

I prepared some demo content in my instance and prepared the following GraphQL query that I can run against the Experience Edge endpoint. This query will be configured in the GraphQL API source. (I’m not using variables in the GraphQL query because the token replacement happens directly in the query text. Coveo replaces the @pageSize and @offset tokens.)

query indexQuery {
  pageOne: search(
    where: {
      AND: [
        {
          name: "_path"
          value: "6F2C7093-1883-40C2-97A8-FB51949E4C90"
          operator: CONTAINS
        }
        {
          name: "_templates"
          value: "28101057-BA7F-4B54-93C7-8666C901A6E7"
          operator: CONTAINS
        }
      ]
    }
    first: @pageSize
    after: @offset
  ) {
    total
    pageInfo {
      endCursor
      hasNext
    }
    results {
      id
      author: field(name: "Author") {
        jsonValue
      }
      date: field(name: "Publication Date") {
        jsonValue
      }
      headline: field(name: "Headline") {
        jsonValue
      }
      description: field(name: "Description") {
        jsonValue
      }
      coverImage: field(name: "Cover Image") {
        jsonValue
      }
      tags: field(name: "Tags") {
        jsonValue
      }
    }
  }
}

There is no big magic here: the query retrieves my blog items based on a specific path and template, and returns the id and the fields that I want to index. The query is prepared for paging through the results, so it can leverage the major benefit of the GraphQL source.
The result would look like this:

Let’s inspect a single document in the JSON and note some JSON paths for the configuration later.

{
"id": "B0674BB2E1814D8B8EE385B97355779A",
"author": {
"jsonValue": {
"id": "61c30686-c865-4380-888c-a3d8d6e7153c",
"url": "/Data/Authors/Adam",
"name": "Adam",
"displayName": "Adam",
"fields": {
"Twitter": {
"value": "@Adam"
},
"Photo": {
"value": {
"src": "https://edge.sitecorecloud.io/1qvxtyvbd0ct0am9nvvjbg-pju3ufg2ckkh64il7q5uw/media/Project/sxademo/headshots.png?h=399&iar=0&w=399",
"alt": "",
"width": "399",
"height": "399"
}
},
"Name": {
"value": "Adam"
}
}
}
},
"date": {
"jsonValue": {
"value": "2023-05-16T00:00:00Z"
}
},
"headline": {
"jsonValue": {
"value": "Some Sitecore Blogpost"
}
},
"description": {
"jsonValue": {
"value": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets cont"
}
},
"coverImage": {
"jsonValue": {
"value": {
"src": "https://edge.sitecorecloud.io/1qvxtyvbd0ct0am9nvvjbg-pju3ufg2ckkh64il7q5uw/media/Default-Website/sc_logo.png?h=51&iar=0&w=204",
"alt": "",
"width": "204",
"height": "51"
}
}
},
"tags": {
"jsonValue": [
{
"id": "3a4e74eb-40ee-42a1-9703-b698a1649a4c",
"url": "/Data/Tags/Coveo",
"name": "Coveo",
"displayName": "Coveo",
"fields": {
"Name": {
"value": "Coveo"
}
}
},
{
"id": "4d9acf7f-39d9-4d39-b1bb-a8fa373a421e",
"url": "/Data/Tags/Sitecore",
"name": "Sitecore",
"displayName": "Sitecore",
"fields": {
"Name": {
"value": "Sitecore"
}
}
}
]
}
}
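To double-check the JSON paths noted from this document, a small resolver can help. The sketch below is my own approximation and not Coveo's implementation: it follows dotted paths, treats a [*] suffix as "every element of the array", and joins multiple values with semicolons, which is how the post later describes multi-value fields being indexed. DOC is a trimmed copy of the document above, and resolve is a hypothetical helper name.

```python
from typing import Any

# Trimmed copy of the sample document, keeping only a few fields.
DOC = {
    "author": {"jsonValue": {"fields": {"Twitter": {"value": "@Adam"}}}},
    "headline": {"jsonValue": {"value": "Some Sitecore Blogpost"}},
    "tags": {"jsonValue": [
        {"fields": {"Name": {"value": "Coveo"}}},
        {"fields": {"Name": {"value": "Sitecore"}}},
    ]},
}


def resolve(doc: Any, path: str) -> str:
    """Follow a dotted path; a `[*]` suffix fans out over every list element."""
    current = [doc]
    for part in path.split("."):
        fan_out = part.endswith("[*]")
        key = part[:-3] if fan_out else part
        following = []
        for node in current:
            value = node[key]  # case-sensitive lookup, KeyError if the key is absent
            following.extend(value if fan_out else [value])
        current = following
    return ";".join(str(v) for v in current)


print(resolve(DOC, "headline.jsonValue.value"))             # Some Sitecore Blogpost
print(resolve(DOC, "tags.jsonValue[*].fields.Name.value"))  # Coveo;Sitecore
```

In this sketch the lookup is case-sensitive, so a path written with fields.name would not match the Name key in the response.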

Now we only need to prepare the JSON payload for the Edge endpoint. It will look something like this (you will POST this JSON body to the Edge endpoint https://edge.sitecorecloud.io/api/graphql/v1):

{
  "operationName": "indexQuery",
  "variables": {},
  "query": "<your graphql query in JSON string>"
}
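The request body and the cursor paging can be sketched together in Python. This is only an approximation of what the source does on our behalf: how Coveo substitutes the @offset token (quoted or not, and what it sends for the first page) is an assumption on my side, the query is shortened, and fake_post is a stand-in for the real HTTP POST so the example runs without an Edge tenant.

```python
import json
from typing import Callable, Iterator

QUERY_TEMPLATE = """query indexQuery {
  pageOne: search(
    where: { name: "_path", value: "6F2C7093-1883-40C2-97A8-FB51949E4C90", operator: CONTAINS }
    first: @pageSize
    after: @offset
  ) {
    total
    pageInfo { endCursor hasNext }
    results { id }
  }
}"""


def build_body(page_size: int, cursor: str) -> str:
    """Replace the paging tokens and wrap the query in the JSON payload."""
    query = (QUERY_TEMPLATE
             .replace("@pageSize", str(page_size))
             .replace("@offset", json.dumps(cursor)))  # assumption: quoted cursor
    return json.dumps({"operationName": "indexQuery", "variables": {}, "query": query})


def iter_results(post: Callable[[str], dict], page_size: int = 10) -> Iterator[dict]:
    """Yield items page by page until the endpoint reports no next page."""
    cursor = ""  # assumption: empty cursor requests the first page
    while True:
        page = post(build_body(page_size, cursor))["data"]["pageOne"]
        yield from page["results"]
        if not page["pageInfo"]["hasNext"]:
            break
        cursor = page["pageInfo"]["endCursor"]


def fake_post(body: str) -> dict:
    """Stand-in for the HTTP POST to Edge: serves two items, one per page."""
    query = json.loads(body)["query"]
    items = [{"id": "A"}, {"id": "B"}]
    start = 1 if 'after: "1"' in query else 0
    return {"data": {"pageOne": {
        "total": len(items),
        "pageInfo": {"endCursor": str(start + 1), "hasNext": start + 1 < len(items)},
        "results": items[start:start + 1],
    }}}


print([item["id"] for item in iter_results(fake_post, page_size=1)])  # ['A', 'B']
```

The point to notice is that both paging values travel inside the query text in the request body, not in the query string.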

Preparing Coveo — Adding fields

Before we start configuring our source, let’s add a couple of fields to Coveo, like Tags (Multi-Facet value), Headshot Image, Author etc.

Here is an example for the Tags field. It is a Treelist field in Sitecore, and we want to store multiple values for each document (if more than one is selected) and use it as a facet later on our search page, so “Multi Value Facet” should be checked for this field:

Adding Tags fields to Coveo

Adding GraphQL Source

Before you start, you can check what the configuration JSON looks like for the GraphQL Connector in the official documentation.

{
"Services": [
{
"Url": "https://api.github.com/",
"authentication": {
"username": "@username",
"password": "@password",
"forceBasicAuthentication": "true"
},
"Endpoints": [
{
"paging": {
"pageSize": 10,
"offsetType": "cursor",
"nextPageKey": "data.user.pullRequests.pageInfo.endCursor"
},
"headers": {
"accept": "application/vnd.github.v3+json",
"User-Agent": "PostmanRuntime/7.29.0"
},
"Path": "graphql",
"Method": "POST",
"ItemPath": "data.user.pullRequests.edges",
"SkippableErrorCodes": "404",
"ItemType": "PullRequests",
"Uri": "%[node.url]",
"ClickableUri": "%[node.url]",
"Title": "%[node.title]",
"ModifiedDate": "%[node.createdAt]",
"PayloadJsonContent": "{\"query\":\"query {\\r\\n user(login:\\\"JohnSmith\\\") {\\r\\n pullRequests(first:@pageSize, after:@offset) {\\r\\n totalCount\\r\\n edges {\\r\\n node {\\r\\n createdAt\\r\\n title\\r\\n url\\r\\n }\\r\\n cursor\\r\\n }\\r\\n pageInfo {\\r\\n endCursor\\r\\n hasNextPage\\r\\n }\\r\\n }\\r\\n }\\r\\n}\"}"
}
]
}
]
}

You can click on the “Add Sources” button in the Sources view and select GraphQL API.

In the next dialog, you need to fill in the Source Name field, copy the Experience Edge API key into the API Key field, and prepare the JSON configuration (described later).

Adding new source

Preparing the JSON Configuration

The JSON below is quite straightforward. The Services array contains the necessary info about what should be indexed and how. The Url property is the root URL of the Experience Edge API, and the Endpoints array lists the endpoints that should be called to fetch data. ItemPath is a JSON path; it specifies that the results array and its content will be indexed from the endpoint. The endpoint entry also configures the HTTP method and reads some data, e.g. the headline.jsonValue.value string will be indexed as the Title. The Uri and ClickableUri fields are mandatory for the GraphQL API source, and you should have unique values at least in the Uri field. ClickableUri can be anything, e.g. the corresponding landing page on your site.

We will also index the modified date and time from the date.jsonValue.value path.

You have to pass the @ApiKey token in the Headers section, otherwise Coveo cannot query from Experience Edge.

The paging section tells Coveo, via JSON paths, where the total count and the next page key (the cursor) are located in the response, so it is able to page through the results.

The Metadata object is the place where we can define JSON paths to retrieve values from the API response. tags.jsonValue[*].fields.Name.value will index every element of the Tags array from the JSON (the Sitecore Treelist field) as a semicolon-separated value (e.g. if the response contains [“XM Cloud”, “Sitecore XP”], it will be indexed as “XM Cloud;Sitecore XP”). We already configured the Tags field as a multi-value facet.

The PayloadJsonContent is a bit painful right now since it's a JSON String with the Experience Edge Payload, and if you remember, that JSON already contains an escaped JSON String. Hopefully, this is something that will be improved soon and you will be able to specify the GraphQL query in a separate field on the source. Do not forget, this connector is only in beta so you can expect further improvements.

{
"Services": [
{
"Url": "https://edge.sitecorecloud.io/api/graphql/v1",
"Headers": {
"sc_apikey": "@ApiKey"
},
"paging": {
"pageSize": 1,
"offsetType": "cursor",
"totalCountKey": "data.pageOne.total",
"nextPageKey": "data.pageOne.pageInfo.endCursor"
},
"Endpoints": [
{
"Path": "/",
"ItemPath": "data.pageOne.results",
"Method": "POST",
"ItemType": "Post",
"PayloadJsonContent": "{\"operationName\":\"indexQuery\",\"query\":\"query indexQuery {\\n pageOne: search(where: {AND: [{name: \\\"_path\\\", value: \\\"6F2C7093-1883-40C2-97A8-FB51949E4C90\\\", operator: CONTAINS}, {name: \\\"_templates\\\", value: \\\"28101057-BA7F-4B54-93C7-8666C901A6E7\\\", operator: CONTAINS}]}, first: @pageSize, after: @offset) {\\n total\\n pageInfo {\\n endCursor\\n hasNext\\n }\\n results {\\n id \\n author: field(name: \\\"Author\\\") {\\n jsonValue\\n }\\n date: field(name: \\\"Publication Date\\\") {\\n jsonValue\\n }\\n headline: field(name: \\\"Headline\\\") {\\n jsonValue\\n }\\n description: field(name: \\\"Description\\\") {\\n jsonValue\\n }\\n coverImage: field(name: \\\"Cover Image\\\") {\\n jsonValue\\n }\\n tags: field(name: \\\"Tags\\\") {\\n jsonValue\\n }\\n }\\n }\\n}\\n\"}",
"Title": "%[headline.jsonValue.value]",
"Uri": "%[coveo_url]/products/%[id]",
"ModifiedDate": "%[date.jsonValue.value]",
"ClickableUri": "%[coveo_url]/products/%[id]",
"Metadata": {
"id": "%[id]",
"author_headshot": "%[author.jsonValue.fields.Photo.value.src]",
"author_name": "%[author.jsonValue.fields.Name.value]",
"author_twitter": "%[author.jsonValue.fields.Twitter.value]",
"publication_date": "%[date.jsonValue.value]",
"headline": "%[headline.jsonValue.value]",
"description": "%[description.jsonValue.value]",
"cover_image": "%[coverImage.jsonValue.value.src]",
"tags": "%[tags.jsonValue[*].fields.Name.value]"
}
}
]
}
]
}
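Until the query can be specified in a separate field, one way to avoid hand-escaping PayloadJsonContent is to generate it. The Python sketch below keeps the GraphQL query as plain text and lets json.dumps do both levels of escaping. The query is shortened and the helper name is hypothetical. The property names are the ones used in the configuration above.

```python
import json

# Shortened, readable version of the query; tokens are left for Coveo to replace.
QUERY = """query indexQuery {
  pageOne: search(first: @pageSize, after: @offset) {
    pageInfo { endCursor hasNext }
    results { id headline: field(name: "Headline") { jsonValue } }
  }
}"""


def payload_json_content(query: str, operation_name: str) -> str:
    """First level of encoding: the Edge payload as a JSON string."""
    return json.dumps({"operationName": operation_name, "query": query})


endpoint = {
    "Path": "/",
    "Method": "POST",
    "ItemPath": "data.pageOne.results",
    "PayloadJsonContent": payload_json_content(QUERY, "indexQuery"),
}

# Second level of encoding happens when the whole configuration is serialized.
config_text = json.dumps(endpoint, indent=2)
print(config_text)

# Decoding twice gives back the original, unescaped query.
round_trip = json.loads(json.loads(config_text)["PayloadJsonContent"])["query"]
print(round_trip == QUERY)  # True
```

The printed configuration contains the same kind of triple-backslash quotes as the hand-written version, but the query itself stays editable in one readable place.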

At this point, you can save the source, but we still need to define mappings in order to map the indexed metadata into Coveo fields.

(Manage mappings in Coveo documentation).

First, open the Source view, select our GraphQL source, and the Manage Mappings menu item. (see screenshot)

In the popup, click on the Add button, select Mapping, and add the mappings for the previously created new fields (author, headshot, tags).

In Field, select the Coveo field that we created in the previous step, and write an expression in the Rules field. %[tags] will map the tags metadata field that is configured in the source configuration. Repeat these steps for every new field and, finally, rebuild the source.

After the successful rebuild, let’s open the content browser, and verify if the data is indexed for the items.

Tags field in Coveo
Author fields

You can see that the author fields and tags field were indexed in Coveo.
Finally, I created a Coveo Hosted Search Page to quickly verify once more that everything was indexed correctly. Here is the final result:

Basic hosted search page to verify mappings and facets

I added a very simple result list and used our newly added fields as filters.

But of course, you can build your own search interface from scratch by using the Coveo Atomic library.

I hope you enjoyed this post. Coveo is a great product and perfectly fits in the composable DXP World.
