Machine Learning with APOC

Let’s learn how to make Neo4j easily interact with ML providers.

Giuseppe Villani
Neo4j Developer Blog
11 min read · May 1, 2024


Starting from version 5.8, the APOC Extended library makes it easy to integrate Neo4j with leading Machine Learning platforms.

The procedures described in this chapter act as wrappers around cloud-based Machine Learning APIs. These procedures generate embeddings, analyze text, complete text, complete chat conversations, and more.

Note that these procedures leverage the REST APIs provided by each platform, so they do not require SDKs or additional JARs. Unless otherwise mentioned, they use POST as the HTTP method.

To clarify how each procedure works, and to make it easier to customize when needed, the equivalent HTTP request is shown for each one.

Procedure list overview

Currently, APOC provides these sets of procedures:

  • apoc.ml.openai.*
  • apoc.ml.vertexai.*
  • apoc.ml.bedrock.*
  • apoc.ml.sagemaker.*
  • apoc.ml.watson.*

Basically, all procedure sets have (at least) the following 3 procedures.

In addition, each procedure set can have other procedures, as we will see later.

Embedding API

CALL apoc.ml.<type>.embedding(['Some Text'], ...<other params>)
YIELD index, text, embedding

which returns an embedding for each given text.

The Neo4j result is:

+----------------------------------------------------+
| index | text | embedding |
+----------------------------------------------------+
| <embeddingIndex> | <givenText> | <list of float> |
+----------------------------------------------------+

Chat Completion API

CALL apoc.ml.<type>.chat([<messages>], ...<other params>)
YIELD value

which prompts a Chat Completion API and returns a map with the RestAPI response.

Completion API

CALL apoc.ml.<type>.completion('prompt', ...<other params>)
YIELD value

which prompts a Text Completion API and returns a map with the RestAPI response.

Let’s take a look at all the APIs available so far.

OpenAI procedures

This set of procedures allows you to interface with the OpenAI APIs, as well as with any API compatible with them.

To utilize these methods, you should create an OpenAI API key or, alternatively, invoke OpenAI-compatible APIs with their own API key (or even without one), as detailed in the section below titled “OpenAI-compatible provider”.
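
As an alternative to passing the key on every call, it can also be stored in apoc.conf. A minimal sketch, assuming the apoc.openai.key setting supported by APOC Extended (check the configuration reference for your version):

apoc.openai.key=<apiKey>

With the key configured this way, the apiKey parameter of the procedures can simply be passed as null.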

Let’s first see how to use them with OpenAI. For reference, see the Embedding, Chat completion, and Text completion documentation pages.

All of them will have the default headers:

Content-Type: application/json
Authorization: Bearer <accessToken>

Embeddings API

CALL apoc.ml.openai.embedding(['Some Text'], 
'<apiKey>',
{<$optionalConfigMap>})


/*
Equivalent to a RestAPI request with:

Endpoint:
https://api.openai.com/v1/embeddings

Body:
{"inputs": <firstParameter>, "model": "text-embedding-ada-002"}

*/
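
For instance, a quick sanity check is to return the size of the returned vector, which for text-embedding-ada-002 should be 1536 (the $apiKey parameter below is just a placeholder for your key):

CALL apoc.ml.openai.embedding(['Some Text'], $apiKey, {})
YIELD index, text, embedding
RETURN index, text, size(embedding) AS dimensions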

Text Completion API

CALL apoc.ml.openai.completion('What color is the sky? Answer in one word: ',
'<apiKey>',
{<$optionalConfigMap>})

/*
Equivalent to a RestAPI request with:

Endpoint:
https://api.openai.com/v1/completions

Body:
{"prompt": <firstProcedureParameter>, "model": "gpt-3.5-turbo-instruct"}
*/
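
Since value holds the raw OpenAI response, the generated text can be extracted from its choices list. A minimal sketch, assuming the standard Completions response shape:

CALL apoc.ml.openai.completion('What color is the sky? Answer in one word: ',
$apiKey, {})
YIELD value
RETURN value.choices[0].text AS answer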

Chat Completion API

CALL apoc.ml.openai.chat([
{role:"system", content:"Only answer with a single word"},
{role:"user", content:"What planet do humans live on?"}
],
'<apiKey>',
{<$optionalConfigMap>})

/*
Equivalent to a RestAPI request with:

Endpoint:
https://api.openai.com/v1/chat/completions

Body:
{"messages": <firstParameter>, "model": "gpt-3.5-turbo"}
*/
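
Here as well, value is the raw response map, so the assistant reply can be read from choices[0].message.content. For example:

CALL apoc.ml.openai.chat([
{role:"system", content:"Only answer with a single word"},
{role:"user", content:"What planet do humans live on?"}
],
$apiKey, {})
YIELD value
RETURN value.choices[0].message.content AS answer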

OpenAI-compatible provider

Moreover, we can potentially interface with any API that is compatible with the OpenAI APIs.

For example, with the Azure API, we can change the endpoint configuration, together with the apiVersion and apiType ones, needed because authentication works differently on Azure than it does on most other OpenAI providers.

For example, using the embedding procedure:

CALL apoc.ml.openai.embedding(['Some Text'], 
'<apiKey>',
{endpoint: "https://my-resource.openai.azure.com/openai/deployments/my-deployment-id",
apiVersion: '2023-07-01-preview',
apiType: 'AZURE'
}
)


/*
Equivalent to a RestAPI request with:

Endpoint:
https://my-resource.openai.azure.com/openai/deployments/my-deployment-id/embeddings?api-version=2023-07-01-preview

Body:
{"input": <firstParameter>}

Headers:
Content-Type: application/json
api-key: <secondParameter>
*/

We can also use Anyscale Endpoints, by defining the endpoint URL and the model ID:

CALL apoc.ml.openai.embedding(['Some Text'], $anyScaleApiKey,
{
endpoint: 'https://api.endpoints.anyscale.com/v1',
model: 'thenlper/gte-large'
})


/*
Equivalent to a RestAPI request with:

Endpoint:
https://api.endpoints.anyscale.com/v1/embeddings

Body:
{"input": <firstProcedureParameter>, "model": "thenlper/gte-large"}

Headers: like OpenAI
*/

or with Local APIs:

CALL apoc.ml.openai.embedding(['Some Text'], "ignored",
{endpoint: 'http://localhost:8080/v1', model: 'text-embedding-ada-002'})

/*
Equivalent to a RestAPI request with:

Endpoint:
http://localhost:8080/v1/embeddings

Body:
{"input": <firstProcedureParameter>, "model": "text-embedding-ada-002"}

Headers: like OpenAI
*/

We can also use the HuggingFace Inference API by setting the configs apiType: 'HUGGINGFACE' and path: '', which are needed to adapt the URL and the request body, e.g.:

CALL apoc.ml.openai.completion('What color is the sky? Answer in one word: ',
$huggingFaceApiKey,
{
endpoint: 'https://api-inference.huggingface.co/models/gpt2',
apiType: 'HUGGINGFACE',
model: 'gpt2',

// -- so that the `/completions` URL suffix is not appended
path: ''
})



/*
Equivalent to a RestAPI request with:

Endpoint:
https://api-inference.huggingface.co/models/gpt2

Body:
{"inputs": <firstProcedureParameter>, "model":"gpt2",}

Headers: like OpenAI
*/

In addition to the procedures just seen, there are several others that leverage OpenAI’s Chat Completion API.

Query with natural language

For generating and executing Cypher queries:

CALL apoc.ml.query("What movies did Tom Hanks play in?") 
yield value, query
RETURN *

with a result like:

+------------------------------------------------------------------------------------------------------------------------------+
| value | query
+------------------------------------------------------------------------------------------------------------------------------+
| {m.title -> "You've Got Mail"} | "cypher
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person {name: 'Tom Hanks'})
RETURN m.title
"
| {m.title -> "Apollo 13"} | "cypher
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person {name: 'Tom Hanks'})
RETURN m.title"
| {m.title -> "Joe Versus the Volcano"} | "cypher
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person {name: 'Tom Hanks'})
RETURN m.title
"
...
+------------------------------------------------------------------------------------------------------------------------------+

Describe the graph model with natural language

To describe the schema of the database:

CALL apoc.ml.schema() yield value

with a result like:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "The graph database schema represents a system where users can follow other users and review movies.
Users (:Person) can either follow other users (:Person) or review movies (:Movie).
The relationships allow users to express their preferences and opinions about movies.
This schema can be compared to social media platforms where users can follow each other and leave reviews or ratings for movies they have watched.
It can also be related to movie recommendation systems where user preferences and reviews play a crucial role in generating personalized recommendations." |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

Create Cypher queries from a natural language query

To generate N Cypher queries for the user question:

CALL apoc.ml.cypher(
"Who are the actors which also directed a movie?",
{count: 4}
)
yield query
+----------------------------------------------------------------------------------------------------------------+
| query
+----------------------------------------------------------------------------------------------------------------+
| "
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name as actor, d.name as director
"
| "cypher
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(a)
RETURN a.name"
| "
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name"
| "cypher
MATCH (a:Person)-[:ACTED_IN]->(:Movie)<-[:DIRECTED]-(a)
RETURN DISTINCT a.name"
+----------------------------------------------------------------------------------------------------------------+

Create a natural language explanation from a Cypher query

Available since version 5.19

To return an explanation of the Cypher query:

WITH "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m" AS query
CALL apoc.ml.fromCypher(query, {retries: 3}) YIELD value
RETURN value

with a result like:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "The graph database schema represents a system where users can follow other users and review movies.
Users (:Person) can either follow other users (:Person) or review movies (:Movie).
The relationships allow users to express their preferences and opinions about movies.
This schema can be compared to social media platforms where users can follow each other and leave reviews or ratings for movies they have watched.
It can also be related to movie recommendation systems where user preferences and reviews play a crucial role in generating personalized recommendations." |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

See the documentation page for all the possible configurations.

Vertex.AI procedures

We can also easily interface with the Vertex AI API, in a very similar way to the OpenAI ones.

Assuming we created a Google Cloud account and a project with id my-own-project, we can execute the following procedures.

All of them will have the default headers:

Content-Type: application/json
Accept: application/json
Authorization: Bearer <accessToken>

For reference, see the Embedding, Chat completion, and Text completion documentation pages.

Embedding API

CALL apoc.ml.vertexai.embedding(
['Some Text'],
'<accessToken>',
'my-own-project',
{<$optionalConfigMap>}
)


/*
Endpoint:
https://us-central1-aiplatform.googleapis.com/v1/projects/my-own-project/locations/us-central1/publishers/google/models/textembedding-gecko:predict

Body:
{"instances":[{"content":"Some Text"}], "parameters":{}}
*/

Text Completion API

CALL apoc.ml.vertexai.completion(
'What color is the sky? Answer in one word: ',
'<accessToken>',
'<project-id>',
{<$optionalConfigMap>}
)


/*
Endpoint:
https://us-central1-aiplatform.googleapis.com/v1/projects/my-own-project/locations/us-central1/publishers/google/models/text-bison:predict

Body:
{
"instances":[{"prompt":"What color is the sky? Answer in one word: "}],
"parameters":{"temperature":0.3,"topK":40,"maxOutputTokens":256,"topP":0.8}
}
*/

Chat Completion API

CALL apoc.ml.vertexai.chat(
[{author:"user", content:"What planet do timelords live on?"}],
'<accessToken>',
'<project-id>',
{<$optionalConfigMap>}
)


/*
Endpoint:
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison:predict


Body:
{
"instances":[
{
"messages":[
{
"author":"user",
"content":"What planet do timelords live on?"
}
],
"examples":[{"output":{"content":"Earth"},"input":{"content":"What planet do humans live on?"}}],
"context":"Fictional universe of Doctor Who. Only answer with a single word!"
}
],
"parameters":{"temperature":0,"topK":40,"maxOutputTokens":256,"topP":0.8}
}
*/

See the documentation page for more info.

Bedrock procedures

These procedures have been available since version 5.15.

APOC allows us to interface with Amazon Bedrock as well.

Like the other providers, it has 3 procedures for the embedding, chat completion, and text completion APIs.

Note that, unlike Vertex AI and OpenAI, there is no apiKey parameter, as AWS credentials require both an AWS Key ID and a Secret Access Key.

These can be defined, for example, via apoc.conf:

apoc.aws.key.id=<AWS Key ID>
apoc.aws.secret.key=<AWS Secret Access Key>

There are other methods to authenticate, as stated in the documentation.

The default RestAPI header entries are:

Authorization: <AWS4 Auth>
X-Amz-Date: <timestamp>
Host: bedrock-runtime.us-east-1.amazonaws.com
Content-Type: application/json
accept: */*

where the Authorization value is calculated using the AWS Signature Version 4 signing process.

Assuming we have defined the credentials via apoc.conf, we can then invoke the API as follows:

Embedding API

CALL apoc.ml.bedrock.embedding("Test")

/*
Endpoint:
https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke

Body:
{"inputText": <firstProcedureParameter>}
*/

Text Completion API

CALL apoc.ml.bedrock.completion('What color is the sky? Answer in one word: ')


/*
Endpoint:
https://bedrock-runtime.us-east-1.amazonaws.com/model/ai21.j2-ultra-v1/invoke

Body:
<firstProcedureParameter>
*/

Chat Completion API

CALL apoc.ml.bedrock.chat(
[ {prompt: "\n\nHuman: Hello world\n\nAssistant:",max_tokens_to_sample: 200} ]
)

/*
Endpoint:
https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-v2/invoke

Body:
<firstProcedureParameter>
*/

Image API

Additionally, we can call one of the Image APIs available in Bedrock:

CALL apoc.ml.bedrock.image({
text_prompts: [{text: "picture of a bird", weight: 1.0}],
cfg_scale: 5,
seed: 123,
steps: 70,
style_preset: "photographic"
})


/*
-- Result --
+------------------------------------------------------------------------+
| base64Image |
+------------------------------------------------------------------------+
| "iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAIAAAB7GkOtAAABjmVYSWZNTQAqAAAA...." |
+------------------------------------------------------------------------+


-- RestAPI Call --
Endpoint:
https://bedrock-runtime.us-east-1.amazonaws.com/model/stability.stable-diffusion-xl-v0/invoke

Body:
<firstProcedureParameter>
*/

List of models

Moreover, we can retrieve the list of available foundation models:

CALL apoc.ml.bedrock.list()


/*
-- Result --
+--------------------------------------------------------------------------------------------------------+
| modelId | modelArn | modelName | providerName |....|
+--------------------------------------------------------------------------------------------------------+
| "amazon.titan-tg1-large" | "arn:aws:bedrock:us-east-1... | "Titan Text Large" | "Amazon" |....|
+--------------------------------------------------------------------------------------------------------+

-- RestAPI Call (with HTTP method: GET)--
Endpoint:
https://bedrock.us-east-1.amazonaws.com/foundation-models

Body: null
*/
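
If the procedure yields the columns shown above, the list can be filtered directly in Cypher, for example to keep only the Amazon models (a sketch, assuming those yield names):

CALL apoc.ml.bedrock.list()
YIELD modelId, providerName
WHERE providerName = 'Amazon'
RETURN modelId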

Custom API

We can also make a customized RestAPI call, by specifying the endpoint, method, headers, and more:

CALL apoc.ml.bedrock.custom(null, {
endpoint: "https://bedrock.us-east-1.amazonaws.com/logging/modelinvocations",
method: "GET"
})

/*
-- Result --
+-------------------------------------------------+
| value |
+-------------------------------------------------+
| { "loggingConfig": {"cloudWatchConfig": { …​ }}} |
+-------------------------------------------------+

-- RestAPI Call (with HTTP method: GET)--
Endpoint:
https://bedrock.us-east-1.amazonaws.com/logging/modelinvocations

Body: null
*/

Sagemaker procedures

Available since version 5.15

These procedures leverage the Amazon SageMaker API.

These procedures share all the configurations available for Bedrock. In fact, we can similarly authenticate via apoc.conf:

apoc.aws.key.id=<AWS Key ID>
apoc.aws.secret.key=<AWS Secret Access Key>

The default RestAPI header entries are similar to the Bedrock ones:

Authorization: <AWS4 Auth>
X-Amz-Date: <timestamp>
Host: runtime.sagemaker.eu-central-1.amazonaws.com
Content-Type: application/json
accept: */*

Assuming we have defined the credentials via apoc.conf, we can then invoke the API as follows:

Embedding API

CALL apoc.ml.sagemaker.embedding(
["How is the weather today?", "What's the color of an orange?"],
{<configMap>})

/*
Equivalent to a RestAPI request with:

Endpoint:
https://runtime.sagemaker.eu-central-1.amazonaws.com/endpoints/Endpoint-Jina-Embeddings-v2-Base-en-1/invocations

Body:
{"data":[{"text":"How is the weather today?"},{"text":"What's the color of an orange?"}]}
*/

Chat Completion API

CALL apoc.ml.sagemaker.chat([{role: "admin", content: "Test answer"}],
{endpointName: 'Endpoint-Distilbart-xsum-1-1-1', region: 'us-east-1'}
)

/*
Equivalent to a RestAPI request with:

Endpoint:
https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/Endpoint-Distilbart-xsum-1-1-1/invocations

Body:
"Test endpoint"
*/

Text Completion API

CALL apoc.ml.sagemaker.completion('Test answer API',
{endpointName: 'Endpoint-GPT-2-1-0', headers: {`Content-Type`: "application/x-text"}, region: "eu-central-1"}
)

/*
Equivalent to a RestAPI request with:

Endpoint:
https://runtime.sagemaker.eu-central-1.amazonaws.com/endpoints/Endpoint-GPT-2-1-0/invocations

Body:
<firstParameter>
*/

Custom API Call

As in Bedrock, a custom procedure is available in SageMaker:

CALL apoc.ml.sagemaker.custom({SortBy: "Name"},
{
endpointName: "Endpoint-GPT-2-1-0",
headers: {`Content-Type`: "application/x-text"},
region: "eu-central-1"
})

/*
Endpoint:
https://runtime.sagemaker.eu-central-1.amazonaws.com/endpoints/Endpoint-GPT-2-1-0/invocations

Body: {"SortBy": "Name"}

Headers:
Authorization: <AWS4 Auth>
X-Amz-Date: <timestamp>
Host: runtime.sagemaker.eu-central-1.amazonaws.com
Content-Type: application/x-text
accept: */*

*/

See the documentation for more info.

Watson procedures

Available since version 5.16

Finally, let’s look at how to interact with the IBM Watson Foundation Models APIs.

Unlike the others, this set of procedures doesn’t include apoc.ml.watson.embedding, since Watson does not provide built-in embedding APIs so far.

To use these procedures, we need to acquire a Watson bearer access token, which must be defined in the second parameter.

Moreover, we need to put the project ID in the configuration map or in apoc.conf, as follows:

apoc.ml.watson.project.id=<WATSON_PROJECT_ID>

Assuming we use this apoc.conf, we can execute the following procedures:

Text Completion API

CALL apoc.ml.watson.completion(
'What color is the sky? Answer in one word: ',
'<bearerToken>',
{<$optionalConfigMap>}
)

/*
Equivalent to a RestAPI request with:

Endpoint:
https://eu-de.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29

Header:
accept: application/json
Content-Type: application/json
Authorization: Bearer <accessToken>

Body:
{
"input": "What color is the sky? Answer in one word: ",
"model_id": "ibm/granite-13b-chat-v2",
"project_id": "<the one specified in the APOC config>"
}
*/

Chat Completion API

In this case, the endpoint and headers are the same as those of apoc.ml.watson.completion; the difference is that the input sent to the API is a single string of the form: <roleValue1>: <contentValue1> \n\n <roleValue2>: <contentValue2> \n\n ...

That is, for each map in the list passed as the first parameter, the value of the role key (<roleValueN>) and the value of the content key (<contentValueN>) are taken and concatenated.

For example:

CALL apoc.ml.watson.chat([
{role:"system", content:"Only answer with a single word"},
{role:"user", content:"What planet do humans live on?"}
],
'<accessToken>',
{<$optionalConfigMap>})


/*
Endpoint and Headers: like apoc.ml.watson.completion

Body:
{
"input": "system: Only answer with a single word \n\n user: What planet do humans live on?",
"projectId": "<the one explicited in the APOC config>",
"model_id": "ibm/granite-13b-chat-v2"
}
*/
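
As with the other providers, value contains the raw response; with the watsonx.ai generation API, the produced text is typically found under results[0].generated_text (this path is an assumption based on the watsonx.ai response format, so verify it against your actual response):

CALL apoc.ml.watson.chat([
{role:"system", content:"Only answer with a single word"},
{role:"user", content:"What planet do humans live on?"}
],
'<accessToken>',
{})
YIELD value
RETURN value.results[0].generated_text AS answer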

See the documentation for more details.

Use cases and conclusions

We have seen how to easily perform a single-line integration between an LLM and a Knowledge Graph.

Among the many possibilities, we can use apoc.ml together with Neo4j Vector Search indexes, which allow users to query vector embeddings from large datasets.

For example, we can create an index:

CREATE VECTOR INDEX `test-embeddings`
FOR (n:TestLabel)
ON (n.testEmbed)
OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}

then we can use an embedding procedure, like apoc.ml.openai.embedding, to retrieve the embeddings of some text and save the result on nodes with label TestLabel and property testEmbed, using the db.create.setNodeVectorProperty procedure, that is:

// -- create some embeddings starting from these words
// (the null second parameter assumes the API key is configured in apoc.conf)
CALL apoc.ml.openai.embedding(["bottle", "man", "men", "boy", "alien"], null, {})
YIELD embedding, text
WITH embedding, text

// -- we save the original text in the `text` property
CREATE (u:TestLabel {text: text})
WITH u, embedding

// -- we save the embedding result in the `testEmbed` property
CALL db.create.setNodeVectorProperty(u, "testEmbed", embedding)

RETURN count(*)

We can then query the index:

MATCH (n:TestLabel {text: 'man'})
CALL db.index.vector.queryNodes('test-embeddings', 10, n.testEmbed)
YIELD node, score
RETURN node.text AS text, score

with a result similar to the following:

+-------------------------------------------------+
| text | score |
+-------------------------------------------------+
| "man" | 1.0 |
| "men" | 0.9710835814476013 |
| "boy" | 0.9344362020492554 |
| "alien" | 0.9147796630859375 |
| "bottle" | 0.8950000405311584 |
+-------------------------------------------------+

A score of 1 indicates that the word is identical; the lower the score, the more conceptually distant the word is from “man”.
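
The same index can also be queried with an embedding computed on the fly for a word that is not stored in the graph. A sketch (the word “woman” is just an example, and the null API key again assumes the key is configured in apoc.conf):

// -- compute the embedding for a new word
CALL apoc.ml.openai.embedding(["woman"], null, {})
YIELD embedding

// -- and use it to search the vector index
CALL db.index.vector.queryNodes('test-embeddings', 5, embedding)
YIELD node, score
RETURN node.text AS text, score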

Finally, it is worth noting that, given the continuous evolution of the ML world and the relatively recent introduction of the apoc.ml procedures, new procedures and configurations are very likely to appear in subsequent versions, extending and enhancing the currently available APIs.
