Data Ingestion With Neo4j: Leveraging CSV and JSON

Published in

Aarth Software

10 min readMay 23, 2023

In today’s data-driven world, efficient data ingestion is crucial for organizations to unlock valuable insights from their datasets.

Neo4j is a highly popular graph database that allows for efficient management and analysis of connected data.

Unlike traditional relational databases, Neo4j’s data model is based on nodes and relationships, providing a flexible and intuitive way to represent complex relationships and hierarchies.

Data ingestion is a critical step in utilizing Neo4j’s capabilities effectively. It involves importing data from various sources, such as CSV or JSON, into the graph database.

By ingesting data into Neo4j, organizations can unlock valuable insights, perform advanced graph-based queries, and leverage powerful graph algorithms.

This blog post will delve into the process of data ingestion with Neo4j, focusing on the utilization of CSV and JSON formats.

In this model, data is represented as nodes, which represent entities, and relationships, which depict connections between nodes.

Nodes can have properties that provide additional information, while relationships can have properties to describe their nature.

This model enables the representation of complex relationships and hierarchies in a more intuitive and flexible manner.

By comprehending the graph data model, users can design schemas that accurately reflect their domain, ensuring efficient data ingestion and powerful querying capabilities in Neo4j.

Dataset: E-commerce Fraud Dataset

For this data ingestion example, we will be using the E-commerce Fraud Dataset, which can be obtained from the Kaggle link: E-commerce Fraud Dataset.

This dataset contains information related to e-commerce transactions and includes attributes such as user IDs, transaction amounts, product details, and fraud indicators.

By ingesting this dataset into Neo4j, we can analyze the relationships between users, transactions, and products, and gain insights into fraudulent activities in the e-commerce domain.

Graph Model With Arrow.App

Using Arrow.app, I designed a graph model prior to ingesting data into Neo4j Browser.

Arrow.app provided an intuitive and efficient interface for drawing nodes, relationships, and properties, aligning with the Neo4j property graph model.

The tool allowed me to customize the styling, including sizes, layouts, and colors, to create a visually appealing representation of the graph.

After designing the model, I exported it as an Cypher Query and used it for documentation and presentations.

The ability to generate corresponding Cypher code from the diagram further enhanced the seamless integration with the Neo4j ecosystem, facilitating the data ingestion process in Neo4j Browser.

CREATE (:Customer {customer_email: "", customer_phone: "", customer_device: "", customerip_address: "", customerbilling_address: ""})-[:MADE_TRANSACTION]->(:Order {order_id: "", paymentmethod_id: "", paymentmethod_registrationfailure: "", paymentmethod_type: "", paymentmethod_provider: "", transaction_amount: "", transaction_failed: "", order_state: "", transaction_id: "", customer_email: ""})

Ingesting CSV Data into Neo4j

a. Preparing the CSV Data:

Before ingesting CSV data into Neo4j, it is important to ensure that the data is properly prepared. Follow these steps to prepare the CSV data:

Ensure the data is structured: Make sure the CSV file has a well-defined structure, with each column representing a specific attribute or property.
Clean and validate the data: Clean the data by removing any unnecessary or inconsistent values, duplicates, or errors. Validate the data to ensure it conforms to the expected format and quality.
Handle missing values: Decide how to handle missing values in the CSV. You can either leave them as null values or choose a specific representation that suits your data model.
Normalize or denormalize the data: Depending on the complexity of your data and the desired data model in Neo4j, you may need to normalize or denormalize the CSV data to establish relationships between entities.

b. Defining a Schema in Neo4j:

To effectively ingest CSV data into Neo4j, it is essential to define a schema that represents the structure of the data. Here’s how you can define a schema in Neo4j:

Identify the node labels: Analyze the CSV data and identify the main entities or concepts. Each entity will typically correspond to a node label in Neo4j.
Determine the node properties: For each node label, identify the relevant attributes or properties that describe the entity. These properties will become the node properties in Neo4j.
Establish relationships: Identify the relationships between nodes based on the CSV data. Determine the relationship types and any additional properties associated with the relationships.
Create indexes and constraints: Define indexes on the properties that are frequently used for querying to optimize performance. Apply constraints to ensure data integrity and consistency.
By defining a clear schema in Neo4j, you provide a structure to the ingested CSV data, enabling efficient querying and analysis of the graph database.

c. Importing CSV Data into Neo4j

Creating Constraints:

CREATE CONSTRAINT FOR (c:Customer) REQUIRE (c.customer_email,c.customer_device) IS UNIQUE;
CREATE CONSTRAINT FOR (o:Order) REQUIRE (o.order_id, o.transaction_id) IS UNIQUE;

Note: Constraints in Neo4j ensure data integrity, consistency, and performance by enforcing rules, preventing duplicates, and maintaining valid relationships within the graph database.

Ingesting Data Into Customers Node:

LOAD CSV WITH HEADERS FROM 'file:///Customer_DF.csv' AS row
WITH row
MERGE(c:Customer{customer_email:row.customerEmail,customer_device: row.customerDevice})
SET c.customer_phone= row.customerPhone, c.customerip_address= row.customerIPAddress, c.customerbilling_address= row.customerBillingAddress
RETURN row

Ingesting Data Into Orders Node:

LOAD CSV WITH HEADERS FROM 'file:///cust_transaction_details.csv' AS row WITH row
MERGE(o:Order{order_id:row.orderId, transaction_id: row.transactionId})
SET o.payment_methodid= row.paymentMethodId, o.paymentmethodregistration_failure= CASE row.paymentMethodRegistrationFailure WHEN '0' THEN 'FALSE' WHEN '1' THEN 'TRUE' END, o.payment_methodtype= row.paymentMethodType, o.paymentmethod_provider= row.payment_methodprovider, o.transaction_amount= toInteger(row.transactionAmount), o.transaction_failed= CASE row.transactionFailed WHEN '0' THEN 'FALSE' WHEN '1' THEN 'TRUE' END, o.order_state= row.orderState
RETURN row;

Ingesting Data to Create Relationships Between Nodes:

LOAD CSV WITH HEADERS FROM 'file:///cust_transaction_details.csv' AS row
WITH row
MATCH(c:Customer{customer_email:row.customerEmail})
MATCH(o:Order{order_id:row.orderId, transaction_id: row.transactionId})
MERGE (c)-[r:MADE_TRANSACTION]->(o)
RETURN row;

You can check all the nodes and relationships (with ingested data), when you click on a Database Icon which is on the left side of the screen.

For Graph Visualization:

CALL db.schema.visualization()

Ingesting JSON Data into Neo4j

a. Understanding JSON Data Structure:

Before ingesting JSON data into Neo4j, it’s important to understand the structure of the JSON data. JSON data consists of key-value pairs, arrays, and nested objects.

Key aspects to understand in JSON data structure are:

Objects: JSON objects encapsulate data using key-value pairs, where keys are strings and values can be strings, numbers, arrays, or nested objects.
Arrays: JSON arrays contain an ordered list of values, which can be strings, numbers, objects, or arrays themselves.
Nesting: JSON data often involves nesting, where objects or arrays are nested within one another to represent complex relationships.

b. Transforming JSON Data to Graph Model:

To transform JSON data into a graph model for ingestion into Neo4j, follow these steps:

Analyze the JSON data: Understand the relationships and hierarchies present in the JSON data. Identify the main entities, attributes, and their relationships to define the graph model.
Map JSON elements to nodes: Each distinct entity or concept in the JSON data should be mapped to a node in the graph model. Identify the key attributes of each entity to become the node properties.
Establish relationships: Identify the relationships between nodes based on the JSON data structure. Determine the relationship types and any additional properties associated with the relationships.
Handle nested structures: If the JSON data contains nested objects or arrays, determine how to represent them in the graph model. Consider creating additional nodes or connecting existing nodes through relationships to capture the nested relationships.
Transform JSON attributes to properties: For each node, map the relevant attributes from the JSON data to node properties in Neo4j. This step involves extracting the corresponding values and ensuring they are correctly formatted for the desired data types.

By transforming JSON data into a graph model that accurately represents the relationships and entities, you can effectively ingest the data into Neo4j and leverage its powerful graph database capabilities.

c. Importing CSV Data into JSON Using Pycharm

Step 1: Converting the file CSV to JSON

Step 2: Now you can download the file

Step 3: Using Pycharm to Load JSON With Python Code

Note: Make sure that your particular neo4j instance should be active to load the data.

To run the code go to terminal
Pip install neo4j
Pip install fastapi
Pip install dotenv

This code defines a Cypher query that creates a new Customer node in the Neo4j database for each customer object in the request.

The UNWIND statement is used to iterate over the request parameter and create a new node for each customer.

The SET statement is used to set the properties of the new Customer node to the values of the corresponding customer object.

Finally, the session.run method is used to execute the Cypher query, passing the request parameter as a parameter to the query.

@app.post("/Customer/")
async def create_customer(request: List[Dict[str, Any]]):
    load_dotenv()
    uri = os.getenv("uri")
    user = os.getenv("user")
    pwd = os.getenv("pwd")
    driver = GraphDatabase.driver(uri=uri, auth=(user, pwd))
    session = driver.session()
    print(request)
    q = """
    UNWIND $customers as row
    create (a:Customer)
    SET a += row;
    """
    session.run(q, customers=request)
    print(q)
 return {"status": "ok"}

This code defines a Cypher query that creates a new Order node in the Neo4j database for each order object in the request.

The UNWIND statement is used to iterate over the request parameter and create a new node for each order.

The SET statement is used to set the properties of the new Order node to the values of the corresponding order object.

Finally, the session.run method is used to execute the Cypher query, passing the request parameter as a parameter to the query.

@app.post("/Order/")
async def create_order(request: List[Dict[str, Any]]):
    load_dotenv()
    uri = os.getenv("uri")
    user = os.getenv("user")
    pwd = os.getenv("pwd")
    driver = GraphDatabase.driver(uri=uri, auth=(user, pwd))
    session = driver.session()
    print(request)
    q = """
    UNWIND $order as row
    create (a:Order)
    SET a += row;
    """
    session.run(q, order=request)
    print(q)
    return {"status": "ok"}

To create a relationship between the customer node and the order node.

@app.post("/Transaction/")
async def create_transaction(request: List[Dict[str, Any]]):
    load_dotenv()
    uri = os.getenv("uri")
    user = os.getenv("user")
    pwd = os.getenv("pwd")
    driver = GraphDatabase.driver(uri=uri, auth=(user, pwd))
    session = driver.session()
    for row in request:
        q = """
            MATCH (c:Customer{customerEmail: $customerEmail})
            MATCH (o:Order{orderId: $orderId, transactionId: $transactionId})
            MERGE (c)-[r:MADE_TRANSACTION]->(o)
            RETURN c,o,r
            """

        session.run(q, customerEmail=row['customerEmail'], orderId=row['orderId'], transactionId=row['transactionId'])

    return {"status": "ok"}

In Pycharm, to check the versions of packages

Credentials for connecting to a neo4j database instance

To set params in Pycharm first we will set them in the Neo4j desktop

:params {customerEmail: 'josephhoward@yahoo.com', customerPhone: '400-108-5415', customerDevice: 'yyeiaxpltf82440jnb3v', customerIPAddress: '8.129.104.40', customerBillingAddress: '5493 Jones Islands\nBrownside, CA 51896', No_Transactions: 2, No_Orders: 2, No_Payments: 1}

Step 4: Powershell to run the JSON code

Powershell commands:

PS C:\Users\rswat> cd ‘.\PycharmProjects\Ecommerce Fraud Data\’

2. poetry shell

3. poetry update

4. uvicorn main:app — reload — host 0.0.0.0 — port 8080

5. localhost:8080/docs

Step 5: FAST API for JSON

Paste this link localhost:8080/docs in your browser to get the below page.

Step 6: Post the data for Customer Details

The JSON file which we downloaded should be posted to the customer in FastApi.
By using the FastaAPI the data is directly loaded to the Neo4j desktop.

To post the order details click on the order try it out and paste it in the blank box.

To load the relationship in FastAPI

Step 7: Now go to Neo4j Desktop and Open the Active Graph Database

Step 8: Visualisation

d. Importing JSON Using APOC

To use APOC queries to import JSON, I have used Neo4j Desktop Version: . Because versions above 5 aren’t supporting it at times.

Once you create a Local DBMS in the Project on Neo4j, click on setting and then add the following and then click on Apply

apoc.import.file.enabled=true
apoc.import.file.use_neo4j_config=true
server.config.strict_validation.enabled=true

Next, Activate the APOC Plugin to that same Local DBMS

Now Ingesting JSON Data into Customer Node with APOC

CALL apoc.load.json('file://Customer_DF.json')
YIELD value
WITH value
MERGE(c:Customer{customer_email:value.customerEmail,customer_device:value.customerDevice})
SET c.customer_phone= value.customerPhone, c.customerip_address= value.customerIPAddress, c.customerbilling_address= value.customerBillingAddress
RETURN c

Now Ingesting JSON Data into Order Node with APOC

CALL apoc.load.json('file://cust_transaction_details.json')
YIELD value
MERGE (o:Order {order_id: value.orderId, transaction_id: value.transactionId})
SET o.payment_methodid = value.paymentMethodId,
    o.paymentmethodregistration_failure = CASE value.paymentMethodRegistrationFailure WHEN '0' THEN FALSE WHEN '1' THEN TRUE END,
    o.payment_methodtype = value.paymentMethodType,
    o.paymentmethod_provider = value.payment_methodprovider,
    o.transaction_amount = toInteger(value.transactionAmount),
    o.transaction_failed = CASE value.transactionFailed WHEN '0' THEN FALSE WHEN '1' THEN TRUE END,
    o.order_state = value.orderState
RETURN o

Now Ingesting JSON Data To Create Relationship Between Customer and Order Nodes with APOC

CALL apoc.load.json('file://cust_transaction_details.json')
YIELD value
WITH value
MATCH(c:Customer{customer_email:value.customerEmail})
MATCH(o:Order{order_id:value.orderId, transaction_id: value.transactionId})
MERGE (c)-[r:MADE_TRANSACTION]->(o)
RETURN c,o