Discover AuraDB Free: Week 14 — Consumer Complaints Database

Published in

Neo4j Developer Blog

8 min readNov 24, 2021

In 2015 our data scientist Nicole White ran a Webinar “LOAD CSV in the Real World,” which is really popular. It has more than 56k views and people are still referring to it frequently because it is so well done.

That’s why we decided this time in our live-stream to re-enact Nicole’s webinar with Neo4j 4.3 and AuraDB Free to give people today’s equivalent of that session.

Here is this week’s video:

References

Consumer Complaints Database: https://catalog.data.gov/dataset/consumer-complaint-database
Nicole White’s webinar “LOAD CVS in the real World” and GitHub repository
Our GitHub Repository

Dataset

The consumer complaints database is available with data since 2011 from data.gov, with more than 2.3 million complaints about products from different companies totalling 1.4GB (380MB compressed).

We’re using xsv a really neat and fast CSV processing tool written in Rust for some pre-processing.

xsv frequency -s 'Company' complaints.csv

To fit the data into the LIMITs of AuraDB free, we need to sub-filter to a certain range — the complaints received in November 2021 are 12295, which is a good number for us.

`xsv search` for filtering

xsv search -s 'Date received' '2021-11-' complaints.csv | wc -l
   12295xsv search -s 'Date received' '2021-11-' complaints.csv > complaints-nov-2021.csv

The file is also available here.

To load the data into Neo4j, we need it on a publicly accessible URL.

We can put it into a GitHub Gist or a Google Spreadsheet.

GitHub Gist
Short URL: https://git.io/J1m7e
Google Sheet (File → Publish to Web → Publish Sheet)

Let’s try to load all rows?

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
RETURN count(*);
// 12295

What does the data look like in Cypher?

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
RETURN row LIMIT 1;

When loading the data via Cypher we get each row RETURNed as a map/dict/hash object that we then can use to create our graph.

{
  "Company": "EQUIFAX, INC.",
  "Date received": "2021-11-03",
  "Issue": "Incorrect information on your report",
  "Timely response?": "Yes",
  "Sub-product": "Credit reporting",
  "Consumer complaint narrative": null,
  "ZIP code": "45458",
  "Consumer consent provided?": null,
  "Product": "Credit reporting, credit repair services, or ...",
  "Consumer disputed?": "N/A",
  "Company public response": null,
  "Sub-issue": "Information belongs to someone else",
  "Date sent to company": "2021-11-03",
  "State": "OH",
  "Complaint ID": "4874335",
  "Submitted via": "Web",
  "Tags": null,
  "Company response to consumer": "In progress"
}

Data Model

We closely follow Nicole’s modeling, with the exception of new datatypes for dates. All names are capitalized to ensure uniqueness.

We have the following Nodes in our models that are connected with appropriate relationships.

Complaint - core entity, also holds user information
Company - the company the complaint is against
Product - the product that is complained about
SubProduct - optional subclassification of product
Issue - the categorized issue with the product
SubIssue - optional subclassification of the issue
Response - Response of the company
Tag - there are a few tagged values

Here is our data model, which is very similar to the original.

Data Import

First we create constraints for our key-properties, the complaint-id, and the names for the other entities.

CREATE CONSTRAINT ON (c:Complaint) ASSERT c.id IS UNIQUE;CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Response) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Product) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (c:SubProduct) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Issue) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (c:SubIssue) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Tag) ASSERT c.name IS UNIQUE;

We can run :schema to see if our constraints were correctly created.

Then we import the data step by step, starting with the Complaints. And then we run additional passes to load and connect the other entities.

We need to deal with a few optional elements, like SubIssue, SubProduct, and Tag. Tags also form a comma separated list.

For complaints, we MERGE (get-or-create) each by the key-id and set the date, zip-code, and state as properties. Note the backticked field names if they are not just alphanumeric characters.

Importing complaint

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
MERGE (c:Complaint {id:row.`Complaint ID`})
SET c.dateReceived = date(row.`Date received`)
SET c.zip = row.`ZIP code`
SET c.state = row.State;

For companies, we do our second pass over the data. We find the Complaint by id and MERGE the Company and then MERGE the AGAINST relationship (so only one relationship can exist between each complaint and a company). Then we set the date properties for when the complaint was sent to this company.

Importing company

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (co:Company {name:toUpper(row.Company)})MERGE (c)-[rel:AGAINST]->(co)
SET rel.date = date(row.`Date sent to company`);

Most frequently companies complained against

MATCH (n:Company)<-[:AGAINST]-()
RETURN n.name, count(*) AS c
ORDER BY c DESC LIMIT 10;╒════════════════════════════════════════╤════╕
│"n.name"                                │"c" │
╞════════════════════════════════════════╪════╡
│"EQUIFAX, INC."                         │5858│
├────────────────────────────────────────┼────┤
│"TRANSUNION INTERMEDIATE HOLDINGS, INC."│3583│
├────────────────────────────────────────┼────┤
│"EXPERIAN INFORMATION SOLUTIONS INC."   │707 │
├────────────────────────────────────────┼────┤
│"NAVY FEDERAL CREDIT UNION"             │114 │
├────────────────────────────────────────┼────┤
│"ALLY FINANCIAL INC."                   │96  │
└────────────────────────────────────────┴────┘

For adding Products, we need to first connect the complaint to the product as before, but then filter out rows where the Sub-product cell contains an empty string. For all others we merge and connect the SubProduct both to the Product and the Complaint.

Importing product and sub-product

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (p:Product {name:toUpper(row.Product)})
MERGE (c)-[:ABOUT]->(p)WITH * WHERE trim(row.`Sub-product`) <> ""MERGE (sp:SubProduct {name:toUpper(row.`Sub-product`)})
MERGE (c)-[:ABOUT]->(sp)
MERGE (sp)-[:IN_CATEGORY]->(p);

We add issues and sub-issues the same way.

Importing issue and sub-issue

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (iss:Issue {name:toUpper(row.Issue)})
MERGE (c)-[:WITH]->(iss)WITH * WHERE trim(row.`Sub-issue`) <> ""MERGE (si:SubIssue {name:toUpper(row.`Sub-issue`)})
MERGE (c)-[:WITH]->(si)
MERGE (si)-[:IN_CATEGORY]->(iss);

Now we can have a bit of fun with the data.

Complaint counts by state and issue

MATCH (n:Complaint)-[:WITH]->(iss:Issue)
RETURN n.state, iss.name, count(*) as c
ORDER BY c DESC LIMIT 5;╒═════════╤═══════════════════════════════════════════╤═══╕
│"n.state"│"iss.name"                                 │"c"│
╞═════════╪═══════════════════════════════════════════╪═══╡
│"TX"     │"PROBLEM WITH A CREDIT REPORTING COMPANY'S │847│
│         │INVESTIGATION INTO AN EXISTING PROBLEM"    │   │
├─────────┼───────────────────────────────────────────┼───┤
│"PA"     │"INCORRECT INFORMATION ON YOUR REPORT"     │596│
├─────────┼───────────────────────────────────────────┼───┤
│"FL"     │"INCORRECT INFORMATION ON YOUR REPORT"     │542│
├─────────┼───────────────────────────────────────────┼───┤
│"CA"     │"PROBLEM WITH A CREDIT REPORTING COMPANY'S │455│
│         │INVESTIGATION INTO AN EXISTING PROBLEM"    │   │
├─────────┼───────────────────────────────────────────┼───┤
│"TX"     │"INCORRECT INFORMATION ON YOUR REPORT"     │425│
└─────────┴───────────────────────────────────────────┴───┘

If there is a Response, it is coming from the Company pointing to the Complaint. So we need to match both in our pass and then connect the response to the complaint and to the company. Additionally, we store some properties on the relationship. Yes/No/Blank values are converted to boolean just with a boolean expression (value='text'). For other conditional expressions we can use CASE.

Importing response

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})
MATCH (co:Company {name:toUpper(row.Company)})
WITH * WHERE trim(row.`Company response to consumer`) <> ""MERGE (res:Response {name:toUpper(row.`Company response to consumer`)})MERGE (c)<-[rel:TO]-(res)
SET rel.disputed = (row.`Consumer disputed?` = "Yes")
SET rel.timely = (row.`Timely response?` = "Yes")
SET rel.text = CASE row.`Company public response`
                WHEN "" THEN null
                ELSE row.`Company public response` ENDMERGE (co)-[:ANSWERED]->(res);

For Tags, we need to do two things. First,split the comma-separated name into its constituent parts. Then taking that list, turn it into rows with UNWIND and connect the newly merged Tag nodes to the previously found Complaint.

The WITH * WHERE … is there to filter arbitrary data with a WHERE clause.

Importing tag

LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})WITH * WHERE trim(row.`Tags`) <> ""
WITH distinct row.Tags as tagsName,c
UNWIND split(tagsName,", ") as tagNameMERGE (t:Tag {name:toUpper(tagName)})
MERGE (c)-[:TAGGED]->(t);

Now we can look at our imported graph model with
call db.schema.visualization(); or better call apoc.meta.graph(); to see the schema of the data we have imported. It’s great to see how well it fits our model.

Exploration

The queries here were taken and adjusted from Nicole’s GitHub repository.

Sub-issues with communication tactics

MATCH (i:Issue {name:'COMMUNICATION TACTICS'})
MATCH (sub:SubIssue)-[:IN_CATEGORY]->(i)
RETURN sub.name AS subissue
ORDER BY subissue;╒════════════════════════════════════════════════════════════╕
│"subissue"                                                  │
╞════════════════════════════════════════════════════════════╡
│"CALLED BEFORE 8AM OR AFTER 9PM"                            │
├────────────────────────────────────────────────────────────┤
│"FREQUENT OR REPEATED CALLS"                                │
├────────────────────────────────────────────────────────────┤
│"USED OBSCENE, PROFANE, OR OTHER ABUSIVE LANGUAGE"          │
├────────────────────────────────────────────────────────────┤
│"YOU TOLD THEM TO STOP CONTACTING YOU, BUT THEY KEEP TRYING"│
└────────────────────────────────────────────────────────────┘

Common responses

MATCH (r:Response)-[:TO]->(:Complaint)
RETURN r.name AS response, COUNT(*) AS count
ORDER BY count DESC;╒═════════════════════════════════╤═══════╕
│"response"                       │"count"│
╞═════════════════════════════════╪═══════╡
│"IN PROGRESS"                    │6097   │
├─────────────────────────────────┼───────┤
│"CLOSED WITH EXPLANATION"        │5970   │
├─────────────────────────────────┼───────┤
│"CLOSED WITH NON-MONETARY RELIEF"│192    │
├─────────────────────────────────┼───────┤
│"CLOSED WITH MONETARY RELIEF"    │35     │
└─────────────────────────────────┴───────┘

Sub-issues in multiple different issues

MATCH (sub:SubIssue)-[:IN_CATEGORY]->(i:Issue)
WITH sub, COLLECT(i.name) AS issues
WHERE size(issues) > 1
RETURN sub.name, issues LIMIT 2;
╒══════════════════════════════╤══════════════════════════════╕
│"sub.name"                    │"issues"                      │
╞══════════════════════════════╪══════════════════════════════╡
│"DIFFICULTY SUBMITTING A DISPU│["PROBLEM WITH A CREDIT REPORT│
│TE OR GETTING INFORMATION ABOU│ING COMPANY'S INVESTIGATION IN│
│T A DISPUTE OVER THE PHONE"   │TO AN EXISTING PROBLEM","PROBL│
│                              │EM WITH A COMPANY'S INVESTIGAT│
│                              │ION INTO AN EXISTING ISSUE"]  │
├──────────────────────────────┼──────────────────────────────┤
│"THEIR INVESTIGATION DID NOT F│["PROBLEM WITH A CREDIT REPORT│
│IX AN ERROR ON YOUR REPORT"   │ING COMPANY'S INVESTIGATION IN│
│                              │TO AN EXISTING PROBLEM","PROBL│
│                              │EM WITH A COMPANY'S INVESTIGAT│
│                              │ION INTO AN EXISTING ISSUE"]  │
└──────────────────────────────┴──────────────────────────────┘

Product and issues with ‘EQUIFAX, INC.’

MATCH (ef:Company {name:'EQUIFAX, INC.'})
MATCH (complaint:Complaint)-[:AGAINST]->(ef)
MATCH (:Response)-[:TO]->(complaint)
MATCH (complaint)-[:ABOUT]->(p:Product)
MATCH (complaint)-[:WITH]->(i:Issue)
RETURN p.name AS product, i.name AS issue, COUNT(*) AS count
ORDER BY count DESC LIMIT 2;╒═══════════════════════╤═══════════════════════╤═══════╕
│"product"              │"issue"                │"count"│
╞═══════════════════════╪═══════════════════════╪═══════╡
│"CREDIT REPORTING, CRED│"PROBLEM WITH A CREDIT │2394   │
│IT REPAIR SERVICES, OR │REPORTING COMPANY'S INV│       │
│OTHER PERSONAL CONSUMER│ESTIGATION INTO AN EXIS│       │
│ REPORTS"              │TING PROBLEM"          │       │
├───────────────────────┼───────────────────────┼───────┤
│"CREDIT REPORTING, CRED│"INCORRECT INFORMATION │2262   │
│IT REPAIR SERVICES, OR │ON YOUR REPORT"        │       │
│OTHER PERSONAL CONSUMER│                       │       │
│ REPORTS"              │                       │       │
└───────────────────────┴───────────────────────┴───────┘

Which (sub-)products have sub-issues about obscene language

MATCH (subIssue:SubIssue)
WHERE subIssue.name contains 'OBSCENE'
MATCH (complaint:Complaint)-[:WITH]->(subIssue)
MATCH (complaint)-[:ABOUT]->(p:Product)
OPTIONAL MATCH (complaint)-[:ABOUT]->(sub:SubProduct)
RETURN p.name AS product, sub.name AS subproduct, COUNT(*) AS count
ORDER BY count DESC;╒═════════════════╤══════════════════╤═══════╕
│"product"        │"subproduct"      │"count"│
╞═════════════════╪══════════════════╪═══════╡
│"DEBT COLLECTION"│"OTHER DEBT"      │3      │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"CREDIT CARD DEBT"│2      │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"PAYDAY LOAN DEBT"│2      │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"MEDICAL DEBT"    │1      │
└─────────────────┴──────────────────┴───────┘

Typical response percentages per product

MATCH ()<--(r:Response) with r, count(*) as rCount
MATCH (p:Product)<--(:Complaint)<--(r)
RETURN p.name, r.name, count(*) as c, (count(*)*100)/rCount as percent ORDER BY percent DESC LIMIT 10;╒══════════════════════╤══════════════════════╤════╤═════════╕
│"p.name"              │"r.name"              │"c" │"percent"│
╞══════════════════════╪══════════════════════╪════╪═════════╡
│"CREDIT REPORTING, CRE│"IN PROGRESS"         │5672│93       │
│DIT REPAIR SERVICES, O│                      │    │         │
│R OTHER PERSONAL CONSU│                      │    │         │
│MER REPORTS"          │                      │    │         │
├──────────────────────┼──────────────────────┼────┼─────────┤
│"CREDIT REPORTING, CRE│"CLOSED WITH EXPLANATI│4752│79       │
│DIT REPAIR SERVICES, O│ON"                   │    │         │
│R OTHER PERSONAL CONSU│                      │    │         │
│MER REPORTS"          │                      │    │         │
├──────────────────────┼──────────────────────┼────┼─────────┤
│"CREDIT REPORTING, CRE│"CLOSED WITH NON-MONET│135 │70       │
│DIT REPAIR SERVICES, O│ARY RELIEF"           │    │         │
│R OTHER PERSONAL CONSU│                      │    │         │
│MER REPORTS"          │                      │    │         │
├──────────────────────┼──────────────────────┼────┼─────────┤

Conclusion

This was a fun dataset and a bit more challenging to import. There are lots of possibilities to build analytics and visualizations with the data now that we have it in the graph. Let us know if you found it helpful to explore with us here and if you built anything on top of the data.