Discover AuraDB Free: Week 14 — Consumer Complaints Database
In 2015 our data scientist Nicole White ran a Webinar “LOAD CSV in the Real World,” which is really popular. It has more than 56k views and people are still referring to it frequently because it is so well done.
That’s why we decided this time in our live-stream to re-enact Nicole’s webinar with Neo4j 4.3 and AuraDB Free to give people today’s equivalent of that session.
Here is this week’s video:
References
- Consumer Complaints Database: https://catalog.data.gov/dataset/consumer-complaint-database
- Nicole White’s webinar “LOAD CVS in the real World” and GitHub repository
- Our GitHub Repository
Dataset
The consumer complaints database is available with data since 2011 from data.gov, with more than 2.3 million complaints about products from different companies totalling 1.4GB (380MB compressed).
We’re using xsv
a really neat and fast CSV processing tool written in Rust for some pre-processing.
xsv frequency -s 'Company' complaints.csv
To fit the data into the LIMITs of AuraDB free, we need to sub-filter to a certain range — the complaints received in November 2021 are 12295, which is a good number for us.
xsv search
for filtering
xsv search -s 'Date received' '2021-11-' complaints.csv | wc -l
12295xsv search -s 'Date received' '2021-11-' complaints.csv > complaints-nov-2021.csv
The file is also available here.
To load the data into Neo4j, we need it on a publicly accessible URL.
We can put it into a GitHub Gist or a Google Spreadsheet.
- GitHub Gist
- Short URL: https://git.io/J1m7e
- Google Sheet (File → Publish to Web → Publish Sheet)
Let’s try to load all rows?
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
RETURN count(*);
// 12295
What does the data look like in Cypher?
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
RETURN row LIMIT 1;
When loading the data via Cypher we get each row RETURN
ed as a map/dict/hash object that we then can use to create our graph.
{
"Company": "EQUIFAX, INC.",
"Date received": "2021-11-03",
"Issue": "Incorrect information on your report",
"Timely response?": "Yes",
"Sub-product": "Credit reporting",
"Consumer complaint narrative": null,
"ZIP code": "45458",
"Consumer consent provided?": null,
"Product": "Credit reporting, credit repair services, or ...",
"Consumer disputed?": "N/A",
"Company public response": null,
"Sub-issue": "Information belongs to someone else",
"Date sent to company": "2021-11-03",
"State": "OH",
"Complaint ID": "4874335",
"Submitted via": "Web",
"Tags": null,
"Company response to consumer": "In progress"
}
Data Model
We closely follow Nicole’s modeling, with the exception of new datatypes for dates. All names are capitalized to ensure uniqueness.
We have the following Nodes in our models that are connected with appropriate relationships.
Complaint
- core entity, also holds user informationCompany
- the company the complaint is againstProduct
- the product that is complained aboutSubProduct
- optional subclassification of productIssue
- the categorized issue with the productSubIssue
- optional subclassification of the issueResponse
- Response of the companyTag
- there are a few tagged values
Here is our data model, which is very similar to the original.
Data Import
First we create constraints for our key-properties, the complaint-id, and the names for the other entities.
CREATE CONSTRAINT ON (c:Complaint) ASSERT c.id IS UNIQUE;CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Response) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Product) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (c:SubProduct) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Issue) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (c:SubIssue) ASSERT c.name IS UNIQUE;CREATE CONSTRAINT ON (c:Tag) ASSERT c.name IS UNIQUE;
We can run :schema
to see if our constraints were correctly created.
Then we import the data step by step, starting with the Complaint
s. And then we run additional passes to load and connect the other entities.
We need to deal with a few optional elements, like SubIssue
, SubProduct
, and Tag
. Tags also form a comma separated list.
For complaints, we MERGE
(get-or-create) each by the key-id and set the date, zip-code, and state as properties. Note the backticked field names if they are not just alphanumeric characters.
Importing complaint
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS row
MERGE (c:Complaint {id:row.`Complaint ID`})
SET c.dateReceived = date(row.`Date received`)
SET c.zip = row.`ZIP code`
SET c.state = row.State;
For companies, we do our second pass over the data. We find the Complaint
by id
and MERGE
the Company
and then MERGE
the AGAINST
relationship (so only one relationship can exist between each complaint and a company). Then we set the date properties for when the complaint was sent to this company.
Importing company
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" AS rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (co:Company {name:toUpper(row.Company)})MERGE (c)-[rel:AGAINST]->(co)
SET rel.date = date(row.`Date sent to company`);
Most frequently companies complained against
MATCH (n:Company)<-[:AGAINST]-()
RETURN n.name, count(*) AS c
ORDER BY c DESC LIMIT 10;╒════════════════════════════════════════╤════╕
│"n.name" │"c" │
╞════════════════════════════════════════╪════╡
│"EQUIFAX, INC." │5858│
├────────────────────────────────────────┼────┤
│"TRANSUNION INTERMEDIATE HOLDINGS, INC."│3583│
├────────────────────────────────────────┼────┤
│"EXPERIAN INFORMATION SOLUTIONS INC." │707 │
├────────────────────────────────────────┼────┤
│"NAVY FEDERAL CREDIT UNION" │114 │
├────────────────────────────────────────┼────┤
│"ALLY FINANCIAL INC." │96 │
└────────────────────────────────────────┴────┘
For adding Product
s, we need to first connect the complaint to the product as before, but then filter out rows where the Sub-product
cell contains an empty string. For all others we merge and connect the SubProduct
both to the Product
and the Complaint
.
Importing product and sub-product
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (p:Product {name:toUpper(row.Product)})
MERGE (c)-[:ABOUT]->(p)WITH * WHERE trim(row.`Sub-product`) <> ""MERGE (sp:SubProduct {name:toUpper(row.`Sub-product`)})
MERGE (c)-[:ABOUT]->(sp)
MERGE (sp)-[:IN_CATEGORY]->(p);
We add issues and sub-issues the same way.
Importing issue and sub-issue
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})MERGE (iss:Issue {name:toUpper(row.Issue)})
MERGE (c)-[:WITH]->(iss)WITH * WHERE trim(row.`Sub-issue`) <> ""MERGE (si:SubIssue {name:toUpper(row.`Sub-issue`)})
MERGE (c)-[:WITH]->(si)
MERGE (si)-[:IN_CATEGORY]->(iss);
Now we can have a bit of fun with the data.
Complaint counts by state and issue
MATCH (n:Complaint)-[:WITH]->(iss:Issue)
RETURN n.state, iss.name, count(*) as c
ORDER BY c DESC LIMIT 5;╒═════════╤═══════════════════════════════════════════╤═══╕
│"n.state"│"iss.name" │"c"│
╞═════════╪═══════════════════════════════════════════╪═══╡
│"TX" │"PROBLEM WITH A CREDIT REPORTING COMPANY'S │847│
│ │INVESTIGATION INTO AN EXISTING PROBLEM" │ │
├─────────┼───────────────────────────────────────────┼───┤
│"PA" │"INCORRECT INFORMATION ON YOUR REPORT" │596│
├─────────┼───────────────────────────────────────────┼───┤
│"FL" │"INCORRECT INFORMATION ON YOUR REPORT" │542│
├─────────┼───────────────────────────────────────────┼───┤
│"CA" │"PROBLEM WITH A CREDIT REPORTING COMPANY'S │455│
│ │INVESTIGATION INTO AN EXISTING PROBLEM" │ │
├─────────┼───────────────────────────────────────────┼───┤
│"TX" │"INCORRECT INFORMATION ON YOUR REPORT" │425│
└─────────┴───────────────────────────────────────────┴───┘
If there is a Response
, it is coming from the Company
pointing to the Complaint
. So we need to match both in our pass and then connect the response to the complaint and to the company. Additionally, we store some properties on the relationship. Yes/No/Blank
values are converted to boolean just with a boolean expression (value='text'
). For other conditional expressions we can use CASE
.
Importing response
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})
MATCH (co:Company {name:toUpper(row.Company)})
WITH * WHERE trim(row.`Company response to consumer`) <> ""MERGE (res:Response {name:toUpper(row.`Company response to consumer`)})MERGE (c)<-[rel:TO]-(res)
SET rel.disputed = (row.`Consumer disputed?` = "Yes")
SET rel.timely = (row.`Timely response?` = "Yes")
SET rel.text = CASE row.`Company public response`
WHEN "" THEN null
ELSE row.`Company public response` ENDMERGE (co)-[:ANSWERED]->(res);
For Tag
s, we need to do two things. First,split
the comma-separated name into its constituent parts. Then taking that list, turn it into rows with UNWIND and connect the newly merged Tag nodes to the previously found Complaint.
The WITH * WHERE …
is there to filter arbitrary data with a WHERE
clause.
Importing tag
LOAD CSV WITH HEADERS FROM "https://git.io/J1m7e" as rowMATCH (c:Complaint {id:row.`Complaint ID`})WITH * WHERE trim(row.`Tags`) <> ""
WITH distinct row.Tags as tagsName,c
UNWIND split(tagsName,", ") as tagNameMERGE (t:Tag {name:toUpper(tagName)})
MERGE (c)-[:TAGGED]->(t);
Now we can look at our imported graph model with
call db.schema.visualization();
or better call apoc.meta.graph();
to see the schema of the data we have imported. It’s great to see how well it fits our model.
Exploration
The queries here were taken and adjusted from Nicole’s GitHub repository.
Sub-issues with communication tactics
MATCH (i:Issue {name:'COMMUNICATION TACTICS'})
MATCH (sub:SubIssue)-[:IN_CATEGORY]->(i)
RETURN sub.name AS subissue
ORDER BY subissue;╒════════════════════════════════════════════════════════════╕
│"subissue" │
╞════════════════════════════════════════════════════════════╡
│"CALLED BEFORE 8AM OR AFTER 9PM" │
├────────────────────────────────────────────────────────────┤
│"FREQUENT OR REPEATED CALLS" │
├────────────────────────────────────────────────────────────┤
│"USED OBSCENE, PROFANE, OR OTHER ABUSIVE LANGUAGE" │
├────────────────────────────────────────────────────────────┤
│"YOU TOLD THEM TO STOP CONTACTING YOU, BUT THEY KEEP TRYING"│
└────────────────────────────────────────────────────────────┘
Common responses
MATCH (r:Response)-[:TO]->(:Complaint)
RETURN r.name AS response, COUNT(*) AS count
ORDER BY count DESC;╒═════════════════════════════════╤═══════╕
│"response" │"count"│
╞═════════════════════════════════╪═══════╡
│"IN PROGRESS" │6097 │
├─────────────────────────────────┼───────┤
│"CLOSED WITH EXPLANATION" │5970 │
├─────────────────────────────────┼───────┤
│"CLOSED WITH NON-MONETARY RELIEF"│192 │
├─────────────────────────────────┼───────┤
│"CLOSED WITH MONETARY RELIEF" │35 │
└─────────────────────────────────┴───────┘
Sub-issues in multiple different issues
MATCH (sub:SubIssue)-[:IN_CATEGORY]->(i:Issue)
WITH sub, COLLECT(i.name) AS issues
WHERE size(issues) > 1
RETURN sub.name, issues LIMIT 2;
╒══════════════════════════════╤══════════════════════════════╕
│"sub.name" │"issues" │
╞══════════════════════════════╪══════════════════════════════╡
│"DIFFICULTY SUBMITTING A DISPU│["PROBLEM WITH A CREDIT REPORT│
│TE OR GETTING INFORMATION ABOU│ING COMPANY'S INVESTIGATION IN│
│T A DISPUTE OVER THE PHONE" │TO AN EXISTING PROBLEM","PROBL│
│ │EM WITH A COMPANY'S INVESTIGAT│
│ │ION INTO AN EXISTING ISSUE"] │
├──────────────────────────────┼──────────────────────────────┤
│"THEIR INVESTIGATION DID NOT F│["PROBLEM WITH A CREDIT REPORT│
│IX AN ERROR ON YOUR REPORT" │ING COMPANY'S INVESTIGATION IN│
│ │TO AN EXISTING PROBLEM","PROBL│
│ │EM WITH A COMPANY'S INVESTIGAT│
│ │ION INTO AN EXISTING ISSUE"] │
└──────────────────────────────┴──────────────────────────────┘
Product and issues with ‘EQUIFAX, INC.’
MATCH (ef:Company {name:'EQUIFAX, INC.'})
MATCH (complaint:Complaint)-[:AGAINST]->(ef)
MATCH (:Response)-[:TO]->(complaint)
MATCH (complaint)-[:ABOUT]->(p:Product)
MATCH (complaint)-[:WITH]->(i:Issue)
RETURN p.name AS product, i.name AS issue, COUNT(*) AS count
ORDER BY count DESC LIMIT 2;╒═══════════════════════╤═══════════════════════╤═══════╕
│"product" │"issue" │"count"│
╞═══════════════════════╪═══════════════════════╪═══════╡
│"CREDIT REPORTING, CRED│"PROBLEM WITH A CREDIT │2394 │
│IT REPAIR SERVICES, OR │REPORTING COMPANY'S INV│ │
│OTHER PERSONAL CONSUMER│ESTIGATION INTO AN EXIS│ │
│ REPORTS" │TING PROBLEM" │ │
├───────────────────────┼───────────────────────┼───────┤
│"CREDIT REPORTING, CRED│"INCORRECT INFORMATION │2262 │
│IT REPAIR SERVICES, OR │ON YOUR REPORT" │ │
│OTHER PERSONAL CONSUMER│ │ │
│ REPORTS" │ │ │
└───────────────────────┴───────────────────────┴───────┘
Which (sub-)products have sub-issues about obscene language
MATCH (subIssue:SubIssue)
WHERE subIssue.name contains 'OBSCENE'
MATCH (complaint:Complaint)-[:WITH]->(subIssue)
MATCH (complaint)-[:ABOUT]->(p:Product)
OPTIONAL MATCH (complaint)-[:ABOUT]->(sub:SubProduct)
RETURN p.name AS product, sub.name AS subproduct, COUNT(*) AS count
ORDER BY count DESC;╒═════════════════╤══════════════════╤═══════╕
│"product" │"subproduct" │"count"│
╞═════════════════╪══════════════════╪═══════╡
│"DEBT COLLECTION"│"OTHER DEBT" │3 │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"CREDIT CARD DEBT"│2 │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"PAYDAY LOAN DEBT"│2 │
├─────────────────┼──────────────────┼───────┤
│"DEBT COLLECTION"│"MEDICAL DEBT" │1 │
└─────────────────┴──────────────────┴───────┘
Typical response percentages per product
MATCH ()<--(r:Response) with r, count(*) as rCount
MATCH (p:Product)<--(:Complaint)<--(r)
RETURN p.name, r.name, count(*) as c, (count(*)*100)/rCount as percent ORDER BY percent DESC LIMIT 10;╒══════════════════════╤══════════════════════╤════╤═════════╕
│"p.name" │"r.name" │"c" │"percent"│
╞══════════════════════╪══════════════════════╪════╪═════════╡
│"CREDIT REPORTING, CRE│"IN PROGRESS" │5672│93 │
│DIT REPAIR SERVICES, O│ │ │ │
│R OTHER PERSONAL CONSU│ │ │ │
│MER REPORTS" │ │ │ │
├──────────────────────┼──────────────────────┼────┼─────────┤
│"CREDIT REPORTING, CRE│"CLOSED WITH EXPLANATI│4752│79 │
│DIT REPAIR SERVICES, O│ON" │ │ │
│R OTHER PERSONAL CONSU│ │ │ │
│MER REPORTS" │ │ │ │
├──────────────────────┼──────────────────────┼────┼─────────┤
│"CREDIT REPORTING, CRE│"CLOSED WITH NON-MONET│135 │70 │
│DIT REPAIR SERVICES, O│ARY RELIEF" │ │ │
│R OTHER PERSONAL CONSU│ │ │ │
│MER REPORTS" │ │ │ │
├──────────────────────┼──────────────────────┼────┼─────────┤
Conclusion
This was a fun dataset and a bit more challenging to import. There are lots of possibilities to build analytics and visualizations with the data now that we have it in the graph. Let us know if you found it helpful to explore with us here and if you built anything on top of the data.