Getting started with Druid (Imply)

  • This one talks about how Airbnb uses it.
  • This one covers the challenges Metamarkets ran into using Druid.
  • This one does a great job explaining the core concepts in Druid (rollup, schema design, an architecture diagram and what each component does, etc.). If you don't have time to read the original Druid docs, I highly recommend reading this one instead.
  • We don't want to be the ones setting it up, because we don't want the overhead of maintaining it or a 24/7 on-call rotation (seriously, who likes on-call?).
  • The cluster was set up for a different team in a different department before we even started looking into using it, and the contract they have with Imply is pretty good. So yes, it's free for us. (Nothing is really free, but from a cost-center point of view, it appears to be.)
+--------------+-----------------+-----------+---------+---------------+
| AssignmentID | AssignmentValue | EventName | UserId  | Timestamp     |
+--------------+-----------------+-----------+---------+---------------+
| a00001       | Button:Red      | Click     | u000001 | 1557017456613 |
| a00002       | Button:Red      | Null      | u000002 | 1557017456622 |
| a00003       | Button:Blue     | Null      | u000003 | 1557017456655 |
| a00004       | Button:Red      | Click     | u000004 | 1557017456699 |
+--------------+-----------------+-----------+---------+---------------+
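The rows look like experiment-assignment events (a user sees a Red or Blue button, and may or may not click). This is a minimal plain-Python sketch, not Druid, that makes the later queries concrete. For each AssignmentValue, it counts how many distinct users were exposed and how many distinct users clicked, using exactly the four sample rows. The Timestamp values look like epoch milliseconds. The function name `summarize` is hypothetical.

```python
from collections import defaultdict

# The four sample rows from the table above; a Null EventName is stored as None.
rows = [
    ("a00001", "Button:Red", "Click", "u000001", 1557017456613),
    ("a00002", "Button:Red", None, "u000002", 1557017456622),
    ("a00003", "Button:Blue", None, "u000003", 1557017456655),
    ("a00004", "Button:Red", "Click", "u000004", 1557017456699),
]

def summarize(rows):
    """Return {assignment_value: (distinct_users, distinct_clickers)}."""
    users = defaultdict(set)
    clickers = defaultdict(set)
    for _assignment_id, value, event, user_id, _ts in rows:
        users[value].add(user_id)
        if event == "Click":
            clickers[value].add(user_id)
    # Reading clickers[v] for a value with no clicks yields an empty set, i.e. 0.
    return {v: (len(users[v]), len(clickers[v])) for v in users}

print(summarize(rows))
# {'Button:Red': (3, 2), 'Button:Blue': (1, 0)}
```

At Druid scale, the same "distinct users" question is what the COUNT(DISTINCT user_id) and thetaSketch queries below answer, without holding every user id in a set.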

Ingestion

//ingestion_spec.json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "columns": ["project_id", "assignment_id", "user_id", "ts", "...", "inputs", "...", "...", "...", "name", "..."],
          "dimensionsSpec": {
            "dimensions": [
              "user_id",
              "assignment_id",
              "...",
              "...",
              "...",
              "name",
              "...",
              "inputs"
            ]
          },
          "timestampSpec": {
            "column": "ts",
            "format": "auto"
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": ["2019-01-01/2019-01-07"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "static-s3",
        "uris": [
          "s3://path/to/our.csv"
        ]
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "forceExtendableShardSpecs": false,
      "type": "index"
    }
  }
}
The CSV column names differ from the field names in the table above:
  • inputs → AssignmentValue
  • name → EventName
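Hand-typed specs break easily. For example, curly quotes or en-dashes pasted from a rich-text editor produce invalid JSON or an unparseable interval. One hedged alternative is to generate the spec in Python and serialize it with `json.dumps`, which always emits plain ASCII quotes. The sketch below mirrors the first spec field by field and checks that every dimension and the timestamp column actually appear in the CSV column list. `build_index_spec` and its parameters are hypothetical names. The column list is a simplified stand-in for the elided one.

```python
import json

def build_index_spec(datasource, columns, dimensions, s3_uri, interval, timestamp_column="ts"):
    """Build a native batch ("index") task spec shaped like the spec above."""
    # Catch typos early: every dimension and the timestamp column must be a CSV column.
    missing = [c for c in dimensions + [timestamp_column] if c not in columns]
    if missing:
        raise ValueError(f"not in CSV columns: {missing}")
    return {
        "type": "index",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "csv",
                        "columns": columns,
                        "dimensionsSpec": {"dimensions": dimensions},
                        "timestampSpec": {"column": timestamp_column, "format": "auto"},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": {"type": "none"},
                    "rollup": False,
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index",
                "firehose": {"type": "static-s3", "uris": [s3_uri]},
                "appendToExisting": False,
            },
            "tuningConfig": {"forceExtendableShardSpecs": False, "type": "index"},
        },
    }

# Hypothetical, simplified column list (the real one is elided with "..." above).
cols = ["project_id", "assignment_id", "user_id", "ts", "inputs", "name"]
spec = build_index_spec("my_datasource", cols, ["user_id", "assignment_id", "name", "inputs"],
                        "s3://path/to/our.csv", "2019-01-01/2019-01-07")
# Save this text as ingestion_spec.json for the curl command below.
text = json.dumps(spec, indent=2)
```

Generating the spec also makes it easy to reuse the same template for the second ingestion further down, changing only the metrics and interval.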
curl -H "Content-Type: application/json" -X POST -d @ingestion_spec.json -v "our_druid_server:port/druid/indexer/v1/task"
curl -X 'GET' -v "our_druid_server:port/druid/indexer/v1/task/{job_id}/status"
{
  "task": {job_id},
  "status": {
    ...
    "type": "index",
    "statusCode": "SUCCESS",
    "status": "SUCCESS",
    "duration": 445274
  }
}
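The two curl calls (submit, then check status) can be scripted. This is a minimal standard-library sketch, assuming the same endpoints as the curl commands. It POSTs the spec to /druid/indexer/v1/task, reads the task id from the `"task"` field of the reply, and polls /druid/indexer/v1/task/{id}/status until the task reports SUCCESS or FAILED. The `duration` field is in milliseconds, so 445274 above is roughly 7.4 minutes. `DRUID_URL`, the function names, and the 10-second poll interval are placeholders and assumptions. The HTTP sender is a parameter so the logic can be exercised without a cluster.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Placeholder address, like "our_druid_server:port" in the curl commands.
DRUID_URL = "http://our_druid_server:8888"

def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send an optional JSON body and decode the JSON reply (stdlib only)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def submit_task(spec: dict, base_url: str = DRUID_URL, send: Callable = http_json) -> str:
    """POST the spec to the task endpoint; the reply looks like {"task": "<task id>"}."""
    return send("POST", f"{base_url}/druid/indexer/v1/task", spec)["task"]

def wait_for_task(task_id: str, base_url: str = DRUID_URL, send: Callable = http_json,
                  poll_seconds: float = 10.0, sleep: Callable = time.sleep) -> dict:
    """Poll the status endpoint until the task finishes; return the "status" object."""
    url = f"{base_url}/druid/indexer/v1/task/{task_id}/status"
    while True:
        status = send("GET", url)["status"]
        # The sample response carries both "statusCode" and "status"; accept either.
        code = status.get("statusCode") or status.get("status")
        if code in ("SUCCESS", "FAILED"):
            return status
        sleep(poll_seconds)
```

Usage against a real cluster would be `wait_for_task(submit_task(spec))`, where `spec` is the parsed ingestion_spec.json.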

Query

SELECT COUNT(DISTINCT user_id) FROM "my_datasource"
WHERE (.... AND inputs='{"button":"blue"}')
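The SQL above can also be sent over HTTP: Druid's SQL endpoint accepts a POST to /druid/v2/sql with a JSON body whose `"query"` field holds the statement. By default the reply is a JSON array of row objects. In Druid SQL, `COUNT(DISTINCT ...)` is approximate by default. Setting the query context flag `useApproximateCountDistinct` to false asks for an exact count, which is more expensive. This is a minimal sketch. The filter is a hypothetical, fully spelled-out version because the original's extra conditions are elided, and `DRUID_URL` and the function names are placeholders.

```python
import json
import urllib.request

DRUID_URL = "http://our_druid_server:8888"  # placeholder host and port

def build_sql_request(sql: str, exact_count_distinct: bool = False) -> dict:
    """Request body for Druid's SQL HTTP endpoint (POST /druid/v2/sql)."""
    body = {"query": sql}
    if exact_count_distinct:
        # COUNT(DISTINCT ...) is approximate by default; this context flag disables that.
        body["context"] = {"useApproximateCountDistinct": False}
    return body

def run_sql(sql: str, base_url: str = DRUID_URL, exact_count_distinct: bool = False) -> list:
    data = json.dumps(build_sql_request(sql, exact_count_distinct)).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/druid/v2/sql", data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))  # list of row objects by default

# Hypothetical filter: the original WHERE clause has elided conditions ("....").
QUERY = """SELECT COUNT(DISTINCT user_id) AS unique_users
FROM "my_datasource"
WHERE inputs = '{"button":"blue"}'"""
```

Building the body separately from sending it keeps the SQL string out of shell quoting, which is awkward here because the filter value itself contains JSON quotes.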

Limits

//ingestion2_spec.json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "columns": ["project_id", "assignment_id", "user_id", "ts", "...", "inputs", "session_id", "...", "...", "name", "..."],
          "dimensionsSpec": {
            "dimensions": [
              "...",
              "project_id",
              "...",
              "...",
              "name",
              "...",
              "inputs"
            ]
          },
          "timestampSpec": {
            "column": "ts",
            "format": "auto"
          }
        }
      },
      "metricsSpec": [
        { "type": "thetaSketch", "name": "user_id_sketch", "fieldName": "user_id" },
        { "type": "thetaSketch", "name": "assignment_id_sketch", "fieldName": "assignment_id" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": ["2019-04-01/2019-04-07"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "static-s3",
        "uris": [
          "s3://….csv"
        ]
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "forceExtendableShardSpecs": false,
      "type": "index"
    }
  }
}
{
  "task": {job_id},
  "status": {
    ...
    "type": "index",
    "statusCode": "SUCCESS",
    "status": "SUCCESS",
    "duration": 261299
  }
}
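The second spec replaces the plain count with thetaSketch metrics, which require the druid-datasketches extension to be loaded. Theta sketches estimate distinct counts from a small, mergeable summary instead of keeping every user_id. Druid's aggregator wraps the Apache DataSketches library. The code below is not that implementation. It is a from-scratch sketch of the simplest member of the theta family, a "k minimum values" (KMV) sketch, written to show the idea:

* Hash each value into [0, 1) and keep only the k smallest distinct hashes.
* Estimate the distinct count as (k - 1) / θ, where θ is the k-th smallest hash.
* Take the union of two sketches by keeping the k smallest hashes of both. This is why per-segment sketches can be combined at query time.

The class name, k = 1024, and the use of SHA-1 as the hash are illustrative assumptions.

```python
import hashlib
import heapq

def hash_to_unit(value: str) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using the top 53 bits of SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)

class KMVSketch:
    """Keeps the k smallest distinct hashes seen so far."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []     # negated hashes, so -heap[0] is the largest kept hash
        self._kept = set()  # the same hashes, for O(1) duplicate checks

    def _add_hash(self, h: float) -> None:
        if h in self._kept:
            return  # duplicate value: distinct counting ignores it
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Evict the largest kept hash to make room for a smaller one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def update(self, value: str) -> None:
        self._add_hash(hash_to_unit(value))

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: count is exact
        theta = -self._heap[0]             # the k-th smallest hash
        return (self.k - 1) / theta

    def union(self, other: "KMVSketch") -> "KMVSketch":
        """Merge two sketches by keeping the smallest hashes of both."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._kept | other._kept:
            merged._add_hash(h)
        return merged
```

With k = 1024, the relative error is roughly 1/sqrt(k), a few percent, regardless of how many distinct users are fed in. That trade of exactness for a fixed-size, mergeable summary is what keeps the sketch-based ingestion and queries cheap.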
{
  "queryType": "groupBy",
  "dataSource": "my_datasource",
  "granularity": "ALL",
  "dimensions": [],
  "aggregations": [
    { "type": "thetaSketch", "name": "unique_users", "fieldName": "user_id_sketch" }
  ],
  "intervals": ["2019-01-01/2019-01-07"]
}
curl -X 'POST' -H 'Content-Type:application/json' -d @join2_query.json -v "our_druid_server:port/druid/v2?pretty"
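The same native query can be built and posted from Python. One caveat applies: the user_id_sketch column only exists in segments written by the second spec, which loads 2019-04-01/2019-04-07. If the January data was ingested only with the first spec, a January interval finds no sketch data. This minimal sketch therefore takes the interval as a parameter. It also extracts the metric from groupBy results, which come back as a list of rows shaped like {"version": ..., "timestamp": ..., "event": {...}}. `DRUID_URL` and all function names are hypothetical placeholders.

```python
import json
import urllib.request

DRUID_URL = "http://our_druid_server:8888"  # placeholder host and port

def unique_users_query(interval: str, datasource: str = "my_datasource") -> dict:
    """groupBy query like join2_query.json: one thetaSketch aggregation over all time."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "dimensions": [],
        "aggregations": [
            {"type": "thetaSketch", "name": "unique_users", "fieldName": "user_id_sketch"}
        ],
        "intervals": [interval],
    }

def post_native(query: dict, base_url: str = DRUID_URL) -> list:
    """POST a native query to /druid/v2 and decode the JSON result."""
    data = json.dumps(query).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/druid/v2", data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def extract_metric(result: list, name: str = "unique_users") -> list:
    """Pull one aggregated value out of each groupBy result row."""
    return [row["event"][name] for row in result]
```

Against a cluster, `extract_metric(post_native(unique_users_query("2019-04-01/2019-04-07")))` returns a one-element list, because there are no dimensions and granularity is "all".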

JOIN
