Move your analytics, not your data!
Containers open a whole new set of options for data analytics. Deploying and configuring a complete analytical stack can take less time than executing a query in your database.
GoodData has launched the modular analytical stack GoodData.CN, which can be deployed as a single Docker image or as an elastic k8s application. The deployment and configuration of this stack can be easily automated. Let me share a quick example of how this stack works.
1. Pull Docker image
docker pull gooddata/gooddata-cn-ce
I tried this from my local Mac, a Google Cloud VM, and AWS EC2. In all cases, I was able to pull the image in ~20 s (this depends on your network speed).
2. Run the image
docker run -t -i -p 3000:3000 gooddata/gooddata-cn-ce
After the container starts, you can access it at http://localhost:3000
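The container boots several services, so it can take a moment before the port answers. Here is a small readiness helper of my own (not part of GoodData) that polls the UI port before you start firing API calls:

```python
# My own helper sketch (not part of the GoodData stack): poll the
# container's HTTP port until it answers before calling the API.
import time
import urllib.error
import urllib.request

def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True once the URL answers with any HTTP status."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with 4xx/5xx
    except OSError:
        return False  # connection refused / still booting

def wait_until_up(url: str, attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False

# wait_until_up("http://localhost:3000")  # run this after `docker run ...`
```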
3. Connect to the database
curl http://localhost:3000/api/data-sources \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -X POST \
  -d '{
    "data": {
      "attributes": {
        "name": "demo-ds",
        "url": "jdbc:postgresql://localhost:5432/demo",
        "schema": "demo",
        "type": "POSTGRESQL",
        "username": "demouser",
        "password": "demopass",
        "enableCaching": true
      },
      "id": "demo-ds",
      "type": "data-source"
    }
  }'
As you can see, this is a simple JDBC connection to my local Postgres database. (Note: inside the container, localhost refers to the container itself; on Docker Desktop you may need host.docker.internal in the JDBC URL to reach a database running on the host.)
4. Create a workspace
curl http://localhost:3000/api/workspaces \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -X POST \
  -d '{
    "data": {
      "attributes": {
        "name": "Demo"
      },
      "id": "demo",
      "type": "workspace"
    }
  }'
The workspace is a sandbox for a specific analytics scenario. It can also be used to deploy the same analytical scenario many times to different tenants (e.g., your customers).
5. Create an analytical model
# The scan request goes against the data source created above; verify the
# exact path in the Swagger console (http://localhost:3000/apidocs).
curl http://localhost:3000/api/data-sources/demo-ds/scan \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -X POST \
  -d '{
    "mappingOnly": false,
    "mode": "append",
    "scanTables": true,
    "scanViews": false,
    "separator": "__",
    "tablePrefix": "",
    "viewPrefix": "",
    "primaryLabelPrefix": "",
    "secondaryLabelPrefix": "ls",
    "factPrefix": "f",
    "datePrefix": "",
    "grainPrefix": "gr",
    "referencePrefix": "r",
    "grainReferencePrefix": "",
    "denormPrefix": ""
  }'
This API scans the connected Postgres database for tables and creates a Logical Data Model on top of the Postgres data. The model is returned as a JSON structure.
I published the model to my workspace via an HTTP PUT. The stack ships with a Swagger UI console (http://localhost:3000/apidocs) for API documentation, and I was able to execute the model-publishing PUT request from there.
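The publishing step can be scripted too. Here is a minimal sketch using only the standard library; the endpoint path below is my assumption, so check the exact route in the Swagger console before using it:

```python
# Sketch: push the scanned model JSON to a workspace via HTTP PUT.
# The endpoint path is an assumption -- verify it in the Swagger
# console at http://localhost:3000/apidocs.
import json
import urllib.request

API = "http://localhost:3000/api"
HEADERS = {
    "Content-Type": "application/vnd.gooddata.api+json",
    "Accept": "application/vnd.gooddata.api+json",
    "Authorization": "Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz",
}

def build_publish_request(workspace_id: str, model: dict) -> urllib.request.Request:
    """Build the PUT request that publishes a model to a workspace."""
    return urllib.request.Request(
        f"{API}/workspaces/{workspace_id}/model",  # hypothetical path
        data=json.dumps(model).encode("utf-8"),
        headers=HEADERS,
        method="PUT",
    )

def publish_model(workspace_id: str, model: dict) -> int:
    with urllib.request.urlopen(build_publish_request(workspace_id, model)) as resp:
        return resp.status

# model = <the JSON structure returned by the scan call above>
# publish_model("demo", model)
```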
Here is the model in the GoodData visual model editor.
I was able to create dozens of workspaces with this setup using a few lines of Python code (LMK if you're interested, I can share).
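For the curious, the bulk setup looks roughly like this. The tenant names are made up; the endpoint and payload shape are the same as in the curl call above:

```python
# Sketch: create one workspace per tenant through POST /api/workspaces
# (standard library only; tenant names are illustrative).
import json
import urllib.request

API = "http://localhost:3000/api"
HEADERS = {
    "Content-Type": "application/vnd.gooddata.api+json",
    "Accept": "application/vnd.gooddata.api+json",
    "Authorization": "Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz",
}

def workspace_payload(workspace_id: str, name: str) -> dict:
    """Build the JSON:API body expected by POST /api/workspaces."""
    return {
        "data": {
            "attributes": {"name": name},
            "id": workspace_id,
            "type": "workspace",
        }
    }

def create_workspace(workspace_id: str, name: str) -> int:
    req = urllib.request.Request(
        f"{API}/workspaces",
        data=json.dumps(workspace_payload(workspace_id, name)).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# One workspace per tenant:
# for tenant in ["acme", "globex", "initech"]:
#     create_workspace(f"demo-{tenant}", f"Demo for {tenant}")
```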
6. Analytics!
Go to http://localhost:3000/analyze/#/demo/ and start creating beautiful data visualizations.
The data visualizations can be embedded as components (React, Vue, Angular, etc.) into a web or mobile application. I played with the GoodData.UI framework and was able to generate a boilerplate web application in a few seconds.
Summary
With lightweight Python scripting, it took me less than a minute to deploy and configure a nice self-service analytics stack running in a container co-located with my database: 2 shell commands and 4 curl API calls.
GoodData Community Edition
The free GoodData.CN Community Edition is available from Docker Hub. You can read more about it on the GoodData website.