Real-Time Analytics: Live Table

By John Cheng, Senior Data Engineer

HK01 Product & Technology team
HK01 Tech Blog
4 min read · Aug 7, 2020


with Dash DataTable and ElasticSearch

When it comes to visualization tools, Tableau and Qlik are usually the first names that come to mind. However, they are weak at real-time features such as automatic refresh and user input.

ElasticSearch + Kibana can get things done, so why bother?
At first we used Kibana, but we found its visualizations too limited. BI charts usually aggregate over multiple dimensions, and the way Kibana Visualize handles dimensions and metrics is poor.

Here are the tools used to make a live table.
- Plotly Dash
- ElasticSearch (Open Distro)

Why Dash?

Dash is pure Python. That means I can transform and parse data however I want, without being limited to the result set a relational database returns.

Dash is a productive Python framework for building web applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It’s particularly suited for anyone who works with data in Python.
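
To make the "pure Python" point concrete, here is a minimal Dash app. This snippet is my own illustration rather than code from the post: the whole page, markup included, is written in Python, with Flask and React.js handled under the hood.

import dash
import dash_html_components as html

# A complete Dash app: no hand-written HTML templates or JavaScript
app = dash.Dash(__name__)
app.layout = html.Div([
    html.H3("Hello from pure Python"),
    html.P("Flask serves the page; React.js renders it."),
])

if __name__ == "__main__":
    app.run_server(debug=True)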

Why ElasticSearch?

ElasticSearch is a time-series data store. It is fast, distributed, and optimized for real-time data analysis. With proper index management, old data is archived or removed from time to time to keep queries fast.
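
As a rough sketch of that index-management idea (my own illustration, not code from the post), daily indices can be rotated and anything older than a retention window dropped. The tcpdump-YYYY.MM.DD naming and the seven-day retention below are assumptions.

from datetime import datetime, timedelta

from elasticsearch import Elasticsearch

INDEX_PREFIX = "tcpdump-"   # assumed daily indices, e.g. tcpdump-2020.08.07
RETENTION_DAYS = 7          # assumed retention window

es = Elasticsearch(["localhost:9200"])

def drop_old_indices():
    """Delete daily indices older than the retention window to keep queries fast."""
    cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)
    for name in es.indices.get(index=INDEX_PREFIX + "*"):
        index_day = datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d")
        if index_day < cutoff:
            es.indices.delete(index=name)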

Why Open Distro?

The Open Distro distribution of ElasticSearch lets us query ElasticSearch with SQL, avoiding the complicated query DSL, especially when we group by multiple fields. (And it is available on AWS :P)
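
To illustrate the difference, here is a sketch of a grouped query sent straight to the plugin's _opendistro/_sql endpoint. The snippet is mine; the index name tcpdump, the size field, the demo credentials, and the jdbc response format are assumptions, not details from the post.

import requests

SQL_ENDPOINT = "https://localhost:9200/_opendistro/_sql"  # Open Distro SQL plugin

# One SQL statement instead of nested terms aggregations in the query DSL
sql = (
    "SELECT source_ip, destination_ip, protocol, COUNT(*), SUM(size) "
    "FROM tcpdump "
    "GROUP BY source_ip, destination_ip, protocol"
)

resp = requests.post(
    SQL_ENDPOINT,
    params={"format": "jdbc"},   # tabular response: column schema + datarows
    json={"query": sql},
    auth=("admin", "admin"),     # Open Distro demo credentials
    verify=False,                # demo setup uses self-signed certificates
)
resp.raise_for_status()
for row in resp.json()["datarows"]:
    print(row)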

Implementation

With the help of this blog post, I built a live table showing the click-through rate of links on my company's website.

In the following demo, I am going to show you a simplified version of the live table. It hosts a page with a table that refreshes every 6 seconds, showing the TCP dump of the local machine over the past hour.

Behind the scenes, a web page is served at http://localhost:5000/tcpdump/ . Every 6 seconds, the JS script sends a POST request to http://localhost:5000/tcpdump/_dash-update-component to fetch the latest result.

The core logic lives in application/dash_application/tcpdump.py , and the structure follows what Todd suggests.
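
For context, that structure mounts the Dash app onto an existing Flask server inside a factory function, roughly like the sketch below. The function name and the placeholder layout are illustrative, not taken from the repository.

import dash
import dash_html_components as html
from flask import Flask

def init_dashboard(server: Flask) -> Flask:
    """Attach a Dash app to an existing Flask server under /tcpdump/."""
    dash_app = dash.Dash(
        __name__,
        server=server,                       # reuse the Flask instance
        routes_pathname_prefix="/tcpdump/",  # Dash routes live under this prefix
    )
    dash_app.layout = html.Div(id="placeholder")  # replaced by the real layout below
    return dash_app.server

server = init_dashboard(Flask(__name__))
server.run(port=5000)  # the table is then available at http://localhost:5000/tcpdump/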

Inside dash_app.layout , we added dcc.Interval , a component that fires a callback periodically, along with the table structure.

# Dash 1.x imports at the top of tcpdump.py
import dash_core_components as dcc
import dash_html_components as html
import dash_table

dash_app.layout = html.Div([
    # Fires the update callback every refresh_interval milliseconds (6s by default)
    dcc.Interval(
        id='graph-update',
        interval=APP_CONFIG.get('refresh_interval', 6000)
    ),
    html.Div(
        children=dash_table.DataTable(
            id='stats_table',
            columns=[
                {"name": 'source_ip', "id": 'source_ip'},
                {"name": 'source_port', "id": 'source_port'},
                {"name": 'destination_ip', "id": 'destination_ip'},
                {"name": 'destination_port', "id": 'destination_port'},
                {"name": 'protocol', "id": 'protocol'},
                {"name": 'count', "id": 'count'},
                {"name": 'total_size', "id": 'total_size'},
            ],
        )
    )
])

Then we defined the callback function. Every time it is called, it needs to query ES and return a list of dicts matching the table schema. For more details, you may read https://dash.plotly.com/live-updates

from dash.dependencies import Input, Output

def init_callbacks(dash_app):
    # Create an ES connection (connect_es and the ES_* settings come from the
    # project's helpers and config)
    es = connect_es(
        ES_ENDPOINT,
        port=ES_PORT,
        verify_certs=ES_VERIFY_CERTS
    )

    @dash_app.callback(
        Output('stats_table', 'data'),
        [Input('graph-update', 'n_intervals')]
    )
    def update_stats_table(n):
        """
        Define the callback function here: query ES and return a list of
        dicts matching the DataTable columns.
        """
        nonlocal es
        ...
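
To give an idea of what could go in place of the ellipsis, here is a sketch of init_callbacks with a filled-in body that queries the Open Distro SQL endpoint through the client's transport and reshapes the rows for the DataTable. The SQL statement, the tcpdump index name, the size field, the jdbc response format, and the omission of the one-hour time filter are all my assumptions.

from dash.dependencies import Input, Output

def init_callbacks(dash_app):
    # connect_es and the ES_* settings come from the project's helpers and config
    es = connect_es(ES_ENDPOINT, port=ES_PORT, verify_certs=ES_VERIFY_CERTS)

    # Must match the "id" values of the DataTable columns in the layout
    columns = [
        "source_ip", "source_port", "destination_ip",
        "destination_port", "protocol", "count", "total_size",
    ]
    sql = (
        "SELECT source_ip, source_port, destination_ip, destination_port, protocol, "
        "COUNT(*), SUM(size) "
        "FROM tcpdump "
        "GROUP BY source_ip, source_port, destination_ip, destination_port, protocol"
    )

    @dash_app.callback(
        Output('stats_table', 'data'),
        [Input('graph-update', 'n_intervals')]
    )
    def update_stats_table(n):
        # Send the SQL statement to the Open Distro SQL plugin via the ES client
        result = es.transport.perform_request(
            "POST",
            "/_opendistro/_sql",
            params={"format": "jdbc"},   # tabular response: schema + datarows
            body={"query": sql},
        )
        # Each datarow is a plain list; zip it onto the column ids the table expects
        return [dict(zip(columns, row)) for row in result["datarows"]]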

What else can be done?

This is just a simple demo. The following items are worth trying but not covered here.

  1. Integrate Dash with a relational database.
  2. Enable caching to reduce frequent access to the data stores.
  3. Conditional formatting on metrics (see the sketch after this list).
  4. CSS styling.
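
As an example of item 3 (my own sketch, with an arbitrary threshold and colour), dash_table.DataTable supports conditional styling through style_data_conditional :

import dash_table

stats_table = dash_table.DataTable(
    id='stats_table',
    columns=[
        {"name": 'count', "id": 'count'},
        {"name": 'total_size', "id": 'total_size'},
    ],
    # Highlight busy connections: rows where count exceeds an arbitrary threshold
    style_data_conditional=[
        {
            "if": {"filter_query": "{count} > 100", "column_id": 'count'},
            "backgroundColor": "#ffcccc",
            "fontWeight": "bold",
        }
    ],
)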

References

Thanks for reading; clap 👏 if you like it. We are hiring, and job descriptions can be found HERE.

Originally published at https://medium.com on July 13, 2020.
