Realtime analytics, monitoring and campaign management over 100 million documents a day

Vipin Chaudhary
Deutsche Telekom Digital Labs
Nov 3, 2019

At a Glance -

  1. 100 Million documents a day
  2. 8 Billion searchable documents
  3. 100 active users on Kibana
  4. 10M campaign queries a day

The Challenges

  1. The business needs visibility and analytics for the Telekom self-care app (what we call OneApp) in 8 European countries.
  2. The central Development & Operations team needs performance visibility across countries.
  3. App click-event analysis
  4. Real-time campaign banners

The Solution

We built an analytics solution on Elasticsearch that processes 100 million documents per day to deliver real-time visibility of app traffic across Telekom Europe.
Leverage real-time analytics
1. Easily query 8 billion documents
2. See traffic for all content as it happens
3. Gain insight into how updates impact OneApp traffic
4. Serve real-time banners to customers
Empower the organisation
1. Give the entire organisation real-time insight into customer engagement
2. Analytics access for more than 100 users with document-level security
3. Encourage a culture of exploration and innovation for all employees
4. Transparency: the business has visibility like never before

Tech Stack -

[Image: Realtime analytics, monitoring and campaign management over 50 million documents a day, Deutsche Telekom]
[Image: OneApp Real Time Analytics, Deutsche Telekom]

MQTT Over TLS
MQTT (MQ Telemetry Transport) is ideal for mobile messaging; the protocol's small footprint and low bandwidth usage help to minimise both battery use and network traffic. Just what you want to stay connected.
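
As a rough sketch of what a client-side publisher over TLS looks like (the broker hostname, credentials, topic and payload fields below are illustrative placeholders, not our actual values), using the paho-mqtt library:

```python
# Minimal sketch of an MQTT-over-TLS event publisher with paho-mqtt.
# Hostname, credentials, topic and payload fields are placeholders.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="oneapp-device-123")
client.username_pw_set("device-user", "device-password")
client.tls_set()  # verify the broker certificate against the system CA bundle

client.connect("mqtt.example.com", 8883)  # 8883 is the standard MQTT-over-TLS port
client.loop_start()

event = {"country": "PL", "event": "app_open", "ts": "2019-11-03T10:00:00Z"}
client.publish("oneapp/events/PL", json.dumps(event), qos=1)

client.loop_stop()
client.disconnect()
```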

RabbitMQ
RabbitMQ is our message broker of choice; it is open source and allows clients to connect over a range of open, standardised protocols such as AMQP, HTTP, STOMP, MQTT, and both MQTT and STOMP over WebSockets (Web-STOMP). Publishing is scaled by using Route 53 DNS routing to spread traffic across multiple clusters.

We have a separate queue for every country, and separate queues for performance data and click-event data. Queue-specific authentication and access control are implemented with a regular expression per user.
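
Per-user, regex-based permissions can be applied through the RabbitMQ management HTTP API; a hedged sketch (host, vhost, user and queue-name patterns below are made-up examples, not our actual settings):

```python
# Sketch: restrict a per-country consumer to queues matching a regular
# expression, via the RabbitMQ management HTTP API.
# Host, vhost, user and patterns are illustrative placeholders.
import requests

RABBIT_API = "https://rabbitmq.example.com:15671/api"
ADMIN_AUTH = ("admin", "admin-password")

vhost = "oneapp"
user = "consumer-pl"

permissions = {
    "configure": "^$",                     # may not declare or delete resources
    "write": "^$",                         # may not publish
    "read": "^(performance|click)\\.pl$",  # may only consume the Poland queues
}

resp = requests.put(f"{RABBIT_API}/permissions/{vhost}/{user}", json=permissions, auth=ADMIN_AUTH)
resp.raise_for_status()
```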

Ansible
Ansible is our choice for infrastructure-as-code (IaC) automation. RabbitMQ (installation, policies, queues, authentication) is managed by Ansible playbooks, and the Logstash consumers and Elasticsearch are managed by Ansible playbooks as well.

Logstash
Logstash grok and filter plugins are used for parsing data on demand; the data is JSON, but we still sometimes need to define a mapping for an index. Index rotation policies are also defined in Logstash.
Logstash consumes data from RabbitMQ over AMQP and inserts it into Elasticsearch. For high throughput, we use auto-acknowledgement in the Logstash consumer.
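
Our real pipeline is a Logstash configuration, but the consuming leg boils down to the flow sketched below in Python (pika plus the Elasticsearch bulk helper; host names, queue name and index naming are assumptions, and batching is omitted for brevity):

```python
# Conceptual sketch of the Logstash leg: consume JSON events from RabbitMQ over
# AMQP with auto-ack and index them into a daily-rotated Elasticsearch index.
# The real pipeline is Logstash config; all names here are placeholders.
import json
from datetime import datetime, timezone

import pika
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("https://elasticsearch.example.com:9200")

def on_message(channel, method, properties, body):
    # auto_ack below means the broker marks the message delivered immediately,
    # trading delivery guarantees for throughput.
    today = datetime.now(timezone.utc).strftime("%Y.%m.%d")
    bulk(es, [{"_index": f"oneapp-{today}", "_source": json.loads(body)}])

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="rabbitmq.example.com",
        credentials=pika.PlainCredentials("consumer-pl", "password"),
    )
)
channel = connection.channel()
channel.basic_consume(queue="performance.pl", on_message_callback=on_message, auto_ack=True)
channel.start_consuming()
```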

MongoDB
Attributes that require an upsert and are mandatory for campaigns are written to a MongoDB replica set.
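
For example, a campaign profile upsert with pymongo might look like the following sketch (connection string, database, collection and field names are illustrative):

```python
# Sketch: upsert the latest campaign-relevant attributes per user into a
# MongoDB replica set. Connection string, collection and fields are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongo1.example.com,mongo2.example.com,mongo3.example.com/?replicaSet=rs0"
)
profiles = client["oneapp"]["campaign_profiles"]

event = {
    "user_id": "abc-123",
    "country": "PL",
    "app_version": "5.2.1",
    "last_seen": "2019-11-03T10:00:00Z",
}

# upsert=True inserts the document if it does not exist, otherwise updates it.
profiles.update_one({"user_id": event["user_id"]}, {"$set": event}, upsert=True)
```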

Elasticsearch
Elasticsearch is where the magic happens. We run self-managed Elasticsearch on the Amazon cloud, with thread-pool tuning. We decided to go with instance store rather than EBS for better I/O and significant cost savings. S3 is used for regular backups and snapshots of the data on the i3 instance store. Amazon EC2 I3 instances include Non-Volatile Memory Express (NVMe) SSD-based instance storage optimised for low latency, very high random I/O performance, and high sequential read throughput, and they deliver high IOPS at a low cost.
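
Snapshots to S3 go through the standard snapshot API (with the repository-s3 plugin on the cluster). A sketch using the elasticsearch-py 7.x client, with bucket, repository and index names as placeholders:

```python
# Sketch: register an S3 snapshot repository and take a snapshot of the
# OneApp indices. Requires the repository-s3 plugin; names are placeholders.
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elasticsearch.example.com:9200")

# One-time setup: point a named repository at an S3 bucket.
es.snapshot.create_repository(
    repository="s3_backup",
    body={"type": "s3", "settings": {"bucket": "oneapp-es-snapshots", "region": "eu-central-1"}},
)

# Regular job: snapshot the OneApp indices into that repository.
snapshot_name = "daily-" + datetime.now(timezone.utc).strftime("%Y.%m.%d")
es.snapshot.create(
    repository="s3_backup",
    snapshot=snapshot_name,
    body={"indices": "oneapp-*", "include_global_state": False},
    wait_for_completion=False,
)
```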

Glimpse of our cluster -
50 TB of data with 1 replica.
8 billion documents, covering 3 months of data.
Regular S3 backups.
10M campaign queries a day.
Daily index rotation with a forced segment merge on old indices (a small sketch follows below).
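
The force merge after rotation can run as a small scheduled job; a sketch (the daily index naming pattern is an assumption):

```python
# Sketch: once yesterday's index has rotated out of active writing, force-merge
# it down to a single segment to reduce its size and speed up searches.
# The index naming pattern is an illustrative placeholder.
from datetime import datetime, timedelta, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch("https://elasticsearch.example.com:9200")

yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y.%m.%d")
old_index = f"oneapp-{yesterday}"

# Only force-merge indices that no longer receive writes; the call can take a while.
es.indices.forcemerge(index=old_index, max_num_segments=1, request_timeout=3600)
```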

Kibana (Open Distro from AWS)

We have a multi-country user environment, so we needed something that provides document-level security on an index: a user from Poland can only view documents from Poland (every document carries a country code). That is why we decided to go with Open Distro from AWS, which also helps us save a huge amount of cost.
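
Document-level security in Open Distro is expressed as a DLS query attached to a role. A hedged sketch via the security REST API (the role name, index pattern and country field are assumptions):

```python
# Sketch: create an Open Distro Security role whose document-level security (DLS)
# query limits its users to documents with country code "PL".
# Role name, index pattern and field name are illustrative placeholders.
import json

import requests

SECURITY_API = "https://elasticsearch.example.com:9200/_opendistro/_security/api"
ADMIN_AUTH = ("admin", "admin-password")

role = {
    "index_permissions": [
        {
            "index_patterns": ["oneapp-*"],
            "dls": json.dumps({"term": {"country": "PL"}}),  # DLS query as a string
            "allowed_actions": ["read"],
        }
    ]
}

resp = requests.put(f"{SECURITY_API}/roles/oneapp_poland_reader", json=role, auth=ADMIN_AUTH)
resp.raise_for_status()
```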

Kibana provides awesome visualisation over the Elasticsearch data store: we can monitor app health in real time across countries, create reports on demand, set up automated alerts, and so on.

Keycloak
Keycloak is our choice for identity and access management; we integrate Open Distro with Keycloak as a SAML provider, which means every login to our Kibana dashboards is secured with Keycloak SSO. In a multi-country environment we are sometimes not aware when a user leaves a specific country (every country is an individual organisation in itself), so we implemented email-based MFA in Keycloak (thanks to our development team).

We use JDBC_PING on MySQL for Keycloak HA. With JDBC_PING, each instance inserts its own information into the MySQL database, and the instances discover their peers from the ping data they read back out of the database.

Zabbix
Zabbix is used for monitoring RabbitMQ queues along with all other infrastructure components. Our instances automatically register with Zabbix during autoscaling.
