Light & Wonder Monitoring Systems

Light & Wonder Tech Blog
Sep 14, 2022

By Ravi Velliyagounder, Director Technical Operations Engineering / iGaming Technical Operations

Live production monitoring is an integral part of every digital technology company, and it plays a critical role once a product has launched into live environments. In a digital world, system uptime and performance are critical, and both play a major role in customer satisfaction and subsequent revenue generation. It is impossible to manually monitor every system in the production network; what’s required is a way to continually monitor system health and to alert when an abnormality or deviation is found or a threshold is breached. See Rob Hulme’s blog on Problem Solving Methodology for more detail on problem handling.
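To make the two alert conditions concrete, here is a minimal sketch in Python. It is not how Zabbix evaluates triggers; the function name, the sample values, and the three-sigma rule are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def check_metric(history, latest, hard_limit, max_sigma=3.0):
    """Return the reasons the latest sample should alert (empty list if healthy).

    history    -- recent samples of the same metric (hypothetical data)
    hard_limit -- fixed threshold the value must not exceed
    max_sigma  -- how many standard deviations count as a deviation (assumed)
    """
    reasons = []
    # Threshold breach: the value exceeds a fixed limit.
    if latest > hard_limit:
        reasons.append("threshold")
    # Deviation: the value is far from what recent history suggests.
    if len(history) >= 2:
        spread = stdev(history)
        if spread > 0 and abs(latest - mean(history)) > max_sigma * spread:
            reasons.append("deviation")
    return reasons


cpu_history = [41.0, 43.5, 40.2, 42.8, 44.1]  # made-up CPU % samples
print(check_metric(cpu_history, 43.0, hard_limit=90.0))  # within the normal range
print(check_metric(cpu_history, 71.0, hard_limit=90.0))  # unusual, but under the limit
print(check_metric(cpu_history, 95.0, hard_limit=90.0))  # over the limit
```

A deviation check can catch a problem before a hard limit is reached, which is why the two conditions are kept separate here.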

There are many monitoring tools available on the market, both commercial and open source. Light & Wonder has chosen Zabbix, an open source, enterprise-grade monitoring platform, as a key component of our monitoring and alerting ecosystem. It monitors the tech stack from servers to databases in a seamless and efficient manner.

Why Zabbix?

  • Open Source;
  • Easy to deploy and configure;
  • Scalability with distributed systems;
  • Flexibility to add custom scripts and templates to monitor application services;
  • Prebuilt templates are available to monitor most systems infrastructure; and
  • Zabbix alerts are well integrated with many industry-standard collaboration tools such as Slack and Teams.
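As an illustration of the custom-script point above, the sketch below shows the kind of small check a Zabbix agent can run through a user parameter: the script prints a single value, and the agent reports it as an item. The script path, the item key, and the choice of a TCP reachability check are hypothetical and are not taken from Light & Wonder's templates.

```python
#!/usr/bin/env python3
"""Report whether an application service accepts TCP connections.

Assumed agent configuration (key name and path are hypothetical):
    UserParameter=app.service.up[*],/usr/local/bin/check_service.py $1 $2
"""
import socket
import sys


def service_is_up(host, port, timeout=2.0):
    """Return 1 if a TCP connection to host:port succeeds, otherwise 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return 0


# When invoked as "check_service.py <host> <port>", print the value for the agent.
if __name__ == "__main__" and len(sys.argv) == 3:
    print(service_is_up(sys.argv[1], int(sys.argv[2])))
```

Returning 1 or 0 rather than free text keeps the item numeric, so it can be graphed and used in a simple trigger.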

In addition to monitoring system and application health, Zabbix collects metrics from applications, services, servers, and network devices in near real time. These metrics help us to analyse and visualise performance and statistics promptly, and to take corrective action to keep the production systems healthy.

Additional Considerations

Light & Wonder operates in regulated jurisdictions, and we are required to adhere to all compliance rules in each of those locations. The ability to show the history of system performance, availability, and overall health is valuable when addressing questions from the relevant regulator.

As our business continues to grow, Light & Wonder proactively plans capacity increases to support the increased activity across our network. Historical utilisation trends help us to understand consumption and to plan for future growth.
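One simple way to turn a historical trend into a planning number is to fit a straight line to past utilisation and see when it would reach a capacity limit. The sketch below assumes evenly spaced samples (for example, one per month) and a linear trend; the figures and function names are invented for illustration, and real capacity planning would also account for seasonality and step changes.

```python
def fit_line(values):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until(values, limit):
    """Periods after the last sample until the fitted line reaches the limit.

    Returns None when utilisation is flat or falling, and 0.0 when the
    fitted line is already at or above the limit.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope  # x position where the line hits the limit
    return max(0.0, crossing - (len(values) - 1))


monthly_disk_pct = [50, 52, 54, 56, 58]  # hypothetical utilisation, one sample per month
print(periods_until(monthly_disk_pct, limit=80))
```

With these made-up figures, utilisation grows by two points a month, so the estimate gives the number of months of headroom left before the 80% mark.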

Whilst Zabbix has its own visualisation capabilities for time series, Grafana is used to chart metrics and trends across a wide range of KPIs. Light & Wonder’s traffic profile is predictable and recognisable as a time series, allowing an ‘at a glance’ inspection of service health.

Light & Wonder deploys a distributed monitoring system that is scalable and lets us continuously add new customers and data centres. Zabbix proxies are installed in every data centre and connect to the main Zabbix system. Metrics and monitoring are deployed through standard templates developed by Light & Wonder, and Grafana dashboards are configured via Terraform. This allows 24/7 monitoring of 60 data centres and ~10,000 servers/endpoints.

At Light & Wonder, we have an “eyes on the business” culture, and with the scale and growth of our network, the right setup to monitor, alert, and take action has become the cornerstone of our 24/7 operation.

The opinions expressed in this blog post are strictly those of the author in their personal capacity. They do not purport to reflect the opinions or views of Light & Wonder or of its employees.

WE’RE HIRING

Find out more about life at Light & Wonder and the roles we are looking to fill: https://igaming.lnw.com/careers/
