Project Flash update: Azure VM availability monitoring upgraded

Nov 6, 2023 · 5 min read

Secure VM health with Azure

Explore the Azure products that emit high-quality VM health data to start your observability journey. They include resource health, activity logs, Azure Resource Graph, Azure Monitor metrics, and Azure Event Grid.

The Azure team is excited to share its developments from the past year. A sneak peek at the work:

New VM availability monitoring feature: Azure now monitors VM availability degradation and proactively notifies you of availability or performance issues.
Public preview of the HealthResources Event Grid system topic: This feature sends low-latency VM availability notifications so you can react to availability changes quickly.
Application freeze notifications: Azure now notifies users of application freezes during specific network and storage agent updates. The added visibility makes disruptions easier to manage.

Azure never compromises on quality. The team aims for 100% data consistency and applies strict quality standards across all Flash experiences.

“Last year, we shared an update on Project Flash in the Advancing Reliability blog series to help Azure customers diagnose VM availability issues quickly and easily. We are excited to share the latest VM availability monitoring improvements, which customers can rely on to run their Azure workloads dependably. I’ve asked Azure Core Platform Fundamentals Senior Technical Program Manager Pujitha Desiraju to discuss Project Flash’s newest initiatives,” said Azure CTO Mark Russinovich.

Adding degraded VM availability state for better monitoring

As part of Azure's ongoing work to improve VM health detection, the team is introducing the degraded VM availability state. This new functionality uses machine learning-based anomaly detection models to predict VM degradation caused by host server hardware problems such as CPU, disk, and memory issues. The capability is integrated, along with VM health annotations, into Azure Resource Graph, Event Grid, resource health, and activity logs.

With this functionality, monitoring your VM’s health and understanding why it degraded is easier than before. All Flash experiences include views that show whether a VM degradation was planned or unplanned. The views also identify the cause, recommend mitigation steps, and provide a precise redeployment date to avoid operational delays.

In 2024, Azure hopes to extend degradation coverage to inoperable accelerated networking and new hardware failure scenarios. The team also wants to add the degraded state to the VM availability metric in Azure Monitor to improve downtime attribution.

Public preview: low-latency Event Grid notifications on VM availability changes

Any event that could affect Azure VM availability must be detected in real time to ensure continuity for business-critical applications. This insight lets you act quickly to protect end users from disruptions. Azure is pleased to introduce the public preview of the HealthResources Event Grid system topic, with new Azure VM health annotations, to support your daily operations.

This system topic delivers detailed VM health data so you can immediately understand changes in VM availability and their context. Events can be received for single-instance VMs and Virtual Machine Scale Set VMs in the Azure subscription where the topic is created. Data is published to this topic by Azure Resource Notifications (ARN), a publisher-subscriber service with RBAC and advanced filtering. You can subscribe to the system topic and use Event Grid’s filtering to send only the relevant events to downstream tools, so you can respond to issues quickly.

Getting Started:

Step 1:

Users create a system topic in their Azure subscription to receive notifications.

Step 2:

Users then create an event subscription on the system topic created in Step 1. In this step, they choose an endpoint (such as Event Hubs) to which events are routed. Users can also configure event filters to narrow the set of events that are delivered.
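To make the effect of an event filter in Step 2 concrete, the sketch below mimics it locally in plain Python. It does not use the Event Grid SDK or call Azure. The event type strings and the nested payload field names (resourceInfo, properties, availabilityState) are assumptions about the preview schema and should be checked against the current Event Grid documentation before relying on them.

```python
# Sketch only: mimics, in plain Python, what an event-type filter plus a
# small handler might do. Nothing here calls Azure.

# Assumed event type names for the HealthResources system topic.
AVAILABILITY_CHANGED = (
    "Microsoft.ResourceNotifications.HealthResources.AvailabilityStatusChanged"
)
RESOURCE_ANNOTATED = (
    "Microsoft.ResourceNotifications.HealthResources.ResourceAnnotated"
)


def filter_events(events, included_types):
    """Keep only events whose eventType is in included_types."""
    wanted = set(included_types)
    return [e for e in events if e.get("eventType") in wanted]


def availability_state(event):
    """Return the availability state carried by an event, or None if absent.

    The nested field names are hypothetical placeholders for the payload.
    """
    props = event.get("data", {}).get("resourceInfo", {}).get("properties", {})
    return props.get("availabilityState")


# Hand-written sample events, for illustration only.
sample_events = [
    {
        "id": "1",
        "eventType": AVAILABILITY_CHANGED,
        "data": {"resourceInfo": {"properties": {"availabilityState": "Degraded"}}},
    },
    {"id": "2", "eventType": RESOURCE_ANNOTATED, "data": {}},
]

selected = filter_events(sample_events, [AVAILABILITY_CHANGED])
states = [availability_state(e) for e in selected]
```

In a real deployment the filtering would be configured on the event subscription itself, so that unwanted events never reach the endpoint; the handler would then only need the second function.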

Consider these best practices when subscribing to HealthResources system topic events:

Choose a destination or event handler based on the expected size of events.
Event Hubs is well suited to fan-in scenarios that consolidate notifications from multiple system topics. It supports both real-time processing, when data freshness matters, and periodic analytics processing with adjustable retention periods.

Azure hopes to move this preview to general availability in 2024.

Improved application freeze visibility

When operating sensitive workloads, insight into system reboots and freezes is essential. Azure is introducing VM health annotations that report freeze impact during network and storage agent updates. These signals are surfaced in resource health, Azure Resource Graph, and Event Grid.

This new functionality provides precise details on the impact and attribution of a system freeze. The details include whether the activity was planned or unplanned, whether it has completed, the duration of the impact as you observed it, and the type of update. This lets you track and investigate application freezes and set up customized alerts.

Azure's goal for 2024 is to broaden the conditions under which these notifications are emitted.

Summary of Project Flash solutions

The Flash initiative has spent years developing solutions for customers’ different monitoring needs. Use the summary below to choose the best Flash monitoring solution(s) for your needs:

Azure Resource Graph (HealthResources)

Generally available. It is well suited to large-scale investigations: you can query health data with Kusto Query Language (KQL), centralize resource information, and retrieve historical data.
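As a rough illustration of such an investigation, the sketch below assembles a KQL query against the HealthResources table as a Python string. The helper name build_availability_query is hypothetical, and the property names used (properties.availabilityState, properties.targetResourceId) are assumptions that should be confirmed against the Azure Resource Graph table reference. The code only builds text and does not contact Azure.

```python
def build_availability_query(state=None):
    """Return a KQL query string over the HealthResources table.

    With no argument, the query counts resources per availability state.
    With a state (for example "Degraded"), it lists the matching resources.
    Table and property names are assumptions to verify before use.
    """
    lines = [
        "HealthResources",
        "| where type =~ 'microsoft.resourcehealth/availabilitystatuses'",
    ]
    if state is None:
        lines.append("| summarize count() by tostring(properties.availabilityState)")
    else:
        # Reject quotes so the value cannot break out of the KQL string literal.
        if "'" in state or '"' in state:
            raise ValueError("state must not contain quote characters")
        lines.append(
            f"| where tostring(properties.availabilityState) =~ '{state}'"
        )
        lines.append(
            "| project name, resourceId = tostring(properties.targetResourceId)"
        )
    return "\n".join(lines)


summary_query = build_availability_query()
degraded_query = build_availability_query("Degraded")
```

The resulting text could then be pasted into Resource Graph Explorer in the portal or passed to a Resource Graph client of your choice.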

Azure Event Grid (HealthResources system topic)

In public preview. It lets you trigger time-sensitive mitigations, such as redeploying or restarting a VM, to avoid end-user impact. Customers can receive resource availability alerts within seconds.

Azure Monitor (VM availability metric)

In public preview. Azure Monitor offers VM availability as a metric. You can use it to detect trends, combine it with other platform metrics such as CPU and storage consumption, and define precise threshold-based alerts.

Azure resource health

Generally available. Health checks for an individual resource are instant and easy through the portal. Customers can open the resource health blade and see a 30-day health history for troubleshooting.

Monitoring VM availability holistically

Scheduled events (SE) and Flash health events should be used together to monitor VM availability across routine maintenance, live migration, service healing, and degradation.

Scheduled events provide up to 15 minutes’ notice before maintenance. With that notice, you can prepare for downtime or avoid it. During the notice window you can acknowledge an event or delay action, depending on your readiness for maintenance.
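The notice window can be worked with programmatically. The sketch below assumes the response shape documented for the Scheduled Events endpoint of the Azure Instance Metadata Service (an Events list whose entries carry EventId, EventStatus, Resources, and NotBefore) and the StartRequests body used to acknowledge an event. The sample document, the VM name, and the helper names are illustrative. The endpoint is reachable only from inside a VM, so the code parses a hand-written sample and never makes a network call.

```python
import json
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Endpoint as documented for the Instance Metadata Service; requests to it
# need the header "Metadata: true". Shown for reference, not called here.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def pending_events(document, vm_name):
    """Return events still in the Scheduled state that affect vm_name."""
    return [
        e for e in document.get("Events", [])
        if e.get("EventStatus") == "Scheduled" and vm_name in e.get("Resources", [])
    ]


def minutes_until(event, now):
    """Minutes remaining before the event may start (its NotBefore time)."""
    not_before = parsedate_to_datetime(event["NotBefore"])
    return (not_before - now).total_seconds() / 60


def acknowledgement_body(event_ids):
    """JSON body that would be POSTed to the endpoint to approve events early."""
    return json.dumps({"StartRequests": [{"EventId": i} for i in event_ids]})


# Hand-written sample document, for illustration only.
sample_document = {
    "DocumentIncarnation": 1,
    "Events": [
        {
            "EventId": "A1",
            "EventType": "Freeze",
            "ResourceType": "VirtualMachine",
            "Resources": ["vm-web-0"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
        }
    ],
}

now = datetime(2016, 9, 19, 18, 14, 47, tzinfo=timezone.utc)
events = pending_events(sample_document, "vm-web-0")
remaining = [minutes_until(e, now) for e in events]
body = acknowledgement_body([e["EventId"] for e in events])
```

A workload that is ready for maintenance could send the acknowledgement body straight away, while one that needs to drain connections could use the remaining minutes to do so first.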

Flash health events track availability problems, including VM degradation, in real time. This functionality helps you manage downtime through automated mitigation, investigations, and post-mortem analysis.

Explore the Azure products to which Azure emits high-quality VM health data to start your observability journey. They include Azure resource health, activity logs, Azure Resource Graph, Azure Monitor metrics, and the Event Grid system topic.