How to push serial console output logs to Stackdriver and set alerts in GCP

GCP provides a serial console for every VM. Each VM has 4 serial ports. A serial port is similar to a terminal window and supports both input and output. It is entirely text based, with no GUI. These ports help to troubleshoot boot manager or GRUB related issues.

What logs can we see from the serial console?

  • OS logs (During sysprep)
  • BIOS
  • System level entries

Most of the system level entries are captured in Serial Port 1.

To read more about the serial console, please refer to the official documentation from GCP.

Why are we checking these logs?

We were setting up an autoscaling group in a private subnet, using NAT for outbound internet communication. During autoscaling, a fully hardened Windows image is launched. Once sysprep is done, the instance contacts the Microsoft KMS server, gets a new product key, and activates the Windows OS. But in our case the communication to the KMS server was failing, so Windows was not activated during launch.

All of this information shows up in the serial console. So we decided to push the logs to Stackdriver and set an alert for when Windows is not activated.

Ways to get the serial console logs

Problem with getSerialPortOutput API

To call this API we must use OAuth, which means we need another API call just to generate the token. API keys generated from GCP Credentials alone will not work. You will get the error below.

{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "required",
        "message": "Login Required",
        "locationType": "header",
        "location": "Authorization"
      }
    ],
    "code": 401,
    "message": "Login Required"
  }
}
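
For reference, here is a minimal sketch of what an authenticated call could look like, assuming the gcloud CLI is installed and already logged in so it can mint the OAuth access token. The helper names serial_port_url and get_serial_output are made up for this example; the URL follows the Compute Engine v1 REST path for getSerialPortOutput.

```shell
# Sketch only: helper names are hypothetical, not part of any GCP tooling.

# Build the REST URL for getSerialPortOutput.
# $1=project ID, $2=zone, $3=instance name, $4=serial port number (defaults to 1)
serial_port_url() {
  printf 'https://compute.googleapis.com/compute/v1/projects/%s/zones/%s/instances/%s/serialPort?port=%s\n' \
    "$1" "$2" "$3" "${4:-1}"
}

# Call the API with an OAuth access token in the Authorization header.
# Assumes gcloud is installed and authenticated on this machine.
get_serial_output() {
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(serial_port_url "$@")"
}
```

You would call it as get_serial_output your_project_name instance_zone vm_name. Without the Authorization header, the same request returns the 401 shown above.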

How we implemented the solution:

  • Attach to the VM a service account that has access to view VM metadata.
  • Use the gcloud CLI command get-serial-port-output to fetch the logs at startup.
  • Push those logs to Stackdriver Logging.
  • Grep for the phrase NOT ACTIVATE.
  • If it is found, send the alert.
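
The grep step above could be sketched roughly like this in a shell script. This is only an illustration under assumptions: the helper names and the log message text are made up, and it writes to Stackdriver only when the phrase is present (the alert itself is configured later in Stackdriver).

```shell
# Sketch only: helper names and the log message text are hypothetical.

# Succeeds when the text on stdin contains the phrase in $1.
# -F treats the phrase as a fixed string, -q suppresses output.
has_phrase() {
  grep -qF -- "$1"
}

# $1=instance name, $2=zone
check_activation() {
  if gcloud compute instances get-serial-port-output "$1" --zone="$2" 2>/dev/null \
      | has_phrase "NOT ACTIVATE"; then
    # Write one ERROR entry that the Stackdriver alert can match on.
    gcloud logging write windows-serial-log "Windows NOT ACTIVATED on $1" --severity=ERROR
  fi
}
```

Calling check_activation vm_name instance_zone writes nothing when the phrase is absent, so the log stays quiet for healthy instances.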

In this blog we are going to test the alert by matching the phrase "Starting startup scripts" in the log.

Creating the Service Account:

Go to IAM & Admin -> Service Accounts

  • Create Service Account.
  • Attach the roles below:
      Compute Engine -> Compute Instance Admin (v1)
      Logging -> Logs Writer
      Monitoring -> Monitoring Metric Writer
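
If you prefer the command line over the console, the same setup might look like the sketch below. The function names and the display name are assumptions for illustration; the role IDs are the ones behind the three console role names listed above.

```shell
# Sketch only: function names and the display name are hypothetical.

# Role IDs for Compute Instance Admin (v1), Logs Writer, Monitoring Metric Writer.
SA_ROLES="roles/compute.instanceAdmin.v1 roles/logging.logWriter roles/monitoring.metricWriter"

# $1=service account name, $2=project ID
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Create the service account, then bind each role to it at the project level.
# $1=service account name, $2=project ID
create_serial_log_sa() {
  gcloud iam service-accounts create "$1" --project="$2" \
    --display-name="Serial log writer"
  for role in $SA_ROLES; do
    gcloud projects add-iam-policy-binding "$2" \
      --member="serviceAccount:$(sa_email "$1" "$2")" --role="$role"
  done
}
```

For example, create_serial_log_sa serial-logger your_project_name, where serial-logger is just a placeholder name.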

Add startup script:

The Google Cloud SDK must be installed on the VMs (if it is not, please bake it into your golden image).

Getting logs for a Windows instance:

In Custom Metadata, add the PowerShell script below as the startup script.

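Key: sysprep-specialize-script-ps1
Value:
# Fetch the serial output and keep 1 line before and 10 lines after the match
$msg = gcloud compute --project=your_project_name instances get-serial-port-output vm_name --zone=instance_zone | Select-String -Pattern "Starting startup scripts" -Context 1,10
# Strip the ">" marker that Select-String puts on the matching line, and remove line breaks
$log = $msg -replace ">","" -replace "\n"
# Write the result to Stackdriver Logging as an ERROR entry
gcloud logging write windows-serial-log $log --severity=ERROR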

If you want to capture something for Linux, use the following Cloud CLI command in the Automation script.

In the Automation section, add the script below.

gcloud logging write linux-serial-log "$(gcloud compute --project=your_project_name instances get-serial-port-output vm_name --zone=instance_zone | awk '/Checking instance license/ ? c++ : c')" --severity=ERROR
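
The awk expression in that command can look cryptic, so here is a small sketch that runs it on made-up sample lines (the sample text and the function name after_match are assumptions, not real serial output). The counter c is 0 until the first matching line, and c++ returns the old value, so the first matching line itself is skipped and every line after it is printed.

```shell
# Same awk expression as in the command above, wrapped in a hypothetical helper.
after_match() {
  awk '/Checking instance license/ ? c++ : c'
}

# Made-up sample input: only the lines after the first match are printed.
printf '%s\n' 'line A' 'Checking instance license' 'line B' 'line C' | after_match
```

Here the output is line B followed by line C. If the phrase never appears, nothing is printed and an empty message is sent to gcloud logging write.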

Get the logs in StackDriver Logging:

  • Go to Logs Viewer.
  • Select Global as the resource.
  • Select windows-serial-log as the log name.
  • For Linux select linux-serial-log.

We wrote these logs with --severity=ERROR.

So they show up with an orange flag. If you use --severity=CRITICAL instead, they show up in red.

  • Expand the log entry and you can see the "Starting startup scripts" line.
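
The same check can also be done from the command line. A minimal sketch, assuming the entries were written with gcloud logging write as above; the helper names are made up for this example.

```shell
# Sketch only: helper names are hypothetical.

# Full log name as Stackdriver stores it.
# $1=project ID, $2=log name (windows-serial-log or linux-serial-log)
serial_log_name() {
  printf 'projects/%s/logs/%s\n' "$1" "$2"
}

# Print the severity and text of the 5 most recent entries in that log.
read_serial_logs() {
  gcloud logging read "logName=\"$(serial_log_name "$1" "$2")\"" \
    --project="$1" --limit=5 --format="value(severity,textPayload)"
}
```

For example, read_serial_logs your_project_name windows-serial-log.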

Setting up Alert:

  • Click on the textPayload and click Show matching entries.
  • In the filter box, type the filter below.
resource.type="global"
logName="projects/YOUR_PROJECT_NAME/logs/windows-serial-log"
textPayload:"Starting startup scripts"
  • Click Submit Filter. It'll show the matching logs.
  • Then click Create Metric.
  • Give it a name and description.
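
The metric can also be created with the gcloud CLI instead of the console. This is a rough sketch: the function names and the description text are assumptions, and the filter is the same three conditions as above joined with AND.

```shell
# Sketch only: function names and the description text are hypothetical.

# Build the same filter that was typed into the Logs Viewer.
# $1=project ID
startup_log_filter() {
  printf 'resource.type="global" AND logName="projects/%s/logs/windows-serial-log" AND textPayload:"Starting startup scripts"\n' "$1"
}

# Create a log-based counter metric from that filter.
# $1=metric name (any name you like), $2=project ID
create_startup_metric() {
  gcloud logging metrics create "$1" --project="$2" \
    --description="Serial console entries containing Starting startup scripts" \
    --log-filter="$(startup_log_filter "$2")"
}
```

For example, create_startup_metric startup-scripts-seen your_project_name, where startup-scripts-seen is only a placeholder metric name.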

Now it's time to play in Stackdriver.

  • Go to Alerting -> Create a Policy
  • Click Add conditions -> Metric Threshold/Rate Change/Absence.
  • In Target, type the name of the metric you created.
  • In Configuration, set the threshold you want.
  • Click Save Condition.
  • Now in Notifications add the email addresses.
  • Give a name and Save the Policy.
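
For those who want to keep the policy in a file, here is a sketch of roughly what the same condition could look like as a Cloud Monitoring alert policy in JSON. Treat it as an assumption-heavy starting point: the function name, display names, threshold, and alignment period are all made up for illustration, and notification channels (the email addresses) still have to be attached separately.

```shell
# Sketch only: function name, display names, and threshold values are hypothetical.

# Print an alert policy that fires when the log-based metric goes above 0.
# $1=name of the log-based metric created earlier
alert_policy_json() {
  cat <<EOF
{
  "displayName": "Serial console alert: $1",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "$1 above 0",
      "conditionThreshold": {
        "filter": "metric.type=\"logging.googleapis.com/user/$1\" AND resource.type=\"global\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0,
        "duration": "0s",
        "aggregations": [
          { "alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE" }
        ]
      }
    }
  ]
}
EOF
}
```

You could save it with alert_policy_json your_metric_name > policy.json and then try gcloud alpha monitoring policies create --policy-from-file=policy.json. That command sits in the alpha track, so whether it is available depends on your SDK version.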

Now launch a VM and it'll push the logs to Stackdriver. If it finds a matching pattern like "Starting startup scripts", it'll send the email.

It's time for you to play with the serial console logs and Stackdriver. Hope you find this helpful. If you are going to try this too, please feel free to give some claps before that :)