Troubleshooting Basic Windows and Linux VM Issues on GCP

Omkar Nadkarni
Niveus Solutions
5 min read · Jul 5, 2024

Introduction

While working with virtual machines (VMs) in GCP, we often face different issues and spend a lot of time troubleshooting them. Here are some common issues, along with the tools and approaches GCP provides to make us more productive while working with VMs. This blog links to Google documentation that will help in solving these common issues.

Common issues

As system admins, we run into the common issues below. Some are misconfigurations; a few are design or capacity issues.

  • Error while creating virtual machines.
  • Error while creating images: unable to import VHD files and convert them to images, or to import an image from AWS or Azure.
  • Error while migrating a VM from on-premises or another cloud provider.
  • RDP or SSH issues.
  • Disk space management.
  • Services spitting out errors or warnings, or services in a failed state.
  • Resource alerts such as CPU/memory/disk utilization errors.
  • Lack of backups of VMs and workloads.
  • Downloading and installing packages.
  • Route or Firewall issues.
  • Permission denied error.
  • MIG scaling issue.
  • MIG startup or deployment times.

Solution

1. Error while creating virtual machines

  • While creating VMs, it is important that we have access to both the service project and the host project in order to use the Shared VPC network and subnet.
  • A subnet must exist in the region in which we are creating the VM.
  • We need access to the trusted (golden) image project to use the right image for the organization. If no organization policy for trusted images is enforced, we can use Google public images.
  • The organization policy for resource locations must be adhered to.
  • We need to decide whether disks are encrypted with customer-managed encryption keys (CMEK) or with Google-managed keys.
  • Firewall rules should allow access to the VM over IAP.
  • If the VM does not boot, use the serial console to check the error and troubleshoot further.
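
The serial console check in the last point can be partly automated. The snippet below is a minimal sketch, not an official tool: it scans saved serial console text (for example, output captured with `gcloud compute instances get-serial-port-output`) for a few well-known Linux boot failure messages. The marker list and the name `find_boot_problems` are illustrative assumptions; extend them for the images in use.

```python
import re

# Illustrative, non-exhaustive markers of common Linux boot failures (assumed list).
BOOT_FAILURE_MARKERS = {
    "kernel panic": re.compile(r"kernel panic", re.IGNORECASE),
    "emergency mode": re.compile(r"emergency mode", re.IGNORECASE),
    "disk full": re.compile(r"no space left on device", re.IGNORECASE),
    "filesystem check": re.compile(r"fsck.*(fail|error)", re.IGNORECASE),
}


def find_boot_problems(serial_output: str) -> dict[str, list[str]]:
    """Return {problem label: [matching lines]} for lines that hit a marker."""
    problems: dict[str, list[str]] = {}
    for line in serial_output.splitlines():
        for label, pattern in BOOT_FAILURE_MARKERS.items():
            if pattern.search(line):
                problems.setdefault(label, []).append(line.strip())
    return problems
```

An empty result only means none of the listed markers appeared; the full serial output should still be read when a VM fails to boot.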

2. Error while creating images in GCP

  • Image creation can fail with different errors; common ones relate to customisation of the OS, lack of permissions, or mount points.
  • Check the logs and troubleshoot further.

3. Error while migrating VMs from on-premises or another cloud provider

  • If migrating from on-premises or from a supported cloud provider, use the Migrate to Virtual Machines tool.
  • We can also import images or, alternatively, use a partner solution.
  • Before migrating VMs, ensure we have SSH access to the source VM, because if the guest environment is not installed after migration, it is not possible to connect through IAP.
  • If the OS is not supported, it is best to create a new VM with a supported OS in GCE rather than migrate.
  • Ensure we have the right access in GCP to create VMs.
  • Ensure access to the KMS keys when using customer-managed encryption keys (CMEK).

4. RDP or SSH issues

  • RDP or SSH issues can be solved in GCP with various tools.
  • First, check whether the VM has only a private IP or also a public IP. A VM with only a private IP is reached through IAP, which creates a tunnel from the internet to the VM's private IP after authentication. This will not work if the firewall rule allowing IAP's source range (35.235.240.0/20) is missing or if the guest agent is not installed.
  • Check the error and troubleshoot further by checking connectivity and the VM serial console.
  • If the VM has startup issues, refer to Google's documentation on troubleshooting VM startup.
  • Check whether the RDP or SSH port number has been changed from the default.
  • Check for access issues.
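
The firewall part of the IAP check can be sketched in code. The example below assumes the firewall rules have been exported as JSON (for instance with `gcloud compute firewall-rules list --format=json`) and loaded into a list of dicts. It checks whether any enabled ingress allow rule admits IAP's TCP forwarding range, 35.235.240.0/20, on a given port. The function names are hypothetical, and the check is deliberately simplified: it ignores target tags, service accounts, and higher-priority deny rules.

```python
import ipaddress

# Source range used by IAP TCP forwarding.
IAP_RANGE = ipaddress.ip_network("35.235.240.0/20")


def _port_matches(port_specs, port):
    """True if `port` is covered by specs such as "22" or "8000-9000".

    A missing or empty `ports` list means all ports of that protocol.
    """
    if not port_specs:
        return True
    for spec in port_specs:
        low, _, high = spec.partition("-")
        if int(low) <= port <= int(high or low):
            return True
    return False


def iap_allowed(firewall_rules, port):
    """Check whether an enabled ingress allow rule admits IAP traffic on a TCP port."""
    for rule in firewall_rules:
        if rule.get("direction") != "INGRESS" or rule.get("disabled"):
            continue
        sources = [
            ipaddress.ip_network(r, strict=False)
            for r in rule.get("sourceRanges", [])
        ]
        # Only IPv4 ranges can contain the IAP range.
        covers_iap = any(
            src.version == IAP_RANGE.version and IAP_RANGE.subnet_of(src)
            for src in sources
        )
        if not covers_iap:
            continue
        for allowed in rule.get("allowed", []):
            if allowed.get("IPProtocol") in ("tcp", "all") and _port_matches(
                allowed.get("ports"), port
            ):
                return True
    return False
```

Calling `iap_allowed(rules, 22)` for SSH or `iap_allowed(rules, 3389)` for RDP gives a quick first answer before digging into Connectivity Tests or the serial console.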

5. Access denied error while accessing Google services via an application installed on the VM

6. Disk space management

  • Use Cloud Monitoring to set up disk space alerts.
  • Log in to the Linux or Windows VM and find out where the disk space is being consumed: whether log rotation is set, whether an app or database is consuming the space, and so on.
  • Troubleshoot further with this Google article.
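
To find where disk space is going once logged in to the VM, a small script can help. The sketch below uses only the Python standard library and works on both Linux and Windows; the function names are illustrative, and it is a starting point rather than a replacement for the OS's own tools.

```python
import os
import shutil


def usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_in_bytes, path) pairs for the immediate
    subdirectories of `root`, biggest first."""
    sizes = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        size = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    size += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]
```

For example, `largest_subdirs("/var")` on a Linux VM quickly shows whether logs, caches, or database files are the main consumers.
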

7. Services spitting out errors or warnings, or services in a failed state

  • Logs are sent from the VM to Cloud Logging (with the Ops Agent installed), so use Cloud Logging to investigate further.
  • Set log-based alerts if the errors are critical.
  • For services in a failed state, log in to the VM and troubleshoot further. If the service is important, set a log-based alert in Cloud Logging.
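
When checking Cloud Logging for a specific VM, it helps to start from a narrow filter. The sketch below builds a query in the Logging query language for one instance and a minimum severity. The helper name and the `text_field` default are assumptions: depending on the agent and log type, the message may live in `jsonPayload.message` rather than `textPayload`.

```python
def gce_error_filter(instance_id, min_severity="ERROR", text=None,
                     text_field="textPayload"):
    """Build a Cloud Logging filter for one Compute Engine instance.

    `text`, if given, is matched as a substring of `text_field`.
    """
    parts = [
        'resource.type="gce_instance"',
        f'resource.labels.instance_id="{instance_id}"',
        f"severity>={min_severity}",
    ]
    if text:
        parts.append(f'{text_field}:"{text}"')
    return " AND ".join(parts)
```

The resulting string can be pasted into the Logs Explorer, passed to `gcloud logging read`, or reused as the filter of a log-based alert.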

8. Resource alerts like CPU/Memory/Disk utilization error

  • Use a Cloud Monitoring dashboard to monitor resources, and Cloud Monitoring alerts to send notifications.
  • Find memory leaks or CPU thread leaks by using OS-based utilities.
  • Use a MIG to scale the number of VMs up or down based on resource consumption.
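
As one example of an OS-level check on a Linux VM, the sketch below parses the contents of `/proc/meminfo` and computes how much memory is really in use, treating reclaimable cache as free by relying on the `MemAvailable` field. The function names are illustrative; on Windows, Task Manager or Performance Monitor plays the same role.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Memory in use as a percentage, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total
```

On the VM, this can be run as `memory_used_percent(parse_meminfo(open("/proc/meminfo").read()))`. Sampling it over time for a suspect process's host is a simple way to confirm a steady upward trend before reaching for deeper profiling tools.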

9. Backup of vms and workloads

  • Use Cloud Monitoring to set alerts for backups that were not taken or that finished with errors.
  • Use cloud-native backup tools, which enable us to troubleshoot errors via logs.
  • Check the error and troubleshoot further in case of file locks or other errors.

10. Downloading and installing packages

  • Use VM Manager; it helps schedule the downloading and installing of packages.
  • Ensure the VM has internet access, or can reach a Satellite server or WSUS server, to download packages.

11. Route or firewall issue

  • Use Connectivity Tests to check for route or firewall issues.
  • Please refer to this link.

12. Permission denied error

  • Use Cloud Logging; it will help diagnose the error.
  • IAM errors are logged there.

13. MIG scaling issue

  • MIG scaling issues can occur because the OS or the application takes a long time to start, or fails due to underlying issues such as the connection to a database.
  • We also need to ensure that enough IP addresses are available in the allocated subnet for scaling.
  • Ensure no quota limits are causing the scaling issue. Refer to this article for any MIG-related issues.
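
The IP address check can be done with simple arithmetic. GCP reserves four addresses in every primary IPv4 subnet range (network, default gateway, second-to-last, and broadcast), so a /24 offers 252 usable addresses rather than 256. The sketch below, with hypothetical function names, estimates whether a subnet has room for a MIG to reach its maximum size; `addresses_in_use` is an input we have to look up ourselves, since other resources in the subnet also consume addresses.

```python
import ipaddress

# Network, default gateway, second-to-last, and broadcast addresses.
RESERVED_PER_SUBNET = 4


def usable_addresses(cidr):
    """Number of addresses in a primary IPv4 subnet range that VMs can use."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - RESERVED_PER_SUBNET, 0)


def scaling_headroom(cidr, addresses_in_use, current_instances, max_instances):
    """Free addresses left once the MIG grows to `max_instances`.

    A negative result means the subnet is too small for full scale-out.
    """
    free_now = usable_addresses(cidr) - addresses_in_use
    growth = max_instances - current_instances
    return free_now - growth
```

For instance, a /28 subnet with 10 addresses already in use and a MIG that may grow from 4 to 10 instances returns -4, meaning the autoscaler would run out of addresses before reaching its maximum.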

14. MIG startup or deployment times

  • MIG startup might take time if the OS or app takes long to load. Check each layer and find areas for improvement.
  • If there is a startup script, review the logic by which binaries are downloaded and applications are bootstrapped. Resolve underlying issues such as failed binary downloads, services restarted in the wrong sequence, or permission errors.
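
One way to find the slow layer is to time each phase of the bootstrap. The sketch below assumes the startup logic is (or can be wrapped in) Python; `StartupTimer` and the step names are hypothetical. The same idea applies to shell scripts by printing timestamps around each phase.

```python
import time
from contextlib import contextmanager


class StartupTimer:
    """Record how long each named bootstrap step takes."""

    def __init__(self):
        self.durations = {}  # step name -> seconds

    @contextmanager
    def step(self, name):
        start = time.monotonic()
        try:
            yield
        finally:
            # Recorded even if the step raises, so failures are timed too.
            self.durations[name] = time.monotonic() - start

    def slowest(self, top=3):
        """Return up to `top` (name, seconds) pairs, slowest first."""
        ranked = sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top]
```

Wrapping phases such as `with timer.step("download-binaries"): ...` and logging `timer.slowest()` at the end of the script shows at a glance which phase dominates the instance's time to readiness.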

15. Windows license error

When using BYOL on sole-tenant nodes, we need to ensure that our own KMS host is reachable; otherwise, Windows VMs activate against Google's KMS server. Please ensure routing and firewall rules allow access to kms.windows.googlecloud.com (35.190.247.13) on TCP port 1688.
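
A quick reachability test can confirm whether routing and firewall rules let the VM reach the KMS address. The sketch below is a generic TCP connect check, assuming Python is available on the VM; port 1688 is the standard KMS activation port, and `can_connect` is an illustrative helper name.

```python
import socket

KMS_HOST = "35.190.247.13"  # kms.windows.googlecloud.com, as given in the text
KMS_PORT = 1688             # standard KMS activation port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect(KMS_HOST, KMS_PORT)` from the Windows VM returns `False` when a route or egress firewall rule is blocking activation traffic, which narrows the problem down before looking at the licensing configuration itself.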

Conclusion

With GCP, managing VMs becomes easy thanks to the tools available to manage them efficiently. There are tools for troubleshooting, patching, backup, and keeping VMs secure. In this blog, we have tried to cover the tooling available for troubleshooting a few common issues and scenarios.

Omkar Nadkarni is a principal cloud architect passionate about technology and its impact on business. His skill set spans GCP, Azure, AWS, DevOps, and infrastructure.