Why you should not use Google Cloud.

Punch a Server
Jun 29, 2018


FINAL UPDATE (25-March-2022): Because of this article, we keep getting asked whether we still use GCP and whether we would recommend it to others. The answer is ‘YES, WE DO’, even more than before. GCP is an excellent service, just like AWS and Azure (we use all three). Some of GCP’s products (BigQuery, Cloud Run, Spanner, all of which we use) are insanely good and have no parallels (IMHO). You should try GCP and not let this article (which was written many years ago in a state of anger) influence you. And, NO, we did not get any ‘stuff’ from GCP to say this.

UPDATE (18-July-2018): GCP has updated their account management policies to be friendlier to GCP customers. https://cloudplatform.googleblog.com/2018/07/improving-our-account-management-policies-to-better-support-customers.html

UPDATE (2-July-2018): Thanks to the people from the GCP support team who have reached out and assured us these incidents will not be repeated. Here’s a direct message from them … “there is a large group of folk (within GCP) interested in making things better, not just for you but for all GCP customers.”

Follow discussions here.
HACKERNEWS: https://news.ycombinator.com/item?id=17431609
REDDIT: https://www.reddit.com/r/programming/comments/8v4wrh/why_you_should_not_use_google_cloud_this_is_about/

Note: This post is not about the quality of Google Cloud products. They are excellent, on par with AWS. This is about the “no-warnings-given, abrupt way” they pull the plug on your entire system if they (or the machines) believe something is wrong. This is the second time this has happened to us.

Background.

We have a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. We have control centers with wall-to-wall screens showing dashboards full of metrics that are monitored 24/7. Asset Managers use this system to monitor the health of individual wind turbines and solar strings in real time and to carry out immediate corrective maintenance. Development and Forecasting teams use the system to run algorithms on data in BigQuery. All these actions translate directly to revenue. We deal in ‘wind/solar energy’, a perishable commodity. If we overproduce, we cannot store the surplus and sell it later. If we underproduce, there are penalties to be paid. For this reason, assets need to be monitored 24/7 so that output can be adjusted up or down to match the needs of the power grid and the power purchase agreements in place.

What happened.

Early this morning (28 June 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some ‘potential suspicious activity’ and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, App Engine and the databases are unreachable, and multiple Firebase projects say I’ve been downgraded and have therefore exceeded limits.

It’s a lonely cloud.

Customer service chat is off. There’s no phone number to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government-issued photo ID of the card holder. Great, let’s wake up the CFO, who happens to be the card holder.

We will delete your project within 3 business days.

“We will delete your project unless the billing owner corrects the violation by filling out the Account Verification Form within three business days. This form verifies your identity and ownership of the payment instrument. Failure to provide the requested documents may result in permanent account closure.”

What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

I fill in the form with the details and, thankfully, within 20 minutes all the services start coming back to life. The first time this happened, we were down for a few hours. This time, in all, we lost everything for about an hour. An automated email arrives apologizing for the ‘inconvenience’ caused. Unfortunately, The Machine has no understanding of the ‘quantum of inconvenience’ caused.

You just can’t turn things off and then ask for an explanation.

I understand Google’s need to monitor and prevent suspicious activity. But how you handle things after some suspicious activity is detected matters a lot. You need a human element here — one that cannot be replaced by any amount of code/AI. You just can’t turn things off and then ask for an explanation. Do it the other way round.

This is the first project we built entirely on Google Cloud. All our previous projects were built on AWS. In our experience, AWS handles billing issues in a much more humane way. They warn you about suspicious activity and give you time to explain and sort things out. They don’t kick you down the stairs.

I hope the GCP team is listening and changes things for the better. Until then, I’m never building any project on GCP.
