By Kenneth MacArthur and Nicholas Brenwald
King’s recent attendance at Google Cloud Next ’18 London offered a stage (or several!) where we could share our knowledge and experience working with Google Cloud Platform (GCP) thus far, and our ideas for leveraging GCP in the future.
King took part in a series of sessions at Next about our ongoing GCP analytics platform migration, including Rethinking Big Data: Moving from Complex Analytics to Actionable Insights with Google Cloud, where we had the opportunity to share with Google’s Sudhir Hasbe the progress of our petabyte-scale migration to GCP.
Below is an edited version of our interview with Sudhir.
What made King start looking at public cloud?
We started to notice a general industry trend towards public cloud — both for infrastructure in general and particularly in the analytics and machine learning (ML) space, where the direction of travel seemed pretty clear — hardly anyone is doing greenfield data warehouse deployments on-premise these days, for example.
At the same time as that, we started to see some themes emerging inside our business that also pointed us towards looking at public cloud — teams wanting to be able to more easily provision multiple environments, the business wanting us to be able to support multiple tenants, some of our colleagues working with machine learning wanting to have access to the latest ML hardware: GPUs, TPUs and so on. We had a generally reliable stack, but with some reliability issues here and there. And we were asking ourselves: do we want to be in the low-level infrastructure operations game, or do we want to focus higher up the stack?
Taking those two strands together — the wider industry trends, and the themes inside our business — we thought: there’s something here; we need to look at this further.
Why did King choose GCP?
The first use case we wanted to tackle with cloud was the replacement of our on-premise data warehouse. King used to store all of its data in a 20 PB, 500-node, on-premise Hadoop cluster. Most users would query this data using Hive, or perform more complex analytics using MapReduce or Spark. For performance reasons, we also used to copy a subset of our data to an in-memory analytics database that, whilst quick, didn’t scale so well and really struggled with our volumes.
Over many months, we did an extensive evaluation of various cloud solutions and found that GCP provided the best analytics capabilities and performance for our needs. In addition, we really believe in Google’s future vision for the platform, and when you consider Google’s contributions to the analytics and ML space with things like Beam and TensorFlow, we felt that you’re clearly leading the market.
We also felt that options such as BigQuery flat-rate pricing, autoscaling and transparent billing allowed us to present a really compelling argument for choosing GCP. The billing aspect has been really eye-opening. Previously, if a team needed a reserved YARN pool, or to store a large volume of data in HDFS, it was really difficult to determine the cost. Now we’re able to accurately determine this for all new initiatives and then decide if they are actually worth pursuing.
Which GCP products are King using?
BigQuery forms the core of our new analytics platform, and we are already heavily invested, with over 10 petabytes stored. We love it: it takes us out of the capacity planning game and, being a fully managed service, allows us to focus on use cases actually relevant to King and gaming.
Each day, we ingest around 50 billion game events into BigQuery using Dataflow. In fact, we’ve already processed over 8 trillion game events this way, and we find Dataflow provides a really neat and cost-effective way to ingest data. We make heavy use of autoscaling. Game launches used to be really demanding; even with the best playtesting in the world, it’s really hard to predict how successful a game will be and capacity plan accordingly. With Dataflow autoscaling, if we get spikes in traffic, our jobs now scale out accordingly.
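To put those volumes in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it uses the rounded figures quoted above and assumes, purely for illustration, that events arrive evenly across the day and that the daily rate has been constant, neither of which will hold exactly in practice.

```python
# Rounded figures quoted in the interview.
EVENTS_PER_DAY = 50_000_000_000       # "around 50 billion game events" per day
TOTAL_EVENTS = 8_000_000_000_000      # "over 8 trillion game events" so far
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: events are spread evenly over the day (real traffic is spiky).
avg_events_per_second = EVENTS_PER_DAY // SECONDS_PER_DAY

# Assumption: the daily rate was constant for the whole period.
equivalent_days = TOTAL_EVENTS / EVENTS_PER_DAY

print(f"Average rate: ~{avg_events_per_second:,} events/second")
print(f"8 trillion events is about {equivalent_days:.0f} days at today's rate")
```

The average works out at roughly 580,000 events per second. Peaks around a game launch will sit well above that average, which is exactly the situation autoscaling is meant to absorb.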
We’re also quite heavy Dataproc users. We were basically a Hadoop shop prior to embarking on this migration, so we have years of IP written in Hive, MapReduce and Spark. We don’t have the time or desire to rewrite all of these old jobs, so we use Dataproc as a really useful migration tool, allowing people to migrate to the cloud quickly without much thought or attention. Once our migration is complete, we can always go back and revisit these jobs, to see if they would be better handled with a different product such as Dataflow or ML Engine.
Adding to what Nick touched on around ML Engine, we have a team at King that is really focused on applications of AI for our business and they’ve been all-in on GCP for some time — using, amongst other things, ML Engine, GKE and Cloud Pub/Sub. They’ve built a deep learning tool on GCP that helps our creative teams design the most enjoyable content for our players, which is proving to be super useful.
More broadly, with the data teams and the machine learning folks now using GCP, we’re seeing some of our colleagues also beginning to look at the platform, and a hybrid cloud strategy starting to take shape at King.
What are the key business benefits King is seeing from its GCP migration?
It’s still early days — we’re just coming towards the final stages of our migration at the moment. But there are three benefits we’ve already seen that are worth calling out:
Previously, the bulk of the analytics queries at King were run on Hive, and now a lot of that work has moved to BigQuery — which has provided a step change in performance for our users. We also enjoy the tooling around BigQuery — a clean web UI, command-line tools, client libraries, and so on.
One of the things that drove us to look at GCP was the chance to unify. We used to have three SQL analytics engines at King; with the move to GCP, we’re consolidating on one in BigQuery. For our users, this means only having to get familiar with one SQL dialect; for our data management teams, it means generally only having to land data in one place; and for the organisation as a whole, it means we can focus on training and tooling for one platform rather than several — which frees up cycles which we can spend on other topics.
We love the project construct in GCP. It may sound a bit administrative, but it has allowed our teams to be much more agile than they’ve been able to be in the past. Teams own their own environments in GCP at King, which means they can provision VMs or create new BigQuery datasets without having to raise a ticket to a central team. They can use multiple projects to create different environments (production, QA, development and so on), which before often meant new physical metal, with all of the associated costs and delays. And we are also using different projects to meet requirements from the business to support multiple tenants on our platform, with data clearly segregated between each tenant. That is something we tried to solve in our on-premise environment but eventually gave up on, since most legacy big data solutions just don’t have native multi-tenancy support.
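As an illustration of the project-per-environment and project-per-tenant idea, the sketch below generates one project ID for each tenant, team and environment combination. The naming scheme, the tenant and team names, and the helper functions are all hypothetical and are not King’s actual convention. The only GCP fact relied on is the project ID format: 6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen.

```python
import re
from itertools import product

# GCP project ID format: 6-30 chars, lowercase letters, digits and hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def project_id(tenant: str, team: str, env: str) -> str:
    """Build a project ID using a hypothetical <tenant>-<team>-<env> scheme."""
    candidate = f"{tenant}-{team}-{env}"
    if not PROJECT_ID_RE.fullmatch(candidate):
        raise ValueError(f"not a valid GCP project ID: {candidate!r}")
    return candidate


def plan_projects(tenants, teams, envs):
    """List one project per (tenant, team, environment) combination."""
    return [project_id(t, team, env) for t, team, env in product(tenants, teams, envs)]


if __name__ == "__main__":
    # Hypothetical tenant, team and environment names.
    for pid in plan_projects(["tenant-a"], ["analytics"], ["prod", "qa", "dev"]):
        print(pid)
```

Keeping each tenant and environment in its own project means access and billing can be scoped per project, which is what gives the clear segregation described above.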
What tips does King have for other companies considering making a similar move?
My main advice would be to do everything in your power to reduce the migration period. At King, we took the conscious decision to complete our migration within 1 year, knowing full well that this would cause a certain amount of pain and discomfort.
We felt the alternative would have been far worse. Running in parallel both on-prem and in the cloud for an extended period would have cost us more financially and distracted our engineers from the more important value-add work that is so key in making King a world leader in mobile games.
Additionally, we found that once we started on the migration path, our whole engineering org got really excited about new possibilities and was really impatient to get stuck in using the new tools. Once they got a taste of working with GCP (the speed at which they could run their queries, the ease with which they could spin up new environments and track their costs), administering the old systems became the least appealing job in the company.
Where do you go from here?
As Nick mentioned, we took the conscious decision to move our data warehouse to GCP relatively quickly, so as we move into 2019, we’re looking forward to spending some cycles optimizing some of the things that we lifted and shifted from our old environment this year, and making them more cloud-native.
Looking beyond that, we’re excited to start looking at some of the unique capabilities that drove us to GCP in the first place — BigQuery ML, for example, which we’ve got a couple of people looking at already. As Airflow users, we’re also keen to look at Cloud Composer, and whether that can allow us to work even more efficiently.
Let’s not forget, GCP isn’t only a platform for analytics and machine learning. Whilst we expect certain key parts of our business to remain on-prem for now, we absolutely see our hybrid cloud strategy broadening over time. We are already starting to identify other areas within King where it may make sense to migrate fully or in part to GCP.
King has a great partnership with the Google Cloud team, and we enjoyed sharing our journey to GCP on the Next London main stage.
As we look towards the coming year, we would like to see Google continue to evolve and improve the platform — particularly in the following three areas:
- Enterprise-wide overview — We’d like to have better oversight out of the box of things like how many and what BigQuery queries our users are running across our organization and/or billing account(s), not just within a GCP project. We’d love this principle to be applied across GCP more broadly too — eg, a single pane of glass showing how much data we have stored in GCS across all projects.
- Supporting scale — We are a petabyte-scale GCP customer and sometimes hit up against some of the limits and quotas which, for entirely understandable reasons, exist in the platform. Google is clearly able to handle our scale, so we’d love, for example, to see some of the limits and quotas start to be automatically proportionate to the amount of data stored (and paid for), to reduce the friction of having to manually request increases through support tickets. We’d also like for there to be faster and easier ways to consume data stored in BigQuery from other GCP services.
- Manageability — We’d like to see richer APIs — particularly in BigQuery and in the billing area — to allow us to programmatically configure things, without using the UI or engaging support. (The UI is also important — we just want APIs too!) We’d also love to see finer controls on what users are and aren’t allowed to do — eg, “This group of users can use BigQuery and Dataflow but nothing else”, “This group can use Dataproc too — but only if they create clusters with at least x% preemptibles”, and so on.
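To make the last example concrete, the rule “at least x% preemptibles” could be expressed as a simple check like the one below. This is purely a hypothetical sketch of the kind of policy we have in mind, not an existing GCP API: the `ClusterRequest` type and both functions are invented for illustration, and “x%” is interpreted here as the share of all worker nodes that are preemptible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRequest:
    """Hypothetical description of a requested cluster (not a GCP type)."""
    primary_workers: int
    preemptible_workers: int


def preemptible_share(request: ClusterRequest) -> float:
    """Fraction of all workers that are preemptible (0.0 if there are none)."""
    total = request.primary_workers + request.preemptible_workers
    if total == 0:
        return 0.0
    return request.preemptible_workers / total


def is_allowed(request: ClusterRequest, min_share: float) -> bool:
    """Allow the cluster only if enough of its workers are preemptible."""
    return preemptible_share(request) >= min_share


if __name__ == "__main__":
    # Assumed policy threshold: at least half of the workers are preemptible.
    print(is_allowed(ClusterRequest(primary_workers=2, preemptible_workers=8), 0.5))
    print(is_allowed(ClusterRequest(primary_workers=10, preemptible_workers=0), 0.5))
```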
We hope both King and Google Cloud will have more exciting things to share at Next San Francisco ’19!
You can watch the full interview here.
See our latest jobs at king.com