What’s Next for Group Nine’s Data Platform Team?

Mimi Z
Group Nine Media Product Team
4 min read · May 7, 2019

Last month, the Group Nine Media Data Platform team attended Google Cloud Next, the Google Cloud Platform conference. After three days of incredible talks, demos, and learning, here are some of our takeaways.

Data on the Cloud is Awesome and other companies agree.

At Group Nine Media, we ingest millions of rows of data a day. The world of social media moves quickly, so it’s important that this data is easily queryable, agile, and, most importantly, reliable.

A year ago, we were running queries on our Postgres instance that took 40+ minutes. Run on BigQuery, those same queries take less than 5 seconds. As a result, our team has been able to move more quickly: we ingest more data, process more data, and make more data available and usable to others within Group Nine Media.

But don’t just take our word for it. Our experience echoes that of other companies that presented at Google Cloud Next. Take the leading music streaming service Pandora as an example. Pandora’s VP of Platform Services, Brett Uyeshiro, and Google Big Data and Analytics Architect, Blake DuBois, spoke about Pandora’s migration from on-premises BI and analytics to Google Cloud Platform. Pandora drastically sped up its analytical queries using the power of BigQuery.

Note that none of the BigQuery run times exceeds 1 minute! [Source]

Google’s managed services also mean less time spent on maintenance and more time on creating business value. Data spends less time stuck in the pipeline and more time in the hands of our insights team, ready to help inform decisions.

New technologies that excite us (and maybe you)!

Over the course of the conference, Google announced several new technologies that are relevant to our work. Here are a few:

Cloud Run

Some of you may be familiar with Google’s Cloud Functions. After using Cloud Functions, we’ve realized that despite their many pros (quick to spin up, on-demand, and easy to hook into Cloud Storage), they have their limitations. This is where Cloud Run comes in:

Cloud Run is a managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. Cloud Run is serverless: it abstracts away all infrastructure management, so you can focus on what matters most — building great applications. — Google Cloud Run

Cloud Run essentially allows the user to spin up a container with a URL in minutes. It only runs when an HTTP request comes in. Otherwise, it scales down to zero. As long as you can build the Docker container, Cloud Run will handle the rest: provisioning servers, setting up routing, and handling logging.

We’re excited about the potential of using Cloud Run for internal services, such as fetching API tokens, because it is easy to start up and can live both inside and outside of Kubernetes.
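To make that concrete, here is a minimal sketch of the kind of stateless HTTP service that could be packaged into a container for Cloud Run. It is not our production code: the token lookup is a placeholder, and the names `service_from_path`, `build_token_response`, `TokenHandler`, and `serve` are hypothetical. The one Cloud Run-specific detail is that the container is expected to listen on the port given in the `PORT` environment variable; 8080 is used here as a fallback for local runs.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def service_from_path(path: str) -> str:
    """Map a request path such as '/facebook' to a (hypothetical) service name."""
    return path.strip("/") or "default"


def build_token_response(service: str) -> dict:
    """Return a placeholder payload; a real service would fetch a token here."""
    return {"service": service, "token": "placeholder-token"}


class TokenHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build a small JSON body from the requested path.
        payload = build_token_response(service_from_path(self.path))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    """Start the server; intended to be called by the container's start command."""
    # Cloud Run passes the port to listen on via the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), TokenHandler).serve_forever()
```

Because the handler keeps no state between requests, instances can be started and stopped freely, which is what lets a platform like Cloud Run scale the service down to zero when no requests arrive.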

Cloud Code

Cloud Code is a plug-in for VS Code and IntelliJ. Using the power of Skaffold, Cloud Code helps developers write, debug, and deploy code to a local or remote Kubernetes cluster. The ability to debug live code running on a Kubernetes cluster on any of the major cloud providers brings agility to the Kubernetes development process.

Cloud Dataflow SQL

Cloud Dataflow SQL allows users to start Dataflow streaming or batch jobs from the BigQuery UI using SQL. As the Data Platform team at Group Nine, we are equipped and confident in our ability to write custom Beam jobs for Dataflow. The prospect of Cloud Dataflow SQL is exciting because of this fact, not despite it.

Cloud Dataflow SQL enables analysts who may not have the technical background to write Dataflow jobs to write them anyway. By giving our analysts and data scientists the ability to use the power of Apache Beam to query and join different sources of data (files, Pub/Sub streams, and other BigQuery tables) independently, data can be organized and arrive at its intended destination faster. Meanwhile, we can focus on building custom solutions for difficult-to-parse third-party payloads or expanding the catalog of data that we offer to the rest of the business.
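As a rough illustration of the kind of join described above, here is a sketch in plain Python. It is neither Dataflow SQL nor a Beam job: the event list stands in for a stream of messages, the dictionary stands in for a reference table, and all field names and sample values are made up for the example. In SQL terms, it behaves like an inner join of events to regions on the `state` column.

```python
from typing import Dict, Iterable, Iterator


def enrich_events(events: Iterable[dict], regions: Dict[str, str]) -> Iterator[dict]:
    """Inner-join each event to a region lookup on the 'state' key."""
    for event in events:
        region = regions.get(event["state"])
        if region is not None:  # inner join: drop events with no matching row
            yield {**event, "region": region}


# Hypothetical sample data standing in for a message stream and a lookup table.
events = [
    {"post_id": 1, "state": "NY", "views": 120},
    {"post_id": 2, "state": "CA", "views": 340},
    {"post_id": 3, "state": "ZZ", "views": 15},
]
regions = {"NY": "East", "CA": "West"}

enriched = list(enrich_events(events, regions))
```

The appeal of a SQL interface is that an analyst could express this join declaratively and leave the streaming mechanics to the platform, instead of writing and deploying a loop like this one as a custom job.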

Cloud Dataflow SQL is expected to be in public alpha sometime this month. Register for it here (and Data Catalog too, if you’re interested).

We had an amazing three days at Google Cloud Next. Although we have highlighted only three of the many technologies announced at the conference, we’re eager to try out others as well. Please tell us in the comments about your experience with Google Cloud Next or GCP, or about what excites you about data in the cloud!


Hey! We are hiring! Check out the Group Nine Media job board here!
