We know the potential of BigQuery: writing SQL without worrying about infrastructure or scaling problems. This article tries to go further into understanding the BigQuery APIs, because instead of putting all the features in a single API service, BigQuery offers five different APIs, each with client libraries (such as Python, Go, or Java).


Let’s review them one by one, giving real examples with the Python client library.

1. BigQuery API

The principal API for core interaction. Using this API you can work with core resources such as datasets, views, jobs, and routines. As of today there are seven client libraries: C#, Go, Java, Node.js, PHP, Python, and Ruby.

Example

For this example, I will use the Python client library for the BigQuery API on my personal computer. Note that you need Python already installed.
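Below is a minimal sketch of what a query looks like with the Python client library. It assumes the google-cloud-bigquery package is installed (pip install google-cloud-bigquery), that credentials are already configured (for example via gcloud auth), and it uses a public dataset purely as an illustration.

from google.cloud import bigquery

# The client picks up the project and credentials from the environment.
client = bigquery.Client()

# Query a public dataset: the five most common names overall.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

query_job = client.query(query)      # starts the query job
for row in query_job.result():       # waits for the job to finish
    print(f"{row.name}: {row.total}")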


Many technologies exist for data enrichment; however, if you want one that works with a simple language like SQL and at the same time lets you do both batch and streaming processing, there are only a few, and one of them is Dataflow on Google Cloud.

What is Apache Beam?

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet. [Github, Apache Beam]

This time we’ll be using Google Cloud…
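As a rough idea of what a Beam pipeline looks like, here is a minimal sketch in Python. It assumes the apache-beam package is installed and uses the local DirectRunner; the word-count style logic is only a placeholder, and running on Dataflow would mean switching the runner and passing the usual Google Cloud options (project, region, temp_location).

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# DirectRunner executes locally; "DataflowRunner" would run it on Google Cloud.
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.Create(["enrich me", "enrich me again"])
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Pair" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )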


I’ve used BigQuery every day with small and big datasets, querying tables, views, and materialized views. During this time I’ve learned some things I wish I had known from the beginning. The goal of this article is to give you some tips and recommendations for optimizing your costs and performance.

Basic concepts

BigQuery: Serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility [Google Cloud doc].

BigQuery logo

GCP Project: Google Cloud projects form the basis for creating, enabling, and using all Google Cloud services including managing APIs, enabling billing, adding and removing collaborators, and managing permissions for Google Cloud resources…
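As a small, concrete illustration of the cost side (a common recommendation, not necessarily one of this article's own tips): BigQuery can dry-run a query to estimate how many bytes it would scan before you actually pay for it. A sketch, assuming google-cloud-bigquery is installed:

from google.cloud import bigquery

client = bigquery.Client()

# A dry run validates the query and reports the bytes it would scan,
# without running it or incurring any cost.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current`",
    job_config=job_config,
)
print(f"This query would process {job.total_bytes_processed / 1e9:.2f} GB")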


Learn how to start handling security, scalability, access, and documentation in a modern Data API.

Imagine you’ve developed a simple Python data API following many tutorials, and now some questions come to your mind.

  • How could our API handle hundreds or even thousands of requests?
  • How could you establish a minimum level of security?
  • What is the easiest way to share your API with other departments?
  • How can you generate and share documentation in a simple way? (A small sketch follows below.)

This post aims to answer these questions. Let’s start with the architecture we will develop. …
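On the documentation question, one illustration (FastAPI here is my assumption, not necessarily the framework used in the post): frameworks like FastAPI generate interactive OpenAPI documentation automatically from the route definitions, so sharing docs means sharing a URL.

from fastapi import FastAPI

# FastAPI serves interactive docs at /docs and the raw schema at /openapi.json
# out of the box, generated from the routes defined below.
app = FastAPI(title="Data API", version="0.1.0")

@app.get("/health")
def health():
    """Simple liveness endpoint."""
    return {"status": "ok"}

# Run locally with: uvicorn main:app --reload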


In this article, I’ll show you a simple way to build, in minutes, a few data APIs for exploiting data from a BigQuery dataset. These APIs will be deployed as Docker containers using a GCP serverless service called Cloud Run.

Architecture


The idea behind it is to work with serverless components. First, let’s understand these services and their purpose in the architecture.

  • Cloud Run: Cloud Run is a fully managed compute platform that automatically scales your stateless containers [Cloud Run Doc]. It will handle all the API requests, and since it’s fully managed, we don’t need to worry about scaling (see the sketch below). …
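To make that concrete, here is a minimal sketch of a containerizable Python service for Cloud Run. Flask, the dataset, and the table name are my assumptions, not the post's actual code; the only Cloud Run-specific detail is that the container must listen on the port given in the PORT environment variable.

import os

from flask import Flask, jsonify
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client()

@app.route("/rows")
def rows():
    # Hypothetical dataset and table; replace with the real BigQuery source.
    query = "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 10"
    result = client.query(query).result()
    return jsonify([dict(row) for row in result])

if __name__ == "__main__":
    # Cloud Run injects the port the container should listen on through PORT.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))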

Imagine you’ve developed a transformation process in a local Spark environment and you want to schedule it, so a simple cron job would be sufficient. Now think that after that process you need to start many others, like a Python transformation or an HTTP request, and that this is also your production environment, so you need to monitor each step.
Does that sound difficult? With only Spark and a cron job, yes, but thankfully we have Apache Airflow.

Airflow logo

Airflow is a platform to programmatically author, schedule and monitor workflows [Airflow docs].

Objective

In our case, we need to make a workflow that runs a…
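As a minimal sketch of what such a DAG could look like (the task names, the spark-submit path, and the schedule are placeholders, not the article's actual workflow): a Spark step followed by a Python transformation, where the second task only starts once the first succeeds.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def transform():
    # Placeholder for the Python transformation step.
    print("running the python transformation")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    spark_job = BashOperator(
        task_id="spark_job",
        bash_command="spark-submit /opt/jobs/transform.py",
    )
    python_step = PythonOperator(
        task_id="python_transformation",
        python_callable=transform,
    )
    # Airflow only triggers the Python step after the Spark step succeeds.
    spark_job >> python_step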


Imagine you want to start building some data pipelines in Spark or implement a model with Spark ML. The first step before anything else is to deploy a Spark cluster. To make that easy, you can set up a Dataproc cluster in minutes; it’s a fully managed cloud service that includes Spark, Hadoop, and Hive. Now imagine doing it many times, reproducing it in other projects, or your organization wanting to make your Dataproc configuration a standard.

This is when a new approach comes in: Infrastructure as Code. IaC is the process of managing and provisioning computer data centers through machine-readable definition files…
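The article continues with definition files, but as a rough Python-side illustration of describing a cluster as data rather than clicking through the console, here is a sketch using the Dataproc client library (project, region, cluster name, and machine types are placeholders):

from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# The whole cluster is described as a plain data structure.
cluster = {
    "project_id": "my-project",
    "cluster_name": "demo-cluster",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
)
print(f"Cluster created: {operation.result().cluster_name}")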


After following the 12 weeks of preparation recommended by Google, I passed the Associate Cloud Engineer exam. Here is what I learned that could help you.


This story began 3 months ago when, checking my LinkedIn feed as I do every day, I saw a post from Google Cloud about the Certification Challenge. When I first read it, I was considering getting a cloud specialization and wondering which of the three main competitors I should choose.

Why did I choose Google Cloud?

First, at that time the decision wasn’t technical, since I didn’t have deep experience in Azure, AWS, or GCP, just basic projects…


Making it easy to analyze billions of rows

Druid UI

Concept and purpose

In order to have a clear understanding of Apache Druid, I’m going to refer to what the official documentation says:

Apache Druid (incubating) is a real-time analytics database designed for fast slice-and-dice analytics (“OLAP” queries) on large data sets. Druid is most often used as a database for powering use cases where real-time ingest, fast query performance, and high uptime is important.
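To give a feel for those OLAP queries, Druid exposes a SQL-over-HTTP endpoint. A minimal sketch in Python, assuming a Druid router or broker is reachable at localhost:8888 and a datasource named wikipedia exists (both placeholders, the datasource being the one from Druid's own tutorial):

import requests

# Druid's SQL API accepts a JSON body with the query and returns JSON rows.
response = requests.post(
    "http://localhost:8888/druid/v2/sql",
    json={
        "query": (
            "SELECT channel, COUNT(*) AS edits "
            "FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5"
        )
    },
)
response.raise_for_status()
for row in response.json():
    print(row)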


Bootcamp Developer Kit

Learn by doing and never fear failure.

This phrase could sum up my whole experience at the MIT Deep Technology Bootcamp; beyond that, “intense” describes well each day in the classroom. Although the intention of this article is not just to describe the experience, I want to share some thoughts while explaining definitions, topics, and trends. In the end, the idea of sharing this is to give some inspiration by painting a big picture of deep technologies where it is possible to go further, especially if you are starting out in the Data Science or AI world.

Lesson 1: Deep Technology

So what exactly…

Antonio Cachuan

Google Cloud Professional Data Engineer (2x GCP). When code meets data, success is assured 🧡. Happy to share code and ideas 💡 linkedin.com/in/antoniocachuan/
