Drop in-memory processing in favour of parallel execution and horizontal scaling

TL;DR: If you’re working with large amounts of data, BigQuery and Dataflow on the Google Cloud Platform (GCP) can boost your efficiency and make your life easier when generating datasets for machine learning.

Recently I was approached by the startup reviewr.ai for a data engineering task that consisted of preparing data for machine learning (ML) and training an ML model. Unlike larger corporations that apparently can afford to let their data scientists spend days or weeks on such a task because they have to work with on-prem infrastructure or with a cloud provider that was chosen mainly because the…


TL;DR: If your Google Cloud AutoML Vision deployment is underutilised, consider moving it to Cloud Run.

Google Cloud’s AutoML is a fine thing. If you have a standard AI problem like image classification, object detection or entity extraction, chances are that AutoML will solve it with very little effort on your part. Just import your own dataset and retrain a state-of-the-art model with it. After a couple of hours of training, you can use your customised model right away and make predictions via an automatically deployed endpoint.

It’s a great shortcut if you need fast results without…


In my last article, I presented my solution for an automated algorithmic trading infrastructure on the Google Cloud Platform (GCP). Initially, I had intended to deploy the application as small pieces of serverless Cloud Functions, triggered by Cloud Scheduler. I refrained from this preferred approach and instead chose to build a Kubernetes-based application because I had to run the Interactive Brokers (IB) gateway software on a compute instance or in a Docker container. Hence, the serverless way seemed to be doomed from the beginning.

Run time!

While working on the Kubernetes application though, I realised that this is actually not true. There…


A long, long time ago, I worked in quant asset management. It’s the business of rule-based investing or algorithmic trading where you base investment decisions on data and models rather than human judgement and fundamental assessment of markets or companies. Nowadays one would say it’s data-driven investing.

Get rich or lie on the beach tryin’

My actual job was to do data analytics, design profitable strategies that would typically earn some risk premium, and combine several of them in a portfolio. As the decision making is purely algorithmic, the implementation can be fully automated.

When you do this all day long, you’re eventually attracted by…


Did you ever have the impression that after deploying a Cloud Function to the cloud, your view of it was… well… clouded? Because it was hard to understand what caused its latency, now that it had ascended to no-ops heaven?

Among the application performance management (APM) products on the Google Cloud Platform, there is a nice tool called Stackdriver Trace to analyse and dissect the latency of your applications. It’s very useful for finding out which part of your function is worth spending time on optimising. …


In my last article, I illustrated how we built a completely serverless application for brand detection in videos on the Google Cloud Platform (GCP). This time, I outline how we added an admin tool that lets the operators of the app manage the brand detection engine itself. I would have loved to call it “Episode II: The Server Strikes Back”. But it doesn’t. This is as serverless as the main application.

Motivation

Which serverless GCP product haven’t we used so far? Ah right, Cloud Datastore! So let’s do something with Datastore, shall we?

Admittedly, the reasoning behind that admin tool was…


Some time ago, we were asked by a client to help them with the technical implementation of a business venture. The idea was to detect brands and logos in TV sports broadcasts and videos in order to measure the brand exposure during, for example, a football match or a ski race. The client abandoned the project for commercial reasons before the MVP was completed. We decided to finish the MVP nevertheless and put part of it online because we reckoned it would be fun. Here is how we built a full end-to-end solution on the Google Cloud Platform (GCP).

The Requirements

Juri Sarbach

Google Cloud Certified Professional Cloud Architect • Google Cloud Certified Professional Data Engineer • Data engineer and Google Cloud specialist at Panter AG
