Trying to find value

Many tools have been developed for machine learning (ML) practitioners over the last several years. In some cases, what is developed is rendered obsolete in a surprisingly short period of time, sometimes within 6 to 12 months.

That’s tools. And if they are all you care about, you’re missing the bigger picture. If you are creating an ML model that adds little or no value to your consumers, tooling choices don’t matter. Technical ML maturity should come second to finding value through ML models. Some practitioners will feel this sounds a bit chicken-and-egg. …


2020 was bananas — Aleksandar Pasaric

The year 2020 was… eh… you know it wasn’t good. However, there were some silver linings. For those of us practicing machine learning (ML), the MLOps Community was a much-needed forum. This community would have been a very welcome development even before the pandemic, but arguably it would not be the same without the reality that COVID thrust upon everyone. The MLOps Community was a child of 2020. In case you missed it, or would like a recap, this article aims to provide one.

While this article is a TL;DR kinda post, I really want to point out that…


Identify possible issues (tip) and ‘enhance!’ for a closer look (cue)

One could accuse me of buzz-word-packing in this title. But I reckon you’ve never heard of tip-and-cue — something I was reminded of at Explore 2020. If you’re interested in satellite images as an alternative data source, it’s worth understanding tip-and-cue and how it can advance your use case.

The tip-and-cue technique first identifies possible issues within an area of interest at a low resolution (tip), then zooms in on a specific area with higher-resolution satellites tasked to it (cue). …
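To make the two stages concrete, here is a minimal toy sketch. It assumes a small grid of numbers standing in for a scene, with block averaging standing in for a low-resolution pass; the function names, the threshold, and the data are all hypothetical and are not taken from any real satellite tasking system.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def downsample(grid: Grid, block: int) -> Grid:
    """Average non-overlapping block x block tiles to mimic a low-resolution pass."""
    rows, cols = len(grid), len(grid[0])
    coarse: Grid = []
    for r in range(0, rows, block):
        coarse_row = []
        for c in range(0, cols, block):
            tile = [
                grid[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            ]
            coarse_row.append(sum(tile) / len(tile))
        coarse.append(coarse_row)
    return coarse


def tip(coarse: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Tip: flag coarse tiles whose average exceeds a (hypothetical) threshold."""
    return [
        (r, c)
        for r, row in enumerate(coarse)
        for c, value in enumerate(row)
        if value > threshold
    ]


def cue(grid: Grid, tile: Tuple[int, int], block: int) -> Grid:
    """Cue: return the full-resolution window behind one flagged coarse tile."""
    r0, c0 = tile[0] * block, tile[1] * block
    return [row[c0:c0 + block] for row in grid[r0:r0 + block]]


# Toy 4x4 scene with a single bright pixel acting as the anomaly.
scene: Grid = [[0.0] * 4 for _ in range(4)]
scene[2][3] = 8.0

coarse = downsample(scene, block=2)
tips = tip(coarse, threshold=1.0)
closeups: Dict[Tuple[int, int], Grid] = {t: cue(scene, t, block=2) for t in tips}
```

In practice the cue step would mean tasking a different, higher-resolution sensor rather than slicing the same array; the slice here only illustrates the idea of narrowing attention to the flagged area.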


Photo by Miguel Á. Padriñán

I’ve been working on a machine learning (ML) demo for Datapalooza hosted by Servian in the UK. If you’re reading this before 1 October 2020 — make sure to check out the event!

In my session, I explain the fundamentals of a feature store and how to create and modify one in BigQuery. I also cover how to train and deploy ML models in BigQuery, and how that process can be operationalized through Cloud Build and Looker.

BigQuery ML stands out because it lets you train models and run predictions from within the OLAP database itself! …
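As a rough illustration of what training and predicting inside the database looks like, the sketch below builds the two SQL statements involved. The dataset, model, table, and label names are hypothetical, the model type is just one example, and actually running the statements would need a BigQuery client and project, which are not shown.

```python
def training_sql(model: str, source_table: str, label: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement (logistic regression as an example)."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def prediction_sql(model: str, input_table: str) -> str:
    """Build an ML.PREDICT query that scores rows from an input table."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


# Hypothetical names, for illustration only.
train_stmt = training_sql("demo_dataset.churn_model", "demo_dataset.features", "churned")
predict_stmt = prediction_sql("demo_dataset.churn_model", "demo_dataset.new_customers")
print(train_stmt)
print(predict_stmt)
```

Both statements are plain SQL, so the data never has to leave the warehouse to be used for training or scoring.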


This article assumes some background knowledge in data science and machine learning.

Hmmm… this spaghetti could use some cheese…

MLflow. Kubeflow.

Both are open-source projects. Both are supported by major players in the data analytics industry. MLflow is a Databricks project and Kubeflow is widely backed by Google. Both tools are touted as the best thing since sliced bread when it comes to tracking ML experiments and supporting the production ML lifecycle. Both end in ‘flow’.

That said, what specifically are the key differences? Moreover, not every data scientist or data-driven organization is at the same phase of the ML journey, so how can they…


Source: lalo Hernandez

This is the third article in a series of three, which focuses on production ML and the intersection between data science and engineering.

In my last article, Trawling Twitter for Trollish Tweets, I wrote about a simple, supervised approach to classifying tweets based on a disinformation dataset. I focused mostly on the development of the data model, the machine learning (ML) model, and the pipeline that bridges the two.

That provided a solid foundation for this article, in which I expand on how to take that pipeline into production, applying DevOps principles to enable reliability and accessibility.

Model Performance

I am leveraging my…



Disinformation is one of the greatest threats to democracy. Fake news, trolling, and other politically and socially subversive content are plaguing society. Social media giants have played a role in enabling and incubating this behaviour. Several have even appeared before the U.S. Congress and the EU Parliament, promising to do more to address the challenge of disinformation and hate speech on social media. Some of these promises have taken nearly a year to act on, and the resulting measures are still limited.

So, when the opportunity came to create a demo of a production machine learning (ML) system in Google Cloud Platform (GCP), I jumped at…



This is the second article in a series of three, which focuses on production ML and the intersection between data science and engineering. The other two are Scaling the Wall Between Data Scientist and Data Engineer and Deploying an ML Model to Production using GCP and MLFlow.

Disinformation is one of the greatest threats to democracy. Fake news, trolling, and other politically and socially subversive content are plaguing society. Social media giants have played a role in enabling and incubating this behaviour. Several have even appeared before the U.S. Congress and the EU Parliament, promising to do more to address…


Source: Chris Gonzalez

This is the first article in a series of three, which focuses on production ML and the intersection between data science and engineering. The other two are Trawling Twitter for Trollish Tweets and Deploying an ML Model to Production using GCP and MLFlow.

One of the most exciting things in machine learning (ML) today, for me at least, is not at the bleeding edge of deep learning or reinforcement learning. Rather, it has more to do with how models are managed and how data scientists and data engineers effectively collaborate as teams. …


Landsat 8 mosaic of Australia’s southeast coast and the tip of Tasmania created using Rasterio

While there are many advantages in moving to a cloud platform, the promise that captivates me is the idea of serverless infrastructure that automatically allocates compute power based on the data being processed by a pipeline. In particular, I see this as essential for cloud-based machine learning, and even more specifically, the analysis of raster data like satellite images.

The Hadoop ecosystem sat — and still sits — at the heart of manipulating massive amounts of data. Installing and maintaining Hadoop-based clusters makes for a lot of work just from an infrastructure perspective — never mind the data analytics. …

Byron Allen

Texan transplant to Australia turned Australian transplant to the UK | ML Engineer | Senior Consultant at Servian
