My data engineering dev setup — March 2022

Jamie Thomson
7 min read · Mar 5, 2022


I’m one of those people who like to peer at other people’s screens to see what apps they’ve got pinned to their taskbar or dock, what bookmarks they keep in their browser’s bookmarks bar, and what tools they prefer to use. I know other people do too. For that reason I thought it would be fun to share my current setup. Thoughts, comments and opinions are very welcome.

Everything in here will likely change at some point, which is why I’m highlighting the date in the title. I look forward to writing an update in a few years’ time to see what has changed.

What do I do?

I work for MoneySupermarket, where my job title is Principal Data Engineer. I’m responsible for the delivery of our data solutions on Google Cloud Platform (GCP). The GCP services we use most frequently are Cloud Pub/Sub, Cloud Dataflow, Cloud Functions & BigQuery. Just recently we’ve dabbled with Bigtable too.

Equipment

For years I poured scorn on folks who espoused Macs over Windows laptops. I pigeon-holed them as “style over substance” people, more interested in blingy brands than functional tools. That changed in 2017 when I joined a new team at my previous employer and was given a MacBook Pro. Ever since then I’ve been hooked.

The main reason is the unix-y feel of the command-line, which means I can use the same tools on my laptop that I use on the Linux infrastructure we run our solutions on. I’ve also grown to prefer the “look-and-feel” of the Mac environment. My switch of allegiance was reinforced when, in November 2019, I spent a significant amount of my own money on a Dell XPS 15 running Windows 10 and experienced non-stop glitches with it. The mouse pointer periodically freezes and the only fix I’ve found is to reboot. The fans whirr while it’s idle (I was once woken in the middle of the night by the sound of the fans even though I’d put it to sleep, and the whole unit was too hot to touch). Try as I might, I can’t get the damn thing to talk to my printer (every Android & iOS device in our home works fine), and I find the wifi connection to be flaky. Compare this to the MacBook I have from work, which is the epitome of “just works”. I’m a 100% Mac convert and I’m not going back. Sorry.

Today I’m rocking a 2019 16-inch MacBook Pro, which I was given when I started at MoneySupermarket in February 2021, and I love it. I’m writing this blog post on it.

[Image: Macbook Overview]

I even like the Touch Bar, though I suspect I’m in the minority on this one. I use it for volume control, app shortcuts (particularly useful for Teams/Zoom calls) & inserting emojis.

Web Browser

I’ve been a Google Chrome user for many years now, and although I’ve flirted with Firefox a bit I keep going back to Chrome, mainly because it “knows” me. It knows my extensions, my bookmarks and my profiles (I use 2 profiles: one for my work account and one for my personal account). These are enough to keep me using it.

Extensions-wise I use:

  • LastPass — my password manager of choice, absolutely indispensable
  • FireShot — captures screenshots of entire web pages
  • Pocket — for bookmarking stuff

Command-line

The command-line is where I spend most of my day, so having a productive setup is critical for me. I use zsh along with oh my zsh and, if I’m perfectly honest, I’m not entirely sure which features are provided by which: I installed oh my zsh the moment I started using zsh, so to me they’re basically one tool.

Here are the oh my zsh plugins that I use:

  • git — All our code is in git repositories and my main way to interact with them is on the command-line. This plugin provides many aliases, which I’ll cover later in this article.
  • zsh-autosuggestions — Could not live without this. The list of commands I issue day-in, day-out is actually quite short, and this plugin helps me get to them quickly: I just type a few letters followed by right-arrow and I get what I need.
  • zsh-z — A nice tool for navigating to commonly-used directories
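
To give a feel for why a short command list makes history-based suggestions so effective, here is a minimal Python sketch of the idea: suggest the most recent history entry that starts with what has been typed so far. This is not the plugin's actual implementation; the function name `suggest` and the sample history are made up for illustration.

```python
from typing import Optional


def suggest(history: list[str], typed: str) -> Optional[str]:
    """Return the most recent history entry that starts with `typed`.

    Illustrative only: a simplified model of prefix-based suggestion,
    not the code used by zsh-autosuggestions.
    """
    if not typed:
        return None
    # Walk the history from newest to oldest so recent commands win.
    for command in reversed(history):
        if command.startswith(typed) and command != typed:
            return command
    return None


# Hypothetical history, oldest first.
history = [
    "terraform plan",
    "git status",
    "terraform apply",
    "gcloud config list",
]

print(suggest(history, "ter"))  # terraform apply
```

With only a handful of distinct commands in the history, two or three typed letters are usually enough to pin down a single match.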

tmux

I was introduced to tmux by a former colleague (hey Rob) and I’ve become a convert to this too. I prefer to keep my hands on the keyboard as much as possible rather than resorting to the mouse, and tmux allows me to do that while still using multiple command-line terminals on the same screen. It took me some time to get used to it and I’m still learning (I am by no means an expert), but I know enough to get by.

[Image: tmux at work]

Command-line tools

These are the command-line tools I use every day:

  • git — no choice about this one, all our code is in git so this is mandatory. I generally prefer the git command-line to GUI tools so I use this a lot (more on this later).
  • exa — a modern replacement for ls
  • docker — Our solution makes heavy use of containers so docker is a must. We trialled a few alternatives when Docker Desktop recently became a paid-for product but didn’t find anything that had the compatibility we needed. We stumped up the cash for Docker Desktop and I was happy to do so (easy for me to say, it’s not my money, or my budget 😊).
  • rich — a nice alternative to cat which provides colourful terminal output. It is built upon the rich python module (which we use in our data solutions).
  • gcloud / gsutil / bq — Crucial for interacting with GCP.
  • terraform — We’ve made a big bet on terraform for creating GCP infrastructure, so I use this constantly.

git aliases

The git plugin for oh my zsh provides many aliases to help with git. I’ve memorised some of them and use them constantly; they’re critical to my productivity:

[Image: Useful git aliases]

(I’d like to provide those in a GitHub gist rather than an image, but gist.github.com is blocked for me. That’s something I’ll be endeavouring to sort out. Apologies.)

I find glop to be particularly useful as it provides a view of git log that’s easy-on-the-eye:

[Image: Nice coloured output from glop]

gitbranchclean removes local branches that have been merged. Very useful.
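
For anyone curious how a clean-up like that might work, here is a rough Python sketch of the same idea. It is not the definition of gitbranchclean. It assumes the approach of listing branches with `git branch --merged` and deleting each with `git branch -d`; the protected branch names and the function names are hypothetical.

```python
import subprocess

# Assumption: branches that should never be deleted by the clean-up.
PROTECTED = {"main", "master"}


def branches_to_delete(merged_output: str, protected=PROTECTED) -> list[str]:
    """Pick deletable branch names out of `git branch --merged` output."""
    branches = []
    for line in merged_output.splitlines():
        name = line.strip()
        # Skip blanks, the current branch ("* name") and branches
        # checked out in another worktree ("+ name").
        if not name or name.startswith(("*", "+")):
            continue
        if name in protected:
            continue
        branches.append(name)
    return branches


def clean_merged_branches() -> None:
    """Delete local branches already merged into the current branch."""
    merged = subprocess.run(
        ["git", "branch", "--merged"],
        capture_output=True, text=True, check=True,
    ).stdout
    for branch in branches_to_delete(merged):
        # -d (rather than -D) refuses to delete unmerged work.
        subprocess.run(["git", "branch", "-d", branch], check=True)


# Example with canned output, so nothing is deleted here.
sample = "  feature/a\n* main\n  master\n  fix-typo\n"
print(branches_to_delete(sample))  # ['feature/a', 'fix-typo']
```

Keeping the parsing separate from the deletion makes it easy to dry-run the selection before anything is removed.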

Visual Studio Code

We write a lot of python, terraform and SQL code, and a little bit of java too. My development tool of choice is Visual Studio Code although I occasionally dabble with PyCharm for python stuff.

The Visual Studio Code extensions that I use are:

  • peacock — differentiate your Code instances with colour
  • draw.io integration — draw architecture diagrams in your editor and automatically render them in markdown documents
  • Git Graph — beautiful git branch visualisation
  • HashiCorp Terraform — Syntax highlighting and autocompletion for terraform
  • pylance — A performant, feature-rich language server for Python in VS Code
  • Rainbow CSV — Highlight CSV and TSV files, Run SQL-like queries
  • ShellCheck — Integrates ShellCheck into VS Code, a linter for Shell scripts.
  • Sourcery — Refactor Python instantly
  • GitLens — Supercharge Git within VS Code — Visualize code authorship at a glance via Git blame annotations and CodeLens, seamlessly navigate and explore Git repositories, gain valuable insights via rich visualizations and powerful comparison commands, and so much more
  • Code Snapshot — Create custom code snippets as images
  • docker — Makes it easy to create, manage, and debug containerized applications.
  • footsteps — Highlight and navigate between your most recently edited chunks of code

Most important of all, we use Visual Studio Code remote containers to provide a consistent dev experience for developers. Burke Holland has a great blog post, “Please, everyone, put your entire development environment in Github”, that extols the benefits of this feature.

Pre-commit

I’m an advocate of using pre-commit to run code linters when committing code. We use these hooks:

CI/CD

We store our git repositories in GitHub so it was an obvious choice to use GitHub Actions for CI/CD. I’m so glad we did; it’s a really great platform for CI/CD. Using GitHub Actions we’ve automated our production deployments so that we can take code from a developer’s laptop, run it through an integration test suite, and have it running in production in a little over an hour.

We host our own runners on Google Compute Engine. We run 32 runners through the working day, dial that down to 5 at the end of the working day, and then at midnight dial it down again to a single runner just in case one is needed. This infrastructure costs us about £14 a day:

[Image: CI infrastructure costs]
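
The runner scale-down schedule described above can be sketched as a small Python function. This is only an illustration: the working-day boundaries (08:00 and 18:00) and the names `WORKDAY_START`, `WORKDAY_END` and `desired_runners` are assumptions rather than details of the real setup, and weekends are ignored.

```python
from datetime import time

# Assumptions: the real working-day boundaries are not stated in this post.
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)


def desired_runners(now: time) -> int:
    """Return how many CI runners should be up at a given time of day."""
    if WORKDAY_START <= now < WORKDAY_END:
        return 32  # full capacity through the working day
    if now >= WORKDAY_END:
        return 5   # reduced capacity from end of day until midnight
    return 1       # a single runner from midnight until the day starts


for t in (time(3, 0), time(9, 30), time(20, 0)):
    print(t.strftime("%H:%M"), desired_runners(t))
```

A scheduler could call a function like this periodically and resize the pool of Compute Engine instances to match the returned count.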

Wrap-up

Hope that was interesting to folk. I’d love to hear what tools other data engineers out there use so please do leave some comments.
