Notes from the Week #28
Here’s a non-exhaustive summary of what happened this week.
The work I mentioned last week to run our smoke-tests from a different data-centre is effectively complete.
We’re running the new system in parallel with the existing system of checks for a bit to ensure that they behave the same before switching off the existing system.
I’m consistently impressed by the thought that’s clearly gone into Concourse CI’s domain concepts and API.
I’ve almost finished the write-up for the “Mental Health in Software Development” session that I ran last month.
It turns out that writing…
Monday was pretty meeting-focused.
We had a huddle on deriving a set of SLOs from our initial Graphite SLIs. The outcome of the session was that our metrics needed further refinement: rather than tracking overall request latency, what we actually want is a response-time bound for well-formed requests, plus a separate threshold for the number of queries that time out or are invalid.
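As a rough illustration of that split, here is a minimal Python sketch of the two indicators. Everything in it is hypothetical: the `Request` record, its field names and the 500 ms bound are made up for the example and are not what we actually run against Graphite. It also assumes that timed-out requests only count towards the second indicator, not the latency one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one query; field names are illustrative only."""
    latency_ms: float
    well_formed: bool  # False for invalid queries
    timed_out: bool


def latency_sli(requests: List[Request], bound_ms: float) -> float:
    """Fraction of well-formed, completed requests answered within bound_ms."""
    eligible = [r for r in requests if r.well_formed and not r.timed_out]
    if not eligible:
        return 1.0  # assumption: no eligible traffic means nothing breached the bound
    within_bound = sum(1 for r in eligible if r.latency_ms <= bound_ms)
    return within_bound / len(eligible)


def failure_sli(requests: List[Request]) -> float:
    """Fraction of all requests that timed out or were invalid."""
    if not requests:
        return 0.0
    failures = sum(1 for r in requests if r.timed_out or not r.well_formed)
    return failures / len(requests)


sample = [
    Request(latency_ms=120, well_formed=True, timed_out=False),
    Request(latency_ms=900, well_formed=True, timed_out=False),
    Request(latency_ms=50, well_formed=False, timed_out=False),
    Request(latency_ms=5000, well_formed=True, timed_out=True),
]

print(latency_sli(sample, bound_ms=500))
print(failure_sli(sample))
```

Keeping the two numbers separate means a burst of invalid queries shows up in the failure indicator without dragging down the latency figure for requests that were served properly.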
Our second session was a retrospective on how we handle ‘walk-ups’. Shift is pretty lucky that we are surrounded by our customers, which keeps feedback loops tight, but we become overloaded by questions and distractions…
Oh, it’s a long one. I’m trying another new format, breaking down by day — I often forget highlights in trying to limit myself to 2 or 3 things.
I had a two-hour session with the rest of the Team Leads about ways to help us go faster, within the constraints of keeping the Unruly culture that makes us unique and not over-egging the process pudding (so to speak).
It feels a lot like a linear/combinatorial optimisation problem that I learned about in school and uni respectively (I’m shuddering at the memory of manually running the Simplex algorithm during…
The Monday of this week, I was drafted in to help resolve a production incident on a system that I had helped build before I moved teams. What makes this unusual is that I had no way to actually debug it at the time. So here’s a small experience post about what I did and what I learned.
NB: These are my personal conclusions, YMMV
I’ve spent more time pairing this week than I have all year — granted that’s just over 3 weeks, but still, I often miss the tight collaboration of pairing when I have a fragmented week due to discussions and huddles across the business.
This week’s main event was a kick-off session to make sure Product Development is aligned around one of our explicit tech goals for the next quarter: Observability. As Shift team lead, and the owner of the observability strand of work within Shift, I’ve been quite knee-deep and hands-on with this saga.
I ran a quick non-scientific poll…
Getting back into the habit of blogging after stopping for almost a month is a bit of a wrench, but it means there’s more juicy stuff for me to talk about!
I’ve been digging really hard into reliability and SLXs this year — we’re already well down the road to better understanding the existing reliability for our Graphite metrics-collection system, and are hoping to take these learnings into building SLXs and Error Budgets for our other core systems.
We’ve already gained a lot of insight into how our customers use the metrics collection API to build dashboards:
I was expecting a big parliamentary slap-fight today, but the Prime Minister had other ideas, so I’m going to write my week-notes instead — it’s a very short one because we’re about to start the new quarter, so most of my time is taken up with discussions that I can’t write about yet!
There’s an art to enabling yourself to do things, at work and at home — and as with anything, there is a balance.
On the one hand, if you only ever do what you want, then there’s likely going to be conflict with those around you. On the…
I’m trying a slightly different style this week, let’s see how it feels!
It’s the end of ProDev’s quarter, and we celebrated with a … science fair. We’ve done one of these before and I absolutely love the creativity that comes out of such a simple proposition. Each team prepares a “stall” that we take into our clubhouse meeting room and we can go around and see what each team has done during the last quarter.
I’m a bit pleased that I’ve managed to keep up weeknoting for 10 weeks now — I’m usually rubbish at building and maintaining habits.
Whilst I lead the Shift team, my “first team” in the Lencioni-sense is the Team of Team Leads. Thanks to our highly collaborative structure, we’re great at building rapport within our own teams, because we work closely together day in, day out.
The Team of Team Leads (or TTL), however, don’t spend that much time together, as we’ve got our own teams to be concerned with, but we’re trying to become a better team in the…
The consistency of my mind feels akin to my thoughts being wrapped in some kind of cling-film as I’m coming out of last week’s cold, but the main theme of this week was coming back to things we’ve done or thought before.
Last year we attempted and subsequently abandoned a rebuild of our Nagios infrastructure. The reasons were many, including the state of the configuration being close to spaghetti, and the application’s ubiquity meant it had a lot of inertia and rebuilding it was quite dangerous — would we know if it wasn’t alerting properly?
We’re about to attempt this…