Interview: Bob Gregory — Chief Architect at Cazoo (EventBridge, DDD & Microservices)

Bob Gregory is the Chief Architect at Cazoo, a platform that offers users a seamless experience for buying used cars, delivered to their door. Before Cazoo, Bob was the Application Architect at MADE.COM — the flagship online furniture brand.

During his time at MADE.COM, I had the privilege of working with Bob as part of a Theodo team, helping MADE.COM migrate to React/React-Native and building an in-store showroom touchscreen application.
Bob has been working with service-oriented microservice architectures for a long time and at Cazoo he has invested heavily in Serverless. In this informal interview, we discussed how he approaches microservices, how he builds teams and their use of different AWS Serverless Services.
Ben: From your articles in the past it’s clear you’ve had a passion for Domain-Driven Design (DDD) for a long time. How did this inform your approach to microservices at MADE.COM?
Bob: The approach I took to services at MADE.COM was Guerrilla Service-oriented Architecture (SOA). Before microservices, there was the SOA approach. Having teams directly interact with and work on services from the ground up, rather than following some centralized architecture. The result is loosely coupled systems, each owned by a team for a long time and focused on a particular business process.
Ben: With your move to Cazoo, a new startup, you moved to a predominantly Serverless architecture. Why was the approach different?
Bob: It was 5 years later!
At MADE.COM, we were early adopters of Docker; it was obviously going to be the winner within three years. At Cazoo, Serverless is what will obviously be winning in three years. With greenfield projects these are the kind of long bets you can make, and for me it’s Serverless.
Ben: What Serverless services are you using?
Bob: Hmm, Lambda, of course, along with API Gateway, DynamoDB (when we need to store data ourselves — most is handled by commercetools and Prismic) and EventBridge.
All managed by the Serverless Framework.
It’s relatively light as we’re mainly handling integrations.
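To make that stack concrete, here is a minimal sketch of how those pieces might be wired together with the Serverless Framework’s TypeScript config (`serverless.ts`). The service name, handler path and table are purely illustrative assumptions, not Cazoo’s actual configuration.

```typescript
// serverless.ts — hypothetical minimal service: a Lambda behind API Gateway
// plus a DynamoDB table, managed by the Serverless Framework.
import type { AWS } from '@serverless/typescript';

const serverlessConfiguration: AWS = {
  service: 'checkout',            // illustrative service name
  frameworkVersion: '3',
  provider: {
    name: 'aws',
    runtime: 'nodejs18.x',
  },
  functions: {
    placeOrder: {
      handler: 'src/placeOrder.handler',
      events: [{ http: { method: 'post', path: 'orders' } }], // API Gateway endpoint
    },
  },
  resources: {
    Resources: {
      OrdersTable: {
        Type: 'AWS::DynamoDB::Table',
        Properties: {
          TableName: 'orders',
          BillingMode: 'PAY_PER_REQUEST',
          AttributeDefinitions: [{ AttributeName: 'orderId', AttributeType: 'S' }],
          KeySchema: [{ AttributeName: 'orderId', KeyType: 'HASH' }],
        },
      },
    },
  },
};

module.exports = serverlessConfiguration;
```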
Ben: Are you using Step Functions?
Bob: Not at the moment, but use cases are on the horizon. Generally, Lambdas are not calling each other, it’s all integration through the client (the browser). There is very little service to service communication as we wanted to avoid “temporal coupling” (multiple things need to happen at the same time).
Ben: How are you handling async aspects of communication with the frontend?
Bob: The client interactions are synchronous by their nature, for example, when I place an order for a car I’m submitting a command and it completes synchronously. Either we succeed or fail right there and then.
Obviously, in the event-driven backend, there is still a lot of service integration but it’s async, via events, rather than API calls between services.
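As a sketch of that pattern (not Cazoo’s code), a “place order” Lambda might complete the command synchronously for the browser and then announce what happened as a domain event for other services to react to asynchronously. The bus name, source and detail-type below are assumptions.

```typescript
// placeOrder.ts — hypothetical handler: the command succeeds or fails right
// there and then, while downstream integration happens asynchronously via
// an EventBridge domain event instead of service-to-service calls.
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import type { APIGatewayProxyHandler } from 'aws-lambda';

const eventBridge = new EventBridgeClient({});

export const handler: APIGatewayProxyHandler = async (request) => {
  const order = JSON.parse(request.body ?? '{}');

  // ... validate and persist the order here (e.g. DynamoDB PutItem) ...

  // Publish a domain event; consumers react asynchronously, so there is
  // no temporal coupling between services.
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: 'shared-bus',   // assumed single custom bus name
          Source: 'checkout',           // assumed source
          DetailType: 'OrderPlaced',    // named after the business process, not an entity
          Detail: JSON.stringify({ orderId: order.orderId, vehicleId: order.vehicleId }),
        },
      ],
    })
  );

  // Respond synchronously to the client.
  return { statusCode: 201, body: JSON.stringify({ orderId: order.orderId }) };
};
```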
Ben: It’s great you’ve avoided inter-service communications. How did you coach your teams to achieve this?
Bob: It’s a good question. When we started out we talked about it a lot and for the most part, the teams just get it. We are working on formalising some good architectural decisions, but it helps generally that our teams are divided by functional areas. It’s much more likely they own their entire journey and just need to inform others.
It’s important that your services are built around some capability, it’s important that the services handle everything to do with that capability. This means you need much less inter-service communication than if you had built your system around nouns — keep your services focussed on business processes.
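A small, purely illustrative contrast of the “verbs, not nouns” point: a noun-centric contract invites other teams to reach in and query an entity’s state, while process-centric events simply describe what happened. The names below are hypothetical.

```typescript
// Noun-oriented: other services end up coupling to the shape of "a Vehicle"
// and calling back to ask about its state.
interface VehicleResource {
  vehicleId: string;
  status: 'in-stock' | 'reserved' | 'sold';
  price: number;
}

// Process-oriented: the capability owns its whole journey and just informs
// others of what happened, as events named after business processes.
interface VehicleReserved {
  vehicleId: string;
  reservedBy: string;
  reservedAt: string; // ISO-8601 timestamp
}

interface DeliveryScheduled {
  orderId: string;
  slot: { date: string; window: string };
}
```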
Ben: How did you manage to split your teams so effectively?
Bob: User story mapping and Event Storming allowed us to draw clear boundaries around subprocesses (e.g. finding a vehicle, checkout, finance).
You need to find all the subprocesses you can articulate, and it helps to think in terms of processes not data. People tend to think in terms of nouns not verbs, and the result is tight coupling.
Ben: DDD and SOA approaches have been around for a while, does it seem like the Serverless community is “catching up” to it, or are these things generally cyclical in your opinion?
Bob: New technology and new programming languages allow people to start quickly, and then they add complexity.
This results in a complete nightmare and then the DDD and messaging people say “we’ve solved this before” and the community catches up.
Enabling DDD and Event-Driven Microservice patterns is something I’ve been passionate about for a long time — in fact, it’s something I’ve recently written an O’Reilly book on with Harry Percival.
Ben: How many functions are you using and what is the split into microservices?
Bob: Currently we have 2 production environments, “Engineering” and “Data”. Engineering is for the platform itself and “Data” is for our data teams to do analysis. Engineering currently has 192 Lambda Functions in production, and “Data” has 113, so 305 in total.
Each microservice is composed of multiple Lambda functions. It’s not so much a case of thinking in terms of entities when assigning Lambda functions their purpose — it’s again verbs, not nouns.
Ben: I’ve seen you speak about EventBridge regularly, could you explain your single bus approach?
Bob: We do indeed have a single custom EventBus (in addition to the default one). This means people don’t have to work out where to integrate; teams can simply consume events. The only pushback on this has been around PII, but as it’s all ultimately in our system anyway there is no issue.
Currently, we use EventBridge for “Domain Events” (e.g. order placed) and “Telemetry Events” (e.g. when you look for delivery slots, we want to know what slots you looked at and how many were available).
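With a single shared bus, a consuming team only needs the agreed event shape, not knowledge of who produced it. A hypothetical `serverless.ts` fragment along those lines might subscribe to a domain event by its detail-type; the bus ARN and event name are assumptions.

```typescript
// Fragment of a hypothetical serverless.ts: consuming "OrderPlaced" domain
// events straight off the shared custom bus, filtered by detail-type.
const functions = {
  onOrderPlaced: {
    handler: 'src/onOrderPlaced.handler',
    events: [
      {
        eventBridge: {
          eventBus: 'arn:aws:events:eu-west-1:111111111111:event-bus/shared-bus', // assumed ARN
          pattern: {
            'detail-type': ['OrderPlaced'], // agreed domain event name
          },
        },
      },
    ],
  },
};
```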
Ben: You mention using EventBridge for the “Telemetry Events”, why not Kinesis?
Bob: Well, currently all those telemetry events end up on Kinesis anyway. EventBridge is forwarding onto Kinesis.
The reason this is needed is that the data science team finds it easier to work off a firehose, e.g. into S3, than listen to a whole bunch of individual events. This gives them a single analytics stream containing the telemetry events that they can pull from as needed, rather than having to process everything in real time and in parallel. Here the throughput is quite high as they receive events from multiple AWS accounts.
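The interview doesn’t say how that forwarding is defined; one way to express it, sketched here with the AWS CDK rather than anything Cazoo has confirmed, is an EventBridge rule whose target is the analytics Kinesis stream. The bus name and telemetry detail-type are assumptions.

```typescript
// Hypothetical CDK sketch: forward telemetry events from the shared bus onto
// a Kinesis stream so the data team can read one firehose (e.g. into S3)
// instead of subscribing to every event individually.
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as kinesis from 'aws-cdk-lib/aws-kinesis';

export class TelemetryForwardingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Assumed name of the single custom bus discussed above.
    const bus = events.EventBus.fromEventBusName(this, 'SharedBus', 'shared-bus');
    const analyticsStream = new kinesis.Stream(this, 'AnalyticsStream');

    new events.Rule(this, 'ForwardTelemetry', {
      eventBus: bus,
      eventPattern: {
        detailType: ['DeliverySlotsViewed'], // assumed telemetry event name
      },
      targets: [new targets.KinesisStream(analyticsStream)],
    });
  }
}
```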
Ben: What is your strategy around these multiple AWS Accounts?
Bob: At the moment Data and Engineering each have Dev, Test and Prod, and in addition there is a shared Tools account. If I was starting from scratch there would be no shared Dev; I would also move to individual teams having their own accounts to make it easier to self-manage IAM.
AWS SSO would make this easier, and we could have our JumpCloud IdP integrating more cleanly with IAM roles. This would allow us to manage this inside AWS and keep JumpCloud simpler. Finally, IAM provisioning would be handled via IaC.
Ben: Obviously your team makes heavy use of EventBridge, what about the EventBridge Schema Registry?
Bob: We would love to use it. We heavily rely on an event-driven architecture and defining shared Schemas. The main thing that prevents us currently is that it’s very difficult to share these across AWS Accounts, i.e. to have a shared schema. This goes against the account strategy we want to move to.
To be clear we would not adopt the “auto schema discovery” — in my mind that would be a disaster. Teams need to communicate and agree on shared schemas; this is how they interface.
Running auto validation on top of all this would be great.
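One lightweight way to express that kind of agreed contract, sketched here as an assumption rather than Cazoo’s actual practice, is a shared TypeScript package containing the schema type plus a runtime guard that consumers apply to incoming events, a manual stand-in for the automatic validation mentioned above.

```typescript
// Hypothetical shared contract package: the agreed schema for an "OrderPlaced"
// domain event, plus a runtime check consumers can run on the event detail.
export interface OrderPlacedV1 {
  orderId: string;
  vehicleId: string;
  placedAt: string; // ISO-8601 timestamp
}

export function isOrderPlacedV1(detail: unknown): detail is OrderPlacedV1 {
  if (typeof detail !== 'object' || detail === null) return false;
  const candidate = detail as Record<string, unknown>;
  return (
    typeof candidate.orderId === 'string' &&
    typeof candidate.vehicleId === 'string' &&
    typeof candidate.placedAt === 'string'
  );
}
```

Consumers would reject or dead-letter events that fail the guard, keeping the shared schema as the interface between teams.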
Ben: Serverless is a big space, how do you keep your teams up to date?
Bob: We actually were just discussing this. We’re planning on having the platform team come together every Monday to watch the ACG summary videos to keep up to date with new stuff.