This is the second part of an article about how we use server-less architecture at Gousto to power our supply chain operations. The first part elaborates on our journey to the new architecture, and this part details the architecture itself along with benefits and drawbacks.
A server-less architecture
The principles highlighted in the first part guided us towards an architecture based on the publish-subscribe pattern. Within our AWS platform, we already had the ability to define events that any deployable micro-service could publish and consume, following a convention-based approach using Amazon SNS and SQS.
Along with the capabilities offered by our chosen ERP system, these building blocks enabled us to rapidly iterate on our integration strategy, making substantial use of Amazon SQS, SNS, S3 and Lambda. Using these Amazon services eliminated the need to manage message brokers, file servers, or application servers, making the architecture entirely server-less.
The architecture is composed of a few different components.
- The Gousto platform and the ERP system itself, both of which provide REST APIs and event sources for interfacing.
- The existing WMS, which still requires interfacing via CSV files — we intentionally chose not to replace the WMS to contain the scope of the ERP replacement project and minimise disruption in the warehouse.
- A fleet of Lambda functions responsible for injecting and extracting data into and out of the ERP system using REST APIs.
- A set of SNS topics in the Gousto platform, to which messages can be published from Lambda functions, or from within other applications in the platform — all published messages follow a consistent JSON schema, agnostic of the platform generating the data.
- A set of SQS queues in the Gousto platform subscribing to relevant SNS topics, with each consumer possessing its own dedicated queue.
- A set of Lambda functions to handle conversion between JSON messages and WMS-compatible CSV files.
- An S3 bucket that stores the independent, event-generated CSV files used to interface with the WMS. Files uploaded to S3 generate S3 file events, which are then consumed by the WMS or by Lambda functions, depending on the file type and direction.
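The JSON-to-CSV conversion step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: the field names (`sku`, `quantity`, `supplier`), the column order, and the function names are hypothetical, and the sketch assumes the SQS queue receives the standard SNS notification envelope (raw message delivery disabled). The upload to S3 is left out so that the example stays self-contained.

```python
import csv
import io
import json

# Hypothetical column layout expected by the WMS; the article does not specify one.
CSV_COLUMNS = ["sku", "quantity", "supplier"]


def extract_messages(sqs_event):
    """Unwrap the JSON payloads from an SQS event fed by an SNS topic."""
    messages = []
    for record in sqs_event.get("Records", []):
        envelope = json.loads(record["body"])             # SNS notification envelope
        messages.append(json.loads(envelope["Message"]))  # the published payload
    return messages


def to_csv(messages):
    """Render a list of message dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CSV_COLUMNS,
        extrasaction="ignore",  # drop fields the WMS file format does not carry
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(messages)
    return buffer.getvalue()


def handler(event, context=None):
    """Lambda-style entry point: returns the CSV text that would be uploaded to S3."""
    return to_csv(extract_messages(event))
```

Keeping the conversion in pure functions, separate from any S3 call, means the mapping between the JSON schema and the CSV layout can be tested without AWS access.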
The utilisation of this architecture is illustrated in the following example for purchasing carrots from a supplier.
Adopting this architecture has brought significant benefits, both to the business and to the Engineering teams who develop and maintain this ecosystem.
Specifically, some of the key benefits stem from our use of AWS Lambda.
- Spikes in transactions, especially in our order volumes, are handled seamlessly by AWS Lambda's cost-efficient scalability. The new pipeline can easily handle over 100,000 transactions an hour, compared to only 4,000 with our first-generation systems. We are limited only by the ingest rate of the WMS and by adjustable concurrency limits in AWS itself.
- With AWS Lambda, our developers can deploy new features and bug fixes quickly, at a fine granularity, and with zero downtime. This allows us to iterate rapidly and get feedback on functionality.
This architecture and the use of other AWS server-less components provide a number of other technical benefits as well.
- Treating transactions in a platform-agnostic way using consistent JSON schemas allows replacement or addition of end systems and interfaces, without any changes to the architecture.
- Transactions can be easily validated and traced for integrity and consistency at all stages due to the use of consistent JSON schemas.
- Using Amazon SNS and SQS to implement the publish-subscribe pattern, including SNS topics for S3 file events, enables multiple consumers to tap into the pipeline and process events in a flexible way.
- Combining AWS Lambda with SNS topics, SQS queues, and dead-letter queues with automatic retries makes this architecture inherently fault-tolerant: it can tolerate components being unavailable without any loss of data.
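One way the retry behaviour can look from inside a queue consumer is sketched below. It is an assumption-laden illustration rather than the code running in this pipeline: it assumes the Lambda event source mapping has `ReportBatchItemFailures` enabled and that the queue has a redrive policy pointing at a dead-letter queue. The `process` function and the `order_id` field are hypothetical.

```python
import json


def process(message):
    """Hypothetical business logic; raises on messages it cannot handle."""
    if "order_id" not in message:
        raise ValueError("missing order_id")


def handler(event, context=None):
    """Process each SQS record and report only the failed ones back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Failed records become visible on the queue again and are retried.
            # Once the queue's maxReceiveCount is exceeded, SQS moves them to
            # the dead-letter queue instead of dropping them.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per record means one bad message does not force the whole batch to be reprocessed.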
Finally, with our first-generation systems, we often had very limited knowledge of issues or bottlenecks until we were informed by business teams about missing data. Resolving these issues could take hours due to a lack of visibility of data flow between systems.
Because our new architecture is built on native AWS services, monitoring support comes built in through Amazon CloudWatch.
Measuring and alerting on CloudWatch metrics such as the number of events published to an SNS topic, queue depth of an SQS queue, or execution time of a Lambda function allows our developers to proactively identify bottlenecks and failures and resolve issues within minutes.
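As a hedged example of alerting on queue depth, the helper below builds the arguments for a CloudWatch alarm on the `ApproximateNumberOfMessagesVisible` metric that SQS publishes in the `AWS/SQS` namespace. The function name, alarm naming scheme, period, and threshold are illustrative assumptions, not values taken from this pipeline.

```python
def queue_depth_alarm(queue_name, threshold, alarm_topic_arn):
    """Build the arguments for a CloudWatch alarm on visible SQS messages."""
    return {
        "AlarmName": f"{queue_name}-queue-depth",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,             # evaluate over five-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [alarm_topic_arn],  # e.g. an SNS topic that notifies the team
    }


# With boto3 installed and credentials configured, the alarm could be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**queue_depth_alarm(...))
```

A queue whose depth keeps growing usually indicates a consumer that is failing or too slow, which makes this a useful early signal of a bottleneck.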
Moreover, these technical benefits have allowed us to deliver substantial value to the business.
With the new architecture, sales orders can be made available for picking seconds after being confirmed on our e-commerce platform. Compared to the hours that this process took previously, this has provided us with the ability to significantly reduce the lead time between order placement and delivery.
With data synchronisation occurring within seconds between the warehouse, the ERP system, and our e-commerce platform, business teams now have real-time insight into warehouse operations and transactions, which can then be analysed and optimised.
The same real-time insight allows us to provide value to our customers too. For example, we can identify the specific batch of chicken packed into a customer’s box. This data can be made available to our e-commerce platform seconds after picking, and we can use the expiry date from the batch to present a customised meal planner for the customer, based on when each ingredient should be consumed.
Finally, a special Lambda function, which consumes and archives all messages published within this pipeline, is used to provide a detailed audit log of all transactions that occurred between these systems. This is especially beneficial because of the strict traceability requirements within the food industry.
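A minimal sketch of how such an archiving function might lay out its records is shown below. The key layout, the field names, and both function names are assumptions made for illustration; the article does not describe how the archive is actually stored.

```python
import json
from datetime import datetime, timezone


def archive_key(topic, message_id, published_at):
    """Build a date-partitioned object key for one archived message."""
    return f"audit/{topic}/{published_at:%Y/%m/%d}/{message_id}.json"


def archive_record(topic, message_id, published_at, payload):
    """Return the (key, body) pair an archiving function might store."""
    body = json.dumps(
        {
            "topic": topic,
            "message_id": message_id,
            "published_at": published_at.isoformat(),
            "payload": payload,  # the original message, stored unmodified
        },
        sort_keys=True,
    )
    return archive_key(topic, message_id, published_at), body
```

Partitioning keys by topic and date keeps a traceability query ("everything published about purchase orders on a given day") down to a single prefix listing.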
However, we have also needed to adapt ourselves to cater for a number of drawbacks that accompany such an architecture.
The sheer number of entities and types of transactions that can occur in an e-commerce platform, the WMS, and the ERP system means that there is a large number of AWS Lambda functions, SNS topics, and SQS queues to monitor. Our engineering teams are responsible for building, deploying, and supporting their own services in production, so the number of components to monitor can add a significant overhead.
Additionally, with the use of native AWS components, it is not trivial to develop and test against the entire cloud-based ecosystem on a local development machine. We make use of a combination of Docker containers, mocking certain AWS components, and contract-testing, along with end-to-end testing on multiple AWS environments to overcome this limitation.
Finally, even though we have an automated and scalable pipeline, the end systems are still operated by humans, leaving scope for human error.
By ensuring that data validation rules are consistent in our e-commerce platform, ERP system, the WMS, and in all stages in between, we can reduce the likelihood of human error causing data consistency and integrity issues.
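One way to keep validation rules consistent is to define each rule once and reuse it at every stage. The sketch below is hypothetical: the `sku` and `quantity` fields and the rules themselves are assumptions, since the article does not list the actual validation rules.

```python
def validate_order_line(line):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    sku = line.get("sku")
    if not isinstance(sku, str) or not sku.strip():
        errors.append("sku must be a non-empty string")

    quantity = line.get("quantity")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        errors.append("quantity must be a positive integer")

    return errors
```

Returning all problems at once, rather than raising on the first, gives the person correcting the data a complete picture in a single pass.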
Since going live with our new ERP system and architecture, we have already realised a significant number of benefits and provided additional value.
However, such a flexible architecture opens up many more opportunities. With an exciting roadmap of proposition enhancements that build on this architecture, we are limited only by our imagination and the capacity of our engineering team!
An obvious improvement is to replace our dedicated WMS with a more modern, scalable system. Whether we build something ourselves, buy off-the-shelf software, or combine the two, the only modification required is a new Lambda function to convert JSON messages into the appropriate interface format.
Additionally, using transactions from pick-events in the warehouse, we can potentially update stock on our website in real-time, or intelligently manage stock depending on objectives.
Finally, one of the most exciting use-cases involves a virtual simulation of our warehouse, where our engineers run machine-learning algorithms to optimise warehouse capacity.
By tapping into the sales order pipeline and sending real sales orders to the virtual warehouse, we can run parallel simulations of boxes being routed around our warehouse, and ensure that our real picking line is optimised for any given set of orders.
The benefits and value created by our new ERP system and the architecture around it have been revolutionary for Gousto, and we are tremendously excited to realise its full potential in the years ahead.
If you are interested in working with any of the technologies mentioned in this blog post, we are hiring! Visit https://www.gousto.co.uk/jobs to apply.