12-factor App Methodology, Distributed Computing, Microservices — Concepts
Notes about Twelve-factor App Methodology, Scaling Types, CAP Theorem, Distributed Computing
Twelve-factor App Methodology
Set of best practices that guide you to build a great cloud native application.
1. Codebase — One codebase tracked in revision control, many deploys
Multiple codebases = Distributed System
Each Component in A Distributed System = an App
Factor shared code into libraries which can be included through the dependency manager (#2).
2. Dependencies — Explicitly declare and isolate dependencies
Dependencies (with their versions), such as frameworks, that are needed to build a software application.
3. Config — Store config in the environment
Environments = development, QA, staging and production.
Environmental Variables = Different configurations (such as database resources, credentials to external services, canonical hostname for the deploy) that applications have per environment.
Strict separation of config from code = Do not store config as constants in the code. Config varies substantially across deploys, code does not.
Recommended: Store the configuration information in a centralized repository (such as Spring Cloud Config Server).
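A minimal Python sketch of factor #3: the same build reads all deploy-specific configuration from environment variables, never from constants in code. The variable names (`DATABASE_URL`, `REDIS_URL`, `LOG_LEVEL`) and defaults are illustrative, not from any framework.

```python
import os

# Read deploy-specific config from the environment; the defaults here
# stand in for a local development setup.
def load_config(env=os.environ):
    return {
        "database_url": env.get("DATABASE_URL", "postgres://localhost:5432/dev"),
        "cache_url": env.get("REDIS_URL", "redis://localhost:6379"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }

# The same code runs in every environment; only the env vars differ.
prod_like = load_config({"DATABASE_URL": "postgres://db.prod:5432/app",
                         "LOG_LEVEL": "WARN"})
```

Because config lives outside the code, promoting a build from staging to production changes nothing in the repository, only the environment it runs in.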
4. Backing services — Treat backing services as attached resources
A backing service is any service the app consumes over the network as part of its normal operation.
Examples of backing services:
- Datastores (such as MySQL or Cassandra),
- Messaging/Queueing Systems (such as RabbitMQ or Kafka)
- Caching Systems (such as Redis).
All of these need to be configurable, and it should be easy to switch from one backing service to another without any code changes (#3).
5. Build, release, run — Strictly separate build and run stages
Reusable Component = No change in component as the deployment environment changes (only specific configuration for a target environment).
Build Stage = converts a code repo into an executable bundle known as a build.
Release Stage = combines the build with the deploy’s current config (#3).
The run stage (also known as “runtime”) runs the app in the execution environment
A built, released entity -> containerize -> run it in the environment.
Recommended: Every release should always have a unique release ID, such as a timestamp of the release or an incrementing number (e.g. v1.0.0).
Deployment Tools -> Management Tools -> Ability to Roll Back
6. Processes — Execute the app as one or more stateless processes
Stateless & Share-Nothing.
Sticky sessions = Violation of 12-factor!
Recommended: Session state data is a good candidate for a datastore that offers time-expiration, such as Redis.
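A minimal sketch of the time-expiring session store idea above (the role Redis plays): session data lives outside the app process and vanishes after a TTL, so processes stay stateless. The class and its API are illustrative, not a real Redis client.

```python
import time

# Toy time-expiring key/value store: entries disappear after `ttl` seconds.
class TtlSessionStore:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # session_id -> (expires_at, value)

    def put(self, session_id, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[session_id] = (now + self.ttl, value)

    def get(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(session_id)
        if entry is None or entry[0] <= now:
            self._data.pop(session_id, None)  # expired: nothing to return
            return None
        return entry[1]
```

Any app instance can serve any request by looking the session up in the shared store, which is exactly why sticky sessions become unnecessary.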
7. Port binding — Export services via port binding
Applications = Services -> Port Binding
Example: Web App -> Bind to a Port -> HTTP as service -> Listen to requests on that port.
Port Binding also can make an app a backing service (#4) for another app, by providing the URL to the backing app as a resource handle in the config (#3) for the consuming app.
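A small Python sketch of factor #7 using only the standard library: the app is self-contained and exports HTTP by binding to a port taken from the environment (`PORT=0` asks the OS for any free port). The handler is illustrative.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# The app bundles its own web server and exports HTTP as a service
# by binding to the port given in the environment.
class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

port = int(os.environ.get("PORT", "0"))  # 0 = let the OS pick a free port
server = HTTPServer(("127.0.0.1", port), Hello)
bound_port = server.server_port  # http://host:<bound_port> becomes the resource handle
server.server_close()
```

Another app can then consume this one as a backing service simply by putting `http://host:<port>` into its own config.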
8. Concurrency — Scale out via the process model
Share-nothing (#6) makes adding more concurrency a simple and reliable operation.
The array of process types and number of processes of each type is known as the process formation.
Do not daemonize, instead rely on the operating system’s process manager.
See “Scaling Types” section below.
Your applications should be built to be able to dynamically adapt to changing number of instances of various services.
9. Disposability — Maximize robustness with fast startup and graceful shutdown
Processes (#6) -> Disposable = can be started or stopped at a moment’s notice.
Disposability = A measure of the system’s robustness.
SIGTERM -> Graceful shutdown = Cease to listen to the service port (no more accepted new requests), Exit after processing of current requests finishes.
SIGKILL -> Killed Process, Sudden Death = Should be able to handle unexpected, non-graceful terminations.
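A sketch of the graceful-shutdown flow above: on SIGTERM the process stops accepting new work, drains in-flight requests, then exits. The `Worker` class and its flags are illustrative; real servers wire this into their accept loop.

```python
import signal

# On SIGTERM: stop listening (no new requests), finish current ones, exit.
class Worker:
    def __init__(self):
        self.accepting = True
        self.in_flight = []

    def handle_sigterm(self, signum=signal.SIGTERM, frame=None):
        self.accepting = False  # cease to listen on the service port

    def drain(self):
        # finish requests already in progress; new ones were refused
        finished = list(self.in_flight)
        self.in_flight.clear()
        return finished

worker = Worker()
signal.signal(signal.SIGTERM, worker.handle_sigterm)  # register the handler
worker.in_flight.append("req-1")
worker.handle_sigterm()  # simulate SIGTERM delivery
drained = worker.drain()
```

SIGKILL, by contrast, never reaches a handler, which is why the design must also tolerate sudden death (e.g. via idempotent, re-queueable work).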
10. Development / Production Parity — Keep development, staging, and production as similar as possible
- Time Gap: A developer writes code that may take days or weeks to reach production.
- Personnel Gap: Developers write code, ops engineers deploy it.
- Tools Gap: Development and production may use different stacks (e.g. a lightweight local database vs. the production one). Keep development and production environments as similar as possible.
Continuous Deployment -> Keep the gap between development and production small.
Lightweight environments (Docker containers, Vagrant VMs) & packaging systems (Homebrew, apt-get) -> let developers run environments that closely approximate production.
11. Logs — Treat logs as event streams
Visibility = Each Log Message -> Centralized Logging System
- Requests / Events
- Trends, Graphs
Log Indexing / Analysis Systems (ELK Stack, Grafana, Splunk, Hadoop)
12. Admin processes — Run admin/management tasks as one-off processes
Batch Programs, Database Migrations, Scripts -> Should have code base in version control, follow standard deployment processes and use the same environments as your application code.
Scaling types that can be applied to an application:
- Vertical Scaling (Scale-Up/Down)
Add resources to (or remove resources from) a single node in a system,
Increasing/decreasing the hardware infrastructure,
Such as increasing the CPU processing power, or increasing the amount of physical memory available to the application.
- Horizontal Scaling (Scale-Out/In)
Add more nodes to (or remove nodes from) a system,
Dynamically increasing/decreasing the number of instances of an application.
Summary: Horizontal scaling means adding more machines to the resource pool, rather than simply adding resources to a single machine by scaling vertically.
CAP Theorem
Distributed computing systems cannot simultaneously support all three of:
1. Consistency: Every read receives the most recent write or an error.
2. Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
3. Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Sacrifice at least one: in the presence of a network partition, a system must choose between consistency and availability.
Eventually Consistent — a model for building highly available distributed systems that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value; a deliberate trade-off of consistency for availability.
This is very different from what ACID database transactions guarantee.
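The eventual-consistency guarantee can be seen in a toy model: writes are acknowledged by one replica immediately (availability) and propagate asynchronously; once a background sync runs and no new writes arrive, all replicas converge. All names here are illustrative.

```python
# Toy eventually consistent store: one write replica, async propagation.
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    def __init__(self, n=2):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # replication log not yet applied everywhere

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)  # may be stale

    def anti_entropy(self):
        # background sync: after this, with no new updates, reads converge
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()
```

Before `anti_entropy()` runs, a read from replica 1 can return stale data; afterwards every replica returns the last written value, which is exactly the informal guarantee stated above.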
Fallacies of distributed computing
Common fallacious assumptions made by developers and architects of distributed systems (#1–8: Peter Deutsch ’94 and James Gosling ’97; #9–11: Ted Neward ’06):
- The network is reliable
Retry & Ack / Store & Forward / Transactions
Think of async scenarios, too many edge cases
MSMQ/RabbitMQ/ActiveMQ -> message queues do not by themselves provide synchronous request/response.
- Latency isn’t a problem
- Bandwidth isn’t a problem
Bandwidth << Data Storage
Domain models, ORMs eagerly fetching too much data
- The network is secure
- The topology won’t change
- The administrator will know what to do
- Transport cost isn’t a problem
- The network is homogeneous
QoS (Quality of Service), Traffic prioritization: You can ensure that essential or time-critical traffic flows without delays by giving it a higher priority value.
- The system is atomic/monolithic
- The system is finished
- Business logic can and should be centralized
Logic -> Physically distributed
“Tag” by feature (in source code)
- Afferent and Efferent Coupling
Afferent — who depends on you (Incoming): Generic, Lib, Framework
Efferent — on who you depend (Outgoing): Domain Specific
How to count coupling? Fields, methods, …
Shared Resource (Database) -> Hidden Coupling
- Platform Coupling
Share contract and schema, not class or type
Proxy for Invocation -> Web Services (SOAP, WSDL, REST, XML, JSON)
- Temporal Coupling
Time -> Processing time of Service B affects that of Service A that calls it
Async / Wait
Pub/Sub Events -> Designing Events : Avoid requests/commands, State something that happened (past tense) -> OrderAccepted, ProductPriceUpdated
Service Agent Pattern -> Client Side Proxy/Gateway -> An extra component on the client can perform additional processing and logic operations to further separate the client from the remote service -> Abstraction on Client Side
Load Balancing (Routing -> Logical and Physical)
Logical Domain -> Message Type = Logical Destination
- Spatial Coupling
Architecture -> Machines
Messaging instead of RPC -> 1-Way Fire and Forget -> Reduces this kind of coupling.
- Return Address Pattern
- Correlated Request/Response (MessageId, CorrelationId)
Messages should be in DTO Format.
Backoff Mechanisms, Retries
Failed Messages -> Error Queue (Message+Exception Info) -> AdminUI (Notify Admin for Replay maybe after a code case fix)
For Auditing/Journaling -> Send a copy of the msg to another queue when it is processed. MessageId, Msg Headers to trace.
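The retry/backoff and error-queue flow described above can be sketched as follows. This is an illustrative shape, not a real broker API; the backoff delays are computed but not slept here, and a parked message keeps its exception info for an admin to replay later.

```python
# Retry a message handler with exponential backoff; after max_retries
# failures, park the message (plus exception info) on an error queue.
def process_with_retries(message, handler, max_retries=3, base_delay=0.1):
    delays, error_queue = [], []
    for attempt in range(max_retries):
        try:
            return handler(message), delays, error_queue
        except Exception as exc:
            delays.append(base_delay * (2 ** attempt))  # backoff schedule
            last_error = exc
    error_queue.append({"message": message, "exception": repr(last_error)})
    return None, delays, error_queue
```

The error-queue entry carries both the message and the exception, so an admin UI can notify an operator and replay the message after a code fix.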
Calling Web Services
Keep the same transactionId (orderId, etc.)
TransactionId should not be DB generated but client generated.
Should check whether a request was already processed (deduplication, to prevent double processing on retries)
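The client-generated transactionId plus the "already processed?" check together make the call idempotent, which can be sketched like this (the `PaymentService` name and `charge` operation are illustrative):

```python
import uuid

# The CLIENT generates the transaction id (not the DB), so a retried
# call carries the same id and the server can detect the duplicate.
class PaymentService:
    def __init__(self):
        self.processed = {}  # transaction_id -> result of first processing

    def charge(self, transaction_id, amount):
        if transaction_id in self.processed:
            return self.processed[transaction_id]  # idempotent replay
        result = {"transaction_id": transaction_id, "charged": amount}
        self.processed[transaction_id] = result
        return result

tx_id = str(uuid.uuid4())  # generated on the client, reused on every retry
```

If the id were DB-generated, a retry after a lost response would create a second transaction; with a client-generated id the retry is recognized and the original result is returned.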
Sagas:
- For long-running processes
- Cascade of events
- Similar to Message Handlers -> Can handle a number of different message types
- Different from Message Handlers -> Have state (kept usually in a database technology, to pick up where they left off), Message Handlers don’t
- Msg In -> Modifies its part of state / Msg Out -> Saying this is what I decided — State Machine
- Unique Constraint -> A transactionId (orderId) property
- Should not look back on history.
- Should not change master data.
- Saga + Long Running Processes → Domain Models
Saga Pattern: A sequence of local transactions where each transaction updates data within a single service.
Saga Pattern Tips:
- Add the reply address within the command.
This way you enable your participants to reply to multiple orchestrators.
- Idempotent Operations
Kafka, RabbitMQ, etc. -> chance of delivering the same message twice
- Avoid Synchronous Communication
Add all the data required for operation to be executed
Avoid sync calls between services to request more data. This enables your services to execute their local transactions even when other services are offline.
Sagas sacrifice atomicity and rely on eventual consistency.
2PC (Two Phase Commit)
Phase 1 — Each server that needs to commit data writes its data records to the log.
A server is unsuccessful -> It responds with a failure message
A server is successful -> It replies with an OK message.
Phase 2 — This phase begins after all participants respond OK. Then, the coordinator sends a signal to each server with commit instructions. After committing, each writes the commit as part of its log record for reference and sends the coordinator a message that its commit has been successfully implemented. If a server fails, the coordinator sends instructions to all servers to roll back the transaction. After the servers roll back, each sends feedback that this has been completed.
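The two phases described above can be condensed into a small sketch: a coordinator collects prepare votes, then either commits everyone or rolls everyone back. The `Participant` class and its states are illustrative.

```python
# Minimal 2PC coordinator: Phase 1 collects votes, Phase 2 commits
# only if all participants voted OK, otherwise rolls all of them back.
class Participant:
    def __init__(self, name, will_succeed=True):
        self.name, self.will_succeed = name, will_succeed
        self.state = "initial"

    def prepare(self):
        # Phase 1: write data records to the log, then vote OK or failure
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # Phase 1
    if all(votes):                               # Phase 2
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"
```

Note how one failing vote forces every participant, including the ones that prepared successfully, to roll back; this all-or-nothing behaviour is what sagas give up in exchange for availability.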
Saga vs. 2PC → 2PC requires strong consistency (ACID across all participants) and holds locks while waiting; it does not scale well across distributed services.
Read more about Sagas in MicroServices section.
Microservices
How to decompose the application into services?
From Object-oriented design (OOD)-> Single Responsibility Principle (SRP) defines a responsibility of a class as a reason to change, and states that a class should only have one reason to change. (Cohesion)
From Object-oriented design (OOD)-> Common Closure Principle (CCP) which states that classes that change for the same reason should be in the same package. (Coherence)
What is Bounded context?
A bounded context is simply the boundary within a domain where a particular domain model applies. DDD (Domain-Driven Design) deals with large models by dividing them into different Bounded Contexts and being explicit about their interrelationships.
- Decompose by business capability
Services = Business Capabilities.
A business capability is a concept from business architecture modeling.
- Decompose by subdomain
Domain-Driven Design (DDD) subdomains
- Core — key differentiator for the business and the most valuable part of the application
- Supporting — related to what the business does but not a differentiator. These can be implemented in-house or outsourced.
- Generic — not specific to the business and are ideally implemented using off the shelf software
- Service per team
Conway’s Law: An architecture mirrors the communication structure of the organization that builds it.
- Each team is responsible for one or more business functions.
- A team should have exactly one service unless there is a proven need to have multiple services.
- A team should only deploy its code as multiple services if it solves a tangible problem.
- It must ‘fit’ in the team’s heads.
- Self-contained service
Question: How to design a service so that it can respond to a synchronous request without waiting for the response from any other service? Collaborate with other services using the CQRS and the Saga patterns. A self-contained service uses the Saga pattern to asynchronously maintain data consistency. It uses the CQRS pattern to maintain a replica of data owned by other services. (Saga and CQRS are detailed below)
Services communicate using either synchronous protocols such as HTTP/REST or asynchronous protocols such as AMQP.
Each service has its own database in order to be decoupled from other services so 2PC is not an option; data consistency between services is maintained using the Saga pattern.
- A saga is a sequence of local transactions that spans multiple services.
- Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga.
- If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.
Drawback: Compensating transactions can get complicated.
Ways of coordinating sagas:
- Choreography — each local transaction publishes domain events that trigger local transactions in other services
- Orchestration — an orchestrator (object) tells the participants what local transactions to execute
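The orchestration style above, including compensating transactions, fits in a short sketch. Each step pairs a local transaction with its compensation; on failure, completed steps are undone in reverse order. The order-processing step names are illustrative.

```python
# Orchestrated saga: run local transactions in order; on failure,
# run the compensations of already-completed steps in reverse.
def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for comp in reversed(completed):  # undo in reverse order
                comp()
            return "compensated"
    return "completed"
```

Because each action is a local transaction in its own service, the saga never needs a distributed lock; consistency is restored (eventually) by the compensations rather than by atomic rollback.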
Event sourcing persists the state of a business entity as a sequence of state-changing events.
Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
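The append/subscribe/replay behaviour just described can be sketched with an account balance as the entity: state is never stored directly, only rebuilt by folding the event stream. All names and event shapes are illustrative.

```python
# Toy event store: appends events per entity, notifies subscribers on
# save (the "message broker" behaviour), and replays events on load.
class EventStore:
    def __init__(self):
        self.events = {}      # entity_id -> [event, ...]
        self.subscribers = []

    def save(self, entity_id, event):
        self.events.setdefault(entity_id, []).append(event)
        for subscriber in self.subscribers:
            subscriber(entity_id, event)  # deliver to interested services

    def load(self, entity_id):
        return self.events.get(entity_id, [])

def rebuild_balance(events):
    # current state = fold over the full event history
    balance = 0
    for event in events:
        if event["type"] == "Deposited":
            balance += event["amount"]
        else:  # "Withdrawn"
            balance -= event["amount"]
    return balance
```

Since the history is never overwritten, the same store doubles as an audit log, which is why Event Sourcing is listed below as a reliable way to implement auditing.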
The Database per Service pattern creates the need for this pattern. CQRS is often used with Event sourcing.
Command Query Responsibility Segregation (CQRS)
Question: How to implement a query that retrieves data from multiple services in a microservice architecture?
Solution: Define a view database, a read-only replica designed to support that query. The application keeps the replica up to date by subscribing to domain events published by the services that own the data.
Alternatively, implement the query by defining an API Composer, which invokes the services that own the data and performs an in-memory join of the results.
An API Gateway often does API composition.
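The API Composer approach can be sketched in a few lines: call each owning service, then join the results in memory. The two "services" here are plain functions returning canned data, purely for illustration.

```python
# Stand-ins for two services, each owning its own data.
def order_service(customer_id):
    return [{"order_id": "o1", "customer_id": customer_id, "total": 40}]

def customer_service(customer_id):
    return {"customer_id": customer_id, "name": "Alice"}

# API Composer: invoke each owning service, join results in memory.
def compose_customer_orders(customer_id):
    customer = customer_service(customer_id)
    orders = order_service(customer_id)
    return {
        "name": customer["name"],
        "orders": [o["order_id"] for o in orders],
    }
```

The trade-off versus the view-database approach: composition is simpler and always fresh, but large in-memory joins get expensive and the query fails if any owning service is down.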
You will frequently create new services, each of which will only take days or weeks to develop. There are additional cross-cutting concerns that you have to deal with including service registration and discovery, and circuit breakers for reliably handling partial failure.
Solution: Create a microservice chassis framework that can be foundation for developing your microservices.
One solution is to create a Service Template, which is a source code template that a developer can copy in order to quickly start developing a new service.
Remote Procedure Invocation (RPI)
Inter-process communication protocol
- REST (Representational State Transfer)
What is HATEOAS (Hypermedia as the Engine of Application State)?
A constraint of the REST application architecture that distinguishes it from other network application architectures.
With HATEOAS, a client interacts with a network application whose application servers provide information dynamically through hypermedia.
- gRPC (A modern open source high performance Remote Procedure Call (RPC) framework)
- Request/asynchronous response — a service sends a request message to a recipient and expects to receive a reply message eventually
- Publish/asynchronous responses — a service publishes a request to one or more recipients, some of whom send back a reply
Transactional outbox (Application events)
A service command typically needs to update the database and send messages/events (like in saga). The order of the messages and idempotency must be assured.
- A service that uses a relational database inserts messages/events into an outbox table (e.g. MESSAGE) as part of the local transaction.
- A service that uses a NoSQL database appends the messages/events to an attribute of the record (e.g. document or item) being updated.
- A separate Message Relay process publishes the events inserted into database to a message broker.
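The relational-database variant of the outbox can be sketched with sqlite: the business row and the outbox message are inserted in one local transaction, so either both exist or neither does, and a relay later publishes and deletes outbox rows. Table and event names are illustrative.

```python
import sqlite3

# Business table plus outbox table in the same database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")

def place_order(order_id, total):
    # ONE local transaction: the order and its event commit (or fail) together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (f"OrderPlaced:{order_id}",))

def relay_publish(publish):
    # Message Relay: poll the outbox, publish, then remove published rows.
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(payload)
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
```

Because publishing happens after the commit, the relay provides at-least-once delivery; combined with idempotent consumers (see the saga tips above) this yields the required guarantees without 2PC.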
Transaction log tailing
Tail the database transaction log and publish each message/event inserted into the outbox to the message broker (MySQL binlog, Postgres WAL, AWS DynamoDB table streams).
Polling publisher
Publish messages by polling the database’s outbox table.
- Client-side service discovery
When making a request to a service, the client obtains the location of a service instance by querying a Service Registry, which knows the locations of all service instances.
- Server-side service discovery
When making a request to a service, the client makes a request via a router (load balancer) that runs at a well known location. The router queries a service registry, which might be built into the router, and forwards the request to an available service instance.
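Client-side discovery can be sketched with a registry that tracks live instances and a client that picks one itself (simple client-side load balancing). The registry API and addresses are illustrative, not a real Eureka/Consul client.

```python
import random

# Registry knows the locations of all instances of each service.
class ServiceRegistry:
    def __init__(self):
        self.instances = {}  # service name -> set of "host:port"

    def register(self, service, address):
        self.instances.setdefault(service, set()).add(address)

    def deregister(self, service, address):
        self.instances.get(service, set()).discard(address)

    def lookup(self, service):
        return sorted(self.instances.get(service, set()))

# Client-side discovery: the CLIENT queries the registry and chooses.
def choose_instance(registry, service, rng=random):
    candidates = registry.lookup(service)
    if not candidates:
        raise LookupError(f"no instances of {service}")
    return rng.choice(candidates)
```

In server-side discovery the same lookup happens inside the router instead, so clients stay simpler at the cost of an extra network hop.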
The API Gateway authenticates the request and passes an access token (e.g. JSON Web Token) that securely identifies the requestor in each request to the services. A service can include the access token in requests it makes to other services.
- Log aggregation
- Exception tracking
- Application metrics (Prometheus, AWS CloudWatch)
- Audit logging (Event Sourcing is a reliable way to implement auditing)
- Distributed tracing (Spring Cloud Sleuth instruments Spring components to gather trace information and can deliver it to a Zipkin Server, which gathers and displays traces)
- Health Check API (Spring Boot Actuator module. It configures a “/health” HTTP endpoint that invokes extensible health check logic.)
- Log deployments and changes (AWS CloudTrail)
Service Integration Contract Test (Service Component Test)
Spring Cloud Contract is an open source project that supports this style of testing.