Reactive Microservices: A Heartbeat Health Story
Heartbeat Health is a new telemedicine-based service aimed at giving people access to high-quality cardiology care with much greater ease and convenience, using the latest internet-enabled technologies and cutting-edge user experience design. I’m excited to say I’ve recently joined them as Chief Architect after wrapping up three years in a similar role at Lark Health.
Heartbeat Health already has a service live in production, serving patients, and it is already cash flow positive. But we have an opportunity to build the next version essentially from the ground up, building on the lessons of the first version while using the best practices and techniques available to us today. I’ve been tasked with designing the new architecture, which I’d like to describe here.
One design pattern that has gained a lot of popularity in recent years is the microservice architecture. It enforces modularity by separating functionality into self-contained, simple applications, each loosely coupled with the others via contracts. This has several advantages. New developers can ramp up quickly on any given microservice because each one is simple. You can typically reimplement a microservice relatively quickly, even in an entirely new technology, without having to rewrite your whole application. And microservices are typically designed to be horizontally scalable, which makes large scale much easier to attain.
However, the traditional microservice uses a multithreaded synchronous request-response pattern; that is, the code simply waits for blocking operations like I/O to complete. The problem with this model is that it doesn’t scale well: each in-flight request holds a thread for its full duration, and threads are expensive in both memory and context-switching overhead.
Reactive programming is an answer to this: a small number of threads can process many simultaneous requests by yielding control whenever they would otherwise block on I/O or other slow operations. In addition to dramatically improving scalability, the code is easier to read and understand, and you can write code at a higher level of abstraction. It’s an improvement in both scalability and code legibility at the same time.
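To make the non-blocking idea concrete, here is a minimal sketch in TypeScript. It is not our production code: `fakeIo`, `handleRequest`, and `serveAll` are hypothetical names, and the timer merely stands in for a real database or network call. It shows two requests being served on a single thread, with each one yielding control while it waits.

```typescript
// Simulated I/O call: resolves after `ms` milliseconds without blocking the thread.
// (Hypothetical stand-in for a database query or HTTP request.)
function fakeIo(label: string, ms: number, log: string[]): Promise<string> {
  log.push(`start ${label}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`done ${label}`);
      resolve(label);
    }, ms)
  );
}

// Handle one request. Awaiting yields control to the event loop,
// so other requests can make progress while this one waits.
async function handleRequest(id: string, ms: number, log: string[]): Promise<string> {
  const result = await fakeIo(id, ms, log);
  return `response for ${result}`;
}

// Serve two requests concurrently on a single thread.
async function serveAll(log: string[]): Promise<string[]> {
  return Promise.all([
    handleRequest("slow", 50, log),
    handleRequest("fast", 5, log),
  ]);
}
```

Both requests start before either finishes, and the fast one completes first even though it was issued second. No thread sits idle waiting on the slow one.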
We are building our architecture to be as cloud agnostic as possible. This means that rather than deeply embedding ourselves in proprietary technologies that tie us to a specific cloud provider, we’re using mostly open source solutions (for example, Kafka and RabbitMQ rather than SQS, and open source databases rather than, say, DynamoDB). The idea is to be able to redeploy our service on any cloud provider (or even multiple cloud providers) relatively straightforwardly. We will make some exceptions, of course; for instance, on AWS it makes sense to use S3 for storage, and on GCP we would use Cloud Storage.
Velocity Through Testability
Thorough unit and integration tests are what give us real development velocity: they let us refactor systems safely and release features quickly.
Front and Back End Technologies
Consistent with the above principles, we’re planning to use the following technologies, initially:
React and React Native
- Mature technology with large community of developers and contributors
- Cross-Platform on mobile
Spring Reactor (Spring Boot + Reactive Programming)
- Reactive / non-blocking / efficient
- Although it runs on the JVM, it can boot very quickly with GraalVM, making it compatible with both microservice and serverless deployment
- Large developer community, many libraries
- Easy to write and read database code via the repository abstraction, declarative and convention-based approach
- Doesn’t impose a particular event bus on other microservices, so can mix and match with other languages and technologies
- Using message queues rather than HTTP for requests
- Scales better, more robust to failure of microservices
- Starting with RabbitMQ for message bus, but using a shim so code is not tightly coupled with RabbitMQ — open source, cloud agnostic
- More flexible and powerful than REST
- Extensible, backward-compatible APIs
We may use Node.js for microservices which talk directly with React Native-based apps (see below)
- Using TypeScript for ease of refactoring
- Same language as front end, for possible sharing of code
- Similar reactive coding model
- Single-threaded, but can scale horizontally
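The message bus shim mentioned in the list above could look something like the following sketch. Everything here is an assumption for illustration: the `MessageBus` interface, the `InMemoryBus` class, the `appointment.booked` topic, and the `AppointmentBooked` event are hypothetical names, not our actual API. The point is only that service code depends on a small interface, so RabbitMQ can be swapped for Kafka (or an in-memory fake in tests) without touching the services.

```typescript
// Hypothetical message-bus abstraction; all names are illustrative only.
type Handler<T> = (message: T) => Promise<void> | void;

interface MessageBus {
  publish<T>(topic: string, message: T): Promise<void>;
  subscribe<T>(topic: string, handler: Handler<T>): void;
}

// In-memory implementation, useful for unit tests. A RabbitMQ- or Kafka-backed
// class would implement the same interface using its own client library.
class InMemoryBus implements MessageBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(topic, list);
  }

  async publish<T>(topic: string, message: T): Promise<void> {
    // Deliver to every subscriber of the topic, in subscription order.
    for (const handler of this.handlers.get(topic) ?? []) {
      await handler(message);
    }
  }
}

// Hypothetical event type for illustration.
interface AppointmentBooked {
  patientId: string;
  slot: string;
}

// Service code depends only on the MessageBus interface, not on a broker.
function registerReminderService(bus: MessageBus, sent: string[]): void {
  bus.subscribe<AppointmentBooked>("appointment.booked", (event) => {
    sent.push(`reminder for ${event.patientId} at ${event.slot}`);
  });
}
```

In a unit test, the service is wired to `InMemoryBus`. In production, the same `registerReminderService` call would receive a broker-backed implementation instead.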
Kotlin is “Java done right”: it takes many of the best ideas from other languages such as Scala and Groovy while retaining seamless Java interoperability. Unlike Java, Kotlin has real closures, meaning that a lambda can both read and modify local variables declared outside its block, whereas a Java lambda can only capture variables that are effectively final. This enables far more powerful constructs, allowing control structures and functional programming idioms to be expressed much more cleanly than they can be in Java.
Kotlin is also much more concise than Java. You don’t need to declare types everywhere, and there’s a lot of emphasis on reducing and eliminating redundancy, such as having to explicitly write scaffolding like getters and setters. Kotlin achieves economy of style while also managing to be even more type strict than Java, taking advantage of modern compiler design and extensive type inference. The addition of the nullable type modifier is a seemingly small yet pivotal change: it takes null dereferences, one of the most common and destructive bugs that can afflict any program, and catches them at compile time through static analysis.
- For core backend services
- Easier to learn for Java engineers than Scala
- Real closures, ability to write in a functional style (though not as strong as Scala)
- Less impedance mismatch with Java (more efficient and straightforward interoperability)
- Downside: lacks Scala’s powerful pattern matching facility, and is more imperative and less oriented toward functional programming than Scala
We intend to use TypeScript for front end development and, if we also use Node.js, for the backend as well. Projects using untyped languages are easy to get going, but often become difficult to refactor as the codebase grows.
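As a small illustration of how static types help with refactoring, consider the sketch below. The `Visit` type and `describeVisit` function are hypothetical examples, not part of our codebase. If someone later adds a third kind of visit, the compiler flags every `switch` that forgot to handle it, instead of the omission surfacing at runtime.

```typescript
// Hypothetical domain type, for illustration only.
type Visit =
  | { kind: "video"; meetingUrl: string }
  | { kind: "inPerson"; clinicAddress: string };

function describeVisit(visit: Visit): string {
  switch (visit.kind) {
    case "video":
      return `Join at ${visit.meetingUrl}`;
    case "inPerson":
      return `Come to ${visit.clinicAddress}`;
    default: {
      // If a new variant is added to Visit and not handled above,
      // this assignment stops compiling, pointing at the code to update.
      const unhandled: never = visit;
      throw new Error(`Unhandled visit: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

In an untyped codebase the same change would require searching for every place that inspects `kind` and hoping none were missed.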
Python is the de facto standard for most data science/machine learning code, so we intend to support this for our ML and data science scripts.
Production: SQL + Liquibase
- SQL databases are transactional and robust
- There are new horizontally scalable SQL solutions such as CockroachDB / Cloud Spanner, etc.
- Liquibase allows schemas and migrations to be defined abstractly, in a database vendor-independent way
Data Warehouse (“Data Lake”): Snowflake
Snowflake is a blazing fast, horizontally scalable data warehouse that can separate data analytics and machine learning jobs from our production infrastructure. I’ve used it to good effect in previous jobs and we intend to standardize on it. We can collect all of the data we need for analytics purposes in one place for speed of reporting and analytics without impacting production systems.
Kubernetes, Docker, Helm, Terraform, Infrastructure as code
These are the industry-standard ways of orchestrating services in ways that are both cloud-agnostic and allow for efficient use of cloud hardware. We also want to create a self-service DevOps model where engineers and stakeholders all have direct access to monitoring dashboards and can deploy test environments as needed, on demand, without having to involve DevOps.
The development lifecycle we want to encourage involves very frequent releases (even more than one per day) of small features, each tested independently and thoroughly, both automatically and manually, before being deployed live. The master branch will be pushed to production as soon as possible after a feature is merged to master. This is a deployment strategy recommended both by DevOps engineers I’ve recently worked with and by Heartbeat’s DevOps lead. Some key features of this model:
- Give developers self-service access to create new test environments, etc., reducing load on the DevOps team to focus on building tools and infrastructure automation
- Small, frequent releases encourage velocity and responsiveness to new consumer needs
- Monitoring & tracing built in from the beginning — reliability and observability built into the design
- Dashboards, infrastructure accessible to developers and stakeholders directly — relieve pressure on DevOps, increase visibility of systems, self-service DevOps
Monitoring and Alerting
We intend to deploy the ELK stack for log querying, Prometheus and Grafana for metrics, monitoring, and alerting, and Jaeger for distributed tracing, giving us better visibility into the sometimes complex performance profile of microservice calls. Warnings and errors will go to Slack as well as OpsGenie for on-call paging and emergency alerting.
Security and Compliance
Since we are a health care service, we will need to be HIPAA compliant and, eventually, HITRUST certified as well. We will build this into both the DevOps and application infrastructure from the start: using HIPAA-compliant services, segregating sensitive personal information, applying comprehensive end-to-end encryption, and doing constant threat detection and patching of infrastructure.
[Martin Fowler on Event Collaboration](https://martinfowler.com/eaaDev/EventCollaboration.html)
[Fowler on Command Query Separation (should be called Mutator Query Separation)](https://martinfowler.com/bliki/CommandQuerySeparation.html)
[Paul Johnston on Serverless Best Practices](https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535)