System Architecture for B2B and SaaS

Tat Sean Pang
11 min read · Aug 16, 2019

IMHO, whether a system architecture and design is the "best" design is very subjective and tends to invite arguments. It is analogous to asking which programming language is best suited to building a web portal or web services, regardless of whether it is a consumer-facing or an enterprise application. Should we use Java, C# with .NET, Ruby, Rust, Golang, and so on? The same applies to machine learning and data science applications: should we use R, Python, or Julia? In terms of database servers, should we use MS SQL Server, Oracle, PostgreSQL, MySQL, and so on? The list goes on.

Anyway, I still think that for small and medium-sized applications (and I believe the majority of B2B and SaaS applications fall into this size category), a "simple", maintainable design with a "simple" scalability mechanism is more than enough. In reality, designing a "simple" yet maintainable and scalable application is rather challenging. Too much embellishment makes the system look good on paper but, in most cases, poses a huge challenge to maintain. So, what components/modules are indispensable in a typical SaaS and B2B application, especially one that also requires a certain extent of security management? The following is what I think it should entail:

  • Logging and Monitoring Mechanism
  • Key Management
  • Multi-Tenancy Design
  • Data Denormalization for Data Warehouse/Analytics and Reporting
  • Caching

If I were the one designing the SaaS/B2B solution, this would be my high-level architectural diagram:

Logging and Monitoring Mechanism

Personally, I still think the best storage medium for logging is a plain text file on each deployment server. It seems common to see designs that use NoSQL or even an RDBMS to store log data, but I would say the most reliable way of storing log information with no data loss is a plain text file. Since most SaaS/B2B applications will have more than one server containing log data, how should we consolidate and centralize the logging information? My personal preference is the Elastic Stack. The new version of the Elastic Stack even supports basic authentication for Kibana; previously, to get basic authentication for free, we had to rely on Search Guard. Some people I have talked to are even skeptical about the Elastic Stack, particularly those without much exposure to the open-source ecosystem, but I hope the following two articles will quell their skepticism:

Key Management

Not every single SaaS/B2B application requires a key management server. But if you are in the financial industry, or part of your business is related to it, key management software should be something you have some experience and familiarity with. There are a few enterprise-grade options, such as HashiCorp Vault, Confidant from Lyft, Keywhiz from Square, and so forth, but if you are on the Windows platform, and also in terms of completeness of documentation, I would say HashiCorp Vault comes first and is probably the only choice. HashiCorp Vault allows us to store almost every sensitive key and certificate in the Vault server in a very secure, encrypted way. It even allows us to store RDBMS connection strings in the Vault server without exposing the actual database credentials.
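To make this a little more concrete, here is a minimal sketch (not production code) of reading a secret from Vault over its HTTP API with plain Java 11+. The address, the raw token from an environment variable, and the secret path "secret/data/myapp/db" are assumptions for illustration; in a real system the token would come from an auth method such as AppRole or Kubernetes auth.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VaultClientSketch {
    public static void main(String[] args) throws Exception {
        // Assumed environment: a reachable Vault server and a token issued to this app.
        String vaultAddr = System.getenv("VAULT_ADDR");   // e.g. https://vault.internal:8200
        String vaultToken = System.getenv("VAULT_TOKEN");

        HttpClient client = HttpClient.newHttpClient();
        // KV version 2 secrets engine exposes secrets under /v1/secret/data/<path>
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(vaultAddr + "/v1/secret/data/myapp/db"))
                .header("X-Vault-Token", vaultToken)
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The JSON body nests the secret under data.data, e.g.
        // {"data":{"data":{"connection_string":"..."},"metadata":{...}}}
        System.out.println(response.body());
    }
}
```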

I am compelled to share that HashiCorp is a unicorn; besides HashiCorp Vault, it also produces many well-known open-source tools such as Terraform.

Multi-Tenancy Design

I have heard quite a number of tech people raise the question of how to scale the database, and the first "solution" that comes to mind is probably leveraging the auto-sharding feature of an RDBMS. But again, I personally prefer the so-called "Manual Sharding" mechanism that has been used in some SaaS applications, which some people also call Multi-Tenancy design. This design idea makes particular sense in SaaS and B2B applications because we track the customers of a SaaS/B2B application by subscription account, with each account having a certain number of actual system users.

Manual Sharding is designed in such a way that (1) whenever there is a new request to the system, (2) the system references a "database router" to determine which account that particular user belongs to, and (3) routes the request to the particular database that hosts the data for that customer (or account). This account-to-database mapping can always be cached in the caching system to allow fast retrieval without impacting the performance of a high-traffic system.
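As an illustration, here is a minimal sketch of such a "database router" in Java. The class and method names, the in-memory cache, and the JDBC URLs are all assumptions; in a real deployment the account-to-shard mapping would live in a central routing table with a distributed cache (Redis, Memcached, etc.) in front of it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TenantRouter {
    // Cache of accountId -> JDBC URL so routing does not hit the router DB on every request.
    private final Map<String, String> shardCache = new ConcurrentHashMap<>();

    public Connection connectionFor(String accountId) throws SQLException {
        String jdbcUrl = shardCache.computeIfAbsent(accountId, this::lookupShardFromRouterDb);
        return DriverManager.getConnection(jdbcUrl); // credentials omitted for brevity
    }

    // In a real system this would query the central "database router" table (account_id -> shard).
    private String lookupShardFromRouterDb(String accountId) {
        // e.g. some accounts live on a PostgreSQL shard, others on a SQL Server shard
        if (accountId.hashCode() % 2 == 0) {
            return "jdbc:postgresql://shard1.internal:5432/saas";
        }
        return "jdbc:sqlserver://shard2.internal:1433;databaseName=saas";
    }
}
```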

With this design, we can even use different types of RDBMS to store our data, such as a mix of MS SQL Server and PostgreSQL, or a mix of Oracle DB and MySQL. The design and implementation should also cater to the scenario where the data of any "tenant" or "account" can easily be moved from one database to another, as well as the scenario where we want to consolidate different "tenants" or "accounts" from different databases into a single database. This can in fact be done easily if the database schema is designed in a way that allows such "expansion" and "merging".

In fact, my first exposure to this Manual Sharding design was during my tenure working with the teams at Dude Solutions, a SaaS solutions provider, though with some tweaks in the new design which I personally think is probably a little better.

Data Denormalization for Data Warehouse/Analytics and Reporting

One of the most difficult challenges in designing a SaaS/B2B system is the design of the data warehouse for reporting or analytics, particularly when we are talking about real-time reporting and analytics. How do we design the system in such a way that high-traffic OLTP data can flow into a data pipeline, which then allows a streaming API to perform extraction and transformation of the OLTP data? In this context, Change Data Capture (CDC) is always brought up. The following is the definition of CDC from Wikipedia:

In databases, change data capture is a set of software design patterns used to determine the data that has changed so that action can be taken using the changed data. CDC is also an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources.

CDC is available in most of the main RDBMSs, including Oracle, MS SQL Server, MySQL, and PostgreSQL. We can leverage this feature, together with Kafka Streams, to implement a robust and near-real-time data warehouse: capture the relevant data changes in the RDBMS, pump them into Apache Kafka through supported CDC connectors, and use the Kafka Streams API to extract and transform the data before loading it into the data warehouse. As far as I am aware, Debezium is the most popular open-source distributed platform for CDC connectors.
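As a rough sketch of the transformation step, assuming Debezium is already streaming changes from an "orders" table into Kafka, a small Kafka Streams application could reshape those CDC events before they are loaded into the warehouse. The topic names and the denormalize() logic below are placeholders, not the actual Debezium output format.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class CdcToWarehouseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-to-warehouse");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.internal:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Topic name follows Debezium's <server>.<schema>.<table> convention; assumed here.
        KStream<String, String> changes = builder.stream("saasdb.public.orders");

        changes
            .filter((key, value) -> value != null)           // drop tombstone records
            .mapValues(CdcToWarehouseStream::denormalize)    // reshape the CDC event for reporting
            .to("warehouse.orders_denormalized");            // a downstream sink connector loads this

        new KafkaStreams(builder.build(), props).start();
    }

    // Placeholder transformation; a real one would parse the Debezium JSON envelope
    // (payload.after) and flatten/join it into the reporting schema.
    private static String denormalize(String cdcEventJson) {
        return cdcEventJson;
    }
}
```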

The gist of it is: when OLTP data needs to flow into other data stores (relational, NoSQL, or any other type) for further processing, use a combination of the Outbox Pattern, CDC, and streaming or message-queuing middleware to extract and transform the OLTP data into a format that can be further utilized by other modules such as reporting, data analysis, data science, graphing, and fraud detection.
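A minimal sketch of the Outbox Pattern itself, using plain JDBC, could look like the following. The orders and outbox tables and their columns are assumptions, but the point is that the business row and the event row are committed in the same local transaction, so CDC can publish the event later without any dual write.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OrderService {

    public void placeOrder(Connection conn, String orderId, String accountId, String payloadJson)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement insertOrder = conn.prepareStatement(
                     "INSERT INTO orders (id, account_id) VALUES (?, ?)");
             PreparedStatement insertOutbox = conn.prepareStatement(
                     "INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload) "
                   + "VALUES ('order', ?, 'OrderPlaced', ?)")) {

            insertOrder.setString(1, orderId);
            insertOrder.setString(2, accountId);
            insertOrder.executeUpdate();

            insertOutbox.setString(1, orderId);
            insertOutbox.setString(2, payloadJson);
            insertOutbox.executeUpdate();

            conn.commit();   // either both rows are persisted or neither is: no dual write
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```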

Transaction Behavior Monitoring

If we are involved in SaaS/B2B applications related to the financial industry, there is one requirement that seems to be pretty common: figure out the behavior or patterns of transactions in order to identify fraud. For instance, in the remittance industry, there is always a need, from a compliance perspective, to figure out the relationship between sender and beneficiary: how many beneficiaries a particular sender has remitted money to and who those beneficiaries are, or how many senders have remitted to a particular beneficiary and who those senders are. This is a direct relationship.

We may also want to figure out indirect relationships, in which the beneficiary (B_A) of a sender (S_A) has in turn remitted money to another beneficiary (B_B), and whether there is any relationship between S_A and B_B. In a nutshell, we want to figure out the direct and indirect relationships between senders and beneficiaries.

So, with this software requirement, is an RDBMS still the best option? Probably not; in this case a graph database such as Neo4j could be a better choice. If a graph DB is the better place to store all this data for fraud detection analysis, then the question arises of how we extract and transform the OLTP data from the RDBMS into the graph DB. Again, a combination of CDC connectors, Kafka, and a graph DB can probably be the answer and be used to implement this requirement.
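For illustration, here is a sketch of querying such relationships with the Neo4j Java driver. The (:Party)-[:REMITTED_TO]->(:Party) model, the connection details, and the two-hop query for the indirect case are all assumptions about how the remittance data might be modeled.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

import static org.neo4j.driver.Values.parameters;

public class RemittanceGraphSketch {
    public static void main(String[] args) {
        // Assumed Neo4j instance and credentials; the graph model is illustrative only.
        try (Driver driver = GraphDatabase.driver("bolt://graph.internal:7687",
                    AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Direct relationship: all beneficiaries a given sender has remitted to.
            Result direct = session.run(
                "MATCH (s:Party {id: $senderId})-[:REMITTED_TO]->(b:Party) RETURN b.id AS beneficiary",
                parameters("senderId", "S_A"));
            while (direct.hasNext()) {
                Record row = direct.next();
                System.out.println("direct: " + row.get("beneficiary").asString());
            }

            // Indirect relationship: parties reachable in exactly two remittance hops from S_A.
            Result indirect = session.run(
                "MATCH (s:Party {id: $senderId})-[:REMITTED_TO*2]->(b:Party) RETURN DISTINCT b.id AS beneficiary",
                parameters("senderId", "S_A"));
            while (indirect.hasNext()) {
                Record row = indirect.next();
                System.out.println("indirect: " + row.get("beneficiary").asString());
            }
        }
    }
}
```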

Monolith, Distributed Monolith, Microservices or Modular Monolith

A Google search will inundate you with articles discussing the pros and cons of Monoliths and Microservices. But again, we should not take all of those at face value and should always be skeptical about what is stated in them, because there is no single design that is a panacea for every scenario in software architecture.

For a small-to-medium-sized SaaS/B2B solution, I believe a Modular Monolith is still the best option, because I feel it is easier to comprehend and maintain with a small or medium-sized team of developers. Unless you have the human resource budget to fork out millions to recruit some of the best software engineers in the world, as gigantic software companies, unicorns, and decacorns can afford, in which case Microservices is probably the better option. But again, it also depends on whether we are designing a consumer-facing application or an enterprise system (SaaS/B2B).

If Shopify also adopts a Modular Monolith, I see no reason why small-to-medium-sized SaaS/B2B solutions should go for full Microservices to "solve" their software architecture concerns.

Regardless of the architectural approach, nowadays any solution design and implementation will involve some piecemeal API services. When API services are involved, two important challenges tend to surface: "dual write" and "read your own writes". If I had to recommend an article that talks about these issues and their suggested solutions, this article from the Debezium blog would be a must-read.

Distributed systems are always hard, and even harder if we are not among the best engineers in the world. It looks "easy" when everything works perfectly, but things get really messy when something goes wrong and we have too many services to deal with. Some of these challenges have been briefly mentioned in an article on the Shopify Engineering blog, and there are many other articles out there arguing about this too.

I believe a full solution involving multiple databases is quite common. When more than one database is involved, in many real use cases I have encountered, people tend to achieve data consistency using distributed transactions (XA), i.e. immediate consistency, or, even worse, data consistency is simply ignored, which leads to data integrity issues. Yet from the use cases I have personally experienced, all of them could be simplified and implemented using eventual consistency. Again, take a look at the Outbox Pattern and how it helps in implementing eventual consistency.

REST vs SOAP

Occasionally, we still hear the comment that REST uses JSON and SOAP uses XML. That thought is a fallacy! It is true that the SOAP specification defines an XML-based protocol, but REST is an architectural approach and has never been fixed to JSON only. In reality, any message format can be used in REST, including XML.
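As a small illustration, a single JAX-RS resource can serve both JSON and XML from the same endpoint via content negotiation. This is just a sketch; the Account class and its field values are made up.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlRootElement;

@Path("/accounts")
public class AccountResource {

    @XmlRootElement
    public static class Account {
        public String id = "A001";
        public String name = "Acme Ltd";
    }

    @GET
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public Account get() {
        // Content negotiation returns JSON for "Accept: application/json"
        // and XML for "Accept: application/xml" from the same resource.
        return new Account();
    }
}
```

The wire format is decided by content negotiation, not by the REST architectural style itself.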

Also, we keep hearing the idea that we should use REST over SOAP when developing new services. I guess that is true to a certain extent, but the main argument is definitely not that one is faster than the other; I would say it is because it is easier to find developers who are familiar with REST than with SOAP services. For instance, if a company intends to provide an API for its potential customers to connect to, in order to attract more integrations, it would be wise to implement it with a technology that is at least trendy and that most developers, including those from its potential customers, are familiar with. Nonetheless, there is no hard rule here, and I think we should not be too obsessed with which API design to use. For instance, the Slack Web API uses an HTTP RPC style, though it could be an "outlier" among the most popular internet companies we have heard of. Please also take note that many so-called RESTful APIs do not implement HATEOAS either. Again, the question is whether there is really a need to implement the full Richardson Maturity Model.

For internal module integration that is not publicly accessible, in my opinion, it does not necessarily have to be REST or SOAP. For instance, Reactive Programming could be one of the choices. Even if we want to develop "services", besides REST and SOAP we could consider other options such as JSON-RPC, gRPC with protobuf, Apache Thrift, and Cap'n Proto. In terms of message serialization, XML and JSON are not always the top choices when we talk about communication between internal services; there are other options such as Protocol Buffers, FlatBuffers, Apache Avro, and so on. When there are a lot of options, things get complicated and we tend to become too obsessed with which option to go for. One rule of thumb could be choosing the one you are comfortable with, even though it may not be the best choice.

Language Choices

Personally, I have no preference on which programming language to use to develop a SaaS/B2B system, but I do think it is almost inevitable that we will need to adopt Polyglot Programming. It is fine to use .NET or .NET Core for APIs and their associated components; we do not necessarily have to go for Golang or Rust, even though these two languages are all the rage now. Again, if Checkout.com and Xero also utilize the .NET platform, why do many other, much smaller companies feel that .NET (or .NET Core) is inferior compared to Java, Golang, Rust, and so on?

For a Kafka implementation, Java or Scala is probably the better choice, and for data science and data analytics we can always choose Python.

Also take note that just because Polyglot Programming could be a better choice in most software development, it does not mean we have to use as many programming languages as we can. Support, maintenance, and the ease of recruiting developers to enhance and maintain the system need to be taken into consideration.

Summary

Again, software architectural choices are very subjective, and I believe there is no single design that fits every scenario and every size of system we want to build. Many teams also try out Docker, Kubernetes, and Service Mesh with a Microservices design in production, even though the complexity of their systems does not warrant such technical choices.

Please keep it simple, easily maintainable, and easy to comprehend. Also, give some thought to scalability for 2x to 3x growth, but not 100x growth, which is not realistic for most applications and falls under YAGNI.
