A SaaS Cloud Structure Blueprint: How we structure our AWS multi-account environment at Gong for machine learning, continuous delivery, and security

Eilon Reshef
Published in Gong Tech Blog
Dec 9, 2018 · 10 min read

Many companies publish their cloud-based system architecture: how their production systems are set up within their cloud environment.

But very little is written about structuring the cloud in terms of multiple environments. In the case of AWS, this breakdown is implemented using separate accounts.

The structure we’ve chosen reflects needs and trade-offs with regard to access control, privacy, reliability, billing, and more. Hopefully, our structure and the thought process behind it can help other SaaS companies, especially ones that practice machine learning and continuous delivery. And as is often the case, setting things up “properly” early in the process typically requires a small effort, while transitioning later is often expensive.

We use Amazon Web Services (AWS) as our cloud provider, but many of the practices described below apply to other cloud providers (such as Microsoft Azure and Google Cloud) as well.

What are AWS accounts?

In AWS, an account is a unit that provides a level of isolation. It’s a “container” of multiple pieces:

It’s a billing unit.
Each AWS account has its own billing reports and statements. (Yet you can also “tag” elements within a single account to break down your billing statement, or roll up multiple accounts for consolidated billing.)

It’s an access control unit.
Each account has its own user list and permissions, managed through IAM (Identity and Access Management). (Yet you can also manage granular access control within an account and let multiple individuals log in to multiple accounts; see the sketch after this list.)

It’s a service unit.
While this often goes unnoticed, each account has its own AWS-driven service limits — i.e., limits on how many resources it may consume of each type (e.g., maximum number of machines of type m5.large). What’s more, some AWS resources within each account (e.g., CloudWatch) have their own limits around usage; requests that exceed the limits are rejected or throttled.

It’s a management, configuration, and metadata unit.
While somewhat obvious, each account has a separate set of management attributes (e.g., contact list), AWS configurations, and AWS metadata (e.g., set of AMIs). This data is isolated between accounts, and sharing must be explicit.

It’s a container of networking units.
Each account has its own network or set of networks, called Virtual Private Clouds (VPCs), separate from other accounts’ networks. (Yet you can connect multiple accounts using VPC Peering.)
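
To make the access-control point concrete: rather than maintaining separate users in every account, a common pattern is to let a user or service in one account assume a role defined in another. Below is a minimal boto3 sketch of that pattern; the account ID and role name are hypothetical placeholders, not our actual setup.

```python
import boto3

# Assume a role defined in a different AWS account.
# The account ID and role name below are hypothetical placeholders.
sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/CrossAccountReadOnly",
    RoleSessionName="example-session",
)
creds = response["Credentials"]

# Use the temporary credentials to call services in the other account.
s3_other = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print([b["Name"] for b in s3_other.list_buckets()["Buckets"]])
```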

In the following section, we cover how we structure our AWS accounts at Gong. We also highlight some of the considerations for creating separate accounts for different purposes, and the data flow between the accounts.

Serving our Customers: A Production Account

Most companies have a production account (if not more). This is the account that holds the production systems that process “real” data and with which users interact. Strangely enough, in most architectural discussions, people only refer to this account and mostly disregard the others.

Somewhat unsurprisingly, we also have a production account. Here it is, in all its glory:

In this post, we don’t cover the internal architecture of our production system in terms of systems and services (but we might cover it in a future post).

Here’s how we set up this account:

  • Access Control. Access to this environment is restricted to designated (DevOps) employees, using secure methods (VPN, MFA). So most developers at Gong do not have console access or other access to the machines in the production environment.
  • Billing. As a SaaS company, our production system is tracked as COGS (Cost of Goods Sold), since it is all about providing service to our customers.
  • Service Limits. The account is configured to reflect our customers’ usage patterns — i.e., thousands of servers that scale up and down, large storage limits, etc.

Disaster Recovery: A DRP Account

Like any SaaS provider, we back up our data so that in the event of a catastrophe (someone else’s or ours), we can recover lost data.

When we initially set up the system, we backed up the production data to a second region within our Production account.

However, we soon realized that this creates a single point of failure: if an attacker were to gain access to our Production account, they could forcefully delete all of our data, backups included.

So, we set up a separate account for DRP. Data is backed up to this account — still in another region.

Data is backed up from our Production account to our DRP account
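
One way to implement this kind of cross-account, cross-region backup (a sketch of the general mechanism, not necessarily our exact setup) is S3 replication into a versioned bucket owned by the DRP account. All bucket names, account IDs, and the role ARN below are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Replicate objects from a production bucket into a bucket owned by the
# DRP account. Versioning must be enabled on both buckets, and the
# replication role lives in the source (Production) account.
# All names and IDs below are hypothetical placeholders.
s3.put_bucket_replication(
    Bucket="example-prod-data",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/prod-to-drp-replication",
        "Rules": [
            {
                "ID": "backup-to-drp",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-drp-backup",
                    "Account": "444455556666",
                    # Make the DRP account the owner of replicated objects,
                    # so a compromised Production account cannot tamper
                    # with the backups.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)
```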

Here’s how we set up this account:

  • Access Control. This account has separate access controls, to reduce the chances that someone would be able to penetrate both accounts. It needs to be accessed only for data recovery, and only programmatically.
  • Billing. The DRP account is also tracked as COGS, since it’s part of our service to our customers.
  • Service Limits. This account has very low service limits. It’s set up for little more than data storage.

Internal Systems: A “Gong Internal” Account

Like many modern SaaS companies, we don’t have local IT systems. All our internal systems are either managed by SaaS vendors (e.g., Salesforce CRM or JIRA Bug Tracker) or are hosted in our “internal” cloud account.

Some of the key systems that we host in our “internal” cloud account are our Directory Service, our Build System (based on TeamCity), and our BI (Business Intelligence) System (based on Sisense).

Continuous Integration Process

We use TeamCity as our CI system. Builds and tests run on agent machines organized in a cluster in this account. The TeamCity system, also located in this account, builds the artifacts based on the latest source code and pushes the new binaries into an S3 bucket.

To deploy new code, separate TeamCity agents located in the Production account pull the binaries from the S3 bucket and place the new artifacts in the various systems.
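
For the Production-account agents to read those binaries, the artifacts bucket (owned by the internal account) needs to allow cross-account reads. Here is a minimal sketch of such a bucket policy; the bucket name and account ID are hypothetical:

```python
import json
import boto3

s3 = boto3.client("s3")

# Allow the Production account to read build artifacts from a bucket
# owned by the internal account (names and IDs are hypothetical).
artifacts_bucket = "example-build-artifacts"
production_account_id = "111122223333"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowProductionRead",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{production_account_id}:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{artifacts_bucket}",
                f"arn:aws:s3:::{artifacts_bucket}/*",
            ],
        }
    ],
}

s3.put_bucket_policy(Bucket=artifacts_bucket, Policy=json.dumps(policy))
```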

Business Intelligence (BI)

To let the business access operational data, we push applicable data from our Production account to the internal account and load it into the BI system. To respect our customers’ privacy, we only push non-confidential and non-personal (or anonymized) data. To that end, we’ve set up VPC peering to allow (restricted) data transfer between this account and the Production account.
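
Setting up VPC peering between two accounts is a two-step handshake: one account requests the connection and the other accepts it. A hedged boto3 sketch, with hypothetical VPC IDs, account ID, and region:

```python
import boto3

# Request a peering connection from the internal account's VPC to the
# Production account's VPC (all IDs below are hypothetical).
ec2_internal = boto3.client("ec2", region_name="us-east-1")
peering = ec2_internal.create_vpc_peering_connection(
    VpcId="vpc-0123456789abcdef0",        # internal account VPC
    PeerVpcId="vpc-0fedcba9876543210",    # Production account VPC
    PeerOwnerId="111122223333",           # Production account ID
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The Production account then accepts the request (run with credentials
# for that account).
ec2_production = boto3.client("ec2", region_name="us-east-1")
ec2_production.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Route tables and security groups in both VPCs still need entries that
# allow only the BI-related traffic across the peering connection.
```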

Here’s how we set up this account:

  • Access Control. Our internal account has tight security controls, similar to the Production account. But different team members may be allowed to access it (e.g., IT rather than DevOps).
  • Billing. Internal systems are tracked as G&A or R&D, as these do not directly relate to serving our customers.
  • Service Limits. Service limits are set to reflect small, SMB-scale use. (We’re at around 100 people at Gong at the time of this writing.)

Development: A “DevTest” Account

At Gong, core development is done locally: each developer has a Docker-based local system that contains instances of the relevant systems (e.g., PostgreSQL, MongoDB, Redis, Elasticsearch). We use PaaS functionality sparingly to avoid the need for cloud-based systems during the development process. This way developers can develop and debug (most of) the system locally, without continuous access to the network in general or to an AWS account in particular.

Despite the above, developers occasionally need to develop and test code that’s specific to the cloud. For example, we have multiple systems that “supervise” clusters of elastic machines, similar to AWS Auto Scaling groups. Developing and testing such systems requires direct access to a cloud environment.

To that end, we use a DevTest account. This account is open to all developers, and they can use it to test stuff manually or automatically.

Development Data

To supply developers with sample data for development, we use the data that our own team members generate by using Gong (luckily, we are our own customer!). The system exports that data daily from the Production account to an S3 bucket, anonymizing it as needed. Developers import this data and get a small-scale, real-life dataset they can use for development. We also generate artificial datasets based on customer usage patterns that we can load into the local systems.
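
For illustration, importing such a dataset on a developer machine can be as simple as syncing the exported objects from the S3 bucket. The bucket and prefix names below are hypothetical:

```python
import pathlib
import boto3

# Pull the latest anonymized sample dataset from the export bucket
# (bucket and prefix are hypothetical placeholders).
s3 = boto3.client("s3")
bucket = "example-dev-sample-data"
prefix = "daily-export/latest/"
target = pathlib.Path("./sample-data")
target.mkdir(parents=True, exist_ok=True)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        dest = target / pathlib.Path(obj["Key"]).name
        s3.download_file(bucket, obj["Key"], str(dest))
```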

Test Data

We use artificial test data to support automated tests. Some data resides within the code, but when data is large (e.g., sets of videos), we store it in a cloud account. We use the DevTest account for this.

Here’s how we set up this account:

  • Access Control. The account is open to most developers, with very few restrictions. Developers have both API-based access and console access. We assume that this account does not have any sensitive (private or confidential) data.
  • Billing. The account is tracked as R&D.
  • Service Limits. The account is set up to allow development-scale activity (i.e., very little of it). If development-time code attempts to scale up, it is rejected by the AWS service limits.

Machine Learning Research: A “Research” Account

Unlike development, which can happen on local machines, machine learning research requires data to optimize models and train the systems. We consider such data, even when anonymized, to be confidential data.

To that end, we set up a Research account, which is used by machine learning experts to develop various models.

Research Data

To make data available in our Research account, we have custom-made export procedures in our Production account. The export procedures slice data, cleansing and anonymizing it as needed. Eventually, this makes the data available as static datasets in an S3 bucket, in an easy-to-digest format. As a result, data that may have been spread across systems or tables in our production environment is consolidated into simple JSON objects (or CSV files) that are easier to process as part of our research.

Data is exported from the Production account into a secure bucket in the Research account

This architecture serves multiple goals:

  • Researchers only have access to data that they need
  • Data is sanitized and anonymized before usage
  • Data is simplified
  • Data extraction is audited
  • It supports the “reproducible data science” paradigm that ensures that data remains unchanged when the production systems change
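
To make the export flow concrete, here is a simplified sketch of what one such export step could look like: take the records researchers need, drop or hash identifying fields, and write the result as line-delimited JSON to the research bucket. All names are hypothetical, and the real procedures are considerably more involved:

```python
import hashlib
import json
import boto3

def anonymize(record: dict) -> dict:
    """Drop confidential fields and replace identifiers with stable hashes."""
    return {
        "call_id": hashlib.sha256(record["call_id"].encode()).hexdigest(),
        "duration_sec": record["duration_sec"],
        "transcript": record["transcript"],  # assumed cleansed of PII upstream
    }

def export_dataset(records, bucket="example-research-datasets",
                   key="calls/2018-12/calls.jsonl"):
    """Write a consolidated, anonymized dataset as line-delimited JSON to S3.
    Bucket and key names are hypothetical placeholders."""
    body = "\n".join(json.dumps(anonymize(r)) for r in records)
    boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                  Body=body.encode("utf-8"))
```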

Machine Learning Training and Inference

As we outlined in an earlier post, there are two major types of models for SaaS vendors:

  • “Global” models. This type of model is trained and optimized once, and works across all of our customers. For example, detecting action items in a phone conversation is independent of the customer and its domain.
  • “Dynamic” models. This type of model is trained separately for each customer, or for subsets thereof (like individual users). In the Gong environment, many of these models are unsupervised, so they do not require labeling data anew for each customer.

Global models are trained and tested in the Research account. For example, our baseline speech-to-text engine is trained using hundreds of hours of human-transcribed conversations and millions of hours of machine-transcribed conversations. The result is a model that applies to any customer. This model is pushed to the production environment.

Dynamic models are trained and tested in the Production account. For example, customer-specific speech-to-text engines are trained using additional hours of the customers’ machine-transcribed conversations and some additional metadata supplied by the customer.

Similarly, our natural language understanding (NLU) models are also split into global models (like action-item detection in calls or signature analysis in emails) and dynamic models (like reverse-engineering the structure of a sales call or understanding the sentiment of an email).

In both cases, inference logic is run in the production environment and leverages the most applicable model.
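
In simplified terms, the inference-time selection prefers a customer-specific (dynamic) model when one exists and falls back to the global model otherwise. A hypothetical sketch; the function and store names are illustrative, not our actual API:

```python
def load_model_for(customer_id: str, task: str, model_store):
    """Pick the most applicable model: a dynamic (per-customer) model if one
    exists, otherwise the global model trained in the Research account."""
    dynamic = model_store.get(task=task, customer_id=customer_id)
    if dynamic is not None:
        return dynamic
    return model_store.get(task=task, customer_id=None)  # global model

# Example usage (all identifiers are hypothetical):
# model = load_model_for("acme-corp", "speech-to-text", model_store)
# transcript = model.transcribe(audio)
```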

Here’s how we set up this account:

  • Access Control. We consider this account to be as secure as our Production account. But unlike the Production account, which is set up for DevOps access, this account is set up to allow (restricted) access to researchers for both data access and training execution.
  • Billing. This account is tracked as R&D, since it does not directly support our service.
  • Service Limits. This account holds both data and machines. For example, we use GPU-based systems to train our speech-to-text models, which are based on deep neural networks, so the service limits reflect this behavior.

Exporting and Importing Data: Integrations Account

Part of the Gong service is obtaining data (e.g., conversations recorded in an on-premises recording system) and pushing back data (e.g., analysis data).

While we originally supported this capability in our production account, we realized that:

  • Customers requested AWS-specific integration options, like granting Gong access to their own AWS S3 buckets (see the sketch at the end of this section). It did not make sense for us to implement such options directly through our Production account.
  • When customers push data our way via S3 buckets or FTP (processes that we have little control over), it’s beneficial to isolate that data from our production system, both to increase security (e.g., against bogus data entering our production environment) and to reduce the possibility that such data would affect our production environment’s service limits.

Here’s how we set up this account:

  • Access Control. We consider this account to be as secure as our Production account. But this account does not need any access by individuals; instead, only programmatic access is available to our applications.
  • Billing. This account is tracked as COGS, since it directly supports our service.
  • Service Limits. This account has data and limited services (e.g., FTP). The service limits reflect this behavior.
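
For the first case above (reading from a customer-owned bucket), the usual pattern is the same cross-account role assumption shown earlier, with an ExternalId that the customer configures on the role. A hedged sketch; the role ARN, ExternalId, and bucket name are hypothetical, and real integrations involve tighter policies:

```python
import boto3

# Read objects from a bucket in a customer's AWS account by assuming a
# role the customer created for us. The role ARN, ExternalId, and bucket
# name below are hypothetical placeholders.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::555566667777:role/IntegrationReadOnly",
    RoleSessionName="integration-session",
    ExternalId="customer-specific-external-id",
)["Credentials"]

customer_s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
objects = customer_s3.list_objects_v2(Bucket="customer-recordings-bucket")
```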

General Account Configuration

While we’ve focused this post on our multi-account strategy, there are several settings that span all of our accounts: from an access control perspective — SSO, MFA, and general principles (e.g., least privilege); from a security perspective — end-to-end encryption; from a billing standpoint — consolidation, and so on. These settings are not covered in this post; if there’s interest, we may cover them in a future post.


Eilon Reshef is a co-founder and the Chief Product Officer at Gong.io, the leading Revenue Intelligence solution — helping modern sales teams succeed.