Configuration management for serverless AWS applications

Published in

SMG Real Estate Engineering Blog

Jan 22, 2019


Introduction

Configuration changes may cause system outages or instability. In fact, according to Russ Miles, such changes are one of the most common causes of production outages. Therefore, as a central part of any system, configuration must support versioning (so that previous values can be restored or rolled back), auditing of parameter changes, encryption of sensitive values (such as passwords), and, above all, access control (so that only authorized users are allowed to make changes).
Apart from versioned configuration files, which are local to a particular service and integrated into the application, there are many frameworks and tools that support proper configuration management. You can deploy and manage these tools yourself when moving to the cloud. However, operating them (patching, guaranteeing high availability, stability, short response times, etc.) is exactly what serverless applications are NOT about. Rather than fiddling with things that do not make your business unique or better than other businesses, you should concentrate on software that provides added value to users. The serverless paradigm is, above all, about fast delivery, and this is where AWS shines with all its great managed services. Among these is a configuration service called SSM Parameter Store. It is part of AWS Systems Manager (SSM), a service that helps automate management tasks. AWS also offers a dedicated service called “Secrets Manager” (which we will look at later). Please note that all examples are provided in Node.js, but the same ideas apply to other languages.

Environment Variables

Before jumping into the Parameter Store, note that embedding a configuration value in a Lambda environment variable is very often sufficient if the value is used in only one place. Environment variables can be defined in the CloudFormation template and read at run time via ‘process.env’ in Node.js (or ‘os.environ’ in Python), as shown in the example below.

EnvVarConfigFunction:
    Type: AWS::Serverless::Function
    Properties:
        Environment:
            Variables:
                MY_PARAM_USED_BY_ONE_FUNCTION: 'MY_PARAM_USED_BY_ONE_FUNCTION'

For a complete example see https://github.com/marcinzasepa/serverless-configuration/blob/master/env-var-config/app.js.
All environment variables are encrypted at rest by default with an AWS managed CMK (Customer Master Key) at no additional cost, and they are decrypted when the function is invoked. You can specify your own custom CMK for encryption and decryption using the KmsKeyArn property of the Function resource. If you do, every request to KMS (Key Management Service) is billed. In both cases, encryption takes place only after deployment. The plaintext values are part of the template and are exposed during the deployment process, so this approach is not recommended for storing sensitive information (such as passwords).

Sensitive information should neither be committed to a repository nor exposed during deployment. Instead, encrypting sensitive data before deployment is strongly recommended. You can do this with environment variables and a few manual steps. The manual steps are necessary because CloudFormation does not currently support the so-called encryption helpers that are available in the AWS Console. You may want to configure encryption via the console for testing or a private project; for instructions, see https://docs.aws.amazon.com/lambda/latest/dg/tutorial-env_console.html.

Getting back to CloudFormation: first, you must encrypt the sensitive data with the selected CMK. To deploy a custom CMK, you can use the CloudFormation template provided in the GitHub repo: https://github.com/marcinzasepa/serverless-configuration/blob/master/encryption-key-template.yaml. Managing a custom CMK costs $1 per month per key. Once you have the key ID or its alias (see the stack outputs), run the following command to encrypt your sensitive data with the newly created CMK:

aws kms encrypt --key-id alias/EncryptionKeyForEnvVariables --plaintext "SENSITIVE_DATA_STRING"

The output should be similar to the following:

{
    "CiphertextBlob": "AQICAHgOsiIFdEQn0OiMlGPHXb8++AsCf+oEvvFwQRZHMtSsCAEnQXoFC/Wly7DmIvPcw841AAAAczBxBgkqhkiG9w0BBwagZDBiAgEAMF0GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMk6vAgEQgDDaRGyUmWZFyYgxkUji2/qM1Wk69oxdWobArZBjCsiZM1qfxexJ5yFCLAulZVJXEBg=",
    "KeyId": "arn:aws:kms:eu-central-1:<AccountNr>:key/3424da05-a72b-4c58-ba83-026629d9a4fd"
}

Copy the value of the CiphertextBlob property and paste it as the value of the environment variable. To decrypt the sensitive data at run time, use the SDK to call KMS, as shown in the following example:

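const AWS = require('aws-sdk');
const kms = new AWS.KMS();

exports.lambdaHandler = async (event) => {
  // The environment variable holds the base64-encoded CiphertextBlob
  const sensitiveValueBlob = Buffer.from(process.env.OTHER_SENSITIVE_VAR, 'base64');
  const { Plaintext } = await kms.decrypt({ CiphertextBlob: sensitiveValueBlob }).promise();
  const decryptedValue = Plaintext.toString('utf-8'); // KMS returns the plaintext as a Buffer
};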

For the full example, see https://github.com/marcinzasepa/serverless-configuration/blob/master/encrypted-env-var-config/app.js.

Please note that KMS, like most AWS services, has request limits. At the time of writing, the limit for the ‘decrypt’ operation in the example above is 5,500 (or 10,000) requests per second, depending on your region. This limit is shared with other KMS operations, such as Encrypt and GenerateDataKey (for more information, see https://docs.aws.amazon.com/kms/latest/developerguide/limits.html#requests-per-second-table).

Moreover, KMS requests are not free. There is a free tier of 20,000 requests per month, and every 10,000 requests beyond that cost $0.03 (for more information, see https://aws.amazon.com/kms/pricing/). Because of these limits and the pricing, consider caching decrypted values in global variables for the lifetime of the Lambda container.
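Here is a rough sketch of the caching suggestion above. `createCachedDecryptor` is a hypothetical helper. The `decrypt` function is injected so that the snippet does not depend on the SDK, and the commented wiring shows how it might connect to the `kms.decrypt` call from the previous example. Because the helper caches the promise rather than the resolved value, concurrent callers in the same container share a single KMS request.

```javascript
// Module scope survives between invocations of a warm container, so a cache
// created at module level lives as long as the container does.
function createCachedDecryptor(decrypt) {
  const cache = new Map(); // base64 ciphertext -> Promise<plaintext string>
  return function getDecrypted(ciphertextBase64) {
    if (!cache.has(ciphertextBase64)) {
      const pending = decrypt(ciphertextBase64).catch((err) => {
        cache.delete(ciphertextBase64); // do not cache failed decryptions
        throw err;
      });
      cache.set(ciphertextBase64, pending);
    }
    return cache.get(ciphertextBase64);
  };
}

// Possible wiring with the aws-sdk v2 KMS client (not executed here):
// const kms = new AWS.KMS();
// const getDecrypted = createCachedDecryptor(async (b64) => {
//   const { Plaintext } = await kms.decrypt({ CiphertextBlob: Buffer.from(b64, 'base64') }).promise();
//   return Plaintext.toString('utf-8');
// });
// In the handler: const password = await getDecrypted(process.env.OTHER_SENSITIVE_VAR);
```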

Characteristics of the SSM Parameter Store

If a configuration parameter must be shared by multiple Lambdas, use the Parameter Store to avoid duplication and simplify management. The Parameter Store supports three parameter types: String, StringList, and SecureString (an encrypted version of the regular String type). It uses the Key Management Service (KMS) for encryption and decryption, with either the default AWS managed key or a custom Customer Master Key (CMK). Every parameter is versioned. Its history records the modification date and the IAM user or role that made each change.
You can define configuration parameter values via the AWS Console, the AWS CLI, or CloudFormation. CloudFormation lets you submit dynamic values, such as the ARNs of created resources or the URLs of newly deployed API Gateway endpoints. As with other AWS services, you can control access to the Parameter Store at a very fine-grained level via IAM. It also integrates seamlessly with CloudTrail, which helps you understand changes and detect unwanted modifications. On top of that, AWS provides the Parameter Store at no additional cost; however, this comes with some limitations, which will be discussed later. You can organize configuration parameters in a hierarchy, which is useful for grouping configuration values, e.g., per environment (TEST/PROD) or per service (Service-A/Service-B). To submit SSM parameter values, use either CloudFormation (see https://github.com/marcinzasepa/serverless-configuration/blob/master/encryption-key-template.yaml) or the AWS CLI:

aws ssm put-parameter --name '/service-A/externalSystem/username' --value 'ANY_USERNAME' --type String 
aws ssm put-parameter --name '/service-A/externalSystem/password' --value 'ANY_PASSWORD' --type SecureString

This creates the following parameters in the Parameter Store:

/service-A/externalSystem/username
/service-A/externalSystem/password

Depending on requirements, changes to parameters’ values must be reflected either at deployment (build time) or during run time. Let us evaluate the first option.

Embedding configuration parameters at build time (Reference Param in CloudFormation)

CloudFormation provides a syntax for reading the parameters’ values directly from the Parameter Store:

Environment:
   Variables:
     MY_PARAM_FROM_SSM: !Sub '{{resolve:ssm:/serviceA/anyConfigValue:1}}'

At deployment time (when the stack is created or updated), CloudFormation automatically replaces the reference {{resolve:ssm:/serviceA/anyConfigValue:1}} with the value from the store. Notice that, at the time of writing, you have to specify the parameter version explicitly. Moreover, there is no option to use SSM labels (for more information about labeling, see this great post: https://aws.amazon.com/blogs/mt/use-parameter-labels-for-easy-configuration-update-across-environments/).

The main reason for this is to force you to refer to a value you actually know and that cannot change (labels can be moved from one version to another). This way, you avoid an unexpected impact on production after a new deployment. It also makes it much easier to roll back the state of the system, including the values of the configuration parameters, if there are any problems after deployment. If you used the ‘latest’ alias, you would have to roll back CloudFormation and the configuration parameter values separately.
Although referring to a fixed version makes sense in many scenarios, it can sometimes complicate things. Consider the following example:

/service-A/externalSystem/username
/service-A/externalSystem/password

When deploying to multiple AWS accounts (which is very common), you can use the same parameter structure/hierarchy for both testing and production by simply adding {{resolve:ssm:/service-A/externalSystem/username:1}} to the CloudFormation template. Now, if you change the value of a parameter in either production or testing, the environments end up with different versions of the parameter. To keep the CloudFormation template simple and avoid referencing different parameter versions per environment, you have to “touch” the parameter on all other stages, i.e., save it again so that its version number increases there too. Being able to reference labels would solve this problem, at the cost of the downsides mentioned above.
As already mentioned, configuration values that hold sensitive information must be encrypted, so you should use a parameter of type SecureString. You can reference these parameters in CloudFormation as well (via the ‘ssm-secure’ dynamic reference); however, not all resources are currently supported. For the list, see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/dynamic-references.html#template-parameters-dynamic-patterns-resources. As you may have noticed, Lambda is not among them, so you cannot reference a secure string in a Lambda's environment variables.
Luckily, there is an alternative. You can use another service provided by AWS called “Secrets Manager,” which is dedicated to storing secrets, rotating them, and much more. To create a secret via the AWS CLI, issue the following command:

aws secretsmanager create-secret --name secretValue --secret-string ConfidentialPassword

Then the secret can be referred to in CloudFormation, as shown in the example below:

Environment:
  Variables:
    MY_PARAM_FROM_SECRET_MANAGER: !Sub '{{resolve:secretsmanager:secretValue}}'

For the full example see CloudFormation.yaml and lambda code.
With the approach described above, every configuration change requires redeploying the whole stack, which should be sufficient in many cases. Always consider ‘on the fly’ configuration changes very carefully, as they may silently break the system. Short deployment times often ensure that configuration changes propagate quickly anyway. At the same time, they significantly reduce the risk of rolling out a configuration that breaks the system, because every change must go through the pipeline and is thus tested.
There are, however, reasons to change configuration at run time. One is the need to reflect configuration changes as quickly as possible. Another might be the complexity of the system. With a large number of microservices, it is difficult to figure out which services must be redeployed to reflect a change to a particular parameter. Some widely used configuration parameters may even require redeploying the entire system. In such cases, changing the value at run time is a lot easier and thus recommended. This is where fetching parameters from the Parameter Store comes into play.

Programmatic fetching of parameters

The AWS SDK offers a set of methods for interacting with the Parameter Store. To fetch a single parameter, use the getParameter method. You can also fetch an entire parameter hierarchy at once. Consider the following hierarchy:

/service-A/externalSystem/username
/service-A/externalSystem/password

You can get both values for ‘externalSystem’ via getParametersByPath as shown below:

const result = await ssm.getParametersByPath({Path: '/service-A/externalSystem', Recursive: true, WithDecryption: false}).promise();

The methods ‘getParameter’ and ‘getParametersByPath’ both support the ‘WithDecryption’ parameter, which is relevant for secure strings. When ‘WithDecryption’ is true, the secure string’s value is returned decrypted. For the full example, see https://github.com/marcinzasepa/serverless-configuration/blob/master/ssm-run-time-config/app.js.
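‘getParametersByPath’ returns its results in pages, so a complete fetch has to follow the NextToken returned with each page. Here is a minimal sketch that assumes you pass in an aws-sdk v2 `AWS.SSM` client; the helper name `getAllParametersByPath` is hypothetical:

```javascript
// Fetch all parameters below `path`, following NextToken until no further
// pages remain. A page may hold fewer items than requested (even zero) and
// still return a NextToken, so the loop relies only on the token.
// `ssm` is assumed to be an aws-sdk v2 AWS.SSM client (calls return a
// request object with a .promise() method).
async function getAllParametersByPath(ssm, path, withDecryption = false) {
  const parameters = {}; // full parameter name -> value
  let nextToken;
  do {
    const params = { Path: path, Recursive: true, WithDecryption: withDecryption };
    if (nextToken) {
      params.NextToken = nextToken;
    }
    const page = await ssm.getParametersByPath(params).promise();
    for (const parameter of page.Parameters || []) {
      parameters[parameter.Name] = parameter.Value;
    }
    nextToken = page.NextToken;
  } while (nextToken);
  return parameters;
}
```

For example, `await getAllParametersByPath(new AWS.SSM(), '/service-A/externalSystem', true)` would return an object keyed by full names such as ‘/service-A/externalSystem/username’. Each loop iteration is a separate request and therefore counts against the request limits discussed below.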

With this approach, the function fetches the parameters from the store when it handles a request and provides them to the application. You can also access ‘Secrets Manager’ secrets through the Parameter Store API by using the ‘/aws/reference/secretsmanager/’ prefix:

const parameterFromSecretManager = await ssm.getParameter({ Name: '/aws/reference/secretsmanager/secretValue', WithDecryption: true }).promise();

In this case, both the ‘ssm:GetParameter’ and ‘secretsmanager:GetSecretValue’ permissions are required (for a full example, see https://github.com/marcinzasepa/serverless-configuration/blob/master/template.yaml#L76). Please note that the ‘WithDecryption’ flag must be true when you fetch from the ‘Secrets Manager’.
This approach of reading configuration parameters at run time differs substantially from the methods described earlier in this article.

(Figures: X-Ray traces showing a low and a high response time from the SSM Parameter Store)

Pros:

Changes are reflected at run time, regardless of where a particular parameter is referenced.
You can fetch decrypted values, so secure strings can be used in Lambda, which is not possible with CloudFormation dynamic references (ssm-secure).
You can skip the parameter version or the label and always fetch the latest version.
You can also use a custom label, for example ‘Current’, and move it from one version to another to change which value is current.

Cons:

The request to the Parameter Store is an HTTP request and adds latency to the Lambda execution (see the pictures above):

Longer execution times for user-facing Lambdas negatively impact the business (Amazon found that every 100 ms of latency cost them 1% in sales).
Longer execution times also increase the cost of your Lambdas. This may seem irrelevant (we will soon look at how long fetching the parameters takes). However, it can have a bigger impact than expected because, at the time of writing, Lambdas are billed in 100 ms increments (for more information, see the great article by Yan Cui: https://blog.binaris.com/lambda-pricing-pitfalls/).
When you use ‘getParametersByPath’, you may have to call the Parameter Store multiple times if the hierarchy contains many parameters, because the results are paginated. At the time of writing, the maximum number of parameters fetched in one request (page size) is 10, though this is undocumented. Moreover, even if there are fewer than 10 parameters, the Parameter Store does not guarantee that it returns all of them in one request. From the documentation (https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParametersByPath.html): “Request results are returned on a best-effort basis. If you specify ‘MaxResults’ in the request, the response will include information up to the limit specified. The number of items returned, however, can be between zero and the value of ‘MaxResults.’”
It adds a run-time dependency that may decrease the availability of your system: if the Parameter Store becomes unstable, the services relying on it become unstable as well.

Because the Parameter Store is offered free of charge, its request limits are much lower than those of other services. Moreover, those limits are not documented. Reaching them results in the exception ‘ThrottlingException: Rate exceeded’.

(Figure: ThrottlingException when hitting the SSM Parameter Store request limits)

To find out the limits and check the latency introduced by the Parameter Store, we will activate X-Ray, an AWS service for tracing and profiling requests, and simulate a load that reaches the limits.

The behavior shown in the screenshot suggests that requests are throttled using the ‘Token Bucket’ algorithm, with a refill rate of 30–40 requests per second. Hitting these limits is very easy, even with a small number of events (e.g., HTTP requests or SNS/SQS/stream events). This is especially true if many different Lambdas are running in parallel (e.g., many state machines with parallel steps). Many AWS limits (e.g., concurrent Lambda executions, the Step Functions refill rate, and the Step Functions bucket size) are soft and can be raised. In the case of the Parameter Store, however, asking AWS to increase the limits didn’t help: AWS rejected the request with the following explanation.

“We currently have limits on Parameter Store APIs (just like any other AWS service) primarily to prevent abuse. Parameter Store is currently available at no additional charge to customers. There are implications of raising the current API limits including service costs, preventing service abuse, and operational load to the service. However, we do realize that there are use cases where customers may want higher Parameter Store API limits.”
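The token bucket behavior observed above can be illustrated with a generic sketch. It is not the Parameter Store's actual implementation, which AWS does not publish. The `TokenBucket` class is hypothetical, and the capacity and refill rate of 40 are assumptions based on the observed 30–40 requests per second:

```javascript
// Generic token bucket: up to `capacity` requests can burst, after which
// requests are admitted at roughly `refillPerSecond`. A rejected call
// corresponds to a ThrottlingException on the service side.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock in milliseconds, useful for testing
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the request is admitted, false if it would be throttled.
  tryRemoveToken() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill in proportion to the elapsed time, never above capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With these assumed numbers, a burst of 50 simultaneous Lambda cold starts would see 40 calls admitted and 10 throttled. After that, only about 40 calls per second get through, which shows how quickly parallel workloads can exhaust the bucket.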

AWS considers increases of the ‘GetParameter’ limit on a case-by-case basis, and approval is not guaranteed because it may affect other customers. For a request to be considered, you should implement exponential backoff; AWS recommends backing off when the Parameter Store throttles you. This is a valid proposal, but there is a catch. A state machine can easily retry a Lambda task (see retry: https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-errors.html). A Lambda invoked outside a state machine, however, has to block the request, wait, and pay for the additional execution time, which adds latency and increases costs. Therefore, having backoff in place to increase the system’s resiliency is definitely a good idea, but you have to make sure that your system does not hit this limit often under normal conditions.
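Here is a minimal sketch of such a backoff. It assumes aws-sdk v2 errors, which carry a `code` such as ‘ThrottlingException’. The helper `withBackoff`, its delays, and its attempt count are illustrative choices, not values recommended by AWS. The sketch uses "full jitter", i.e., a random delay up to an exponentially growing cap, so that throttled Lambdas do not all retry at the same moment:

```javascript
// Sketch: retry an async call with exponential backoff and full jitter
// when it is throttled. Names and defaults are illustrative assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// aws-sdk v2 exposes the error code as `code`; v3 uses `name`.
function isThrottling(err) {
  return Boolean(err) && (err.code === 'ThrottlingException' || err.name === 'ThrottlingException');
}

async function withBackoff(operation, { maxAttempts = 5, baseDelayMs = 50, maxDelayMs = 2000, wait = sleep } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      // Give up on non-throttling errors or when all attempts are used up
      if (!isThrottling(err) || attempt >= maxAttempts) {
        throw err;
      }
      // Full jitter: random wait in [0, min(maxDelayMs, baseDelayMs * 2^(attempt - 1)))
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await wait(Math.random() * ceiling);
    }
  }
}
```

In a handler, you could wrap a call as `await withBackoff(() => ssm.getParameter({ Name: '/service-A/externalSystem/username' }).promise())`. The SDK clients also retry throttled requests on their own; in aws-sdk v2, the `maxRetries` and `retryDelayOptions` client options control that behavior. Either way, every wait adds billed execution time.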

Suppose the requirement is to fetch only sensitive data at run time and to make use of, for example, automatic password rotation. Then using the ‘Secrets Manager’ API directly instead of the Parameter Store can help, because the ‘Secrets Manager’ is a paid service with significantly higher request limits. The limit for the ‘GetSecretValue’ operation (which is used to fetch secrets) is 700 requests per second. That is a much higher throughput than the roughly 40 requests per second offered by the Parameter Store. The ‘Secrets Manager’ pricing has two components: $0.40 per secret per month, and $0.05 per 10,000 API calls (https://aws.amazon.com/secrets-manager/pricing/). The example below shows how to fetch secrets from the ‘Secrets Manager’:

const AWS = require('aws-sdk');
const secretsmanager = new AWS.SecretsManager();
// inside an async handler:
const { SecretString } = await secretsmanager.getSecretValue({ SecretId: 'secretValue' }).promise();

For a full example see https://github.com/marcinzasepa/serverless-configuration/tree/master/secretmanager-run-time-config

Regardless of whether parameters come from the ‘Parameter Store’ or the ‘Secrets Manager’, caching them instead of fetching them on every request is recommended (https://docs.aws.amazon.com/secretsmanager/latest/userguide/manage_retrieve-secret.html#use-client-side-caching-components).

Update 30.04.2019
The Parameter Store now supports much higher API throughput: https://aws.amazon.com/about-aws/whats-new/2019/04/aws_systems_manager_now_supports_use_of_parameter_store_at_higher_api_throughput/. You have to enable the higher throughput explicitly, and it is charged, but it should be sufficient for most use cases.

Caching and optimization

Caching the values is recommended to reduce the number of requests to the ‘Parameter Store’ and ‘Secrets Manager’, and thus to reduce latency and costs and increase availability. Caching allows a ‘warm’ Lambda container to reuse the values in subsequent requests. Please note that each cache belongs to a particular Lambda instance. Every new instance spawned during scaling must initialize its own cache, which results in Parameter Store requests. When many new containers spawn, especially during peak traffic, the benefit of caching can therefore become negligible.

Moreover, caching for the Lambda instance’s entire lifetime leads to longer propagation times for changes, because the changes are reflected only after the instance has been replaced. For this reason, implementing cache expiration logic is recommended. It gives you control over how long values are cached and therefore sets an upper bound on how long a configuration change takes to propagate. As previously stated, if fast propagation of configuration changes is the reason for fetching parameters programmatically, aggressively caching those values may not be a good idea.
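Here is a minimal sketch of per-container caching with expiration. `createTtlCache` is a hypothetical helper, and the loader function is injected, so this only illustrates the idea; it is not the middy implementation shown below:

```javascript
// `fetchValue` is a hypothetical loader (for example, a wrapper around
// ssm.getParameter(...).promise()); `now` is injectable for testing.
function createTtlCache(fetchValue, ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { value: Promise, expiresAt: ms timestamp }
  return async function get(key) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > now()) {
      return entry.value; // still fresh: no request to the Parameter Store
    }
    const value = fetchValue(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    try {
      return await value;
    } catch (err) {
      // Drop the failed entry unless a newer fetch has already replaced it
      if (entries.get(key)?.value === value) {
        entries.delete(key);
      }
      throw err;
    }
  };
}
```

With `ttlMs` set to five minutes, a changed parameter would reach a warm container at most about five minutes after the change. This is the trade-off between request volume and propagation time described above.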
Caching logic is not specific to a particular service, so you can move it out of the service’s implementation into middleware. For Node.js, there is a nice middleware engine called “middy” (https://www.npmjs.com/package/middy), which supports fetching and caching parameters out of the box. For Python, there is a great library written by Alex Casalboni (https://github.com/alexcasalboni/ssm-cache-python). Here is what the Node.js implementation looks like:

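const middy = require('middy');
const { ssm } = require('middy/middlewares');

const FIVE_MINUTES_IN_MS = 300000;

// Business logic; with setToContext: true the fetched values are set on `context`
const unwrappedHandler = async (event, context) => {
    // ... use the configuration values here
};

const ssmMiddleware = ssm({
    cache: true,
    cacheExpiryInMillis: FIVE_MINUTES_IN_MS,
    paths: {
        SERVICE_A: '/serviceA'
    },
    names: {
        OTHER_PARAM_BY_NAME: '/paramByName'
    },
    setToContext: true
});

exports.lambdaHandler = middy(unwrappedHandler)
    .use(ssmMiddleware);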

For the full example see https://github.com/marcinzasepa/serverless-configuration/tree/master/ssm-run-time-config-cached.
This implementation greatly reduces the number of requests and the latency introduced by the Parameter Store. However, as mentioned before, it does not eliminate the risk of hitting the limits and getting throttled. That risk still depends on the request rate and on how many new containers have to fill their caches.

Summary

That’s it. We covered several configuration options for services in AWS. The code for every example in this post, including CloudFormation templates with the policies and roles configured, is available at https://github.com/marcinzasepa/serverless-configuration. Hopefully, this article will help you make the right decision and avoid time-intensive evaluations. If you like it, give me a clap and I will write another post on configuration in multi-account environments!