How I Built an AWS App Without Spending a Penny — Other Services

12 min read · Sep 22, 2023

This is part 6 of a 6-part series. See the previous parts where we built the frontend, backend, and pipeline.

I saved this section for other miscellaneous services I used while building this AWS project. AWS is kind of big.

Trusted Advisor

Trusted Advisor is a free tool you can use to check if what you’re building in AWS follows their well-architected best practices, including cost optimization, performance, security, fault tolerance, and service limits. Under a Basic Support plan, you only have access to the service limits and some security checks. But these other checks aren’t hidden at all! They’re faded, but still visible to all users.

In addition, you can find all the checks at https://docs.aws.amazon.com/awssupport/latest/user/trusted-advisor-check-reference.html. So, you can still browse all the checks for all the services you’re using and see if your account is compliant. This is where I discovered many of the best practices that I tried to apply for this project. The only thing upgrading to a Business or Enterprise plan would do is allow AWS to automatically check these for you.

If you want to get notified whenever a Trusted Advisor check fails, you can use EventBridge and SNS to subscribe to error or warning events from Trusted Advisor. The architecture looks like this:

Architecture diagram for Trusted Advisor alerts

And the CloudFormation template looks like the following:

AWSTemplateFormatVersion: "2010-09-09"
Description: Subscribe to errors and warnings from Trusted Advisor
Parameters:
  Email:
    Type: String
    Description: The email address that will receive SNS notifications for the TA alerts
    # Simple regex from https://stackoverflow.com/a/201378
    AllowedPattern: "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Resources:
  EventBridgeRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Push Trusted Advisor notifications to an SNS topic
      EventPattern:
        source:
          - aws.trustedadvisor
        detail-type:
          - Trusted Advisor Check Item Refresh Notification
        detail:
          status:
            - ERROR
            - WARN
      State: ENABLED
      Targets:
        - Id: EmailTarget
          Arn: !Ref SNSTopic
          # Failed events are retried up to 185 times within 24 hours by default
          # This causes false drift detection :(
          # DeadLetterConfig:
          #   Arn: !GetAtt DLQ.Arn
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      # Encrypt using the default SNS SSE key
      # Key aliases: aws kms list-aliases
      KmsMasterKeyId: alias/aws/sns
      Subscription:
        - Endpoint: !Ref Email
          Protocol: email
      TopicName: TrustedAdvisorNotifications
  SNSTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Allow EventBridge to send email alerts using SNS
          - Sid: AllowEventBridgeServicePrincipalWriteOnly
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sns:Publish
            Resource: !Ref SNSTopic
            Condition:
              ArnEquals:
                "aws:SourceArn": !GetAtt EventBridgeRule.Arn
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Principal: "*"
            Action: sns:Publish
            Resource: !Ref SNSTopic
            Condition:
              Bool:
                "aws:SecureTransport": false
      Topics:
        - !Ref SNSTopic
  DLQ:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      # Defaults:
      # DelaySeconds (delay queue) = 0s (time before messages appear in the queue)
      # MessageRetentionPeriod = 4 days (345,600s) (time messages stay in the queue)
      # VisibilityTimeout = 30s (time before messages reappear in the queue)
      MessageRetentionPeriod: 345600
      ReceiveMessageWaitTimeSeconds: 5 # long polling if > 0
      SqsManagedSseEnabled: true
  DLQPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      # Allow EventBridge to send messages to the DLQ
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: AllowEventBridgeServicePrincipalWriteOnly
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sqs:SendMessage
            Resource: !GetAtt DLQ.Arn
            Condition:
              ArnEquals:
                "aws:SourceArn": !GetAtt EventBridgeRule.Arn
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Principal: "*"
            Action: "sqs:*"
            Resource: !GetAtt DLQ.Arn
            Condition:
              Bool:
                "aws:SecureTransport": false
      Queues:
        - !Ref DLQ
Outputs:
  DLQURL:
    Description: DLQ URL for EventBridge
    Value: !Ref DLQ

To help generate the EventPattern for Event Rules, you can go to the AWS console and start creating a rule for Trusted Advisor checks. Under Event pattern, set the desired properties and it will display the code in JSON, which can be translated to YAML in the CloudFormation template.
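Before deploying a pattern, it can help to see roughly how matching behaves. Below is a minimal Python sketch of a simplified matcher, not EventBridge's real implementation: it only understands exact-value lists and nested objects (no prefix, numeric, or anything-but operators). The `matches` function and the trimmed sample event are hypothetical and exist only for illustration.

```python
# Simplified illustration of EventBridge-style pattern matching.
# Assumption: only exact-value lists and nested objects are handled here;
# the real service supports many more operators.

PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["ERROR", "WARN"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Return True if every field in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must equal one of the listed values
            return False
    return True


# Hypothetical, trimmed-down event with only the fields the pattern inspects
sample_event = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "detail": {"status": "WARN"},
}

if __name__ == "__main__":
    print(matches(PATTERN, sample_event))  # True
```

Fields that the pattern doesn't mention are ignored, which is why the rule fires for every basic check rather than a specific one.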

We don’t need a Lambda function for this, so we can set the target directly to SNS. Note that we can’t filter by specific checks without upgrading our Support plan, but this will still work for all the basic checks. Like many of the other services mentioned earlier, you can create a DLQ to catch errors, but I found that this can cause false drift detection, which causes the pipeline to fail. I believe this is a bug, but I can’t open a technical support case under the Basic Support plan. Oh well…

Compute Optimizer

Another free tool AWS provides is Compute Optimizer. It guides users on how to save money and improve performance for EC2 instances, Auto Scaling groups, EBS volumes, Lambda functions, and ECS services on Fargate. You must enable this feature in the AWS console. By default, it analyzes your CloudWatch metrics over the past 14 days to provide recommendations. These standard checks are free, but you can opt in to enhanced checks that analyze 3 months of data for around 25¢ per resource per month, assuming the resources run for a full month. The only one of these services we used in this project is Lambda, and we haven’t spent a penny, so Compute Optimizer isn’t going to give us any recommendations because we’re already being cost-effective!

Health

AWS Health (formerly known as AWS Personal Health Dashboard) is a service health page for all AWS services. You can use this to determine if any services are down or under maintenance in a particular region and how it will affect your apps. Like with Trusted Advisor, you can use EventBridge and SNS to subscribe to notifications whenever a health event is posted. The architecture diagram is the same, just replace Trusted Advisor with AWS Health. And the only difference in CloudFormation is the EventBridge rule resource:

EventBridgeRule:
  Type: AWS::Events::Rule
  Properties:
    Description: Push AWS Health notifications to an SNS topic
    EventPattern:
      source:
        - aws.health
      detail-type:
        - AWS Health Event
      detail:
        service:
          # Services to get notified for health alerts
          - CLOUDFORMATION
          - S3
          - CLOUDFRONT
          - APIGATEWAY
          - LAMBDA
          - DYNAMODB
          - IAM
          - CODEBUILD
          - CODEDEPLOY
          - CODEPIPELINE
          - KMS
          - SQS
          - ECR
          - COGNITO
          - ROUTE53
          - CLOUDWATCH
          - XRAY
    State: ENABLED
    Targets:
      - Id: EmailTarget
        Arn: !Ref SNSTopic
        # Failed events are retried up to 185 times within 24 hours by default
        # This causes false drift detection :(
        # DeadLetterConfig:
        #   Arn: !GetAtt DLQ.Arn

I listed all the services that were relevant to my project, but you are free to add or remove as many services as you’d like. It’s all free! (Just make sure to double-check the console first to see how all the services are named under the event pattern. Usually, it’s all caps with no spaces or special characters.)

Well-Architected Tool

If you want another tool to check if you’re meeting AWS’s Well-Architected Framework (or are trying to study for the Solutions Architect exam), you can use, well, the AWS Well-Architected Tool. This is a free questionnaire for analyzing your app’s architecture. You create a workload and then answer questions that align with the 6 pillars of the Well-Architected Framework: operational excellence (are we monitoring our apps?), security (are our apps secure?), reliability (can our apps recover from disasters?), performance efficiency (are we optimizing performance on AWS?), cost optimization (we know this one 😉), and sustainability (we’re not burning fossil fuels, right?). You can even upload your architecture diagram for reference. Once you finish answering all the questions, you’ll get a list of high and medium risks for each pillar. Keep in mind that this tool is fully automated. No humans are reading your responses.

Admittedly, this tool can be a bit much for a personal project. A lot of the questions revolve around things like teams, ticketing, monitoring, and security audits. It seems more tailored for someone higher up in a technical organization, like a director or a tech lead. If you’re on an Enterprise Support plan, you’ll have access to a Technical Account Manager (TAM) for architectural guidance from a human. (Tip: Discovery support allows the Well-Architected Tool to incorporate Trusted Advisor checks into its analysis. But I don’t recommend enabling this if your support plan doesn’t cover all the Trusted Advisor checks. You will get a bunch of failed API logs in CloudTrail. For some reason, I couldn’t turn off this setting in the console or the CLI but was able to do so via boto3. Then I was able to delete the corresponding AWSServiceRoleForWellArchitected_Discovery role.)

Route 53 & CloudWatch

I’m combining Route 53 and CloudWatch since they’re both used to monitor health checks for your apps and APIs. Although Route 53 is more commonly used as a DNS service, it can also serve as a health checker for any endpoint. And CloudWatch is used to monitor that health check to see if it’s unhealthy. CloudWatch can automatically monitor existing resources like CloudFront, API Gateway, and Lambda. Standard metrics from automatic dashboards are provided for you at no charge. You can browse these in the console by going to CloudWatch dashboards and clicking on the Automatic dashboards tab. We can create health checks for CloudFront on both the front end and back end, then tie each one to a CloudWatch alarm that sends an SNS notification whenever the site or API goes down. In both templates, we can add the following to enable health checks:

Parameters:
  Email:
    Type: String
    Description: The email address that will receive SNS notifications for the health check alarm
    # Simple regex from https://stackoverflow.com/a/201378
    AllowedPattern: "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])"
Resources:
  # Route 53
  HealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        # Defaults: RequestInterval = 30s, FailureThreshold = 3 (consecutive failed checks)
        # Optional (billed) features: HTTPS, STR_MATCH, RequestInterval < 30, MeasureLatency
        Type: HTTP
        FullyQualifiedDomainName: !GetAtt CloudFrontDistribution.DomainName
        Port: 80
        ResourcePath: / # for the microservices, replace with /health
  # CloudWatch
  HealthCheckAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Check the health of the AWS Shop site and send an email whenever it's unhealthy # update for the microservices
      # In Route 53, monitor the health check status in the past 5 minutes (either 0 or 1).
      # If this unitless value is < 1 at any point, transition to an ALARM state.
      # https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-services-cloudwatch-metrics.html
      Namespace: AWS/Route53
      MetricName: HealthCheckStatus
      Dimensions:
        - Name: HealthCheckId
          Value: !Ref HealthCheck
      Period: 300 # 5 minutes
      EvaluationPeriods: 1
      Statistic: Minimum
      ComparisonOperator: LessThanThreshold
      Threshold: 1
      Unit: None
      AlarmActions:
        - !Ref SNSTopic
  # SNS
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      # Encrypt using the default SNS SSE key
      # Key aliases: aws kms list-aliases
      KmsMasterKeyId: alias/aws/sns
      Subscription:
        - Endpoint: !Ref Email
          Protocol: email
  SNSTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Principal: "*"
            Action: sns:Publish
            Resource: !Ref SNSTopic
            Condition:
              Bool:
                "aws:SecureTransport": false
      Topics:
        - !Ref SNSTopic

(Make sure to modify the Lambda function for the microservice to check for the “GET /health” route key, then return a 200 status.)
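As a rough sketch of that change, assuming the microservice is a Python Lambda behind an API Gateway HTTP API (payload format 2.0, where the route arrives in `routeKey`), the check could look something like this. The handler name and the fallback response are placeholders, not the project's actual code.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler; only the /health branch is the point here."""
    # Assumption: HTTP API payload format 2.0, which exposes the route as "routeKey"
    route_key = event.get("routeKey", "")

    if route_key == "GET /health":
        # Keep the health check cheap: no database or downstream calls
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"status": "OK"}),
        }

    # Placeholder for the microservice's real routes
    return {
        "statusCode": 404,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"error": f"Unsupported route: {route_key}"}),
    }
```

Returning early for the health route keeps the check from touching DynamoDB or any other billed resource.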

It’s very important to keep the free tier in mind. We can only have up to 50 health checks and 10 alarms within the free tier. Most importantly, the health check must not include any enhanced features, such as HTTPS, string matching, more frequent health checks (every 10 seconds instead of every 30), and latency measurements. This is where I initially messed up. I used HTTPS and was greeted with a $1/month bill. Thankfully, the billing alarm that I created at the very beginning alerted me to this, so I only owed 8¢ over a few days (which got waived because the bill was so small). So, make sure to only do health checks on HTTP (port 80). The health check will still pass since it considers a 2xx or 3xx response to be successful. In this case, it will return a 301 response since CloudFront redirects all HTTP requests to HTTPS. The downside is that the health check can’t determine if the actual website is down since Route 53 doesn’t follow redirects.
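To make these rules concrete, here is a small Python sketch that flags the enhanced features in a health check configuration before it gets deployed. It's a hypothetical helper, not part of any AWS SDK; the dictionary keys simply mirror the HealthCheckConfig properties in the CloudFormation template.

```python
def paid_features(config: dict) -> list:
    """List the enhanced (billed) features used by a HealthCheckConfig-style dict."""
    found = []
    check_type = config.get("Type", "")
    if check_type.startswith("HTTPS"):
        found.append("HTTPS")
    if check_type.endswith("STR_MATCH"):
        found.append("string matching")
    # RequestInterval defaults to 30 seconds; 10 is the faster, billed interval
    if config.get("RequestInterval", 30) < 30:
        found.append("fast interval")
    if config.get("MeasureLatency", False):
        found.append("latency measurement")
    return found


def is_passing_status(status_code: int) -> bool:
    """An HTTP health check passes on any 2xx or 3xx response."""
    return 200 <= status_code < 400


if __name__ == "__main__":
    basic = {"Type": "HTTP", "Port": 80, "ResourcePath": "/"}
    print(paid_features(basic))    # []
    print(is_passing_status(301))  # True, so the CloudFront redirect still passes
```

An empty list means the configuration only uses basic features.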

To create the CloudWatch alarm, like with EventBridge, it’s easier if you start creating it in the console first. You get a nice graph to visualize what kind of metric you want to monitor. In this case, we want to check the HealthCheckStatus metric provided by Route 53. It reports 1 if the endpoint is healthy and 0 if it’s unhealthy. So, we need to trigger the alarm whenever this metric is 0. The way to set this up in CloudWatch can admittedly be confusing, but I tried to explain in the comments how to read out each property in the template. Since CloudWatch metrics are recorded over a period (in this case 5 minutes), we need to check if the minimum value at any point is < 1. This way, even if the route goes down briefly, we’ll still get notified since the minimum won’t be 1. The alarm action is tied to an SNS topic that will send us an email notification. The last piece of this metric is the HealthCheckId. This isn’t a standard property found on CloudFormation. It’s a dimension, which uniquely identifies a metric using a resource-specific value. This ID is important to reference the correct health check from Route 53. (We don’t want to use the AlarmIdentifier property in CloudFormation for Route 53. Basic health checks can’t have a region associated with them.)
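The Minimum statistic is the part that tends to trip people up, so here is a tiny Python sketch of the evaluation for a single 5-minute period. This is only a mental model of the alarm in the template, not how CloudWatch is implemented, and the `alarm_state` function is made up for this example.

```python
def alarm_state(datapoints, threshold=1):
    """Mimic Statistic=Minimum with ComparisonOperator=LessThanThreshold for one period."""
    if not datapoints:
        # With no data in the period, the alarm has nothing to evaluate
        return "INSUFFICIENT_DATA"
    # A single 0 anywhere in the period drags the minimum below the threshold
    return "ALARM" if min(datapoints) < threshold else "OK"


if __name__ == "__main__":
    print(alarm_state([1, 1, 1, 1, 1]))  # OK
    print(alarm_state([1, 1, 0, 1, 1]))  # ALARM, even though the outage was brief
```

With Average or Maximum instead of Minimum, the brief outage in the second example could slip by unnoticed.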

Once everything is set up, you should receive an email confirming your SNS subscription and, hopefully, your Route 53 health checks report as healthy. But we’re not done with the free tier yet. Keep in mind what this health check is doing. Route 53 is pinging CloudFront every 30 seconds using the endpoint specified to see if it returns a 2xx or 3xx response. (If we assume CloudFront is caching the data, we don’t need to worry about any additional pings to S3 or API Gateway.) It’s also doing health checks across multiple regions to make sure the resource isn’t down due to a specific region. I kept my health checks simple by just returning an OK response, but you can add additional checks if the system is more complicated. Regardless, this is an API call made regularly to CloudFront.

Remember when I commented out the logging property of the CloudFront distribution? If this was enabled, every health check would be logged to S3. Now CloudFront will aggregate all the logs every 15 minutes, but S3 objects are still being created, which requires making a PUT request. S3 will charge 1¢ for every 25K GET, SELECT, and other requests per month, but also 1¢ for every 2K PUT, COPY, POST, and LIST requests per month. Let’s do the math. If this health check is running 24/7 every month, this will result in 4 calls/hour * 24 hours/day * 31 days/month (accounting for the longest months) = 2,976 PUT requests/month. (Note that DELETE requests, such as those from a lifecycle rule that deletes objects after 1 day, don’t fall under the PUT category, so they’re not counted here.) We will have already exceeded our budget about 2/3 of the way into the month. We unfortunately can’t make the health checks less frequent to stay under the 2K limit. We can only do every 30s or every 10s.
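The arithmetic is easy to double-check with a few lines of Python. The constants come straight from the numbers above (one log object every 15 minutes, a 2K PUT budget); the variable names are just for illustration.

```python
import math

LOG_DELIVERIES_PER_HOUR = 4  # one aggregated log object every 15 minutes
HOURS_PER_DAY = 24
DAYS_IN_MONTH = 31           # accounting for the longest months
PUT_BUDGET = 2_000           # the 2K PUT figure discussed above

puts_per_day = LOG_DELIVERIES_PER_HOUR * HOURS_PER_DAY
puts_per_month = puts_per_day * DAYS_IN_MONTH
days_until_over_budget = PUT_BUDGET / puts_per_day
# The budget runs out partway through the day after the last full day
first_day_over = math.floor(days_until_over_budget) + 1

if __name__ == "__main__":
    print(puts_per_day)                      # 96
    print(puts_per_month)                    # 2976
    print(round(days_until_over_budget, 1))  # 20.8
    print(first_day_over)                    # 21
```

So the budget is used up during day 21 of a 31-day month, which lines up with the "2/3 into the month" estimate.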

This is also where I initially messed up. I used S3 Storage Lens to track the total storage and number of objects across all my S3 buckets. And soon enough, I found that the culprit was these CloudFront logs. But wait, I added a lifecycle rule like all the other buckets. Shouldn’t all the objects be deleted after 1 day? Not exactly. I originally had Object Lock set to 30 days, and this overrides the lifecycle rule to make sure all objects follow this retention policy. Even after I changed the Object Lock setting to 1 day, the existing objects weren’t affected. So, I was stuck with thousands of log objects for a month until the Object Lock expired. Then the lifecycle rule took effect. Make sure to enable “Show versions” when browsing an S3 bucket to see if any delete markers are still present, since those still take up storage space. Now with those issues resolved, we can keep the health checks for the API and website, just without any CloudFront logs.

Conclusion

Phew! That was a lot. Any longer and I would’ve written a whole thesis! But I hope this inspires you to create something in AWS. I was intimidated at the beginning by the huge plethora of services. But as I showed, there’s plenty you can do in AWS under the free tier, especially with serverless tools. And with the right protection and automation, you can get alerted if you go over your budget and act immediately. Even with how long this article has been, it’s best to play around with AWS yourself to fully learn its services. There are plenty of gotchas in the world of AWS that catch newcomers off-guard, but as they say, you learn through failure. You can check out the GitHub repo at https://github.com/Abhiek187/aws-shop if you want to browse the full source code. Thanks for reading!

P.S. I added a new part to this series all about user authentication using Cognito. Please check it out here.