How to build a serverless web portal in AWS
Leveraging dbt documentation for data lineage, debugging, and communication
Recently, we introduced dbt (data build tool) at one of our clients and discovered the value of dbt docs. These auto-generated docs provide clear data lineage, which is incredibly useful for debugging and communication.
The docs consist of HTML and JSON files that work together to render an interactive lineage graph. The graph displays well in a browser, but it is not easy to share through tools like Confluence or Teams. To address this, we decided to create a web portal where these files could be hosted securely. We also included other HTML documentation, creating a centralized repository that is reachable through a landing page with links to the various subfolders. This way, authorized team members can conveniently find everything in one place.
Let’s get started: Amazon S3 and CloudFront
First, let’s look at where the dbt docs come from. They are generated by running the “dbt docs generate” command. To streamline this, we added the command to the end of our CI/CD pipeline, so it runs whenever a feature branch is merged into the main branch. Once generated, the files are stored in an S3 bucket. Since they are static files, this simple hosting solution suffices.
Next, we need a domain that points to the bucket and serves its content securely over HTTPS. For this we can use CloudFront, Amazon Web Services’ content delivery network (CDN). With CloudFront in front of the bucket, we have a straightforward, central way to access the dbt docs. Sounds easy, right?
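As a rough illustration of the upload step, the sketch below collects the three files that “dbt docs generate” writes to the target directory (index.html, manifest.json, and catalog.json) and uploads them with boto3. The bucket name, the “dbt-docs” key prefix, and the function names are assumptions made for this example and not the exact pipeline code we used.

```python
from pathlib import Path

# Files that `dbt docs generate` writes into the dbt target directory.
DBT_DOCS_FILES = ("index.html", "manifest.json", "catalog.json")

# Explicit content types so the browser renders the page instead of downloading it.
CONTENT_TYPES = {".html": "text/html", ".json": "application/json"}


def plan_uploads(target_dir, key_prefix="dbt-docs"):
    """Return a list of (local_path, s3_key, content_type) for the docs files."""
    plan = []
    for name in DBT_DOCS_FILES:
        path = Path(target_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"{path} is missing; run `dbt docs generate` first")
        plan.append((path, f"{key_prefix}/{name}", CONTENT_TYPES[path.suffix]))
    return plan


def upload_docs(bucket, target_dir="target", key_prefix="dbt-docs"):
    """Upload the generated docs to S3 (bucket and prefix are hypothetical)."""
    import boto3  # imported lazily so plan_uploads can be used without AWS access

    s3 = boto3.client("s3")
    for path, key, content_type in plan_uploads(target_dir, key_prefix):
        s3.upload_file(str(path), bucket, key, ExtraArgs={"ContentType": content_type})
```

In a pipeline, a final step could call upload_docs("my-docs-bucket") right after “dbt docs generate” has finished, where the bucket name is a placeholder.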
However, there’s one crucial aspect missing: authentication. We don’t want these files to be publicly accessible; only authorized individuals should have access. Fortunately, AWS offers solutions for this as well, albeit with some nuances.
Amazon Cognito
AWS provides a service called Amazon Cognito, which in principle allows us to add authentication to our solution. Cognito offers a user-friendly login screen and handles password management. Here’s how it works. We create a Cognito User Pool and add all the authorized users who should have access to the web portal. When users are added, they receive an email with a temporary password. Upon their first login through the Cognito UI, they are prompted to change this password.
However, implementing Cognito to meet our specific needs isn’t straightforward. We can’t simply link Cognito with CloudFront and expect everything to work out of the box. Instead, we need to add custom logic using Lambda functions, API Gateway, and DynamoDB as supporting services in the background. While the official documentation doesn’t provide a clear path, I found a helpful blog post that explains how to achieve this integration. Even though it’s a bit outdated, I used it as a foundation to build upon.
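To make the user provisioning step more concrete, here is a minimal sketch of adding users to a User Pool with boto3’s admin_create_user call, which by default sends the invitation message with a temporary password. It assumes a pool that allows signing in with the email address as username; the pool ID and the helper names are hypothetical.

```python
def build_create_user_request(user_pool_id, email):
    """Build the keyword arguments for the Cognito AdminCreateUser call."""
    return {
        "UserPoolId": user_pool_id,
        # Assumption: the pool is configured so the email address is the username.
        "Username": email,
        "UserAttributes": [
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        # Deliver the invitation with the temporary password by email.
        "DesiredDeliveryMediums": ["EMAIL"],
    }


def invite_users(user_pool_id, emails):
    """Create one Cognito user per email address (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without AWS access

    cognito = boto3.client("cognito-idp")
    for email in emails:
        cognito.admin_create_user(**build_create_user_request(user_pool_id, email))
```

Keeping the request builder separate from the AWS call makes it easy to review who will be invited before anything is sent.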
In the following section, I’ll delve into how all these services work together to achieve the desired solution.
The solution: how it works
Diagram Overview:
The diagram illustrates how the key services (S3, CloudFront, and Cognito) are integrated within the system. These services act as the core components, while the additional elements serve as the “glue” that lets them interact.
Initial Access:
A user accesses the CloudFront domain. Since no specific path is provided, the default path directs the user to the Session Checker Lambda function. The Session Checker verifies whether a session cookie exists. If no cookie is found (e.g., during the first visit or after cookie expiration), the user is prompted to log in.
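The check described above could look roughly like the sketch below. It assumes the Session Checker runs as a Lambda@Edge function on the viewer request, which the flow suggests but which is an assumption here, and the cookie name “portal_session” is hypothetical. Like the description, it only tests whether the cookie is present; a production version would also look the session up and check its expiry.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "portal_session"  # hypothetical cookie name
LOGIN_PATH = "/_identity/login"    # login path used in the flow


def get_session_id(request):
    """Return the session cookie value from a CloudFront request, or None."""
    for header in request.get("headers", {}).get("cookie", []):
        cookie = SimpleCookie()
        cookie.load(header["value"])
        if SESSION_COOKIE in cookie:
            return cookie[SESSION_COOKIE].value
    return None


def handler(event, context):
    """Let requests with a session cookie through; send all others to login."""
    request = event["Records"][0]["cf"]["request"]
    if get_session_id(request):
        return request  # returning the request lets CloudFront continue to the origin
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {
            "location": [{"key": "Location", "value": LOGIN_PATH}],
            "cache-control": [{"key": "Cache-Control", "value": "no-store"}],
        },
    }
```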
Authentication Process:
The CloudFront domain is called again, this time with the _identity/login path. This redirects the user to the Session Manager Lambda function via API Gateway. The Session Manager further redirects the user to the Cognito UI for email and password-based sign-in. Upon successful authentication, a session is created and stored in a DynamoDB table.
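The redirect to the Cognito UI boils down to building a URL for the hosted UI’s /oauth2/authorize endpoint. The sketch below shows one way the Session Manager might do that, assuming the authorization code flow; the domain, client ID, and redirect URI are placeholders, and the function name is made up for this example.

```python
from urllib.parse import urlencode


def build_login_url(cognito_domain, client_id, redirect_uri, state):
    """Build the Cognito hosted UI URL that the user is redirected to."""
    query = urlencode(
        {
            "response_type": "code",       # assumption: authorization code flow
            "client_id": client_id,
            "redirect_uri": redirect_uri,  # must be registered on the app client
            "scope": "openid email",
            "state": state,                # random value to tie the callback to this login
        }
    )
    return f"https://{cognito_domain}/oauth2/authorize?{query}"


def login_response(cognito_domain, client_id, redirect_uri, state):
    """API Gateway (Lambda proxy) response that redirects to the Cognito UI."""
    location = build_login_url(cognito_domain, client_id, redirect_uri, state)
    return {"statusCode": 302, "headers": {"Location": location}}
```

Here the redirect_uri would point back at the _identity/auth path on the CloudFront domain, so that Cognito returns the user to the Session Manager after sign-in.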
Final Steps:
The user is redirected to the CloudFront domain once more, now using the _identity/auth path along with additional authentication details. API Gateway invokes the Session Manager Lambda, which validates the authentication details. Finally, the user is redirected to the S3 static site, granting access to the web portal.
Subsequent Visits:
As long as the session cookie remains valid, the user is sent straight to the S3 site without needing to log in again.
This flow ensures secure access and efficient communication between the services.
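Once the authentication details have been validated, the remaining work is to store a session and hand the browser a cookie. The sketch below shows that last part only and leaves the validation itself out. The attribute names, the eight-hour lifetime, and the cookie name are assumptions; “table” stands for a boto3 DynamoDB Table resource, or anything else with a put_item method.

```python
import secrets
import time

SESSION_TTL_SECONDS = 8 * 60 * 60  # hypothetical session lifetime of eight hours
SESSION_COOKIE = "portal_session"  # hypothetical cookie name


def new_session(email, now=None):
    """Create a session record; expires_at can serve as a DynamoDB TTL attribute."""
    now = int(time.time()) if now is None else int(now)
    return {
        "session_id": secrets.token_urlsafe(32),
        "email": email,
        "created_at": now,
        "expires_at": now + SESSION_TTL_SECONDS,
    }


def finish_login(email, table, now=None):
    """Store the session and redirect the user to the portal with a cookie set."""
    session = new_session(email, now)
    table.put_item(Item=session)
    cookie = (
        f"{SESSION_COOKIE}={session['session_id']}; Max-Age={SESSION_TTL_SECONDS}; "
        "Path=/; Secure; HttpOnly; SameSite=Lax"
    )
    # Response in API Gateway Lambda proxy format.
    return {"statusCode": 302, "headers": {"Location": "/", "Set-Cookie": cookie}}
```

Setting the cookie’s Max-Age to the same value as the stored expiry keeps the browser and the session table in agreement about when a new login is needed.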
Conclusion
This integrated solution relies on Cognito for security, allowing users to authenticate with email and password. Meanwhile, CloudFront acts as a global CDN that serves the content over HTTPS and distributes it efficiently. With S3 serving as scalable storage for static assets, the hosting solution remains cost-effective. Users interact only with the CloudFront domain, unaware of the underlying complexity, and the redirection between services is transparent during login and subsequent visits. Additionally, session data is stored centrally in DynamoDB, which makes monitoring and analysis possible. In summary, this architecture balances security, performance, and user experience, making it a good fit for securely hosting static sites such as a documentation portal.
If you have a similar use case and require assistance with the setup, feel free to reach out.