Hosting Docusaurus on AWS S3 + CI/CD with Azure DevOps

Wai Loon · Published in w:Logs · Jun 8, 2020

Background

At work, I came across a scenario where my team needed to create a live documentation website to advocate DevOps and software engineering practices across the enterprise. The obvious solution would have been Confluence, but we opted for an open-source option. We considered Azure DevOps Wiki, but the continuous need to enroll readers into the Azure DevOps project is a hassle.

Eventually, it was decided to use a self-hosted WordPress site. At first glance, it was a decent choice, as it complies with the enterprise design language by default and the WYSIWYG editing seemed convenient. However, over time, I realized there are a few downsides:

  • It is hard to ensure content quality without a proper content review process. Our WordPress content is not integrated with Git, so we cannot use pull requests. We can compare revisions, but it is not user-friendly.
  • WYSIWYG lets writers save their changes frequently. At times we would see tens of revisions, each changing only one or two lines.
  • The design language did not provide good support for tables and code blocks, so they look very ugly.
  • The WordPress template has limitations; for instance, there is a finite number of levels we can create in the sidebar. It also does not support blogging (a future plan) or search.

Though the above downsides and limitations should be solvable with plugins and customization, I reckon migrating to a modern JavaScript site would make it more maintainable.

Getting Started with Docusaurus v2

After some quick research, I narrowed it down to two options: Docusaurus and Docz. Both are reputable and have similar features. I eventually chose Docusaurus because I think it is easier to use.

To be adventurous, I decided to try out Docusaurus v2, which was still at 2.0.0-alpha.56 at the time of writing.

To get started, I ran an npx command that initializes the project.

npx @docusaurus/init@next init [name] classic

After that, yarn start runs the site on http://localhost:3000, and it already looks very nice.
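For completeness, the remaining commands are just the following (with [name] being the project folder created by the init command above):

cd [name]
yarn start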

The built-in dark mode looks pleasant.

Failed Attempt to Enable Search

Next up, I wanted to enable search using Algolia DocSearch. As the site is expected to run on a private network, running the crawler from the Docker image is the way to go.

docker run -it --env-file=.env -e "CONFIG=$(cat algolia-config.json | jq -r tostring)" algolia/docsearch-scraper
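Here, the .env file holds the Algolia credentials that the scraper reads; APPLICATION_ID and API_KEY are the environment variable names it expects (the values below are placeholders):

APPLICATION_ID=YOUR_ALGOLIA_APP_ID
API_KEY=YOUR_ALGOLIA_ADMIN_API_KEY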

After a few failed attempts, I realized:

  • The Algolia DocSearch crawler cannot handle a port number in the site URL. To make it work, we need to host the site on the standard HTTP port (80) or HTTPS port (443).
  • Although the Docusaurus CLI has a --port option, it does not allow port 80. If the port option is set to 80, it is automatically changed to 1024 instead (or another port if 1024 is occupied).

That left me no choice but to host Docusaurus somewhere instead of running it on my local machine. I decided to use AWS S3.

Hosting on S3

The official AWS docs are the best place to start. I followed the steps there to create a bucket, enable static website hosting, and upload the Docusaurus build output.
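For reference, the setup boils down to something like the following AWS CLI commands (the bucket name is a placeholder, and the bucket also needs a bucket policy allowing public read, which I omit here):

# Create the bucket and enable static website hosting
aws s3 mb s3://YOUR_BUCKET_NAME
aws s3 website s3://YOUR_BUCKET_NAME --index-document index.html --error-document index.html
# Upload the Docusaurus build output
aws s3 sync build/ s3://YOUR_BUCKET_NAME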

To make it even nicer, I went the extra mile to configure SSL with AWS Certificate Manager (ACM) and expose the site via AWS CloudFront. I mainly followed this freeCodeCamp guide; here are some notes and mistakes I made:

  • In my case, I already had a domain name in AWS Route 53, which made things simpler. I just needed to request a certificate from ACM and configure it accordingly.
  • When configuring CloudFront, set the Origin Domain Name to <BUCKET_NAME>.s3-website-<REGION>.amazonaws.com. Do not choose the S3 option from the drop-down, as that may cause routing problems to your site later. I faced a situation where the homepage loaded fine and I could navigate to child pages from there; however, going directly to a child page returned HTTP 403.
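A related detail worth checking: Docusaurus generates absolute links from the url value in docusaurus.config.js, so it should match the domain served by CloudFront. A minimal sketch, with placeholder values:

// docusaurus.config.js (excerpt)
module.exports = {
  title: 'My Docs',
  // Public URL served by CloudFront, used when generating absolute links
  url: 'https://docs.example.com',
  baseUrl: '/',
};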

Reattempt to Enable Search

With the static site up and running, I reattempted to enable search with a trial Algolia account and a sample config found here. It turned out not to be compatible with Docusaurus! I kept getting nbHits 0, which indicates that nothing was indexed.

Eventually I found a correct sample from the config list in the algolia/docsearch-configs repo. Below is a working sample.

{
  "index_name": "example",
  "start_urls": ["https://www.example.com/docs"],
  "selectors": {
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5",
    "lvl6": "article h6",
    "text": "article p, article li"
  }
}
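With the index populated, the last step is to point the site at it. In docusaurus.config.js, this is done via themeConfig (the values below are placeholders; use the search-only API key here, not the admin key):

// docusaurus.config.js (excerpt)
module.exports = {
  // ...existing config...
  themeConfig: {
    algolia: {
      apiKey: 'YOUR_SEARCH_ONLY_API_KEY',
      indexName: 'example',
    },
  },
};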

Here is how the working search feature looks.

I didn’t expect a live search preview!

CI/CD with Azure DevOps

Unlike the WordPress site we had, Docusaurus supports Markdown and all of its content is versioned in Git.

To make the most of these features and provide a better content creation process, I decided to integrate it with CI/CD, which automatically builds and deploys the site whenever new changes are made.

The enterprise uses Azure DevOps, so it is my go-to CI/CD tool. I used the Node.js with React template for the YAML, then made it a multi-stage pipeline. The build is done with yarn build, while the deployment is simply an upload to S3 (new files replace the old ones). The only thing to note: remember to invalidate the CloudFront cache so that your content is refreshed upon deployment.

# Node.js with React
# Build a Node.js project that uses React.
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/javascript
trigger:
  - master

stages:
  - stage: Build
    jobs:
      - job: Build
        pool:
          vmImage: "ubuntu-latest"
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: "10.x"
            displayName: "Install Node.js"
          - script: |
              yarn
              yarn build
            displayName: "yarn and yarn build"
          - task: CopyFiles@2
            inputs:
              SourceFolder: "build/"
              Contents: "**"
              TargetFolder: "$(Build.ArtifactStagingDirectory)"
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: "$(Build.ArtifactStagingDirectory)"
              ArtifactName: "drop"
              publishLocation: "Container"

  - stage: Deployment
    displayName: Deploy to S3
    dependsOn: Build
    condition: succeeded()
    jobs:
      - deployment: Deploy
        pool:
          vmImage: "ubuntu-latest"
        environment: dev
        variables:
          - name: DistributionId
            value: "YOUR_DISTRIBUTION_ID"
        strategy:
          runOnce:
            deploy:
              steps:
                - task: S3Upload@1
                  inputs:
                    awsCredentials: "YOUR_AWS_SERVICE_CONNECTION"
                    regionName: "YOUR_AWS_REGION_NAME"
                    bucketName: "YOUR_BUCKET_NAME"
                    sourceFolder: "$(Pipeline.Workspace)/drop/"
                    globExpressions: "**"
                - task: AWSCLI@1
                  inputs:
                    awsCredentials: "YOUR_AWS_SERVICE_CONNECTION"
                    awsCommand: "cloudfront"
                    awsSubCommand: "create-invalidation"
                    awsArguments: '--distribution-id $(DistributionId) --paths "/*"'
                  displayName: "Invalidate CloudFront Cache"

To make it even better, we should include a step that automatically re-crawls the site to update the search index after deployment. I reckon docker run would not work on a Microsoft-hosted agent; the crawler would have to run from its code base instead. I did not add this step as part of this exercise.
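If I were to add it, a rough and untested sketch of such a step might look like the one below, where $(AlgoliaAppId) and $(AlgoliaApiKey) are hypothetical pipeline variables holding the Algolia credentials (APPLICATION_ID and API_KEY are the environment variables the scraper reads):

                - script: |
                    git clone https://github.com/algolia/docsearch-scraper.git
                    cd docsearch-scraper
                    pip install --user pipenv
                    pipenv install
                    # Run the crawler from source against the same config used earlier
                    pipenv run ./docsearch run ../algolia-config.json
                  env:
                    APPLICATION_ID: $(AlgoliaAppId)
                    API_KEY: $(AlgoliaApiKey)
                  displayName: "Re-crawl site to update search index"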

Final Thoughts

This served as a nice weekend exercise. I learnt how easy it is to bring up a documentation site with awesome features, integrate it with Algolia DocSearch, and host a static site on AWS.

The outcome is very satisfying, as Docusaurus easily outclasses our WordPress site with its rich feature set:

  • Search with Algolia DocSearch
  • Blogging
  • Integrated with Git
  • Markdown support
  • Customizable (built with React)
  • Automatically deploy content changes with CI/CD

Thanks for reading.
