Hosting Docusaurus on AWS S3 + CI/CD with Azure DevOps
Background
At work, I encountered a scenario where my team needed to create a live documentation website to advocate DevOps and software engineering practices across the enterprise. The obvious choice would have been Confluence, but we opted for an open source alternative. We considered Azure DevOps Wiki, but the continuous need to enroll readers into the Azure DevOps project is a hassle.
Eventually, it was decided to use a self-hosted WordPress site. At first glance, it was a decent choice: it complied with the enterprise design language by default, and the WYSIWYG editing seemed convenient. However, over time I realized there are a few downsides:
- It is hard to ensure content quality without a proper content review process. Our WordPress content is not integrated with Git, so we cannot use pull requests. We can compare revisions, but it is not user-friendly.
- WYSIWYG allows writers to save their changes frequently. At times we would see tens of revisions, each changing only one or two lines.
- The design language did not provide good support for tables and code blocks, so they looked ugly.
- The WordPress template has limitations. For instance, there is a finite number of levels we can create in the sidebar, and it does not support blogging (a future plan) or search.
Though the above downsides and limitations should be solvable with plugins and customization, I reckon migrating to a modern JavaScript site would make it more maintainable.
Getting Started with Docusaurus v2
After some quick research, I narrowed the options down to two: Docusaurus and Docz. Both are reputable and have similar features. I eventually chose Docusaurus because I found it easier to use.
To be adventurous, I decided to try out Docusaurus v2, which was still at 2.0.0-alpha.56 at the time of writing.
To get started, I ran an npx command that initializes the project:
npx @docusaurus/init@next init [name] classic
After that, yarn start will run the site on http://localhost:3000, and it already looks very nice.
Failed Attempt to Enable Search
Next up, I wanted to enable search using Algolia DocSearch. As the site is expected to run on a private network, running the crawler from the Docker image is the way to go:
docker run -it --env-file=.env -e "CONFIG=$(cat algolia-config.json | jq -r tostring)" algolia/docsearch-scraper
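For reference, the .env file passed in above holds the Algolia credentials the scraper reads. A minimal sketch with placeholder values (the API key must be a write-capable admin key, not a search-only key):
# .env — placeholder credentials for docsearch-scraper
APPLICATION_ID=YOUR_ALGOLIA_APP_ID
API_KEY=YOUR_ALGOLIA_ADMIN_API_KEY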
After a few failed attempts, I realized:
- The Algolia DocSearch crawler could not handle a port number in the site URL. To make it work, we need to host our site on the standard HTTP port (80) or HTTPS port (443).
- Although the Docusaurus CLI has a --port option, it does not allow port 80. If the port option is set to 80, it will automatically be changed to 1024 instead (or another port if 1024 is occupied).
That left me no choice but to host Docusaurus somewhere instead of running it on my local machine. I decided to use AWS S3.
Hosting on S3
The official AWS docs are the best place to start. I followed the steps and did the following (an equivalent AWS CLI sketch follows the list):
- Created an S3 bucket with public access (untick “Block all public access”)
- Enabled static website hosting and set the index document to index.html
- Added a bucket policy to make the objects publicly readable
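If you prefer scripting the setup, here is a minimal AWS CLI sketch of the same three steps. The bucket name and region are placeholders, and the inline policy simply grants public s3:GetObject on all objects:
# create the bucket (placeholder name and region — adjust to yours)
aws s3api create-bucket --bucket my-docs-site --region ap-southeast-1 \
  --create-bucket-configuration LocationConstraint=ap-southeast-1
# enable static website hosting with index.html as the index document
aws s3 website s3://my-docs-site/ --index-document index.html
# equivalent of unticking "Block all public access"
aws s3api put-public-access-block --bucket my-docs-site \
  --public-access-block-configuration BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false
# bucket policy to make all objects publicly readable
aws s3api put-bucket-policy --bucket my-docs-site \
  --policy '{"Version":"2012-10-17","Statement":[{"Sid":"PublicRead","Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::my-docs-site/*"}]}'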
To make it even nicer, I went the extra mile to configure SSL with AWS Certificate Manager (ACM) and expose the site via AWS CloudFront. I mainly followed this freeCodeCamp guide; here are some notes and mistakes I made:
- In my case, I already had a domain name in AWS Route 53, which made things simpler. I just needed to request a certificate from ACM and configure it accordingly (see the CLI note after this list).
- When configuring CloudFront, set the Origin Domain Name to <BUCKET_NAME>.s3-website-<REGION>.amazonaws.com. Do not choose the S3 option from the drop-down, as that might cause routing problems later: the S3 REST endpoint does not resolve directory URLs to index.html the way the website endpoint does. I faced a situation where the homepage loaded fine and I could navigate to child pages from there; however, going directly to a child page returned HTTP 403.
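One detail worth noting: ACM certificates used by CloudFront must be issued in the us-east-1 region, regardless of where the bucket lives. Requesting one can also be scripted (the domain name here is a placeholder):
# CloudFront only accepts ACM certificates issued in us-east-1
aws acm request-certificate --domain-name docs.example.com \
  --validation-method DNS --region us-east-1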
Reattempt to Enable Search
With the static site up and running, I reattempted to enable search with a trial Algolia account and a sample config found here. It turned out not to be compatible with Docusaurus! I kept getting nbHits 0, which indicates nothing was indexed.
Eventually I found a correct sample from the config list in the algolia/docsearch-configs repo. Below is a working sample:
{
  "index_name": "example",
  "start_urls": ["https://www.example.com/docs"],
  "selectors": {
    "lvl0": {
      "selector": ".menu__link--sublist.menu__link--active",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5",
    "lvl6": "article h6",
    "text": "article p, article li"
  }
}
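To confirm a crawl actually indexed something, you can query the index directly via Algolia's search REST API and check nbHits. A quick sketch with placeholder credentials and the sample index name above:
# query the index and print the number of hits (placeholders for app ID, key, index)
curl -s -X POST \
  -H "X-Algolia-Application-Id: YOUR_APP_ID" \
  -H "X-Algolia-API-Key: YOUR_SEARCH_ONLY_KEY" \
  -d '{"params": "query=docs"}' \
  "https://YOUR_APP_ID-dsn.algolia.net/1/indexes/example/query" | jq .nbHits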
Here is how the working search feature looks.
CI/CD with Azure DevOps
Unlike our WordPress site, Docusaurus supports Markdown, and all its content is versioned in Git.
To maximize these features and provide a better content creation process, I decided to integrate it with CI/CD, which automatically builds and deploys the site whenever new changes are made.
The enterprise uses Azure DevOps, so it is my go-to CI/CD tool. I used the Node.js with React template for the YAML, then made it a multi-stage pipeline. The build is done by yarn build, while the deployment is simply an upload to S3 (the new files replace the old ones). The only thing to note is to remember to invalidate the CloudFront cache, so that your content is refreshed upon deployment.
# Node.js with React
# Build a Node.js project that uses React.
# Add steps that analyze code, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/javascript

trigger:
  - master

stages:
  - stage: Build
    jobs:
      - job: Build
        pool:
          vmImage: "ubuntu-latest"
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: "10.x"
            displayName: "Install Node.js"
          - script: |
              yarn
              yarn build
            displayName: "yarn and yarn build"
          - task: CopyFiles@2
            inputs:
              SourceFolder: "build/"
              Contents: "**"
              TargetFolder: "$(Build.ArtifactStagingDirectory)"
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: "$(Build.ArtifactStagingDirectory)"
              ArtifactName: "drop"
              publishLocation: "Container"
  - stage: Deployment
    displayName: Deploy to S3
    dependsOn: Build
    condition: succeeded()
    jobs:
      - deployment:
        pool:
          vmImage: "ubuntu-latest"
        environment: dev
        variables:
          - name: DistributionId
            value: "YOUR_DISTRIBUTION_ID"
        strategy:
          runOnce:
            deploy:
              steps:
                - task: S3Upload@1
                  inputs:
                    awsCredentials: "YOUR_AWS_SERVICE_CONNECTION"
                    regionName: "YOUR_AWS_REGION_NAME"
                    bucketName: "YOUR_BUCKET_NAME"
                    sourceFolder: "$(Pipeline.Workspace)/drop/"
                    globExpressions: "**"
                - task: AWSCLI@1
                  inputs:
                    awsCredentials: "YOUR_AWS_SERVICE_CONNECTION"
                    awsCommand: "cloudfront"
                    awsSubCommand: "create-invalidation"
                    awsArguments: '--distribution-id $(DistributionId) --paths "/*"'
                  displayName: "Invalidate CloudFront Cache"
To make it even more complete, we could include a step that automatically re-crawls the site to update the search index after deployment. I reckon docker run would not work on a Microsoft-hosted agent, so the crawler would have to run from its code base. I didn't add this step as part of this exercise.
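For the curious, here is a rough sketch of what such a step might run, assuming the agent has Python and pipenv available, the Algolia credentials are exported from secret pipeline variables, and the scraper's pipenv-based docsearch CLI behaves as its repo describes:
# hypothetical post-deployment step: run the DocSearch scraper from source
git clone https://github.com/algolia/docsearch-scraper.git
cd docsearch-scraper
pipenv install
# APPLICATION_ID and API_KEY must be set in the environment (secret variables)
pipenv run ./docsearch run ../algolia-config.json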
Final Thoughts
This served as a nice weekend exercise. I learnt how easy it is to bring up a nice documentation site with awesome features, integrate it with Algolia DocSearch, and host a static site on AWS.
The outcome is very satisfying, as the rich features of Docusaurus easily outclass our WordPress site:
- Search with Algolia DocSearch
- Blogging
- Integrated with Git
- Markdown support
- Customizable (built in React)
- Automatically deploy content changes with CI/CD
Thanks for reading.