A painful experience deploying Next.js 13 on GCP
I wrote this story shortly after we managed to move forward with deploying a React application on Google Cloud Platform (GCP). Since I am just a web developer, infrastructure is truly something horrible to me. However, in some unexpected cases we also have to get involved in figuring out where a failed deployment comes from. Therefore, I would like to save this as a memorable note for later use.
First of all, let’s see what we have:
- 2 applications built with Next.js 13 (one for the Media side, one for the Management side)
- Google Cloud Platform (GCP)
- Terraform
- GitHub Actions
I. Deployment Setup
1. GitHub Actions configuration
- As far as I have researched, GitHub Actions can be boiled down to two main components: events and workflows.
***Brief explanations:
- name: the name of the workflow as it will appear in the “Actions” tab
- on: defines which events trigger the workflow. In our project, we use the label event.
When you create a pull request, you will see the labels section on the right side of the screen.
- jobs: used to group a list of jobs which are run in one workflow.
***There are two ways to name a job:
If you don’t specify a name explicitly, the job ID below “jobs” is used as the default:
jobs:
  terraform-apply:
    runs-on: ubuntu-latest
    timeout-minutes: 15
Otherwise, the name field in the configuration is used:
jobs:
  deploy:
    name: deploy dev
    runs-on: ubuntu-latest
    timeout-minutes: 15
- steps: used to group all the actions taken in one job.
There are two main keywords: uses and run.
To set it up, we will need a YAML file like below:
name: dev
on:
  pull_request:
    types:
      - labeled
    paths-ignore:
      - "**.md"
      - ".gitignore"
    branches:
      - main
jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    environment:
      name: dev
    timeout-minutes: 20
    if: github.event.action == 'labeled' && github.event.label.name == 'dev'
    steps:
      - name: <the name displayed in the Actions log>
        uses: <check out a repository, or run an image uploaded to the bucket>
      - name: <the name displayed in the Actions log>
        run: <run a command line>
FYI, please refer to this post:
https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions
2. Docker configuration
According to statistics found on the Internet, more than 7 million people reportedly use Docker every day. It’s no wonder that setting up Docker is one of the steps in preparing for deployment.
(I’m sorry that I will have to cover Docker in detail in another post, since it’s not really what I want to share now, but here is where it is used in our project. It is defined as a step in the workflow mentioned above.)
- name: docker build and push
  uses: docker/build-push-action@v3
  with:
    push: true
    file: docker/node/Dockerfile
    context: .
    build-args: env=dev
    tags: asia-northeast1-docker.pkg.dev/<path in gcp>:${{ github.sha }}
3. Terraform configuration
As I admitted from the beginning, I honestly have no idea about anything related to infrastructure. So let me just say what I understand: we need a step to set up the Terraform environment, and it is used as the final step after building the image above.
- name: setup terraform
  uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.4.4
- name: terraform/management apply
  run: bash terraform/management/bin/apply.sh dev
  env:
    TF_VAR_fe_management_image_tag: ${{ github.sha }}
II. Deployment Errors
The situation we faced was that only one source deployed successfully, even though the configuration of both was the same. However, both initial sources had each been deployed successfully once before.
And since we could not deploy the two current sources successfully at the same time, a lot of hypotheses appeared in our minds.
1. Failed the health check of an instance
STARTUP HTTP probe failed 3 times consecutively for container "bext-fe-management-1" on path "/healthcheck". The instance was not started.
The first time we met this error, we tried increasing periodSeconds in the configuration, and the probe passed. However, the instance still failed.
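For context, the probe lives in the Cloud Run service definition that Terraform manages. A sketch of roughly what we tuned, assuming the google_cloud_run_v2_service resource (the resource name, variable name, and values here are illustrative, not our real ones):

```hcl
resource "google_cloud_run_v2_service" "fe_management" {
  name     = "bext-fe-management"   # container name taken from the error log
  location = "asia-northeast1"

  template {
    containers {
      image = var.fe_management_image

      # The startup probe that produced the error above.
      startup_probe {
        http_get {
          path = "/healthcheck"
        }
        period_seconds    = 10   # the value we increased
        failure_threshold = 3    # "failed 3 times consecutively"
      }
    }
  }
}
```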
2. Tried running the Docker image locally
Then we came up with the idea of downloading the built Docker image from GCP and running it locally. Of course, it ran normally.
(You can upload and download your Docker images in Google Artifact Registry, which is where the workflow pushes them.)
Luckily, there was a small difference in the log output:
- The initial source:
Listening on port 3000 url: http://localhost:3000
- The current source, which always failed:
- ready started server on 127.0.0.1:3000, url: http://127.0.0.1:3000
Besides the strange log output, another problem was that we had no way to access the /healthcheck API either.
- The initial source:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
saving to 'healthcheck'
healthcheck          100% |*******************************|  16881  0:00:00 ETA
'healthcheck' saved
- The current source, which always failed:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
wget: can't connect to remote host (127.0.0.1): Connection refused
3. Figured out how to debug the instance.
Anyone with experience working with React applications will know that this log must come from server.js.
In Next.js 13, we just need to specify standalone output mode and this file is generated for us, which is implicitly where the problem comes from.
- To check server.js from a running container, you can use this command:
docker exec -it <containerId> /bin/bash
Or click the terminal (CLI) icon if you’re using the Docker Desktop GUI.
Can you guess what we saw in the server.js?
- The initial source:
const { createServerHandler } = require('next/dist/server/lib/render-server-standalone')
...the identical code above is omitted...
process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)
const server = http.createServer(async (req, res) => {
try {
await handler(req, res)
} catch (err) {
console.error(err);
res.statusCode = 500
res.end('Internal Server Error')
}
})
if (
!Number.isNaN(keepAliveTimeout) &&
Number.isFinite(keepAliveTimeout) &&
keepAliveTimeout >= 0
) {
server.keepAliveTimeout = keepAliveTimeout
}
server.listen(currentPort, async (err) => {
if (err) {
console.error("Failed to start server", err)
process.exit(1)
}
handler = await createServerHandler({
port: currentPort,
hostname,
dir,
conf: nextConfig,
})
console.log(
'Listening on port',
currentPort,
'url: http://' + hostname + ':' + currentPort
)
});
- The current source, which always failed:
const { startServer } = require('next/dist/server/lib/start-server')
...the identical code above is omitted...
process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)
if (
Number.isNaN(keepAliveTimeout) ||
!Number.isFinite(keepAliveTimeout) ||
keepAliveTimeout < 0
) {
keepAliveTimeout = undefined
}
startServer({
dir,
isDev: false,
config: nextConfig,
hostname,
port: currentPort,
allowRetry: false,
keepAliveTimeout,
useWorkers: !!nextConfig.experimental?.appDir,
}).catch((err) => {
console.error(err);
process.exit(1);
});
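As a side note, one workaround that circulated for this symptom is to force the standalone server to bind to all interfaces: in the affected versions, the generated server.js reads the HOSTNAME and PORT environment variables. A sketch of that workaround in the Dockerfile (assuming the usual standalone image layout; we ultimately preferred the version fix described next):

```dockerfile
# Workaround sketch: the generated standalone server.js reads PORT and
# HOSTNAME, so force it to listen on all interfaces instead of 127.0.0.1.
ENV PORT=3000
ENV HOSTNAME=0.0.0.0
CMD ["node", "server.js"]
```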
4. Be careful with the versions used in package.json
After about two weeks of struggling with this confusing problem, this comment literally saved our lives.
https://github.com/vercel/next.js/issues/53171#issuecomment-1689050295
Finally, we could find the root of this problem: it comes from the line in package.json that declares the next dependency version.
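The screenshot of that line isn’t reproduced here, but the fix boiled down to turning the caret range on the next dependency into a pinned version (the caret range "^13.4.1" is an assumption about how our file looked; 13.4.1 is the version we actually pinned):

```json
{
  "dependencies": {
    "next": "13.4.1"
  }
}
```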
After downgrading it to 13.4.1, and also removing the ^, we were happy to announce that the two applications could be deployed at the same time.
And why do I have to emphasize removing the ^? Because with a caret range, a later install can silently pull in a newer (and in this case broken) minor version, and the problem comes back. I hope you find this article useful.
III. Summary
There are a few lessons I learned through this problem:
- Try to build up the infrastructure as soon as you start implementing the project, and make sure every source can be deployed successfully at least once.
- Verify the deployed built image. Even though it builds and runs successfully locally, that doesn’t mean it will behave the same in another environment.
- Pay close attention to the versions of your dependencies. If we don’t want to encounter this problem again someday, it is better to update versions manually, after making sure the new version doesn’t introduce regressions.
Okay, last but not least, thank you for spending your time reading my painful story.
Hope you have a nice day / night ~