Painful experience with Next13 deployment in GCP

Tinytella (トリン) · Published in Goalist Blog
Sep 7, 2023 · 7 min read

I wrote this story shortly after we managed to move forward with deploying a React application on Google Cloud Platform (GCP). Since I am just a web developer, infrastructure is truly something horrible to me. However, in some unexpected cases, we also have to get involved in tracking down where a failed deployment comes from. Therefore, I would like to save this as a memorable note for later use.

First of all, let’s see what we have:

  • 2 applications built with Next 13 (one for the Media side, one for the Management side)
  • Google Cloud Platform (GCP)
  • Terraform
  • GitHub Actions

I. Deployment Setup

  1. GitHub Actions configuration

***Brief explanations:

  • name: the name of the workflow as it will appear in the “Actions” tab
  • on: used to define which events cause the workflow to run. In our project, we use the label event.

When you create a pull request, you will see the label on the right side of the screen.

  • jobs: used to group the list of jobs that run in one workflow.

***There are two ways a job gets its name:

If you don’t specify a name explicitly, the key below “jobs” is used as the default:

jobs:
  terraform-apply:
    runs-on: ubuntu-latest
    timeout-minutes: 15

On the other hand, if a name field is set, that name is used instead:

jobs:
  deploy:
    name: deploy dev
    runs-on: ubuntu-latest
    timeout-minutes: 15
  • steps: used to group all the actions taken in one job.

There are two main keywords: uses and run.

To set it up, we will need a YAML file like below:

name: dev

on:
  pull_request:
    types:
      - labeled
    paths-ignore:
      - "**.md"
      - ".gitignore"
    branches:
      - main

jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    environment:
      name: dev
    timeout-minutes: 20
    if: github.event.action == 'labeled' && github.event.label.name == 'dev'
    steps:
      - name: <the name that will be displayed in the Actions log>
        uses: <check out a repository, or pull an image uploaded to the bucket and run it>
      - name: <the name that will be displayed in the Actions log>
        run: <run a command line>
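To make those placeholders concrete, here is a hypothetical pair of steps; the action version and the command below are illustrative examples, not taken from our actual workflow:

```yaml
steps:
  # "uses" runs a prebuilt action published on the marketplace
  - name: checkout repository
    uses: actions/checkout@v3

  # "run" executes a shell command directly on the runner
  - name: install dependencies
    run: npm ci
```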

FYI, please refer to the official documentation:

https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions

  2. Docker configuration

Reportedly, millions of people use Docker every day. It’s no wonder that setting up Docker is one of the steps to prepare for deployment.

(I’m sorry, but I will have to cover Docker in detail in another post, since it isn’t really what I want to share now. Still, let me show you where it is used in our project: it is defined as a step in the workflow mentioned above.)

- name: docker build and push
  uses: docker/build-push-action@v3
  with:
    push: true
    file: docker/node/Dockerfile
    context: .
    build-args: env=dev
    tags: asia-northeast1-docker.pkg.dev/<path in gcp>:${{ github.sha }}
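Our actual Dockerfile lives at docker/node/Dockerfile and will be covered in that other post, but a minimal sketch for a standalone-mode Next 13 build might look like this (the base image, paths, and commands are assumptions, not our real file):

```dockerfile
# build stage: install dependencies and run `next build`
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# runtime stage: copy only the standalone output
FROM node:18-alpine
WORKDIR /app
# standalone mode puts server.js and a pruned node_modules here
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
```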

  3. Terraform configuration

As I admitted from the beginning, I honestly have no idea about anything related to infrastructure. So let me just say what I understand: we need a step to set up the Terraform environment, and it is used as the final step after building the image above.

- name: setup terraform
  uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.4.4

- name: terraform/management apply
  run: bash terraform/management/bin/apply.sh dev
  env:
    TF_VAR_fe_management_image_tag: ${{ github.sha }}

II. Deployment Errors

The situation we faced was that only one source deployed successfully, even though the configuration of both was the same. However, both initial sources had each been deployed successfully once before.

And since we weren’t able to deploy the two current sources successfully at the same time, a lot of assumptions appeared in our minds.

  1. Failed to check the health status of an instance
STARTUP HTTP probe failed 3 times consecutively for container "bext-fe-management-1" on path "/healthcheck". The instance was not started.

The first time we met this error, we tried increasing periodSeconds in the configuration, and then the probe passed. However, the instance still failed.
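For reference, the probe we tuned looks roughly like this in Terraform. This is a hypothetical fragment, not our real configuration; the field names follow the google_cloud_run_v2_service resource, and the values are examples:

```hcl
startup_probe {
  http_get {
    path = "/healthcheck"
    port = 3000
  }
  initial_delay_seconds = 10
  period_seconds        = 15  # the value we tried increasing
  failure_threshold     = 3
}
```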

  2. Tried running Docker images locally

Then we came up with the idea of downloading the built Docker image from GCP and running it locally. Of course, it ran normally.

Luckily, there was a small difference in the log output:

  • The initial source:
Listening on port 3000 url: http://localhost:3000
  • The current source, which always failed:
- ready started server on 127.0.0.1:3000, url: http://127.0.0.1:3000

Besides the strange log output, another thing was that we had no way to access the /healthcheck API either.

  • The initial source:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
saving to 'healthcheck'
healthcheck          100% |********************************| 16881  0:00:00 ETA
'healthcheck' saved
  • The current source, which always failed:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
wget: can't connect to remote host (127.0.0.1): Connection refused
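For reference, the endpoint itself can be trivial, so the failure clearly wasn’t in our application code. With the Next 13 App Router it might be as simple as the following; this is a hypothetical sketch, not our actual code:

```
// app/healthcheck/route.js (hypothetical)
// Responds 200 so the startup probe can confirm the server is up.
export function GET() {
  return Response.json({ status: 'ok' });
}
```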

  3. Figured out how to debug the instance

Those who have experience working with React applications will know that this log must come from server.js.

In Next 13, we just need to specify standalone mode and it will generate this file for us, which is implicitly where this problem comes from.
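For reference, standalone mode is enabled in next.config.js like this (a minimal sketch; a real config will likely contain more options):

```javascript
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Tells `next build` to emit a self-contained server.js
  // under .next/standalone, which is what runs in the container.
  output: 'standalone',
};

module.exports = nextConfig;
```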

  • To check server.js from a running container, you can use this command:
docker exec -it <containerId> /bin/bash

Or click the CLI icon if you’re using the Docker GUI.

List of React application’s files inside a container

Can you guess what we saw in the server.js?

  • The initial source:
const { createServerHandler } = require('next/dist/server/lib/render-server-standalone')

...omit the similarity above...

process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)

const server = http.createServer(async (req, res) => {
  try {
    await handler(req, res)
  } catch (err) {
    console.error(err);
    res.statusCode = 500
    res.end('Internal Server Error')
  }
})

if (
  !Number.isNaN(keepAliveTimeout) &&
  Number.isFinite(keepAliveTimeout) &&
  keepAliveTimeout >= 0
) {
  server.keepAliveTimeout = keepAliveTimeout
}

server.listen(currentPort, async (err) => {
  if (err) {
    console.error("Failed to start server", err)
    process.exit(1)
  }

  handler = await createServerHandler({
    port: currentPort,
    hostname,
    dir,
    conf: nextConfig,
  })

  console.log(
    'Listening on port',
    currentPort,
    'url: http://' + hostname + ':' + currentPort
  )
});
  • The current source, which always failed:
const { startServer } = require('next/dist/server/lib/start-server')

...omit the similarity above...

process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)

if (
  Number.isNaN(keepAliveTimeout) ||
  !Number.isFinite(keepAliveTimeout) ||
  keepAliveTimeout < 0
) {
  keepAliveTimeout = undefined
}

startServer({
  dir,
  isDev: false,
  config: nextConfig,
  hostname,
  port: currentPort,
  allowRetry: false,
  keepAliveTimeout,
  useWorkers: !!nextConfig.experimental?.appDir,
}).catch((err) => {
  console.error(err);
  process.exit(1);
});

  4. Be careful with the versions used in package.json

After about two weeks of struggling with this confusing problem, this comment literally saved our lives.

https://github.com/vercel/next.js/issues/53171#issuecomment-1689050295

Finally, we found the root of this problem: it comes from the next version specified in package.json.

After downgrading it to 13.4.1 and also removing the ^, we were happy to announce that both applications could be deployed at the same time.

And why do I have to emphasize “remove ^”? Because with the caret, the package manager is free to pick up any newer 13.x release, which is exactly how the broken version sneaked in. I hope you find this article useful.
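As a concrete sketch (the exact file is not shown here, so treat the surrounding entries as placeholders), the fix boils down to pinning the version without a caret:

```json
{
  "dependencies": {
    "next": "13.4.1"
  }
}
```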

III. Summary

There are a few lessons I learned through this problem:

  1. Try to set up the infrastructure as soon as you start implementing the project, and make sure every source can be deployed successfully at least once.
  2. Verify the deployed image itself. Even if it builds and runs successfully locally, that doesn’t mean it will behave the same in another environment.
  3. Pay close attention to the versions of your dependencies. If we don’t want to run into this kind of problem again someday, it is better to update versions manually, after making sure the new version causes no regressions.

Okay, last but not least, thank you for spending time reading my painful story.

Hope you have a nice day / night ~
