Painful experience with Next13 deployment in GCP

Tinytella (トリン) · Published in Goalist Blog
Sep 7, 2023 · 7 min read

I wrote this story shortly after we managed to move forward with deploying a React application on Google Cloud Platform (GCP). Since I am just a web developer, infrastructure is truly something horrible to me. However, in some unexpected cases, we also have to get involved in tracking down where a failed deployment comes from. Therefore, I would like to save this as a memorable note for later use.

First of all, let’s see what we have:

  • 2 applications built with Next 13 (one for the Media side, one for the Management side)
  • Google Cloud Platform (GCP)
  • Terraform
  • GitHub Actions

I. Deployment Setup

  1. GitHub Actions configuration

***Brief explanations:

  • name: the name of the workflow as it will appear in the “Actions” tab
  • on: used to define which events cause the workflow to run. In our project, we use the label event.

When you create a pull request, you will see the label on the right side of the screen.

  • jobs: used to group the list of jobs that run in one workflow.

***There are two ways a job gets its name:

If you don’t specify a name explicitly, the key below “jobs” is used as the default:

jobs:
  terraform-apply:
    runs-on: ubuntu-latest
    timeout-minutes: 15

On the other hand, if a name field is set, that name is used instead:

jobs:
  deploy:
    name: deploy dev
    runs-on: ubuntu-latest
    timeout-minutes: 15
  • steps: used to group all the actions taken in one job.

There are two main keywords: uses and run.

To set it up, we will need a YAML file like below:

name: dev

on:
  pull_request:
    types:
      - labeled
    paths-ignore:
      - "**.md"
      - ".gitignore"
    branches:
      - main

jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    environment:
      name: dev
    timeout-minutes: 20
    if: github.event.action == 'labeled' && github.event.label.name == 'dev'
    steps:
      - name: <the name that will be displayed in the Actions log>
        uses: <check out a repository, or pull an image uploaded to the bucket and run it>
      - name: <the name that will be displayed in the Actions log>
        run: <run a command line>
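To make those placeholders concrete, here is a hypothetical pair of steps; the action version and the command below are illustrative examples, not taken from our actual workflow:

```yaml
steps:
  # "uses" runs a prebuilt action published on the marketplace
  - name: checkout repository
    uses: actions/checkout@v3

  # "run" executes a shell command directly on the runner
  - name: install dependencies
    run: npm ci
```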

FYI, please refer to the official documentation:

https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions

  2. Docker configuration

Reportedly, millions of people use Docker every day. It’s no wonder that setting up Docker is one of the steps to prepare for deployment.

(I’m sorry, but I will have to cover Docker in detail in another post, since it isn’t really what I want to share now. Still, let me show you where it is used in our project: it is defined as a step in the workflow mentioned above.)

- name: docker build and push
  uses: docker/build-push-action@v3
  with:
    push: true
    file: docker/node/Dockerfile
    context: .
    build-args: env=dev
    tags: asia-northeast1-docker.pkg.dev/<path in gcp>:${{ github.sha }}
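Our actual Dockerfile lives at docker/node/Dockerfile and will be covered in that other post, but a minimal sketch for a standalone-mode Next 13 build might look like this (the base image, paths, and commands are assumptions, not our real file):

```dockerfile
# build stage: install dependencies and run `next build`
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# runtime stage: copy only the standalone output
FROM node:18-alpine
WORKDIR /app
# standalone mode puts server.js and a pruned node_modules here
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]
```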

  3. Terraform configuration

As I admitted from the beginning, I honestly have no idea about anything related to infrastructure. So let me just say what I understand: we need a step to set up the Terraform environment, and it is used as the final step after building the image above.

- name: setup terraform
  uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.4.4

- name: terraform/management apply
  run: bash terraform/management/bin/apply.sh dev
  env:
    TF_VAR_fe_management_image_tag: ${{ github.sha }}

II. Deployment Errors

The situation we faced was that only one source deployed successfully, even though the configuration of both was the same. However, both initial sources had each been deployed successfully once before.

And since we weren’t able to deploy the two current sources successfully at the same time, a lot of assumptions appeared in our minds.

  1. Failed to check the health status of an instance
STARTUP HTTP probe failed 3 times consecutively for container "bext-fe-management-1" on path "/healthcheck". The instance was not started.

The first time we met this error, we tried increasing periodSeconds in the configuration, and then the probe passed. However, the instance still failed.
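For reference, the probe we tuned looks roughly like this in Terraform. This is a hypothetical fragment, not our real configuration; the field names follow the google_cloud_run_v2_service resource, and the values are examples:

```hcl
startup_probe {
  http_get {
    path = "/healthcheck"
    port = 3000
  }
  initial_delay_seconds = 10
  period_seconds        = 15  # the value we tried increasing
  failure_threshold     = 3
}
```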

  2. Tried running Docker images locally

Then we came up with the idea of downloading the built Docker image from GCP and running it locally. Of course, it ran normally.

Luckily, there was a small difference in the log output:

  • The initial source:
Listening on port 3000 url: http://localhost:3000
  • The current source, which always failed:
- ready started server on 127.0.0.1:3000, url: http://127.0.0.1:3000

Besides the strange log output, another thing was that we had no way to access the /healthcheck API either.

  • The initial source:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
saving to 'healthcheck'
healthcheck          100% |********************************| 16881  0:00:00 ETA
'healthcheck' saved
  • The current source, which always failed:
/app # wget http://127.0.0.1:3000/healthcheck
Connecting to 127.0.0.1:3000 (127.0.0.1:3000)
wget: can't connect to remote host (127.0.0.1): Connection refused
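For reference, the endpoint itself can be trivial, so the failure clearly wasn’t in our application code. With the Next 13 App Router it might be as simple as the following; this is a hypothetical sketch, not our actual code:

```
// app/healthcheck/route.js (hypothetical)
// Responds 200 so the startup probe can confirm the server is up.
export function GET() {
  return Response.json({ status: 'ok' });
}
```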

  3. Figured out how to debug the instance

Those who have experience working with React applications will know that this log must come from server.js.

In Next 13, we just need to specify standalone mode and it will generate this file for us, which is implicitly where this problem comes from.
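For reference, standalone mode is enabled in next.config.js like this (a minimal sketch; a real config will likely contain more options):

```javascript
/** @type {import('next').NextConfig} */
const nextConfig = {
  // Tells `next build` to emit a self-contained server.js
  // under .next/standalone, which is what runs in the container.
  output: 'standalone',
};

module.exports = nextConfig;
```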

  • To check server.js from a running container, you can use this command:
docker exec -it <containerId> /bin/bash

Or click the CLI icon if you’re using the Docker GUI.

List of React application’s files inside a container

Can you guess what we saw in the server.js?

  • The initial source:
const { createServerHandler } = require('next/dist/server/lib/render-server-standalone')

...omit the similarity above...

process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)

const server = http.createServer(async (req, res) => {
  try {
    await handler(req, res)
  } catch (err) {
    console.error(err);
    res.statusCode = 500
    res.end('Internal Server Error')
  }
})

if (
  !Number.isNaN(keepAliveTimeout) &&
  Number.isFinite(keepAliveTimeout) &&
  keepAliveTimeout >= 0
) {
  server.keepAliveTimeout = keepAliveTimeout
}

server.listen(currentPort, async (err) => {
  if (err) {
    console.error("Failed to start server", err)
    process.exit(1)
  }

  handler = await createServerHandler({
    port: currentPort,
    hostname,
    dir,
    conf: nextConfig,
  })

  console.log(
    'Listening on port',
    currentPort,
    'url: http://' + hostname + ':' + currentPort
  )
});
  • The current source, which always failed:
const { startServer } = require('next/dist/server/lib/start-server')

...omit the similarity above...

process.env.__NEXT_PRIVATE_STANDALONE_CONFIG = JSON.stringify(nextConfig)

if (
  Number.isNaN(keepAliveTimeout) ||
  !Number.isFinite(keepAliveTimeout) ||
  keepAliveTimeout < 0
) {
  keepAliveTimeout = undefined
}

startServer({
  dir,
  isDev: false,
  config: nextConfig,
  hostname,
  port: currentPort,
  allowRetry: false,
  keepAliveTimeout,
  useWorkers: !!nextConfig.experimental?.appDir,
}).catch((err) => {
  console.error(err);
  process.exit(1);
});

  4. Be careful with the versions used in package.json

After about two weeks of struggling with this confusing problem, this comment literally saved our lives.

https://github.com/vercel/next.js/issues/53171#issuecomment-1689050295

Finally, we found the root of this problem: it comes from the next version specified in package.json.

After downgrading it to 13.4.1 and also removing the ^, we were happy to announce that both applications could be deployed at the same time.

And why do I have to emphasize “remove ^”? Because with the caret, the package manager is free to pick up any newer 13.x release, which is exactly how the broken version sneaked in. I hope you find this article useful.
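As a concrete sketch (the exact file is not shown here, so treat the surrounding entries as placeholders), the fix boils down to pinning the version without a caret:

```json
{
  "dependencies": {
    "next": "13.4.1"
  }
}
```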

III. Summary

There are a few lessons I learned through this problem:

  1. Try to set up the infrastructure as soon as you start implementing the project, and make sure every source can be deployed successfully at least once.
  2. Verify the deployed image itself. Even if it builds and runs successfully locally, that doesn’t mean it will behave the same in another environment.
  3. Pay close attention to the versions of your dependencies. If we don’t want to run into this kind of problem again someday, it is better to update versions manually, after making sure the new version causes no regressions.

Okay, last but not least, thank you for spending time reading my painful story.

Hope you have a nice day / night ~
