Stackdriver Error Reporting: part 2

Further adventures w/ Golang & Kubernetes Engine

**Update 19–04–02** — This post has become dated and some key parts no longer work correctly. Please see Google’s documentation (link) for definitive guidance. This post may still be useful for understanding how to use Stackdriver Error Reporting with Kubernetes Engine. I’m going to encourage my peers to augment the Google documentation to more thoroughly explain the Kubernetes Engine scenario.

**Update 18–03–12** — I think that Stackdriver Debugger with Golang only works with code built with Go versions ≤1.9.x. I’ve upgraded most of my machines to 1.10.y and Debugger no longer appears to work. But, creating a binary using Go 1.9.4 works and I’m able to set and capture Debugger snapshots.

I wrote about using Stackdriver Error Reporting in Golang and Kubernetes. I continue to be unable to get Stackdriver Debugger working with the results. I’m going to write here about what I tried and then I’ll update this post when my colleagues help me identify the solution.

In this post, I’ve added Google Cloud Source Repositories and Container Builder to automate:

  • Creating Service Accounts, Keys and Kubernetes Secrets
  • Building the Go binary and Docker image, and deploying to Kubernetes Engine

Enable GCP Services

Besides Cloud Source Repositories, Container Builder, and Kubernetes Engine, we’re going to need Google IAM to provision Service Accounts and their Keys, and Google Cloud Resource Manager to make changes to the project’s IAM policy (mostly to grant the Container Builder’s own Service Account the roles it needs to create other Service Accounts and deploy to Kubernetes Engine).

Let’s enable the services:

for SERVICE in \
cloudbuild \
cloudresourcemanager \
container \
iam \
sourcerepo
do
gcloud services enable ${SERVICE}.googleapis.com \
--async \
--project=${PROJECT}
done
NB If you’d prefer to run these synchronously to ensure you know when they’re enabled, drop the --async.

Container Builder & Kubernetes Engine Regional Clusters

The Container Builder kubectl build step (link) is unable to manage Regional Clusters. But, if you follow the instructions in this GitHub issue (link), you will be able to create a tailored kubectl Builder that does support Regional Clusters.
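To give a sense of the approach, here’s a minimal sketch of such a tailored builder. The kubectl/ directory, the kubectl.region.bash script name and the exact get-credentials invocation are my assumptions, not the precise content of that issue.

kubectl/Dockerfile:

FROM gcr.io/cloud-builders/kubectl
# Replace the default (zonal) entrypoint with a region-aware wrapper
COPY kubectl.region.bash /
ENTRYPOINT ["/kubectl.region.bash"]

kubectl/kubectl.region.bash (mark it executable before building):

#!/bin/bash
set -e
# Fetch credentials for the Regional Cluster identified by the build step's env,
# then pass all arguments straight through to kubectl.
# NB depending on your gcloud version this may need the beta command:
#    gcloud beta container clusters get-credentials ...
gcloud container clusters get-credentials "${CLOUDSDK_CONTAINER_CLUSTER}" \
  --region="${CLOUDSDK_COMPUTE_REGION}"
exec kubectl "$@"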

NB In the following documentation, I assume that you’ve completed this step and are using a Kubernetes Engine Regional Cluster.

Container Build Service Accounts, Keys and Secrets

You may treat this section as an aside, but I decided to automate the creation of a Service Account, an associated Key, and a Kubernetes Secret representing that Service Account, so that the Service Account can be used by Pods deployed to Kubernetes Engine.

Because the following script has the Container Builder service create Service Accounts and Service Account Keys and deploy to Kubernetes Engine, we must first add IAM roles to the Container Builder Service Account that permit it to do this:

NUM=$(gcloud projects describe ${PROJECT} \
--format='value(projectNumber)')
for ROLE in \
roles/container.developer \
roles/iam.serviceAccountAdmin \
roles/iam.serviceAccountKeyAdmin \
roles/resourcemanager.projectIamAdmin
do
gcloud projects add-iam-policy-binding ${PROJECT} \
--member=serviceAccount:${NUM}@cloudbuild.gserviceaccount.com \
--role=${ROLE}
done

Here’s a Container Builder script:
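What follows is a minimal sketch of cloudbuild.robot.yaml rather than a definitive script; the IAM roles granted to the robot and the kubectl/ subdirectory holding the tailored builder’s Dockerfile are my assumptions:

steps:
# 1. Create the robot Service Account
- name: "gcr.io/cloud-builders/gcloud"
  args: [
    "iam", "service-accounts", "create", "${_ROBOT}",
    "--display-name=${_ROBOT}",
  ]
# 2. Create a Key for the robot Service Account
- name: "gcr.io/cloud-builders/gcloud"
  args: [
    "iam", "service-accounts", "keys", "create", "${_ROBOT}.key.json",
    "--iam-account=${_ROBOT}@${PROJECT_ID}.iam.gserviceaccount.com",
  ]
# 3. Permit the robot to write Error Reports
- name: "gcr.io/cloud-builders/gcloud"
  args: [
    "projects", "add-iam-policy-binding", "${PROJECT_ID}",
    "--member=serviceAccount:${_ROBOT}@${PROJECT_ID}.iam.gserviceaccount.com",
    "--role=roles/errorreporting.writer",
  ]
# 4. Permit the robot to use the Debugger
- name: "gcr.io/cloud-builders/gcloud"
  args: [
    "projects", "add-iam-policy-binding", "${PROJECT_ID}",
    "--member=serviceAccount:${_ROBOT}@${PROJECT_ID}.iam.gserviceaccount.com",
    "--role=roles/clouddebugger.agent",
  ]
# 5. Permit the robot to write logs
- name: "gcr.io/cloud-builders/gcloud"
  args: [
    "projects", "add-iam-policy-binding", "${PROJECT_ID}",
    "--member=serviceAccount:${_ROBOT}@${PROJECT_ID}.iam.gserviceaccount.com",
    "--role=roles/logging.logWriter",
  ]
# 6. Build the tailored (Regional Cluster) kubectl builder
- name: "gcr.io/cloud-builders/docker"
  args: [
    "build",
    "--tag=gcr.io/${PROJECT_ID}/kubectl",
    "kubectl",
  ]
# 7. Use the tailored builder to create the Kubernetes Secret from the robot's Key
- name: "gcr.io/${PROJECT_ID}/kubectl"
  args: [
    "create", "secret", "generic", "error-reporting-key",
    "--from-file=key.json=${_ROBOT}.key.json",
  ]
  env: [
    "CLOUDSDK_COMPUTE_REGION=${_REGION}",
    "CLOUDSDK_CONTAINER_CLUSTER=${_CLUSTER}",
  ]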

NB As mentioned previously, I assume you’re using kubectl against a Regional Cluster. If not, you may remove the 6th step in the above and revise the 7th step name to be gcr.io/cloud-builders/kubectl.

You may run the script from the command-line with:

gcloud container builds submit . \
--config cloudbuild.robot.yaml \
--substitutions=\
_CLUSTER=${CLUSTER},\
_REGION=${REGION},\
_ROBOT=robot-$(date +"%H%M") \
--project=$PROJECT

All being well, this should result in a Secret called error-reporting-key in the default namespace:

kubectl get secrets
NAME                  TYPE      DATA      AGE
error-reporting-key   Opaque    1         1h

The Kubernetes Deployment script references this Secret; if you decide to rename error-reporting-key ensure you reflect this change also in the Deployment script.

Container Build Stackdriver Debugger Deployment

Unfortunately, this step is where my problems begin. Although I’m able to build the Go binary using the Go Builder, the resulting image generates errors when deployed to Kubernetes.

So, rather than include these go build steps:

steps:
- name: "gcr.io/cloud-builders/go"
  args: [
    "get",
    "cloud.google.com/go/errorreporting",
  ]
  env: [
    "GOPATH=/workspace",
  ]
- name: "gcr.io/cloud-builders/go"
  args: [
    "build",
    "-gcflags=-N",
    "-gcflags=-l",
    "-a",
    "-o", "main",
  ]
  env: [
    "GOOS=linux",
    "GOPATH=/workspace",
  ]

I’m reluctantly building the Go file locally (go version go1.10 linux/amd64) and assuming the resulting binary is present in the working directory for the Cloud Builder script. So, assuming you’re using Go 1.10 too:

CGO_ENABLED=0 go build -gcflags='-N -l'

This should generate go-errrep. Then, generate the source context file:

gcloud debug source gen-repo-info-file --output-directory .

This should generate source-context.json.

Here’s the Dockerfile.debugger:
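A minimal sketch, assuming go-errrep (built above), the go-cloud-debug agent (the same binary downloaded in the GCE VM section below) and source-context.json are all present in the build context:

FROM scratch
# The locally built app, the Stackdriver Debugger agent and the source context
ADD go-cloud-debug /
ADD go-errrep /
ADD source-context.json /
# Run the app wrapped by the Debugger agent (mirrors the VM invocation below)
CMD ["/go-cloud-debug", "-sourcecontext=/source-context.json", "-appmodule=go-errrep", "-appversion=1", "--", "/go-errrep"]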

and deployment.debugger.yaml:
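Again a minimal sketch; the image name, the /secrets mount path and the environment variable names the app reads (GOOGLE_PROJECT_ID plus the Downward API values POD_NAME, POD_NODE and POD_IP, per the previous post) are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-reporting-debug
  labels:
    app: error-reporting-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: error-reporting-debug
  template:
    metadata:
      labels:
        app: error-reporting-debug
    spec:
      containers:
      - name: error-reporting-debug
        image: "gcr.io/[[YOUR-PROJECT-ID]]/go-errrep"
        env:
        - name: GOOGLE_PROJECT_ID
          value: "[[YOUR-PROJECT-ID]]"
        # Point the client libraries at the mounted Service Account key
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secrets/key.json
        # Downward API values reported by the app
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NODE
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: error-reporting-key
          mountPath: /secrets
          readOnly: true
      volumes:
      - name: error-reporting-key
        secret:
          secretName: error-reporting-key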

NB You must replace [[YOUR-PROJECT-ID]] with the value of ${PROJECT} wherever it appears in the Deployment.

And finally the curtailed cloudbuild.yaml:
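Again a minimal sketch, with the go get and go build steps removed and the same kubectl/ and image-name assumptions as above:

steps:
# 1. Package the locally built go-errrep binary, the go-cloud-debug agent and
#    source-context.json into an image
- name: "gcr.io/cloud-builders/docker"
  args: [
    "build",
    "--tag=gcr.io/${PROJECT_ID}/go-errrep",
    "--file=Dockerfile.debugger",
    ".",
  ]
# 2. Push the image so that the cluster can pull it
- name: "gcr.io/cloud-builders/docker"
  args: [
    "push",
    "gcr.io/${PROJECT_ID}/go-errrep",
  ]
# 3. Build the tailored (Regional Cluster) kubectl builder
- name: "gcr.io/cloud-builders/docker"
  args: [
    "build",
    "--tag=gcr.io/${PROJECT_ID}/kubectl",
    "kubectl",
  ]
# 4. Apply the Deployment to the Regional Cluster
- name: "gcr.io/${PROJECT_ID}/kubectl"
  args: [
    "apply",
    "--filename=deployment.debugger.yaml",
  ]
  env: [
    "CLOUDSDK_COMPUTE_REGION=${_REGION}",
    "CLOUDSDK_CONTAINER_CLUSTER=${_CLUSTER}",
  ]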

Which you may apply with:

gcloud container builds submit . \
--config cloudbuild.debugger.yaml \
--substitutions=\
_CLUSTER=${CLUSTER},\
_REGION=${REGION} \
--project=$PROJECT

All being well:

kubectl get deployments
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE
error-reporting-debug   1         1         1            1

And the errors appear in Stackdriver Error Reporting. But, expanding a single Error Report and clicking through to Stackdriver Debugger, the snapshots aren’t ever triggered: waiting for Godot.

Tidy-up

kubectl delete deployment/error-reporting-debug
kubectl delete secret/error-reporting-key

GCE VM & Error Reporting

What’s curious is that the behavior is consistent (i.e. it doesn’t work) with a pure GCE VM:

INSTANCE=instance-errrep
gcloud beta compute instances create ${INSTANCE} \
--project=${PROJECT} \
--zone=us-west1-c \
--machine-type=custom-2-8192 \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=ubuntu-1604-xenial-v20180306 \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50

From our working directory, copy what we need:

for FILE in go-errrep source-context.json ${ROBOT}.key.json
do
gcloud compute scp \
./${FILE} \
${INSTANCE}: \
--project=$PROJECT
done

Then:

gcloud compute ssh ${INSTANCE} --project=${PROJECT}

And, finally, you’ll need to provide a value for ROBOT on the VM, but the remaining values can be pulled from the Metadata service. Following up on the other post’s use of the Kubernetes Downward API, these are the equivalent values I would pull from the Metadata service if I were running this in a container on this VM.

METADATA="http://metadata.google.internal/computeMetadata/v1"
ROBOT=[[YOUR-ROBOT-NAME]]
wget -O go-cloud-debug https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug
chmod 0755 go-cloud-debug
export GOOGLE_APPLICATION_CREDENTIALS=./${ROBOT}.key.json
export GOOGLE_PROJECT_ID=$(\
curl \
--silent \
--header "Metadata-Flavor: Google" \
${METADATA}/project/project-id)
export POD_NAME=ubuntu
export POD_NODE=$(\
curl \
--silent \
--header "Metadata-Flavor: Google" \
${METADATA}/instance/name)
export POD_IP=$(\
curl \
--silent \
--header "Metadata-Flavor: Google" \
${METADATA}/instance/network-interfaces/0/ip)
./go-cloud-debug \
-sourcecontext=./source-context.json \
-appmodule=go-errrep \
-appversion=1 \
-- ./go-errrep

which generates:

pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
error setting breakpoint at main.go:39: couldn't find file "main.go"
pod: [ubuntu]; node: [instance-errrep]; ip: [10.138.0.5]
error setting breakpoint at main.go:37: couldn't find file "main.go"

So, I’m clearly doing something wrong somewhere :-(

Tried it again (later yesterday evening) and it worked.

Tried it again (this morning) and it doesn’t work :-(

dumb-init

Incorporated Yelp’s dumb-init (link) into my builds for no reason other than that I think I should and I had some time. Because scratch doesn’t include wget, I pulled dumb-init locally alongside the other binaries (go-errrep and go-cloud-debug).

So, per Yelp’s instructions but tweaked to store in the current directory:

wget --output-document ./dumb-init https://github.com/Yelp/dumb-init/releases/download/v1.2.1/dumb-init_1.2.1_amd64
chmod +x ./dumb-init

Then a slightly revised Dockerfile.debugger:
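Again a sketch rather than the original: dumb-init becomes the ENTRYPOINT (so it runs as PID 1) and the debugger-wrapped app moves to CMD:

FROM scratch
ADD dumb-init /
ADD go-cloud-debug /
ADD go-errrep /
ADD source-context.json /
# dumb-init runs as PID 1 and forwards signals to the process it supervises
ENTRYPOINT ["/dumb-init", "--"]
CMD ["/go-cloud-debug", "-sourcecontext=/source-context.json", "-appmodule=go-errrep", "-appversion=1", "--", "/go-errrep"]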

Everything else remains unchanged. When you next build and deploy, the image will include dumb-init as PID 1.

Logs

You can tail the container’s logs using the Kubernetes Dashboard, the Cloud Console Workloads or from the command-line:

POD=$(\
kubectl get pods \
--selector=app=error-reporting-debug \
--output=jsonpath="{.items[0].metadata.name}"\
)
kubectl logs pod/${POD} --follow

That’s it!