This tutorial shows how to use the Scheduled Backups tool to configure Cloud Scheduler for creating Cloud Spanner backups on a regular basis, e.g. daily or weekly.

We will use the following GCP services:

  • Cloud Scheduler: trigger tasks with a cron-based schedule.
  • Cloud Pub/Sub: a message queue from Cloud Scheduler to Cloud Functions.
  • Cloud Functions: start an operation for creating a Cloud Spanner backup.
  • Cloud Logging: create logs-based metrics.
  • Cloud Monitoring: create alerts based on conditions of logs-based metrics.

The architecture is as follows:

We will go through the following steps:

  • Create a pub/sub topic
  • Deploy a function to…
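
As a rough sketch of the Cloud Functions piece, assuming a Python function triggered by Pub/Sub and the google-cloud-spanner client (the instance, database, and retention values are illustrative placeholders, not the Scheduled Backups tool's actual code):

import base64
import datetime

from google.cloud import spanner

# Illustrative identifiers; not the tool's actual configuration format.
INSTANCE_ID = "my-instance"
DATABASE_ID = "my-database"

def create_backup(event, context):
    """Background Cloud Function triggered by Pub/Sub; starts a Spanner backup."""
    if "data" in event:
        print("Trigger message: %s" % base64.b64decode(event["data"]).decode("utf-8"))

    client = spanner.Client()
    instance = client.instance(INSTANCE_ID)
    database = instance.database(DATABASE_ID)

    backup_id = "%s-%s" % (
        DATABASE_ID, datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S"))
    expire_time = datetime.datetime.utcnow() + datetime.timedelta(days=14)

    # Kick off the long-running backup operation; no need to block on it here.
    backup = instance.backup(backup_id, database=database.name, expire_time=expire_time)
    backup.create()
    print("Started backup %s for %s" % (backup_id, database.name))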


This tutorial will show you how to enable the session-related metrics in a demo application with the Cloud Spanner client library in Go. These metrics allow for better monitoring of the session pool and help us to quickly identify session issues:

  • max_allowed_sessions: The maximum number of sessions allowed. Configurable by the user.
  • num_sessions_in_pool: The number of sessions currently in the session pool, including in-use and unused.
  • max_in_use_sessions: The maximum number of sessions in use during the last 10-minute interval.
  • get_session_timeouts: The number of get-session timeouts due to pool exhaustion.
  • num_acquired_sessions: The number of sessions acquired from the session…


The background is that I have an S3-hosted website, but the DNS records are managed by Google Domains. It didn’t support HTTPS previously, so I would like to add it now.

Two more AWS services are required:

  • AWS CloudFront: put a distribution in front of the S3 bucket URL and associate your certificate with it
  • AWS Certificate Manager (ACM): manage your SSL certificate

Steps

1. Create an SSL certificate in ACM

  1. Request a public certificate
  2. Add domain names: *.yourwebsite.com and yourwebsite.com
  3. DNS validation
  4. Note down the CNAME record. It looks like:
Name: [ABCDE].yourwebsite.com.
Type: CNAME
Value: [FGHIJK].acm-validations.aws.

  5. Add a record set in “Custom resource records” (Google Domains):

Name: [ABCDE]
Type: CNAME
TTL: 1H
Domain name: [FGHIJK].acm-validations.aws.
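
For reference, the certificate request can also be made with boto3, which returns the validation CNAME to add in Google Domains (domain names are placeholders; the console flow above works equally well):

import boto3

# Certificates used with CloudFront must live in us-east-1.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="yourwebsite.com",
    SubjectAlternativeNames=["*.yourwebsite.com"],
    ValidationMethod="DNS",
)
arn = response["CertificateArn"]

# The CNAME name/value pairs to add in Google Domains show up here
# (it can take a few seconds after the request before they are populated).
cert = acm.describe_certificate(CertificateArn=arn)
for option in cert["Certificate"]["DomainValidationOptions"]:
    print(option.get("ResourceRecord"))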


What does startingDeadlineSeconds mean?

I was working with Kubernetes CronJobs and wondered what startingDeadlineSeconds is. There is official documentation, but I am still confused after reading it.

After looking at the source code, I think startingDeadlineSeconds means that if a CronJob controller cannot start a job run on its schedule, it will keep retrying until startingDeadlineSeconds is reached.
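
In other words, a missed schedule can still be started only while it is within startingDeadlineSeconds of its scheduled time. Below is a simplified illustration of that check in Python (not the actual controller code; the times are made up):

from datetime import datetime, timedelta

def can_still_start(scheduled_time, now, starting_deadline_seconds):
    """Simplified: a missed run may still be started while within the deadline."""
    if starting_deadline_seconds is None:
        return True  # no deadline set: the missed run can still be started
    return now - scheduled_time <= timedelta(seconds=starting_deadline_seconds)

# The controller only syncs every ~10 seconds, so it notices the
# 10:00:00 schedule at 10:00:12.
scheduled = datetime(2020, 1, 1, 10, 0, 0)
noticed = datetime(2020, 1, 1, 10, 0, 12)

print(can_still_start(scheduled, noticed, 10))    # False: 12s is past the 10s deadline
print(can_still_start(scheduled, noticed, 30))    # True: still within 30s
print(can_still_start(scheduled, noticed, None))  # True: no deadline configured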

Before showing a few examples, we need to clarify some concepts:

Controller check: the CronJob controller checks (watches and syncs jobs) every 10 seconds.

Schedule: the time to execute the job according to the given schedule expression.

Job run: a job object is created…


A scheduled (cron-like) task

This all begins with my need to schedule a script to crawl some stock data weekly. In a web service or application, we often need to run a job at fixed times, dates, or intervals.

The most famous job scheduler is cron, a utility for scheduling repetitive tasks on Linux/Unix systems. A cron expression is commonly used to define when tasks should run; you can use the website crontab.guru to help build one. Different variants of cron expressions are used across systems such as Jenkins, Kubernetes CronJob, Fargate Scheduled Tasks, etc. …
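
For instance, the weekly stock-crawling job above could use the expression 0 9 * * 1 (09:00 every Monday). The small sketch below uses the croniter package (purely my own choice for illustration) to compute the next few run times:

from datetime import datetime
from croniter import croniter

# "At 09:00 every Monday" -- a weekly schedule like the stock crawler's.
expression = "0 9 * * 1"

itr = croniter(expression, datetime(2020, 1, 1))
for _ in range(3):
    print(itr.get_next(datetime))
# 2020-01-06 09:00:00
# 2020-01-13 09:00:00
# 2020-01-20 09:00:00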


Actually, this post is related to my previous one about creating a dead letter queue. Now, the problem is how to read from this queue and re-publish celery tasks to a queue.

How to read messages from RabbitMQ

I found three ways to read messages from RabbitMQ:

  • pika
  • kombu
  • RabbitMQ management plugin’s HTTP API

Pika

pika is a pure-Python RabbitMQ client library.

The following code reads messages from a queue in RabbitMQ; it is based on the RabbitMQ Tutorial — Work Queues:

#!/usr/bin/env python
import pika

# Connect to a local RabbitMQ broker.
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# The dead letter queue created in the previous post.
channel.queue_declare(queue='default.deadletter', durable=True)

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='default.deadletter', on_message_callback=callback)
channel.start_consuming()
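
For comparison, a minimal kombu consumer for the same queue might look like the following (an illustrative sketch; the broker URL and queue name are assumed from the pika example above):

import socket
from kombu import Connection, Queue

# The dead letter queue declared earlier; broker URL is illustrative.
dead_letter_queue = Queue('default.deadletter', durable=True)

def handle(body, message):
    print('Received: %r' % body)
    message.ack()

with Connection('amqp://guest:guest@localhost:5672//') as conn:
    with conn.Consumer(dead_letter_queue, callbacks=[handle]):
        try:
            while True:
                # Wait up to 2 seconds for the next message.
                conn.drain_events(timeout=2)
        except socket.timeout:
            pass  # queue drained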


This is a summary of git workflow management, which is an important topic when a team works together with git. There are several approaches to managing the workflow:

  • All in master branch
  • Feature branch
  • GitFlow
  • GitHub flow
  • GitLab flow

All in master branch

Simple. Everything happens on a single master branch.

# git pull [remote repository] [branch]
$ git pull origin master
$ git commit
$ git push origin master

git pull is shorthand for git fetch followed by git merge FETCH_HEAD, where git merge merges the retrieved branch heads into the current branch.

     A---B---C master on origin…


Last year, at my current company, we spent three months re-architecting and implementing a new data capture pipeline based on some ideas about how a good dataflow system should be built, such as immutable rawest data, a log-based message broker, and recomputation. The new capture pipeline is much more scalable and reliable. With the log-based message broker, it has the ability to reprocess all raw data and regenerate derived data when needed.

We chose AWS Kinesis as our log-based message broker, since it is an easy-to-use and reliable service. More importantly, we don’t need to operate it ourselves. Then…
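
To give a flavour of the producer side, here is a hedged boto3 sketch with an assumed stream name and region (not our actual pipeline code):

import json
import boto3

# Assumed stream name and region; purely illustrative.
kinesis = boto3.client("kinesis", region_name="ap-southeast-2")

event = {"source": "web", "user_id": 42, "action": "click"}

# Append the immutable raw event to the stream; the partition key
# determines which shard (and therefore which ordering scope) it lands in.
kinesis.put_record(
    StreamName="raw-capture-events",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=str(event["user_id"]),
)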


This continues my last post. How should we route tasks to different queues in Celery?

Here is the example:

We can specify queue, exchange, and routing_key when calling apply_async. However, it is confusing that when calling apply_async(routing_key='xxx'), the message goes to the default queue. If you don’t set app.conf.task_default_queue, Celery creates a queue named celery for you.

Actually, the trick is to specify either queue or exchange + routing_key. I tried the following examples:

add.apply_async((1,2)) 
=> default_queue

add.apply_async((1,2), routing_key='moon')
=> default_queue

add.apply_async((1,2), routing_key='sunshine')
=> default_queue

add.apply_async((1,2), exchange='default', routing_key='moon')
=> moon_queue

add.apply_async((1,2), exchange='default', routing_key='sunshine')
=> sunshine_queue

add.apply_async((1,2), queue='sunshine')
=> sunshine_queue

add.apply_async((1,2), queue='moon')
=> moon_queue
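
For reference, a queue and exchange setup along the following lines would produce the behaviour above (a minimal sketch; the queue and exchange names come from the results, everything else is assumed):

from celery import Celery
from kombu import Exchange, Queue

app = Celery("tasks", broker="amqp://guest@localhost//")

# Assumed setup matching the results above: one direct exchange, three queues.
default_exchange = Exchange("default", type="direct")

app.conf.task_default_queue = "default_queue"
app.conf.task_default_exchange = "default"
app.conf.task_default_routing_key = "default"
app.conf.task_queues = (
    Queue("default_queue", default_exchange, routing_key="default"),
    Queue("moon_queue", default_exchange, routing_key="moon"),
    Queue("sunshine_queue", default_exchange, routing_key="sunshine"),
)

@app.task
def add(x, y):
    return x + y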


Recently, I started learning how to run async tasks in Celery, which is used in our company’s product. One interesting question is: how do you create a dead letter queue in Celery?

RabbitMQ already supports the arguments x-dead-letter-exchange and x-dead-letter-routing-key when creating a new queue.

I have made a simple example to create a dead letter queue:

You need RabbitMQ as the broker and Redis as the backend. Run docker-compose up -d to start the containers.
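
A minimal sketch of what the queue declarations in tasks.py might look like (broker/backend URLs and task bodies are assumptions, not the exact example):

from celery import Celery
from kombu import Exchange, Queue

app = Celery(
    "tasks",
    broker="amqp://guest@localhost//",
    backend="redis://localhost:6379/0",
)

# Messages rejected or expired on the default queue are re-routed by RabbitMQ
# to the dead letter exchange, and end up in the default.deadletter queue.
app.conf.task_default_queue = "default"
app.conf.task_queues = (
    Queue(
        "default",
        Exchange("default", type="direct"),
        routing_key="default",
        queue_arguments={
            "x-dead-letter-exchange": "default.deadletter",
            "x-dead-letter-routing-key": "default.deadletter",
        },
    ),
    Queue(
        "default.deadletter",
        Exchange("default.deadletter", type="direct"),
        routing_key="default.deadletter",
    ),
)

@app.task
def add(x, y):
    return x + y

@app.task
def div(x, y):
    return x / y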

Just run:

celery -A tasks worker --loglevel=info

# open ipython
from tasks import add, div
add.delay(1,2)
div.delay(2,1)
div.delay(2,0) # throws ZeroDivisionError

Reject

Hengfeng Li

PhD. Software engineer at Google. Living in Australia.
