Trigger gsutil with Watchman

Pradeep Kumar Singh
Google Cloud - Community
Jul 3, 2022

‘gsutil’ is a command-line tool for transferring files between on-premises systems and Google Cloud Storage (GCS). It offers an easy-to-use interface, parallel file transfers, multipart uploads, and more. Because it is a command-line tool, it has to be triggered either manually or by another process, such as cron, to schedule file transfers.

In some use cases, a user wants files uploaded to a GCS bucket as soon as they are written to a particular directory. To handle such cases, we can run ‘gsutil’ from a Watchman trigger, which uses inotify to detect changes. Watchman runs the gsutil command as soon as a file is written to the directory it monitors.

Install Watchman

Use the steps below to download and install Watchman.

# Download Watchman
wget https://github.com/facebook/watchman/releases/download/v2022.06.06.00/watchman-v2022.06.06.00-linux.zip

# Install Binary
unzip watchman-*-linux.zip
cd watchman-v2022.06.06.00-linux
sudo mkdir -p /usr/local/var/run/watchman
sudo cp bin/* /usr/local/bin
sudo cp lib/* /usr/local/lib
sudo chmod 755 /usr/local/bin/watchman
sudo chmod 2777 /usr/local/var/run/watchman
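Before configuring anything, it can help to confirm that both tools used in this walkthrough are reachable. The following is a minimal sketch, not part of the original steps; the helper name `require_tool` is hypothetical, and it only checks that a command is on the PATH.

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Both tools are needed for the rest of the walkthrough; keep going even if
# one is missing so that every gap is reported.
for tool in watchman gsutil; do
  require_tool "$tool" || true
done
```

If watchman is reported as found, running `watchman --version` should print the installed version.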

Configure Watchman

1. Create a custom configuration file /etc/watchman.json

sudo touch /etc/watchman.json

2. Create a directory to be monitored by watchman

mkdir ~/upload

3. Configure watchman to monitor the upload directory

# Run the command below
/usr/local/bin/watchman watch ~/upload/

# It will generate output similar to the following
# Output
{
    "version": "20220605.192726.0",
    "watch": "/home/singhpradeepk/upload",
    "watcher": "inotify"
}

4. Confirm the directory is being monitored by watchman

# Run the command below
/usr/local/bin/watchman watch-list

# It will generate output similar to the following
# Output
{
    "version": "20220605.192726.0",
    "roots": [
        "/home/singhpradeepk/upload"
    ]
}
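The check in step 4 can also be scripted. The sketch below assumes the `watch-list` output looks like the example above; `is_watched` is a hypothetical helper that does a plain text match on the quoted path rather than parsing the JSON, so paths containing characters that JSON escapes would need extra care.

```shell
# Hypothetical helper: succeed if the given path appears as a quoted string
# in `watchman watch-list` output read from stdin.
is_watched() {
  grep -Fq "\"$1\""
}

# Example input copied from the sample output in step 4.
sample='{"version": "20220605.192726.0","roots": ["/home/singhpradeepk/upload"]}'

if echo "$sample" | is_watched /home/singhpradeepk/upload; then
  echo "directory is being watched"
fi
```

Against a live server, the same helper can be fed directly: `watchman watch-list | is_watched /home/singhpradeepk/upload`.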

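5. Write a script gsutil_rsync.sh with the content below. Watchman will execute this script to rsync the ‘upload’ directory with the GCS bucket. The -r flag recurses into subdirectories, -c compares checksums rather than modification times, and -d deletes objects in the bucket that are not present in the local directory, so use it with care.

#!/usr/bin/env bash
# Change bucket name with desired value from your environment.
/usr/bin/gsutil rsync -rdc /home/singhpradeepk/upload gs://sample-bucket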
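Because the script deletes bucket objects that are missing locally, it may be worth previewing a sync before wiring it into a trigger. The following is a sketch of one way to parameterise the same command, not the script used in this article; `sync_upload` and the `GSUTIL` override variable are hypothetical names. Extra arguments are passed through to `gsutil rsync`, so `-n` (gsutil's dry-run option) can be added when needed.

```shell
#!/usr/bin/env bash
# Sketch of a parameterised variant of gsutil_rsync.sh.
# GSUTIL is a hypothetical override for the gsutil binary path.

# Usage: sync_upload SRC_DIR DEST_BUCKET [extra gsutil rsync options...]
sync_upload() {
  local src="$1"
  local dest="$2"
  shift 2
  # Same -r -d -c flags as the original script, plus any extra options.
  "${GSUTIL:-gsutil}" rsync -r -d -c "$@" "$src" "$dest"
}
```

For example, `sync_upload /home/singhpradeepk/upload gs://sample-bucket -n` would report what a sync would copy or delete without changing anything, and dropping `-n` performs the real transfer.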

6. Make gsutil_rsync.sh executable

chmod +x gsutil_rsync.sh

7. Create a watchman trigger that runs this script whenever the monitored directory changes

# Run the command below
/usr/local/bin/watchman -j <<-EOT
["trigger", "/home/singhpradeepk/upload", { "name": "gcs", "command": ["/home/singhpradeepk/gsutil_rsync.sh"]}]
EOT

# It will generate output similar to the following
# Output
{
    "version": "20220605.192726.0",
    "triggerid": "gcs",
    "disposition": "created"
}

8. Test watchman

Now create a file in the ‘upload’ directory and check the watchman log file for transfer logs. In your environment it should be at a location similar to mine, i.e. ‘/usr/local/var/run/watchman/singhpradeepk-state/log’. Then check the GCS bucket to confirm that the transfer succeeded.
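The test can be sketched as a small helper that drops a uniquely named file into the watched directory. `make_test_file` is a hypothetical name, and the file name pattern is only an example.

```shell
# Hypothetical helper: create a timestamped test file in the given directory
# and print its path.
make_test_file() {
  local file="$1/watchman-test-$(date +%Y%m%d%H%M%S).txt"
  echo "hello from watchman" > "$file"
  echo "$file"
}
```

After running `make_test_file ~/upload`, the log can be followed with `tail -f /usr/local/var/run/watchman/singhpradeepk-state/log` (adjusting the path for your user), and `gsutil ls gs://sample-bucket` should list the new object once the trigger has run.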

I hope you find this useful for transferring files in your own environment.

Comments and suggestions are welcome!!

Pradeep Kumar Singh
Senior Site Reliability Engineer — Google. Views are my own.