Trigger gsutil with Watchman
‘gsutil’ is a command-line tool for transferring files between on-premises systems and Google Cloud Storage (GCS), in either direction. It offers an easy-to-use interface, parallel file transfers, multipart uploads, and more. Because it is a command-line tool, it has to be triggered either manually or by another process, such as cron, to schedule file transfers.
In some use cases, a user wants files uploaded to a GCS bucket as soon as they are written to a given directory. To handle such cases, we can run ‘gsutil’ from a Watchman trigger, which relies on inotify on Linux. Watchman runs the gsutil command as soon as a file is written to a directory it monitors.
Install Watchman
Use the steps below to download and install Watchman.
# Download Watchman
wget https://github.com/facebook/watchman/releases/download/v2022.06.06.00/watchman-v2022.06.06.00-linux.zip
# Install Binary
unzip watchman-*-linux.zip
cd watchman-v2022.06.06.00-linux
sudo mkdir -p /usr/local/var/run/watchman
sudo cp bin/* /usr/local/bin
sudo cp lib/* /usr/local/lib
sudo chmod 755 /usr/local/bin/watchman
sudo chmod 2777 /usr/local/var/run/watchman
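Before moving on, it can help to confirm that both tools are reachable from your shell. The snippet below is a minimal sketch: `check_tool` is a hypothetical helper name introduced here for illustration, and it only reports whether each command is on the PATH before printing its version.

```shell
#!/usr/bin/env bash
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not part of Watchman or gsutil.
check_tool() {
  local tool="$1" path
  if path="$(command -v "$tool")"; then
    echo "$tool: found at $path"
  else
    echo "$tool: not found on PATH"
    return 1
  fi
}

# Print version details only for the tools that are actually installed.
if check_tool watchman; then watchman --version; fi
if check_tool gsutil; then gsutil version; fi
```

If either tool is reported as missing, revisit the install steps (or your PATH) before configuring the watch.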
Configure Watchman
1. Create a custom configuration file /etc/watchman.json
sudo touch /etc/watchman.json
2. Create a directory to be monitored by watchman
mkdir ~/upload
3. Configure watchman to monitor the upload directory
# Run below command
/usr/local/bin/watchman watch ~/upload/
# It will generate output similar to below one
# Output
{
    "version": "20220605.192726.0",
    "watch": "/home/singhpradeepk/upload",
    "watcher": "inotify"
}
4. Confirm the directory is being monitored by watchman
# Run below command
/usr/local/bin/watchman watch-list
# It will generate output similar to below one
# Output
{
    "version": "20220605.192726.0",
    "roots": [
        "/home/singhpradeepk/upload"
    ]
}
5. Write a script gsutil_rsync.sh with the content below. Watchman will execute this script to rsync the ‘upload’ directory with the GCS bucket.
#!/usr/bin/env bash
# Change bucket name with desired value from your environment.
/usr/bin/gsutil rsync -rdc /home/singhpradeepk/upload gs://sample-bucket
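The script above can be made a little more defensive. The following is only a sketch under a few assumptions: `SRC_DIR`, `BUCKET` and `GSUTIL` are illustrative variable names introduced here, and `sync_upload` is a hypothetical function. It checks that the source directory and the gsutil binary exist before syncing, and adds gsutil's top-level `-m` option so transfers run in parallel.

```shell
#!/usr/bin/env bash
# Sketch of a more defensive gsutil_rsync.sh.
# GSUTIL, SRC_DIR and BUCKET are illustrative names; adjust them to your environment.
GSUTIL="${GSUTIL:-gsutil}"

# sync_upload <source-dir> <bucket-url> (hypothetical helper)
sync_upload() {
  local src="$1" bucket="$2"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  if ! command -v "$GSUTIL" >/dev/null 2>&1; then
    echo "gsutil not found: $GSUTIL" >&2
    return 127
  fi
  # -m: parallel transfers, -r: recursive, -c: compare checksums,
  # -d: delete bucket objects that are missing locally (use with care)
  "$GSUTIL" -m rsync -r -d -c "$src" "$bucket"
}

sync_upload "${SRC_DIR:-$HOME/upload}" "${BUCKET:-gs://sample-bucket}" \
  || echo "sync failed with status $?" >&2
```

Because `-d` removes objects from the bucket when the local file disappears, you may prefer to drop that flag if the bucket should keep everything ever uploaded.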
6. Make gsutil_rsync.sh executable
chmod +x gsutil_rsync.sh
7. Create a watchman trigger that runs this script to upload any changes in the monitored directory
# Run below command
/usr/local/bin/watchman -j <<-EOT
["trigger", "/home/singhpradeepk/upload", { "name": "gcs", "command": ["/home/singhpradeepk/gsutil_rsync.sh"]}]
EOT
# It will generate output similar to below one
# Output
{
    "version": "20220605.192726.0",
    "triggerid": "gcs",
    "disposition": "created"
}
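If you only want certain files to fire the upload, a trigger can also carry an `expression`. The sketch below assumes a hypothetical trigger name `gcs-csv` and a `*.csv` pattern, neither of which is part of the setup above; `trigger_json` is an illustrative helper that just builds the JSON command. It also shows `trigger-list`, which prints the triggers registered on a watched root.

```shell
#!/usr/bin/env bash
# Build a Watchman trigger command that fires only for files matching a glob.
# trigger_json is a hypothetical helper; its arguments must not contain double quotes.
trigger_json() {
  local root="$1" name="$2" script="$3" pattern="$4"
  printf '["trigger", "%s", {"name": "%s", "expression": ["match", "%s"], "command": ["%s"]}]\n' \
    "$root" "$name" "$pattern" "$script"
}

if command -v watchman >/dev/null 2>&1; then
  # Register the trigger, then list all triggers on the watched directory.
  trigger_json "$HOME/upload" gcs-csv "$HOME/gsutil_rsync.sh" '*.csv' | watchman -j
  watchman trigger-list "$HOME/upload"
  # To remove a trigger later: watchman trigger-del "$HOME/upload" gcs-csv
fi
```

Note that this registers an additional trigger next to ‘gcs’. If both exist, the script runs for either one, so delete the one you do not need with `trigger-del`.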
8. Test watchman
Now create a file in the ‘upload’ directory and check the watchman log file for transfer logs. On my machine it is at ‘/usr/local/var/run/watchman/singhpradeepk-state/log’; yours should be at a similar location. Then check the GCS bucket to confirm the transfer succeeded.
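The test can be sketched as a short script. This is an illustration under assumptions: `make_test_file` is a hypothetical helper, the 10-second pause is an arbitrary guess at how long the upload takes, and the log path assumes the same layout as mine with your own user name substituted.

```shell
#!/usr/bin/env bash
# UPLOAD_DIR, BUCKET and LOG_FILE are illustrative names; adjust them to your environment.
UPLOAD_DIR="${UPLOAD_DIR:-$HOME/upload}"
BUCKET="${BUCKET:-gs://sample-bucket}"
LOG_FILE="${LOG_FILE:-/usr/local/var/run/watchman/$(id -un)-state/log}"

# Create a small timestamped file in the given directory and print its path.
make_test_file() {
  local dir="$1"
  local file="$dir/watchman-test-$(date +%s).txt"
  echo "watchman trigger test" > "$file" && echo "$file"
}

if [ -d "$UPLOAD_DIR" ]; then
  make_test_file "$UPLOAD_DIR"
  sleep 10  # arbitrary pause to let the trigger run
  # Show recent log lines and the bucket listing when they are available.
  if [ -r "$LOG_FILE" ]; then tail -n 20 "$LOG_FILE"; fi
  if command -v gsutil >/dev/null 2>&1; then gsutil ls "$BUCKET"; fi
fi
```

If the new file does not appear in the bucket listing, the log output is the first place to look for errors from the trigger script.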
I hope you find this useful for transferring files in your own environment.
Comments and suggestions are welcome!!