Photo by Dima Langemann on Unsplash

An Easy FileWatcher for Python: No Side Effects & Quick Setup (Watchdog Alternative)

Efstratios Pahis

Background

In a recent client project written in Python, I had to implement a feature that runs certain tasks whenever a specific file in a dedicated directory is altered. While searching for suitable third-party libraries, I quickly found watchdog, which is widely praised as the best option. Judging by its GitHub stats, the project looked promising. During implementation and use, however, I ran into some drawbacks. In this post I describe them and introduce a library I created to address them.

Watchdog Drawbacks

First, I would like to share the experience with watchdog that motivated me to create my own file watcher. What bothered me most was its irregularity and unpredictability: whenever watchdog detected a change, events were triggered multiple times (or sometimes not at all). After researching and debugging, I could not find a definite answer to why this happened. I believe it is related to the handling of temporary hidden files created when a file is opened, to the duration of the tasks triggered by a watchdog event, or to both. As for the latter, the longer a task took, the more likely it seemed to run multiple times. That made me suspect that watchdog does not update its internal view of the directory structure when the directory is modified, but only after the associated task completes. Why couldn't I pinpoint the cause? One reason is that the event object does not clearly describe an event's cause or source.

To summarize, along with some other drawbacks:

→ No configurable start and end time for when a directory should be watched

→ No way to pause and resume watching a directory

→ No configurable polling time: watching for changes is constant

→ By default, watchdog runs on the main thread: manual multi-threading is needed

→ Watching tasks are not persisted: they have to be reconfigured and started again manually at program start

→ The structure feels complicated at first, and getting started is a bit cumbersome

EasyFileWatcher

To deal with those drawbacks, I built a file-watching library on top of APScheduler. The philosophy of this approach is that file watching is a scheduled task performed at certain intervals. With APScheduler as the basis, start and end times can be scheduled, and tasks can be paused and resumed. By default, it runs as a daemon thread in the background, conveniently tied to the main thread's lifecycle. The polling interval can be configured as needed, from nearly real-time to even years, which makes watching more efficient when file changes do not need to be monitored constantly. Finally, tasks are persisted in a database automatically, so file-watching tasks resume automatically at program startup. I am aware that some of these points can also be configured in watchdog, for example polling with an Observer. However, I find EasyFileWatcher more intuitive (obviously).
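To make the "file watching as a scheduled task" idea concrete, here is a minimal sketch using only the standard library. It is not EasyFileWatcher's actual implementation; the function names (take_snapshot, diff_snapshots, watch) are made up for illustration. The idea is to record each file's modification time and size, wait for the interval, record them again, and report the differences once per poll.

```python
import os
import time


def take_snapshot(directory):
    """Map each file name in `directory` to its (mtime_ns, size)."""
    snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                stat = entry.stat()
                snapshot[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return snapshot


def diff_snapshots(old, new):
    """Return sorted lists of created, modified and deleted file names."""
    created = sorted(name for name in new if name not in old)
    deleted = sorted(name for name in old if name not in new)
    modified = sorted(
        name for name in new if name in old and new[name] != old[name]
    )
    return created, modified, deleted


def watch(directory, callback, interval_seconds=1.0, max_polls=None):
    """Poll `directory` and call `callback` at most once per interval.

    `max_polls=None` means "poll forever" (hypothetical parameter, handy
    for trying the sketch out without an endless loop).
    """
    previous = take_snapshot(directory)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        current = take_snapshot(directory)
        created, modified, deleted = diff_snapshots(previous, current)
        if created or modified or deleted:
            callback(created, modified, deleted)
        # The baseline is replaced after every poll, so a change is
        # reported once even if the callback itself is slow.
        previous = current
        polls += 1
```

Because changes are only evaluated once per interval, several quick writes to the same file collapse into a single callback. The price is latency: a change is noticed at the next poll, not immediately.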

Getting started is also very easy and convenient:

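import time

from easyfilewatcher.EasyFileWatcher import EasyFileWatcher


def print_msg(msg: str):
    print(msg)


if __name__ == "__main__":
    filewatcher = EasyFileWatcher()
    # Register the directory together with the callback and its keyword arguments.
    filewatcher.add_directory_to_watch(directory_path="your\\directory",
                                       directory_watcher_id="my_id", callback=print_msg,
                                       callback_param={'msg': 'hi'}, event_on_deletion=False)
    # The watcher runs in a background daemon thread, so keep the main thread
    # alive. Sleeping avoids the CPU spin of a bare `pass` loop.
    while True:
        time.sleep(1)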
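In EasyFileWatcher, the daemon thread, the configurable interval, and pausing and resuming all come from APScheduler. As a rough illustration of how those three pieces fit together, here is a standard-library sketch. The class PausableIntervalTask and its method names are hypothetical and are not part of EasyFileWatcher or APScheduler.

```python
import threading
import time


class PausableIntervalTask:
    """Run `task` every `interval_seconds` on a daemon thread."""

    def __init__(self, task, interval_seconds):
        self._task = task
        self._interval = interval_seconds
        self._active = threading.Event()   # set = running, cleared = paused
        self._stopped = threading.Event()
        self._active.set()
        # daemon=True ties the thread to the main thread's lifecycle:
        # it does not keep the interpreter alive on its own.
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def pause(self):
        self._active.clear()

    def resume(self):
        self._active.set()

    def stop(self):
        # Must be called after start(); joining an unstarted thread raises.
        self._stopped.set()
        self._thread.join()

    def _loop(self):
        # Event.wait doubles as an interruptible sleep: it returns True as
        # soon as stop() is called and False when the interval elapses.
        while not self._stopped.wait(self._interval):
            if self._active.is_set():
                self._task()
```

A paused task keeps ticking but skips the work, so resume() takes effect at the next interval without restarting the thread.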

Check the project here, and let me know what you think :)

Till then, happy hacking!


