How to get notifications about file system changes on Windows

Balázs Kovács
Tresorit Engineering
7 min read · Apr 3, 2020

Motivation

As Tresorit offers file synchronization, we need to detect changes on the file system. In order to make this feature actually usable, the software needs to be informed about changes made by other applications as quickly as possible. We could scan the entire synchronized directory tree periodically, but that is slow and also consumes a lot of resources, so it does not provide an adequate user experience. It is much better to only process the actual changes, and most operating systems, including Windows, can provide this information. The module that implements this feature is written in C++, and in the past a popular C library, libuv, was used to get these change events. But we felt that using the Windows API directly would be better for multiple reasons. We could remove the library, which should reduce maintenance costs. A homegrown solution would also give us more control over the implementation.

To keep resource consumption low, we’ve agreed that at most a single thread should be used for all the paths we want to listen to, and at least 100 different paths should be supported.

In this post, I would like to discuss how we designed the replacement, and describe the final implementation. During the development, I found that it is very hard to find good learning material about the relevant API, so I hope that my summary will help others.

Possible solutions

Windows offers multiple ways to listen for changes on the file system. Let’s take a look!

There is a very simple function: FindFirstChangeNotificationW(). It creates a notification handle that becomes signaled when the first change happens on the given path. The problem is that it only tells us that something changed, not what changed, and changes between calls will be lost, so it is not suitable for the task at hand.

There is also a more complex one: the ReadDirectoryChangesW() function (shortened as RDC()). This API can return all changes since it buffers them between calls, so no events are lost. Be aware, however, that when the buffer is not large enough, the function will report an error, and events will be lost. This is not a deal-breaker since in this rare case, we can just re-scan the entire path.

ReadDirectoryChangesW()

ReadDirectoryChangesW() returns changes in a data structure called FILE_NOTIFY_INFORMATION. It is basically a custom singly linked list, where each node contains the relative path and the type of change (e.g. create, delete). It does not contain any information about the path that is being watched, so that has to be stored by the client code. In our case, a class named WatchInfo was created for this purpose.
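To make the "custom linked list" more concrete, here is a minimal, portable sketch of walking such a buffer. It is not the code from our module: NotifyHeader is a stand-in that mirrors the three fixed 32-bit fields of FILE_NOTIFY_INFORMATION so that the example compiles without <windows.h>, and Change, parseChanges() and appendRecord() are hypothetical names introduced only for this illustration. On Windows you would read the real structure instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Portable stand-in for the fixed-size head of FILE_NOTIFY_INFORMATION.
// The UTF-16 file name follows these three fields and is not null-terminated.
struct NotifyHeader {
    std::uint32_t nextEntryOffset;  // bytes from this record to the next, 0 = last
    std::uint32_t action;           // 1 = added, 2 = removed, 3 = modified, ...
    std::uint32_t fileNameLength;   // length of the name in bytes, not characters
};

struct Change {
    std::uint32_t action;
    std::u16string relativePath;
};

// Walks the records in `buffer` and copies out every (action, path) pair.
std::vector<Change> parseChanges(const unsigned char* buffer, std::size_t bytes) {
    assert(buffer != nullptr || bytes == 0);
    std::vector<Change> changes;
    std::size_t offset = 0;
    while (offset + sizeof(NotifyHeader) <= bytes) {
        NotifyHeader header;
        std::memcpy(&header, buffer + offset, sizeof header);
        const std::size_t nameStart = offset + sizeof header;
        if (header.fileNameLength > bytes - nameStart) break;  // truncated record
        std::u16string name(header.fileNameLength / sizeof(char16_t), u'\0');
        std::memcpy(&name[0], buffer + nameStart, name.size() * sizeof(char16_t));
        changes.push_back(Change{header.action, std::move(name)});
        if (header.nextEntryOffset == 0) break;  // last node of the list
        offset += header.nextEntryOffset;
    }
    return changes;
}

// Demo helper: appends one record, padded to a 4-byte boundary.
void appendRecord(std::vector<unsigned char>& buffer, std::uint32_t action,
                  const std::u16string& name, bool last) {
    NotifyHeader header;
    header.action = action;
    header.fileNameLength = static_cast<std::uint32_t>(name.size() * sizeof(char16_t));
    const std::size_t padded =
        (sizeof header + header.fileNameLength + 3) & ~static_cast<std::size_t>(3);
    header.nextEntryOffset = last ? 0 : static_cast<std::uint32_t>(padded);
    const std::size_t start = buffer.size();
    buffer.resize(start + padded);
    std::memcpy(&buffer[start], &header, sizeof header);
    std::memcpy(&buffer[start + sizeof header], name.data(), header.fileNameLength);
}
```

Note that the offsets are relative to the current record, and that a reported length of zero bytes simply yields an empty list, which is the case where a full re-scan is needed.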

There is an *Ex version of this function (ReadDirectoryChangesExW()), which returns a more verbose structure (FILE_NOTIFY_EXTENDED_INFORMATION), but we do not need this extra information for our internal API. Also, it would be problematic to use because it is only available since Windows 10, version 1709.

RDC() needs the following input parameters:

  • HANDLE hDirectory: a handle to the directory to be monitored
  • VOID* lpBuffer: buffer where the FILE_NOTIFY_INFORMATION will be stored
  • DWORD nBufferLength: size of the buffer
  • BOOL bWatchSubtree: whether to watch the entire directory tree—in our case this is always TRUE
  • DWORD dwNotifyFilter: a filter for event categories. By only listening to a subset of events, we slightly improved the performance of our application and fixed a minor issue in the process. Since libuv's solution listens for all kinds of events, this was an improvement.

It also has other input parameters that control the way it reports results.
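As an illustration of the dwNotifyFilter parameter, the sketch below composes a reduced filter from individual event categories. The numeric values mirror the documented FILE_NOTIFY_CHANGE_* constants, redefined under different names so that the snippet compiles without <windows.h>. The particular subset chosen in syncFilter() is an assumption made for this example, not necessarily the exact set our module uses.

```cpp
#include <cstdint>

// Values mirror the documented FILE_NOTIFY_CHANGE_* constants from <winnt.h>;
// they are redefined here only to keep the sketch portable.
namespace notify {
constexpr std::uint32_t FileName   = 0x001;  // file created, renamed, deleted
constexpr std::uint32_t DirName    = 0x002;  // directory created, renamed, deleted
constexpr std::uint32_t Attributes = 0x004;
constexpr std::uint32_t Size       = 0x008;
constexpr std::uint32_t LastWrite  = 0x010;
constexpr std::uint32_t LastAccess = 0x020;
constexpr std::uint32_t Creation   = 0x040;
constexpr std::uint32_t Security   = 0x100;
}  // namespace notify

// Hypothetical filter for a sync client: react to name, size and content
// changes, but ignore categories such as access time updates.
constexpr std::uint32_t syncFilter() {
    return notify::FileName | notify::DirName | notify::Size | notify::LastWrite;
}

// True if `filter` asks for notifications of the given category.
constexpr bool watches(std::uint32_t filter, std::uint32_t category) {
    return (filter & category) != 0;
}
```

The result of syncFilter() would be passed as dwNotifyFilter; leaving a category out means the system never queues those events in the first place.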

RDC() can report changes in multiple ways:

  1. Calling it in a blocking way: obviously not applicable here, as we cannot be sure that there would be changes on every watched path.
  2. Using the GetOverlappedResult() function with WaitForMultipleObjects().
  3. Using a completion routine.
  4. Waiting on an I/O completion port.

The GetOverlappedResult() / WaitForMultipleObjects() way

We could create some sort of event object to signal the waiter thread that something has happened (e.g. the set of watched paths changed). Then one can call WaitForMultipleObjects() on these event objects. For an implementation, check out the Path Watcher Node module used by the Atom editor.

The problem with this approach is that there is an upper limit of 64 objects. This limitation comes from the value of the MAXIMUM_WAIT_OBJECTS macro. Consequently, the Path Watcher Node module has an open issue regarding this limitation. Unfortunately, it is not likely that a future Windows version will increase this limit, as outlined in a Stack Overflow answer:

[…] Since STATUS_ABANDONED_WAIT_63 is defined as 0xBF and STATUS_USER_APC is defined as 0xC0, if you incremented MAXIMUM_WAIT_OBJECTS by even just one, there would be no way to tell the difference between the 65th handle being abandoned and your wait being terminated by an APC. Properly changing MAXIMUM_WAIT_OBJECTS would require renumbering the status codes, which would require recompiling every Win32 program in existence.

Also, a program compiled with MAXIMUM_WAIT_OBJECTS defined as 65 would fail on an OS where it’s defined as 64.

Although it is possible to watch more than 64 paths using this technique, it requires additional logic, as one has to create multiple threads, which wait on each other, organized in a tree. Because of the added complexity, we did not take this approach.

Using a completion routine

Another way is to specify a completion routine, which would run as an Asynchronous Procedure Call (APC). Our issue with this approach was that the application currently does not use APCs for anything else, so there is no thread pool for them. Currently, when a thread sleeps, it goes into an alertable wait state, meaning an APC could be scheduled on it, and the SleepEx() function returns once the APC has completed. This could mean that a thread that intends to sleep for 5 seconds suddenly returns after 200 ms. While the code is generally written with such early wake-ups in mind, this might introduce some hard-to-debug issues.

Shutdown also poses some challenges. I have not found a way to either cancel an APC or determine whether it has been run or not. Of course, you can add this functionality with instance counting or other ways, but it would add complexity without any notable gain.

The I/O completion port way

An I/O completion port (usually shortened as IOCP) is basically an I/O message queue. The GetQueuedCompletionStatusEx() function can be used to get these messages. In our case, there will be a "packet" for each change notification. The next problem to solve is to determine the directory where the change occurred. These "packets" have a fairly rigid structure; they only contain the following information:

  • DWORD* lpNumberOfBytesTransferred: Not controlled by the client code, it describes the length of the FILE_NOTIFY_INFORMATION data structure.
  • ULONG_PTR* lpCompletionKey: Intended for this purpose, but it is not controlled from the RDC() call; it is set with another function, CreateIoCompletionPort().
  • OVERLAPPED** lpOverlapped: Although the OVERLAPPED structure contains irrelevant information for us, as it is mainly intended for asynchronous I/O operations, the pointer itself can be used to uniquely identify the source of the change. The instance can be supplied by the client code, when ReadDirectoryChangesW() is called.

Although you can use the CompletionKey as well, we have chosen to use the OVERLAPPED structure for identification. We save the information in a std::map where the OVERLAPPED* is the key, and we look up the additional data (the FILE_NOTIFY_INFORMATION buffer, original path, etc.) every time a change is reported.
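The lookup described above can be sketched roughly as follows. This is a simplified, portable illustration rather than our actual class: Overlapped is a stand-in for the Win32 OVERLAPPED structure (only its address matters here), the members of WatchInfo are guesses at what such a class needs, and WatchRegistry with its default buffer size is a hypothetical helper introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the Win32 OVERLAPPED structure; only its address is used.
struct Overlapped {
    void* reserved[4] = {};
};

// Hypothetical per-path state: everything the event loop needs to interpret
// a completion packet that carries nothing but an OVERLAPPED pointer.
struct WatchInfo {
    WatchInfo(std::wstring watchedPath, std::size_t bufferSize)
        : path(std::move(watchedPath)), buffer(bufferSize) {}
    Overlapped overlapped;              // would be passed to ReadDirectoryChangesW()
    std::wstring path;                  // the watched directory
    std::vector<unsigned char> buffer;  // receives the FILE_NOTIFY_INFORMATION data
};

class WatchRegistry {
public:
    // Heap-allocates the WatchInfo so the address of its Overlapped member
    // stays stable, then registers it under that address.
    Overlapped* add(std::wstring path, std::size_t bufferSize = 64 * 1024) {
        auto info = std::make_unique<WatchInfo>(std::move(path), bufferSize);
        Overlapped* key = &info->overlapped;
        const bool inserted = watches_.emplace(key, std::move(info)).second;
        assert(inserted);
        (void)inserted;
        return key;
    }

    // Maps the pointer from a completion packet back to the watched path.
    WatchInfo* find(const Overlapped* key) const {
        auto it = watches_.find(key);
        return it == watches_.end() ? nullptr : it->second.get();
    }

    bool remove(const Overlapped* key) { return watches_.erase(key) == 1; }
    std::size_t size() const { return watches_.size(); }

private:
    std::map<const Overlapped*, std::unique_ptr<WatchInfo>> watches_;
};
```

Keeping the OVERLAPPED instance inside the heap-allocated WatchInfo means the key and the data it identifies share one lifetime, so a packet can never point at a freed entry as long as the entry is only removed after its last packet has arrived.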

So we create a new thread that runs this little event loop, which just listens to messages on the IOCP. Shutdown is done by sending a special packet to the IOCP whose CompletionKey is set, which tells the event loop to stop.

To remove a path we no longer wish to listen to, one can close its directory handle. Once you do that, a closing notification is sent by RDC(). Most of the time it is an empty, 0-byte “message”, but if there are changes on the path, it can be an actual change event. This means that you cannot deallocate the buffer until you have received the closing notification. This is why the event loop does not exit immediately after it receives the shutdown packet: it has to wait until all closing notifications have arrived and the corresponding WatchInfo instances have been destroyed. This behavior is not explicitly documented, and I think that is because Microsoft's API documentation is very function-oriented. Following that logic, it should be mentioned in the documentation for CloseHandle(), but honestly, that is a very generic function that has nothing to do with RDC(). The RDC() article itself explains a lot of related things, but not much about how to stop listening, which is understandable, given that this function initiates the listening.
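The shutdown ordering is easier to see in a small model. The sketch below does not use a real IOCP or real handles: the port is simulated with a std::deque, closing a handle is simulated by queuing an empty packet, and all names (Packet, kShutdownKey, WatcherLoopModel) are hypothetical. It only demonstrates the rule that a watch may be destroyed once its closing notification has arrived, and that the loop may exit only after the last one is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Simplified completion packet, as the event loop would see it.
struct Packet {
    std::uintptr_t completionKey;  // non-zero only for the shutdown packet
    const void* overlapped;        // identifies the watch the packet belongs to
    std::uint32_t bytes;           // 0 for an empty (closing) notification
};

constexpr std::uintptr_t kShutdownKey = 1;  // hypothetical marker value

class WatcherLoopModel {
public:
    void addWatch(const void* id) {
        assert(id != nullptr);
        watches_[id] = State::Active;
    }

    // Models closing the directory handle: the watch is only marked, and one
    // final packet for it will still arrive (simulated as an empty one).
    void closeWatch(const void* id, std::deque<Packet>& port) {
        auto it = watches_.find(id);
        if (it == watches_.end() || it->second == State::Closing) return;
        it->second = State::Closing;
        port.push_back(Packet{0, id, 0});
    }

    // Processes one packet; returns true once the loop is allowed to exit.
    bool handle(const Packet& packet, std::deque<Packet>& port) {
        if (packet.completionKey == kShutdownKey) {
            shuttingDown_ = true;
            for (const auto& entry : watches_) closeWatch(entry.first, port);
        } else {
            auto it = watches_.find(packet.overlapped);
            if (it != watches_.end()) {
                if (packet.bytes > 0) ++changesSeen_;  // buffer holds real events
                if (it->second == State::Closing) {
                    watches_.erase(it);  // only now is it safe to free the buffer
                }
                // An active watch would issue its next read here.
            }
        }
        return shuttingDown_ && watches_.empty();
    }

    std::size_t liveWatches() const { return watches_.size(); }
    std::size_t changesSeen() const { return changesSeen_; }

private:
    enum class State { Active, Closing };
    std::map<const void*, State> watches_;
    bool shuttingDown_ = false;
    std::size_t changesSeen_ = 0;
};
```

In the real implementation the packets come from GetQueuedCompletionStatusEx() and the final packet is produced by the system after the handle is closed, but the bookkeeping follows the same idea.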

A good resource about using RDC() with IOCPs can be found in the source of 0 A.D., an open source RTS game. Unfortunately, I only found it when our solution was somewhat complete. The code is very well documented, so if the example code is not clear, you can check that out as well. The version that I initially found is quite old; its current incarnation is somewhat different.

Be aware that the reported relative path might contain short names (8.3 names, like RDC_FS~1.CPP). The documentation is kind of vague about when this happens:

If there is both a short and long name for the file, the function will return one of these names, but it is unspecified which one.

So if you use long names, you have to resolve the long name of a file or directory with GetLongPathNameW(). Please note that it is not always possible, for example when we receive a notification because a file was deleted. Although I could not trigger the reporting of short names on my development machine or in our test environment, metrics from customers indicate that it does happen, although only on a small portion of the machines.

Summary

With our solution, we replaced libuv’s filesystem watcher with a smaller, less complex implementation that can be parameterized as to which changes to look for. Taking advantage of this, we reduced the watched event types. This even fixed an issue where an Alternate Data Stream modification inadvertently changed the access time of a file, which was reported as a change.

The good news is that the implementation is available on GitHub, so if my explanation is missing something, you can always check out the working example.

In the end, I think a better alternative was created, which works better and is easier to maintain. Also, I’ve learned a lot about the Win32 API, which is always nice, given the ubiquity of the platform.
