README.txt: Guerrilla metadata entry

J T
May 17, 2018

For many years I’ve been in search of the perfect file organization and metadata entry tool. I’ve tried databases (proprietary, open source, and self-built), archive file formats, file taggers, extended attributes, and filename conventions. I haven’t stuck with any of them, even the ones I put serious time and effort into.

Perhaps one day the perfect tool or method will arrive. What I’ve been doing in the meantime is a stopgap hack that has turned out to be simple, useful, and enduring: the humble README.txt.

I should start by saying that the following things are very important to me. They rule out a lot of other solutions:

  • My original files must not be touched. I want to keep stable checksums for integrity management.
  • My original filenames must also not be touched. Some of them are accessed programmatically, and renaming them would break that.
  • I don’t want a proprietary system, or something with a lot of dependencies, or something that requires a lot of maintenance.

So here’s what I do:

For a file named IMG_1234.JPG, I create a file named IMG_1234.JPG.README.txt, into which I put metadata. For metadata that applies to a whole directory, I just use a file named README.txt.
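The naming rule is mechanical, so it takes only a few lines to express. Here is a minimal Python sketch that maps a file or directory to its README and back. The helper names readme_for and target_for are hypothetical and do not come from any existing tool:

```python
from pathlib import Path

SUFFIX = ".README.txt"


def readme_for(target: Path) -> Path:
    """Return the README path for a file (sidecar) or a directory (README.txt inside it)."""
    if target.is_dir():
        return target / "README.txt"
    # Append to the full name, extension included, so the original name stays recoverable.
    return target.with_name(target.name + SUFFIX)


def target_for(readme: Path) -> Path:
    """Inverse mapping: the file or directory that a README describes."""
    if readme.name == "README.txt":
        return readme.parent
    if not readme.name.endswith(SUFFIX):
        raise ValueError(f"not a README sidecar: {readme}")
    return readme.with_name(readme.name[: -len(SUFFIX)])
```

Because a sidecar's name is just the original name plus .README.txt, the mapping can be reversed without any lookup table.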

I try to semi-structure my READMEs. History and auditability are important to me, so my READMEs are append-only. I start each new entry with a timestamp and finish it with a blank line. Free-format text represents a comment or description. If I want more structure, I use Key: Value pairs in the style of HTTP headers. A README file might look like this:

2018-05-16 22:19 +0100
This is a photo of my cat
Tags: cat
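Because the structure is so light, parsing it back is also short. The following is a sketch rather than an existing tool. It assumes that entries are separated by blank lines, that a timestamp in the format shown above opens an entry, and that keys are single words. One caveat: any free-text line that happens to look like Word: something (a URL, for instance) will be read as a header.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Timestamp line as in the example entry, e.g. "2018-05-16 22:19 +0100" (offset optional).
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}(?: [+-]\d{4})?")
# HTTP-header-style "Key: Value" line; assumes keys are single words such as "Tags".
HEADER_RE = re.compile(r"([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)")


@dataclass
class Entry:
    timestamp: str | None = None
    headers: list[tuple[str, str]] = field(default_factory=list)
    text: list[str] = field(default_factory=list)


def parse_readme(content: str) -> list[Entry]:
    """Split README text into blank-line-separated entries.

    A first line that looks like a timestamp becomes Entry.timestamp, Key: Value
    lines become headers, and everything else (including whole bare-text
    READMEs) is kept as free text.
    """
    entries = []
    for block in re.split(r"\n\s*\n", content.strip()):
        if not block.strip():
            continue
        entry = Entry()
        lines = block.splitlines()
        if TIMESTAMP_RE.fullmatch(lines[0].strip()):
            entry.timestamp = lines.pop(0).strip()
        for line in lines:
            match = HEADER_RE.fullmatch(line.strip())
            if match:
                entry.headers.append((match.group(1), match.group(2)))
            else:
                entry.text.append(line)
        entries.append(entry)
    return entries
```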

Corrections and additions to tags follow a simple rule: latest Tags line wins.
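Applying that rule takes a single pass over the file. Here is a minimal sketch. The function name current_tags is hypothetical. Splitting a line's tags on commas is also an assumption, because the example above only ever shows a single tag.

```python
def current_tags(content: str) -> list[str]:
    """Tags from the last "Tags:" line of a README's text; [] if there is none.

    The key is matched case-insensitively, and several tags on one line are
    assumed to be comma-separated.
    """
    latest = None
    for line in content.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "tags":
            latest = value  # a later Tags line overrides every earlier one
    if latest is None:
        return []
    return [tag.strip() for tag in latest.split(",") if tag.strip()]
```

Under this rule, an empty "Tags:" line clears the tags, because it is now the latest one.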

I’ve got a Python script which automates some of this. I have it bound to a shortcut key in Krusader, my favorite file manager.
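That script isn't reproduced here. For a rough idea of what such a helper could look like, the following is a hypothetical sketch, not the actual script. The file name readme_append.py and its argument layout are made up for illustration. It appends one timestamped entry in the format above.

```python
from __future__ import annotations

import sys
from datetime import datetime, timezone
from pathlib import Path


def format_entry(text: str = "", headers: dict[str, str] | None = None,
                 now: datetime | None = None) -> str:
    """Render one entry: timestamp line, optional free text, Key: Value lines, blank line."""
    # An aware datetime is needed so that %z renders an offset such as +0100;
    # the default is the current time in the system's local time zone.
    when = now if now is not None else datetime.now(timezone.utc).astimezone()
    lines = [when.strftime("%Y-%m-%d %H:%M %z")]
    if text:
        lines.append(text)
    lines += [f"{key}: {value}" for key, value in (headers or {}).items()]
    return "\n".join(lines) + "\n\n"


def append_entry(target: Path, text: str = "", headers: dict[str, str] | None = None) -> Path:
    """Append an entry to the README for target (file sidecar or directory README)."""
    if target.is_dir():
        readme = target / "README.txt"
    else:
        readme = target.with_name(target.name + ".README.txt")
    separator = ""
    if readme.exists():
        existing = readme.read_text(encoding="utf-8", errors="replace")
        # Ensure a blank line before the new entry, e.g. after a bare-text README.
        if existing and not existing.endswith("\n\n"):
            separator = "\n" if existing.endswith("\n") else "\n\n"
    # Mode "a" only adds to the end of the file, so earlier entries are never rewritten.
    with readme.open("a", encoding="utf-8") as f:
        f.write(separator + format_entry(text, headers))
    return readme


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: readme_append.py FILE [TEXT] [Key=Value ...]
    target = Path(sys.argv[1])
    text = sys.argv[2] if len(sys.argv) > 2 else ""
    headers = dict(arg.split("=", 1) for arg in sys.argv[3:] if "=" in arg)
    print(append_entry(target, text, headers))
```

For example, python3 readme_append.py IMG_1234.JPG "This is a photo of my cat" Tags=cat would append an entry like the one shown earlier, stamped with the current local time. A file manager shortcut only needs to pass in the selected file's path.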

The structure is all optional. Some of my READMEs are bare text.

Why do I like this scheme? It’s super simple, degrades very gracefully, works on every operating system, doesn’t mess with your filenames, doesn’t touch your original files, and is obvious even to another user who knows nothing about the scheme. You can even do it from the iOS Dropbox client and other bare-bones environments that offer no special tools.

The POSIXy hierarchical filesystem has stood the test of time. For an archive, I would trust this just-a-bunch-of-files method to be usable in 50 years’ time, more so than anything more complicated.

The name README is famous; there’s even a Wikipedia article about it. Almost everyone intuitively understands how it works, because the convention was designed to be self-documenting. The basic concept predates computing.

The major downside is that you can get a lot of visual clutter when browsing directories. I don’t particularly like that, but I’ll live with it for the other advantages. You can always prefix a README’s filename with a dot (.) to make it a hidden file, then toggle the display of hidden files on or off as you see fit.
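If some READMEs are hidden and others are not, any tooling has to look for both names. Here is a small sketch. It assumes the hidden form is simply the visible name with a leading dot (.IMG_1234.JPG.README.txt, or .README.txt for a directory). The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def readme_candidates(target: Path) -> list[Path]:
    """Visible and hidden README names for a file or directory, visible first."""
    if target.is_dir():
        return [target / "README.txt", target / ".README.txt"]
    name = target.name + ".README.txt"
    return [target.with_name(name), target.with_name("." + name)]


def find_readme(target: Path) -> Path | None:
    """Return whichever README exists for target, or None if it has none."""
    for candidate in readme_candidates(target):
        if candidate.is_file():
            return candidate
    return None
```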

“But I want my metadata centralized and nicely organized for querying!” you say? Try Recoll: you can configure the way it extracts keywords for a file. Specifically, you can tell it to find keywords for IMG_1234.JPG in IMG_1234.JPG.README.txt using the metadatacmds option, plus a bit of Python to do the actual keyword finding. Schema on read!
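The Python side of that can be very small. The sketch below prints the current tags for one file. Several things in it are assumptions: the script name readme_tags.py, its output format (the raw value of the latest Tags line), and the recoll.conf line in the docstring. That line is modelled on the metadatacmds example in Recoll's configuration documentation, so check it against your Recoll version.

```python
#!/usr/bin/env python3
"""Print the current tags for FILE, taken from the latest Tags line in FILE.README.txt.

Hypothetical helper for Recoll's metadatacmds option; an assumed recoll.conf entry:
    metadatacmds = ; tags = python3 /path/to/readme_tags.py %f
"""
import sys
from pathlib import Path


def tags_for(path: Path) -> str:
    """Raw value of the last Tags line in path's sidecar, or "" if there is none."""
    sidecar = path.with_name(path.name + ".README.txt")
    if not sidecar.is_file():
        return ""
    tags = ""
    for line in sidecar.read_text(encoding="utf-8", errors="replace").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "tags":
            tags = value.strip()  # latest Tags line wins
    return tags


if __name__ == "__main__" and len(sys.argv) > 1:
    print(tags_for(Path(sys.argv[1])))
```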

“But the file metadata might get separated from the file!” you say? I always thought this way too (hence spending so long on relational database solutions). But it’s just never been a problem. Files don’t just go missing, I have good backups, and I back up my filesystem metadata too. Since that’s a one-liner, I’ll share it here:

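find / \( -path /proc -o -path /sys -o -path /dev \) -prune -o -exec stat {} \; -type f -exec sha256sum {} \; > metadata.txt 2>&1

This skips the /proc, /sys and /dev pseudo-filesystems and only hashes regular files. Otherwise sha256sum would try to read endless device files such as /dev/zero and never finish.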
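Backups cover the case where a file goes missing. To notice a README that has come loose from its file in the first place, a periodic check for orphaned sidecars can help. The following is a minimal sketch that assumes the naming convention above, including the optional dot-prefixed hidden form. The function names are hypothetical.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path

SUFFIX = ".README.txt"


def described_candidates(name: str) -> list[str]:
    """Names of the file a sidecar called name may describe ([] for directory READMEs)."""
    if not name.endswith(SUFFIX) or name == SUFFIX:
        return []  # not a per-file sidecar (README.txt itself doesn't end with SUFFIX)
    base = name[: -len(SUFFIX)]
    candidates = [base]
    if base.startswith(".") and len(base) > 1:
        # Could be a hidden sidecar (.IMG_1234.JPG.README.txt) for IMG_1234.JPG,
        # or a visible sidecar for a dotfile; accept either.
        candidates.append(base[1:])
    return candidates


def orphaned_readmes(root: Path) -> list[Path]:
    """Per-file README sidecars under root whose described file no longer exists."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        here = Path(dirpath)
        for name in filenames:
            candidates = described_candidates(name)
            if candidates and not any((here / c).exists() for c in candidates):
                orphans.append(here / name)
    return sorted(orphans)


if __name__ == "__main__" and len(sys.argv) > 1:
    for orphan in orphaned_readmes(Path(sys.argv[1])):
        print(orphan)
```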

Conclusion

It’s not as good as a relational database. It’s not as good as RDF. It’s not as good as an application that gives you a slick user interface for tag editing.

But it exists.
