A Better Finder Rename & Regular Expressions

Managing File Name Components

A Better Finder Rename is a powerful and flexible batch file renaming utility, but sometimes a scenario arises for which ABFR doesn’t provide an out-of-the-box solution. Usually, though, ABFR’s advanced regular expression features can help.

A Better Finder Rename’s regular expression actions bring a lot of functionality to the table, but many ABFR users aren’t familiar with regular expressions. They don’t realize the purpose or potential of these tools, and are at a significant disadvantage. Today, you will learn what regular expressions are, how to apply regular expressions to file renaming tasks, and how to use regular expressions with A Better Finder Rename to manage file name components. To this end, we will look at a few file renaming scenarios that can’t be addressed using out-of-the-box actions:

  • replacing a user-defined part of a file name with a user-defined text string
  • moving a user-defined part of a file name to a new location
  • transposing user-defined parts of a file name

What is a Regular Expression?

A regular expression is an encoded way of representing a pattern in a bit of text. Using a regular expression creates what amounts to a template describing a file naming convention, and can potentially identify a large number of files — all the photos taken during a particular event, for instance — in a given collection. Using this regular expression, we can then tell A Better Finder Rename to replace a certain part of the file name with something else or completely rearrange the components of the file name to form a new file naming convention.

How to Replace a File Name Component

Let’s say we have a number of songs by Texas blues legend Stevie Ray Vaughan. As sometimes happens, many of the tracks feature misspellings of Stevie’s name, which can be a real pain in the neck. A Better Finder Rename’s Replace regular expression action makes it easy to standardize the spelling among these files. Let’s use these file names as examples:

Stevie Ray Vaughan — Pride and Joy.aac
Stevie Rya Vaughan — Texas Flood.aac
Steve Ray Vaughan — Couldn’t Stand the Weather.aac
Stevey Ray Vaughn — Lenny.aac
Stevie Ray Vaughn — Call It Stormy Monday.aac

As you can see, only one of these file names spells the artist’s full name correctly.

Regular expressions feature rules that specify how to represent characters and — more importantly — character types. A regular expression processor interprets most characters literally. That is, the letter a represents itself. But there is a small number of reserved characters and other notations called character classes that each represent a specific character type, and this is where the power of a regular expression lies. One of the most useful of these character classes — especially for quick-and-dirty file renaming operations — is the humble period (.).

In a regular expression, the period represents a single instance of any character except the invisible end-of-line characters. If we consistently use a standard separator between file name components when naming files, such as the space-hyphen-space sequence used in our example file names, we can rely on this consistency, in many cases, to make our file renaming tasks much easier. Indeed, the trick to writing regular expressions for file renaming is to find and exploit consistencies in our file names. Perusing our examples, we can see that the Artist component of each file name begins with the letter s and ends with the letter n. Using this consistency, we can put together a combination of literal characters and one or more character classes to match the pattern or convention our file names adhere to. This short regular expression represents, or matches in regex parlance, the Artist component:

s.+n

The beginning s and ending n are interpreted literally, while the .+ sequence means that one or more characters should appear between them.

Let’s test this. Take a couple of minutes to create some sample files with the example file names above. These files are only for experimentation, so they don’t have to be audio files; text files will work fine for our purposes.

Load these files into A Better Finder Rename, and select the Replace Regular Expression action from the Advanced & Special category. This action presents four configuration options: Pattern, With, Ignore Case, and Change. In the Pattern field, enter the regular expression above. In the With field, enter the proper artist name:

Stevie Ray Vaughan

Tick the Ignore Case checkbox, and select The file name without the extension from the Change options.

Notice that the results in ABFR’s preview pane aren’t quite what we expected. The expression matches our artist component, but, as written, it keeps matching all the way through the file name to the last occurrence of the letter n. This demonstrates the default greedy behavior of quantifiers: a quantifier such as the plus sign (+) matches as much text as possible unless we tell it otherwise. To alter this behavior, we add the lazy modifier, a single question mark (?), directly after the plus sign. The lazy modifier tells the regular expression processor to stop at the first occurrence of n that completes the match. Essentially, it says, “be lazy, work as little as possible.”

s.+?n

Now we’re in business. The artist component features the artist’s properly spelled name across all file names.
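For readers who want to see the greedy and lazy behavior outside of ABFR, here is a minimal sketch using Python’s re module. It is not how ABFR works internally. The re.IGNORECASE flag stands in for the Ignore Case checkbox, and count=1 is an assumption made for this sketch so that only the first match in each name is replaced. The file names are the examples above without their extensions.

```python
import re

# Example file names from the article, without their extensions.
names = [
    "Stevie Ray Vaughan — Pride and Joy",
    "Stevie Rya Vaughan — Texas Flood",
    "Steve Ray Vaughan — Couldn’t Stand the Weather",
    "Stevey Ray Vaughn — Lenny",
    "Stevie Ray Vaughn — Call It Stormy Monday",
]

ARTIST = "Stevie Ray Vaughan"


def replace_first(pattern: str, name: str) -> str:
    """Replace the first match of pattern in name with the correct artist."""
    # re.IGNORECASE mirrors the Ignore Case checkbox; count=1 is an
    # assumption of this sketch, limiting the replacement to one match.
    return re.sub(pattern, ARTIST, name, count=1, flags=re.IGNORECASE)


# Greedy: .+ runs to the LAST n in the name.
greedy = [replace_first(r"s.+n", n) for n in names]
# Lazy: .+? stops at the FIRST n that completes the match.
lazy = [replace_first(r"s.+?n", n) for n in names]

for g, l in zip(greedy, lazy):
    print(g, "|", l)
```

The greedy column shows mangled names such as “Stevie Ray Vaughand Joy”, while the lazy column shows the corrected artist followed by the untouched song title.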

How to Move and Transpose File Name Components

Using the Replace regular expression action, we are concerned with matching only one file name component. But when it comes to restructuring file naming conventions, we should usually match all components of the file name, each of which may feature variable information. Continuing with the example file names we’ve been using, let’s learn how to move a single file name component to another location within the file name.

Regular expressions feature the ability to assign identifiers to each of our file name components, clearing the way for us to tell A Better Finder Rename exactly how to rearrange those components to form a new naming convention. To identify a file name component, we create a capture group targeting that component. Creating a capture group is a simple matter of enclosing the file name component within parentheses. Before we begin, let’s write a complete regular expression to match the entire file naming convention in use:

.+ — .+

This regular expression matches two strings of characters separated by a space-hyphen-space sequence, the pattern our file names adhere to. You will note that we aren’t using s.+?n for the Artist component. This new expression will match more than just Stevie Ray’s music; it will match all songs in our collection which use this file naming convention.

Now, to define the file name components, enclose each component within parentheses:

(.+) — (.+)

Defining a capture group automatically assigns a capture group ID to the component. Six of these IDs are available — $1 through $6 — and are assigned in order of occurrence within the regular expression, from left to right. We can then use these IDs later in A Better Finder Rename to build a new file naming convention. Let’s give it a try.

In A Better Finder Rename, select the Re-arrange using regular expressions action from the Advanced & Special category. In the Pattern field, enter the above regular expression. In the Substitution field, enter a combination of literal characters and capture group IDs to restructure our file naming convention:

$2 — $1

As you can see in the preview pane, A Better Finder Rename replaces the $2 capture group with the song title, and $1 with the artist.
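The same swap can be sketched in Python for experimentation. This is only an illustration under a few assumptions: Python writes capture group references as \1 and \2 where ABFR’s field uses $1 and $2, the function name swap_components is made up for this example, and the extension is split off first so that only the name itself is rearranged.

```python
import re
from pathlib import Path

# Two capture groups separated by the separator used in the example names.
PATTERN = re.compile(r"(.+) — (.+)")


def swap_components(filename: str) -> str:
    """Return filename with its two components transposed."""
    path = Path(filename)
    # Python spells the group IDs \1 and \2; ABFR spells them $1 and $2.
    new_stem = PATTERN.sub(r"\2 — \1", path.stem)
    # Re-attach the untouched extension.
    return new_stem + path.suffix


print(swap_components("Stevie Ray Vaughan — Pride and Joy.aac"))
```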

Now, for further experimentation, change the chosen action to Add text to beginning, found under the Text category, to add this string, including its trailing space, to the beginning of our file names:

Song — 

Click the Perform Renames button to apply changes, because we’re going to use the resulting file names for experimentation.

Song — Stevie Ray Vaughan — Pride and Joy.aac
Song — Stevie Rya Vaughan — Texas Flood.aac
Song — Steve Ray Vaughan — Couldn’t Stand the Weather.aac
Song — Stevey Ray Vaughn — Lenny.aac
Song — Stevie Ray Vaughn — Call It Stormy Monday.aac

Next, select Re-arrange using regular expressions again. In the Pattern field, enter:

(.+) — (.+) — (.+)

We now have three file name components, represented by three capture groups, all of which we can use in the Substitution field. Play around with this a little, entering different combinations of capture group IDs:

$3 — $1 — $2
$2 — $1 — $3
$1 — $3 — $2

As you can see, we can place our file name components anywhere we want. We can also add new information to the mix. Try this:

Music — $1 — $3 — $2

Literal characters entered into the Substitution field are applied to all file names in the position at which they appear in this field.
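To try several substitutions quickly, the experiment above can be approximated in Python. This is a sketch, not ABFR’s implementation. The helper to_python_template is hypothetical: it only converts single-digit $n IDs into Python’s \g<n> form and does nothing else.

```python
import re

# Three capture groups, one per file name component.
PATTERN = re.compile(r"(.+) — (.+) — (.+)")


def to_python_template(substitution: str) -> str:
    """Hypothetical helper: rewrite $1-style IDs as Python's \\g<1> form."""
    return re.sub(r"\$(\d)", r"\\g<\1>", substitution)


def rearrange(name: str, substitution: str) -> str:
    """Rebuild name according to an ABFR-style substitution string."""
    return PATTERN.sub(to_python_template(substitution), name)


name = "Song — Stevie Ray Vaughan — Pride and Joy"
for substitution in [
    "$3 — $1 — $2",
    "$2 — $1 — $3",
    "$1 — $3 — $2",
    "Music — $1 — $3 — $2",  # literal text is copied through unchanged
]:
    print(rearrange(name, substitution))
```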

Diving Deeper into Regular Expressions

So far, we have gained a very basic understanding of how regular expressions can help address day-to-day file renaming tasks, but it’s not really enough. More often than not, a renaming task will require more specificity in a regular expression in order to be effective. Naturally, this requires knowledge of a few more regex features, including quantifiers and more about the all-important character class.

When writing a regular expression, an important consideration is the number of characters in each file name component. Quantifiers are the tools we use to leverage this consideration.

We learned about one quantifier already: the plus sign (+). As you recall, the plus sign matches one or more instances of the character or character class preceding it. Other quantifiers include the asterisk (*), which matches zero or more occurrences, and the question mark (?), which matches zero or one occurrence. Like the plus sign, both of these quantifiers accept the lazy modifier, which makes them match as few characters as possible.

Quantifiers come in another notation style, as well. Wrapping a number in braces ({ and }) specifies a certain number of occurrences. This regular expression indicates that thirteen instances of the letter z are required for a match to occur:

z{13}

Wrapping a numerical range in braces indicates that any number of occurrences within that range may suffice for a match.

z{10,20}
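A quick way to check how each quantifier counts is to test it against strings of known length. The sketch below uses Python’s re.fullmatch, which only succeeds when the whole string fits the pattern, so a string that is too long or too short is rejected. The helper name matches is made up for this example.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when the ENTIRE text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


# Exact count: thirteen z's, no more and no fewer.
print(matches(r"z{13}", "z" * 13))     # True
print(matches(r"z{13}", "z" * 12))     # False

# Range: anywhere from ten to twenty z's.
print(matches(r"z{10,20}", "z" * 15))  # True
print(matches(r"z{10,20}", "z" * 21))  # False

# The symbolic quantifiers.
print(matches(r"z+", ""))              # False: + needs at least one z
print(matches(r"z*", ""))              # True:  * accepts zero z's
print(matches(r"z?", "zz"))            # False: ? accepts at most one z
```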

Whereas we use quantifiers for character counts, we use character classes to specify the characters themselves. We are already familiar with the period as a character class, but its scope of functionality is too wide for many tasks. Additional character classes are available so we can be much more specific with regular expressions.

Character classes come in two forms. The character class \d is an example of one style, and represents a single digit, 0 through 9. Another style, known as a POSIX character class, is written with colons inside square brackets, like [:alpha:], and is normally used inside a bracket expression, as in [[:alpha:]]. The [:alpha:] character class matches letters.
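Here is a small sketch of character classes at work, again in Python. One caveat: Python’s re module does not support POSIX classes, so the sketch uses the bracket expression [A-Za-z] as an ASCII-only stand-in for letters. That substitution is an assumption of this example, not a statement about ABFR’s regex engine. The sample text is made up.

```python
import re

text = "Track 07 of 12"

# \d matches one digit at a time.
digits = re.findall(r"\d", text)

# Combined with a quantifier, \d{2} matches runs of exactly two digits.
pairs = re.findall(r"\d{2}", text)

# [A-Za-z] is an ASCII stand-in for a "letters" class in this sketch.
words = re.findall(r"[A-Za-z]+", text)

print(digits)  # ['0', '7', '1', '2']
print(pairs)   # ['07', '12']
print(words)   # ['Track', 'of']
```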

Armed with only these additional bits of knowledge, we can make quite a difference in the effectiveness of our regular expressions.

Let’s use an EPUB file naming convention for experimentation:

<ISBN> — <Author Name> — <Book Title> — <YearofPublication>

This time, we have four distinct file name components. The first is an International Standard Book Number. According to the ISBN User’s Manual, an ISBN consists of thirteen digits with hyphens or spaces as separators, like so:

ISBN 978-0-571-08989-5
ISBN 978 0 571 08989 5

Ignore for now that another form of ISBN — the 10-digit form — is still in use, and that some folks, like Amazon, don’t use separators as specified in the ISBN User’s Manual.

The first group of numbers will always consist of three digits. The second group of numbers will include a maximum of five digits, the third group a maximum of seven digits, and the fourth group a maximum of six digits. The final “group” will always be a single digit. Using an ISBN in this example provides an excellent opportunity to explore just how specific we can be when writing a regular expression to match a string of text. To represent the ISBN component of our file name, let’s use:

ISBN \d{3}(-|\s)\d{1,5}(-|\s)\d{1,7}(-|\s)\d{1,6}(-|\s)\d

Here, we are introduced to alternative matching. The pipe character (|) is used here to indicate that either a hyphen or a space may match at this point in the regular expression. This allows us to account for both standard separators used in writing an ISBN.
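The ISBN expression can be tried against the two sample ISBNs in Python. This sketch writes the alternation (-|\s) at each of the four separator positions, and the function name looks_like_isbn13 is made up for the example.

```python
import re

# Prefix, four digit groups of limited length, and a final check digit,
# each separated by either a hyphen or a whitespace character.
ISBN = re.compile(
    r"ISBN \d{3}(-|\s)\d{1,5}(-|\s)\d{1,7}(-|\s)\d{1,6}(-|\s)\d"
)


def looks_like_isbn13(text: str) -> bool:
    """True when the whole text has the shape described above."""
    return ISBN.fullmatch(text) is not None


print(looks_like_isbn13("ISBN 978-0-571-08989-5"))  # True
print(looks_like_isbn13("ISBN 978 0 571 08989 5"))  # True
print(looks_like_isbn13("ISBN 978-0-571-08989"))    # False: check digit missing
print(looks_like_isbn13("ISBN 9780571089895"))      # False: no separators
```

Note that the pattern checks only the shape of each group. It does not confirm that the digits add up to thirteen, and it accepts a mix of hyphens and spaces in the same ISBN.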

Why take the time to limit our matches for each section of the ISBN? First, it presents the opportunity for greater specificity. But it’s also one way to validate the ISBN, helping to ensure that what we are working with at least has the shape of an ISBN. Validation by regular expression is one of the most important tools in a developer’s toolkit, whether they are writing code for the desktop or for the Web. This technique is used to validate usernames and passwords for Web log-ins, mailing addresses for online shopping carts, e-mail addresses for newsletter sign-ups, and many more types of standardized information.

The next component in our naming convention is the author’s name. Of course, a name can consist of virtually any number of characters, so we will just use the old stand-by .+ to match this component. The same goes for the book title. Finally, we will use the following to match the year of publication:

\d{4}

The resulting regular expression is:

ISBN \d{3}(-|\s)\d{1,5}(-|\s)\d{1,7}(-|\s)\d{1,6}(-|\s)\d — .+ — .+ — \d{4}
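To see the complete expression pull a file name apart, here is a Python sketch. It makes several assumptions that go beyond the expression above: the components are wrapped in named groups (Python’s (?P<name>...) syntax) so they can be read back by name, the separator alternations are made non-capturing with (?:...), and the author, title, and year in the sample file name are placeholders rather than real bibliographic data.

```python
import re

# The article's expression with named groups added for readability.
EPUB_NAME = re.compile(
    r"(?P<isbn>ISBN \d{3}(?:-|\s)\d{1,5}(?:-|\s)\d{1,7}(?:-|\s)\d{1,6}(?:-|\s)\d)"
    r" — (?P<author>.+)"
    r" — (?P<title>.+)"
    r" — (?P<year>\d{4})"
)


def parse(stem: str):
    """Return the four components as a dict, or None if the name doesn't fit."""
    match = EPUB_NAME.fullmatch(stem)
    return match.groupdict() if match else None


# Placeholder file name (without extension) following the convention.
sample = "ISBN 978-0-571-08989-5 — Author Name — Book Title — 1970"
print(parse(sample))
print(parse("Author Name — Book Title"))  # None: ISBN and year are missing
```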

Now that you have this know-how under your belt, you’re able to solve many more file renaming problems than you could armed only with A Better Finder Rename’s built-in renaming actions. ABFR’s value has just increased dramatically for you. But this hasn’t been an exhaustive discussion of regular expressions by any stretch of the imagination. Thick volumes have been written on the topic. If you’re interested in more thorough coverage of regular expressions, take a look at Introducing Regular Expressions by Michael Fitzgerald, or one of the many Web sites dedicated to the subject.

Written by John L. Reed.