
Think You Understand Wildcards? Think Again.


Wildcards are a powerful tool beloved of macOS & Linux command line users. Many developers and power users have a passing familiarity with them, but it turns out most people don’t really understand how they actually work. This article explains why.


What is a Wildcard?

Many years ago I remember a tutor in an introductory CS lecture telling the whole class that wildcards were nothing to do with cards. Strictly speaking, he was right, but the name is a good analogy because in some card games the Joker or “wild card” can stand in for any other card in the pack.

Similarly, in the command shell, a wildcard is a short textual pattern, often only a single character, that can match another character (or characters) in a file path. It’s kind of a shortcut that allows you to specify a whole set of related path names using a single, concise pattern.

With a wildcard pattern you can easily copy, move or delete large numbers of files with a single command.

Single-Character Wildcard: ?

The easiest wildcard to grasp is the single ? character.

Suppose you have a directory with a lot of similarly-named files in it:-

$ ls -1
index.txt
report.txt
report1.txt
report2.txt
report3.txt
report4.txt
report5.txt

Now what if you want to list just the report files? You can use the single wildcard character ? to specify them all like this:-

$ ls report?.txt
report1.txt report2.txt report3.txt report4.txt report5.txt

That looks like a good start. The shell has treated the ? character as a special character that means “match any character”. The pattern matches filenames that are composed of the string report followed by another single character, followed by the string .txt. It has then found all the files in the current directory that match this pattern and listed them using ls.

But you may notice the report.txt file is missing because it didn’t match the strict single-character wildcard pattern. Perhaps you meant to do that; perhaps not.

Multi-Character Wildcard: *

So the ? pattern is great when you need to match just a single character, but that’s quite rare, and in reality it’s not used all that often. To see why, imagine what would happen when the number of report files in the example directory reached 10: the pattern would not match report10.txt, because ? stands for exactly one character and the number is now two digits long.

Thankfully there’s another way: the * (asterisk) wildcard. This is the workhorse of wildcards, and is the one most people know about. A version of it exists in Windows, too, but it’s nowhere near as powerful. Let’s see why…

First of all, let’s pretend some more reports have arrived by adding some empty files. The person adding these files started at 09 and decided a two-digit format was necessary:-

$ touch report09.txt report10.txt report11.txt
$ ls -1
index.txt
report.txt
report09.txt
report1.txt
report10.txt
report11.txt
report2.txt
report3.txt
report4.txt
report5.txt

Now the single ? wildcard is clearly not going to work anymore. But filtering the report files is easy with the asterisk wildcard:-

$ ls report*.txt
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

In fact, it’s easier than that:-

$ ls report*
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

Or even:-

$ ls r*
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

The magic thing about the asterisk is that it matches any number of characters (including none). So in the final example above, anything that starts with the letter r is enough to match, because the asterisk matches all the remaining characters. Of course, a non-report file called raptor-999 would also match, so it’s important when using the asterisk wildcard to ask: what is the minimum pattern that would uniquely identify what I’m looking for?
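As a rough illustration of that caveat, the sketch below builds a throwaway directory and compares what a loose and a tight pattern expand to. The scratch directory (made with mktemp) and the variable names are assumptions of this example, not part of the setup used elsewhere in this article.

```shell
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch index.txt report.txt report1.txt report10.txt raptor-999

# The loose pattern also picks up the unrelated raptor-999 file.
loose=$(echo r*)
# The tighter pattern matches only the report files.
tight=$(echo report*.txt)

echo "r*          -> $loose"
echo "report*.txt -> $tight"
```

The first line of output should include raptor-999 and the second should not, which is a quick way to check whether a pattern is specific enough.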

You can also use * on its own to match all files in a directory (apart from hidden files, whose names begin with a dot):-

$ ls * 
index.txt report09.txt report10.txt report2.txt report4.txt
report.txt report1.txt report11.txt report3.txt report5.txt

Doing Useful Stuff With Wildcards

That’s all very well, you might be thinking, but I can easily view files in Finder (or whatever GUI tool you use). How is this going to make me more productive?

Well this is where wildcards get interesting. Say you needed to make a copy of just the report files. First create a new directory for them:-

$ mkdir saved

Then use the wildcard pattern r* as an argument to cp as follows:-

$ cp r* saved

Now all the report files have been copied to the saved directory. You can confirm that using ls:-

$ ls saved
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

You can equally easily move or delete files this way, by passing a wildcard to mv, rm, etc. In fact, you can pass wildcards to any shell command that accepts multiple file or path names.

Why People Don’t Really Understand Wildcards

Wildcards are an incredibly powerful tool, but they’re often misunderstood. On first meeting wildcards, many people — including some experienced developers — think the wildcard is somehow sent to the command, such as ls or cp, which then looks at the available files and filters the results through the wildcard. Although that is precisely what happens on other operating systems (such as Windows), in Unix-based systems that is absolutely not what happens.

This can be demonstrated using the lowly echo command. echo simply takes the arguments you pass to it and sends them to the Terminal output:-

$ echo Hello world
Hello world

But if you pass echo a wildcard, something else happens:-

$ echo r*
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

So, what’s going on here? At first glance, many people are surprised by this. Why is echo looking at the file system? To understand, it’s crucial to remember that the wildcard is interpreted by the shell itself, before it is sent to the command.

The shell (not the command) expands the wildcard pattern to match as many files as it can, either from the current working directory or from one specified in the path. So it would expand the above command to be:-

$ echo report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

Now the output makes sense. echo is just echoing the filenames it received from the shell. Because this crucial step is never seen by the user, the exact behaviour of wildcard expansion is often misunderstood.
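One way to see the expansion step for yourself is to count the arguments a command actually receives. The following is a minimal sketch, assuming a POSIX-style shell; count_args is a hypothetical helper written for this example, and the files live in a scratch directory.

```shell
# Hypothetical helper: reports how many arguments it was given.
count_args() {
  echo "$# argument(s) received"
}

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch report.txt report1.txt report2.txt

# Unquoted: the shell expands r* into three separate words first.
expanded=$(count_args r*)
# Quoted: expansion is suppressed, so the pattern arrives as one literal word.
literal=$(count_args 'r*')

echo "unquoted: $expanded"   # unquoted: 3 argument(s) received
echo "quoted:   $literal"    # quoted:   1 argument(s) received
```

The helper never looks at the file system, yet its answer changes with the directory contents, because the shell has already done the matching before the helper runs.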

Match Any File: the Lone *

Lots of people know that * can be used on its own to match any file. It’s typically used when copying or deleting entire directories. But once you understand wildcard expansion you can see it’s a bit subtler than that.

Think about the lowly ls command. Using the asterisk wildcard on its own looks a lot like running the recursive form, ls -R, except that it only descends one level, into the directories sitting directly in the current one:-

$ ls *
index.txt report09.txt report10.txt report2.txt report4.txt
report.txt report1.txt report11.txt report3.txt report5.txt
saved:
report.txt report1.txt report11.txt report3.txt report5.txt
report09.txt report10.txt report2.txt report4.txt

The reason for this is that the shell expands the wildcard out to the command:-

$ ls index.txt report.txt report09.txt report1.txt report10.txt report11.txt report2.txt report3.txt report4.txt report5.txt saved

ls then processes the list of file and directory names it receives from the shell, and proceeds to list all files as well as the contents of the saved directory.
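If that one-level descent is not what you want, the standard -d option tells ls to list directory arguments as plain entries instead of opening them. Here is a small sketch in a scratch directory; the file names are illustrative only.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch index.txt report.txt
mkdir saved
touch saved/report.txt saved/report1.txt

# Without -d, ls descends into saved and lists its contents too.
with_contents=$(ls *)
# With -d, directory arguments are listed as plain entries, not opened.
names_only=$(ls -d *)

echo "$names_only"
```

Here names_only holds just index.txt, report.txt and saved, while with_contents also carries a saved: section listing the files inside it.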

What If the Wildcard Doesn’t Match?

Sometimes the wildcard pattern matches nothing. The way this is handled by the shell can be another source of confusion.

Suppose you wanted to list all the files with the extension .csv in a directory where there were none:-

$ ls *.csv
ls: *.csv: No such file or directory

A casual user might read the error message and not give it a second thought. But look carefully: the message came from the ls command. And it claims to have been looking for a file called *.csv. How can that be? The shell should have expanded the wildcard, so how is ls seeing the * character?

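The answer might surprise you: in bash’s default configuration, if a wildcard fails to match, it is passed unmodified as a literal text string to the command. So yes, the ls command did indeed receive the argument *.csv. (Other shells can behave differently: zsh, the default shell on recent versions of macOS, reports “no matches found” by default and never runs the command.)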

Think about the implications of this. Since ls has no knowledge of wildcards, it would consider *.csv as a valid filename. It suggests that you can in fact have a file called *.csv.

And you can. There is no restriction in the Unix file system preventing use of symbolic characters in filenames. By using quotes to strip the wildcard of any special significance to the shell, you can happily wreak havoc and mayhem by creating files like this:-

$ touch '*.csv'
$ ls *.csv
*.csv

Yes, you just created a file called *.csv. Please, don’t do this!
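If you do end up with such a file, the same quoting trick gets rid of it safely. The sketch below is a minimal example in a scratch directory; data.csv and notes.csv are hypothetical files, included to show that they survive.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch '*.csv' data.csv notes.csv

# Unquoted, *.csv would expand to all three files; quoting keeps it literal,
# so only the file actually named *.csv is removed.
rm -- '*.csv'

remaining=$(echo *.csv)
echo "$remaining"   # data.csv notes.csv
```

Escaping the asterisk with a backslash (rm -- \*.csv) should have the same effect. The -- simply marks the end of options, which is a sensible habit when file names contain unusual characters.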

Wildcard Gotchas For Former Windows Users

If you are used to using the * wildcard on Windows, you may encounter some subtle and surprising differences when using Mac or Linux. One of the more common differences is when trying to copy or rename a set of files. Consider the following example:-

$ mv report*.txt report*.csv

Here the intention is fairly obvious: to find all the report files with the .txt extension in the current directory and rename them with a .csv extension. On Windows, the equivalent ren command works as intended, but in Unix shells mv will produce an error:-

usage: mv [-f | -i | -n] [-v] source target
mv [-f | -i | -n] [-v] source … directory

Someone familiar with the Windows way of handling wildcards (where it is done by the command) might struggle to see why this doesn’t work, and reach the conclusion that the Unix command line is in some way broken.

But it makes sense once you understand that the shell expands the command line to the following:-

$ mv report.txt report09.txt report1.txt report10.txt report11.txt report2.txt report3.txt report4.txt report5.txt report*.csv

Look at the final entry in the list. Recall from the previous section that an unmatched wildcard is passed through as its literal text. That’s precisely what has happened here: the mv command has choked on the final report*.csv argument, unable to interpret it as a valid directory (which would be the only option given the preceding list of filename arguments).

Several people have developed utilities and command-line hacks to perform this operation on Unix command lines. Most of them involve a short shell script to iterate through the files and rename them individually. But it remains a somewhat tricky problem to solve, and illustrates that the Unix shell, while powerful, is far from perfect.
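One common shape for such a loop is sketched below. It is only one possible approach, assuming a POSIX-style shell; the scratch directory and file names are illustrative, and ${f%.txt} is standard parameter expansion that strips a trailing .txt from the value of f.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch index.txt report.txt report1.txt report10.txt

for f in report*.txt; do
  # If nothing matched, the pattern arrives as literal text; skip it.
  [ -e "$f" ] || continue
  # ${f%.txt} strips the trailing .txt; then .csv is appended.
  mv -- "$f" "${f%.txt}.csv"
done

ls
```

The wildcard is expanded once, on the for line, and each rename then works on a single concrete file name, so mv only ever sees one source and one target. Replacing mv with echo mv in the loop body gives a dry run.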

Using echo to Preview Wildcard Expansions

Using echo is a very good (and safe) way of checking a wildcard expansion before sending it to any potentially destructive commands such as mv or rm.

It’s particularly useful when using wildcards in paths. Say you were going to perform some destructive action on your saved directory, maybe deleting copies of reports numbered 10 and higher. You might hastily decide that this would work:-

$ rm saved/report1*

Here the directory path is specified, so the wildcard expansion will be applied to files in the saved directory rather than the current working directory. When dealing with wildcards in paths there is always more scope to make a simple mistake. But if you were cautious you could “preview” the wildcard expansion using echo:-

$ echo saved/report1*
saved/report1.txt saved/report10.txt saved/report11.txt

Oops! You almost deleted report1.txt there by accident. You need to refine the wildcard pattern:-

$ echo saved/report1?.txt
saved/report10.txt saved/report11.txt

Our old friend ? came to the rescue. Now you can see from the output that the wildcard pattern does what it was supposed to — delete the reports numbered 10 and higher — so it’s safe to use:-

$ rm saved/report1?.txt
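Because echo prints every match on one line, the preview can be hard to read when a name contains a space. One possible variation, sketched here with hypothetical file names in a scratch directory, is to use printf so that each match gets its own line, and to count the matches before committing to rm.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir saved
touch saved/report1.txt saved/report10.txt saved/report11.txt 'saved/report1 draft.txt'

# One match per line: a name containing a space stays visibly a single entry.
preview=$(printf '%s\n' saved/report1*)
echo "$preview"

# Load the matches of the tighter pattern into the positional parameters
# and count them.
set -- saved/report1?.txt
match_count=$#
echo "report1?.txt matches: $match_count"   # report1?.txt matches: 2
```

printf reuses its format string for every argument, which is what produces one line per file. If the pattern matched nothing, the count would still be 1, because the unexpanded pattern itself is passed through as a single word.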

Summary

This article discussed single-character and multi-character wildcards. The ? wildcard matches a single character and the * wildcard matches any number of characters, including zero.

Crucially, you learnt that Unix wildcards are not passed to the final command but are handled entirely by the shell and expanded out by matching the pattern against the files in the current (or specified) directory.

When a wildcard expansion has no matches, the wildcard is passed on as literal text from the shell to the command. This can lead to unexpected results, especially for users coming from a Windows environment.

To alleviate these potential problems, you can use the echo command as a kind of preview to see how a wildcard pattern will expand. This technique is a useful way of testing a wildcard before using it in a destructive operation.

This article is a heavily revised copy of one originally published at appcodelabs.com on August 12, 2018.