Good CLI Design & Implementation

Paul J. Lucas
13 min read · Jun 18, 2023

--

Introduction

Graphical User Interfaces (GUIs) and REST APIs get all the attention, but for command-line tools, the Command-Line Interface (CLI) is just as important, yet often neglected. Hence, this article shows how to design and implement a good CLI in C.

Typically at a bare minimum, a CLI has to:

  1. Parse and validate command-line options and their arguments (if any).
  2. Parse and validate command-line arguments (if any).
  3. Give good error messages.
  4. Give good help.

There are actually several styles of command-line options. Of these, I personally recommend using the GNU standard due to the ubiquity of GNU command-line tools. To parse GNU-style command-line options, use getopt_long() declared in getopt.h. Unfortunately, getopt_long() has a number of quirks. I’ll mention these and show how to work around them.

Command-Line Options

The getopt_long() function requires lists of both long and short options. For example:

static struct option const OPTIONS_LONG[] = {
  { "help",    no_argument,       NULL, 'h' },
  { "output",  required_argument, NULL, 'o' },
  { "version", no_argument,       NULL, 'v' },
  { NULL,      0,                 NULL, 0   }
};
static char const OPTIONS_SHORT[] = ":ho:v";

For OPTIONS_LONG, I find the flag (third) field vestigial since it’s useful only for int (or int-as-bool) option values, so I recommend always setting it to NULL. In that case, the val (fourth) field is what is returned by getopt_long() when a given long option has been encountered. It’s most straightforward to make val the short option synonym since every long option should have a short option synonym whenever possible.

In the rare case when no short option synonym is possible for a particular long option (because the desired short option is already being used as a synonym for a different long option), then you can specify a non-null flag to know when the particular long option has been encountered. In such cases, getopt_long() returns 0.
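To illustrate the non-null flag case, here is a minimal sketch. The --no-color option, the opt_no_color variable, and the demo_parse() helper are hypothetical names invented for this example; they are not part of the sample program in this article.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical long-only option: getopt_long() stores val (1) here.
static int opt_no_color;

static struct option const DEMO_OPTIONS[] = {
  { "help",     no_argument, NULL,          'h' },
  { "no-color", no_argument, &opt_no_color, 1   },  // no short synonym
  { NULL,       0,           NULL,          0   }
};

// Parses argv and returns how many times getopt_long() returned 0,
// i.e., how many flag-style long options were encountered.
static int demo_parse( int argc, char *argv[] ) {
  assert( argv != NULL );
  int zero_returns = 0;
  int opt;
  opt_no_color = 0;
  opterr = 0;                           // suppress default error messages
  optind = 1;                           // start scanning at argv[1]
  while ( (opt = getopt_long( argc, argv, ":h", DEMO_OPTIONS, NULL )) != -1 ) {
    if ( opt == 0 )                     // a long option with a non-null flag
      ++zero_returns;
  }
  return zero_returns;
}
```

After demo_parse() returns, opt_no_color is 1 if --no-color was given and 0 otherwise, so no case in the switch is needed for it.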

According to the GNU standard, all programs should accept the --help and --version options.

OPTIONS_SHORT specifies the short options in a way that’s backwards compatible with the POSIX getopt(). Of note:

  • A leading : makes getopt_long() return a : when a required argument for an option is missing.
  • An option letter followed by : means that option requires an argument. (A :: means that option allows an optional argument.)
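The effect of the leading : can be seen in a small sketch. The first_opt() helper is a hypothetical name used only here; it asks getopt_long() for the first option in argv using whatever short option string it is given.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical helper: returns whatever getopt_long() reports for the first
// option in argv when given the short option string short_opts.
static int first_opt( int argc, char *argv[], char const *short_opts ) {
  static struct option const NO_LONG_OPTS[] = { { NULL, 0, NULL, 0 } };
  assert( short_opts != NULL );
  opterr = 0;                           // suppress getopt_long()'s own message
  optind = 1;                           // restart scanning at argv[1]
  return getopt_long( argc, argv, short_opts, NO_LONG_OPTS, NULL );
}
```

Given the command line "demo -o" (the argument for -o is missing), first_opt() with ":o:" returns ':' whereas with "o:" it returns '?'. In both cases, optopt is set to 'o'. Distinguishing the two results is what lets us print a more specific error message.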

However, if every long option has a short option synonym, then having to specify OPTIONS_SHORT separately is both redundant (since all option information is contained in OPTIONS_LONG) and error-prone (since you might update OPTIONS_LONG but forget to update OPTIONS_SHORT to match). To address both of these issues, we can write a function to create a short option string from a long option array:

static char const* make_short_opts( struct option const opts[static const 2] ) {
  // pre-flight to calculate string length
  size_t len = 1;                       // for leading ':'
  for ( struct option const *opt = opts; opt->name != NULL; ++opt )
    len += 1 + (unsigned)opt->has_arg;

  char *const short_opts = malloc( len + 1/*\0*/ );
  char *s = short_opts;
  *s++ = ':';                           // return missing argument as ':'
  for ( struct option const *opt = opts; opt->name != NULL; ++opt ) {
    *s++ = (char)opt->val;
    switch ( opt->has_arg ) {
      case optional_argument:
        *s++ = ':';
        // no break;
      case required_argument:
        *s++ = ':';
    } // switch
  } // for
  *s = '\0';
  return short_opts;
}

The static const 2 in the declaration of the opts parameter means that opts must point to the first element of an array of at least two elements (static 2) and that the opts pointer itself is const.

For this simple example, we only need to declare one global variable for the output option:

char const *opt_output = "-";

(The --help and --version options can be handled internally.)

Parsing Command-Line Options

Here is the start of a parse_options() function:

static void parse_options( int *pargc, char const **pargv[const] ) {
  opterr = 0;                           // suppress default error message
  int opt;
  bool opt_help = false;
  bool opt_version = false;
  char const *const options_short = make_short_opts( OPTIONS_LONG );
  // ...

The function takes pointers to argc and argv because we want to adjust argc to be the number of non-option command-line arguments and adjust argv such that argv[0] points at the first non-option argument (if any).

First, we set the global opterr = 0 to suppress default error messages given by getopt_long() so we can print error messages in exactly the format we want.

Next, we declare a few variables including opt_help and opt_version. These option variables are declared locally because we can handle those entirely within parse_options(), so there’s no need to make them global.

Next, we call getopt_long() in a loop until it returns one of -1 (for “no more options”), : (for a missing required argument), or ? (for an invalid option):

  // ...
  for (;;) {
    // getopt_long() takes argc and argv themselves, not pointers to them
    opt = getopt_long(
      *pargc, (char**)*pargv, options_short, OPTIONS_LONG, /*longindex=*/NULL
    );
    if ( opt == -1 )
      break;
    switch ( opt ) {
      case 'h':
        opt_help = true;
        break;
      case 'o':
        if ( SKIP_WS( optarg )[0] == '\0' )
          goto missing_arg;
        opt_output = optarg;
        break;
      case 'v':
        opt_version = true;
        break;
      case ':':
        goto missing_arg;
      case '?':
        goto invalid_opt;
    } // switch
  } // for
  // ...

For options that take arguments, the argument value is stored in a global variable optarg by getopt_long(). You must copy it (a shallow copy is fine) to your option variable since its value will change on every loop iteration. However, getopt_long() considers options like the following:

example --output=     # optarg will be "" (empty string)
example --output=" "

to have a present — but either an empty or all-whitespace — argument. In most cases, we want to treat this the same as a missing argument. SKIP_WS() is a macro that skips any leading whitespace in a string:

#define SKIP_WS(S)  ((S) += strspn( (S), " \n\t\r\f\v" ))

Once skipped, the first character of (the updated) optarg can be checked: if it’s the null character, the argument is effectively missing.
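As a small sketch of that check in isolation, the macro can be wrapped in a function. The arg_is_blank() helper is a hypothetical name that is not used elsewhere in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKIP_WS(S)  ((S) += strspn( (S), " \n\t\r\f\v" ))

// Hypothetical helper: returns true only if arg is empty or all whitespace.
// SKIP_WS() advances the local copy of the pointer, not the caller's.
static bool arg_is_blank( char const *arg ) {
  assert( arg != NULL );
  return SKIP_WS( arg )[0] == '\0';
}
```

For example, arg_is_blank( "" ) and arg_is_blank( " \t" ) are both true, while arg_is_blank( " out.txt" ) is false. Note that the macro version used in parse_options() also has the side effect of stripping leading whitespace from optarg itself.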

Note that we could handle the --help and --version options “inline” in their respective cases. However, we don’t because all options should be parsed first, then handled. If such options were handled “inline,” then we wouldn’t catch usage errors like:

example --version arg

(The --help and --version options may only be given by themselves. We check for this later.)

After all options have been parsed, we can free options_short and adjust argc and argv by optind (a global variable maintained by getopt_long() that contains the index of the next argv element to be processed, which, once parsing is done, is the first non-option argument):

  // ...
  free( (void*)options_short );
  *pargc -= optind;
  *pargv += optind;
  // ...
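To see the adjustment on its own, here is a minimal sketch. The skip_options() helper is hypothetical and recognizes only -h; it exists just to show what argc and argv look like afterwards.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

// Hypothetical helper: consumes a leading -h (if any), then adjusts
// *pargc and *pargv so they describe only the non-option arguments.
static void skip_options( int *pargc, char ***pargv ) {
  static struct option const NO_LONG_OPTS[] = { { NULL, 0, NULL, 0 } };
  assert( pargc != NULL && pargv != NULL );
  opterr = 0;
  optind = 1;                           // start scanning at argv[1]
  while ( getopt_long( *pargc, *pargv, ":h", NO_LONG_OPTS, NULL ) != -1 )
    ;                                   // nothing to do for this sketch
  *pargc -= optind;                     // number of non-option arguments
  *pargv += optind;                     // (*pargv)[0] is now the first one
}
```

For the command line "demo -h in.txt out.txt", optind is 2 when the loop ends, so the adjusted argc is 2 and the adjusted argv[0] is "in.txt".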

The --help and --version Options

Next, we handle the --help and --version options:

  // ...
  if ( opt_help )
    usage( *pargc > 0 ? EX_USAGE : EX_OK );
  if ( opt_version ) {
    if ( *pargc > 0 )
      usage( EX_USAGE );
    version();
  }
  return;
  // ...

The EX_ symbols are preferred exit status codes declared in sysexits.h. You should use those whenever possible.

Printing the Usage Message

One problem with the option struct is that there’s no member for a description. Instead, we can store the description in another array:

static char const *const OPTIONS_HELP[] = {
[ 'h' ] = "Print help and exit",
[ 'o' ] = "Write to file [default: stdout]",
[ 'v' ] = "Print version and exit",
};

It’s an array of char const* (strings) indexed by short option characters and initialized via the array designator syntax. (The array’s size is one more than the largest designator used, here 'v' + 1.)

It’s a tiny bit wasteful due to the NULL “holes” in the array, but, in the grand scheme of things, it’s nothing.
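The “holes” can be seen in a small sketch. DEMO_HELP and demo_help_for() are hypothetical names used only here. Note that the array extends only as far as the largest designator, so a lookup for an arbitrary character should be bounds-checked.

```c
#include <assert.h>
#include <stddef.h>

static char const *const DEMO_HELP[] = {
  [ 'h' ] = "Print help and exit",
  [ 'v' ] = "Print version and exit",
};

// The array's size is the largest designator plus one.
static_assert(
  sizeof DEMO_HELP / sizeof DEMO_HELP[0] == 'v' + 1,
  "array ends at the largest designator"
);

// Hypothetical lookup: returns NULL for a "hole" or an out-of-range character.
static char const* demo_help_for( int short_opt ) {
  size_t const size = sizeof DEMO_HELP / sizeof DEMO_HELP[0];
  if ( short_opt < 0 || (size_t)short_opt >= size )
    return NULL;
  return DEMO_HELP[ short_opt ];        // unset elements are NULL
}
```

In usage(), no such check is needed because every index comes from a val in OPTIONS_LONG, each of which has an entry in OPTIONS_HELP.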

Given that, we can write a usage() function that prints the command-line usage message by iterating over the OPTIONS_LONG array and looking up each option’s help in OPTIONS_HELP. But first, iterate over OPTIONS_LONG to find the longest option’s length so we can make everything line up:

_Noreturn static void usage( int status ) {
  // pre-flight to calculate longest long option length
  size_t longest_opt_len = 0;
  for ( struct option const *opt = OPTIONS_LONG;
        opt->name != NULL; ++opt ) {
    size_t opt_len = strlen( opt->name );
    switch ( opt->has_arg ) {
      case no_argument:
        break;
      case optional_argument:
        opt_len += STRLITLEN( "[=ARG]" );
        break;
      case required_argument:
        opt_len += STRLITLEN( "=ARG" );
        break;
    } // switch
    if ( opt_len > longest_opt_len )
      longest_opt_len = opt_len;
  } // for

  FILE *const fout = status == EX_OK ? stdout : stderr;
  fprintf( fout, "usage: %s [options] ...\noptions:\n", prog_name );

  for ( struct option const *opt = OPTIONS_LONG;
        opt->name != NULL; ++opt ) {
    fprintf( fout, " --%s", opt->name );
    size_t opt_len = strlen( opt->name );
    switch ( opt->has_arg ) {
      case no_argument:
        break;
      case optional_argument:
        opt_len += (size_t)fprintf( fout, "[=ARG]" );
        break;
      case required_argument:
        opt_len += (size_t)fprintf( fout, "=ARG" );
        break;
    } // switch
    assert( opt_len <= longest_opt_len );
    fprintf( fout,
      "%*s (-%c) %s.\n",
      (int)(longest_opt_len - opt_len), "",
      opt->val, OPTIONS_HELP[ opt->val ]
    );
  } // for

  exit( status );
}

STRLITLEN() is a macro that yields the length of a string literal as a compile-time constant; its definition is not reproduced in this article.
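If you need something to compile against, the following is a minimal sketch of how such macros are commonly written. These definitions are my assumption, not necessarily the ones the original code uses. ARRAY_SIZE() is included because it is used later in this article and is likewise not defined here.

```c
#include <assert.h>
#include <stddef.h>

// Assumed definitions: number of elements in an array, and the length of a
// string literal (its array size minus the terminating null character).
#define ARRAY_SIZE(A)  (sizeof (A) / sizeof (A)[0])
#define STRLITLEN(S)   (ARRAY_SIZE(S) - 1)

// Both are constant expressions, so they can be checked at compile time.
static_assert( STRLITLEN( "=ARG" ) == 4, "4 characters, null not counted" );
```

Both macros work only on true arrays and string literals. Given a pointer, they silently compute the wrong value.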

The global variable prog_name contains the program’s name we’ll set in main().

The usage() function takes an exit status for two reasons:

  1. If the usage message is being printed by request via --help, then it should print to standard output (because no error has occurred). However, if it’s being printed because of a usage error, then it should print to standard error.
  2. So it can call exit() with that status (which we might as well do since it was passed for the first reason).

The final fprintf():

    fprintf( fout,
      "%*s (-%c) %s.\n",
      (int)(longest_opt_len - opt_len), "",
      opt->val, OPTIONS_HELP[ opt->val ]
    );

uses the %*s formatting directive, which means: print a string (s) in a field whose width (*) is given by the next int argument. In this case, that int argument is the difference in length between the longest option length and the current option length. Printing nothing ("") will print that difference as the number of spaces we need to line up the remaining output.
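The padding trick can be tried in isolation with snprintf(). The format_opt_line() helper below is a hypothetical, simplified stand-in for the loop body in usage(): it ignores option arguments and help text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: formats "  --name<padding> (-c)" into buf such that
// the "(-c)" part starts in the same column for every option.
static int format_opt_line( char *buf, size_t buf_size, char const *name,
                            size_t longest_len, char short_opt ) {
  size_t const name_len = strlen( name );
  assert( name_len <= longest_len );
  return snprintf(
    buf, buf_size, "  --%s%*s (-%c)",
    name,
    (int)(longest_len - name_len), "",  // pad with this many spaces
    short_opt
  );
}
```

With a longest length of 7 (for "version"), "help" is padded with three spaces and "version" with none, so the short options line up in both lines.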

The call:

usage( *pargc > 0 ? EX_USAGE : EX_OK );

checks to see whether there are any command-line arguments: if so, it’s a usage error since the --help option may only be given by itself.

The version() function simply prints the program name and version, then exits:

_Noreturn static void version( void ) {
  puts( PACKAGE_NAME " " PACKAGE_VERSION );
  exit( EX_OK );
}

where PACKAGE_NAME and PACKAGE_VERSION are defined elsewhere, something like:

#define PACKAGE_NAME     "example"
#define PACKAGE_VERSION "1.0"

But before we call version(), we do the same check to see whether there are any command-line arguments:

    if ( *pargc > 0 )
      usage( EX_USAGE );
    version();

If so, it’s a usage error.

Invalid Options

For invalid options:

  // ...
invalid_opt:
  (void)0;                              // needed before C23
  char const *invalid_opt = (*pargv)[ optind - 1 ];
  if ( invalid_opt != NULL && strncmp( invalid_opt, "--", 2 ) == 0 )
    fprintf( stderr, "\"%s\": invalid option", invalid_opt + 2 );
  else
    fprintf( stderr, "'%c': invalid option", (char)optopt );
  fputs( "; use --help or -h for help\n", stderr );
  exit( EX_USAGE );                     // don't fall through to missing_arg

Unfortunately, getopt_long()’s error-handling is poor. When getopt_long() returns ? to indicate an invalid option, we have to determine whether it was an invalid short or long option:

  • If it was an invalid short option, getopt_long() will set the global variable optopt to it.
  • However, if it was an invalid long option, getopt_long() doesn’t directly tell you what that long option was.

We have to inspect (*pargv)[optind-1], the command-line argument it was processing at the time: if it starts with --, it’s the invalid long option; otherwise optopt is the invalid short option.

Missing Required Arguments

For options with required but missing arguments, we print an error message:

missing_arg:
  fatal_error( EX_USAGE,
    "\"%s\" requires an argument\n",
    opt_format( (char)(opt == ':' ? optopt : opt) )
  );
} // end of parse_options()

However, this code is executed in two cases:

  1. getopt_long() returned : to indicate a required argument was missing. In this case, optopt contains the option missing its argument.
  2. getopt_long() returned the option and its argument, but, upon further checking, we discovered that the argument was either the empty string or all whitespace. In this case, opt contains the option having said argument.

The function fatal_error() is a convenience variadic function that prints an error message (preceded by the program’s name) and exits with the given status code:

_Noreturn void fatal_error( int status, char const *format, ... ) {
  fprintf( stderr, "%s: ", prog_name );
  va_list args;
  va_start( args, format );
  vfprintf( stderr, format, args );
  va_end( args );
  exit( status );
}

The function opt_format() formats an option in both its long (if it exists) and short form, e.g. --help/-h, for use in an error message:

#define OPT_BUF_SIZE  32  /* enough for longest long option */

char const* opt_format( char short_opt ) {
  static char bufs[ 2 ][ OPT_BUF_SIZE ];
  static unsigned buf_index;
  char *const buf = bufs[ buf_index++ % 2 ];
  char const *const long_opt = opt_get_long( short_opt );
  snprintf(
    buf, OPT_BUF_SIZE, "%s%s%s-%c",
    long_opt[0] != '\0' ? "--" : "", long_opt,
    long_opt[0] != '\0' ? "/" : "", short_opt
  );
  return buf;
}

The function uses two internal buffers so that opt_format() can be called twice in the same printf(). (This will become handy later.)
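The buffer rotation can be seen in a stripped-down sketch. The demo_opt_format() function and DEMO_BUF_SIZE are hypothetical names; the function formats only the short form so the example stays self-contained.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_BUF_SIZE  8

static_assert( DEMO_BUF_SIZE >= sizeof "-x", "buffer must hold \"-x\"" );

// Hypothetical, simplified opt_format(): alternates between two static
// buffers so two results can be alive at the same time.
static char const* demo_opt_format( char short_opt ) {
  static char bufs[ 2 ][ DEMO_BUF_SIZE ];
  static unsigned buf_index;
  char *const buf = bufs[ buf_index++ % 2 ];
  snprintf( buf, DEMO_BUF_SIZE, "-%c", short_opt );
  return buf;
}
```

Two consecutive calls return different buffers, so both strings remain valid. A third call reuses the first buffer and overwrites its contents, which is why at most two results can be used in the same statement.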

The function opt_get_long(), given a short option, gets its corresponding long option, if any:

static char const* opt_get_long( char short_opt ) {
  for ( struct option const *opt = OPTIONS_LONG; opt->name != NULL; ++opt ) {
    if ( opt->val == short_opt )
      return opt->name;
  } // for
  return "";
}

Calling parse_options()

Finally, this is how parse_options() would be called:

char const *prog_name;

int main( int argc, char const *argv[] ) {
  prog_name = argv[0];
  parse_options( &argc, &argv );
  // ...
}

After parse_options() returns, argc will contain the number of remaining non-option arguments and, if any, argv[0] will be the first such argument. (Note that this differs from the canonical value of argv[0] that is initially the executable’s path.)

Option Exclusivity

The code presented so far doesn’t handle the case where certain options may be given only by themselves (e.g., you shouldn’t be allowed to give --help and --version with any other option). That can be implemented by adding a global array to keep track of which options have been given:

static _Bool opts_given[128];  // options that were given

setting it for each option returned by getopt_long():

      // ...
      case '?':
        goto invalid_opt;
    } // switch
    opts_given[ opt ] = true;           // <-- new line
  } // for

writing a function to check for exclusivity:

static void opt_check_exclusive( char opt ) {
  if ( !opts_given[ (unsigned)opt ] )
    return;
  for ( size_t i = '0'; i < ARRAY_SIZE( opts_given ); ++i ) {
    char const curr_opt = (char)i;
    if ( curr_opt == opt )
      continue;
    if ( opts_given[ (unsigned)curr_opt ] ) {
      fatal_error( EX_USAGE,
        "%s can be given only by itself\n",
        opt_format( opt )
      );
    }
  } // for
}

and calling the function after processing all options:

  // ...
  *pargc -= optind;
  *pargv += optind;

  opt_check_exclusive( 'h' );
  opt_check_exclusive( 'v' );
  // ...

Option Mutual Exclusivity

In many programs, there are some options that may not be given with some other options. For example, if a program has options --json/-j and --xml/-x to specify output formats, those options can’t be given simultaneously. It’s good to check for such cases rather than letting the last option specified “win.” A function to check for mutual exclusivity is:

static void opt_check_mutually_exclusive( char opt, char const *opts ) {
  if ( !opts_given[ (unsigned)opt ] )
    return;
  for ( ; *opts != '\0'; ++opts ) {
    assert( *opts != opt );
    if ( opts_given[ (unsigned)*opts ] ) {
      fatal_error( EX_USAGE,
        "%s and %s are mutually exclusive\n",
        opt_format( opt ),
        opt_format( *opts )
      );
    }
  } // for
}

where opt is the short option that, if given, means none of the short options in opts can also be given. (This is the aforementioned case where opt_format()’s two internal buffers come in handy: it can be called twice in the same statement, as it is here.)

Calling the function would be like:

opt_check_mutually_exclusive( 'j', "x" );

Other Option Checks

Of course it’s possible for some programs to have more complicated option relationships, e.g., if -x is given, then -y must be also. If your program has such relationships, you should check for them. Writing such a function using opts_given is fairly straightforward, but left as an exercise for the reader.
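As a starting point for that exercise, here is one possible sketch of a “requires” check. The function name opt_find_missing_required() is hypothetical. To keep the example self-contained, it returns the missing option rather than calling fatal_error(); real code would presumably report the error using opt_format() as the other checks do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool opts_given[128];            // options that were given

// Hypothetical check: if opt was given, every short option in req_opts must
// also have been given.  Returns the first missing required option, or '\0'
// if opt wasn't given or nothing is missing.  Assumes ASCII option letters.
static char opt_find_missing_required( char opt, char const *req_opts ) {
  assert( req_opts != NULL );
  if ( !opts_given[ (unsigned char)opt ] )
    return '\0';
  for ( ; *req_opts != '\0'; ++req_opts ) {
    if ( !opts_given[ (unsigned char)*req_opts ] )
      return *req_opts;
  } // for
  return '\0';
}
```

A call like opt_find_missing_required( 'x', "y" ) returns 'y' when -x was given without -y.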

Eliminating Short Option Redundancy

Every short option has to be specified four times:

  1. In a val field of a struct option array.
  2. In a case.
  3. In calls to either opt_check_exclusive() or opt_check_mutually_exclusive().
  4. In the OPTIONS_HELP array.

If you ever decide to change a short option, you have to update it in four places. It would be better to define every short option once, then use that definition everywhere. For this sample program, we can do:

#define OPT_HELP     h
#define OPT_JSON     j
#define OPT_OUTPUT   o
#define OPT_VERSION  v
#define OPT_XML      x

But, in order to use these definitions, they have to be either “stringified” or “charified” depending on the use. Stringification is easier since the C preprocessor supports it directly:

#define STRINGIFY_HELPER(X)  #X
#define STRINGIFY(X) STRINGIFY_HELPER(X)

#define SOPT(X) STRINGIFY(OPT_##X)

SOPT(FOO) means “stringify the FOO option.” For example, SOPT(HELP) will expand into "h".
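The extra STRINGIFY_HELPER() level matters: it gives the preprocessor a chance to expand OPT_HELP to h before # is applied. Here is a small sketch that checks the result. The JSON_EXCLUDES array is a hypothetical illustration showing that, because adjacent string literals are concatenated, several SOPT() results can be joined into one option string.

```c
#include <assert.h>
#include <string.h>

#define OPT_HELP  h
#define OPT_XML   x

#define STRINGIFY_HELPER(X)  #X
#define STRINGIFY(X)         STRINGIFY_HELPER(X)

#define SOPT(X)              STRINGIFY(OPT_##X)

// Hypothetical: adjacent string literals concatenate into "xh".
static char const JSON_EXCLUDES[] = SOPT(XML) SOPT(HELP);

static_assert( sizeof JSON_EXCLUDES == 3, "two option letters plus null" );
```

Had SOPT() applied # directly to OPT_##X, the result would have been the string "OPT_HELP" rather than "h".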

However, for the val field, in a case, and in OPTIONS_HELP, we need short options as characters. Unfortunately, the C preprocessor doesn’t support “charification” — not directly, anyway. It’s possible to implement with the caveat that only characters that are valid in identifiers can be charified, that is [A-Za-z_0-9]. To start, define one macro per identifier character:

#define CHARIFY_0 '0'
#define CHARIFY_1 '1'
#define CHARIFY_2 '2'
// ...
#define CHARIFY_A 'A'
#define CHARIFY_B 'B'
#define CHARIFY_C 'C'
// ...
#define CHARIFY__ '_'
#define CHARIFY_a 'a'
#define CHARIFY_b 'b'
#define CHARIFY_c 'c'
// ...
#define CHARIFY_z 'z'

Then:

#define NAME2_HELPER(A,B)    A##B
#define NAME2(A,B) NAME2_HELPER(A,B)

#define CHARIFY(X) NAME2(CHARIFY_,X)
#define COPT(X) CHARIFY(OPT_##X)

COPT(FOO) means “charify the FOO option.” For example, COPT(HELP) will expand into 'h'. This can then be used in the option array:

  // ...
{ "help", no_argument, NULL, COPT(HELP) },
{ "output", required_argument, NULL, COPT(OUTPUT) },
{ "version", no_argument, NULL, COPT(VERSION) },
// ...

and in OPTIONS_HELP:

static char const *const OPTIONS_HELP[] = {
[ COPT(HELP) ] = "Print help and exit",
[ COPT(OUTPUT) ] = "Write to file [default: stdout]",
[ COPT(VERSION) ] = "Print version and exit",
};

and in cases:

      // ...
case COPT(HELP):
opt_help = true;
break;
// ...

and in calls to opt_check_exclusive():

  opt_check_exclusive( COPT(HELP) );
opt_check_exclusive( COPT(VERSION) );

and in calls to opt_check_mutually_exclusive():

opt_check_mutually_exclusive( COPT(JSON), SOPT(XML) );

As a bonus, it makes the code a lot more readable.

Conclusion

CLIs should be just as robust as REST APIs. Ultimately, good CLIs make for a better user experience and can prevent unanticipated option combinations that can lead to bugs.
