Long Arguments and Getopt

Originally posted March 7, 2006

Mission Data
Mission Data Journal
Apr 20, 2016

I recently needed to adapt a script that recrawls a site with Nutch. One of my design goals was to use the same command line options as the Fetchtool (one of the steps I had to take to recrawl a site).

It became apparent fairly quickly that bash’s built-in ‘getopts’ didn’t support long command line arguments, so I had to fall back on getopt.
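
For contrast, here is a minimal sketch of what the built-in getopts does handle: single-letter options only. The function name parse_short and the letters -d and -h are made up for illustration and are not part of the recrawl script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the getopts builtin parses single-letter options only.
parse_short() {
    local opt OPTIND=1          # reset OPTIND so the function can be called again
    depth=""
    while getopts "hd:" opt; do
        case "$opt" in
            d) depth=$OPTARG ;;             # -d takes a value (the colon in "hd:")
            h) echo "help requested" ;;
            *) return 1 ;;                  # getopts reports unknown options as ?
        esac
    done
}

parse_short -d 5
echo "depth=$depth"
```

Calling parse_short with --depth 5 instead fails, because getopts tries to read the second dash as an option letter.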

Here is the portion of the script that parses the command line arguments:

    set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
    [ $# -eq 0 ] && usage

    while [ $# -gt 0 ]
    do
        case "$1" in
            --depth) depth=$2;shift;;
            --adddays) adddays=$2;shift;;
            --topN) topN=$2;shift;;
            -h) usage;;
            --) shift;break;;
            -*) usage;;
            *) break;; #better be the crawl directory
        esac
        shift
    done

Deconstructing this bit by bit:

    set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
    [ $# -eq 0 ] && usage

‘set --’ replaces the existing positional parameters with the result of getopt.
The call to getopt works like this:

  • -n$0 sets the program name getopt uses in its messages to the name of the script (so warnings come back nicely from getopt)
  • -u produces unquoted output, so the result can be handed straight to set
  • -a allows long options to start with a single ‘-’ (they usually start with two, ‘--’)
  • --longoptions="depth: adddays: topN:" declares the long options. In this case I have 3 (depth, adddays, and topN). The
    trailing colon indicates that the option takes a required argument.
  • "h" lists the short options (-h)
  • "$@" is the arguments passed into the script

The ‘||’ at the end of the first line is meant to call my usage function if an error comes back from getopt (a non-zero return code). The second line makes sure we get at least one argument back.
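
One caveat is worth checking in your own shell: in bash, the exit status of a set command that contains a command substitution is the status of set itself, not of the substituted command, so the ‘||’ may never fire when getopt fails. A minimal sketch of one way around that, assuming the enhanced util-linux getopt, captures the output in a variable first. The function name parse_args and the usage text are placeholders, not the original script's.

```shell
#!/usr/bin/env bash
# Sketch: test getopt's exit status before handing its output to set.
# parse_args and the usage text are placeholders for illustration.
usage() {
    echo "usage: recrawl.sh [-h] [--depth N] [--adddays N] [--topN N] crawl_dir" >&2
    return 1
}

parse_args() {
    local args
    # A plain assignment returns the status of the command substitution,
    # so a getopt failure is visible to the || that follows.
    args=$(getopt -n recrawl.sh -u -a --longoptions="depth: adddays: topN:" "h" "$@") || { usage; return 1; }
    # Left unquoted on purpose: -u output is meant to be word-split.
    set -- $args
    echo "parsed: $*"
}

parse_args -depth 5 -topN 3 crawl
```

Declaring args with local on its own line matters here: combining the declaration and the assignment would report the status of local rather than of getopt.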

To help understand what goes on next, let's run that command at the shell:

    $ getopt -nrecrawl.sh -u -a --longoptions="depth: adddays: topN:" "h" -depth 5 -adddays 10 -topN 3 -h -x
    recrawl.sh: unrecognized option `-x'
    --depth 5 --adddays 10 --topN 3 -h --

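A few items of note. The first is the warning message we get because ‘x’ is an unknown option (notice it is prefaced by what we supplied to the -n argument). The second is the result of the getopt operation on my command line parameters: each long option comes back in its ‘--’ form, the unknown -x is dropped, and a trailing ‘--’ marks the end of the options.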
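
Because -u asks for unquoted output, an argument containing whitespace (a crawl directory named ‘my crawl’, say) would be split into two words by set. A hedged sketch of the usual alternative, assuming the enhanced util-linux getopt: drop -u and let eval re-read the quoting that getopt adds. The function name parse_quoted and the crawl_dir variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: quoted getopt output keeps arguments with spaces intact.
# parse_quoted and crawl_dir are hypothetical names for this illustration.
parse_quoted() {
    local args
    args=$(getopt -n recrawl.sh -a --longoptions="depth: adddays: topN:" "h" "$@") || return 1
    # eval re-parses the shell quoting that getopt put around each word.
    eval set -- "$args"
    while [ $# -gt 0 ]; do
        case "$1" in
            --depth)   depth=$2;   shift ;;
            --adddays) adddays=$2; shift ;;
            --topN)    topN=$2;    shift ;;
            --)        shift; break ;;
        esac
        shift
    done
    crawl_dir=$1
}

parse_quoted -depth 5 "my crawl"
echo "depth=$depth crawl_dir=$crawl_dir"
```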

Now to interpret the results:

    while [ $# -gt 0 ]
    do
        case "$1" in
            --depth) depth=$2;shift;;
            ...
            -h) usage;;
            --) shift;break;;
            -*) usage;;
            *) break;; #better be the crawl directory
        esac
        shift
    done

The while loop steps through each of the arguments returned by getopt. If an option requires a value, I use $2 to snag that value and assign it to a variable, then shift once more to skip over it. When we are done with an argument, we shift past it and move on to the next one. A few special cases exist:

  • -*) matches any unknown option and prints the usage statement
  • *) matches any other argument (in this case it is our required directory)
  • --) is the end marker from getopt
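
To see the loop in isolation, the sketch below feeds it an already-normalized argument list of the kind getopt produces, with no getopt call at all. The directory name crawl and the crawl_dir variable are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: walk a hard-coded, already-normalized argument list.
# "crawl" and crawl_dir are illustrative names, not from the original script.
set -- --depth 5 --adddays 10 --topN 3 -- crawl

while [ $# -gt 0 ]; do
    case "$1" in
        --depth)   depth=$2;   shift ;;   # this shift consumes the value...
        --adddays) adddays=$2; shift ;;
        --topN)    topN=$2;    shift ;;
        --)        shift; break ;;        # end of options; the rest is positional
        *)         break ;;
    esac
    shift                                  # ...and this one the option name
done

crawl_dir=$1
echo "depth=$depth adddays=$adddays topN=$topN crawl_dir=$crawl_dir"
```

This prints depth=5 adddays=10 topN=3 crawl_dir=crawl, and whatever is left in the positional parameters after the loop is the non-option part of the command line.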

The script then goes on to verify the parameters (for example, that the directory exists) and does the crawl... but that is for another day.
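
The verification step is not shown in this post, so the following is only a guess at its general shape: a hypothetical validate_params function that checks one numeric option and the crawl directory. The name, the messages, and the specific checks are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parameter checks; not the original script's code.
validate_params() {
    local depth=$1 crawl_dir=$2
    # Reject anything that is not a non-negative integer.
    case "$depth" in
        ''|*[!0-9]*)
            echo "depth must be a non-negative integer" >&2
            return 1 ;;
    esac
    # The crawl directory has to exist already.
    if [ ! -d "$crawl_dir" ]; then
        echo "no such directory: $crawl_dir" >&2
        return 1
    fi
}

validate_params 5 /tmp && echo "parameters look fine"
```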
