Long Arguments and Getopt
Originally posted March 7, 2006
I recently needed to adapt a script that recrawls a site with Nutch. One of my design goals was to use the same command line options as the Fetchtool (one of the steps I had to take to recrawl a site).
It became apparent fairly quickly that bash’s built-in ‘getopts’ didn’t support long command line arguments, so I had to fall back on getopt.
Here is the portion of the script that parses the command line arguments:
set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
[ $# -eq 0 ] && usage
while [ $# -gt 0 ]
do
    case "$1" in
        --depth) depth=$2;shift;;
        --adddays) adddays=$2;shift;;
        --topN) topN=$2;shift;;
        -h) usage;;
        --) shift;break;;
        -*) usage;;
        *) break;; #better be the crawl directory
    esac
    shift
done
Deconstructing this bit by bit:
set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
[ $# -eq 0 ] && usage
‘set --’ unsets the existing positional parameters and sets them to the result of getopt.
The call to getopt works like this:
- -n$0, sets the nicename to the name of the script (so warnings come back nicely from getopt)
- -u, leaves the output unquoted, so the shell can split it back into separate words
- -a, allows long arguments to start with a single ‘-’ (they usually have two, ‘--’)
- --longoptions="depth: adddays: topN:", sets the format of the long options. In this case I have 3 (depth, adddays, and topN). The trailing colon indicates I am expecting an additional argument.
- "h", the short options (-h)
- "$@", the arguments passed into the script
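The ‘||’ at the end is meant to call my usage statement if an error comes back from getopt (a non-zero return code). One caveat: the status the shell tests there is that of ‘set’, not of the command substitution, so a getopt failure on its own does not trigger it. The second line makes sure we get at least one argument back.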
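To check the error path directly, getopt's output can be captured in a variable before ‘set --’ runs, so that its exit status is the one being tested. The following is a minimal sketch, not the original script: it assumes the util-linux (GNU) getopt, and normalize_args is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the normalized arguments, or fail if getopt
# rejects one of them. Assumes the util-linux (GNU) getopt is installed.
normalize_args() {
    local parsed
    # A plain assignment keeps the exit status of the command substitution,
    # so the || branch runs when getopt reports an error.
    parsed=$(getopt -n recrawl.sh -u -a --longoptions="depth: adddays: topN:" "h" "$@") || return 1
    echo "$parsed"
}

# A recognized option: getopt succeeds and the normalized line is captured.
good=$(normalize_args -depth 5 crawl) && good_status=0 || good_status=$?

# An unknown option: getopt warns on stderr (hidden here) and fails.
bad=$(normalize_args -x 2>/dev/null) && bad_status=0 || bad_status=$?

echo "good: status=$good_status output=$good"
echo "bad:  status=$bad_status"
```

In a script shaped like the one above, the same idea would read as an assignment followed by ‘|| usage’, and then ‘set --’ applied to the captured variable.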
To help understand what goes on next, let's run that command at the shell:
$ getopt -nrecrawl.sh -u -a --longoptions="depth: adddays: topN:" "h" -depth 5 -adddays 10 -topN 3 -h -x
recrawl.sh: unrecognized option `-x'
 --depth 5 --adddays 10 --topN 3 -h --
A few items of note. The first is the warning message we get because ‘x’ is an unknown option (notice it is prefaced by what we supplied to the -n argument). The second is the result of the getopt operation on my command line parameters.
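To see how that result becomes positional parameters, the normalized line can be fed to ‘set --’ by hand. This is a small sketch that uses the output shown above as a fixed string rather than calling getopt. The variable names normalized and arg_count are made up for the example.

```shell
#!/bin/bash
# The normalized line getopt printed in the example above (unquoted, via -u).
normalized=' --depth 5 --adddays 10 --topN 3 -h --'

# Left unquoted on purpose: the shell splits the string on whitespace and
# set -- replaces the positional parameters with the resulting words.
set -- $normalized
arg_count=$#

echo "count: $arg_count"
i=1
for arg in "$@"; do
    echo "\$$i = $arg"
    i=$((i + 1))
done
```

This prints a count of 8, with $1 set to --depth and $8 set to the -- end marker. Because the splitting is on whitespace, an option value that itself contains a space would be broken into two words, which is the usual trade-off of unquoted output.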
Now to interpret the results:
while [ $# -gt 0 ]
do
    case "$1" in
        --depth) depth=$2;shift;;
        ...
        -h) usage;;
        --) shift;break;;
        -*) usage;;
        *) break;; #better be the crawl directory
    esac
    shift
done
The while loop walks through each of the arguments returned from getopt. If an argument requires an additional value, I use $2 to snag that value and assign it to a variable. When we are done with an argument, we shift past it and move on to the next one. A few special cases exist:
- -*) matches any unknown option and prints the usage statement
- *) matches any other argument (in this case it is our required directory)
- --) is the end marker from getopt
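The loop can be tried in isolation by handing it arguments already in the form getopt returns. The sketch below wraps it in a function for that purpose. The names parse_normalized and crawl_dir, and the text of the usage message, are hypothetical stand-ins and do not come from the original script.

```shell
#!/bin/bash
# Hypothetical stand-in for the script's usage function.
usage() {
    echo "usage: recrawl.sh [-depth N] [-adddays N] [-topN N] crawl_dir" >&2
    exit 1
}

# Hypothetical wrapper so the loop can run on a fixed argument list.
parse_normalized() {
    while [ $# -gt 0 ]
    do
        case "$1" in
            --depth)   depth=$2; shift;;     # extra shift skips the value
            --adddays) adddays=$2; shift;;
            --topN)    topN=$2; shift;;
            -h)        usage;;
            --)        shift; break;;        # end marker: stop parsing options
            -*)        usage;;
            *)         break;;
        esac
        shift
    done
    crawl_dir=$1   # first word left after the -- marker (assumed name)
}

# Arguments as getopt would normalize:
#   recrawl.sh -depth 5 -adddays 10 -topN 3 crawl
parse_normalized --depth 5 --adddays 10 --topN 3 -- crawl
echo "depth=$depth adddays=$adddays topN=$topN crawl_dir=$crawl_dir"
```

Run as written, this prints depth=5 adddays=10 topN=3 crawl_dir=crawl. The directory ends up in $1 because getopt moves non-option words after the -- marker.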
The script then goes on to verify the parameters (for example, that the directory exists) and does the crawl... but that is for another day.