Thursday, December 16, 2010

Wildcard subtleties in C- and Bourne shell

The idea for this article came when I ran into what I first thought was strange behaviour, a hanging nawk command, and later found out that I had stumbled upon a well-documented feature of the C shell which I had not been aware of.
Here is the seemingly hanging command: nawk -f e*.awk aa* where e*.awk was supposed to match the awk script examples_bynumber.awk via wildcard and aa* was a typo; there were no files matching that pattern.

The issue comes up when multiple wildcards are used in one command and only some of them match. Here are the differences by shell.


Csh

When csh sees something like a* b* it tries to expand the patterns to filenames.
If at least one pattern matches, the patterns without a match are silently dropped, the expansion of a* b* is a non-empty string and the command is built successfully.
If no file begins with either a or b, the expansion is empty and csh responds with No match.

Here is a set of possible translations for a nawk command executing an awk script e.awk when there are no files beginning with aa:
nawk -f e.awk aa    -> nawk -f e.awk aa  
    nawk is executed and will report an error about 'aa'
nawk -f e.awk aa*   -> No match          
    csh reports the error since the expansion of aa* is empty
nawk -f e*.awk aa   -> nawk -f e.awk aa 
    the wildcard match 'e*.awk' is successful
    and nawk is executed and will report an error about 'aa'
nawk -f e*.awk aa*  -> nawk -f e.awk
    the wildcard pair 'e*.awk aa*' is successfully translated to e.awk
    and nawk is executed without an input file, so it waits for stdin and seems to 'hang'; this was my case
And another variation, if no e*.awk script exists but a file aa does:
nawk -f e*.awk aa*  -> nawk -f aa
    the wildcard pair 'e*.awk aa*' is successfully translated to aa
    and nawk tries to execute aa as an awk script
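The rule behind these translations can be sketched in a few lines. This is only an emulation written in sh, not csh itself, and it rests on assumptions: the function name csh_expand is made up, the patterns have to be passed quoted so that the function (and not the calling shell) expands them, only *, ? and [ count as wildcards, and file names containing whitespace are not handled.

```shell
#!/bin/sh
# Emulation (in sh) of the csh rule described above; not csh itself.
# csh_expand is a hypothetical helper name used for illustration only.

csh_expand() {
    glob_seen=0    # 1 once any word contains *, ? or [
    glob_hit=0     # 1 once any pattern matched an existing file
    words=""       # resulting command line, one word per line
    for word in "$@"; do
        case $word in
            *[\*\?\[]*)
                glob_seen=1
                for f in $word; do          # unquoted on purpose: expand the pattern
                    if [ -e "$f" ]; then    # an unmatched pattern stays literal in sh; skip it
                        glob_hit=1
                        words="$words$f
"
                    fi
                done
                ;;
            *)
                words="$words$word
"
                ;;
        esac
    done
    # Wildcards were used but none of them matched anything.
    if [ "$glob_seen" -eq 1 ] && [ "$glob_hit" -eq 0 ]; then
        echo "No match" >&2
        return 1
    fi
    printf '%s' "$words"
}

# Demonstration in a scratch directory that holds only e.awk.
demo_dir=$(mktemp -d) || exit 1
(
    cd "$demo_dir" || exit 1
    : > e.awk
    csh_expand nawk -f 'e*.awk' 'aa*'                           # aa* is silently dropped
    csh_expand nawk -f e.awk 'aa*' || echo "(command not run)"  # no pattern matches
)
rm -rf "$demo_dir"
```

The first call prints the words nawk, -f and e.awk, which is the 'hanging' case; the second prints No match and nothing is executed.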

Sh, ksh

In contrast, the Bourne and Korn shells leave a wildcard parameter as is if it cannot be expanded, and they do not report pattern mismatches (you can see this via truss -a).
nawk -f e.awk aa    -> nawk -f e.awk aa 
nawk -f e.awk aa*   -> nawk -f e.awk aa*

nawk -f e*.awk aa   -> nawk -f e.awk aa 
nawk -f e*.awk aa*  -> nawk -f e.awk aa*   

And again, if aa exists but no e*.awk script:
nawk -f e*.awk aa*  -> nawk -f e*.awk aa   
I.e. nawk is always invoked and is left to deal with names that do not exist as files.
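If truss is not at hand, the same thing can be made visible from within sh. The following is a minimal sketch under a few assumptions: show_args is a made-up helper that stands in for nawk and just prints the words it receives, and the files are created in a scratch directory obtained with mktemp -d.

```shell
#!/bin/sh
# Show which words a Bourne-style shell hands to a command when some
# wildcard patterns match and others do not.
# show_args is a hypothetical stand-in for the real command.

show_args() {
    # Print every argument on its own line, in brackets.
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

demo_dir=$(mktemp -d) || exit 1
(
    cd "$demo_dir" || exit 1
    : > e.awk
    show_args nawk -f e*.awk aa*    # aa* has no match and is passed on literally
    : > aa
    rm e.awk
    show_args nawk -f e*.awk aa*    # now e*.awk stays literal and aa* becomes aa
)
rm -rf "$demo_dir"
```

In the first call the last word arrives as [aa*], in the second call the script name arrives as [e*.awk]; in both cases the command is run.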


Wildcard handling differs vastly by shell type (I have not tested others like bash, zsh or tcsh), so wildcards should be used with caution.
In csh, a command like some_command -f a* -m b* c* can lead to quite different executions depending on which files exist.
If there is no file beginning with a (but b* or c* match), then a* is dropped and -f is left without its intended parameter, so the command will most likely fail or misbehave.
If there are 2 files beginning with b (say b1 and b2), then b1 is passed as the parameter to -m and b2 becomes part of the file list.
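One way to be more careful is to check each pattern before the command line is built. The following is a sketch for sh only, not a general recipe: glob_matches is a made-up helper name, the pattern has to be passed quoted, and names containing whitespace are not handled.

```shell
#!/bin/sh
# glob_matches PATTERN: succeed if the (quoted) pattern matches at least
# one existing file. Hypothetical helper, for illustration only.

glob_matches() {
    for f in $1; do                          # unquoted on purpose: expand the pattern
        if [ -e "$f" ] || [ -L "$f" ]; then  # -L also accepts dangling symlinks
            return 0
        fi
    done
    return 1
}

# Demonstration in a scratch directory holding b1 and b2 only.
demo_dir=$(mktemp -d) || exit 1
(
    cd "$demo_dir" || exit 1
    : > b1
    : > b2
    for pattern in 'a*' 'b*'; do
        if glob_matches "$pattern"; then
            echo "$pattern: at least one match"
        else
            echo "$pattern: no match, do not build the command"
        fi
    done
)
rm -rf "$demo_dir"
```

This only tells whether a pattern matches at all; a pattern matching several files where exactly one was expected (the b1 and b2 case) still needs a separate check.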

People who use wildcards frequently should be aware of this possible trap and treat them more cautiously (advice to myself :-) )


