Thursday, December 16, 2010

Wildcard subtleties in C- and Bourne shell

The idea for this article came when a nawk command appeared to hang. At first I thought the behaviour strange, but later I found out that I had stumbled upon a well-documented feature of the C shell that I had not been aware of.
Here is the command that appeared to hang: nawk -f e*.awk aa* where e*.awk was supposed to match the awk script examples_bynumber.awk via wildcard, and aa* was a typo: there were no files of that name.

The issue arises when one command contains several wildcards and only some of them match. Here are the differences by shell.

Csh

When csh sees something like a* b* it tries to expand the patterns to filenames.
If at least one pattern matches, the expansion of a* b* is non-empty and the command is built successfully; patterns that match nothing are silently dropped.
If there is no file beginning with either a or b, the expansion is empty and csh responds with No match.
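
To make the rule concrete, here is a minimal sketch that imitates it in plain sh. The function csh_expand is a hypothetical helper of mine, not something csh or sh provides, and it only approximates csh: it handles patterns only (real csh passes words without wildcard characters through unchanged) and it breaks on filenames containing whitespace.

```shell
#!/bin/sh
# csh_expand is a hypothetical helper, not part of any shell.
# Call it with QUOTED patterns so that the function, not the calling
# shell, performs the expansion:  csh_expand 'e*.awk' 'aa*'
csh_expand() {
    result=""
    for pattern in "$@"; do
        # Unquoted on purpose: sh expands the pattern here and leaves it
        # unchanged when nothing matches.
        for name in $pattern; do
            # Keep only names that exist, so an unmatched pattern is dropped.
            if [ -e "$name" ]; then
                result="${result:+$result }$name"
            fi
        done
    done
    if [ -z "$result" ]; then
        # Every pattern failed to match: imitate the csh error.
        echo "No match" >&2
        return 1
    fi
    printf '%s\n' "$result"
}
```

In a directory that contains e.awk but nothing beginning with aa, csh_expand 'e*.awk' 'aa*' prints e.awk, while csh_expand 'aa*' 'bb*' prints No match on stderr and returns a non-zero status.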

Here is a set of possible translations for a nawk command executing an awk script e.awk when there are no files beginning with aa.
nawk -f e.awk aa    -> nawk -f e.awk aa  
    nawk is executed and will report an error about 'aa'
nawk -f e.awk aa*   -> No match          
    csh reports the error since the expansion of aa* is empty
nawk -f e*.awk aa   -> nawk -f e.awk aa 
    the wildcard match 'e*.awk' is successful
    and nawk is executed and will report an error about 'aa'
nawk -f e*.awk aa*  -> nawk -f e.awk
    the wildcard match 'e*.awk aa*' is successfully translated to   e.awk
    and nawk is executed without an input file, so it waits for stdin and seems to 'hang'; this was my case
Another variation, if no e*.awk scripts exist but a file aa does:
nawk -f e*.awk aa*  -> nawk -f aa
    the wildcard match 'e*.awk aa*' is successfully translated to aa 
    and nawk is trying to execute aa as an awk script

Sh, ksh

In contrast, the Bourne and Korn shells leave a wildcard parameter as is if it cannot be expanded, and they do not report pattern mismatches (you can see this via truss -a).
nawk -f e.awk aa    -> nawk -f e.awk aa 
nawk -f e.awk aa*   -> nawk -f e.awk aa*

nawk -f e*.awk aa   -> nawk -f e.awk aa 
nawk -f e*.awk aa*  -> nawk -f e.awk aa*   

and again if aa exists but no e*.awk scripts:
nawk -f e*.awk aa*  -> nawk -f e*.awk aa   
In other words, nawk is always invoked and is left to deal with missing files itself.
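
If truss is not at hand, the same thing can be observed from within the shell. The following is a small sketch, assuming a POSIX-like sh and a mktemp that supports -d; the scratch directory and the variable names are mine. It uses set -- to capture the words exactly as a command would receive them.

```shell
#!/bin/sh
# Work in a scratch directory so the result does not depend on existing files.
start_dir=$(pwd)
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch e.awk                 # one awk script, no file beginning with "aa"

# "set --" stores the expanded words as positional parameters, i.e. the
# same words a command such as nawk would receive as arguments.
set -- e*.awk aa*
first_word=$1               # the matching pattern was expanded
second_word=$2              # the unmatched pattern is passed through literally
printf 'word 1: %s\nword 2: %s\n' "$first_word" "$second_word"

cd "$start_dir" && rm -rf "$scratch"
```

Run with sh or ksh, this prints e.awk for word 1 and the literal aa* for word 2.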

Conclusions

Wildcard handling differs considerably between shells (I have not tested others such as bash, zsh or tcsh), so wildcards should be used with caution.
In csh, a command like some_command -f a* -m b* c* can lead to quite different executions depending on which files exist.
If there is no file beginning with a while other patterns do match, a* is dropped and -f is no longer followed by a filename (the next word is -m), so the command will most likely fail.
If there are 2 files beginning with b (say b1 and b2) then b1 will be passed as the parameter to -m and b2 will become part of the file list.
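
The shift in argument positions is easy to reproduce. Here is a sketch in sh with made-up files a1, b1, b2 and c1 (the file names, the scratch directory and the variable arg_list are assumptions for the illustration); since every pattern matches, the resulting word list does not depend on how the shell treats unmatched patterns.

```shell
#!/bin/sh
start_dir=$(pwd)
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch a1 b1 b2 c1           # hypothetical files; note the two b* matches

# Expand the words exactly as they would be passed to some_command.
set -- -f a* -m b* c*
arg_list="$*"               # all words joined by single spaces
printf '%s\n' "$arg_list"   # prints: -f a1 -m b1 b2 c1

cd "$start_dir" && rm -rf "$scratch"
```

Here -m receives b1 only, and b2 ends up next to c1 in the trailing file list.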

People who use wildcards frequently should be aware of this possible trap and treat them more cautiously (advice to myself :-) )
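
One way to be more cautious in sh or ksh scripts is to check that a pattern really expanded before running the command. This is only a sketch: pattern_matched is a hypothetical helper of mine, and it relies on the sh/ksh behaviour described above, where an unmatched pattern arrives as the literal pattern text.

```shell
#!/bin/sh
# pattern_matched is a hypothetical helper. Call it with an UNQUOTED pattern:
# the shell expands the pattern first, so $1 is either the first matching
# name or, when nothing matched, the literal pattern (which does not exist
# as a file unless a file really carries that odd name).
pattern_matched() {
    [ -e "$1" ] || [ -L "$1" ]
}

# Only go ahead when both patterns expanded to existing files.
if pattern_matched e*.awk && pattern_matched aa*; then
    echo "both patterns matched"
else
    echo "at least one pattern did not match; not running the command" >&2
fi
```

The check only tells you that a pattern matched at least one file; it does not protect against a pattern matching more files than intended.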
