Thursday, April 4, 2013

A general approach to command line switches and their default values in Perl

In the UNIX world you'll rarely find a program which doesn't support a few or many arguments (or command line parameters) which influence the execution of the program.

When a Perl program requires arguments (in one of the simplest cases, an input filename), one can inspect the @ARGV array, an approach that works well in easy cases, or turn to one of the Perl modules, in particular if the arguments are command line switches.
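For the simplest case, a single input filename, reading @ARGV directly is enough. Below is a minimal sketch. It assumes exactly one filename is expected and that a missing argument is an error. The helper name input_file is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first command line argument as the
# input filename, or die with a usage message if none was given.
sub input_file {
    my @args = @_;
    die "Usage: $0 FILE\n" unless @args;
    return $args[0];
}

# A real program would call input_file(@ARGV); a fixed list keeps the
# sketch self-contained.
my $file = input_file('data.txt');
print "reading from $file\n";
```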

In this article I will discuss a few types of command line switches and the possible logic behind them.

What is a command line switch?

Just to recap: a command line switch is traditionally denoted by a hyphen followed by a letter, optionally followed by a value, e.g. -d or -d 25. Note the space between the switch and its value. Some programs require this space, others require the value to be attached to the switch, like -d25, and still others allow both. Some programs allow switches to be bundled, like -ltr instead of -l -t -r. Others allow switches to be more than one letter long. Some programs allow a switch to appear multiple times, like -v in awk.
Further complexities exist: one switch might override others, and some switches are mutually exclusive.
All these cases need to be handled properly.
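Getopt::Std, the core module used later in this article, already handles several of these variants. The sketch below wraps getopts in a hypothetical helper, parse, which runs it on an explicit argument list by localizing @ARGV so the behaviour can be checked.

It shows three things:

- A separated value (-d 25) and an attached one (-d25) give the same result.
- Bundled boolean switches (-ltr) are split into -l -t -r.
- A value switch given twice keeps only its last value. Awk-style repeated -v assignments would therefore need different handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: run getopts() on a given argument list instead of
# the real command line. getopts() reads and shifts the global @ARGV,
# so @ARGV is localized here.
sub parse {
    my ($spec, @args) = @_;
    local @ARGV = @args;
    my %opts;
    getopts($spec, \%opts) or die "invalid option in: @args\n";
    return \%opts;
}

# -d with a separate value and with an attached value
print "d=$_->{d}\n" for parse('d:', '-d', '25'), parse('d:', '-d25');

# Bundled boolean switches: -ltr is read as -l -t -r
my $bundled = parse('ltr', '-ltr');
print join(',', sort keys %$bundled), "\n";

# A repeated value switch: only the last value is kept
my $repeated = parse('v:', '-v', 'a=1', '-v', 'b=2');
print "v=$repeated->{v}\n";
```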

On top of that, (I think it was) the GNU world introduced double-hyphen switches, usually with whole words as names, e.g. --verbose.
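Getopt::Std does not handle such long switches, but Getopt::Long, another core module, does. Here is a minimal sketch for comparison. It uses GetOptionsFromArray so the argument list can be passed explicitly. The switch names --verbose and --outdir and the helper parse_long are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse GNU-style long switches from an explicit list.
# 'verbose' is a boolean switch; 'outdir=s' requires a string value.
sub parse_long {
    my @args = @_;
    my %opts;
    GetOptionsFromArray(\@args, \%opts, 'verbose', 'outdir=s')
        or die "Usage: $0 [--verbose] [--outdir DIR]\n";
    return \%opts;
}

my $opts = parse_long('--verbose', '--outdir', '/tmp');
print "output goes to $opts->{outdir}\n" if $opts->{verbose};
```

Both --outdir /tmp and --outdir=/tmp are accepted.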

In the remainder of this article I will only use the simple case of single-letter switches with or without an argument. I will be using Getopt::Std, one of the core Perl modules, and its function getopts. Its basic usage is getopts('ab:', \%opts); for two switches, -a and -b foo.
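Here is a minimal sketch that makes the getopts call concrete. It simulates a command line by setting @ARGV inside the program, which is a testing convenience rather than something a real program would do. It shows that getopts removes the switches it recognises and leaves the remaining arguments, such as file names, in @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Simulated command line: a boolean -a, -b with the value "foo", and a file name
local @ARGV = ('-a', '-b', 'foo', 'input.txt');

# 'ab:' declares a boolean switch -a and a switch -b that takes a value
# (the ':' after b). getopts() returns false if it finds an unknown switch.
my %opts;
getopts('ab:', \%opts) or die "Usage: $0 [-a] [-b value] [file ...]\n";

print "a=$opts{a} b=$opts{b} remaining=@ARGV\n";
```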

Various types of command line switches with or without default values

The typical distinction between command line switches is whether they are boolean (switched on or off) or carry an additional argument. Then there is the question: if a command line switch is absent, should a default value be used in the program?

The following table explains the differences and shows a few examples.

| Switch (example) | Purpose          | Default       | Notes |
|------------------|------------------|---------------|-------|
| -a               | ...              | false         | A boolean switch by its very nature has a default value, true or false, which should be the opposite of what the switch is intended to trigger. |
| -d $HOME/tmp     | output directory | /tmp          | Certain things in the program require a default value, e.g. the program needs to know where to store its output files. It is left to the programmer to decide which default values can be overridden by command line switches. |
| -u joe,sandy     | user list        | current user  | Some command line switches take more complex arguments, in this case a comma-separated list of users. Their absence should be covered by a reasonable default value, e.g. the current user. |
| -p 1507          | process id       | all processes | Some switches specify a setting that acts as a filter or restriction, but their absence does not imply a concrete default value; its meaning is vaguer (here: all processes). |
In the 'user list' example above, another reasonable default behaviour could have been 'all users' instead of 'current user'.
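Here is a small sketch of how a comma-separated argument like the one for -u could be turned into a list, with the current user as the fallback. The helper user_list is hypothetical. It also assumes the current user can be read from the USER environment variable, which may not be set on every system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the argument of -u ("joe,sandy") into user
# names; without -u, fall back to the current user taken from $ENV{USER}
# (an assumption: that variable may be unset).
sub user_list {
    my ($arg) = @_;
    return split /,/, $arg if defined $arg;
    return defined $ENV{USER} ? ($ENV{USER}) : ();
}

my @users = user_list('joe,sandy');
print scalar(@users), " users: @users\n";
```

An 'all users' default would only change the fallback line.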

Rather than defining a list of variables to hold the defaults, like

    $OUTDIR = "/tmp";
    $USERS = $ENV{'USER'};
    ...

and later somehow associating these variables with the switches, a (in my view) cleaner approach is to
  • define the defaults in a hash (the keys are the switches)
  • create a new hash (again with the switches as keys) and set each entry to either its default or the value supplied on the command line

    The following Perl program handles the cases above. Note that:

  • boolean switches and unspecified defaults are set to undef; all others are set to their reasonable default values
  • the ... ? ... : ... operator is used to set the actual variables

    (Getopt::Std sets a boolean switch to 1, which represents true. The opposite, and default, could be anything that evaluates to false in an if(...) clause; I chose undef rather than 0.)

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use Getopt::Std;        # to process command line arguments
    
    # Define the defaults in a hash (keys are the switch letters)
    my %defaults;
    $defaults{"a"}  = undef;          # boolean switch: off unless given
    $defaults{"d"}  = "/tmp";         # output directory
    $defaults{"u"}  = $ENV{'USER'};   # user list: the current user
    $defaults{"p"}  = undef;          # process id: unspecified means all processes
    
    # Retrieve the command line switches into a hash;
    # a ':' after a letter marks a switch that requires an argument
    my %opts;
    getopts('ad:u:p:',\%opts)
      or die "Usage: $0 [-a] [-d dir] [-u user,...] [-p pid]\n";
    
    # Put either the default values or the command line switch arguments into a hash
    my %vars;
    foreach my $key (keys %defaults) {
      $vars{$key}    = exists $opts{$key} ? $opts{$key} : $defaults{$key} ;
    }
    
    # Test output: see what is contained in 'vars' (hash order is not fixed)
    foreach my $key (keys %vars) {
      print $key," ",(defined $vars{$key} ? $vars{$key} : ""),"\n";
    }
    print "\n";
    
    # Check decision tree for boolean and unspecified switches
    print "a is set\n" if( $vars{"a"} );
    print "p: all processes\n" unless( defined $vars{"p"} );
    

    If run without any command line switches:

    u andreas
    p 
    a 
    d /tmp
    
    p: all processes
    

    With -a and -u:

    ... -a -u joe,sandy
    
    u joe,sandy
    p 
    a 1
    d /tmp
    
    a is set
    p: all processes
    

    With -d and -p:

    ... -d $HOME/tmp -p 1507
    
    u andreas
    p 1507
    a 
    d /export/home/andreas/tmp
    

    With this general approach a single hash, %vars, contains all the information. Its values can be used directly later in the program (like the -d output directory) or can drive a decision based on whether a value is defined or undefined.
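The merge loop can also be written as a single list assignment, because in a hash built from a list, later key/value pairs win. Below is a minimal sketch, again with a simulated @ARGV. It is equivalent to the loop only because getopts can set no keys beyond the switches in its spec, and all of those already appear in %defaults.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

my %defaults = (a => undef, d => '/tmp', u => $ENV{USER}, p => undef);

local @ARGV = ('-a', '-d', '/var/tmp');   # simulated command line
my %opts;
getopts('ad:u:p:', \%opts)
    or die "Usage: $0 [-a] [-d dir] [-u user,...] [-p pid]\n";

# Later pairs override earlier ones, so every switch given on the
# command line replaces the corresponding default.
my %vars = (%defaults, %opts);

# The same defined/undefined decisions work on the merged hash.
print "a is set\n" if $vars{a};
print "p: all processes\n" unless defined $vars{p};
print "d: $vars{d}\n";
```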

    Of course there are more issues, like the ones mentioned above (e.g. conflicting switches) or the validity of values (e.g. does the output directory exist and is it writable?), but they need to be resolved elsewhere in the code.
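Here is one possible shape for such a check. It is a sketch that runs after %vars has been filled, and the helper validate is hypothetical. It tests whether the output directory exists and is writable, which are the examples from the text. It also adds an assumed rule that -p, if given, must be a number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical post-parsing check on the merged %vars hash.
# Returns a list of error messages; an empty list means the values are usable.
sub validate {
    my (%vars) = @_;
    my @errors;
    push @errors, "output directory $vars{d} does not exist"
        unless -d $vars{d};
    push @errors, "output directory $vars{d} is not writable"
        if -d $vars{d} && !-w $vars{d};
    # Assumed rule: a process id must consist of digits only
    push @errors, "process id '$vars{p}' is not a number"
        if defined $vars{p} && $vars{p} !~ /\A\d+\z/;
    return @errors;
}

my @problems = validate(d => '/nonexistent/dir', p => 'abc');
print "$_\n" for @problems;
```

Returning a list of messages rather than dying on the first problem lets the program report every invalid switch at once.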
