Saturday, November 24, 2012

Reusable code in shell scripts or How to create a shell library

When you work a lot with shell scripts (your own or others'), you reach the point where certain pieces of code are repeated many times, and you start to wonder whether and how you could build and use a library of reusable code.
A seasoned programmer may eventually end up with a library of standard functions or, better, a set of libraries for various shells (sh, bash, ksh, etc.) and various operating systems (Solaris vs. Linux being the major distinction, though the various releases of each major OS also differ).

For a particular project written in sh/ksh on Solaris I built a library and below I'll explain a few of the considerations.

What is a shell library

I have not seen the phrase 'shell library' anywhere else; to me it is a collection of environment variables and functions kept in one file. Using a shell library means sourcing that file and thereby setting up the environment of the executing shell script.

Why is a shell library useful

Many shell scripts contain settings of environment variables and definitions of functions at the beginning of the script. When working in projects with multiple scripts where the same or similar settings are being used it does seem to make sense to put these settings into a single place. An important advantage of such an approach: if the setting needs to be changed later it needs to be changed in only one place. This concept is obvious for programmers but I have seen it rarely used in shell scripts.
When I see pieces of code like HOST=`hostname` and many more of such statements repeated in dozens of scripts (all of them part of a big project) it is time to start using a library.

What goes in

That is probably the simplest question: I'm almost tempted to say any piece of code that is used twice or more should be in the library.

Changes over time

One of the big questions is how to handle a library over time.
New things need to be added i.e. the library grows.
Maybe one has ideas to improve the current code and thus code changes.
Will the changed library still work in all older invocations (backward compatibility)?
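One possible way to make backward compatibility checkable is to give the library a version number that calling scripts can test. The following is only a sketch: the names MYLIB_VERSION and mylib_require are hypothetical and not part of the library described in this post.

```shell
#!/bin/sh
# --- this part would live in the library file ---
MYLIB_VERSION=3   # hypothetical: bump whenever behavior changes

# mylib_require MIN: succeed if the loaded library is at least version MIN.
mylib_require() {
  [ "$MYLIB_VERSION" -ge "$1" ]
}

# --- this part would live in a calling script ---
if mylib_require 2; then
  require_result="ok"
else
  require_result="too old"
fi
echo "need >= 2: $require_result"

if mylib_require 5; then
  newer_result="ok"
else
  newer_result="too old"
fi
echo "need >= 5: $newer_result"
```

An old script that asks for version 2 keeps working as long as later versions stay compatible, while a script that depends on a newer feature can refuse to run against an outdated copy of the library.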

How to invoke the library?

Use the dot operator "." as in
. lib/mylib.sh
assuming that your library sits in a file called mylib.sh in a subdirectory lib of the current working directory.
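To make this concrete, here is a small self-contained sketch. It assumes a POSIX shell and the mktemp command; the library contents (LIB_GREETING, greet) are made up for the demo. It writes a tiny library into a scratch directory, sources it, and then calls a function defined there. Once the file has been sourced, its functions behave like any other command in the calling script.

```shell
#!/bin/sh
# Create a tiny library in a scratch directory (illustrative paths and names).
demo_dir=$(mktemp -d) || exit 1
mkdir -p "$demo_dir/lib"

cat > "$demo_dir/lib/mylib.sh" <<'EOF'
# mylib.sh: hypothetical library contents
LIB_GREETING="hello from mylib"
greet() {
  echo "$LIB_GREETING, $1"
}
EOF

# The dot operator reads the file into the *current* shell, so the
# variable and the function defined there are available afterwards.
. "$demo_dir/lib/mylib.sh"

greeting_output=$(greet world)
echo "$greeting_output"

rm -rf "$demo_dir"
```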

Location

In order to invoke the library the calling script needs to locate it. Where should the library reside?
Assuming that it is part of a project (and thus of a collection of scripts that are deployed together), you need to define a directory for it. There is no established standard for shell script libraries, so you might as well call it lib, following the convention of other languages.
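A common way to find the lib directory is to derive it from the location of the calling script rather than from the current working directory. The sketch below assumes a project layout with bin and lib side by side and a POSIX shell with mktemp; the file names and the PROJECT_LIB override variable are hypothetical.

```shell
#!/bin/sh
# Build a throwaway project tree: project/bin/task.sh and project/lib/mylib.sh
demo_dir=$(mktemp -d) || exit 1
mkdir -p "$demo_dir/project/lib" "$demo_dir/project/bin"

cat > "$demo_dir/project/lib/mylib.sh" <<'EOF'
lib_loaded=yes
EOF

cat > "$demo_dir/project/bin/task.sh" <<'EOF'
#!/bin/sh
# Resolve the directory this script lives in, then go one level up
# to the project root. This does not depend on the caller's cwd.
script_dir=$(cd "$(dirname "$0")" && pwd) || exit 1
project_root=$(dirname "$script_dir")

# PROJECT_LIB is a hypothetical override for unusual deployments.
lib_dir=${PROJECT_LIB:-$project_root/lib}

# Check first: a failing "." can terminate a non-interactive shell.
[ -r "$lib_dir/mylib.sh" ] || { echo "cannot read $lib_dir/mylib.sh" >&2; exit 1; }
. "$lib_dir/mylib.sh"
echo "lib_loaded=$lib_loaded"
EOF

# Run the script from an unrelated directory to show the lookup still works.
location_output=$(cd / && sh "$demo_dir/project/bin/task.sh")
echo "$location_output"

rm -rf "$demo_dir"
```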

Some examples

Simplest case: setting a variable

HOST=`hostname` ; export HOST

So your scripts need to run the hostname command only once. Of course the underlying assumption here is that the hostname command can be found in the PATH of the user executing the script.
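If you want to be a little more defensive about that assumption, something along these lines may help. It is a sketch for a POSIX shell (it uses $( ) instead of backticks) that keeps a value that is already set and falls back to uname -n when hostname cannot be run.

```shell
#!/bin/sh
# Keep a value that is already set; otherwise look it up once.
if [ -z "${HOST:-}" ]; then
  # Fall back to uname -n if hostname is missing from PATH or fails.
  HOST=$(hostname 2>/dev/null) || HOST=$(uname -n)
fi
export HOST
echo "$HOST"
```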

Extract a variable

i.e. extract pieces of information out of a larger output.
Say you have the output of id and you want the username:

$ id
uid=712(joe) gid=100(other) groups=100(other),22(staff)

The following extracts the string inside the first pair of parentheses.

USER=`id |sed -e 's/).*//' -e 's/.*(//'` ; export USER
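To see what the two sed expressions do, the extraction can be wrapped in a small function and fed the sample line from above. The function name extract_user is my own choice for this sketch; the sed expressions are the ones from the library line.

```shell
#!/bin/sh
# extract_user: read one line of `id` output on stdin and print the
# name inside the first pair of parentheses.
extract_user() {
  # 1st expression: cut everything from the first ")" onwards -> uid=712(joe
  # 2nd expression: cut everything up to the remaining "("     -> joe
  sed -e 's/).*//' -e 's/.*(//'
}

sample='uid=712(joe) gid=100(other) groups=100(other),22(staff)'
extracted_user=$(printf '%s\n' "$sample" | extract_user)
echo "$extracted_user"
```

In the library itself the function would be fed by the real command, as in `id | extract_user`.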

Setting a variable for if clauses

The control flow in scripts very often depends on whether a variable has a certain value or not. You can introduce a (boolean) variable to subsume this logic.

Imagine that you want to test whether the script is executed by root or not. One could use the USER variable and (always) test like this

if [ "$USER" = "root" ] ; then ... ; fi

An alternative could be this setting in your library which creates a new variable isRootUser

isRootUser=`ID=\`id | sed -e 's/uid=//' -e 's/(.*//'\`; [ $ID -eq 0 ] && echo $ID` ; export isRootUser

This piece of code looks complex at first glance, but it simply

  • runs the id command, extracts the uid and stores it in the variable ID
  • checks whether ID is zero (this also covers the case of a second superuser account with uid 0) and, if so, sets the variable isRootUser to ID; otherwise isRootUser stays empty

The variable can then be used as follows (test succeeds for a non-empty value and fails when isRootUser is empty):

if test $isRootUser ; then ... ; fi

Advantages of this approach:

  • the root check is encapsulated in the setting of isRootUser (if you decide to use a different method to identify the root user you can change it here and change it only once in the library)
  • it runs only once at the invocation of the library (not possibly multiple times in your script)
  • thereafter a very simple check using a variable with a telling name can be used as many times as needed
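The same logic can be spelled out in two steps, which makes it easier to try against sample input. This is a sketch for a POSIX shell; the helper name root_flag_from is hypothetical, and the sed expressions are slightly stricter than the ones above (they keep only the leading digits of the uid).

```shell
#!/bin/sh
# root_flag_from LINE: given one line of `id` output, print the uid if it
# is 0, otherwise print nothing. Note: uid is a global variable here.
root_flag_from() {
  uid=$(printf '%s\n' "$1" | sed -e 's/^uid=//' -e 's/[^0-9].*//')
  [ "$uid" -eq 0 ] 2>/dev/null && echo "$uid"
  return 0
}

flag_for_root=$(root_flag_from 'uid=0(root) gid=0(root)')
flag_for_joe=$(root_flag_from 'uid=712(joe) gid=100(other) groups=100(other),22(staff)')

# A quoted -n test makes the "non-empty means root" convention explicit.
for flag in "$flag_for_root" "$flag_for_joe"; do
  if [ -n "$flag" ]; then echo "root"; else echo "not root"; fi
done
```

In a library the flag would be set from the real command, for example with `isRootUser=$(root_flag_from "$(id)")`.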

Common functions

    Maybe this is the more interesting piece and related to other programming languages: defining a set of reusable functions. Due to the nature of shells you have to watch out for the scope and use of variables (local / global / input / return).
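The scoping point deserves a small illustration. The sketch below assumes a POSIX shell and uses made-up function names. It shows that an assignment inside an ordinary function changes the caller's variable, that a function whose body is a subshell keeps its assignments to itself, and that data is usually "returned" by printing it. The local keyword is not part of POSIX sh, which is why it is avoided here.

```shell
#!/bin/sh
# 1. Variables assigned inside an ordinary function are global.
counter=1
leaky() {
  counter=99          # overwrites the caller's variable
}
leaky
after_leaky=$counter

# 2. A function whose body is a subshell, ( ... ) instead of { ... },
#    works on a copy of the environment; its assignments do not escape.
counter=1
contained() (
  counter=99
)
contained
after_contained=$counter

# 3. "Return" data via stdout and capture it; the exit status is best
#    reserved for success or failure.
double() {
  echo $(( $1 * 2 ))
}
doubled=$(double 21)

echo "$after_leaky $after_contained $doubled"
```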

    A simple function to print an error message and stop the script:

    die() {
      echo "Error: $*" >&2
      exit 1
    }

    # Usage: 
    #     die Some condition has not been met
    # or: die "Some condition has not been met"
    # or: die "Some condition" has "not been" met
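A typical pattern is `some_check || die message`. The following sketch repeats the die definition so that it runs on its own, and uses a hypothetical check_config function with a path that is assumed not to exist. Because die calls exit, the demo runs the check inside a command substitution (a subshell) so that it can look at the result afterwards.

```shell
#!/bin/sh
die() {
  echo "Error: $*" >&2
  exit 1
}

# Hypothetical check that is expected to fail in this demo.
check_config() {
  [ -r "/nonexistent/app.conf" ] || die "cannot read" "/nonexistent/app.conf"
  echo "not reached"
}

# Command substitution runs in a subshell, so die ends only that
# subshell and the demo can inspect what happened.
if die_output=$(check_config 2>&1); then
  die_status=0
else
  die_status=$?
fi
echo "status=$die_status output=$die_output"
```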

    A wrapper to mkdir including nicer error handling:

    mk_dir() {
      [ -z "${1}" ] && return 1
      [ -d "${1}" ] || mkdir -p "${1}" 2>/dev/null || { echo "Error: cannot mkdir $1" >&2; return 1; }
      return 0
    }

    # Usage: 
    #     mk_dir DIRECTORY
    #        if you are not interested if successful or not
    # or: mk_dir DIRECTORY || return 1
    #        if you want to stop further execution of a function after failure
    # or: mk_dir DIRECTORY || exit 1
    #        if you want to stop the script after failure
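The three possible outcomes can be tried in a scratch directory. This sketch assumes a POSIX shell with mktemp; the copy of mk_dir below writes its message to stderr, and the variable names are only for the demo.

```shell
#!/bin/sh
mk_dir() {
  [ -z "${1}" ] && return 1
  [ -d "${1}" ] || mkdir -p "${1}" 2>/dev/null || { echo "Error: cannot mkdir $1" >&2; return 1; }
  return 0
}

scratch=$(mktemp -d) || exit 1

# Case 1: a nested directory that does not exist yet.
if mk_dir "$scratch/a/b"; then result_new=created; else result_new=failed; fi

# Case 2: the parent is a regular file, so mkdir cannot succeed.
: > "$scratch/file"
if mk_dir "$scratch/file/sub" 2>/dev/null; then result_blocked=created; else result_blocked=failed; fi

# Case 3: an empty argument is rejected without calling mkdir.
if mk_dir ""; then result_empty=created; else result_empty=failed; fi

echo "$result_new $result_blocked $result_empty"
rm -rf "$scratch"
```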

    Check whether you are dealing with a number (a non-negative integer) by invoking another shell (in this case: ksh):

    isNum() {
      ksh -c "[[ \"$1\" = +([0-9]) ]]"
      return $?
    }
    # Usage:
    #     isNum $N && echo "yes"
    #        do something if ok
    # or: isNum $N || echo "no"
    #        do something if not ok
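If ksh is not available, or you would rather not pass the argument through a second shell's command line, the same check can be done with a case statement in plain POSIX sh. This is an alternative sketch, not the function from the project; the name is_num is my own.

```shell
#!/bin/sh
# is_num: succeed if $1 is a non-empty string of decimal digits.
is_num() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *)           return 0 ;;
  esac
}

for candidate in 0 42 007 -5 4.2 abc ''; do
  if is_num "$candidate"; then
    echo "'$candidate' is a number"
  else
    echo "'$candidate' is not a number"
  fi
done
```

No external process is started, so this version is also cheaper when called in a loop.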

    Have fun building your own libraries.

    Comments:

    1. Hey Andreas,

      Nice post. Could you give an example of how this library could be called from a test script?
      Also, how one would call one of these common functions from outside the library they were defined at?
