Friday, January 28, 2011

Shell library (part 1)

When you write a lot of shell scripts, you often end up duplicating code: you need the same kind of functionality again and either rewrite it or copy it from a previous script.

So of course, by analogy with programming languages like C or Java, it would be nice to capture all the good functionality in a kind of shell library. It would then be available to whatever you are doing, with the added advantage that any improvement to the code benefits all of your scripts.

The shell has no notion of a library; the closest one can get is a script which is sourced in by other scripts.

What is a shell library?

I define it as a file which consists of a set of variables and functions which can be sourced in by another script.
A library written for one type of shell very likely does not work with another.
It's not just the difference between the Bourne shell and C-shell families: even within the same family (sh, ksh, bash) you cannot simply re-use functions. They are implemented differently and don't follow the same rules (global/local variables in functions, syntax), so you can get unexpected results.

So rule #1: stay with one type of shell.

Since I have to deal with a lot of legacy code, the remainder of this text refers to the Bourne shell (and unfortunately I cannot use any of the more modern features like FPATH, autoload etc.).
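
To make the definition concrete, here is a minimal sketch of what such a library file could look like in Bourne shell. The names (MYFUNCS_VERSION, myfuncs_warn, myfuncs_is_number) are made up for illustration. The common prefix is just one possible convention to lower the risk of clashing with names in the calling script, since Bourne shell functions have no local variables.

```shell
# myfuncs.sh - hypothetical example library; meant to be sourced, not executed

# A library-wide variable (name is an assumption for this example)
MYFUNCS_VERSION="0.1"

# Print a warning message to stderr
myfuncs_warn() {
  echo "Warning: $*" >&2
}

# Succeed (return 0) if the first argument is a non-empty string of digits
myfuncs_is_number() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A script that has sourced this file could then write something like: myfuncs_is_number "$1" || myfuncs_warn "not a number: $1"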

Rule #2: the library should not depend in any way on the directory where it is placed. It might be invoked from anywhere, which implies that e.g. using relative paths is definitely discouraged.

How do you invoke a library (find it and source it in)?

Assume your library sits in /opt/shell/lib/myfuncs.sh and your new script /opt/shell/mytest.sh should invoke it.
The options are that the script knows
  1. the full path of the library
  2. the relative path
  3. a way to determine the location of the library

Source lib with full path

#!/bin/sh
. /opt/shell/lib/myfuncs.sh
Using a full path is - in my mind - a very inflexible solution. There is a variant: put the library directory into an environment variable which is set before the script is run.
# Assuming csh is your working shell
setenv LIBPATH /opt/shell/lib

#!/bin/sh
. "$LIBPATH/myfuncs.sh"
Of course one should also check that the variable is set and that the lib file exists (see further down).
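
Such a check could be sketched as follows, wrapped in a function so that the calling script decides what to do on failure. The function name load_from_libpath is hypothetical; LIBPATH and myfuncs.sh are the names used above.

```shell
#!/bin/sh
# Hypothetical helper: source library $1 from $LIBPATH, but only after
# checking that the variable is set and the file is readable.
load_from_libpath() {
  if [ -z "$LIBPATH" ]; then
    echo "Error: LIBPATH is not set" >&2
    return 1
  fi
  if [ ! -r "$LIBPATH/$1" ]; then
    echo "Error: cannot read $LIBPATH/$1" >&2
    return 1
  fi
  . "$LIBPATH/$1"
}

# Typical use at the top of a script:
#   load_from_libpath myfuncs.sh || exit 1
```

Returning an error code rather than calling exit inside the function is a design choice: an interactive shell that sources this helper is not terminated by a missing library.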

Source lib with relative path

Using a relative path has the drawback that the script needs to be executed in a specific place, otherwise the relative path does not work.
#!/bin/sh
. lib/myfuncs.sh
This runs if executed from /opt/shell; it assumes that the library sits in a 'lib' directory underneath the directory where the script is executed.
If you are in, say, $HOME, then calling /opt/shell/mytest.sh will fail.

Of course you can work around this by cd-ing to the right place like this:
#!/bin/sh
# Quote $0 so that a script path containing white space still works
scriptdir=`dirname "$0"`
cd "$scriptdir" || exit 1
. lib/myfuncs.sh
which assumes that the library sits in a 'lib' directory underneath the directory where the script is placed (note the subtle difference to the above 'executed').
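
One side effect of the cd approach is that the rest of the script runs in a different working directory than the one the user started in, which matters if the script takes relative file names as arguments. A possible alternative is to compute the directory in a subshell and source the library with the resulting path. This is only a sketch: it assumes that $0 holds a usable path to the script (not guaranteed for every way of invoking it), and the helper name script_libdir and its variable sl_dir are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of the 'lib' directory
# located next to the script whose path is given as $1.
# The cd runs in a subshell, so the caller's working directory is untouched.
script_libdir() {
  sl_dir=`dirname "$1"`
  # Redirect cd's stdout in case CDPATH makes it print the directory
  ( cd "$sl_dir" >/dev/null && echo "`pwd`/lib" )
}

# Typical use at the top of a script:
#   libdir=`script_libdir "$0"` || exit 1
#   . "$libdir/myfuncs.sh"
```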

Determine the lib path

Well, the script needs to find the library somewhere, and if it does not want to search the whole directory tree it needs a starting point. That starting point could be a set of directories to be searched (analogous to the FPATH of ksh), either hardcoded or supplied via an environment variable. Again, these directories could be full or relative paths.

So I like to follow the convention of later shells and assume that FPATH is a list of directories where things can be picked up.
# Set FPATH to a production lib and a private lib
setenv FPATH /opt/shell/lib:$HOME/lib

#!/bin/sh
# Require that FPATH is set
[ -n "$FPATH" ] || { echo "Error: FPATH is not set" >&2 ; exit 1 ; }
# Replace colon by space in FPATH
dirpath=`echo "$FPATH" | tr : ' ' `
# Loop through 'dirpath' to find the library
for dir in $dirpath ; do
  # Check if the lib is a readable file in the given dir
  [ -r "${dir}/myfuncs.sh" ] && . "${dir}/myfuncs.sh" && break
done
FPATH can contain relative paths too, but I think that creates more problems: in a test environment the lib might be found via a relative path, while on production systems that path might not exist, etc.

Caveats:
  1. this code already uses two new variables, 'dirpath' and 'dir' (so the lib should not set these variables, otherwise there are side effects on the for loop)
  2. it uses the 'tr' tool (which needs to be available in PATH and not be aliased to something else)
  3. the code has grown beyond the normal 'source in' one-liner
  4. the code does not work if path names contain white space
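
Caveats 2 and 4 can be softened by letting the shell split FPATH on colons via IFS instead of calling 'tr'. The loop above also carries on silently if the library is not found in any directory, so the sketch below reports that case. It is only one possible variant: the function name load_lib and the ll_ prefixed variables are made up, it assumes IFS has its default value on entry, and directory names must not contain a colon or wildcard characters.

```shell
#!/bin/sh
# Hypothetical helper: source library $1 from the first FPATH directory
# that contains it as a readable file; return 1 if none does.
load_lib() {
  ll_name=$1
  ll_found=no
  ll_saved_ifs=$IFS
  IFS=:                      # split FPATH on colons only
  for ll_dir in $FPATH ; do
    IFS=$ll_saved_ifs        # restore before running anything else
    if [ -r "$ll_dir/$ll_name" ] ; then
      . "$ll_dir/$ll_name"
      ll_found=yes
      break
    fi
  done
  IFS=$ll_saved_ifs          # also restores when FPATH was empty
  if [ "$ll_found" != yes ] ; then
    echo "Error: $ll_name not found in FPATH" >&2
    return 1
  fi
}

# Typical use at the top of a script:
#   load_lib myfuncs.sh || exit 1
```

Caveats 1 and 3 remain: the helper still needs a few variables of its own, and it is longer than a one-liner, so it has to live somewhere the script can reach without the library.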

Conclusion

Aside from using the fully qualified path directly in the script, all solutions require a certain convention (the name of a global variable and how it is used) but allow a certain flexibility, so in the end
  1. the library script can reside anywhere
  2. the calling script can reside anywhere
  3. the calling script can be called from anywhere
    and all still works: the calling script can find the lib and can also execute the rest of its code.
    (For those who are confused: I'm in a certain directory calling a script in another directory (with relative or absolute path) which sources in a library in yet another place, and everything should work as well as in the very common case where everything is run and placed in the same directory.)
