Tuesday, February 1, 2011

Using 'expr' in scripts

As many scripters know (and probably hate), the Bourne shell has no built-in arithmetic, so one has to resort to expr for calculations. The best-known use is probably the loop increment:
i=0
while [ $i -lt 10 ] ; do
  ...
  i=`expr $i + 1`
done
expr has more operators than just basic arithmetic, though, and I don't see them used very often. I don't think I had ever used them myself, so after stumbling upon them by accident I played with them a little to get a better understanding. Here's the result.

The match operator (string comparison with regular expressions)


The operator to compare a string to a regular expression is the colon (:).
expr will return the number of bytes matched (the curious might look into the xpg4 version of expr which returns the number of characters matched).

Also important: the regular expression is always anchored at the beginning of the string, as if it started with ^.

A few examples.
f="/a/c"

# does $f match ^a ? No.
expr $f : a
0

# does $f match ^/a ? Yes.
expr $f : /a
2

# does $f contain an 'a' ? Yes.
expr $f : '.*a'
4

# does $f contain a 'b' ? No.   (the regexp must be enclosed in single quotes here)
expr $f : '.*b'
0

# does $f end with a 'c' ? Yes.
expr $f : '.*c$'
4

# does $f end with a 'b' ? No.
expr $f : '.*b$'
0
All of these examples can be used to make a decision by checking whether the result is zero (no match) or not.
x=`expr ... : ...`
if [ $x -eq 0 ] ; then
  : # no match
else
  : # match
fi
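
expr also reports the outcome through its exit status (0 if the result is neither empty nor 0, 1 if it is), so the check can be folded into the if itself. A minimal sketch, with matches being a made-up helper name:

```shell
# matches STRING REGEX  (hypothetical helper, not part of expr)
# Succeeds if REGEX matches STRING from its first character on.
matches() {
  # the printed match length is discarded, only the exit status is kept
  expr "$1" : "$2" >/dev/null
}

f="/a/c"
if matches "$f" '.*c$' ; then
  echo "$f ends with c"
else
  echo "$f does not end with c"
fi
```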

To make the code a little safer, one has to consider that the string to be matched might contain white space. With such a string the unquoted examples above fail, so the variable needs double quotes.
f="a b c"
expr $f : a
expr: syntax error
# since this translates to    expr a b c : a    which does not compute

# Double quotes around the string do help
expr "$f" : a
1
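
Quotes do not help if the string itself looks like something expr understands, e.g. a lone = or +, which some implementations may take for an operator. A widespread defensive idiom is to put the same literal character in front of both the string and the pattern. A small sketch, with X as an arbitrary choice:

```shell
f="="

# "X$f" can no longer be mistaken for an operator;
# the pattern gets the same X in front so it still lines up.
n=`expr "X$f" : 'X.*='`
echo "$n"     # the count includes the extra X
```

Here the result is 2 rather than 1, so subtract one if the real length matters.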

Even more useful is the extraction form \( ... \): instead of the number of bytes matched, expr prints the part of the string that matched between the parentheses (if successful) or an empty string.
f="/ab/cd/efg"

# Extract the filename 
expr "$f" : '.*/\(.*\)'
efg

# Extract the dirname
expr "$f" : '\(.*\)/.*'
/ab/cd

f="abcdefg"

expr "$f" : '.*/\(.*\)'
        <---- # Note: this is an empty string here !!!

expr "$f" : '\(.*\)/.*'
        <---- # Note: this is an empty string here !!!

# Why empty strings? because we were trying to match a slash which is not present in $f
# Why empty strings and not 0? because we requested a string between (...)

# Now what if we wanted to solve the following: 
# if $f contains a slash then the filename is everything after the slash
# if $f does not contain a slash it should be considered a filename
# Rather than using  if ... else ... fi we can use expr, read on.

The 'or' and 'and' operator


The or operator is | and the and operator is &. Both are special characters for the shell, so they always have to be escaped.
They can be used to combine two expressions.

or: the first expression is evaluated. If it is neither NULL (i.e. empty or non-existing) nor 0, it is the result. Otherwise the second expression is evaluated too and becomes the result, or 0 if it is NULL as well.
a=111
b=0
c=
e=555

# $d is deliberately never set (non-existing)
# In the 4 expressions below the second operand is always valid (neither empty nor 0)
expr "$a" \| "$e"
111

expr "$b" \| "$e"
555

expr "$c" \| "$e"
555

expr "$d" \| "$e"
555

# Here we compare a 0 value with an empty value
expr "$b" \| "$c"
0

# Here we compare an empty value to a non-existing one
#   The result of expr is also 0
expr "$c" \| "$d"
0 
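
A practical use of 'or' is a default value: take a setting if it is there, otherwise fall back to a constant. A minimal sketch; TIMEOUT and the value 30 are made up for illustration:

```shell
# TIMEOUT is a hypothetical setting, deliberately empty at first
TIMEOUT=
first=`expr "$TIMEOUT" \| 30`
echo "$first"     # empty setting, so the default 30 is used

TIMEOUT=5
second=`expr "$TIMEOUT" \| 30`
echo "$second"    # the setting wins: 5
```

Keep in mind that an explicit 0 is replaced by the default as well.
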
and: both expressions are evaluated. If either of them is NULL or 0, the result is 0. Otherwise the result is the first expression.
a=111
b=0
c=
e=555

# $d is deliberately never set (non-existing)
# In the 4 expressions below the second operand is always valid (neither empty nor 0)
expr "$a" \& "$e"
111

expr "$b" \& "$e"
0

expr "$c" \& "$e"
0

expr "$d" \& "$e"
0
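
'and' fits the case where one value should only count if another one is set. A small sketch with made-up names (level_if_enabled, level, enabled):

```shell
# level_if_enabled LEVEL ENABLED  (hypothetical helper)
# Prints LEVEL if ENABLED is neither empty nor 0, otherwise prints 0.
level_if_enabled() {
  expr "$1" \& "$2"
}

level=3
for enabled in 1 0 "" ; do
  # expr exits with status 1 when it prints 0; the '|| :' keeps a
  # script running under 'set -e' alive in that case
  result=`level_if_enabled "$level" "$enabled"` || :
  echo "enabled='$enabled' -> $result"
done
```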

Combining 'match' and 'and/or'


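One can use these in combination to solve the empty string issue above and assign a default value. It is important to remember the order of evaluation: the match binds more tightly than 'or', so it is evaluated first and its result becomes the left-hand side of the 'or'.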
f="abcdefg"

expr "$f" : '.*/\(.*\)' \| "$f"
abcdefg

# These are two operations, one 'match' and one 'or':   expr   ... : ... \| ...
# If the match operation fails   "$f" : '.*/\(.*\)'    
# then evaluate the right hand side of 'or' and make this the result of 'expr'.
#   So: no slash found in $f then return the original string.
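
Wrapped in a function, the combination gives a small basename-like helper. A sketch only; filename_of is a made-up name:

```shell
# filename_of PATH  (hypothetical helper)
# Prints what follows the last slash, or PATH itself if there is no
# slash or nothing follows it.
filename_of() {
  expr "$1" : '.*/\(.*\)' \| "$1"
}

filename_of "/ab/cd/efg"
filename_of "abcdefg"
```

Two corner cases to be aware of: a path ending in a slash comes back unchanged, and a file named 0 should do so too, because 'or' treats an extracted 0 like an empty string.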
