Friday, March 4, 2011

Signal handling in shell background processes

This article is about what I learned about signal handling while monitoring subprocesses.

Yesterday I was puzzled when a supposedly simple test program did not act as expected.
The shell script trap.sh below sets a trap to catch SIGINT (signal 2) and exits upon receiving it. It works as expected when run standalone, but it fails when invoked as a background process from another test script, trapWrapper.sh.




trap.sh:

#!/bin/sh
# Catch signal 2
trap "echo trapped 2;I=1" 2
# Wait until var 'I' is set to something
while : ; do
  sleep 1; [ -n "$I" ] && break
done
echo DONE

trapWrapper.sh:

#!/bin/sh
# Run trap.sh in the background
./trap.sh 2>&1  &
pid=$!
# Sleep 5 seconds
sleep 5
# ... and then kill the background process
kill -2 $pid
# Wait
wait $pid
# Exit with exit code of background process
exit $?

Running trapWrapper.sh waits forever and never ends. Killing it with Ctrl-C makes the wrapper go away, but the trap.sh process is left behind and has to be killed manually.
(The bigger idea behind all this is a monitoring script that starts a number of background processes and kills them after a certain timeout period has passed.)

So what's the difference when it runs in the background?

The sh man page has the answer:

man sh
...
  Signals
     The INTERRUPT and QUIT signals for an  invoked  command  are
     ignored if the command is followed by &. Otherwise, signals
     have the values inherited by the shell from its parent, with
     the  exception  of  signal 11 (but see also the trap command
     below).

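In other words, a command started with & has SIGINT (as well as SIGQUIT) set to ignored. A shell script cannot trap a signal that was already ignored when it started, so the trap in trap.sh never fires.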
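
To see this behaviour in isolation, here is a minimal sketch. It assumes it is run as a script (non-interactive, no job control), and the names child, fg_result and bg_result are made up for this illustration. The same small test body is run once in the foreground and once with &.

```shell
#!/bin/sh
# Hypothetical test body: install an INT trap, signal ourselves, then report.
child='trap "echo trapped 2" 2; kill -2 $$; echo "after kill"'

# Foreground: SIGINT normally has its default disposition, so the trap can be set.
fg_result=$(sh -c "$child")

# Background: started with & from a non-interactive shell, the command
# inherits SIGINT as ignored, and the inner shell cannot trap it.
bg_result=$(sh -c "$child" & wait)

echo "foreground: $fg_result"
echo "background: $bg_result"
```

In the background case only "after kill" should show up, which matches what trap.sh does when started from trapWrapper.sh.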

SIGTERM is not mentioned there, so the next idea is to extend trap.sh with a signal handler for SIGTERM and to change trapWrapper.sh so that it sends SIGTERM to the background process.

trap.sh:

#!/bin/sh
# Catch signal 2 (INT) and 15 (TERM)
trap "echo trapped 2;I=1" 2
trap "echo trapped 15;I=1" 15
# Wait until var 'I' is set to something
while : ; do
  sleep 1; [ -n "$I" ] && break
done
echo DONE

trapWrapper.sh:

#!/bin/sh
# Run trap.sh in the background
./trap.sh 2>&1  &
pid=$!
# Sleep 5 seconds
sleep 5
# ... and then kill the background process
kill -15 $pid
# Wait
wait $pid
# Exit with exit code of background process
exit $?

After 5 seconds this results in what we wanted:
trapped 15
DONE

Something to remember: killing with SIGINT (SIGQUIT would behave the same) does not work here because the signal is ignored, whereas SIGTERM works fine.
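
With SIGTERM as the signal to send, the monitoring idea mentioned at the start could look roughly like the sketch below. It is only an illustration: the worker function stands in for real background jobs, the names worker, id, timeout, pids and status are made up, and the timeout is shortened to 2 seconds.

```shell
#!/bin/sh
# Hypothetical worker: loops until it receives SIGTERM, then exits cleanly.
worker() {
  id=$1
  trap 'echo "worker $id: trapped 15"; exit 0' 15
  while : ; do
    sleep 1
  done
}

timeout=2   # seconds before the monitor gives up on the workers
pids=""

# Start a number of background processes and remember their PIDs
for n in 1 2 3; do
  worker "$n" &
  pids="$pids $!"
done

# Let them run, then send SIGTERM to all of them
sleep "$timeout"
kill -15 $pids

# Collect the exit codes; remember the last non-zero one
status=0
for pid in $pids; do
  wait "$pid" || status=$?
done
echo "monitor: all workers finished, status $status"
```

Each worker should print its "trapped 15" line shortly after the timeout, since the shell only runs the handler once the current sleep 1 has returned.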

So if you write a script which should act upon SIGINT or SIGQUIT, let it also act upon SIGTERM, just to be safe.
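
One way to do that without repeating the trap line for every signal is a small loop. This is a sketch only: the variable names script, sig and result are made up, and the test body sends SIGTERM to itself purely for demonstration, where a real script would receive it from a wrapper.

```shell
#!/bin/sh
# Hypothetical script body, run in its own sh so that $$ is that shell's PID.
script='
I=""
# Same handler for INT, QUIT and TERM; $sig is expanded when the trap is set
for sig in INT QUIT TERM; do
  trap "echo trapped $sig; I=1" "$sig"
done
# Demonstration only: signal ourselves
kill -TERM $$
[ -n "$I" ] && echo DONE
'

result=$(sh -c "$script")
echo "$result"
```

If the script is started in the background, the INT and QUIT traps have no effect, but the TERM handler still gives the caller a way to stop it.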

Note: this might be different in other shells. When you test it interactively you'll see the difference:

sh:

$ trap.sh&
8379
$ ptree 8379
    8360  sh
      8379  /bin/sh ./trap.sh
        8385  sleep 1
$ kill -2 8379
$ ptree 8379
    8360  sh
      8379  /bin/sh ./trap.sh
        8785  sleep 1
$ kill 8379
$ trapped 15
DONE

The background process ignores the signal.

csh:

% trap.sh&
[1] 8116
% ptree 8116
    33465 -csh
      8116  /bin/sh trap.sh
        8207  sleep 1
% kill -2 8116
% trapped 2
DONE

The background process in csh accepts SIGINT and exits.

In case you've wondered about ptree: this was tested on a Solaris box.
