Monday, March 14, 2011

Background processes and file descriptors in shell scripts

Recently I stumbled upon an issue in a shell script that left me puzzled for a while.

Reduced to a simple example, it goes like this:
imagine you have two files, wrapper.sh and script.sh, where wrapper.sh is supposed to call script.sh in backticks:
wrapper.sh
#!/bin/sh
x=`./script.sh`    # run script.sh (from the current directory) and collect its output
echo x=$x
script.sh
#!/bin/sh
(sleep 60)&        # start a background process
echo pid=$!        # report the pid of the background process
exit 0

I expected wrapper.sh to print something like x=pid=12345 almost immediately after starting.

Instead, wrapper.sh waited until the background process had finished. That defeated the purpose of the script: in the original scenario, wrapper.sh was supposed to do some work in between and then manage the background process, for example by sending it signals.

Some experimenting with variations of the scripts and some reading finally revealed the cause of the issue.
  • Background (better: forked) processes inherit the file descriptors of their parent process,
    i.e. the 'sleep' background process has the same open fds as script.sh.
  • Running a command in backticks means collecting its stdout until end-of-file is reached on the pipe, which happens only when every process holding the write end has closed it,
    i.e. wrapper.sh waits until the stdout of script.sh is closed for good.
  • Since the 'sleep' background process holds the same stdout as script.sh, the fd is kept open even after script.sh has finished.
    It does not matter whether 'sleep' actually writes anything; the point is that if it did write something, it would write to the inherited open stdout.
    ('sleep' is just an example. In the real world it would very likely be another script with some complex tasks to fulfil.)
  • The solution is to close stdout of the background process:
    #!/bin/sh
    (exec >&-; sleep 60)&  # start a background process but close stdout first
    echo pid=$!            # report the pid of the background process
    exit 0
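The difference between the two variants can be measured directly. The following is only a sketch: the file names hang.sh and fixed.sh, the temporary directory and the 3-second sleep are my own choices for illustration, and the timing has one-second resolution.

```shell
#!/bin/sh
# Sketch: time the hanging and the fixed variant of script.sh.
# hang.sh, fixed.sh and the 3-second sleep are illustrative names/values only.
dir=$(mktemp -d)

cat > "$dir/hang.sh" <<'EOF'
#!/bin/sh
(sleep 3)&                 # inherits stdout, i.e. the pipe to the caller
echo pid=$!
EOF

cat > "$dir/fixed.sh" <<'EOF'
#!/bin/sh
(exec >&-; sleep 3)&       # closes stdout before sleep is started
echo pid=$!
EOF

start=$(date +%s)
x=$(sh "$dir/hang.sh")     # returns only after sleep has exited
hang_secs=$(( $(date +%s) - start ))

start=$(date +%s)
y=$(sh "$dir/fixed.sh")    # returns as soon as fixed.sh itself exits
fixed_secs=$(( $(date +%s) - start ))

echo "inherited stdout: $x after ${hang_secs}s"
echo "closed stdout:    $y after ${fixed_secs}s"
rm -rf "$dir"
```

The first call should take about as long as the sleep, the second should return right away.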
    

Two observations help explain the situation. First, the process table shows that wrapper.sh has a defunct sub process (the former script.sh) and that the 'sleep' process is a child of init (pid 1). Second, a slightly different sub process, (echo sub; sleep 60)&, leads to x=pid=12345 sub, showing that wrapper.sh gathered the output of script.sh plus the output of the sub process.
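The second observation is easy to reproduce in a single file. This is a minimal sketch, not the original scripts: script.sh is replaced by an inline sh -c, and the sub process prints after a short delay instead of before it, so that the order of the two lines does not depend on a race.

```shell
#!/bin/sh
# Sketch: output of the background process ends up in the captured value.
# The job prints only after a 1-second delay, so "pid=..." always comes first
# (with "echo sub; sleep 60" the two echo commands would race each other).
x=$(sh -c '(sleep 1; echo sub)& echo pid=$!')

# Unquoted on purpose, as in wrapper.sh: the newline between the
# two captured lines is turned into a space by word splitting.
echo x=$x
```

The assignment returns only after the sub process has exited, and x then contains both lines.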

I wonder how many people pay attention to this; it is an issue that is easily overlooked. In essence, background processes in scripts like script.sh are daemons, since script.sh gives up control of the sub process by simply exiting at some point. So who controls the sub processes, and in particular, where should they write their output? Rereading the essentials of a daemon process helps, and I will definitely pay more attention to this in the future.
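One possible answer to the question of where the output should go, borrowed from the usual daemon conventions, is to give the background job its own stdin, stdout and stderr before it does anything else. The sketch below assumes a log file is acceptable; the function name start_job and the temporary log path are hypothetical.

```shell
#!/bin/sh
# Sketch: start a background job that owns its standard streams.
# start_job and the log location are hypothetical names for illustration.
log=$(mktemp)

start_job() {
    (
        # Detach from the caller's streams: nothing is inherited from the pipe.
        exec </dev/null >>"$log" 2>&1
        echo "job started"
        sleep 2
        echo "job finished"
    ) &
    echo "pid=$!"
}

start=$(date +%s)
x=$(start_job)                       # returns immediately
elapsed=$(( $(date +%s) - start ))
pid=${x#pid=}

echo "x=$x after ${elapsed}s, log in $log"
# The caller can now manage the job, e.g. check that it is alive or signal it.
kill -0 "$pid" && echo "job $pid is still running"
```

Unlike closing stdout, redirecting to a file or /dev/null keeps a valid descriptor open, so a job that does write something does not fail with a write error.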

An experiment for the curious:
What happens if stdout is redirected to a file and multiple sub processes are started, each writing to stdout, i.e. the file? Would everything be written to the file? In which order?
#!/bin/sh
exec 1>/tmp/out
(for i in 1 2 3 4 5; do echo aaaaaaaa; sleep 1 ; done)&
(for i in 1 2 3 4 5; do echo bbbbbbbb; sleep 1 ; done)&
echo DONE
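To check the result without waiting and guessing, the experiment can be wrapped so that it tallies the lines afterwards. This is a sketch with my own changes: a temporary file instead of /tmp/out, three iterations instead of five, and an added wait so the file is complete before it is read. Because all writers inherit the same open file, they share one file offset and should not overwrite each other; the relative order of the a and b lines is not guaranteed.

```shell
#!/bin/sh
# Sketch: the experiment with a final tally.
# Changes from the original: temp file, 3 iterations, and a "wait".
out=$(mktemp)

(
    exec 1>"$out"          # redirect stdout only inside this subshell
    (for i in 1 2 3; do echo aaaaaaaa; sleep 1; done)&
    (for i in 1 2 3; do echo bbbbbbbb; sleep 1; done)&
    echo DONE
    wait                   # let both writers finish before the file is read
)

a_count=$(grep -c '^aaaaaaaa$' "$out")
b_count=$(grep -c '^bbbbbbbb$' "$out")
done_count=$(grep -c '^DONE$' "$out")

echo "a lines: $a_count, b lines: $b_count, DONE lines: $done_count"
echo "file contents in the order they were written:"
cat "$out"
```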

3 comments:

  1. Interesting issue! Thanks for your experiments and explanation :)

  2. Came here from https://stackoverflow.com/questions/16493302/shell-hangs-when-assigning-command-result-to-a-variable , helped a lot, thanks!
