Say you run this:
ls / -R | head -2
The simple command line above lists (ls) the entire directory structure recursively, starting from the root. Its output is piped to head, which exits after it has printed 2 lines.
It's clear that head will finish long before ls does. This raises some interesting questions:
- Will ls keep running in the background after head exits?
- Will the entire command keep running until ls is done, or will you get the prompt back as soon as head exits while ls keeps running to completion in the background?
- Will head run only after ls has finished producing its output, thus requiring ls to run to completion?
I recently ran into this same set of questions because of assumptions made in an install script that was being written for a new reporting tool we were building. So what is the deal here? Think for a minute before you read the answers.
I knew that a pipe works by redirecting the output (stdout) of one process to the input (stdin) of the next process in line. But this did not quite answer all the questions, so googling how pipes work turned up some more details, which I describe below.
Poetry in design
- The shell sets up the pipe. The running processes use stdin/stdout as they normally would, without knowing that it is a pipe rather than the terminal.
- The kernel manages a buffer between the two ends of the pipe. The writing process blocks once this buffer is full and continues once the reading process reads data out of it.
- Because the shell forks and execs each process in the pipeline separately, they all run concurrently. Which one runs first cannot be predicted and is governed only by CPU scheduling.
- All the processes sit at the same level of the hierarchy: each one has the shell as its parent.
- In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes.
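The concurrency and buffering points above can be checked from the shell. Below is a minimal bash sketch, assuming Linux with GNU coreutils (`timeout`, `cat`, `wc`); the variable names are illustrative. Two 2-second sleeps joined by a pipe should finish in about 2 seconds rather than 4, because the shell starts both at once. In the second part the reader waits before reading, so `cat` fills the buffer, blocks, and is then killed by `timeout`; when the reader finally reads, it finds only what fit in the buffer.

```bash
#!/usr/bin/env bash
# 1. Pipeline stages run concurrently: two 2 s sleeps take ~2 s, not 4 s.
start=$SECONDS
sleep 2 | sleep 2
elapsed=$(( SECONDS - start ))
echo "sleep 2 | sleep 2 took about ${elapsed}s"

# 2. The writer blocks once the pipe buffer is full.
# cat would copy /dev/zero forever, but the reader sleeps for 2 s first,
# so cat stalls as soon as the buffer is full; timeout kills it after 1 s.
# When the reader wakes up, it counts exactly what was left in the buffer.
captured=$(timeout 1 cat /dev/zero | { sleep 2; wc -c; })
echo "bytes buffered before the writer blocked: $captured"
```

On a typical x86-64 Linux system the second figure should come out as 65536, matching the capacity quoted above, though it depends on the kernel version and page size.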
But what happens when one of the piped processes exits?
Unfortunately, most of the documentation only explained how the pipe is set up by the shell and the kernel. What we needed was an explanation of what happens when one of the processes exits.
To find out, I ran a few tests. Here are my observations.
ls / -R | head -2 | head -20000 (exits immediately)
ls / -R | head -20000 | head -2 (exits immediately)
cat small-file.txt | ls / -R (runs for a long time)
Clearly, it is the processes reading the input that determine how the piped command line behaves: when a reader exits early, the whole command line exits with it. So what causes this behaviour?
Enter Unix Signals
Signals are an IPC mechanism, akin to software interrupts, that Unix-like operating systems use to notify running processes of interesting events. They are one of the few ways an outside event can change the behaviour of an otherwise normal process.
A process can receive a signal for I/O problems, the termination of a child process, or a plain Ctrl-C interrupt. For most signals the default action (which you can change by installing your own handler) is to terminate the process.
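The default action and a replaced handler can be compared from bash using a child `sh`, as in the sketch below (it assumes `sh` is a POSIX shell; the variable names are illustrative). A process killed by a signal is reported by the shell as exit status 128 plus the signal number, so SIGTERM (15) shows up as 143, and bash's `kill -l` maps such a status back to a signal name.

```bash
#!/usr/bin/env bash
# Default action: the child shell sends itself SIGTERM and is terminated,
# so "never printed" is never printed.
default_status=0
sh -c 'kill -TERM $$; echo "never printed"' || default_status=$?
echo "default action: exit status $default_status ($(kill -l "$default_status"))"  # 143 (TERM)

# Changed handler: the child installs a trap, so SIGTERM no longer kills it.
trapped_output=$(sh -c 'trap "echo caught SIGTERM" TERM; kill -TERM $$; echo "still running"')
echo "$trapped_output"
```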
With my attention now on signals, I ended up walking through Wikipedia's master list of Unix signals, hoping to find anything related to pipes. There I found an interesting entry …
The Wikipedia entry for SIGPIPE says: “When a pipe is broken, the process writing to it is sent the SIGPIPE signal. The default reaction to this signal for a process is to terminate”.
Conversely, if a process tries to read from a pipe whose write end has been closed, it receives an EOF once any buffered data has been read, which lets the reading program finish in an orderly fashion.
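Both halves of this can be observed in bash through the `PIPESTATUS` array, which holds the exit status of every process in the most recent pipeline. The sketch assumes `yes`, `head` and `wc` are available and that SIGPIPE has its default disposition (some environments start programs with it ignored, in which case `yes` exits with an error instead). A status of 141 is 128 + 13, and 13 is the number of SIGPIPE.

```bash
#!/usr/bin/env bash
# Reader exits first: head stops after 2 lines, so the next write by yes
# hits a pipe with no reader and yes is killed by SIGPIPE.
yes | head -n 2 > /dev/null
writer_status=${PIPESTATUS[0]}
echo "yes: exit status $writer_status"      # typically 141
echo "141 means: SIG$(kill -l 141)"         # SIGPIPE

# Writer exits first: printf finishes, wc reads the buffered data,
# then sees EOF and exits normally.
printf 'a\nb\n' | wc -l
reader_status=${PIPESTATUS[1]}
echo "wc: exit status $reader_status"       # 0
```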
Hmmm. Applying that knowledge to our pipe commands shows why the chain of processes exits: when a reading process exits, it breaks the pipe, and the process writing to it is sent SIGPIPE on its next write and terminates.
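The three-stage command from the experiments can be replayed the same way. In this bash sketch `seq 1000000` stands in for `ls / -R`, an assumption made only so the output is predictable and the run is short. Which upstream process dies first depends on scheduling and buffer size, so only the outer statuses are stable: `seq` has far more than a pipe buffer left to write and is killed by SIGPIPE, while the last `head` exits normally. The middle `head` may show either 141 or 0.

```bash
#!/usr/bin/env bash
# seq stands in for `ls / -R`: it has several megabytes to write.
# head -n 2 exits after two lines and breaks the pipe in front of it;
# each upstream writer then exits too, either killed by SIGPIPE on its
# next write or (for the middle head) after finishing its own 20000 lines.
seq 1000000 | head -n 20000 | head -n 2
statuses=("${PIPESTATUS[@]}")
echo "exit statuses (seq, head -n 20000, head -n 2): ${statuses[*]}"
```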
It would have been an extreme waste of resources had ls kept running after head exited in the command line ls / -R | head -2, and this shows how well the Unix system has been thought out.
Even more importantly, it is highly intuitive for ls / -R | head -2 to exit immediately after 2 lines have been output. This is one of the cool things about Unix: it behaves the way you expect it to, as if the commands understood the social niceties of the human psyche, or at least the hacker psyche.
It is also interesting to note the asymmetry: the reader receives a plain EOF, because a writer finishing first is normal, whereas the writer receives a signal if the reader has already exited, since that is an abnormal situation. Very beautiful indeed.
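The signal is only the default reaction, though. POSIX specifies that if the writing process ignores SIGPIPE, the write instead fails with the error EPIPE, and it is then up to the program to notice and stop. A sketch of that contrast in bash, assuming GNU coreutils `yes`, which reports the failed write on stderr and exits with status 1:

```bash
#!/usr/bin/env bash
# Ignore SIGPIPE in a subshell; yes inherits the ignored disposition.
# Its write now fails with EPIPE instead of killing it, and yes reports
# the error ("yes: standard output: Broken pipe") before exiting.
( trap '' PIPE; yes ) | head -n 2 > /dev/null
ignored_status=${PIPESTATUS[0]}
echo "yes with SIGPIPE ignored: exit status $ignored_status"   # an error status, not 141
```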
It is no wonder that such clever hacks and excellent design make Unix the hacker platform of choice.
Happy hacking!!!!
PS: All this would have been clear had I chosen to read the documentation of the pipe system call before beginning the investigation. It goes to show that we ignore existing documentation at our own peril.