

Say you do this

ls / -R | head -2

The simple command line above lists (ls) the entire directory structure, starting from the root, in a recursive fashion. Its output is piped to head, which will exit after it has printed 2 lines of output.

The problem

It's clear that head will complete much before ls will. This poses some interesting questions:

  1. Will ls run in the background after head exits?
  2. Will the entire command keep running until ls is done or will you get the prompt as soon as head exits but ls will keep running to completion in the background?
  3. Will head run only after ls has finished giving its output, thus requiring ls to run to completion?

I recently ran into this same set of questions about assumptions being made in an install script that was being written for a new reporting tool we were building. So what is the deal here? Think for a minute before you read the answers.

Pipe Basics

I knew that a pipe works by redirecting the output of one process (stdout) to the input (stdin) of the process that comes next in line. But this did not quite give all the answers. Googling for how pipes work turned up some more details, which I shall describe below.
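To make the redirection concrete, here is a minimal sketch (in Python, standing in for the shell's C internals, and simplified to a non-recursive ls) of how a shell could wire up something like ls / | head -2: one pipe(), two forks, and dup2() to splice the children's stdin and stdout.

```python
import os

# Sketch of shell pipe plumbing: NOT real shell source, just the same syscalls.
r, w = os.pipe()                     # one pipe shared by both children

if os.fork() == 0:                   # child 1: will become ls
    os.dup2(w, 1)                    # its stdout now writes into the pipe
    os.close(r); os.close(w)         # close the raw descriptors after dup2
    os.execvp("ls", ["ls", "/"])
    os._exit(127)                    # only reached if exec fails

if os.fork() == 0:                   # child 2: will become head
    os.dup2(r, 0)                    # its stdin now reads from the pipe
    os.close(r); os.close(w)
    os.execvp("head", ["head", "-2"])
    os._exit(127)

os.close(r); os.close(w)             # parent must close both ends, or head
                                     # would never see EOF on its stdin
_, status1 = os.wait()               # reap both children
_, status2 = os.wait()
```

Neither ls nor head is modified in any way; each just uses fd 0/1 as usual, which is the "poetry" described below.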

The details

shell pipe setup

Poetry in design

  1. The shell sets up the pipe. The running processes use stdin/stdout just as they normally would, without knowing it's a pipe rather than the terminal input/output.
  2. A buffer, managed by the kernel, exists between the pipe descriptors. The writing process will block once this buffer is full and can continue once the reading process has read off the buffer.
  3. The fork-and-exec mechanism means that it is impossible to say which process runs first; that is governed only by CPU scheduling.
  4. All the processes are in the same hierarchical level and have the shell as their parent.
  5. In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes.
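Point 5 is easy to check empirically. The sketch below fills a pipe using non-blocking writes until the kernel refuses more, which reveals the buffer capacity. (The 65536 figure is the Linux default, not a guaranteed constant: on modern kernels the capacity is tunable per pipe.)

```python
import os, fcntl

r, w = os.pipe()
fcntl.fcntl(w, fcntl.F_SETFL, os.O_NONBLOCK)   # make writes non-blocking

total = 0
try:
    while True:
        total += os.write(w, b"x" * 1024)      # fill until the kernel says stop
except BlockingIOError:
    pass                                       # buffer full: a normal (blocking)
                                               # writer would have slept here

print("pipe capacity:", total, "bytes")
os.close(r); os.close(w)
```

On a stock Linux box this prints 65536; the BlockingIOError marks exactly the point where point 2's "writer blocks" behaviour would kick in.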

But what happens when one of the piped process exits?

Unfortunately, most of the documentation only explained how the pipe was set up by the shell and kernel. But what we required was an explanation of what happens when one of the processes exits.

To find out, I ran a few tests. Here are my observations.

ls / -R | head -2 | head -20000 (exits immediately)

ls / -R | head -20000 | head -2 (exits immediately)

cat small-file.txt | ls / -R (runs for a long time)

Obviously, it is the process that reads input which makes the difference in how the piped command line behaves. So what causes this behaviour?

Enter Unix Signals

Signals are an IPC mechanism, akin to software interrupts, used by Unix-like operating systems to notify running processes of interesting events. This is possibly the only way an outside event can modify the behaviour of an otherwise normal process.

A process could receive a signal for I/O issues, child-process termination, or a plain Ctrl-C interrupt. Most of the default handlers (which you can change if required) cause the process to exit.
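As a quick illustration of changing a default handler, this sketch replaces the default (terminate) action for a signal with a custom function and then signals itself. The choice of SIGUSR1 is mine, purely for demonstration.

```python
import os, signal

received = []

def handler(signum, frame):                # our replacement for the default
    received.append(signum)                # (terminate) action

signal.signal(signal.SIGUSR1, handler)     # install the custom handler
os.kill(os.getpid(), signal.SIGUSR1)       # deliver the signal to ourselves

print("caught:", received)                 # the process survived and recorded it
```

Without the signal.signal() line, the os.kill() would simply have terminated the process.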

My attention having turned to signals, I ended up walking through the Wikipedia master list of UNIX signals, hoping to find anything related to pipes. This caused me to find an interesting entry…

SIGPIPE

The Wikipedia entry for SIGPIPE says: “When a pipe is broken, the process writing to it is sent the SIGPIPE signal. The default reaction to this signal for a process is to terminate.”
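The effect is easy to reproduce. In the sketch below, a forked writer keeps writing while the reader reads a little and then hangs up; with the default SIGPIPE action restored (the Python runtime normally ignores SIGPIPE and raises BrokenPipeError instead), the writer is terminated by the signal, exactly as the entry describes.

```python
import os, signal

r, w = os.pipe()
pid = os.fork()
if pid == 0:                                       # child: the writer
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # restore the default
                                                   # (terminate) action
    os.close(r)
    while True:
        os.write(w, b"spam\n")                     # blocks once the buffer is
                                                   # full; dies on broken pipe
else:                                              # parent: the reader
    os.close(w)
    os.read(r, 5)                                  # read a little...
    os.close(r)                                    # ...then hang up
    _, status = os.wait()
    print("writer killed by SIGPIPE:",
          os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGPIPE)
```

This is precisely what happens to ls when head -2 exits: its next write into the now-readerless pipe gets SIGPIPE and the default handler terminates it.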

If an attempt is made to read from a pipe whose write end has been closed, the reader receives an EOF, which causes the reading program to close in an orderly fashion.
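And the reader's side of the story, sketched the same way: once the write end is closed, a read first drains the pending bytes and then returns zero bytes, which is a plain EOF with no signal involved.

```python
import os

r, w = os.pipe()
os.write(w, b"data")
os.close(w)                    # writer is finished: close the write end

first = os.read(r, 100)        # pending bytes are still delivered
second = os.read(r, 100)       # then a zero-length read: plain EOF, no signal
os.close(r)

print(first, "then EOF:", second == b"")
```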

Hmmm – applying that knowledge to our pipe commands shows that if a listening process exits and thus breaks the pipe, the whole chain of processes will exit.

It would have been an extreme waste of resources had ls continued to run even after head exited in the command line ls / -R | head -2, and this shows how well the UNIX system has been thought out.

Even more importantly, it is highly intuitive for ls / -R | head -2 to exit immediately after 2 lines have been output. This is one of the cool things about Unix: it behaves in the manner you expect it to, as if the commands understand the social niceties of the human psyche – or at least the hacker psyche.

It is also highly interesting that the reader receives a plain EOF, because output finishing first is normal, while the writer receives a signal if the reader has already exited, since that is an abnormal situation. Very beautiful indeed.

It is no wonder that such clever hacks and excellence in design makes Unix the hacker platform of choice.

Happy hacking !!!!

PS: All this would have been clear had I chosen to read the documentation of the pipe system call before the investigation began. It goes to show how much we ignore existing documentation, at our own peril.

%Interrupt time is the time the processor spends servicing interrupts from the hardware, aka the time spent talking with the hardware installed in the system. This should usually be about 0–1% on a normal box and about 3–5% on a fairly busy box.

In our case we were seeing about 20% and more.

Implications of high %Interrupt Time

If %Interrupt time is high, it usually implies that some piece of hardware is really busy. More often than not this is caused by faulty hardware, and in very rare cases by software that is putting a lot of load on the hardware, such as a faulty device driver.

When the problem is caused by faulty hardware, the piece of hardware stops working properly and the OS needs to talk a lot to the device concerned. In fact, when this behavior is seen, nearly all the pages on the web point to some form of faulty hardware/driver (usually newly installed) that needs to be identified and removed for the system to start behaving properly.

Most other times disks are the cause; they are prone to a lot of hardware errors due to their moving parts or incorrect connections. When this happens, the system starts using an inefficient form of communication instead of Ultra DMA or DMA. More details can be found here.

Our problem specifics

But the issue I was trying to debug was on a system which behaved properly when the offending process (Cisco CUOM) was stopped. No new hardware had been installed, so driver issues were ruled out. I did some file copies and browsing to rule out hardware issues. The %Interrupt time did not peak in these tests, so we had to conclude there were no issues with the disk or network equipment.

The issue peaked only when our software was running. Was our software stressing the hardware in some unforeseen manner, or was the software just plain inefficient?

In either case, we had to look into what was executing inside the kernel to decide where the OS was spending its time. There are a couple of ways to do this (which I will be detailing in my next post) which should tell us the call stack (i.e., the sequence of function calls, traced through the modules it passes) and thence the code where the system spends most of its time.

Verify software configuration

Trying to look into the kernel innards is fine if you have the stomach for it. But prudence calls for verifying the software constitution of the system to get some primary clues before trying anything as involved as kernel debugging. My first task therefore was to check what was installed and running on the system. A simple glance at the task manager showed me processes named ~9.exe and svchosty.exe (the y has two dots over it).

Warning Bells

The immediate thing to check when you see junk process names is whether anti-virus software is present on the system. No surprises there – none was present. Downloading and installing Norton caused the Norton scan window to close/vanish as soon as it was opened. This made me suspect the worst, and true enough, a brief search on the internet confirmed that the behavior was caused by a class of virus called W32/Deborm.worm.gen (in McAfee's world) / W32.HLLW.Nebiwo (in Norton AV's world). It spreads using file shares and is one of the most common viruses out there. Any unprotected systems in big networks soon get infected with this worm.

Anti – Anti Virus

This virus will soon cause other vulnerabilities to be exploited on the machine, including backdoors and – check this out – ANTI-anti-virus products. :) This was why the Norton windows kept closing. The only cure then is to clean out the system manually, hand-killing all the processes that start with ~, plus “Explorer .exe” and “Winlogon .exe” (note the extra spaces in the names), and performing all the steps as recommended in this advisory.

  1. The Anti AntiVirus trojan was Trojan.KillAV
  2. One of the backdoor trojans was Backdoor.SdBot
  3. One of the backdoor trojans was Backdoor.Litmus

AfterMath

Folks assume that by rooting out the virus that caused the initial infection, the machine becomes ready for use. However, the Trojan's advisory on the Norton web site reads: “If the Trojan was run and a hacker executed files on the computer, it may be difficult to determine exactly what was done, even after the Trojan was removed.”

We are now in the process of changing the enterprise-wide credentials of everyone who had used the system, and also contemplating a possible re-installation of the entire system. Now you know how serious a virus infection can be 🙂 and how important an enterprise-wide anti-virus solution is to your organization.

Happy debugging !!!!