When threads or multiple processes are involved, the likelihood of running into the classic single-writer, multiple-reader problem is quite high, and solving it can get quite tricky.

Here is one such issue I ran into some time back, involving QoS communications between the network driver and multiple client applications.

The producer in this case was a server application that got its data from the NIC driver and then passed it on to the client applications listening for that data. The diagram below shows the sequence of steps involved. See if you can spot the multiple-consumer problem in it.

Figure: Multiple synchronized consumers

As you can see, data is placed in a well-known memory map by the Windows service that receives its data from the NIC driver callbacks. The service then raises an event that all the listeners are waiting upon, to tell everyone that there is data available to be read. Upon waking up, the clients read the data placed in the common map.
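To make that arrangement concrete, here is a minimal Win32 C++ sketch of the original design under my own assumptions; the names (kMapName, kEventName, PublishStats, ListenerLoop) are hypothetical placeholders, not the actual Proset code. One well-known mapping and one well-known event are shared by every client.

```cpp
#include <windows.h>
#include <cstring>

// Hypothetical well-known names shared by the service and every client.
static const wchar_t* kMapName   = L"Global\\QosSharedStats";
static const wchar_t* kEventName = L"Global\\QosDataAvailable";
static const DWORD    kMapSize   = 64 * 1024;

// Server side: called with fresh statistics from the NIC driver callback.
void PublishStats(const void* stats, size_t len)
{
    static HANDLE hMap = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                            PAGE_READWRITE, 0, kMapSize, kMapName);
    static HANDLE hEvt = CreateEventW(nullptr, TRUE /*manual reset*/, FALSE, kEventName);

    void* view = MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, kMapSize);
    std::memcpy(view, stats, len);   // overwrite the one common buffer
    UnmapViewOfFile(view);

    SetEvent(hEvt);                  // wake every listener at once
    // Who resets the event, and when it is safe to overwrite the buffer again,
    // is exactly the multiple-consumer problem discussed below.
}

// Client side: every listener waits on the same event and reads the same buffer.
void ListenerLoop()
{
    HANDLE hMap = OpenFileMappingW(FILE_MAP_READ, FALSE, kMapName);
    HANDLE hEvt = OpenEventW(SYNCHRONIZE, FALSE, kEventName);

    for (;;)
    {
        WaitForSingleObject(hEvt, INFINITE);
        const void* view = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, kMapSize);
        // ... consume the statistics from the shared view ...
        UnmapViewOfFile(view);
    }
}
```

Compact as it looks, every one of the questions below applies to it.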

Let the games begin

There are tricky edges in this design regarding how the consumers read the data, namely:

  1. What happens if one reader is slow to read the data? How does one know that all readers are done reading?
  2. When is it ok to refresh the data in the common memory?
  3. Do readers take semaphore locks to indicate that reading is in progress? What would then happen if one of the readers fails?
  4. Can timeouts be used as a valid guard against failed readers? How does one then tell slow systems apart from failed readers? (A small sketch after this list illustrates the problem.)
  5. Is sending a copy of the data (as in a WM_DATA message) to everyone efficient? It would not be if large amounts of data are involved.
  6. Can we use a callback arrangement like in WMI? What happens if the reader fails? We do not want the server process to be brought down by a misbehaving client.
  7. When data is passed frequently, how can it be delivered to the clients most efficiently in an inter-process scenario?
  8. Since the event and the memory map are common to all the clients, they have well-known names, which opens up security loopholes.
  9. It would be ideal for the server process not to wait for the readers to finish reading, because the server has other tasks to attend to and this single piece of functionality shouldn't become a bottleneck.
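Points 3 and 4 deserve a tiny illustration. The sketch below is mine, not the original code; WaitForReader and its parameters are hypothetical. A plain timeout cannot distinguish a slow reader from a dead one, but waiting on the reader's process handle as well can, because that handle becomes signaled when the reader exits.

```cpp
#include <windows.h>

// Hypothetical helper: wait for one reader to finish, guarding against
// readers that have died rather than merely being slow.
bool WaitForReader(HANDLE readerDoneEvent, HANDLE readerProcess, DWORD timeoutMs)
{
    HANDLE waitSet[2] = { readerDoneEvent, readerProcess };
    DWORD result = WaitForMultipleObjects(2, waitSet, FALSE /*any one*/, timeoutMs);

    switch (result)
    {
    case WAIT_OBJECT_0:     return true;   // reader signaled: it finished normally
    case WAIT_OBJECT_0 + 1: return false;  // reader process died: safe to move on
    case WAIT_TIMEOUT:                      // slow reader, or just a slow machine?
    default:                return false;  // ambiguous: exactly the problem in point 4
    }
}
```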

The server process handled the entire communication requirements for the Proset driver stack, and therefore blocking it was an entirely unacceptable solution in any situation.

So how was this issue solved? Here is what the new solution looked like.

Figure: Unsynchronized multiple readers in no-sharing mode

The issue was solved by changing our thinking to realize that there is no spoon. When we thought about the situation in depth, we realized that the data (most of which was statistics) is not actually bound to multiple clients. Rather, with a little work, the data could be identified as pertaining to a particular client and sent specifically to it. This was the breakthrough that led to the successful solution.

The Solution Architecture

  1. Have a dedicated thread at the server that feeds the clients – this way the main server process is never blocked.
  2. Each client has
    1. a thread on which it listens for messages from the server notifying that data is present
    2. A private memory map into which that client's data is copied by the server. Each of these can hold about 1 MB of data, and the design still supports 10-20 clients quite easily.
    3. A private event object known only to the client-side thread and the server thread.
  3. All of the above setup, including the client thread, is implemented by us, the creators of the server. It runs inside the client process when the client uses the API library provided for QoS.
  4. The server-side thread copies the data into the private map and then waits on the client's event and the client thread handle. If the client thread goes down, the wait is released and the server moves on (see the sketch after this list).
  5. No synchronization is required between the different readers and no remote synchronization issues need to be kept track of.
  6. Memory maps are among the fastest forms of IPC on Windows, so this way of sending the data is about as optimized as it can be.
  7. If having a single thread at the server to service the clients proves to be a bottleneck, the design can be changed to use n server threads instead.
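A minimal Win32 C++ sketch of the server-side feeder thread, under my own assumptions, is shown below; ClientChannel, GetStatsForClient, and the separate dataReady/doneReading events are hypothetical names introduced for illustration, not the actual Proset implementation.

```cpp
#include <windows.h>
#include <cstring>
#include <vector>

// Hypothetical per-client channel: a private mapping, private events, and a
// handle to the client's listener thread.
struct ClientChannel
{
    HANDLE map;          // private file mapping, ~1 MB, for this client only
    HANDLE dataReady;    // signaled by the server when fresh data has been copied in
    HANDLE doneReading;  // signaled by the client when it has consumed the data
    HANDLE clientThread; // handle to the client's listener thread
    SIZE_T mapSize;
};

// Hypothetical stand-in for the server logic that attributes driver statistics
// to a particular client.
static void GetStatsForClient(const ClientChannel&, const void** stats, size_t* len)
{
    static const char dummy[] = "per-client stats payload";
    *stats = dummy;
    *len = sizeof(dummy);
}

// Dedicated feeder thread at the server: only this thread ever waits on the
// readers, so the main server process is never blocked.
DWORD WINAPI FeederThread(LPVOID param)
{
    auto* clients = static_cast<std::vector<ClientChannel>*>(param);

    for (;;)
    {
        for (ClientChannel& c : *clients)
        {
            const void* stats = nullptr;
            size_t len = 0;
            GetStatsForClient(c, &stats, &len);

            void* view = MapViewOfFile(c.map, FILE_MAP_WRITE, 0, 0, c.mapSize);
            std::memcpy(view, stats, len);   // private copy: nothing is shared
            UnmapViewOfFile(view);

            SetEvent(c.dataReady);

            // Wait on the client's "done" event AND its thread handle. If the
            // client thread dies, its handle becomes signaled and the server
            // is released instead of hanging on a failed reader.
            HANDLE waitSet[2] = { c.doneReading, c.clientThread };
            WaitForMultipleObjects(2, waitSet, FALSE /*either one*/, INFINITE);
        }
    }
}
```

The client side mirrors this: its private thread waits on dataReady, reads its own view of the map, and signals doneReading. Because nothing is shared across clients, no reader ever has to coordinate with another.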

The result was a solution that NEVER HUNG and NEVER BLOCKED, and that was the most important consideration given the importance of the network infrastructure it was a part of.

The multi-threaded solution was fast enough for the needs of the clients and, though a bit heavy-handed, optimal for a client-side solution where there would be at most 4-5 VoIP clients at any one time.

In retrospect this was so much better that it was decided to rewrite the existing version 1.0 of the End to End (code name E2E) QoS feature in the Centrino Proset stack in terms of the newer E2E 2.0 implementation. So if you are running a 3945ABG card or later, the chances are high that you have this feature running on your machine.

Lesson – In war, avoid what is strong and attack what is weak

So without much ado, the moral of the story is that whenever threading is involved, try not to share any data. Implementing proper synchronization without loopholes is hard. Not stepping into the synchronization mess at all is the most maintainable and highest-performance solution possible.

PS: This piece of thinking earned the author an internal Intel award.
