Wednesday, February 23, 2011

Single Threaded -- Not As Error Free As Your Mom Told You

There's good reason to avoid multithreaded programming. Nobody wants to deal with race conditions, shared memory, locking, and deadlocks. This is especially true in a language like C++, where there are no guarantees about how locks behave, what kind of thread message passing scheme might be available, or what atomicity guarantees exist on the target platform. It gets even more fun when the multithreaded code also has to be platform-independent.


BUT

There comes a time when forcing everything into a single thread for the sake of single-threadedness becomes an unwieldy disaster.

Consider an application that must perform multiple, concurrent lines of processing. Let's say this application must communicate with other pieces of software, carrying out multi-step tasks to accomplish these lines of processing. While working through one of these multi-step tasks, we can't stop the rest of the application from executing. We can't stall every other line of processing while waiting for one line to complete--doing so would starve the rest of the system, and the lines would no longer be concurrent.

An example of such an application might be the backend to multiple automated teller machines (ATMs). The software must communicate with ATMs worldwide. Waiting for an ATM to respond in South Africa can't halt processing for every other ATM in the world. The backend software must be robust enough to keep multiple things going simultaneously.

With all this talk of "lines of processing" running "concurrently," it may sound obvious that each line should get its own thread. It's certainly an inviting solution. Here's what a "WithdrawFunds" routine might look like on the backend when run in its own thread:

// Run as a job in its own thread
void WithdrawFunds(unsigned int amount)
{
   // Confirm the amount is available in the account
   if (IsAmountAvailable(amount))
   {
      //Order the ATM to dispense funds, block
      //until this is completed
      DispenseFunds(amount); // BLOCKS

      DeductAmountFromAccount(amount);
   }
}

Here everything occurs step by step. If we need to wait for an ATM to respond, we simply block. The other threads in the system happily continue processing their own withdrawals, deposits, and so on. We will still need to deal with information shared between threads--when deducting from an account, for example. To do that, we can get messy with locking code, use some kind of inter-thread communication, or just trust a database to take care of things for us.
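If we do go the locking route, it might look something like the sketch below (using C++11's std::mutex for brevity; the Account type and the shared mutex are stand-ins, not part of any real backend). The important part is that the balance check and the deduction happen under the same lock, otherwise two threads could both see the money as available.

// A rough sketch of the locking option -- not a real backend.
// Account and accountMutex are hypothetical stand-ins.
#include <mutex>

struct Account
{
    unsigned int balance;
};

std::mutex accountMutex; // guards all access to the shared account data

bool TryDeductAmountFromAccount(Account& account, unsigned int amount)
{
    std::lock_guard<std::mutex> lock(accountMutex);
    if (account.balance < amount)
        return false; // some other thread may have drained it first
    account.balance -= amount;
    return true;
}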

There's another solution. "Lines of processing" need not mean splitting everything out into separate threads. We can implement our ATM backend with a single, never-blocking thread. The required ingredient is a thread message queue. Instead of blocking, the single thread processes each step asynchronously. Each ongoing line of processing--in our case, a conversation with an ATM--gets tracked as a separate transaction. Messages arriving on the thread's queue either start these transactions or push their state a step further.

Here's how the code would look:

// Based roughly on how MFC threading works
class TheAtmBackendThread
{
    // we need to track each ATM session now
    std::vector<AtmTransaction> atmTransactions;

    // we received a withdraw request event
    void OnWithdrawRequest(/*some kind of msg*/)
    {
        if (IsAmountAvailable(...))
        {
            // Create a record for this communication with the ATM
            atmTransactions.push_back(AtmTransaction());
            // Update the state of this transaction
            atmTransactions.back().UpdateState(WaitingOnFundsDispensed);
            PostMessageToDispenseFunds();
        }
    }

    void OnFundsDispensed(/*some kind of msg*/) 
    {
        // Now we have to figure out which ATM transaction this is for
        AtmTransaction& tran = LookupTransaction(...);
        // Is the transaction in the expected state?
        if (tran.IsInState(WaitingOnFundsDispensed))
        {
            DeductAmountFromAccount(...);
        }
        // destroy the transaction
    }
};

In this solution, our thread's message queue looks at the next message and calls the corresponding handler function for that message. To implement "WithdrawFunds" we need two events: one for withdraw requests, and another confirming that funds were dispensed. Since we might be servicing requests for multiple ATMs simultaneously, we need to store every ongoing ATM transaction. When processing the WithdrawRequest event we create a transaction, store it away, and send a message back to the ATM telling it to dispense funds. Later, when we get a "funds dispensed" event, we have to find the transaction that event belongs to.
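For the curious, the dispatch itself might look roughly like the sketch below. The Message struct, the message ids, and WaitForNextMessage() are all stand-ins for whatever your framework provides (MFC hides most of this behind its message maps).

// A rough sketch of the dispatch loop driving TheAtmBackendThread.
// Everything here is a stand-in for the framework's real machinery.
enum MessageType { MSG_WITHDRAW_REQUEST, MSG_FUNDS_DISPENSED };

struct Message
{
    MessageType type;
    // ...payload: ATM id, transaction id, amount...
};

void TheAtmBackendThread::Run()
{
    for (;;)
    {
        // Idle until the next event arrives from some ATM
        Message msg = WaitForNextMessage();
        switch (msg.type)
        {
        case MSG_WITHDRAW_REQUEST: OnWithdrawRequest(/*msg*/); break;
        case MSG_FUNDS_DISPENSED:  OnFundsDispensed(/*msg*/);  break;
        // ...one case per event the backend understands...
        }
    }
}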

This method can become unreadable fast. Compare this code with the "WithdrawFunds" function above. Because it simply blocks, "WithdrawFunds" can list everything that needs to be done, step by step. Because we're not blocking here, we have to track the state of each transaction ourselves. Instead of being scoped to a single function as in "WithdrawFunds", the transaction now belongs to the entire thread. The blocking "WithdrawFunds" function, by avoiding all that additional state tracking, is far easier to understand and debug. When debugging the single-threaded solution we never see the full story. What happened in the handler functions that processed earlier events in this transaction? We'll never know, because that function context is gone. We don't know what happened, how we got here, or where we're going next. We're a little lost and it's hard to find a map--it's hard to write a step-by-step account of how everything should happen just by looking at the code.

Moreover, by introducing extra state we are just asking for bugs. What happens at each step when something goes wrong? How do we handle a missing step in the conversation? In the "WithdrawFunds" function this can simply be a caught exception or an error return code. In the single-threaded code, we need some kind of "watchdog" event to jump in and figure out what has timed out. The increased scope of the transaction data also exposes us: more code can see it, so unexpected code may poke around in our state and cause surprising behavior. We end up with extra complexity and harder-to-debug code that carries its own potential for bugs.
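A watchdog handler might look something like this sketch, assuming the thread receives a periodic timer message and that AtmTransaction records when it started (GetStartTime() and the timeout constant are made up for illustration):

// A sketch of the "watchdog" event: sweep the outstanding transactions
// and kill anything that has waited too long for the ATM to respond.
#include <ctime>

const time_t kTransactionTimeoutSeconds = 30; // made-up number

void TheAtmBackendThread::OnWatchdogTimer()
{
    const time_t now = std::time(0);
    for (size_t i = atmTransactions.size(); i-- > 0; )
    {
        if (now - atmTransactions[i].GetStartTime() > kTransactionTimeoutSeconds)
        {
            // The ATM never answered -- abort, log, and make sure the
            // account was never debited for funds that weren't dispensed.
            atmTransactions.erase(atmTransactions.begin() + i);
        }
    }
}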

What's the answer? It's nice that we don't have to think about threading issues in the single-threaded solution; threading is something that can certainly get ugly fast. However, a careful system designer should be able to mitigate those problems. A database is often an appropriate choice: multiple threads that interact only through a database avoid most locking headaches by delegating that job to the database. Aside from databases, most platforms have ways of passing messages or communicating without the direct use of locks.
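For instance, "message passing without direct locks" usually boils down to something like the little queue below: the one lock lives inside the queue, and the threads that use it only ever call Push and Pop (a minimal sketch using C++11 primitives).

// A bare-bones thread-safe message queue. Worker threads never touch a
// lock directly; the synchronization is hidden inside Push and Pop.
#include <condition_variable>
#include <deque>
#include <mutex>

template <typename Msg>
class MessageQueue
{
public:
    void Push(const Msg& msg)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(msg);
        }
        ready_.notify_one();
    }

    // Blocks only when there is nothing at all to do
    Msg Pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        while (queue_.empty())
            ready_.wait(lock);
        Msg msg = queue_.front();
        queue_.pop_front();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Msg> queue_;
};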

The question comes down to: do you want newbs confused about your crazy message passing/state tracking scheme, or confused about how they should lock shared resources? Where do you want your newb-generated bugs :)? If you can structure your code so that they can create threads and rarely have to think about locking (i.e. by using a database), then don't restrict yourself to the clumsy single-threaded approach. If for some reason your system has poorly defined locking or thread communication semantics, then there may be reason to think more carefully about the trade-offs.