Part 9: A Second Interlude, Deferred
More Consequences of Callbacks
We’re going to pause for a moment to think about callbacks again. Although we now know enough about deferreds to write simple asynchronous programs in the Twisted style, the Deferred class provides more features that only come into play in more complex settings. So we’re going to think up some more complex settings and see what sort of challenges they pose when programming with callbacks. Then we’ll investigate how deferreds address those challenges.
To motivate our discussion we’re going to add a hypothetical feature to our poetry client. Suppose some hard-working Computer Science professor has invented a new poetry-related algorithm, the Byronification Engine. This nifty algorithm takes a single poem as input and produces a new poem like the original, but written in the style of Lord Byron. What’s more, our professor has kindly provided a reference implementation in Python, with this interface:
    class IByronificationEngine(Interface):

        def byronificate(poem):
            """
            Return a new poem like the original, but in the style of Lord Byron.

            Raises GibberishError if the input is not a genuine poem.
            """
Like most bleeding-edge software, the implementation has some bugs. This means that in addition to the documented exception, the byronificate method sometimes throws random exceptions when it hits a corner case the professor forgot to handle.
We’ll also assume the engine runs fast enough that we can just call it in the main thread without worrying about tying up the reactor. This is how we want our program to work:
- Try to download the poem.
- If the download fails, tell the user we couldn’t get the poem.
- If we do get the poem, transform it with the Byronification Engine.
- If the engine throws a GibberishError, tell the user we couldn’t get the poem.
- If the engine throws another exception, just keep the original poem.
- If we have a poem, print it out.
- End the program.
The idea here is that a GibberishError means we didn’t get an actual poem after all, so we’ll just tell the user the download failed. That’s not so useful for debugging, but our users just want to know whether we got a poem or not. On the other hand, if the engine fails for some other reason then we’ll use the poem we got from the server. After all, some poetry is better than none at all, even if it’s not in the trademark Byron style.
Here’s the synchronous version of our code:
    try:
        poem = get_poetry(host, port) # synchronous get_poetry
    except:
        print >>sys.stderr, 'The poem download failed.'
    else:
        try:
            poem = engine.byronificate(poem)
        except GibberishError:
            print >>sys.stderr, 'The poem download failed.'
        except:
            print poem # handle other exceptions by using the original poem
        else:
            print poem

    sys.exit()
This sketch of a program could be made simpler with some refactoring, but it illustrates the flow of logic pretty clearly. We want to update our most recent poetry client (which uses deferreds) to implement this same scheme. But we won’t do that until Part 10. For now, instead, let’s imagine how we might do this with client 3.1, our last client that didn’t use deferreds at all. Suppose we didn’t bother handling exceptions, but instead just changed the got_poem callback like this:
    def got_poem(poem):
        poems.append(byron_engine.byronificate(poem))
        poem_done()
What happens when the byronificate method raises a GibberishError or some other exception? Looking at Figure 11 from Part 6, we can see that:
- The exception will propagate to the poem_finished callback in the factory, the method that actually invokes the callback.
- Since poem_finished doesn’t catch the exception, it will proceed to poemReceived on the protocol.
- And then on to connectionLost, also on the protocol.
- And then up into the core of Twisted itself, finally ending up at the reactor.
As we have learned, the reactor will catch and log the exception instead of crashing. But what it certainly won’t do is tell the user we couldn’t download a poem. The reactor doesn’t know anything about poems or GibberishErrors; it’s a general-purpose piece of code used for all kinds of networking, even non-poetry-related networking.
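To make that concrete, here is a minimal sketch that uses a toy stand-in for the reactor. It is not Twisted's real implementation: toy_reactor_deliver, the byronificate function, and the poems and logged lists are all hypothetical names invented for illustration. The point is only that the general-purpose code can note the error but has no idea what it means for a poetry client.

```python
import traceback


class GibberishError(Exception):
    """Stand-in for the engine's documented exception."""


def byronificate(poem):
    # Hypothetical engine: rejects anything that is not a "genuine" poem.
    if not poem.strip():
        raise GibberishError('not a poem')
    return poem.upper()


poems = []
logged = []


def got_poem(poem):
    # No exception handling, like the modified client 3.1 callback.
    poems.append(byronificate(poem))


def toy_reactor_deliver(callback, data):
    # General-purpose code: it can record the error, but it knows nothing
    # about poems, so it cannot tell the user the download failed.
    try:
        callback(data)
    except Exception:
        logged.append(traceback.format_exc())


toy_reactor_deliver(got_poem, 'She walks in beauty')
toy_reactor_deliver(got_poem, '   ')
```

After both deliveries, poems holds one transformed poem and logged holds one traceback. Nothing in this sketch ever tells the user that the second poem was gibberish.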
Notice how, at each step in the list above, the exception moves to a more general-purpose piece of code than the one before. And at no step after got_poem is the exception in a piece of code that could be expected to handle an error in the specific way we want for this client. This situation is basically the exact opposite of the way exceptions propagate in synchronous code. Consider the call stack of a synchronous poetry client, in which main calls get_poetry, which in turn calls connect.
The main function is “high-context”, meaning it knows a lot about the whole program, why it exists, and how it’s supposed to behave overall. Typically, main would have access to the command-line options that indicate just how the user wants the program to work (and perhaps what to do if something goes wrong). It also has a very specific purpose: running the show for a command-line poetry client.
The connect method, on the other hand, is “low-context”. All it knows is that it’s supposed to connect to some network address. It doesn’t know what’s on the other end or why we need to connect right now. But connect is quite general-purpose: you can use it no matter what sort of service you are connecting to.
And get_poetry is in the middle. It knows it’s getting some poetry (and that’s the only thing it’s really good at), but not what should happen if it can’t.
So an exception thrown by connect will move up the stack, from low-context and general-purpose code to high-context and special-purpose code, until it reaches some code with enough context to know what to do when something goes wrong (or it hits the Python interpreter and the program crashes).
Of course the exception is really just moving up the stack no matter what rather than literally seeking out high-context code. It’s just that in a typical synchronous program “up the stack” and “towards higher-context” are the same direction.
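The synchronous direction of travel can be sketched in a few lines. This is only an illustration under assumptions: ConnectionFailed is a hypothetical exception, and these connect, get_poetry, and main functions are simplified stand-ins, not the ones from our clients.

```python
class ConnectionFailed(Exception):
    """Hypothetical error raised by the low-context connect function."""


def connect(host, port):
    # Low-context: only knows it should reach some network address.
    raise ConnectionFailed('cannot reach %s:%d' % (host, port))


def get_poetry(host, port):
    # Mid-context: knows it wants poetry, but not what a failure should
    # mean for the program, so it lets the exception pass through.
    connection = connect(host, port)
    return connection.read()


def main(host, port):
    # High-context: knows what the user should be told.
    try:
        poem = get_poetry(host, port)
    except ConnectionFailed:
        return 'The poem download failed.'
    return poem


message = main('localhost', 10000)
```

The exception raised in connect travels through get_poetry untouched and is handled in main, the one place that knows what the user wants to hear.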
Now recall our hypothetical modification to client 3.1 above. The call stack we analyzed is pictured in Figure 16, abbreviated to just a few functions:
The problem is now clear: during a callback, low-context code (the reactor) is calling higher-context code which may in turn call even higher-context code, and so on. So if an exception occurs and it isn’t handled immediately, close to the same stack frame where it occurred, it’s unlikely to be handled at all, because each time the exception moves up the stack it moves to a piece of lower-context code that’s even less likely to know what to do.
Once an exception crosses over into the Twisted core the game is up. The exception will not be handled, it will only be noted (when the reactor finally catches it). So when we are programming with “plain old” callbacks (without using deferreds), we must be careful to catch every exception before it gets back into Twisted proper, at least if we want to have any chance of handling errors according to our own rules. And that includes exceptions caused by our own bugs!
Since a bug can exist anywhere in our code, we would need to wrap every callback we write in an extra “outer layer” of try/except statements so the exceptions from our fumble-fingered typos can be handled as well. And the same goes for our errbacks because code to handle errors can have bugs too.
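One way to picture that outer layer is a decorator applied to every callback. This is a sketch only, assuming a hypothetical guarded helper that is not part of Twisted or of our clients:

```python
import functools


def guarded(on_error):
    """Wrap a callback so no exception escapes into the caller."""
    def decorator(callback):
        @functools.wraps(callback)
        def wrapper(*args, **kwargs):
            try:
                return callback(*args, **kwargs)
            except Exception as exc:
                # Handle the error by our own rules before the
                # framework ever sees it.
                return on_error(exc)
        return wrapper
    return decorator


errors = []


@guarded(errors.append)
def got_poem(poem):
    # Deliberate typo standing in for one of our own bugs.
    return poem.uper()


result = got_poem('some poem')
```

Every callback and every errback would need this treatment, and the error handler itself would still be unguarded.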
Well that’s not so nice.
The Fine Structure of Deferreds
It turns out the Deferred class helps us solve this problem. Whenever a deferred invokes a callback or errback, it catches any exception that might be raised. In other words, a deferred acts as the “outer layer” of try/except statements so we don’t need to write that layer after all, as long as we use deferreds. But what does a deferred do with an exception it catches? Simple: it passes the exception (in the form of a Failure) to the next errback in the chain.
So the first errback we add to a deferred is there to handle whatever error condition is signaled when the deferred’s .errback(err) method is called. But the second errback will handle any exception raised by either the first callback or the first errback, and so on down the line.
Recall Figure 12, a visual representation of a deferred with some callbacks and errbacks in the chain. Let’s call the first callback/errback pair stage 0, the next pair stage 1, and so on.
At a given stage N, if either the callback or the errback (whichever was executed) fails, then the errback in stage N+1 is called with the appropriate Failure object and the callback in stage N+1 is not called.
By passing exceptions raised by callbacks “down the chain”, a deferred moves exceptions in the direction of “higher context”. This also means that invoking the callback and errback methods of a deferred will never result in an exception for the caller (as long as you only fire the deferred once!), so lower-level code can safely fire a deferred without worrying about catching exceptions. Instead, higher-level code catches the exception by adding errbacks to the deferred (with addErrback, for example).
Now in synchronous code, an exception stops propagating as soon as it is caught. So how does an errback signal the fact that it “caught” the error? Also simple — by not raising an exception. And in that case, the execution switches over to the callback line. So at a given stage N, if either the callback or errback succeeds (i.e., doesn’t raise an exception) then the callback in stage N+1 is called with the return value from stage N, and the errback in stage N+1 is not called.
Let’s summarize what we know about the deferred firing pattern:
- A deferred contains a chain of ordered callback/errback pairs (stages). The pairs are in the order they were added to the deferred.
- Stage 0, the first callback/errback pair, is invoked when the deferred is fired. If the deferred is fired with the callback method, then the stage 0 callback is called. If the deferred is fired with the errback method, then the stage 0 errback is called.
- If stage N fails, then the stage N+1 errback is called with the exception (wrapped in a Failure) as the first argument.
- If stage N succeeds, then the stage N+1 callback is called with the stage N return value as the first argument.
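The rules in this list are compact enough to model in plain Python. The sketch below is a toy, not Twisted's implementation: the fire function, the stage functions, and the trace list are hypothetical, and a failure is represented by the raw exception plus a flag rather than by a real Failure object.

```python
def fire(stages, result, failed=False):
    """Run `result` through a list of (callback, errback) stages.

    Toy model: returns the final result, whether the last stage failed,
    and a trace of which member of each pair was called.
    """
    trace = []
    for n, (callback, errback) in enumerate(stages):
        # A failed previous stage selects the errback;
        # a successful one selects the callback.
        func = errback if failed else callback
        trace.append('stage %d %s' % (n, 'errback' if failed else 'callback'))
        try:
            result = func(result)
            failed = False
        except Exception as exc:
            result = exc
            failed = True
    return result, failed, trace


def passthru_err(exc):
    raise exc  # toy pass-through errback: the failure continues


def boom(value):
    raise ValueError('stage 1 failed')


stages = [
    (lambda v: v + 1, passthru_err),         # stage 0
    (boom, passthru_err),                    # stage 1
    (lambda v: v, lambda exc: 'recovered'),  # stage 2
    (lambda v: v.upper(), passthru_err),     # stage 3
]

result, failed, trace = fire(stages, 1)
```

Firing this chain with a normal result runs the stage 0 callback, the stage 1 callback (which fails), the stage 2 errback (which recovers), and finally the stage 3 callback. Exactly one member of each pair is called.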
This pattern is illustrated in Figure 17:
The green lines indicate what happens when a callback or errback succeeds and the red lines are for failures. The lines show both the flow of control and the flow of exceptions and return values down the chain. Figure 17 shows all possible paths a deferred might take, but only one path will be taken in any particular case. Figure 18 shows one possible path for a “firing”:
In Figure 18, the deferred’s callback method is called, which invokes the callback in stage 0. That callback succeeds, so control (and the return value from stage 0) passes to the stage 1 callback. But that callback fails (raises an exception), so control switches to the errback in stage 2. The errback “handles” the error (it doesn’t raise an exception) so control moves back to the callback chain and the callback in stage 3 is called with the result from the stage 2 errback.
Notice that any path you can make with Figure 17 will pass through every stage in the chain, but only one member of the callback/errback pair at any stage will be called.
In Figure 18, we’ve indicated that the stage 3 callback succeeds by drawing a green arrow out of it, but since there aren’t any more stages in that deferred, the result of stage 3 doesn’t really go anywhere. If the callback succeeds, that’s not really a problem, but what if it had failed? If the last stage in a deferred fails, then we say the failure is unhandled, since there is no errback to “catch” it.
In synchronous code an unhandled exception will crash the interpreter, and in plain-old-callbacks asynchronous code an unhandled exception is caught by the reactor and logged. What happens to unhandled exceptions in deferreds? Let’s try it out and see. Look at the sample code in twisted-deferred/defer-unhandled.py. That code is firing a deferred with a single callback that always raises an exception. Here’s the output of the program:
    Finished
    Unhandled error in Deferred:
    Traceback (most recent call last):
      ...
    --- <exception caught here> ---
      ...
    exceptions.Exception: oops
Some things to notice:
- The last print statement runs (“Finished” appears in the output), so the exception does not stop the program.
- That means the traceback is just being printed out; it’s not crashing the interpreter.
- The text of the traceback tells us where the deferred itself caught the exception.
- The “Unhandled” message gets printed out after “Finished”.
So when you use deferreds, unhandled exceptions in callbacks will still be noted, for debugging purposes, but as usual they won’t crash the program (in fact they won’t even make it to the reactor, because the deferred will catch them first). By the way, the reason that “Finished” comes first is that the “Unhandled” message isn’t actually printed until the deferred is garbage collected. We’ll see the reason for that in a future Part.
Now, in synchronous code we can “re-raise” an exception using the raise keyword without any arguments. Doing so raises the original exception we were handling and allows us to take some action on an error without completely handling it. It turns out we can do the same thing in an errback. A deferred will consider a callback/errback to have failed if:
- The callback/errback raises any kind of exception, or
- The callback/errback returns a Failure object.
Since an errback’s first argument is always a Failure, an errback can “re-raise” the exception by returning its first argument, after performing whatever action it wants to take.
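Side by side, the two forms of “re-raising” might look like the sketch below. The names noted, risky, sync_note_and_reraise, and note_and_reraise are hypothetical, and the errback is exercised here with a plain placeholder object rather than a real Failure, so this shows only the shape of the functions.

```python
noted = []


def risky():
    raise ValueError('oops')


def sync_note_and_reraise():
    try:
        risky()
    except ValueError as exc:
        noted.append('sync: %s' % exc)
        raise  # re-raise the exception we are currently handling


def note_and_reraise(failure):
    # Errback shape: `failure` is whatever the deferred passes in
    # (a Failure in Twisted). Returning it keeps the error alive
    # for the next errback in the chain.
    noted.append('errback: %s' % failure)
    return failure


try:
    sync_note_and_reraise()
except ValueError as exc:
    caught = exc

marker = object()  # placeholder for a Failure, for illustration only
returned = note_and_reraise(marker)
```

In both cases the function takes some action (here, recording a note) and then hands the same error onward instead of swallowing it.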
Callbacks and Errbacks, Two by Two
One thing that should be clear from the above discussion is that the order you add callbacks and errbacks to a deferred makes a big difference in how the deferred will fire. What should also be clear is that, in a deferred, callbacks and errbacks always occur in pairs. There are four methods on the Deferred class you can use to add pairs to the chain:
- addCallbacks
- addCallback
- addErrback
- addBoth
Obviously, the first and last methods add a pair to the chain. But the middle two methods also add a callback/errback pair. The addCallback method adds an explicit callback (the one you pass to the method) and an implicit “pass-through” errback. A pass-through function is a dummy function that just returns its first argument. Since the first argument to an errback is always a Failure, a pass-through errback will always “fail” and send its error to the next errback in the chain.
As you’ve no doubt guessed, the addErrback method adds an explicit errback and an implicit pass-through callback. And since the first argument to a callback is never a Failure, a pass-through callback sends its result to the next callback in the chain.
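A toy model can show how the implicit pass-through functions keep the pairs aligned. ToyDeferred and ToyFailure below are simplified stand-ins written for this illustration, not Twisted's classes. In particular, the toy callback method returns the final result so we can inspect it, and the add methods take no extra arguments.

```python
class ToyFailure(object):
    """Stand-in for Twisted's Failure class (illustration only)."""
    def __init__(self, exc):
        self.value = exc


def passthru(arg):
    return arg


class ToyDeferred(object):
    """Simplified model of a deferred's chain of pairs."""
    def __init__(self):
        self.stages = []

    def addCallbacks(self, callback, errback):
        self.stages.append((callback, errback))

    def addCallback(self, callback):
        self.addCallbacks(callback, passthru)  # implicit errback

    def addErrback(self, errback):
        self.addCallbacks(passthru, errback)   # implicit callback

    def addBoth(self, func):
        self.addCallbacks(func, func)

    def callback(self, result):
        for cb, eb in self.stages:
            # A ToyFailure result selects the errback of the next pair.
            func = eb if isinstance(result, ToyFailure) else cb
            try:
                result = func(result)
            except Exception as exc:
                result = ToyFailure(exc)
        return result


calls = []

def fails(value):
    calls.append('fails')
    raise RuntimeError('callback failed')

def skipped(value):
    calls.append('skipped')
    return value

def handles(failure):
    calls.append('handles')
    return 'handled'


d = ToyDeferred()
d.addCallback(fails)     # stage 0: (fails, passthru)
d.addCallback(skipped)   # stage 1: (skipped, passthru)
d.addErrback(handles)    # stage 2: (passthru, handles)
final = d.callback('poem')
```

The failure from stage 0 flows through the pass-through errback in stage 1, so skipped is never called, and the explicit errback in stage 2 finally handles it.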
The Deferred Simulator
It’s a good idea to become familiar with the way deferreds fire their callbacks and errbacks. The Python script in twisted-deferred/deferred-simulator.py is a “deferred simulator”, a little Python program that lets you explore how deferreds fire. When you run the script it will ask you to enter a list of callback and errback pairs, one per line. For each callback or errback, you specify that either:
- It returns a given value (succeeds), or
- It raises a given exception (fails), or
- It returns its argument (passthru).
After you’ve entered all the pairs you want to simulate, the script will print out, in high-resolution ASCII art, a diagram showing the contents of the chain and the firing patterns for the callback and errback methods. You will want to use a terminal window that is as wide as possible to see everything correctly. You can also use the --narrow option to print the diagrams one after the other, but it’s easier to see their relationships when you print them side-by-side.
Of course, in real code a callback isn’t going to return the same value every time, and a given function might sometimes succeed and other times fail. But the simulator can give you a picture of what will happen for a given combination of normal results and failures, in a given arrangement of callbacks and errbacks.
After thinking some more about callbacks, we realize that letting callback exceptions bubble up the stack isn’t going to work out so well, since callback programming inverts the usual relationship between low-context and high-context code. And the Deferred class tackles this problem by catching exceptions and sending them down the chain instead of up into the reactor.
We’ve also learned that ordinary results (return values) move down the chain as well. Combining both facts together results in a kind of criss-cross firing pattern as the deferred switches back and forth between the callback and errback lines, depending on the result of each stage.
Armed with this knowledge, in Part 10 we will update our poetry client with some poetry transformation logic.
- Inspect the implementation of each of the four methods on the Deferred class that add callbacks and errbacks. Verify that all methods add a callback/errback pair.
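- Use the deferred simulator to investigate the difference between adding a callback and an errback together with a single call to addCallbacks, and adding them separately with a call to addCallback followed by a call to addErrback. Recall that the last two methods add implicit pass-through functions as one member of the pair.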