A generally thoughtful critique of web and open culture. Lanier always has an interesting perspective on things. He’s at his best when articulating his alternative vision, rather than attacking the present, I would say.
Part 18: Deferreds En Masse
In the last Part we learned a new way of structuring sequential asynchronous callbacks using a generator. Thus, including deferreds, we now have two techniques for chaining asynchronous operations together.
Sometimes, though, we want to run a group of asynchronous operations in “parallel”. Since Twisted is single-threaded they won’t really run concurrently, but the point is we want to use asynchronous I/O to work on a group of tasks as fast as possible. Our poetry clients, for example, download poems from multiple servers at the same time, rather than one server after another. That was the whole point of using Twisted for getting poetry, after all.
And, as a result, all our poetry clients have had to solve this problem: how do you know when all the asynchronous operations you have started are done? So far we have solved this by collecting our results into a list (like the
results list in client 7.0) and checking the length of the list. We have to be careful to collect failures as well as successful results, otherwise a single failure will cause the program to run forever, thinking there’s still work left to do.
As you might expect, Twisted includes an abstraction you can use to solve this problem and we’re going to take a look at it today.
The DeferredList class allows us to treat a list of deferred objects as a single deferred. That way we can start a bunch of asynchronous operations and get notified only when all of them have finished (regardless of whether they succeeded or failed). Let’s look at some examples.
In deferred-list/deferred-list-1.py you will find this code:
    from twisted.internet import defer

    def got_results(res):
        print 'We got:', res

    print 'Empty List.'
    d = defer.DeferredList([])
    print 'Adding Callback.'
    d.addCallback(got_results)
And if you run it, you will get this output:
    Empty List.
    Adding Callback.
    We got: []
Some things to notice:
- A DeferredList is created from a Python list. In this case the list is empty, but we’ll soon see that the list elements must all be Deferred objects.
- A DeferredList is itself a deferred (it inherits from Deferred). That means you can add callbacks and errbacks to it just like you would a regular deferred.
- In the example above, our callback was fired as soon as we added it, so the DeferredList must have fired right away. We’ll discuss that more in a second.
- The result of the deferred list was itself a list (empty).
Now look at deferred-list/deferred-list-2.py:
    from twisted.internet import defer

    def got_results(res):
        print 'We got:', res

    print 'One Deferred.'
    d1 = defer.Deferred()
    d = defer.DeferredList([d1])
    print 'Adding Callback.'
    d.addCallback(got_results)
    print 'Firing d1.'
    d1.callback('d1 result')
Now we are creating our
DeferredList with a 1-element list containing a single deferred. Here’s the output we get:
    One Deferred.
    Adding Callback.
    Firing d1.
    We got: [(True, 'd1 result')]
More things to notice:
- This time the DeferredList didn’t fire its callback until we fired the deferred in the list.
- The result is still a list, but now it has one element.
- The element is a tuple whose second value is the result of the deferred in the list.
Let’s try putting two deferreds in the list (deferred-list/deferred-list-3.py):
    from twisted.internet import defer

    def got_results(res):
        print 'We got:', res

    print 'Two Deferreds.'
    d1 = defer.Deferred()
    d2 = defer.Deferred()
    d = defer.DeferredList([d1, d2])
    print 'Adding Callback.'
    d.addCallback(got_results)
    print 'Firing d1.'
    d1.callback('d1 result')
    print 'Firing d2.'
    d2.callback('d2 result')
And here’s the output:
    Two Deferreds.
    Adding Callback.
    Firing d1.
    Firing d2.
    We got: [(True, 'd1 result'), (True, 'd2 result')]
At this point it’s pretty clear the result of a
DeferredList, at least for the way we’ve been using it, is a list with the same number of elements as the list of deferreds we passed to the constructor. And the elements of that result list contain the results of the original deferreds, at least if the deferreds succeed. That means the
DeferredList itself doesn’t fire until all the deferreds in the original list have fired. And a
DeferredList created with an empty list fires right away since there aren’t any deferreds to wait for.
What about the order of the results in the final list? Consider deferred-list/deferred-list-4.py:
    from twisted.internet import defer

    def got_results(res):
        print 'We got:', res

    print 'Two Deferreds.'
    d1 = defer.Deferred()
    d2 = defer.Deferred()
    d = defer.DeferredList([d1, d2])
    print 'Adding Callback.'
    d.addCallback(got_results)
    print 'Firing d2.'
    d2.callback('d2 result')
    print 'Firing d1.'
    d1.callback('d1 result')
Now we are firing
d2 first and then
d1. Note the deferred list is still constructed with d1 and d2 in their original order. Here’s the output:
    Two Deferreds.
    Adding Callback.
    Firing d2.
    Firing d1.
    We got: [(True, 'd1 result'), (True, 'd2 result')]
The output list has the results in the same order as the original list of deferreds, not the order those deferreds happened to fire in. Which is very nice, because we can easily associate each individual result with the operation that generated it (for example, which poem came from which server).
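The behavior seen so far (fire only when everything has finished, report results by position) can be modeled without Twisted at all. The sketch below is a toy illustration, not Twisted's implementation: `ToyGather`, `record`, and `on_done` are hypothetical names, and it is written in Python 3 syntax rather than the Python 2 used in the listings above.

```python
class ToyGather(object):
    """Toy model only: tracks N outcomes by position and reports once all are in."""

    def __init__(self, count, on_done):
        self.slots = [None] * count   # one slot per watched operation
        self.finished = 0
        self.on_done = on_done
        if count == 0:
            on_done([])               # nothing to wait for, so report right away

    def record(self, index, success, value):
        # Store by position, not by arrival order.
        self.slots[index] = (success, value)
        self.finished += 1
        if self.finished == len(self.slots):
            self.on_done(list(self.slots))


fired = []
gather = ToyGather(2, fired.append)
gather.record(1, True, 'd2 result')   # the second operation finishes first
pending_after_one = list(fired)       # still empty: one slot is outstanding
gather.record(0, True, 'd1 result')
print(fired[0])
```

Because each outcome is written into its own slot, the final list follows the original positions even though the second operation reported first.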
Alright, what happens if one or more of the deferreds in the list fails? And what are those
True values doing there? Let’s try the example in deferred-list/deferred-list-5.py:
    from twisted.internet import defer

    def got_results(res):
        print 'We got:', res

    d1 = defer.Deferred()
    d2 = defer.Deferred()
    d = defer.DeferredList([d1, d2], consumeErrors=True)
    d.addCallback(got_results)
    print 'Firing d1.'
    d1.callback('d1 result')
    print 'Firing d2 with errback.'
    d2.errback(Exception('d2 failure'))
Now we are firing
d1 with a normal result and
d2 with an error. Ignore the consumeErrors option for now; we’ll get back to it. Here’s the output:
    Firing d1.
    Firing d2 with errback.
    We got: [(True, 'd1 result'), (False, <twisted.python.failure.Failure <type 'exceptions.Exception'>>)]
Now the tuple corresponding to
d2 has a
Failure in slot two, and
False in slot one. At this point it should be pretty clear how a
DeferredList works (but see the Discussion below):
- A DeferredList is constructed with a list of deferred objects.
- A DeferredList is itself a deferred whose result is a list of the same length as the list of deferreds.
- The DeferredList fires after all the deferreds in the original list have fired.
- Each element of the result list corresponds to the deferred in the same position as the original list. If that deferred succeeded, the element is (True, result) and if the deferred failed, the element is (False, failure).
- A DeferredList never fails, since the result of each individual deferred is collected into the list no matter what (but again, see the Discussion below).
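Since every element of the result is a (success, value) pair, a callback attached to the list usually starts by separating the two kinds of outcome. Here is a minimal sketch of that step in plain Python 3; `split_results` is a hypothetical helper name, and the sample data only imitates the shape of a real result (a real failed slot would hold a Twisted Failure, not a bare exception).

```python
def split_results(results):
    """Separate a list of (success, value) pairs into values and failures."""
    successes = [value for ok, value in results if ok]
    failures = [value for ok, value in results if not ok]
    return successes, failures


# Stand-in data shaped like the output shown above (assumption: plain
# exception objects are used here in place of Failure instances).
sample = [(True, 'd1 result'), (False, ValueError('d2 failure'))]
poems, errors = split_results(sample)
print(poems)
print(len(errors))
```

A function like this could be passed to addCallback on the list, since it takes the result list as its only argument.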
Now let’s talk about that
consumeErrors option we passed to the
DeferredList. If we run the same code but without passing the option (deferred-list/deferred-list-6.py), we get this output:
    Firing d1.
    Firing d2 with errback.
    We got: [(True, 'd1 result'), (False, <twisted.python.failure.Failure <type 'exceptions.Exception'>>)]
    Unhandled error in Deferred:
    Traceback (most recent call last):
    Failure: exceptions.Exception: d2 failure
If you recall, the “Unhandled error in Deferred” message is generated when a deferred is garbage collected and the last callback in that deferred failed. The message is telling us we haven’t caught all the potential asynchronous failures in our program. So where is it coming from in our example? It’s clearly not coming from the
DeferredList, since that succeeds. So it must be coming from d2. Remember that a DeferredList needs to know when each deferred it is monitoring fires. And the
DeferredList does that in the usual way — by adding a callback and errback to each deferred. And by default, the callback (and errback) return the original result (or failure) after putting it in the final list. And since returning the original failure from the errback triggers the next errback,
d2 remains in the failed state after it fires.
But if we pass
consumeErrors=True to the
DeferredList, the errback added by the
DeferredList to each deferred will instead return
None, thus “consuming” the error and eliminating the warning message. We could also handle the error by adding our own errback to
d2, as in deferred-list/deferred-list-7.py.
Version 8.0 of our Get Poetry Now! client uses a
DeferredList to find out when all the poetry has finished (or failed). You can find the new client in twisted-client-8/get-poetry.py. Once again the only change is in
poetry_main. Let’s look at the important changes:
    ...
    ds = []
    for (host, port) in addresses:
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        ds.append(d)

    dlist = defer.DeferredList(ds, consumeErrors=True)
    dlist.addCallback(lambda res : reactor.stop())
You may wish to compare it to the same section of client 7.0.
In client 8.0, we don’t need the
poem_done callback or the
results list. Instead, we put each deferred we get back from
get_transformed_poem into a list (
ds) and then create a
DeferredList. Since the
DeferredList won’t fire until all the poems have finished or failed, we just add a callback to the
DeferredList to shut down the reactor. In this case we aren’t using the result from the DeferredList; we just need to know when everything is finished. And that’s it!
We can visualize how a
DeferredList works in Figure 37:
Pretty simple, really. There are a couple options to
DeferredList we haven’t covered, and which change the behavior from what we have described above. We will leave them for you to explore in the Exercises below.
In the next Part we will cover one more feature of the
Deferred class, a feature recently introduced in Twisted 10.1.0.
- Read the source code for the DeferredList.
- Modify the examples in deferred-list to experiment with the optional constructor arguments fireOnOneCallback and fireOnOneErrback. Come up with scenarios where you would use one or the other (or both).
- Can you create a DeferredList using a list of DeferredLists? If so, what would the result look like?
- Modify client 8.0 so that it doesn’t print out anything until all the poems have finished downloading. This time you will use the result from the DeferredList.
- Define the semantics of a DeferredDict and then implement it.
Part 17: Just Another Way to Spell “Callback”
In this Part we’re going to return to the subject of callbacks. We’ll introduce another technique for writing callbacks in Twisted that uses generators. We’ll show how the technique works and contrast it with using “pure” Deferreds. Finally we’ll rewrite one of our poetry clients using this technique. But first let’s review how generators work so we can see why they are a candidate for creating callbacks.
A Brief Review of Generators
As you probably know, a Python generator is a “restartable function” that you create by using the
yield expression in the body of your function. By doing so, the function becomes a “generator function” that returns an iterator you can use to run the function in a series of steps. Each cycle of the iterator restarts the function, which proceeds to execute until it reaches the next yield.
Generators (and iterators) are often used to represent lazily-created sequences of values. Take a look at the example code in inline-callbacks/gen-1.py:
    def my_generator():
        print 'starting up'
        yield 1
        print "workin'"
        yield 2
        print "still workin'"
        yield 3
        print 'done'

    for n in my_generator():
        print n
Here we have a generator that creates the sequence 1, 2, 3. If you run the code, you will see the print statements in the generator interleaved with the print statement in the for loop as the loop cycles through the generator.
We can make this code more explicit by creating the generator ourselves (inline-callbacks/gen-2.py):
    def my_generator():
        print 'starting up'
        yield 1
        print "workin'"
        yield 2
        print "still workin'"
        yield 3
        print 'done'

    gen = my_generator()

    while True:
        try:
            n = gen.next()
        except StopIteration:
            break
        else:
            print n
Considered as a sequence, the generator is just an object for getting successive values. But we can also view things from the point of view of the generator itself:
- The generator function doesn’t start running until “called” by the loop (using the next method).
- Once the generator is running, it keeps running until it “returns” to the loop (using yield).
- When the loop is running other code (like the print statement), the generator is not running.
- When the generator is running, the loop is not running (it’s “blocked” waiting for the generator).
- Once a generator yields control to the loop, an arbitrary amount of time may pass (and an arbitrary amount of other code may execute) until the generator runs again.
This is very much like the way callbacks work in an asynchronous system. We can think of the
while loop as the reactor, and the generator as a series of callbacks separated by
yield statements, with the interesting fact that all the callbacks share the same local variable namespace, and the namespace persists from one callback to the next.
Furthermore, you can have multiple generators active at once (see the example in inline-callbacks/gen-3.py), with their “callbacks” interleaved with each other, just as you can have independent asynchronous tasks running in a system like Twisted.
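To make the interleaving concrete, here is a small sketch in Python 3 syntax. It is not the article's gen-3.py; `worker` and `round_robin` are hypothetical names, and the loop only plays the role of a reactor by giving each live generator one turn per pass.

```python
def worker(name, steps, log):
    """A generator whose 'callbacks' are the chunks of code between yields."""
    for i in range(steps):
        log.append('%s step %d' % (name, i))
        yield


def round_robin(generators):
    """Toy 'reactor': give each live generator one turn per pass."""
    active = list(generators)
    while active:
        for gen in list(active):
            try:
                next(gen)
            except StopIteration:
                active.remove(gen)   # this generator has run to the end


log = []
round_robin([worker('a', 2, log), worker('b', 3, log)])
print(log)
```

Each generator keeps its own local variables (here the loop counter) between turns, which is the shared, persistent namespace described above.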
Something is still missing, though. Callbacks aren’t just called by the reactor, they also receive information. When part of a deferred’s chain, a callback either receives a result, in the form of a single Python value, or an error, in the form of a Failure.
Starting with Python 2.5, generators were extended in a way that allows you to send information to a generator when you restart it, as illustrated in inline-callbacks/gen-4.py:
    class Malfunction(Exception):
        pass

    def my_generator():
        print 'starting up'

        val = yield 1
        print 'got:', val

        val = yield 2
        print 'got:', val

        try:
            yield 3
        except Malfunction:
            print 'malfunction!'

        yield 4

        print 'done'

    gen = my_generator()

    print gen.next() # start the generator
    print gen.send(10) # send the value 10
    print gen.send(20) # send the value 20
    print gen.throw(Malfunction()) # raise an exception inside the generator

    try:
        gen.next()
    except StopIteration:
        pass
In Python 2.5 and later versions, the
yield statement is an expression that evaluates to a value. And the code that restarts the generator can determine that value using the
send method instead of
next (if you use
next the value is
None). What’s more, you can actually raise an arbitrary exception inside the generator using the
throw method. How cool is that?
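For readers on Python 3, where the generator method `gen.next()` is spelled `next(gen)`, the same mechanics can be sketched as follows. The `accumulator` generator is a made-up example, not part of the article's code.

```python
def accumulator():
    """Keep a running total of the values sent in; a ValueError resets it."""
    total = 0
    while True:
        try:
            amount = yield total     # evaluates to whatever send() passes in
        except ValueError:
            total = 0                # an exception thrown in lands at the yield
        else:
            if amount is not None:   # plain next() delivers None
                total += amount


gen = accumulator()
first = next(gen)                              # run to the first yield
after_ten = gen.send(10)                       # resume with the value 10
after_five = gen.send(5)                       # resume with the value 5
after_throw = gen.throw(ValueError('reset'))   # raise inside the generator
print(first, after_ten, after_five, after_throw)
gen.close()
```

Note that both `send` and `throw` return the next value the generator yields, just as `next` does.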
Given what we just reviewed about sending and throwing values and exceptions into a generator, we can envision a generator as a series of callbacks, like the ones in a deferred, which receive either results or failures. The callbacks are separated by
yields and the value of each
yield expression is the result for the next callback (or the
yield raises an exception and that’s the failure). Figure 35 shows the correspondence:
Now when a series of callbacks is chained together in a deferred, each callback receives the result from the one prior. That’s easy enough to do with a generator — just
send the value you got from the previous run of the generator (the value it
yielded) the next time you restart it. But that also seems a bit silly. Since the generator computed the value to begin with, why bother sending it back? The generator could just save the value in a variable for the next time it’s needed. So what’s the point?
Recall the fact we learned in Part 13, that the callbacks in a deferred can return deferreds themselves. And when that happens, the outer deferred is paused until the inner deferred fires, and then the next callback (or errback) in the outer deferred’s chain is called with the result (or failure) from the inner deferred.
So imagine that our generator
yields a deferred object instead of an ordinary Python value. The generator is now “paused”, and that’s automatic; generators always pause after every
yield statement until they are explicitly restarted. So we can delay restarting the generator until the deferred fires, at which point we either
send the value (if the deferred succeeds) or
throw the exception (if the deferred fails). That would make our generator a genuine sequence of asynchronous callbacks and that’s the idea behind the
inlineCallbacks function in twisted.internet.defer.
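The scheme just described can be sketched with a toy driver in plain Python 3. This is only an illustration of the idea, not how Twisted implements it: `Fired` is a hypothetical stand-in for a deferred that has already fired, `drive` is a hypothetical name, and everything here runs synchronously, whereas the real decorator waits for each deferred to fire.

```python
class Fired(object):
    """Hypothetical stand-in for an already-fired deferred."""

    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error


def drive(gen):
    """Run a generator to the end, feeding each yielded outcome back in."""
    try:
        yielded = next(gen)
        while True:
            if isinstance(yielded, Fired):
                if yielded.error is not None:
                    yielded = gen.throw(yielded.error)   # failure: raise at the yield
                else:
                    yielded = gen.send(yielded.value)    # success: yield evaluates to it
            else:
                yielded = gen.send(yielded)              # plain values come right back
    except StopIteration:
        return


log = []


def steps():
    a = yield 1
    log.append(('plain', a))
    b = yield Fired(value='poem')
    log.append(('ok', b))
    try:
        yield Fired(error=ValueError('boom'))
    except ValueError as e:
        log.append(('err', str(e)))


drive(steps())
print(log)
```

The generator body reads like sequential code, while the driver decides when, and with what, each step resumes.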
Consider the example program in inline-callbacks/inline-callbacks-1.py:
    from twisted.internet.defer import inlineCallbacks, Deferred

    @inlineCallbacks
    def my_callbacks():
        from twisted.internet import reactor

        print 'first callback'
        result = yield 1 # yielded values that aren't deferred come right back

        print 'second callback got', result
        d = Deferred()
        reactor.callLater(5, d.callback, 2)
        result = yield d # yielded deferreds will pause the generator

        print 'third callback got', result # the result of the deferred

        d = Deferred()
        reactor.callLater(5, d.errback, Exception(3))

        try:
            yield d
        except Exception, e:
            result = e

        print 'fourth callback got', repr(result) # the exception from the deferred

        reactor.stop()

    from twisted.internet import reactor
    reactor.callWhenRunning(my_callbacks)
    reactor.run()
Run the example and you will see the generator execute to the end and then stop the reactor. The example illustrates several aspects of the
inlineCallbacks function. First,
inlineCallbacks is a decorator and it always decorates generator functions, i.e., functions that use
yield. The whole purpose of
inlineCallbacks is to turn a generator into a series of asynchronous callbacks according to the scheme we outlined before.
Second, when we invoke an
inlineCallbacks-decorated function, we don’t need to call next, send, or throw ourselves. The decorator takes care of those details for us and ensures the generator will run to the end (assuming it doesn’t raise an exception).
Third, if we
yield a non-deferred value from the generator, it is immediately restarted with that same value as the result of the yield.
And finally, if we
yield a deferred from the generator, it will not be restarted until that deferred fires. If the deferred succeeds, the result of the
yield is just the result from the deferred. And if the deferred fails, the
yield statement raises the exception. Note the exception is just an ordinary
Exception object, rather than a
Failure, and we can catch it with a try/except statement around the yield expression.
In the example we are just using
callLater to fire the deferreds after a short period of time. While that’s a handy way to put a non-blocking delay into our callback chain, normally we would be yielding a deferred returned by some other asynchronous operation (e.g., get_poetry) invoked from our generator.
Ok, now we know how an
inlineCallbacks-decorated function runs, but what return value do you get if you actually call one? As you might have guessed, you get a deferred. Since we can’t know exactly when that generator will stop running (it might
yield one or more deferreds), the decorated function itself is asynchronous and a deferred is the appropriate return value. Note the deferred that is returned isn’t one of the deferreds the generator may
yield. Rather, it’s a deferred that fires only after the generator has completely finished (or throws an exception).
If the generator throws an exception, the returned deferred will fire its errback chain with that exception wrapped in a
Failure. But if we want the generator to return a normal value, we must “return” it using the
defer.returnValue function. Like the ordinary
return statement, it will also stop the generator (it actually raises a special exception). The inline-callbacks/inline-callbacks-2.py example illustrates both possibilities.
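The idea of a return value travelling on an exception has a close analogue in the language itself. In Python 3.3 and later a generator may `return value`, and the value rides on the StopIteration exception that ends the generator. The sketch below is plain Python with no Twisted involved; `compute` and `run` are hypothetical names, and it is offered only as an analogy for how a driver can recover a generator's final value.

```python
def compute():
    part = yield 'need input'
    return part * 2          # Python 3 only: becomes StopIteration.value


def run(gen, reply):
    """Start gen, answer its single request with reply, return its final value."""
    next(gen)
    try:
        gen.send(reply)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError('generator yielded again')


result = run(compute(), 21)
print(result)
```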
Let’s put inlineCallbacks to work with a new version of our poetry client. You can see the code in twisted-client-7/get-poetry.py. You may wish to compare it to client 6.0 in twisted-client-6/get-poetry.py. The relevant changes are in poetry_main:
    def poetry_main():
        addresses = parse_args()

        xform_addr = addresses.pop(0)

        proxy = TransformProxy(*xform_addr)

        from twisted.internet import reactor

        results = []

        @defer.inlineCallbacks
        def get_transformed_poem(host, port):
            try:
                poem = yield get_poetry(host, port)
            except Exception, e:
                print >>sys.stderr, 'The poem download failed:', e
                raise

            try:
                poem = yield proxy.xform('cummingsify', poem)
            except Exception:
                print >>sys.stderr, 'Cummingsify failed!'

            defer.returnValue(poem)

        def got_poem(poem):
            print poem

        def poem_done(_):
            results.append(_)
            if len(results) == len(addresses):
                reactor.stop()

        for address in addresses:
            host, port = address
            d = get_transformed_poem(host, port)
            d.addCallbacks(got_poem)
            d.addBoth(poem_done)

        reactor.run()
In our new version the
inlineCallbacks generator function
get_transformed_poem is responsible for both fetching the poem and then applying the transformation (via the transform service). Since both operations are asynchronous, we yield a deferred each time and then (implicitly) wait for the result. As in client 6.0, if the transformation fails we just return the original poem. Notice we can use
try/except statements to handle asynchronous errors inside the generator.
We can test the new client out in the same way as before. First start up a transform server:
python twisted-server-1/transformedpoetry.py --port 10001
Then start a couple of poetry servers:
python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt
Now you can run the new client:
python twisted-client-7/get-poetry.py 10001 10002 10003
Try turning off one or more of the servers to see how the client handles errors.
Like the Deferred object, the
inlineCallbacks function gives us a new way of organizing our asynchronous callbacks. And, as with deferreds,
inlineCallbacks doesn’t change the rules of the game. Specifically, our callbacks still run one at a time, and they are still invoked by the reactor. We can confirm that fact in our usual way by printing out a traceback from an inline callback, as in the example script inline-callbacks/inline-callbacks-tb.py. Run that code and you will get a traceback with
reactor.run() at the top, lots of helper functions in between, and our callback at the bottom.
We can adapt Figure 29, which explains what happens when one callback in a deferred returns another deferred, to show what happens when an inlineCallbacks generator yields a deferred. See Figure 36:
The same figure works in both cases because the idea being illustrated is the same — one asynchronous operation is waiting for another.
Since inlineCallbacks and deferreds solve many of the same problems, why choose one over the other? Here are some potential advantages of inlineCallbacks:
- Since the callbacks share a namespace, there is no need to pass extra state around.
- The callback order is easier to see, as they just execute from top to bottom.
- With no function declarations for individual callbacks and implicit flow-control, there is generally less typing.
- Errors are handled with the familiar try/except statement.
And here are some potential pitfalls:
- The callbacks inside the generator cannot be invoked individually, which could make code re-use difficult. With a deferred, the code constructing the deferred is free to add arbitrary callbacks in an arbitrary order.
- The compact form of a generator can obscure the fact that an asynchronous callback is even involved. Despite its visually similar appearance to an ordinary sequential function, a generator behaves in a very different manner. The
inlineCallbacks function is not a way to avoid learning the asynchronous programming model.
As with any technique, practice will provide the experience necessary to make an informed choice.
In this Part we learned about the
inlineCallbacks decorator and how it allows us to express a sequence of asynchronous callbacks in the form of a Python generator.
In Part 18 we will learn a technique for managing a set of “parallel” asynchronous operations.
- Why is the
- Study the implementation of
inlineCallbacks and its helper function
_inlineCallbacks. Ponder the phrase “the devil is in the details”.
- How many callbacks are contained in a generator with N
yield statements, assuming it has no loops or if statements?
- Poetry client 7.0 might have three generators running at once. Conceptually, how many different ways might they be interleaved with one another? Considering the way they are invoked in the poetry client and the implementation of
inlineCallbacks, how many ways do you think are actually possible?
- Move the
got_poem callback in client 7.0 inside the generator.
- Then move the
poem_done callback inside the generator. Be careful! Make sure to handle all the failure cases so the reactor gets shut down no matter what. How does the resulting code compare to using a deferred to shut down the reactor?
- A generator with
yield statements inside a while loop can represent a conceptually infinite sequence. What does such a generator decorated with inlineCallbacks represent?
Today I made my blog my main website, since the old one was getting kind of crufty. I moved the stuff I wanted to keep into WordPress, including my collection of links to programmer’s editors. The old /blog URLs will continue to work.