

Part 14: When a Deferred Isn’t

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this part we’re going to learn another aspect of the Deferred class. To motivate the discussion, we’ll add one more server to our stable of poetry-related services. Suppose we have a large number of internal clients who want to get poetry from the same external server. But this external server is slow and already over-burdened by the insatiable demand for poetry across the Internet. We don’t want to contribute to that poor server’s problems by sending all our clients there too.

So instead we’ll make a caching proxy server. When a client connects to the proxy, the proxy will either fetch the poem from the external server or return a cached copy of a previously retrieved poem. Then we can point all our clients at the proxy and our contribution to the external server’s load will be negligible. We illustrate this setup in Figure 30:

Figure 30: a caching proxy server

Consider what happens when a client connects to the proxy to get a poem. If the proxy’s cache is empty, the proxy must wait (asynchronously) for the external server to respond before sending a poem back. So far so good: we already know how to handle that situation with an asynchronous function that returns a deferred. On the other hand, if there’s already a poem in the cache, the proxy can send it back immediately, with no need to wait at all. So the proxy’s internal mechanism for getting a poem will sometimes be asynchronous and sometimes synchronous.

So what do we do if we have a function that is only asynchronous some of the time? Twisted provides a couple of options, and they both depend on a feature of the Deferred class we haven’t used yet: you can fire a deferred before you return it to the caller.

This works because, although you cannot fire a deferred twice, you can add callbacks and errbacks to a deferred after it has fired. And when you do, the deferred simply continues firing the chain from where it last left off. One important thing to note is that an already-fired deferred may fire the new callback (or errback, depending on the state of the deferred) immediately, i.e., right when you add it.

Consider Figure 31, showing a deferred that has been fired:

Figure 31: a deferred that has been fired

If we were to add another callback/errback pair at this point, then the deferred would immediately fire the new callback, as in Figure 32:

Figure 32: the same deferred with a new callback

The callback (not the errback) is fired because the previous callback succeeded. If it had failed (raised an Exception or returned a Failure) then the new errback would have been called instead.

We can test out this new feature with the example code in twisted-deferred/defer-11.py. Read and run that script to see how a deferred behaves when you fire it and then add callbacks. Note how in the first example each new callback is invoked immediately (you can tell from the order of the print output).

The second example in that script shows how we can pause() a deferred so it doesn’t fire the callbacks right away. When we are ready for the callbacks to fire, we call unpause(). That’s actually the same mechanism the deferred uses to pause itself when one of its callbacks returns another deferred. Nifty!

Proxy 1.0

Now let’s look at the first version of the poetry proxy in twisted-server-1/poetry-proxy.py. Since the proxy acts as both a client and a server, it has two pairs of Protocol/Factory classes: one for serving up poetry and one for getting a poem from the external server. We won’t bother looking at the code for the client pair; it’s the same as in previous poetry clients.

But before we look at the server pair, we’ll look at the ProxyService, which the server-side protocol uses to get a poem:

class ProxyService(object):

    poem = None # the cached poem

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            return self.poem

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

    def set_poem(self, poem):
        self.poem = poem
        return poem

The key method there is get_poem. If there’s already a poem in the cache, that method just returns the poem itself. On the other hand, if we haven’t got a poem yet, we initiate a connection to the external server and return a deferred that will fire when the poem comes back. So get_poem is a function that is only asynchronous some of the time.

How do you handle a function like that? Let’s look at the server-side protocol/factory pair:

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = maybeDeferred(self.factory.service.get_poem)
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

class PoetryProxyFactory(ServerFactory):

    protocol = PoetryProxyProtocol

    def __init__(self, service):
        self.service = service

The factory is straightforward: it just saves a reference to the proxy service so that protocol instances can call the get_poem method. The protocol is where the action is. Instead of calling get_poem directly, the protocol uses a wrapper function from the twisted.internet.defer module named maybeDeferred.

The maybeDeferred function takes a reference to another function, plus some optional arguments to call that function with (we aren’t using any here). Then maybeDeferred will actually call that function and:

  • If the function returns a deferred, maybeDeferred returns that same deferred, or
  • If the function returns a Failure, maybeDeferred returns a new deferred that has been fired (via .errback) with that Failure, or
  • If the function returns a regular value, maybeDeferred returns a deferred that has already been fired with that value as the result, or
  • If the function raises an exception, maybeDeferred returns a deferred that has already been fired (via .errback()) with that exception wrapped in a Failure.

In other words, the return value from maybeDeferred is guaranteed to be a deferred, even if the function you pass in never returns a deferred at all. This allows us to safely call a synchronous function (even one that fails with an exception) and treat it like an asynchronous function returning a deferred.

Note 1: There will still be a subtle difference, though. A deferred returned by a synchronous function has already been fired, so any callbacks or errbacks you add will run immediately, rather than in some future iteration of the reactor loop.

Note 2: In hindsight, perhaps naming a function that always returns a deferred “maybeDeferred” was not the best choice, but there you go.

Once the protocol has a real deferred in hand, it can just add some callbacks that send the poem to the client and then close the connection. And that’s it for our first poetry proxy!

Running the Proxy

To try out the proxy, start up a poetry server, like this:

python twisted-server-1/fastpoetry.py --port 10001 poetry/fascination.txt

And now start a proxy server like this:

python twisted-server-1/poetry-proxy.py --port 10000 10001

It should tell you that it’s proxying poetry on port 10000 for the server on port 10001.
Now you can point a client at the proxy:

python twisted-client-4/get-poetry.py 10000

We’ll use an earlier version of the client that isn’t concerned with poetry transformations. You should see the poem appear in the client window and some text in the proxy window saying it’s fetching the poem from the server. Now run the client again and the proxy should confirm it is using the cached version of the poem, while the client should show the same poem as before.

Proxy 2.0

As we mentioned earlier, there’s an alternative way to implement this scheme. This is illustrated in Poetry Proxy 2.0, located in twisted-server-2/poetry-proxy.py. Since we can fire deferreds before we return them, we can make the proxy service return an already-fired deferred when there’s already a poem in the cache. Here’s the new version of the get_poem method on the proxy service:

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            # return an already-fired deferred
            return succeed(self.poem)

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

The defer.succeed function is just a handy way to make an already-fired deferred given a result. Read the implementation for that function and you’ll see it’s simply a matter of making a new deferred and then firing it with .callback(). If we wanted to return an already-failed deferred we could use defer.fail instead.

In this version, since get_poem always returns a deferred, the protocol class no longer needs to use maybeDeferred (though it would still work if it did, as we learned above):

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = self.factory.service.get_poem()
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

Other than these two changes, the second version of the proxy is just like the first, and you can run it in the same way we ran the original version.

Summary

In this Part we learned how deferreds can be fired before they are returned, and thus how we can use them in synchronous (or sometimes-synchronous) code. And we have two ways to do that:

  • We can use maybeDeferred to handle a function that sometimes returns a deferred and other times returns a regular value (or throws an exception), or
  • We can pre-fire our own deferreds, using defer.succeed and defer.fail, so our “semi-synchronous” functions always return a deferred no matter what.

Which technique we choose is really up to us. The former emphasizes the fact that our functions aren’t always asynchronous while the latter makes the client code simpler. Perhaps there’s not a definitive argument for choosing one over the other.

Both techniques are made possible because we can add callbacks and errbacks to a deferred after it has fired. And that explains the curious fact we discovered in Part 9 and the twisted-deferred/defer-unhandled.py example. We learned that an “unhandled error” in a deferred, in which either the last callback or errback fails, isn’t reported until the deferred is garbage collected (i.e., there are no more references to it in user code). Now we know why — since we could always add another callback pair to a deferred which does handle that error, it’s not until the last reference to a deferred is dropped that Twisted can say the error was not handled.

Now that you’ve spent so much time exploring the Deferred class, which is located in the twisted.internet package, you may have noticed it doesn’t actually have anything to do with the Internet. It’s just an abstraction for managing callbacks. So what’s it doing there? That is an artifact of Twisted’s history. In the best of all possible worlds (where I am paid millions of dollars to play in the World Ultimate Frisbee League), the defer module would probably be in twisted.python. Of course, in that world you would probably be too busy fighting crime with your super-powers to read this introduction. I suppose that’s life.

So is that it for deferreds? Do we finally know all their features? For the most part, we do. But Twisted includes alternate ways of using deferreds that we haven’t explored yet (we’ll get there!). And in the meantime, the Twisted developers have been beavering away adding new stuff. In an upcoming release, the Deferred class will acquire a brand new capability. We’ll introduce it in a future Part, but first we’ll take a break from deferreds and look at some other aspects of Twisted, including testing in Part 15.

Suggested Exercises

  1. Modify the twisted-deferred/defer-11.py example to illustrate pre-failing deferreds using .errback(). Read the documentation and implementation of the defer.fail function.
  2. Modify the proxy so that a cached poem older than 2 hours is discarded, causing the next poetry request to re-request it from the server.
  3. The proxy is supposed to avoid contacting the server more than once, but if several client requests come in at the same time when there is no poem in the cache, the proxy will make multiple poetry requests. It’s easier to see if you use a slow server to test it out.

    Modify the proxy service so that only one request is generated. Right now the service only has two states: either the poem is in the cache or it isn’t. You will need to recognize a third state indicating a request has been made but not completed. When the get_poem method is called in the third state, add a new deferred to a list of ‘waiters’. That new deferred will be the result of the get_poem method. When the poem finally comes back, fire all the waiting deferreds with the poem and transition to the cached state. On the other hand, if the poem fails, fire the .errback() method of all the waiters and transition to the non-cached state.

  4. Add a transformation proxy to the proxy service. This service should work like the original transformation service, but use an external server to do the transformations.
  5. Consider this hypothetical piece of code:
    d = some_async_function() # d is a Deferred
    d.addCallback(my_callback)
    d.addCallback(my_other_callback)
    d.addErrback(my_errback)
    

    Suppose that when the deferred d is returned on line 1, it has not been fired. Is it possible for that deferred
    to fire while we are adding our callbacks and errback on lines 2-4? Why or why not?

26 replies on “When a Deferred Isn’t”

As always, great!

small typo “The key method there is get_poetry.” It is get_poem not get_poetry

Can’t wait for other parts 😉

Hi Dave

Your articles are so great – and I wish I had time to read them 🙂

An advanced(?) exercise that could be used here is to imagine that the proxy takes an identifier (perhaps just a string title) of the poem it is to fetch. The exercise is to think of a way to arrange for a request for poem P to receive the poem properly if a previous request (from another client) for P is already in flight. I.e., it’s neither of your 2 cases a) the poem is not cached or b) the poem is cached, it’s the intermediate one. Make sense?

I hit this situation and wrote a little decorator for it that I like a lot. It could use a little improvement, but the code at http://paste.pocoo.org/show/204769/ gets the job done.

You use it like:

class PoemProxy:

    @DeferredPooler
    def get_poem(self, poem):
        # Do your normal stuff here, and return a deferred.

That’s it.

Thanks again!

Terry

I’ve written something similar myself 🙂 Here’s how I would describe it:

The DeferredPooler decorator collapses identical outstanding calls to the decorator
into single calls to the decorated function, which must return a deferred.

Two calls are identical if their arguments hash to the same value, where the keyword arguments
are first sorted. (And thus the arguments must be hashable). A call is outstanding if a deferred
returned by the decorated function has not yet fired.

Collapsed calls each receive identical callback values (or identical errback failures).
However, each call to the decorator returns a separate Deferred instance.

I wonder if it might be better in _callOthers to delete the key first, in case the callbacks trigger
another call to the decorator with the same arguments?

Maybe it’s the law of diminishing returns taking effect, but I have been going through your exercises for hours and haven’t had any problems until now. For exercise 3, the only change I made to the Service.get_poem was this. One client prints the full poem, and one client prints a blank string. Any hints on what else I could have missed? I absolutely love this site and your writing/teaching style. It works so well!

    if self.request_made == True:
        # request has been made but not completed
        return self.factory.deferred

    print 'Fetching poem from server.'
    self.factory = PoetryClientFactory()
    self.factory.deferred.addCallback(self.set_poem)
    from twisted.internet import reactor
    reactor.connectTCP(self.host, self.port, self.factory)
    self.request_made = True
    return self.factory.deferred

Thanks for the kind words!

So consider that you are returning the same deferred
for calls that are ‘sharing’ the same request. Which
means that one deferred will have multiple callbacks
added to it. The second (and third, etc.) callbacks
will not receive the original result of the deferred,
but the result from the previous callback.

In this situation, I think you will want to return a
separate deferred object for each request, even if they
are ‘sharing’ the same underlying request to the server.
Otherwise, the different users of the service could end
up stepping on each others callbacks, if you see what I
mean.

What if you kept a list of deferreds instead?

Sometimes…you just need to sleep on it. I am not sure this is the best answer, but it works. If I could have done it better, let me know. Thanks so much!

class ProxyService(object):

    poem = None # the cached poem
    request_made = False
    waiters = []

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            return self.poem

        if self.request_made == True:
            # request has been made but not completed
            self.waiters.append(Deferred())
            return self.waiters[-1]

        print 'Fetching poem from server.'
        self.factory = PoetryClientFactory()
        self.factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, self.factory)
        self.request_made = True
        self.waiters.append(Deferred())
        return self.waiters[-1]

    def set_poem(self, poem):
        self.request_made = False
        self.poem = poem
        for waiter in self.waiters:
            waiter.callback(poem)

I think you got it! One thing to beware of is the class-level
attribute ‘waiters’. If there was more than one instance of
ProxyService in your program you wouldn’t want them to
share waiters amongst each other (since they would probably
be pointing to different servers).

Hi Dave, thanks a lot for a great tutorial!!
I was wondering, why do we need a lambda function in
d.addBoth(lambda r: self.transport.loseConnection())?

Hi Alex, that is because the transport.loseConnection method takes no arguments, but the callbacks and errbacks of a deferred are always given at least one argument: the result of the callback, or the failure of the errback. So we use the lambda to ‘swallow’ the result/error.

Hi Dave, U R amazing 🙂
i haven’t had any problem with the exercises until the third exercise 🙁
you wrote: “… On the other hand, if the poem fails, fire the .errback() method of all the waiters and transition to the non-cached state.”

My code is almost similar to Nathan’s code (look above). i don’t know how to transfer an already created deferred to the non-cached state!
let me explain more :

“get_poem” returns a defer :
> if self.request_made == True:
> self.waiters.append(Deferred())
> return self.waiters[-1]
after that, “connectionMade” adds callback to defer :
> d.addCallback(self.transport.write)
> d.addBoth(lambda r: self.transport.loseConnection())

so i can’t transfer a deferred whose first callback is transport.write to the non-cached state!
i hope i explained it clearly 🙂
thanks

Hello! Glad you like the series. What I mean by ‘non-cached state’ is that the caching service will re-attempt to get that poem when the next request for it comes in. In other words, it will not cache ‘errors’, only successful poems. Does that make sense?

Hi Dave, congrats and thanks for your work. It’s amazing.

I just have a very basic question:

I understand that every connection from a client to the proxy creates its own protocol instance.

Does this mean that there will never be a collision between variables in different connections, such as ‘factory’ or ‘d’ (deferred object), even in the case that those connections happen simultaneously?

Since the abstraction layer is pretty high I sometimes lose track of what’s actually going on ‘beyond’, and reading the code I feel that values in get_poetry() for one connection could be interfered with by the arrival of another simultaneous connection which redefines those values.

Thanks.

Hi Andreu, glad you liked the tutorial!

Each connection in the server gets its own Protocol instance, so those are never shared.
There is, however, only one ProtocolFactory for the server handling those connections, so each Protocol
instance will have a reference to the same factory. That allows the connections to share state.
To what extent they do is up to you.

