Categories
Books

Book: Let the Great World Spin

A very good collection of stories, loosely connected by Philippe Petit’s high-wire walk across the Twin Towers in 1974.

Categories
Books

Book: Bambi vs. Godzilla

Brilliant. The art and the industry of film, under the microscope of a master.

Categories
Books

Book: A Reporter’s Life

Really liked this one. Some great stories and a readable style, in a quick dash through much of the twentieth century.

Categories
Books

Book: The Meaning of the Constitution

I very much enjoyed this short but illuminating commentary on the complete text of the US Constitution. It starts with a brief historical narrative and then proceeds article by article, and then amendment by amendment, explaining the major case history surrounding each.

One thing I never noticed before: the US Constitution contains the word “speedy”. How cool is that?

Categories
Books Erlang Programming Software

Book: Erlang Programming

This is a nice complement to the Armstrong book. In particular, it has a great discussion of the Erlang tracing mechanisms.

Categories
Books

Book: The Hakawati

A wonderful feat of story-telling.

Categories
Blather Programming Python Software

When a Deferred Isn’t

Part 14: When a Deferred Isn’t

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this part we’re going to learn another aspect of the Deferred class. To motivate the discussion, we’ll add one more server to our stable of poetry-related services. Suppose we have a large number of internal clients who want to get poetry from the same external server. But this external server is slow and already over-burdened by the insatiable demand for poetry across the Internet. We don’t want to contribute to that poor server’s problems by sending all our clients there too.

So instead we’ll make a caching proxy server. When a client connects to the proxy, the proxy will either fetch the poem from the external server or return a cached copy of a previously retrieved poem. Then we can point all our clients at the proxy and our contribution to the external server’s load will be negligible. We illustrate this setup in Figure 30:

Figure 30: a caching proxy server
Figure 30: a caching proxy server

Consider what happens when a client connects to the proxy to get a poem. If the proxy’s cache is empty, the proxy must wait (asynchronously) for the external server to respond before sending a poem back. So far so good, we already know how to handle that situation with an asynchronous function that returns a deferred. On the other hand, if there’s already a poem in the cache, the proxy can send it back immediately, no need to wait at all.  So the proxy’s internal mechanism for getting a poem will sometimes be asynchronous and sometimes synchronous.

So what do we do if we have a function that is only asynchronous some of the time? Twisted provides a couple of options, and they both depend on a feature of the Deferred class we haven’t used yet: you can fire a deferred before you return it to the caller.

This works because, although you cannot fire a deferred twice, you can add callbacks and errbacks to a deferred after it has fired. And when you do so, the deferred simply continues firing the chain from where it last left off. One important thing to note is an already-fired deferred may fire the new callback (or errback, depending on the state of the deferred) immediately, i.e., right when you add it.

Consider Figure 31, showing a deferred that has been fired:

Figure 31: a deferred that has been fired
Figure 31: a deferred that has been fired

If we were to add another callback/errback pair at this point, then the deferred would immediately fire the new callback, as in Figure 32:

Figure 32: the same deferred with a new callback
Figure 32: the same deferred with a new callback

The callback (not the errback) is fired because the previous callback succeeded. If it had failed (raised an Exception or returned a Failure) then the new errback would have been called instead.

We can test out this new feature with the example code in twisted-deferred/defer-11.py. Read and run that script to see how a deferred behaves when you fire it and then add callbacks. Note how in the first example each new callback is invoked immediately (you can tell from the order of the print output).

The second example in that script shows how we can pause() a deferred so it doesn’t fire the callbacks right away. When we are ready for the callbacks to fire, we call unpause(). That’s actually the same mechanism the deferred uses to pause itself when one of its callbacks returns another deferred. Nifty!

Proxy 1.0

Now let’s look at the first version of the poetry proxy in twisted-server-1/poetry-proxy.py. Since the proxy acts as both a client and a server, it has two pairs of Protocol/Factory classes, one for serving up poetry, and one for getting a poem from the external server. We won’t bother looking at the code for the client pair, it’s the same as in previous poetry clients.

But before we look at the server pair, we’ll look at the ProxyService, which the server-side protocol uses to get a poem:

class ProxyService(object):

    poem = None # the cached poem

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            return self.poem

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

    def set_poem(self, poem):
        self.poem = poem
        return poem

The key method there is get_poem. If there’s already a poem in the cache, that method just returns the poem itself. On the other hand, if we haven’t got a poem yet, we initiate a connection to the external server and return a deferred that will fire when the poem comes back. So get_poem is a function that is only asynchronous some of the time.

How do you handle a function like that? Let’s look at the server-side protocol/factory pair:

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = maybeDeferred(self.factory.service.get_poem)
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

class PoetryProxyFactory(ServerFactory):

    protocol = PoetryProxyProtocol

    def __init__(self, service):
        self.service = service

The factory is straightforward — it’s just saving a reference to the proxy service so that protocol instances can call the get_poem method. The protocol is where the action is. Instead of calling get_poem directly, the protocol uses a wrapper function from the twisted.internet.defer module named maybeDeferred.

The maybeDeferred function takes a reference to another function, plus some optional arguments to call that function with (we aren’t using any here). Then maybeDeferred will actually call that function and:

  • If the function returns a deferred, maybeDeferred returns that same deferred, or
  • If the function returns a Failure, maybeDeferred returns a new deferred that has been fired (via .errback) with that Failure, or
  • If the function returns a regular value, maybeDeferred returns a deferred that has already been fired with that value as the result, or
  • If the function raises an exception, maybeDeferred returns a deferred that has already been fired (via .errback()) with that exception wrapped in a Failure.

In other words, the return value from maybeDeferred is guaranteed to be a deferred, even if the function you pass in never returns a deferred at all. This allows us to safely call a synchronous function (even one that fails with an exception) and treat it like an asynchronous function returning a deferred.

Note 1: There will still be a subtle difference, though. A deferred returned by a synchronous function has already been fired, so any callbacks or errbacks you add will run immediately, rather than in some future iteration of the reactor loop.

Note 2: In hindsight, perhaps naming a function that always returns a deferred “maybeDeferred” was not the best choice, but there you go.

Once the protocol has a real deferred in hand, it can just add some callbacks that send the poem to the client and then close the connection. And that’s it for our first poetry proxy!

Running the Proxy

To try out the proxy, start up a poetry server, like this:

python twisted-server-1/fastpoetry.py --port 10001 poetry/fascination.txt

And now start a proxy server like this:

python twisted-server-1/poetry-proxy.py --port 10000 10001

It should tell you that it’s proxying poetry on port 10000 for the server on port 10001.
Now you can point a client at the proxy:

python twisted-client-4/get-poetry.py 10000

We’ll use an earlier version of the client that isn’t concerned with poetry transformations. You should see the poem appear in the client window and some text in the proxy window saying it’s fetching the poem from the server. Now run the client again and the proxy should confirm it is using the cached version of the poem, while the client should show the same poem as before.

Proxy 2.0

As we mentioned earlier, there’s an alternative way to implement this scheme. This is illustrated in Poetry Proxy 2.0, located in twisted-server-2/poetry-proxy.py. Since we can fire deferreds before we return them, we can make the proxy service return an already-fired deferred when there’s already a poem in the cache. Here’s the new version of the get_poem method on the proxy service:

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            # return an already-fired deferred
            return succeed(self.poem)

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

The defer.succeed function is just a handy way to make an already-fired deferred given a result. Read the implementation for that function and you’ll see it’s simply a matter of making a new deferred and then firing it with .callback(). If we wanted to return an already-failed deferred we could use defer.fail instead.

In this version, since get_poem always returns a deferred, the protocol class no longer needs to use maybeDeferred (though it would still work if it did, as we learned above):

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = self.factory.service.get_poem()
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

Other than these two changes, the second version of the proxy is just like the first, and you can run it in the same way we ran the original version.

Summary

In this Part we learned how deferreds can be fired before they are returned, and thus we can use them in synchronous (or sometimes synchronous) code. And we have two ways to do that:

  • We can use maybeDeferred to handle a function that sometimes returns a deferred and other times returns a regular value (or throws an exception), or
  • We can pre-fire our own deferreds, using defer.succeed and defer.fail, so our “semi-synchronous” functions always return a deferred no matter what.

Which technique we choose is really up to us. The former emphasizes the fact that our functions aren’t always asynchronous while the latter makes the client code simpler. Perhaps there’s not a definitive argument for choosing one over the other.

Both techniques are made possible because we can add callbacks and errbacks to a deferred after it has fired. And that explains the curious fact we discovered in Part 9 and the twisted-deferred/defer-unhandled.py example. We learned that an “unhandled error” in a deferred, in which either the last callback or errback fails, isn’t reported until the deferred is garbage collected (i.e., there are no more references to it in user code). Now we know why — since we could always add another callback pair to a deferred which does handle that error, it’s not until the last reference to a deferred is dropped that Twisted can say the error was not handled.

Now that you’ve spent so much time exploring the Deferred class, which is located in the twisted.internet package, you may have noticed it doesn’t actually have anything to do with the Internet. It’s just an abstraction for managing callbacks. So what’s it doing there? That is an artifact of Twisted’s history. In the best of all possible worlds (where I am paid millions of dollars to play in the World Ultimate Frisbee League), the defer module would probably be in twisted.python. Of course, in that world you would probably be too busy fighting crime with your super-powers to read this introduction. I suppose that’s life.

So is that it for deferreds? Do we finally know all their features? For the most part, we do. But Twisted includes alternate ways of using deferreds that we haven’t explored yet (we’ll get there!). And in the meantime, the Twisted developers have been beavering away adding new stuff. In an upcoming release, the Deferred class will acquire a brand new capability. We’ll introduce it in a future Part, but first we’ll take a break from deferreds and look at some other aspects of Twisted, including testing in Part 15.

Suggested Exercises

  1. Modify the twisted-deferred/defer-11.py example to illustrate pre-failing deferreds using .errback(). Read the documentation and implementation of the defer.fail function.
  2. Modify the proxy so that a cached poem older than 2 hours is discarded, causing the next poetry request to re-request it from the server
  3. The proxy is supposed to avoid contacting the server more than once, but if several client requests come in at the same time when there is no poem in the cache, the proxy will make multiple poetry requests. It’s easier to see if you use a slow server to test it out.

    Modify the proxy service so that only one request is generated. Right now the service only has two states: either the poem is in the cache or it isn’t. You will need to recognize a third state indicating a request has been made but not completed. When the get_poem method is called in the third state, add a new deferred to a list of ‘waiters’. That new deferred will be the result of the get_poem method. When the poem finally comes back, fire all the waiting deferreds with the poem and transition to the cached state. On the other hand, if the poem fails, fire the .errback() method of all the waiters and transition to the non-cached state.

  4. Add a transformation proxy to the proxy service. This service should work like the original transformation service, but use an external server to do the transformations.
  5. Consider this hypothetical piece of code:
    d = some_async_function() # d is a Deferred
    d.addCallback(my_callback)
    d.addCallback(my_other_callback)
    d.addErrback(my_errback)
    

    Suppose that when the deferred d is returned on line 1, it has not been fired. Is it possible for that deferred
    to fire while we are adding our callbacks and errback on lines 2-4? Why or why not?

Categories
Books

Book: Statistics: A Gentle Introduction

It certainly is gentle, and is spruced up with some interesting bits about the history of statistics. I would have liked more in-depth explanations of the statistical techniques, though. This book will teach you how to perform basic statistical analysis, but won’t really enlighten you as to how, and why, it works.