Categories
Blather Programming Python Software

Deferred All The Way Down

Part 13: Deferred All The Way Down

This continues the introduction started here. You can find an index to the entire series here.

Introduction

Recall poetry client 5.1 from Part 10.The client used a Deferred to manage a callback chain that included a call to a poetry transformation engine. In client 5.1, the engine was implemented as a synchronous function call implemented in the client itself.

Now we want to make a new client that uses the networked poetry transformation service we wrote in Part 12. But here’s the wrinkle: since the transformation service is accessed over the network, we’ll need to use asynchronous I/O. And that means our API for requesting a transformation will have to be asynchronous, too. In other words, the try_to_cummingsify callback is going to return a Deferred in our new client.

So what happens when a callback in a deferred’s chain returns another deferred? Let’s call the first deferred the ‘outer’ deferred and the second the ‘inner’ one. Suppose callback N in the outer deferred returns the inner deferred. That callback  is saying “I’m asynchronous, my result isn’t here yet”. Since the outer deferred needs to call the next callback or errback in the chain with the result, the outer deferred needs to wait until the inner deferred is fired. Of course, the outer deferred can’t block either, so instead the outer deferred suspends the execution of the callback chain and returns control to the reactor (or whatever fired the outer deferred).

And how does the outer deferred know when to resume? Simple — by adding a callback/errback pair to the inner deferred. Thus, when the inner deferred is fired the outer deferred will resume executing its chain. If the inner deferred succeeds (i.e., it calls the callback added by the outer deferred), then the outer deferred calls its N+1 callback with the result. And if the inner deferred fails (calls the errback added by the outer deferred), the outer deferred calls the N+1 errback with the failure.

That’s a lot to digest, so let’s illustrate the idea in Figure 28:

Figure 28: outer and inner deferred processing
Figure 28: outer and inner deferred processing

In this figure the outer deferred has 4 layers of callback/errback pairs. When the outer deferred fires, the first callback in the chain returns a deferred (the inner deferred). At that point, the outer deferred will stop firing its chain and return control to the reactor (after adding a callback/errback pair to the inner deferred). Then, some time later, the inner deferred fires and the outer deferred resumes processing its callback chain. Note the outer deferred does not fire the inner deferred itself. That would be impossible, since the outer deferred cannot know when the inner deferred’s result is available, or what that result might be. Rather, the outer deferred simply waits (asynchronously) for the inner deferred to fire.

Notice how the line connecting the callback to the inner deferred in Figure 28 is black instead of green or red. That’s because we don’t know whether the callback succeeded or failed until the inner deferred is fired. Only then can the outer deferred decide whether to call the next callback or the next errback in its own chain.

Figure 29 shows the same outer/inner deferred firing sequence in Figure 28 from the point of view of the reactor:

Figure 29: the thread of control in Figure 28
Figure 29: the thread of control in Figure 28

This is probably the most complicated feature of the Deferred class, so don’t worry if you need some time to absorb it. We’ll illustrate it one more way using the example code in twisted-deferred/defer-10.py. That example creates two outer deferreds, one with plain callbacks, and one where a single callback returns an inner deferred. By studying the code and the output you can see how the second outer deferred stops running its chain when the inner deferred is returned, and then starts up again when the inner deferred is fired.

Client 6.0

Let’s use our new knowledge of nested deferreds and re-implement our poetry client to use the network transformation service from Part 12. You can find the code in twisted-client-6/get-poetry.py. The poetry Protocol and Factory are unchanged from the previous version. But now we have a Protocol and Factory for making transformation requests. Here’s the transform client Protocol:

class TransformClientProtocol(NetstringReceiver):

    def connectionMade(self):
        self.sendRequest(self.factory.xform_name, self.factory.poem)

    def sendRequest(self, xform_name, poem):
        self.sendString(xform_name + '.' + poem)

    def stringReceived(self, s):
        self.transport.loseConnection()
        self.poemReceived(s)

    def poemReceived(self, poem):
        self.factory.handlePoem(poem)

Using the NetstringReceiver as a base class makes this implementation pretty simple. As soon as the connection is established we send the transform request to the server, retrieving the name of the transform and the poem from our factory. And when we get the poem back, we pass it on to the factory for processing. Here’s the code for the Factory:

class TransformClientFactory(ClientFactory):

    protocol = TransformClientProtocol

    def __init__(self, xform_name, poem):
        self.xform_name = xform_name
        self.poem = poem
        self.deferred = defer.Deferred()

    def handlePoem(self, poem):
        d, self.deferred = self.deferred, None
        d.callback(poem)

    def clientConnectionLost(self, _, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

    clientConnectionFailed = clientConnectionLost

This factory is designed for clients and handles a single transformation request, storing both the transform name and the poem for use by the Protocol. The Factory creates a single Deferred which represents the result of the transformation request. Notice how the Factory handles two error cases: a failure to connect and a connection that is closed before the poem is received. Also note the clientConnectionLost method is called even if we receive the poem, but in that case self.deferred will be None, thanks to the handlePoem method.

This Factory class creates the Deferred that it also fires. That’s a good rule to follow in Twisted programming, so let’s highlight it:

In general, an object that makes a Deferred should also be in charge of firing that Deferred.

This “you make it, you fire it” rule helps ensure a given deferred is only fired once and makes it easier to follow the flow of control in a Twisted program.

In addition to the transform Factory, there is also a Proxy class which hides the details of making the TCP connection to a particular transform server:

class TransformProxy(object):
    """
    I proxy requests to a transformation service.
    """

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def xform(self, xform_name, poem):
        factory = TransformClientFactory(xform_name, poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

This class presents a single xform() interface that other code can use to request transformations. So that other code can just request a transform and get a deferred back without mucking around with hostnames and port numbers.

The rest of the program is unchanged except for the try_to_cummingsify callback:

    def try_to_cummingsify(poem):
        d = proxy.xform('cummingsify', poem)

        def fail(err):
            print >>sys.stderr, 'Cummingsify failed!'
            return poem

        return d.addErrback(fail)

This callback now returns a deferred, but we didn’t have to change the rest of the main function at all, other than to create the Proxy instance. Since try_to_cummingsify was part of a deferred chain (the deferred returned by get_poetry), it was already being used asynchronously and nothing else need change.

You’ll note we are returning the result of d.addErrback(fail). That’s just a little bit of syntactic sugar. The addCallback and addErrback methods return the original deferred. We might just as well have written:

        d.addErrback(fail)
        return d

The first version is the same thing, just shorter.

Testing out the Client

The new client has a slightly different syntax than the others. If you have a transformation service running on port 10001 and two poetry servers running on ports 10002 and 10003, you would run:

python twisted-client-6/get-poetry.py 10001 10002 10003

To download two poems and transform them both. You can start the transform server like this:

python twisted-server-1/transformedpoetry.py --port 10001

And the poetry servers like this:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Then you can run the poetry client as above. After that, try crashing the transform server and re-running the client with the same command.

Wrapping Up

In this Part we learned how deferreds can transparently handle other deferreds in a callback chain, and thus we can safely add asynchronous callbacks to an ‘outer’ deferred without worrying about the details. That’s pretty handy since lots of our functions are going to end up being asynchronous.

Do we know everything there is to know about deferreds yet? Not quite! There’s one more important feature to talk about, but we’ll save it for Part 14.

Suggested Exercises

  1. Modify the client so we can ask for a specific kind of transformation by name.
  2. Modify the client so the transformation server address is an optional argument. If it’s not provided, skip the transformation step.
  3. The PoetryClientFactory currently violates the “you make it, you fire it” rule for deferreds. Refactor get_poetry and PoetryClientFactory to remedy that.
  4. Although we didn’t demonstrate it, the case where an errback returns a deferred is symmetrical. Modify the twisted-deferred/defer-10.py example to verify it.
  5. Find the place in the Deferred implementation that handles the case where a callback/errback returns another Deferred.
Categories
Blather Programming Python Software

A Poetry Transformation Server

Part 12: A Poetry Transformation Server

This continues the introduction started here. You can find an index to the entire series here.

One More Server

Alright, we’ve written one Twisted server so let’s write another, and then we’ll get back to learning some more about Deferreds.

In Parts 9 and 10 we introduced the idea of a poetry transformation engine. The one we eventually implemented, the cummingsifier, was so simple we had to add random exceptions to simulate a failure. But if the transformation engine was located on another server, providing a network “poetry transformation service”, then there is a much more realistic failure mode: the transformation server is down.

So in Part 12 we’re going to implement a poetry transformation server and then, in the next Part, we’ll update our poetry client to use the external transformation service and learn a few new things about Deferreds in the process.

Designing the Protocol

Up till now the interactions between client and server have been strictly one-way. The server sends a poem to the client while the client never sends anything at all to the server. But a transformation service is two-way — the client sends a poem to the server and then the server sends a transformed poem back. So we’ll need to use, or invent, a protocol to handle that interaction.

While we’re at it, let’s allow the server to support multiple kinds of transformations and allow the client to select which one to use. So the client will send two pieces of information: the name of the transformation and the complete text of the poem. And the server will return a single piece of information, namely the text of the transformed poem. So we’ve got a very simple sort of Remote Procedure Call.

Twisted includes support for several protocols we could use to solve this problem, including XML-RPC, Perspective Broker, and AMP.

But introducing any of these full-featured protocols would require us to go too far afield, so we’ll roll our own humble protocol instead. Let’s have the client send a string of the form (without the angle brackets):

<transform-name>.<text of the poem>

That’s just the name of the transform, followed by a period, followed by the complete text of the poem itself. And we’ll encode the whole thing in the form of a netstring. And the server will send back the text of the transformed poem, also in a netstring. Since netstrings use length-encoding, the client will be able to detect the case where the server fails to send back a complete result (maybe it crashed in the middle of the operation). If you recall, our original poetry protocol has trouble detecting aborted poetry deliveries.

So much for the protocol design. It’s not going to win any awards, but it’s good enough for our purposes.

The Code

Let’s look at the code of our transformation server, located in twisted-server-1/transformedpoetry.py. First, we define a TransformService class:

class TransformService(object):

    def cummingsify(self, poem):
        return poem.lower()

The transform service currently implements one transformation, cummingsify, via a method of the same name. We could add additional algorithms by adding additional methods. Here’s something important to notice: the transformation service is entirely independent of the particular details of the protocol we settled on earlier. Separating the protocol logic from the service logic is a common pattern in Twisted programming. Doing so makes it easy to provide the same service via multiple protocols without duplicating code.

Now let’s look at the protocol factory (we’ll look at the protocol right after):

class TransformFactory(ServerFactory):

    protocol = TransformProtocol

    def __init__(self, service):
        self.service = service

    def transform(self, xform_name, poem):
        thunk = getattr(self, 'xform_%s' % (xform_name,), None)

        if thunk is None: # no such transform
            return None

        try:
            return thunk(poem)
        except:
            return None # transform failed

    def xform_cummingsify(self, poem):
        return self.service.cummingsify(poem)

This factory provides a transform method which a protocol instance can use to request a poetry transformation on behalf of a connected client. The method returns None if there is no such transformation or if the transformation fails. And like the TransformService, the protocol factory is independent of the wire-level protocol, the details of which are delegated to the protocol class itself.

One thing to notice is the way we guard access to the service though the xform_-prefixed methods. This is a pattern you will find in the Twisted sources, although the prefixes vary and they are usually on an object separate from the factory. It’s one way of preventing client code from executing an arbitrary method on the service object, since the client can send any transform name they want. It also provides a place to perform protocol-specific adaptation to the API provided by the service object.

Now we’ll take a look at the protocol implementation:

class TransformProtocol(NetstringReceiver):

    def stringReceived(self, request):
        if '.' not in request: # bad request
            self.transport.loseConnection()
            return

        xform_name, poem = request.split('.', 1)

        self.xformRequestReceived(xform_name, poem)

    def xformRequestReceived(self, xform_name, poem):
        new_poem = self.factory.transform(xform_name, poem)

        if new_poem is not None:
            self.sendString(new_poem)

        self.transport.loseConnection()

In the protocol implementation we take advantage of the fact that Twisted supports netstrings via the NetstringReceiver protocol. That base class takes care of decoding (and encoding) the netstrings and all we have to do is implement the stringReceived method. In other words, stringReceived is called with the content of a netstring sent by the client, without the extra bytes added by the netstring encoding. The base class also takes care of buffering the incoming bytes until we have enough to decode a complete string.

If everything goes ok (and if it doesn’t we just close the connection) we send the transformed poem back to the client using the sendString method provided by NetstringReceiver (and which ultimately calls transport.write()). And that’s all there is to it. We won’t bother listing the main function since it’s similar to the ones we’ve seen before.

Notice how we continue the Twisted pattern of translating the incoming byte stream to higher and higher levels of abstraction by defining the xformRequestReceived method, which is passed the name of the transform and the poem as two separate arguments.

A Simple Client

We’ll implement a Twisted client for the transformation service in the next Part. For now we’ll just make do with a simple script located in twisted-server-1/transform-test. It uses the netcat program to send a poem to the server and then prints out the response (which will be encoded as a netstring). Let’s say you run the transformation server on port 11000 like this:

python twisted-server-1/transformedpoetry.py --port 11000

Then you could run the test script against that server like this:

./twisted-server-1/transform-test 11000

And you should see some output like this:

15:here is my poem,

That’s the netstring-encoded transformed poem (the original is in all upper case).

Discussion

We introduced a few new ideas in this Part:

  1. Two-way communication.
  2. Building on an existing protocol implementation provided by Twisted.
  3. Using a service object to separate functional logic from protocol logic.

The basic mechanics of two-way communication are simple. We used the same techniques for reading and writing data in previous clients and servers; the only difference is we used them both together. Of course, a more complex protocol will require more complex code to process the byte stream and format outgoing messages. And that’s a great reason to use an existing protocol implementation like we did today.

Once you start getting comfortable writing basic protocols, it’s a good idea to take a look at the different protocol implementations provided by Twisted. You might start by perusing the twisted.protocols.basic module and going from there. Writing simple protocols is a great way to familiarize yourself with the Twisted style of programming, but in a “real” program it’s probably a lot more common to use a ready-made implementation, assuming there is one available for the protocol you want to use.

The last new idea we introduced, the use of a Service object to separate functional and protocol logic, is a really important design pattern in Twisted programming. Although the service object we made today is trivial, you can imagine a more realistic network service could be quite complex. And by making the Service independent of protocol-level details, we can quickly provide the same service on a new protocol without duplicating code.

Figure 27 shows a transformation server that is providing poetry transformations via two different protocols (the version of the server we presented above only has one protocol):

Figure 27: a transformation server with two protocols
Figure 27: a transformation server with two protocols

Although we need two separate protocol factories in Figure 27, they might differ only in their protocol class attribute and would be otherwise identical. The factories would share the same Service object and only the Protocols themselves would require separate implementations. Now that’s code re-use!

Looking Ahead

So much for our transformation server. In Part 13, we’ll update our poetry client to use the transform server instead of implementing transformations in the client itself.

Suggested Exercises

  1. Read the source code for the NetstringReceiver class. What happens if the client sends a malformed netstring? What happens if the client tries to send a huge netstring?
  2. Invent another transformation algorithm and add it to the transformation service and the protocol factory. Test it out by modifying the netcat client.
  3. Invent another protocol for requesting poetry transformations and modify the server to handle both protocols (on two different ports). Use the same instance of the TransformService for both.
  4. How would the code need to change if the methods on the TransformService were asynchronous (i.e., they returned Deferreds)?
  5. Write a synchronous client for the transformation server.
  6. Update the original client and server to use netstrings when sending poetry.
Categories
Blather Programming Python Software

Your Poetry is Served

Part 11: Your Poetry is Served

This continues the introduction started here. You can find an index to the entire series here.

A Twisted Poetry Server

Now that we’ve learned so much about writing clients with Twisted, let’s turn around and re-implement our poetry server with Twisted too. And thanks to the generality of Twisted’s abstractions, it turns out we’ve already learned almost everything we need to know. Take a look at our Twisted poetry server located in twisted-server-1/fastpoetry.py. It’s called fastpoetry because this server sends the poetry as fast as possible, without any delays at all. Note there’s significantly less code than in the client!

Let’s take the pieces of the server one at a time. First, the PoetryProtocol:

class PoetryProtocol(Protocol):

    def connectionMade(self):
        self.transport.write(self.factory.poem)
        self.transport.loseConnection()

Like the client, the server uses a separate Protocol instance to manage each different connection (in this case, connections that clients make to the server). Here the Protocol is implementing the server-side portion of our poetry protocol. Since our wire protocol is strictly one-way, the server’s Protocol instance only needs to be concerned with sending data. If you recall, our wire protocol requires the server to start sending the poem immediately after the connection is made, so we implement the connectionMade method, a callback that is invoked after a Protocol instance is connected to a Transport.

Our method tells the Transport to do two things: send the entire text of the poem (self.transport.write) and close the connection (self.transport.loseConnection). Of course, both of those operations are asynchronous. So the call to write() really means “eventually send all this data to the client” and the call to loseConnection() really means “close this connection once all the data I’ve asked you to write has been written”.

As you can see, the Protocol retrieves the text of the poem from the Factory, so let’s look at that next:

class PoetryFactory(ServerFactory):

    protocol = PoetryProtocol

    def __init__(self, poem):
        self.poem = poem

Now that’s pretty darn simple. Our factory’s only real job, besides making PoetryProtocol instances on demand, is storing the poem that each PoetryProtocol sends to a client.

Notice that we are sub-classing ServerFactory instead of ClientFactory. Since our server is passively listening for connections instead of actively making them, we don’t need the extra methods ClientFactory provides. How can we be sure of that? Because we are using the listenTCP reactor method and the documentation for that method explains that the factory argument should be an instance of ServerFactory.

Here’s the main function where we call listenTCP:

def main():
    options, poetry_file = parse_args()

    poem = open(poetry_file).read()

    factory = PoetryFactory(poem)

    from twisted.internet import reactor

    port = reactor.listenTCP(options.port or 0, factory,
                             interface=options.iface)

    print 'Serving %s on %s.' % (poetry_file, port.getHost())

    reactor.run()

It basically does three things:

  1. Read the text of the poem we are going to serve.
  2. Create a PoetryFactory with that poem.
  3. Use listenTCP to tell Twisted to listen for connections on a port, and use our factory to make the protocol instances for each new connection.

After that, the only thing left to do is tell the reactor to start running the loop. You can use any of our previous poetry clients (or just netcat) to test out the server.

Discussion

Recall Figure 8 and Figure 9 from Part 5. Those figures illustrated how a new Protocol instance is created and initialized after Twisted makes a new connection on our behalf. It turns out the same mechanism is used when Twisted accepts a new incoming connection on a port we are listening on. That’s why both connectTCP and listenTCP require factory arguments.

One thing we didn’t show in Figure 9 is that the connectionMade callback is also called as part of Protocol initialization. This happens no matter what, but we didn’t need to use it in the client code. And the Protocol methods that we did use in the client aren’t used in the server’s implementation. So if we wanted to, we could make a shared library with a single PoetryProtocol that works for both clients and servers. That’s actually the way things are typically done in Twisted itself. For example, the NetstringReceiver Protocol can both read and write netstrings from and to a Transport.

We skipped writing a low-level version of our server, but let’s think about what sort of things are going on under the hood. First, calling listenTCP tells Twisted to create a listening socket and add it to the event loop. An “event” on a listening socket doesn’t mean there is data to read; instead it means there is a client waiting to connect to us.

Twisted will automatically accept incoming connection requests, thus creating a new client socket that links the server directly to an individual client. That client socket is also added to the event loop, and Twisted creates a new Transport and (via the PoetryFactory) a new PoetryProtocol instance to service that specific client. So the Protocol instances are always connected to client sockets, never to the listening socket.

We can visualize all of this in Figure 26:

Figure 26: the poetry server in action

In the figure there are three clients currently connected to the poetry server. Each Transport represents a single client socket, and the listening socket makes a total of four file descriptors for the select loop to monitor. When a client is disconnected the associated Transport and PoetryProtocol will be dereferenced and garbage-collected (assuming we haven’t stashed a reference to one of them somewhere, a practice we should avoid to prevent memory leaks). The PoetryFactory, meanwhile, will stick around as long as we keep listening for new connections which, in our poetry server, is forever. Like the beauty of poetry. Or something. At any rate, Figure 26 certainly cuts a fine figure of a Figure, doesn’t it?

The client sockets and their associated Python objects won’t live very long if the poem we are serving is relatively short. But with a large poem and a really busy poetry server we could end up with hundreds or thousands of simultaneous clients. And that’s OK — Twisted has no built-in limits on the number of connections it can handle. Of course, as you increase the load on any server, at some point you will find it cannot keep up or some internal OS limit is reached. For highly-loaded servers, careful measurement and testing is the order of the day.

Twisted also imposes no limit on the number of ports we can listen on. In fact, a single Twisted process could listen on dozens of ports and provide a different service on each one (by using a different factory class for each listenTCP call). And with careful design, whether you provide multiple services with a single Twisted process or several is a decision you could potentially even postpone to the deployment phase.

There’s a couple things our server is missing. First of all, it doesn’t generate any logs that might help us debug problems or analyze our network traffic. Furthermore, the server doesn’t run as a daemon, making it vulnerable to death by accidental Ctrl-C (or just logging out). We’ll fix both those problems in a future Part but first, in Part 12, we’ll write another server to perform poetry transformation.

Suggested Exercises

  1. Write an asynchronous poetry server without using Twisted, like we did for the client in Part 2. Note that listening sockets need to be monitored for reading and a “readable” listening socket means we can accept a new client socket.
  2. Write a low-level asynchronous poetry server using Twisted, but without using listenTCP or protocols, transports, and factories, like we did for the client in Part 4. So you’ll still be making your own sockets, but you can use the Twisted reactor instead of your own select loop.
  3. Make the high-level version of the Twisted poetry server a “slow server” by using callLater or LoopingCall to make multiple calls to transport.write(). Add the --num-bytes and --delay command line options supported by the blocking server. Don’t forget to handle the case where the client disconnects before receiving the whole poem.
  4. Extend the high-level Twisted server so it can serve multiple poems (on different ports).
  5. What are some reasons to serve multiple services from the same Twisted process? What are some reasons not to?
Categories
Blather Programming Python Software

Poetry Transformed

Part 10: Poetry Transformed

This continues the introduction started here. You can find an index to the entire series here.

Client 5.0

Now we’re going to add some transformation logic to our poetry client, along the lines suggested in Part 9. But first, I have a shameful and humbling confession to make: I don’t know how to write the Byronification Engine. It is beyond my programming abilities. So instead, I’m going to implement a simpler transformation, the Cummingsifier. The Cummingsifier is an algorithm that takes a poem and returns a new poem like the original but written in the style of e.e. cummings. Here is the Cummingsifier algorithm in its entirety:

def cummingsify(poem)
    return poem.lower()

Unfortunately, this algorithm is so simple it never actually fails, so in client 5.0, located in twisted-client-5/get-poetry.py, we use a modified version of cummingsify that randomly does one of the following:

  1. Return a cummingsified version of the poem.
  2. Raise a GibberishError.
  3. Raise a ValueError.

In this way we simulate a more complicated algorithm that sometimes fails in unexpected ways.

The only other changes in client 5.0 are in the poetry_main function:

def poetry_main():
    addresses = parse_args()

    from twisted.internet import reactor

    poems = []
    errors = []

    def try_to_cummingsify(poem):
        try:
            return cummingsify(poem)
        except GibberishError:
            raise
        except:
            print 'Cummingsify failed!'
            return poem

    def got_poem(poem):
        print poem
        poems.append(poem)

    def poem_failed(err):
        print >>sys.stderr, 'The poem download failed.'
        errors.append(err)

    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallback(try_to_cummingsify)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

    reactor.run()

So when the program downloads a poem from the server, it will either:

  1. Print the cummingsified (lower-cased) version of the poem.
  2. Print “Cummingsify failed!” followed by the original poem.
  3. Print “The poem download failed.”

Although we have retained the ability to download from multiple servers, when you are testing out client 5.0 it’s easier to just use a single server and run the program multiple times, until you see all three different outcomes. Also try running the client on a port with no server.

Let’s draw the callback/errback chain we create on each Deferred we get back from get_poetry:

Figure 19: the deferred chain in client 5.0
Figure 19: the deferred chain in client 5.0

Note the pass-through errback that gets added by addCallback. It passes whatever Failure it receives onto the next errback (poem_failed). Thus, poem_failed can handle failures from both get_poetry (i.e., the deferred is fired with the errback method) and the cummingsify function.

Also note the hauntingly beautiful drop-shadow around the border of the deferred in Figure 19. It doesn’t signify anything other than me discovering how to do it in Inkscape. Expect more drop-shadows in the future.

Let’s analyze the different ways our deferred can fire. The case where we get a poem and the cummingsify function works correctly is shown in Figure 20:

Figure 20: when we download a poem and transform it correctly
Figure 20: when we download a poem and transform it correctly

In this case no callback fails, so control flows down the callback line. Note that poem_done receives None as its result, since got_poem doesn’t actually return a value. If we wanted subsequent callbacks to have access to the poem, we would modify got_poem to return the poem explicitly.

Figure 21 shows the case where we get a poem, but cummingsify raises a GibberishError:

Figure 21: when we download a poem and get a GibberishError
Figure 21: when we download a poem and get a GibberishError

Since the try_to_cummingsify callback re-raises a GibberishError, control switches to the errback line and poem_failed is called with the exception as its argument (wrapped in a Failure, of course).

And since poem_failed doesn’t raise an exception, or return a Failure, after it is done control switches back to the callback line. If we want poem_failed to handle the error completely, then returning None is a reasonable behavior. On the other hand, if we wanted poem_failed to take some action, but still propagate the error, we could change poem_failed to return its err argument and processing would continue down the errback line.

Note that in the current code neither got_poem nor poem_failed ever fail themselves, so the poem_done errback will never be called. But it’s safe to add it in any case and doing so represents an instance of “defensive” programming, as either got_poem or poem_failed might have bugs we don’t know about. Since the addBoth method ensures that a particular function will run no matter how the deferred fires, using addBoth is analogous to adding a finally clause to a try/except statement.

Now examine the case where we download a poem and the cummingsify function raises a ValueError, displayed in Figure 22:

Figure 22: when we download a poem and cummingsify fails
Figure 22: when we download a poem and cummingsify fails

This is the same as figure 20, except got_poem receives the original version of the poem instead of the transformed version. The switch happens entirely inside the try_to_cummingsify callback, which traps the ValueError with an ordinary try/except statement and returns the original poem instead. The deferred object never sees that error at all.

Lastly, we show the case where we try to download a poem from a non-existent server in Figure 23:

Figure 23: when we cannot connect to a server
Figure 23: when we cannot connect to a server

As before, poem_failed returns None so afterwards control switches to the callback line.

Client 5.1

In client 5.0 we are trapping exceptions from cummingsify in our try_to_cummingsify callback using an ordinary try/except statement, rather than letting the deferred catch them first. There isn’t necessarily anything wrong with this strategy, but it’s instructive to see how we might do this differently.

Let’s suppose we wanted to let the deferred catch both GibberishError and ValueError exceptions and send them to the errback line. To preserve the current behavior our subsequent errback needs to check to see if the error is a ValueError and, if so, handle it by returning the original poem, so that control goes back to the callback line and the original poem gets printed out.

But there’s a problem: the errback wouldn’t get the original poem, it would get the Failure-wrapped ValueError raised by the cummingsify function. To let the errback handle the error, we need to arrange for it to receive the original poem.

One way to do that is to modify the cummingsify function so the original poem is included in the exception. That’s what we’ve done in client 5.1, located in twisted-client-5/get-poetry-1.py. We changed the ValueError exception into a custom CannotCummingsify exception which takes the original poem as the first argument.

If cummingsify were a real function in an external module, then it would probably be best to wrap it with another function that trapped any exception that wasn’t GibberishError and raise a CannotCummingsify exception instead. With this new setup, our poetry_main function looks like this:

def poetry_main():
    addresses = parse_args()

    from twisted.internet import reactor

    poems = []
    errors = []

    def cummingsify_failed(err):
        if err.check(CannotCummingsify):
            print 'Cummingsify failed!'
            return err.value.args[0]
        return err

    def got_poem(poem):
        print poem
        poems.append(poem)

    def poem_failed(err):
        print >>sys.stderr, 'The poem download failed.'
        errors.append(err)

    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallback(cummingsify)
        d.addErrback(cummingsify_failed)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

And each deferred we create has the structure pictured in Figure 24:

Figure 24: the deferred chain in client 5.1
Figure 24: the deferred chain in client 5.1

Examine the cummingsify_failed errback:

    def cummingsify_failed(err):
        if err.check(CannotCummingsify):
            print 'Cummingsify failed!'
            return err.value.args[0]
        return err

We are using the check method on Failure objects to test whether the exception embedded in the Failure is an instance of CannotCummingsify. If so, we return the first argument to the exception (the original poem) and thus handle the error. Since the return value is not a Failure, control returns to the callback line. Otherwise, we return the Failure itself and send (re-raise) the error down the errback line. As you can see, the exception is available as the value attribute on the Failure.

Figure 25 shows what happens when we get a CannotCummingsify exception:

Figure 25: when we get a CannotCummingsify error
Figure 25: when we get a CannotCummingsify error

So when we are using a deferred, we can sometimes choose whether we want to use try/except statements to handle exceptions, or let the deferred re-route errors to an errback.

Summary

In Part 10 we updated our poetry client to make use of the Deferred‘s ability to route errors and results down the chain. Although the example was rather artificial, it did illustrate how control flow in a deferred switches back and forth between the callback and errback line depending on the result of each stage.

So now we know everything there is to know about deferreds, right? Not yet! We’re going to explore some more features of deferreds in a future Part. But first we’ll take a little detour and, in Part 11, implement a Twisted version of our poetry server.

Suggested Exercises

  1. Figure 25 shows one of the four possible ways the deferreds in client 5.1 can fire. Draw the other three.
  2. Use the deferred simulator to simulate all possible firings for clients 5.0 and 5.1. To get you started, this simulator program can represent the case where the try_to_cummingsify function succeeds in client 5.0:
    r poem p
    r None r None
    r None r None
Categories
Blather Programming Python Software

A Second Interlude, Deferred

Part 9: A Second Interlude, Deferred

This continues the introduction started here. You can find an index to the entire series here.

More Consequence of Callbacks

We’re going to pause for a moment to think about callbacks again. Although we now know enough about deferreds to write simple asynchronous programs in the Twisted style, the Deferred class provides more features that only come into play in more complex settings. So we’re going to think up some more complex settings and see what sort of challenges they pose when programming with callbacks. Then we’ll investigate how deferreds address those challenges.

To motivate our discussion we’re going to add a hypothetical feature to our poetry client. Suppose some hard-working Computer Science professor has invented a new poetry-related algorithm, the Byronification Engine. This nifty algorithm takes a single poem as input and produces a new poem like the original, but written in the style of Lord Byron. What’s more, our professor has kindly provided a reference implementation in Python, with this interface:

class IByronificationEngine(Interface):

    def byronificate(poem):
        """
        Return a new poem like the original, but in the style of Lord Byron.

        Raises GibberishError if the input is not a genuine poem.
        """

Like most bleeding-edge software, the implementation has some bugs. This means that in addition to the documented exception, the byronificate method sometimes throws random exceptions when it hits a corner-case the professor forgot to handle.

We’ll also assume the engine runs fast enough that we can just call it in the main thread without worrying about tying up the reactor. This is how we want our program to work:

  1. Try to download the poem.
  2. If the download fails, tell the user we couldn’t get the poem.
  3. If we do get the poem, transform it with the Byronification Engine.
  4. If the engine throws a GibberishError, tell the user we couldn’t get the poem.
  5. If the engine throws another exception, just keep the original poem.
  6. If we have a poem, print it out.
  7. End the program.

The idea here is that a GibberishError means we didn’t get an actual poem after all, so we’ll just tell the user the download failed. That’s not so useful for debugging, but our users just want to know whether we got a poem or not. On the other hand, if the engine fails for some other reason then we’ll use the poem we got from the server. After all, some poetry is better than none at all, even if it’s not in the trademark Byron style.

Here’s the synchronous version of our code:

try:
    poem = get_poetry(host, port) # synchronous get_poetry
except:
    print >>sys.stderr, 'The poem download failed.'
else:
    try:
        poem = engine.byronificate(poem)
    except GibberishError:
        print >>sys.stderr, 'The poem download failed.'
    except:
        print poem # handle other exceptions by using the original poem
    else:
        print poem

sys.exit()

This sketch of a program could be make simpler with some refactoring, but it illustrates the flow of logic pretty clearly. We want to update our most recent poetry client (which uses deferreds) to implement this same scheme. But we won’t do that until Part 10. For now, instead, let’s imagine how we might do this with client 3.1, our last client that didn’t use deferreds at all. Suppose we didn’t bother handling exceptions, but instead just changed the got_poem callback like this:

def got_poem(poem):
    poems.append(byron_engine.byronificate(poem))
    poem_done()

What happens when the byronificate method raises a GibberishError or some other exception? Looking at Figure 11 from Part 6, we can see that:

  1. The exception will propagate to the poem_finished callback in the factory, the method that actually invokes the callback.
  2. Since poem_finished doesn’t catch the exception, it will proceed to poemReceived on the protocol.
  3. And then on to connectionLost, also on the protocol.
  4. And then up into the core of Twisted itself, finally ending up at the reactor.

As we have learned, the reactor will catch and log the exception instead of crashing. But what it certainly won’t do is tell the user we couldn’t download a poem. The reactor doesn’t know anything about poems or GibberishErrors, it’s a general-purpose piece of code used for all kinds of networking, even non-poetry-related networking.

Notice how, at each step in the list above, the exception moves to a more general-purpose piece of code than the one before. And at no step after got_poem is the exception in a piece of code that could be expected to handle an error in the specific way we want for this client. This situation is basically the exact opposite of the way exceptions propagate in synchronous code.

Take a look at Figure 15, an illustration of a call stack we might see with a synchronous poetry client :

Figure 15: exceptions in synchronous code
Figure 15: synchronous code and exceptions

The main function is “high-context”, meaning it knows a lot about the whole program, why it exists, and how it’s supposed to behave overall. Typically, main would have access to the command-line options that indicate just how the user wants the program to work (and perhaps what to do if something goes wrong). It also has a very specific purpose: running the show for a command-line poetry client.

The socket connect method, on the other hand, is “low-context”. All it knows is that it’s supposed to connect to some network address. It doesn’t know what’s on the other end or why we need to connect right now. But connect is quite general-purpose — you can use it no matter what sort of service you are connecting to.

And get_poetry is in the middle. It knows it’s getting some poetry (and that’s the only thing it’s really good at), but not what should happen if it can’t.

So an exception thrown by connect will  move up the stack, from low-context and general-purpose code to high-context and special-purpose code, until it reaches some code with enough context to know what to do when something goes wrong (or it hits the Python interpreter and the program crashes).

Of course the exception is really just moving up the stack no matter what rather than literally seeking out high-context code. It’s just that in a typical synchronous program “up the stack” and “towards higher-context” are the same direction.

Now recall our hypothetical modification to client 3.1 above. The call stack we analyzed is pictured in Figure 16, abbreviated to just a few functions:

Figure 16: asynchronous callbacks and exceptions
Figure 16: asynchronous callbacks and exceptions

The problem is now clear: during a callback, low-context code (the reactor) is calling higher-context code which may in turn call even higher-context code, and so on. So if an exception occurs and it isn’t handled immediately, close to the same stack frame where it occurred, it’s unlikely to be handled at all. Because each time the exception moves up the stack it moves to a piece of lower-context code that’s even less likely to know what to do.

Once an exception crosses over into the Twisted core the game is up. The exception will not be handled, it will only be noted (when the reactor finally catches it). So when we are programming with “plain old” callbacks (without using deferreds), we must be careful to catch every exception before it gets back into Twisted proper, at least if we want to have any chance of handling errors according to our own rules. And that includes exceptions caused by our own bugs!

Since a bug can exist anywhere in our code, we would need to wrap every callback we write in an extra “outer layer” of try/except statements so the exceptions from our fumble-fingered typos can be handled as well. And the same goes for our errbacks because code to handle errors can have bugs too.

Well that’s not so nice.

The Fine Structure of Deferreds

It turns out the Deferred class helps us solve this problem. Whenever a deferred invokes a callback or errback, it catches any exception that might be raised. In other words, a deferred acts as the “outer layer” of try/except statements so we don’t need to write that layer after all, as long as we use deferreds. But what does a deferred do with an exception it catches? Simple — it passes the exception (in the form of a Failure) to the next errback in the chain.

So the first errback we add to a deferred is there to handle whatever error condition is signaled when the deferred’s .errback(err) method is called. But the second errback will handle any exception raised by either the first callback or the first errback, and so on down the line.

Recall Figure 12, a visual representation of a deferred with some callbacks and errbacks in the chain. Let’s call the first callback/errback pair stage 0, the next pair stage 1, and so on.

At a given stage N, if either the callback or the errback (whichever was executed) fails, then the errback in stage N+1 is called with the appropriate Failure object and the callback in stage N+1 is not called.

By passing exceptions raised by callbacks “down the chain”, a deferred moves exceptions in the direction of “higher context”. This also means that invoking the callback and errback methods of a deferred will never result in an exception for the caller (as long as you only fire the deferred once!), so lower-level code can safely fire a deferred without worrying about catching exceptions. Instead, higher-level code catches the exception by adding errbacks to the deferred (with addErrback, etc.).

Now in synchronous code, an exception stops propagating as soon as it is caught. So how does an errback signal the fact that it “caught” the error? Also simple — by not raising an exception. And in that case, the execution switches over to the callback line. So at a given stage N, if either the callback or errback succeeds (i.e., doesn’t raise an exception) then the callback in stage N+1 is called with the return value from stage N, and the errback in stage N+1 is not called.

Let’s summarize what we know about the deferred firing pattern:

  1. A deferred contains a chain of ordered callback/errback pairs (stages). The pairs are in the order they were added to the deferred.
  2. Stage 0, the first callback/errback pair, is invoked when the deferred is fired. If the deferred is fired with the callback method, then the stage 0 callback is called. If the deferred is fired with the errback method, then the stage 0 errback is called.
  3. If stage N fails, then the stage N+1 errback is called with the exception (wrapped in a Failure) as the first argument.
  4. If stage N succeeds, then the stage N+1 callback is called with the stage N return value as the first argument.

This pattern is illustrated in Figure 17:

Figure 17: control flow in a deferred
Figure 17: control flow in a deferred

The green lines indicate what happens when a callback or errback succeeds and the red lines are for failures. The lines show both the flow of control and the flow of exceptions and return values down the chain. Figure 17 shows all possible paths a deferred might take, but only one path will be taken in any particular case. Figure 18 shows one possible path for a “firing”:

Figure 18: one possible deferred firing pattern
Figure 18: one possible deferred firing pattern

In figure 18, the deferred’s callback method is called, which invokes the callback in stage 0. That callback succeeds, so control (and the return value from stage 0) passes to the stage 1 callback. But that callback fails (raises an exception), so control switches to the errback in stage 2. The errback “handles” the error (it doesn’t raise an exception) so control moves back to the callback chain and the callback in stage 3 is called with the result from the stage 2 errback.

Notice that any path you can make with Figure 17 will pass through every stage in the chain, but only one member of the callback/errback pair at any stage will be called.

In Figure 18, we’ve indicated that the stage 3 callback succeeds by drawing a green arrow out of it, but since there aren’t any more stages in that deferred, the result of stage 3 doesn’t really go anywhere. If the callback succeeds, that’s not really a problem, but what if it had failed? If the last stage in a deferred fails, then we say the failure is unhandled, since there is no errback to “catch” it.

In synchronous code an unhandled exception will crash the interpreter, and in plain-old-callbacks asynchronous code an unhandled exception is caught by the reactor and logged. What happens to unhandled exceptions in deferreds? Let’s try it out and see. Look at the sample code in twisted-deferred/defer-unhandled.py. That code is firing a deferred with a single callback that always raises an exception. Here’s the output of the program:

Finished
Unhandled error in Deferred:
Traceback (most recent call last):
  ...
--- <exception caught here> ---
  ...
exceptions.Exception: oops

Some things to notice:

  1. The last print statement runs, so the program is not “crashed” by the exception.
  2. That means the Traceback is just getting printed out, it’s not crashing the interpreter.
  3. The text of the traceback tells us where the deferred itself caught the exception.
  4. The “Unhandled” message gets printed out after “Finished”.

So when you use deferreds, unhandled exceptions in callbacks will still be noted, for debugging purposes, but as usual they won’t crash the program (in fact they won’t even make it to the reactor, the deferred will catch them first). By the way, the reason that “Finished” comes first is because the “Unhandled” message isn’t actually printed until the deferred is garbage collected. We’ll see the reason for that in a future Part.

Now, in synchronous code we can “re-raise” an exception using the raise keyword without any arguments. Doing so raises the original exception we were handling and allows us to take some action on an error without completely handling it. It turns out we can do the same thing in an errback. A deferred will consider a callback/errback to have failed if:

  • The callback/errback raises any kind of exception, or
  • The callback/errback returns a Failure object.

Since an errback’s first argument is always a Failure, an errback can “re-raise” the exception by returning its first argument, after performing whatever action it wants to take.

Callbacks and Errbacks, Two by Two

One thing that should be clear from the above discussion is that the order you add callbacks and errbacks to a deferred makes a big difference in how the deferred will fire. What should also be clear is that, in a deferred, callbacks and errbacks always occur in pairs. There are four methods on the Deferred class you can use to add pairs to the chain:

  1. addCallbacks
  2. addCallback
  3. addErrback
  4. addBoth

Obviously, the first and last methods add a pair to the chain. But the middle two methods also add a callback/errback pair. The addCallback method adds an explicit callback (the one you pass to the method) and an implicit “pass-through” errback. A pass-through function is a dummy function that just returns its first argument. Since the first argument to an errback is always a Failure, a pass-through errback will always “fail” and send its error to the next errback in the chain.

As you’ve no doubt guessed, the addErrback function adds an explicit errback and an implicit pass-through callback. And since the first argument to a callback is never a Failure, a pass-through callback sends its result to the next callback in the chain.

The Deferred Simulator

It’s a good idea to become familiar with the way deferreds fire their callbacks and errbacks. The python script in twisted-deferred/deferred-simulator.py is a “deferred simulator”, a little python program that lets you explore how deferreds fire. When you run the script it will ask you to enter list of callback and errback pairs, one per line. For each callback or errback, you specify that either:

  • It returns a given value (succeds), or
  • It raises a given exception (fails), or
  • It returns its argument (passthru).

After you’ve entered all the pairs you want to simulate, the script will print out, in high-resolution ASCII art, a diagram showing the contents of the chain and the firing patterns for the callback and errback methods. You will want to use a terminal window that is as wide as possible to see everything correctly. You can also use the --narrow option to print the diagrams one after the other, but it’s easier to see their relationships when you print them side-by-side.

Of course, in real code a callback isn’t going to return the same value every time, and a given function might sometimes succeed and other times fail. But the simulator can give you a picture of what will happen for a given combination of normal results and failures, in a given arrangement of callbacks and errbacks.

Summary

After thinking some more about callbacks, we realize that letting callback exceptions bubble up the stack isn’t going to work out so well, since callback programming inverts the usual relationship between low-context and high-context code. And the Deferred class tackles this problem by catching exceptions and sending them down the chain instead of up into the reactor.

We’ve also learned that ordinary results (return values) move down the chain as well. Combining both facts together results in a kind of criss-cross firing pattern as the deferred switches back and forth between the callback and errback lines, depending on the result of each stage.

Armed with this knowledge, in Part 10 we will update our poetry client with some poetry transformation logic.

Suggested Exercises

  1. Inspect the implementation of each of the four methods on the Deferred which add callbacks and errbacks. Verify that all methods add a callback/errback pair.
  2. Use the deferred simulator to investigate the difference between this code:
    deferred.addCallbacks(my_callback, my_errback)

    and this code:

    deferred.addCallback(my_callback)
    deferred.addErrback(my_errback)

    Recall that the last two methods add implicit pass-through functions as one member of the pair.

Categories
Blather Programming Python Software

Deferred Poetry

Part 8: Deferred Poetry

This continues the introduction started here. You can find an index to the entire series here.

Client 4.0

Now that we have know something about deferreds, we can rewrite our Twisted poetry client to use them. You can find client 4.0 in twisted-client-4/get-poetry.py.

Our get_poetry function no longer needs callback or errback arguments. Instead, it returns a new deferred to which the user may attach callbacks and errbacks as needed.

def get_poetry(host, port):
    """
    Download a poem from the given host and port. This function
    returns a Deferred which will be fired with the complete text of
    the poem or a Failure if the poem could not be downloaded.
    """
    d = defer.Deferred()
    from twisted.internet import reactor
    factory = PoetryClientFactory(d)
    reactor.connectTCP(host, port, factory)
    return d

Our factory object is initialized with a deferred instead of a callback/errback pair. Once we have the poem, or we find out we couldn’t connect to the server, the deferred is fired with either a poem or a failure:

class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol

    def __init__(self, deferred):
        self.deferred = deferred

    def poem_finished(self, poem):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.callback(poem)

    def clientConnectionFailed(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

Notice the way we release our reference to the deferred after it is fired. This is a pattern found in several places in the Twisted source code and helps to ensure we do not fire the same deferred twice. It makes life a little easier for the Python garbage collector, too.

Once again, there is no need to change the PoetryProtocol, it’s just fine as is. All that remains is to update the poetry_main function:

def poetry_main():
    addresses = parse_args()

    from twisted.internet import reactor

    poems = []
    errors = []

    def got_poem(poem):
        poems.append(poem)

    def poem_failed(err):
        print >>sys.stderr, 'Poem failed:', err
        errors.append(err)

    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

    reactor.run()

    for poem in poems:
        print poem

Notice how we take advantage of the chaining capabilities of the deferred to refactor the poem_done invocation out of our primary callback and errback.

Because deferreds are used so much in Twisted code, it’s common practice to use the single-letter local variable d to hold the deferred you are currently working on. For longer term storage, like object attributes, the name “deferred” is often used.

Discussion

With our new client the asynchronous version of get_poetry accepts the same information as our synchronous version, just the address of the poetry server. The synchronous version returns a poem, while the asynchronous version returns a deferred. Returning a deferred is typical of the asynchronous APIs in Twisted and programs written with Twisted, and this points to another way of conceptualizing deferreds:

A Deferred object represents an “asynchronous result” or a “result that has not yet come”.

We can contrast these two styles of programming in Figure 13:

Figure 13: sync versus async
Figure 13: sync versus async

By returning a deferred, an asynchronous API is giving this message to the user:

I’m an asynchronous function. Whatever you want me to do might not be done yet. But when it is done, I’ll fire the callback chain of this deferred with the result. On the other hand, if something goes wrong, I’ll fire the errback chain of this deferred instead.

Of course, that function itself won’t literally fire the deferred, it has already returned. Rather, the function has set in motion a chain of events that will eventually result in the deferred being fired.

So deferreds are a way of “time-shifting” the results of functions to accommodate the needs of the asynchronous model. And a deferred returned by a function is a notice that the function is asynchronous, the embodiment of the future result, and a promise that the result will be delivered.

It is possible for a synchronous function to return a deferred, so technically a deferred return value means the function is potentially asynchronous. We’ll see examples of synchronous functions returning deferreds in future Parts.

Because the behavior of deferreds is well-defined and well-known (to folks with some experience programming with Twisted), by returning deferreds from your own APIs you are making it easier for other Twisted programmers to understand and use your code. Without deferreds, each Twisted program, or even each internal Twisted component, might have its own unique method for managing callbacks that you would have to learn in order to use it.

When You’re Using Deferreds, You’re Still Using Callbacks, and They’re Still Invoked by the Reactor

When first learning Twisted, it is a common mistake to attribute more functionality to deferreds than they actually have. Specifically, it is often assumed that adding a function to a deferred’s chain automatically makes that function asynchronous. This might lead you to think you could use, say, os.system with Twisted by adding it to a deferred with addCallback.

I think this mistake is caused by trying to learn Twisted without first learning the asynchronous model. Since typical Twisted code uses lots of deferreds and only occasionally refers to the reactor, it can appear that deferreds are doing all the work. If you have read this introduction from the beginning, it should be clear this is far from the case. Although Twisted is composed of many parts that work together, the primary responsibility for implementing the asynchronous model falls to the reactor. Deferreds are a useful abstraction, but we wrote several versions of our Twisted client without using them in any way.

Let’s look at a stack trace at the point when our first callback is invoked. Run the example program in twisted-client-4/get-poetry-stack.py with the address of a running poetry server. You should get some output like this:

  File "twisted-client-4/get-poetry-stack.py", line 129, in
    poetry_main()
  File "twisted-client-4/get-poetry-stack.py", line 122, in poetry_main
    reactor.run()

  ... # some more Twisted function calls

    protocol.connectionLost(reason)
  File "twisted-client-4/get-poetry-stack.py", line 59, in connectionLost
    self.poemReceived(self.poem)
  File "twisted-client-4/get-poetry-stack.py", line 62, in poemReceived
    self.factory.poem_finished(poem)
  File "twisted-client-4/get-poetry-stack.py", line 75, in poem_finished
    d.callback(poem) # here's where we fire the deferred

  ... # some more methods on Deferreds

  File "twisted-client-4/get-poetry-stack.py", line 105, in got_poem
    traceback.print_stack()

That’s pretty similar to the stack trace we created for client 2.0. We can visualize the latest trace in Figure 14:

Figure 13: A callback with a deferred
Figure 14: A callback with a deferred

Again, this is similar to our previous Twisted clients, though the visual representation is starting to become vaguely disturbing. We probably won’t be showing any more of these, for the sake of the children. One wrinkle not reflected in the figure: the callback chain above doesn’t return control to the reactor until the second callback in the deferred (poem_done) is invoked, which happens right after the first callback (got_poem) returns.

There’s one more difference with our new stack trace. The line separating “Twisted code” from “our code” is a little fuzzier, since the methods on deferreds are really Twisted code. This interleaving of Twisted and user code in a callback chain is common in larger Twisted programs which make extensive use of other Twisted abstractions.

By using a deferred we’ve added a few more steps in the callback chain that starts in the Twisted reactor, but we haven’t changed the fundamental mechanics of the asynchronous model. Recall these facts about callback programming:

  1. Only one callback runs at a time.
  2. When the reactor is running our callbacks are not.
  3. And vice-versa.
  4. If our callback blocks then the whole program blocks.

Attaching a callback to a deferred doesn’t change these facts in any way. In particular, a callback that blocks will still block if it’s attached to a deferred. So that deferred will block when it is fired (d.callback), and thus Twisted will block. And we conclude:

Deferreds are a solution (a particular one invented by the Twisted developers) to the problem of managing callbacks. They are neither a way of avoiding callbacks nor a way to turn blocking callbacks into non-blocking callbacks.

We can confirm the last point by constructing a deferred with a blocking callback. Consider the example code in twisted-deferred/defer-block.py. The second callback blocks using the time.sleep function. If you run that script and examine the order of the print statements, it will be clear that a blocking callback also blocks inside a deferred.

Summary

By returning a Deferred, a function tells the user “I’m asynchronous” and provides a mechanism (add your callbacks and errbacks here!) to obtain the asynchronous result when it arrives. Deferreds are used extensively throughout the Twisted codebase and as you explore Twisted’s APIs you are bound to keep encountering them. So it will pay to become familiar with deferreds and comfortable in their use.

Client 4.0 is the first version of our Twisted poetry client that’s truly written in the “Twisted style”, using a deferred as the return value of an asynchronous function call. There are a few more Twisted APIs we could use to make it a little cleaner, but I think it represents a pretty good example of how simple Twisted programs are written, at least on the client side. Eventually we’ll re-write our poetry server using Twisted, too.

But we’re not quite finished with deferreds. For a relatively short piece of code, the Deferred class provides a surprising number of features. We’ll talk about some more of those features, and their motivation, in Part 9.

Suggested Exercises

  1. Update client 4.0 to timeout if the poem isn’t received after a given period of time. Fire the deferred’s errback with a custom exception in that case. Don’t forget to close the connection when you do.
  2. Update client 4.0 to print out the appropriate server address when a poem download fails, so the user can tell which server is the culprit. Don’t forget you can add extra positional- and keyword-arguments when you attach callbacks and errbacks.
Categories
Blather Programming Python Software

An Interlude,  Deferred

Part 7: An Interlude,  Deferred

This continues the introduction started here. You can find an index to the entire series here.

Callbacks and Their Consequences

In Part 6 we came face-to-face with this fact: callbacks are a fundamental aspect of asynchronous programming with Twisted. Rather than just a way of interfacing with the reactor, callbacks will be woven into the structure of any Twisted program we write. So using Twisted, or any reactor-based asynchronous system, means organizing our code in a particular way, as a series of “callback chains” invoked by a reactor loop.

Even an API as simple as our get_poetry function required callbacks, two of them in fact: one for normal results and one for errors. Since, as Twisted programmers, we’re going to have to make so much use of them, we should spend a little bit of time thinking about the best ways to use callbacks, and what sort of pitfalls we might encounter.

Consider this piece of code that uses the Twisted version of get_poetry from client 3.1:

...
def got_poem(poem):
    print poem
    reactor.stop()

def poem_failed(err):
    print >>sys.stderr, 'poem download failed'
    print >>sys.stderr, 'I am terribly sorry'
    print >>sys.stderr, 'try again later?'
    reactor.stop()

get_poetry(host, port, got_poem, poem_failed)

reactor.run()

The basic plan here is clear:

  1. If we get the poem, print it out.
  2. If we don’t get the poem, print out an Error Haiku.
  3. In either case, end the program.

The ‘synchronous analogue’ to the above code might look something like this:

...
try:
    poem = get_poetry(host, port) # the synchronous version of get_poetry
except Exception, err:
    print >>sys.stderr, 'poem download failed'
    print >>sys.stderr, 'I am terribly sorry'
    print >>sys.stderr, 'try again later?'
    sys.exit()
else:
    print poem
    sys.exit()

So the callback is like the else block and the errback is like the except. That means invoking the errback is the asynchronous analogue to raising an exception and invoking the callback corresponds to the normal program flow.

What are some of the differences between the two versions? For one thing, in the synchronous version the Python interpreter will ensure that, as long as get_poetry raises any kind of exception at all, for any reason, the except block will run. If we trust the interpreter to run Python code correctly we can trust that error block to run at the right time.

Contrast that with the asynchronous version: the poem_failed errback is invoked by our code, the clientConnectionFailed method of the PoetryClientFactory. We, not Python, are in charge of making sure the error code runs if something goes wrong. So we have to make sure to handle every possible error case by invoking the errback with a Failure object. Otherwise, our program will become “stuck” waiting for a callback that never comes.

That shows another difference between the synchronous and asynchronous versions. If we didn’t bother catching the exception in the synchronous version (by not using a try/except), the Python interpreter would “catch” it for us and crash to show us the error of our ways. But if we forget to “raise” our asynchronous exception (by calling the errback function in PoetryClientFactory), our program will just run forever, blissfully unaware that anything is amiss.

Clearly, handling errors in an asynchronous program is important, and also somewhat tricky. You might say that handling errors in asynchronous code is actually more important than handling the normal case, as things can go wrong in far more ways than they can go right. Forgetting to handle the error case is a common mistake when programming with Twisted.

Here’s another fact about the synchronous code above: either the else block runs exactly once, or the except block runs exactly once (assuming the synchronous version of get_poetry doesn’t enter an infinite loop). The Python interpreter won’t suddenly decide to run them both or, on a whim, run the else block twenty-seven times. And it would be basically impossible to program in Python if it did!

But again, in the asynchronous case we are in charge of running the callback or the errback. Knowing us, we might make some mistakes. We could call both the callback and the errback, or invoke the callback twenty-seven times. That would be unfortunate for the users of get_poetry. Although the docstring doesn’t explicitly say so, it really goes without saying that, like the else and except blocks in a try/except statement, either the callback will run exactly once or the errback will run exactly once, for each specific call to get_poetry. Either we get the poem or we don’t.

Imagine trying to debug a program that makes three poetry requests and gets seven callback invocations and two errback invocations. Where would you even start? You’d probably end up writing your callbacks and errbacks to detect when they got invoked a second time for the same get_poetry call and throw an exception right back. Take that, get_poetry.

One more observation: both versions have some duplicate code. The asynchronous version has two calls to reactor.stop and the synchronous version has two calls to sys.exit. We might refactor the synchronous version like this:

...
try:
    poem = get_poetry(host, port) # the synchronous version of get_poetry
except Exception, err:
    print >>sys.stderr, 'poem download failed'
    print >>sys.stderr, 'I am terribly sorry'
    print >>sys.stderr, 'try again later?'
else:
    print poem

sys.exit()

Can we refactor the asynchronous version in a similar way? It’s not really clear that we can, since the callback and errback are two different functions. Do we have to go back to a single callback to make this possible?

Ok, here are some of the insights we’ve discovered about programming with callbacks:

  1. Calling errbacks is very important. Since errbacks take the place of except blocks, users need to be able to count on them. They aren’t an optional feature of our APIs.
  2. Not invoking callbacks at the wrong time is just as important as calling them at the right time. For a typical use case, the callback and errback are mutually exclusive and invoked exactly once.
  3. Refactoring common code might be harder when using callbacks.

We’ll have more to say about callbacks in future Parts, but for now this is enough to see why Twisted might have an abstraction devoted to managing them.

The Deferred

Since callbacks are used so much in asynchronous programming, and since using them correctly can, as we have discovered, be a bit tricky, the Twisted developers created an abstraction called a Deferred to make programming with callbacks easier. The Deferred class is defined in twisted.internet.defer.

The word “deferred” is either a verb or an adjective in everyday English, so it might sound a little strange used as a noun. Just know that, from now on, when I use the phrase “the deferred” or “a deferred”, I’m referring to an instance of the Deferred class. We’ll talk about why it is called Deferred in a future Part. It might help to mentally add the word “result” to each phrase, as in “the deferred result”. As we will eventually see, that’s really what it is.

A deferred contains a pair of callback chains, one for normal results and one for errors. A newly-created deferred has two empty chains. We can populate the chains by adding callbacks and errbacks and then fire the deferred with either a normal result (here’s your poem!) or an exception (I couldn’t get the poem, and here’s why). Firing the deferred will invoke the appropriate callbacks or errbacks in the order they were added. Figure 12 illustrates a deferred instance with its callback/errback chains:

Figure 12: A Deferred
Figure 12: A Deferred

Let’s try this out. Since deferreds don’t use the reactor, we can test them out without starting up the loop.

You might have noticed a method on Deferred called setTimeout that does use the reactor. It is deprecated and will cease to exist in a future release. Pretend it’s not there and don’t use it.

Our first example is in twisted-deferred/defer-1.py:

from twisted.internet.defer import Deferred

def got_poem(res):
    print 'Your poem is served:'
    print res

def poem_failed(err):
    print 'No poetry for you.'

d = Deferred()

# add a callback/errback pair to the chain
d.addCallbacks(got_poem, poem_failed)

# fire the chain with a normal result
d.callback('This poem is short.')

print "Finished"

This code makes a new deferred, adds a callback/errback pair with the addCallbacks method, and then fires the “normal result” chain with the callback method. Of course, it’s not much of a chain since it only has a single callback, but no matter. Run the code and it produces this output:

Your poem is served:
This poem is short.
Finished

That’s pretty simple. Here are some things to notice:

  1. Just like the callback/errback pairs we used in client 3.1, the callbacks we add to this deferred each take one argument, either a normal result or an error result. It turns out that deferreds support callbacks and errbacks with multiple arguments, but they always have at least one, and the first argument is always either a normal result or an error result.
  2. We add callbacks and errbacks to the deferred in pairs.
  3. The callback method fires the deferred with a normal result, the method’s only argument.
  4. Looking at the order of the print output, we can see that firing the deferred invokes the callbacks immediately. There’s nothing asynchronous going on at all. There can’t be, since no reactor is running. It really boils down to an ordinary Python function call.

Ok, let’s push the other button. The example in twisted-deferred/defer-2.py fires the deferred’s errback chain:

from twisted.internet.defer import Deferred
from twisted.python.failure import Failure

def got_poem(res):
    print 'Your poem is served:'
    print res

def poem_failed(err):
    print 'No poetry for you.'

d = Deferred()

# add a callback/errback pair to the chain
d.addCallbacks(got_poem, poem_failed)

# fire the chain with an error result
d.errback(Failure(Exception('I have failed.')))

print "Finished"

And after running that script we get this output:

No poetry for you.
Finished

So firing the errback chain is just a matter of calling the errback method instead of the callback method, and the method argument is the error result. And just as with callbacks, the errbacks are invoked immediately upon firing.

In the previous example we are passing a Failure object to the errback method like we did in client 3.1. That’s just fine, but a deferred will turn ordinary Exceptions into Failures for us. We can see that with twisted-deferred/defer-3.py:

from twisted.internet.defer import Deferred

def got_poem(res):
    print 'Your poem is served:'
    print res

def poem_failed(err):
    print err.__class__
    print err
    print 'No poetry for you.'

d = Deferred()

# add a callback/errback pair to the chain
d.addCallbacks(got_poem, poem_failed)

# fire the chain with an error result
d.errback(Exception('I have failed.'))

Here we are passing a regular Exception to the errback method. In the errback, we are printing out the class and the error result itself. We get this output:

twisted.python.failure.Failure
[Failure instance: Traceback (failure with no frames): : I have failed.
]
No poetry for you.

This means when we use deferreds we can go back to working with ordinary Exceptions and the Failures will get created for us automatically. A deferred will guarantee that each errback is invoked with an actual Failure instance.

We tried pressing the callback button and we tried pressing the errback button. Like any good engineer, you probably want to start pressing them over and over. To make the code shorter, we’ll use the same function for both the callback and the errback. Just remember they get different return values; one is a result and the other is a failure. Check out twisted-deferred/defer-4.py:

from twisted.internet.defer import Deferred
def out(s): print s
d = Deferred()
d.addCallbacks(out, out)
d.callback('First result')
d.callback('Second result')
print 'Finished'

Now we get this output:

First result
Traceback (most recent call last):
  ...
twisted.internet.defer.AlreadyCalledError

This is interesting! A deferred will not let us fire the normal result callbacks a second time. In fact, a deferred cannot be fired a second time no matter what, as demonstrated by these examples:

Notice those final print statements are never called. The callback and errback methods are raising genuine Exceptions to let us know we’ve already fired that deferred. Deferreds help us avoid one of the pitfalls we identified with callback programming. When we use a deferred to manage our callbacks, we simply can’t make the mistake of calling both the callback and the errback, or invoking the callback twenty-seven times. We can try, but the deferred will raise an exception right back at us, instead of passing our mistake onto the callbacks themselves.

Can deferreds help us to refactor asynchronous code? Consider the example in twisted-deferred/defer-8.py:

import sys

from twisted.internet.defer import Deferred

def got_poem(poem):
    print poem
    from twisted.internet import reactor
    reactor.stop()

def poem_failed(err):
    print >>sys.stderr, 'poem download failed'
    print >>sys.stderr, 'I am terribly sorry'
    print >>sys.stderr, 'try again later?'
    from twisted.internet import reactor
    reactor.stop()

d = Deferred()

d.addCallbacks(got_poem, poem_failed)

from twisted.internet import reactor

reactor.callWhenRunning(d.callback, 'Another short poem.')

reactor.run()

This is basically our original example above, with a little extra code to get the reactor going. Notice we are using callWhenRunning to fire the deferred after the reactor starts up. We’re taking advantage of the fact that callWhenRunning accepts additional positional- and keyword-arguments to pass to the callback when it is run. Many Twisted APIs that register callbacks follow this same convention, including the APIs to add callbacks to deferreds.

Both the callback and the errback stop the reactor. Since deferreds support chains of callbacks and errbacks, we can refactor the common code into a second link in the chains, a technique illustrated in twisted-deferred/defer-9.py:

import sys

from twisted.internet.defer import Deferred

def got_poem(poem):
    print poem

def poem_failed(err):
    print >>sys.stderr, 'poem download failed'
    print >>sys.stderr, 'I am terribly sorry'
    print >>sys.stderr, 'try again later?'

def poem_done(_):
    from twisted.internet import reactor
    reactor.stop()

d = Deferred()

d.addCallbacks(got_poem, poem_failed)
d.addBoth(poem_done)

from twisted.internet import reactor

reactor.callWhenRunning(d.callback, 'Another short poem.')

reactor.run()

The addBoth method adds the same function to both the callback and errback chains. And we can refactor asynchronous code after all.

Note: there is a subtlety in the way this deferred would actually execute its errback chain. We’ll discuss it in a future Part, but keep in mind there is more to learn about deferreds.

Summary

In this Part we analyzed callback programming and identified some potential problems. We also saw how the Deferred class can help us out:

  1. We can’t ignore errbacks, they are required for any asynchronous API. Deferreds have support for errbacks built in.
  2. Invoking callbacks multiple times will likely result in subtle, hard-to-debug problems. Deferreds can only be fired once, making them similar to the familiar semantics of try/except statements.
  3. Programming with plain callbacks can make refactoring tricky. With deferreds, we can refactor by adding links to the chain and moving code from one link to another.

We’re not done with the story of deferreds, there are more details of their rationale and behavior to explore. But we’ve got enough to start using them in our poetry client, so we’ll do that in Part 8.

Suggested Exercises

  1. The last example ignores the argument to poem_done. Print it out instead. Make got_poem return a value and see how that changes the argument to poem_done.
  2. Modify the last two deferred examples to fire the errback chains. Make sure to fire the errback with an Exception.
  3. Read the docstrings for the addCallback and addErrback methods on Deferred.
Categories
Blather Programming Python Software

And Then We Took It Higher

Part 6: And Then We Took It Higher

This continues the introduction started here. You can find an index to the entire series here.

Poetry for Everyone

We’ve made a lot of progress with our poetry client. Our last version (2.0) is using Transports, Protocols, and Protocol Factories, the workhorses of Twisted networking. But there are more improvements to make. Client 2.0 (and also 2.1) can only be used for downloading poetry at the command line. This is because the PoetryClientFactory is not only in charge of getting poetry, but also in charge of shutting down the program when it’s finished. That’s an odd job for something called “PoetryClientFactory“, it really ought to do nothing beyond making PoetryProtocols and collecting finished poems.

We need a way to send a poem to the code that requested the poem in the first place. In a synchronous program we might make an API like this:

def get_poetry(host, post):
    """Return a poem from the poetry server at the given host and port."""

But of course, we can’t do that here. The above function necessarily blocks until the poem is received in entirety, otherwise it couldn’t work the way the documentation claims. But this is a reactive program so blocking on a network socket is out of the question. We need a way to tell the calling code when the poem is ready, without blocking while the poem is in transit. But this is the same sort of problem that Twisted itself has. Twisted needs to tell our code when a socket is ready for I/O, or when some data has been received, or when a timeout has occurred, etc. We’ve seen that Twisted solves this problem using callbacks, so we can use callbacks too:

def get_poetry(host, port, callback):
    """
    Download a poem from the given host and port and invoke

      callback(poem)

    when the poem is complete.
    """

Now we have an asynchronous API we can use with Twisted, so let’s go ahead and implement it.

As I said before, we will at times be writing code in ways a typical Twisted programmer wouldn’t. This is one of those times and one of those ways. We’ll see in Parts 7 and 8 how to do this the “Twisted way” (surprise, it uses an abstraction!) but starting out simply will give us more insight into the finished version.

Client 3.0

You can find version 3.0 of our poetry client in twisted-client-3/get-poetry.py. This version has an implementation of the get_poetry function:

def get_poetry(host, port, callback):
    from twisted.internet import reactor
    factory = PoetryClientFactory(callback)
    reactor.connectTCP(host, port, factory)

The only new wrinkle here is passing the callback function to the PoetryClientFactory. The factory uses the callback to deliver the poem:

class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol

    def __init__(self, callback):
        self.callback = callback

    def poem_finished(self, poem):
        self.callback(poem)

Notice the factory is much simpler than in version 2.1 since it’s no longer in charge of shutting the reactor down. It’s also missing the code for detecting failures to connect, but we’ll fix that in a little bit. The PoetryProtocol itself doesn’t need to change at all so we just re-use the one from client 2.1:

class PoetryProtocol(Protocol):

    poem = ''

    def dataReceived(self, data):
        self.poem += data

    def connectionLost(self, reason):
        self.poemReceived(self.poem)

    def poemReceived(self, poem):
        self.factory.poem_finished(poem)

With this change, the get_poetry function, and the PoetryClientFactory and PoetryProtocol classes, are now completely re-usable. They are all about downloading poetry and nothing else. All the logic for starting up and shutting down the reactor is in the main function of our script:

def poetry_main():
    addresses = parse_args()

    from twisted.internet import reactor

    poems = []

    def got_poem(poem):
        poems.append(poem)
        if len(poems) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        get_poetry(host, port, got_poem)

    reactor.run()

    for poem in poems:
        print poem

So if we wanted, we could take the re-usable parts and put them in a shared module that anyone could use to get their poetry (as long as they were using Twisted, of course).

By the way, when you’re actually testing client 3.0 you might re-configure the poetry servers to send the poetry faster or in bigger chunks. Now that the client is less chatty in terms of output it’s not as interesting to watch while it downloads the poems.

Discussion

We can visualize the callback chain at the point when a poem is delivered in Figure 11:

Figure 11: the poem callbacks
Figure 11: the poem callbacks

Figure 11 is worth contemplating. Up until now we have depicted callback chains that terminate with a single call to “our code”. But when you are programming with Twisted, or any single-threaded reactive system, these callback chains might well include bits of our code making callbacks to other bits of our code. In other words, the reactive style of programming doesn’t stop when it reaches code we write ourselves. In a reactor-based system, it’s callbacks all the way down.

Keep that fact in mind when choosing Twisted for a project. When you make this decision:

I’m going to use Twisted!

You are also making this decision:

I’m going to structure my program as a series of asynchronous callback chain invocations powered by a reactor loop!

Now maybe you won’t exclaim it out loud the way I do, but it is nevertheless the case. That’s how Twisted works.

It’s likely that most Python programs are synchronous and most Python modules are synchronous too. If we were writing a synchronous program and suddenly realized it needed some poetry, we might use the synchronous version of our get_poetry function by adding a few lines of code to our script like these:

...
import poetrylib # I just made this module name up
poem = poetrylib.get_poetry(host, port)
...

And continue on our way. If, later on, we decided we didn’t really want that poem after all then we’d just snip out those lines and no one would be the wiser. But if we were writing a synchronous program and then decided to use the Twisted version of get_poetry, we would need to re-architect our program in the asynchronous style using callbacks. We would probably have to make significant changes to the code. Now, I’m not saying it would necessarily be a mistake to rewrite the program. It might very well make sense to do so given our requirements. But it won’t be as simple as adding an import line and an extra function call. Simply put, synchronous and asynchronous code do not mix.

If you are new to Twisted and asynchronous programming, I might recommend writing a few Twisted programs from scratch before you attempt to port an existing codebase. That way you will get a feel for using Twisted without the extra complexity of trying to think in both modes at once as you port from one to the other.

If, however, your program is already asynchronous then using Twisted might be much easier. Twisted integrates relatively smoothly with pyGTK and pyQT, the Python APIs for two reactor-based GUI toolkits.

When Things Go Wrong

In client 3.0 we no longer detect a failure to connect to a poetry server, an omission which causes even more problems than it did in client 1.0. If we tell client 3.0 to download a poem from a non-existent server then instead of crashing it just waits there forever. The clientConnectionFailed callback still gets called, but the default implementation in the ClientFactory base class doesn’t do anything at all. So the got_poem callback is never called, the reactor is never stopped, and we’ve got another do-nothing program like the ones we made in Part 2.

Clearly we need to handle this error, but where? The information about the failure to connect is delivered to the factory object via clientConnectionFailed so we’ll have to start there. But this factory is supposed to be re-usable, and the proper way to handle an error will depend on the context in which the factory is being used. In some applications, missing poetry might be a disaster (No poetry?? Might as well just crash). In others, maybe we just keep on going and try to get another poem from somewhere else.

In other words, the users of get_poetry need to know when things go wrong, not just when they go right. In a synchronous program, get_poetry would raise an Exception and the calling code could handle it with a try/except statement. But in a reactive program, error conditions have to be delivered asynchronously, too. After all, we won’t even find out the connection failed until after get_poetry returns. Here’s one possibility:

def get_poetry(host, port, callback):
    """
    Download a poem from the given host and port and invoke

      callback(poem)

    when the poem is complete. If there is a failure, invoke:

      callback(None)

    instead.
    """

By testing the callback argument (i.e., if poem is None) the client can determine whether we actually got a poem or not. This would suffice for our client to avoid running forever, but that approach still has some problems. First of all, using None to indicate failure is somewhat ad-hoc. Some asynchronous APIs might want to use None as a default return value instead of an error condition. Second, a None value carries a very limited amount of information. It can’t tell us what went wrong, or include a traceback object we can use in debugging. Ok, second try:

def get_poetry(host, port, callback):
    """
    Download a poem from the given host and port and invoke

      callback(poem)

    when the poem is complete. If there is a failure, invoke:

      callback(err)

    instead, where err is an Exception instance.
    """

Using an Exception is closer to what we are used to with synchronous programming. Now we can look at the exception to get more information about what went wrong and None is free for use as a regular value. Normally, though, when we encounter an exception in Python we also get a traceback we can analyze or print to a log for debugging at some later date. Tracebacks are extremely useful so we shouldn’t give them up just because we are using asynchronous programming.

Keep in mind we don’t want a traceback object for the point where our callback is invoked, that’s not where the problem happened. What we really want is both the Exception instance and the traceback from the point where that exception was raised (assuming it was raised and not simply created).

Twisted includes an abstraction called a Failure that wraps up both an Exception and the traceback, if any, that went with it. The Failure docstring explains how to create one. By passing Failure objects to callbacks we can preserve the traceback information that’s so handy for debugging.

There is some example code that uses Failure objects in twisted-failure/failure-examples.py. It shows how Failures can preserve the traceback information from a raised exception, even outside the context of an except block. We won’t dwell too much on making Failure instances. In Part 7 we’ll see that Twisted generally ends up making them for us.

Alright, third try:

def get_poetry(host, port, callback):
    """
    Download a poem from the given host and port and invoke

      callback(poem)

    when the poem is complete. If there is a failure, invoke:

      callback(err)

    instead, where err is a twisted.python.failure.Failure instance.
    """

With this version we get both an Exception and possibly a traceback record when things go wrong. Nice.

We’re almost there, but we’ve got one more problem. Using the same callback for both normal results and failures is kind of odd. In general, we need to do quite different things on failure than on success. In a synchronous Python program we generally handle success and failure with two different code paths in a try/except statement like this:

try:
    attempt_to_do_something_with_poetry()
except RhymeSchemeViolation:
    # the code path when things go wrong
else:
    # the code path when things go so, so right baby

If we want to preserve this style of error-handling, then we need to use a separate code path for failures. In asynchronous programming a separate code path means a separate callback:

def get_poetry(host, port, callback, errback):
    """
    Download a poem from the given host and port and invoke

      callback(poem)

    when the poem is complete. If there is a failure, invoke:

      errback(err)

    instead, where err is a twisted.python.failure.Failure instance.
    """

Client 3.1

Now that we have an API with reasonable error-handling semantics we can implement it. Client 3.1 is located in twisted-client-3/get-poetry-1.py. The changes are pretty straightforward. The PoetryClientFactory gets both a callback and an errback, and now it implements clientConnectionFailed:

class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol

    def __init__(self, callback, errback):
        self.callback = callback
        self.errback = errback

    def poem_finished(self, poem):
        self.callback(poem)

    def clientConnectionFailed(self, connector, reason):
        self.errback(reason)

Since clientConnectionFailed already receives a Failure object (the reason argument) that explains why the connection failed, we just pass that along to the errback.

The other changes are all of a piece so I won’t bother posting them here. You can test client 3.1 by using a port with no server like this:

python twisted-client-3/get-poetry-1.py 10004

And you’ll get some output like this:

Poem failed: [Failure instance: Traceback (failure with no frames): : Connection was refused by other side: 111: Connection refused.
]

That’s from the print statement in our poem_failed errback. In this case, Twisted has simply passed us an Exception rather than raising it, so we don’t get a traceback here. But a traceback isn’t really needed since this isn’t a bug, it’s just Twisted informing us, correctly, that we can’t connect to that address.

Summary

Here’s what we’ve learned in Part 6:

  • The APIs we write for Twisted programs will have to be asynchronous.
  • We can’t mix synchronous code with asynchronous code.
  • Thus, we have to use callbacks in our own code, just like Twisted does.
  • And we have to handle errors with callbacks, too.

Does that mean every API we write with Twisted has to include two extra arguments, a callback and an errback? That doesn’t sound so nice. Fortunately, Twisted has an abstraction we can use to eliminate both those arguments and pick up a few extra features in the bargain. We’ll learn about it in Part 7.

Suggested Exercises

  1. Update client 3.1 to timeout if the poem isn’t received after a given period of time. Invoke the errback with a custom exception in that case. Don’t forget to close the connection when you do.
  2. Study the trap method on Failure objects. Compare it to the except clause in the try/except statement.
  3. Use print statements to verify that clientConnectionFailed is called after get_poetry returns.
Categories
Blather Programming Python Software

Twistier Poetry

Part 5: Twistier Poetry

This continues the introduction started here. You can find an index to the entire series here.

Abstract Expressionism

In Part 4 we made our first poetry client that uses Twisted. It works pretty well, but there is definitely room for improvement.

First of all, the client includes code for mundane details like creating network sockets and receiving data from those sockets. Twisted provides support for these sorts of things so we don’t have to implement them ourselves every time we write a new program. This is especially helpful because asynchronous I/O requires a few tricky bits involving exception handling as you can see in the client code. And there are even more tricky bits if you want your code to work on multiple platforms. If you have a free afternoon, search the Twisted sources for “win32” to see how many corner cases that platform introduces.

Another problem with the current client is error handling. Try running version 1.0 of the Twisted client and tell it to download from a port with no server. It just crashes. We could fix the current client, but error handling is easier with the Twisted APIs we’ll be using today.

Finally, the client isn’t particularly re-usable. How would another module get a poem with our client? How would the “calling” module know when the poem had finished downloading? We can’t write a function that simply returns the text of the poem as that would require blocking until the entire poem is read. This is a real problem but we’re not going to fix it today — we’ll save that for future Parts.

We’re going to fix the first and second problems using a higher-level set of APIs and Interfaces. The Twisted framework is loosely composed of layers of abstractions and learning Twisted means learning what those layers provide, i.e, what APIs, Interfaces, and implementations are available for use in each one. Since this is an introduction we’re not going to study each abstraction in complete detail or do an exhaustive survey of every abstraction that Twisted offers. We’re just going to look at the most important pieces to get a better feel for how Twisted is put together. Once you become familiar with the overall style of Twisted’s architecture, learning new parts on your own will be much easier.

In general, each Twisted abstraction is concerned with one particular concept. For example, the 1.0 client from Part 4 uses IReadDescriptor, the abstraction of a “file descriptor you can read bytes from”. A Twisted abstraction is usually defined by an Interface specifying how an object embodying that abstraction should behave. The most important thing to keep in mind when learning a new Twisted abstraction is this:

Most higher-level abstractions in Twisted are built by using lower-level ones, not by replacing them.

So when you are learning a new Twisted abstraction, keep in mind both what it does and what it does not do. In particular, if some earlier abstraction A implements feature F, then F is probably not implemented by any other abstraction. Rather, if another abstraction B needs feature F, it will use A rather than implement F itself.  (In general, an implementation of B will either sub-class an implementation of A or refer to another object that implements A).

Networking is a complex subject, and thus Twisted contains lots of abstractions. By starting with lower levels first, we are hopefully getting a clearer picture of how they all get put together in a working Twisted program.

Loopiness in the Brain

The most important abstraction we have learned so far, indeed the most important abstraction in Twisted, is the reactor. At the center of every program built with Twisted, no matter how many layers that program might have, there is a reactor loop spinning around and making the whole thing go. Nothing else in Twisted provides the functionality the reactor offers. Much of the rest of Twisted, in fact, can be thought of as “stuff that makes it easier to do X using the reactor” where X might be “serve a web page” or “make a database query” or some other specific feature. Although it’s possible to stick with the lower-level APIs, like the client 1.0 does, we have to implement more things ourselves if we do. Moving to higher-level abstractions generally means writing less code (and letting Twisted handle the platform-dependent corner cases).

But when we’re working at the outer layers of Twisted it can be easy to forget the reactor is there. In any Twisted program of reasonable size, relatively few parts of our code will actually use the reactor APIs directly. The same is true for some of the other low-level abstractions. The file descriptor abstractions we used in client 1.0 are so thoroughly subsumed by higher-level concepts that they basically disappear in real Twisted programs (they are still used on the inside, we just don’t see them as such).

As far as the file descriptor abstractions go, that’s not really a problem. Letting Twisted handle the mechanics of asynchronous I/O frees us to concentrate on whatever problem we are trying to solve. But the reactor is different. It never really disappears. When you choose to use Twisted you are also choosing to use the Reactor Pattern, and that means programming in the “reactive style” using callbacks and cooperative multi-tasking. If you want to use Twisted correctly, you have to keep the reactor’s existence (and the way it works) in mind. We’ll have more to say about this in Part 6, but for now our message is this:

Figure 5 and Figure 6 are the most important diagrams in this introduction.

We’ll keep using diagrams to illustrate new concepts, but those two Figures are the ones that you need to burn into your brain, so to speak. Those are the pictures I constantly have in mind while writing programs with Twisted.

Before we dive into the code, there are three new abstractions to introduce: Transports, Protocols, and Protocol Factories.

Transports

The Transport abstraction is defined by ITransport in the main Twisted interfaces module. A Twisted Transport represents a single connection that can send and/or receive bytes. For our poetry clients, the Transports are abstracting TCP connections like the ones we have been making ourselves in earlier versions. But Twisted also supports I/O over UNIX Pipes and UDP sockets among other things. The Transport abstraction represents any such connection and handles the details of asynchronous I/O for whatever sort of connection it represents.

If you scan the methods defined for ITransport, you won’t find any for receiving data. That’s because Transports always handle the low-level details of reading data asynchronously from their connections, and give the data to us via callbacks. Along similar lines, the write-related methods of Transport objects may choose not to write the data immediately to avoid blocking. Telling a Transport to write some data means “send this data as soon as you can do so,  subject to the requirement to avoid blocking”. The data will be written in the order we provide it, of course.

We generally don’t implement our own Transport objects or create them in our code. Rather, we use the implementations that Twisted already provides and which are created for us when we tell the reactor to make a connection.

Protocols

Twisted Protocols are defined by IProtocol in the same interfaces module. As you might expect, Protocol objects implement protocols. That is to say, a particular implementation of a Twisted Protocol should implement one specific networking protocol, like FTP or IMAP or some nameless protocol we invent for our own purposes. Our poetry protocol, such as it is, simply sends all the bytes of the poem as soon as a connection is established, while the close of the connection signifies the end of the poem.

Strictly speaking, each instance of a Twisted Protocol object implements a protocol for one specific connection. So each connection our program makes (or, in the case of servers, accepts) will require one instance of a Protocol. This makes Protocol instances the natural place to store both the state of “stateful” protocols and the accumulated data of partially received messages (since we receive the bytes in arbitrary-sized chunks with asynchronous I/O).

So how do Protocol instances know what connection they are responsible for? If you look at the IProtocol definition, you will find a method called makeConnection. This method is a callback and Twisted code calls it with a Transport instance as the only argument. The Transport is the connection the Protocol is going to use.

Twisted includes a large number of ready-built Protocol implementations for various common protocols. You can find a few simpler ones in twisted.protocols.basic. It’s a good idea to check the Twisted sources before you write a new Protocol to see if there’s already an implementation you can use. But if there isn’t, it’s perfectly OK to implement your own, as we will do for our poetry clients.

Protocol Factories

So each connection needs its own Protocol and that Protocol might be an instance of a class we implement ourselves. Since we will let Twisted handle creating the connections, Twisted needs a way to make the appropriate Protocol “on demand” whenever a new connection is made. Making Protocol instances is the job of Protocol Factories.

As you’ve probably guessed, the Protocol Factory API is defined by IProtocolFactory, also in the interfaces module. Protocol Factories are an example of the Factory design pattern and they work in a straightforward way. The buildProtocol method is supposed to return a new Protocol instance each time it is called. This is the method that Twisted uses to make a new Protocol for each new connection.

Get Poetry 2.0: First Blood.0

Alright, let’s take a look at version 2.0 of the Twisted poetry client. The code is in twisted-client-2/get-poetry.py. You can run it just like the others and get similar output so I won’t bother posting output here. This is also the last version of the client that prints out task numbers as it receives bytes. By now it should be clear that all Twisted programs work by interleaving tasks and processing relatively small chunks of data at a time. We’ll still use print statements to show what is going on at key moments, but the clients won’t be quite as verbose in the future.

In client 2.0, sockets have disappeared. We don’t even import the socket module and we never refer to a socket object, or a file descriptor, in any way. Instead, we tell the reactor to make the connections to the poetry servers on our behalf like this:

factory = PoetryClientFactory(len(addresses))

from twisted.internet import reactor

for address in addresses:
    host, port = address
    reactor.connectTCP(host, port, factory)

The connectTCP method is the one to focus on. The first two arguments should be self-explanatory. The third is an instance of our PoetryClientFactory class. This is the Protocol Factory for poetry clients and passing it to the reactor allows Twisted to create instances of our PoetryProtocol on demand.

Notice that we are not implementing either the Factory or the Protocol from scratch, unlike the PoetrySocket objects in our previous client. Instead, we are sub-classing the base implementations that Twisted provides in twisted.internet.protocol. The primary Factory base class is twisted.internet.protocol.Factory, but we are using the ClientFactory sub-class which is specialized for clients (processes that make connections instead of listening for connections like a server).

We are also taking advantage of the fact that the Twisted Factory class implements buildProtocol for us. We call the base class implementation in our sub-class:

def buildProtocol(self, address):
    proto = ClientFactory.buildProtocol(self, address)
    proto.task_num = self.task_num
    self.task_num += 1
    return proto

How does the base class know what Protocol to build? Notice we are also setting the class attribute protocol on PoetryClientFactory:

class PoetryClientFactory(ClientFactory):

    task_num = 1

    protocol = PoetryProtocol # tell base class what proto to build

The base Factory class implements buildProtocol by instantiating the class we set on protocol (i.e., PoetryProtocol) and setting the factory attribute on that new instance to be a reference to its “parent” Factory. This is illustrated in Figure 8:

Figure 8: a Protocol is born
Figure 8: a Protocol is born

As we mentioned above, the factory attribute on Protocol objects allows Protocols created with the same Factory to share state. And since Factories are created by “user code”, that same attribute allows Protocol objects to communicate results back to the code that initiated the request in the first place, as we will see in Part 6.

Note that while the factory attribute on Protocols refers to an instance of a Protocol Factory, the protocol attribute on the Factory refers to the class of the Protocol. In general, a single Factory might create many Protocol instances.

The second stage of Protocol construction connects a Protocol with a Transport, using the makeConnection method. We don’t have to implement this method ourselves since the Twisted base class provides a default implementation. By default, makeConnection stores a reference to the Transport on the transport attribute and sets the connected attribute to a True value, as depicted in Figure 9:

Figure 9: a Protocol meets its Transport
Figure 9: a Protocol meets its Transport

Once initialized in this way, the Protocol can start performing its real job — translating a lower-level stream of data into a higher-level stream of protocol messages (and vice-versa for 2-way connections). The key method for processing incoming data is dataReceived, which our client implements like this:

def dataReceived(self, data):
    self.poem += data
    msg = 'Task %d: got %d bytes of poetry from %s'
    print  msg % (self.task_num, len(data), self.transport.getPeer())

Each time dataReceived is called we get a new sequence of bytes (data) in the form of a string. As always with asynchronous I/O, we don’t know how much data we are going to get so we have to buffer it until we receive a complete protocol message. In our case, the poem isn’t finished until the connection is closed, so we just keep adding the bytes to our .poem attribute.

Note we are using the getPeer method on our Transport to identify which server the data is coming from. We are only doing this to be consistent with earlier clients. Otherwise our code wouldn’t need to use the Transport explicitly at all, since we never send any data to the servers.

Let’s take a quick look at what’s going on when the dataReceived method is called. In the same directory as our 2.0 client, there is another client called twisted-client-2/get-poetry-stack.py. This is just like the 2.0 client except the dataReceived method has been changed like this:

def dataReceived(self, data):
    traceback.print_stack()
    os._exit(0)

With this change the program will print a stack trace and then quit the first time it receives some data. You could run this version like so:

python twisted-client-2/get-poetry-stack.py 10000

And you will get a stack trace like this:

File "twisted-client-2/get-poetry-stack.py", line 125, in
    poetry_main()

... # I removed a bunch of lines here

File ".../twisted/internet/tcp.py", line 463, in doRead  # Note the doRead callback
    return self.protocol.dataReceived(data)
File "twisted-client-2/get-poetry-stack.py", line 58, in dataReceived
    traceback.print_stack()

There’s the doRead callback we used in client 1.0! As we noted before, Twisted builds new abstractions by using the old ones, not by replacing them. So there is still an IReadDescriptor implementation hard at work, it’s just implemented by Twisted instead of our code. If you are curious, Twisted’s implementation is in twisted.internet.tcp. If you follow the code, you’ll find that the same object implements IWriteDescriptor and ITransport too. So the IReadDescriptor is actually the Transport object in disguise. We can visualize a dataReceived callback with Figure 10:

Figure 10: the dataReceived callback
Figure 10: the dataReceived callback

Once a poem has finished downloading, the PoetryProtocol object notifies its PoetryClientFactory:

def connectionLost(self, reason):
    self.poemReceived(self.poem)

def poemReceived(self, poem):
    self.factory.poem_finished(self.task_num, poem)

The connectionLost callback is invoked when the transport’s connection is closed. The reason argument is a twisted.python.failure.Failure object with additional information on whether the connection was closed cleanly or due to an error. Our client just ignores this value and assumes we received the entire poem.

The factory shuts down the reactor after all the poems are done. Once again we assume the only thing our program is doing is downloading poems, which makes PoetryClientFactory objects less reusable. We’ll fix that in the next Part, but notice how the poem_finished callback keeps track of the number of poems left to go:

...
    self.poetry_count -= 1

    if self.poetry_count == 0:
        ...

If we were writing a multi-threaded program where each poem was downloaded in a separate thread we would need to protect this section of code with a lock in case two or more threads invoked poem_finished at the same time. Otherwise we might end up shutting down the reactor twice (and getting a traceback for our troubles). But with a reactive system we needn’t bother. The reactor can only make one callback at a time, so this problem just can’t happen.

Our new client also handles a failure to connect with more grace than the 1.0 client. Here’s the callback on the PoetryClientFactory class which does the job:

def clientConnectionFailed(self, connector, reason):
    print 'Failed to connect to:', connector.getDestination()
    self.poem_finished()

Note the callback is on the factory, not on the protocol. Since a protocol is only created after a connection is made, the factory gets the news when a connection cannot be established.

A simpler client

Although our new client is pretty simple already, we can make it simpler if we dispense with the task numbers. The client should really be about the poetry, after all. There is a simplified 2.1 version in twisted-client-2/get-poetry-simple.py.

Wrapping Up

Client 2.0 uses Twisted abstractions that should be familiar to any Twisted hacker. And if all we wanted was a command-line client that printed out some poetry and then quit, we could even stop here and call our program done. But if we wanted some re-usable code, some code that we could embed in a larger program that needs to download some poetry but also do other things, then we still have some work to do. In Part 6 we’ll take a first stab at it.

Suggested Exercises

  1. Use callLater to make the client timeout if a poem hasn’t finished after a given interval. Use the loseConnection method on the transport to close the connection on a timeout, and don’t forget to cancel the timeout if the poem finishes on time.
  2. Use the stacktrace method to analyze the callback sequence that occurs when connectionLost is invoked.
Categories
Blather Programming Python Software

Twisted Poetry

Part 4: Twisted Poetry

This continues the introduction started here. You can find an index to the entire series here.

Our First Twisted Client

Although Twisted is probably more often used to write servers, clients are simpler than servers and we’re starting out as simply as possible. Let’s try out our first poetry client written with Twisted. The source code is in twisted-client-1/get-poetry.py. Start up some poetry servers as before:

python blocking-server/slowpoetry.py --port 10000 poetry/ecstasy.txt --num-bytes 30
python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt
python blocking-server/slowpoetry.py --port 10002 poetry/science.txt

And then run the client like this:

python twisted-client-1/get-poetry.py 10000 10001 10002

And you should get some output like this:

Task 1: got 60 bytes of poetry from 127.0.0.1:10000
Task 2: got 10 bytes of poetry from 127.0.0.1:10001
Task 3: got 10 bytes of poetry from 127.0.0.1:10002
Task 1: got 30 bytes of poetry from 127.0.0.1:10000
Task 3: got 10 bytes of poetry from 127.0.0.1:10002
Task 2: got 10 bytes of poetry from 127.0.0.1:10001
...
Task 1: 3003 bytes of poetry
Task 2: 623 bytes of poetry
Task 3: 653 bytes of poetry
Got 3 poems in 0:00:10.134220

Just like we did with our non-Twisted asynchronous client. Which isn’t surprising as they are doing essentially the same thing. Let’s take a look at the source code to see how it works. Open up the client in your editor so you can examine the code we are discussing.

Note: As I mentioned in Part 1, we will begin our use of Twisted by using some very low-level APIs. By doing this we bypass some of the layers of Twisted’s abstractions so we can learn Twisted from the “inside out”. But this means a lot of the APIs we will learn in the beginning are not often used when writing real code. Just keep in mind that these early programs are learning exercises, not examples of how to write production software.

The Twisted client starts up by creating a set of PoetrySocket objects. A PoetrySocket initializes itself by creating a real network socket, connecting to a server, and switching to non-blocking mode:

self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.sock.connect(address)
self.sock.setblocking(0)

Eventually we’ll get to a level of abstraction where we aren’t working with sockets at all, but for now we still need to. After creating the network connection, a PoetrySocket passes itself to the reactor via the addReader method:

# tell the Twisted reactor to monitor this socket for reading
from twisted.internet import reactor
reactor.addReader(self)

This method gives Twisted a file descriptor you want to monitor for incoming data. Why are we passing Twisted an object instead of a file descriptor and a callback? And how will Twisted know what to do with our object since Twisted certainly doesn’t contain any poetry-specific code? Trust me, I’ve looked. Open up the twisted.internet.interfaces module and follow along with me.

Twisted Interfaces

There are a number of sub-modules in Twisted called interfaces. Each one defines a set of Interface classes. As of version 8.0, Twisted uses zope.interface as the basis for those classes, but the details of that package aren’t so important for us. We’re just concerned with the Interface sub-classes in Twisted itself, like the ones you are looking at now.

One of the principle purposes of Interfaces is documentation. As a Python programmer you are doubtless familiar with Duck Typing, the notion that the type of an object is principally defined not by its position in a class hierarchy but by the public interface it presents to the world. Thus two objects which present the same public interface (i.e., walk like a duck, quack like a …) are, as far as duck typing is concerned, the same sort of thing (a duck!). Well an Interface is a somewhat formalized way of specifying just what it means to walk like a duck.

A quick note on terminology: with zope.interface we say that a class implements an interface and instances of that class provide the interface (assuming it is the instances upon which we invoke the methods defined by the interface). We will try to stick to that terminology in our discussion.

Skip down the twisted.internet.interfaces source code until you come to the definition of the addReader method. It is declared in the IReactorFDSet Interface and should look something like this:

def addReader(reader):
    """
    I add reader to the set of file descriptors to get read events for.

    @param reader: An L{IReadDescriptor} provider that will be checked for
                   read events until it is removed from the reactor with
                   L{removeReader}.

    @return: C{None}.
    """

IReactorFDSet is one of the Interfaces that Twisted reactors provide. Thus, any Twisted reactor has a method called addReader that works as described by the docstring above. The method declaration does not have a self argument because it is solely concerned with defining a public interface, and the self argument is part of the implementation (i.e., the caller does not have to pass self explicitly). Interface objects are never instantiated or used as base classes for real implementations.

Note 1: Technically, IReactorFDSet would only be provided by reactors that support waiting on file descriptors. As far as I know, that currently includes all available reactors.

Note 2: It is possible to use Interfaces for more than documentation. The zope.interface module allows you to explicitly declare that a class implements one or more interfaces, and comes with mechanisms to examine these declarations at run-time. Also supported is the concept of adaptation, the ability to dynamically provide a given interface for an object that might not support that interface directly. But we’re not going to delve into these more advanced use cases.

Note 3: You might notice a similarity between Interfaces and Abstract Base Classes, a recent addition to the Python language. We will not be exploring their similarities and differences here, but you might be interested in reading an essay by Glyph, the Twisted project founder, that touches on that subject.

According to the docstring above, the reader argument of addReader should implement the IReadDescriptor interface. And that means our PoetrySocket objects have to do just that.

Scrolling through the module to find this new interface, we see:

class IReadDescriptor(IFileDescriptor):

    def doRead():
        """
        Some data is available for reading on your descriptor.
        """

And you will find an implementation of doRead on our PoetrySocket class. It reads data from the socket asynchronously, whenever it is called by the Twisted reactor. So doRead is really a callback, but instead of passing it directly to Twisted, we pass in an object with a doRead method. This is a common idiom in the Twisted framework — instead of passing a function you pass an object that must provide a given Interface. This allows us to pass a set of related callbacks (the methods defined by the Interface) with a single argument. It also lets the callbacks communicate with each other through shared state stored on the object.

So what other callbacks are provided on PoetrySocket objects? Notice that IReadDescriptor is a sub-class of IFileDescriptor. That means any object that provides IReadDescriptor must also provide IFileDescriptor. And if you do some more scrolling, you will find:

class IFileDescriptor(ILoggingContext):
    """
    A file descriptor.
    """

    def fileno():
        ...

    def connectionLost(reason):
        ...

I left out the docstrings above, but the purpose of these callbacks is fairly clear from the names: fileno should return the file descriptor we want to monitor, and connectionLost is called when the connection is closed. And you can see our PoetrySocket objects provide those methods as well.

Finally, IFileDescriptor inherits from ILoggingContext. I won’t bother to show it here, but that’s why we need to include the logPrefix callback. You can find the details in the interfaces module.

Note: You might notice that doRead is returning special values to indicate when the socket is closed. How did I know to do that? Basically, it didn’t work without it and I peeked at Twisted’s implementation of the same interface to see what to do. You may wish to sit down for this: sometimes software documentation is wrong or incomplete. Perhaps when you have recovered from the shock, I’ll have finished Part 5.

More on Callbacks

Our new Twisted client is really quite similar to our original asynchronous client. Both clients connect their own sockets, and read data from those sockets (asynchronously). The main difference is the Twisted client doesn’t need its own select loop — it uses the Twisted reactor instead.

The doRead callback is the most important one. Twisted calls it to tell us there is some data ready to read from our socket. We can visualize the process in Figure 7:

Figure 7: the doRead callback
Figure 7: the doRead callback

Each time the callback is invoked it’s up to us to read all the data we can and then stop without blocking. And as we said in Part 3, Twisted can’t stop our code from misbehaving (from blocking needlessly). We can do just that and see what happens. In the same directory as our Twisted client is a broken client called twisted-client-1/get-poetry-broken.py. This client is identical to the one you’ve been looking at, with two exceptions:

  1. The broken client doesn’t bother to make the socket non-blocking.
  2. The doRead callback just keeps reading bytes (and possibly blocking) until the socket is closed.

Now try running the broken client like this:

python twisted-client-1/get-poetry-broken.py 10000 10001 10002

You’ll get some output that looks something like this:

Task 1: got 3003 bytes of poetry from 127.0.0.1:10000
Task 3: got 653 bytes of poetry from 127.0.0.1:10002
Task 2: got 623 bytes of poetry from 127.0.0.1:10001
Task 1: 3003 bytes of poetry
Task 2: 623 bytes of poetry
Task 3: 653 bytes of poetry
Got 3 poems in 0:00:10.132753

Aside from a slightly different task order this looks like our original blocking client. But that’s because the broken client is a blocking client. By using a blocking recv call in our callback, we’ve turned our nominally asynchronous Twisted program into a synchronous one. So we’ve got the complexity of a select loop without any of the benefits of asynchronicity.

The sort of multi-tasking capability that an event loop like Twisted provides is cooperative. Twisted will tell us when it’s OK to read or write to a file descriptor, but we have to play nice by only transferring as much data as we can without blocking. And we must avoid making other kinds of blocking calls, like os.system. Furthermore, if we have a long-running computational (CPU-bound) task, it’s up to us to split it up into smaller chunks so that I/O tasks can still make progress if possible.

Note that there is a sense in which our broken client still works: it does manage to download all the poetry we asked it to. It’s just that it can’t take advantage of the efficiencies of asynchronous I/O. Now you might notice the broken client still runs a lot faster than the original blocking client. That’s because the broken client connects to all the servers at the start of the program. Since the servers start sending data immediately, and since the OS will buffer some of the incoming data for us even if we don’t read it (up to a limit), our blocking client is effectively receiving data from the other servers even though it is only reading from one at a time.

But this “trick” only works for small amounts of data, like our short poems. If we were downloading, say, the three 20 million-word epic sagas that chronicle one hacker’s attempt to win his true love by writing the world’s greatest Lisp interpreter, the operating system buffers would quickly fill up and our broken client would be scarcely more efficient than our original blocking one.

Wrapping Up

I don’t have much more to say about our first Twisted poetry client. You might note the connectionLost callback shuts down the reactor after there are no more PoetrySockets waiting for poems. That’s not such a great technique since it assumes we aren’t doing anything else in the program other than download poetry, but it does illustrate a couple more low-level reactor APIs, removeReader and getReaders.

There are Writer equivalents to the Reader APIs we used in this client, and they work in analogous ways for file descriptors we want to monitor for sending data to. Consult the interfaces file for more details. The reason reading and writing have separate APIs is because the select call distinguishes between those two kinds of events (a file descriptor becoming available for reading or writing, respectively). It is, of course, possible to wait for both events on the same file descriptor.

In Part 5, we will write a second version of our Twisted poetry client using some higher-level abstractions, and learn some more Twisted Interfaces and APIs along the way.

Suggested Exercises

  1. Fix the client so that a failure to connect to a server does not crash the program.
  2. Use callLater to make the client timeout if a poem hasn’t finished after a given interval. Read about the return value of callLater so you can cancel the timeout if the poem finishes on time.