
Part 18: Deferreds En Masse

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In the last Part we learned a new way of structuring sequential asynchronous callbacks using a generator. Thus, including deferreds, we now have two techniques for chaining asynchronous operations together.

Sometimes, though, we want to run a group of asynchronous operations in “parallel”. Since Twisted is single-threaded they won’t really run concurrently, but the point is we want to use asynchronous I/O to work on a group of tasks as fast as possible. Our poetry clients, for example, download poems from multiple servers at the same time, rather than one server after another. That was the whole point of using Twisted for getting poetry, after all.

And, as a result, all our poetry clients have had to solve this problem: how do you know when all the asynchronous operations you have started are done? So far we have solved this by collecting our results into a list (like the results list in client 7.0) and checking the length of the list. We have to be careful to collect failures as well as successful results, otherwise a single failure will cause the program to run forever, thinking there’s still work left to do.

As you might expect, Twisted includes an abstraction you can use to solve this problem and we’re going to take a look at it today.

The DeferredList

The DeferredList class allows us to treat a list of deferred objects as a single deferred. That way we can start a bunch of asynchronous operations and get notified only when all of them have finished (regardless of whether they succeeded or failed). Let’s look at some examples.

In deferred-list/deferred-list-1.py you will find this code:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Empty List.'
d = defer.DeferredList([])
print 'Adding Callback.'
d.addCallback(got_results)

And if you run it, you will get this output:

Empty List.
Adding Callback.
We got: []

Some things to notice:

  • A DeferredList is created from a Python list. In this case the list is empty, but we’ll soon see that the list elements must all be Deferred objects.
  • A DeferredList is itself a deferred (it inherits from Deferred). That means you can add callbacks and errbacks to it just like you would a regular deferred.
  • In the example above, our callback was fired as soon as we added it, so the DeferredList must have fired right away. We’ll discuss that more in a second.
  • The result of the deferred list was itself a list (empty).

Now look at deferred-list/deferred-list-2.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'One Deferred.'
d1 = defer.Deferred()
d = defer.DeferredList([d1])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')

Now we are creating our DeferredList with a 1-element list containing a single deferred. Here’s the output we get:

One Deferred.
Adding Callback.
Firing d1.
We got: [(True, 'd1 result')]

More things to notice:

  • This time the DeferredList didn’t fire its callback until we fired the deferred in the list.
  • The result is still a list, but now it has one element.
  • The element is a tuple whose second value is the result of the deferred in the list.

Let’s try putting two deferreds in the list (deferred-list/deferred-list-3.py):

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Two Deferreds.'
d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')
print 'Firing d2.'
d2.callback('d2 result')

And here’s the output:

Two Deferreds.
Adding Callback.
Firing d1.
Firing d2.
We got: [(True, 'd1 result'), (True, 'd2 result')]

At this point it’s pretty clear the result of a DeferredList, at least for the way we’ve been using it, is a list with the same number of elements as the list of deferreds we passed to the constructor. And the elements of that result list contain the results of the original deferreds, at least if the deferreds succeed. That means the DeferredList itself doesn’t fire until all the deferreds in the original list have fired. And a DeferredList created with an empty list fires right away since there aren’t any deferreds to wait for.

What about the order of the results in the final list? Consider deferred-list/deferred-list-4.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Two Deferreds.'
d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d2.'
d2.callback('d2 result')
print 'Firing d1.'
d1.callback('d1 result')

Now we are firing d2 first and then d1. Note the deferred list is still constructed with d1 and d2 in their original order. Here’s the output:

Two Deferreds.
Adding Callback.
Firing d2.
Firing d1.
We got: [(True, 'd1 result'), (True, 'd2 result')]

The output list has the results in the same order as the original list of deferreds, not the order those deferreds happened to fire in. Which is very nice, because we can easily associate each individual result with the operation that generated it (for example, which poem came from which server).
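Since the result list preserves that ordering, pairing each result with the operation that produced it is straightforward. Here is a small sketch (the addresses are made up for illustration) that zips a list of server addresses with the results from a DeferredList:

from twisted.internet import defer

addresses = [('10.0.0.1', 10001), ('10.0.0.2', 10002)] # hypothetical servers

d1, d2 = defer.Deferred(), defer.Deferred()
dlist = defer.DeferredList([d1, d2])

def got_results(results):
    # results arrive in the same order as [d1, d2], so we can pair
    # each (success, value) tuple with the address it came from
    for (host, port), (success, value) in zip(addresses, results):
        print host, port, '->', success, value

dlist.addCallback(got_results)

# fire them out of order; the pairing still comes out right
d2.callback('poem from 10.0.0.2')
d1.callback('poem from 10.0.0.1')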

Alright, what happens if one or more of the deferreds in the list fails? And what are those True values doing there? Let’s try the example in deferred-list/deferred-list-5.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2], consumeErrors=True)
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')
print 'Firing d2 with errback.'
d2.errback(Exception('d2 failure'))

Now we are firing d1 with a normal result and d2 with an error. Ignore the consumeErrors option for now, we’ll get back to it. Here’s the output:

Firing d1.
Firing d2 with errback.
We got: [(True, 'd1 result'), (False, <twisted.python.failure.Failure <type 'exceptions.Exception'>>)]

Now the tuple corresponding to d2 has a Failure in slot two, and False in slot one. At this point it should be pretty clear how a DeferredList works (but see the Discussion below):

  • A DeferredList is constructed with a list of deferred objects.
  • A DeferredList is itself a deferred whose result is a list of the same length as the list of deferreds.
  • The DeferredList fires after all the deferreds in the original list have fired.
  • Each element of the result list corresponds to the deferred in the same position as the original list. If that deferred succeeded, the element is (True, result) and if the deferred failed, the element is (False, failure).
  • A DeferredList never fails, since the result of each individual deferred is collected into the list no matter what (but again, see the Discussion below).

Now let’s talk about that consumeErrors option we passed to the DeferredList. If we run the same code but without passing the option (deferred-list/deferred-list-6.py), we get this output:

Firing d1.
Firing d2 with errback.
We got: [(True, 'd1 result'), (False, <twisted.python.failure.Failure <type 'exceptions.Exception'>>)]
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: exceptions.Exception: d2 failure

If you recall, the “Unhandled error in Deferred” message is generated when a deferred is garbage collected and the last callback in that deferred failed. The message is telling us we haven’t caught all the potential asynchronous failures in our program. So where is it coming from in our example? It’s clearly not coming from the DeferredList, since that succeeds. So it must be coming from d2.

A DeferredList needs to know when each deferred it is monitoring fires. And the DeferredList does that in the usual way — by adding a callback and errback to each deferred. And by default, the callback (and errback) return the original result (or failure) after putting it in the final list. And since returning the original failure from the errback triggers the next errback, d2 remains in the failed state after it fires.

But if we pass consumeErrors=True to the DeferredList, the errback added by the DeferredList to each deferred will instead return None, thus “consuming” the error and eliminating the warning message. We could also handle the error by adding our own errback to d2, as in deferred-list/deferred-list-7.py.
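The alternative looks something like this (a sketch in the spirit of deferred-list-7.py, though not necessarily identical to it). The key point is that our errback is added after the DeferredList has attached its own, so the DeferredList still records the failure while our errback consumes it:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2]) # no consumeErrors this time
d.addCallback(got_results)

# our errback runs after the DeferredList's and returns None,
# consuming the failure so no "Unhandled error" message appears
d2.addErrback(lambda err: None)

d1.callback('d1 result')
d2.errback(Exception('d2 failure'))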

Client 8.0

Version 8.0 of our Get Poetry Now! client uses a DeferredList to find out when all the poetry has finished (or failed). You can find the new client in twisted-client-8/get-poetry.py. Once again the only change is in poetry_main. Let’s look at the important changes:

    ...
    ds = []

    for (host, port) in addresses:
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        ds.append(d)

    dlist = defer.DeferredList(ds, consumeErrors=True)
    dlist.addCallback(lambda res : reactor.stop())

You may wish to compare it to the same section of client 7.0.

In client 8.0, we don’t need the poem_done callback or the results list. Instead, we put each deferred we get back from get_transformed_poem into a list (ds) and then create a DeferredList. Since the DeferredList won’t fire until all the poems have finished or failed, we just add a callback to the DeferredList to shut down the reactor. In this case, we aren’t using the result from the DeferredList; we just need to know when everything is finished. And that’s it!

Discussion

We can visualize how a DeferredList works in Figure 37:

Figure 37: the result of a DeferredList

Pretty simple, really. There are a couple options to DeferredList we haven’t covered, and which change the behavior from what we have described above. We will leave them for you to explore in the Exercises below.

In the next Part we will cover one more feature of the Deferred class, a feature recently introduced in Twisted 10.1.0.

Suggested Exercises

  1. Read the source code for the DeferredList.
  2. Modify the examples in deferred-list to experiment with the optional constructor arguments fireOnOneCallback and fireOnOneErrback. Come up with scenarios where you would use one or the other (or both).
  3. Can you create a DeferredList using a list of DeferredLists? If so, what would the result look like?
  4. Modify client 8.0 so that it doesn’t print out anything until all the poems have finished downloading. This time you will use the result from the DeferredList.
  5. Define the semantics of a DeferredDict and then implement it.

Part 17: Just Another Way to Spell “Callback”

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this Part we’re going to return to the subject of callbacks. We’ll introduce another technique for writing callbacks in Twisted that uses generators. We’ll show how the technique works and contrast it with using “pure” Deferreds. Finally we’ll rewrite one of our poetry clients using this technique. But first let’s review how generators work so we can see why they are a candidate for creating callbacks.

A Brief Review of Generators

As you probably know, a Python generator is a “restartable function” that you create by using the yield expression in the body of your function. By doing so, the function becomes a “generator function” that returns an iterator you can use to run the function in a series of steps. Each cycle of the iterator restarts the function, which proceeds to execute until it reaches the next yield.

Generators (and iterators) are often used to represent lazily-created sequences of values. Take a look at the example code in inline-callbacks/gen-1.py:

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

for n in my_generator():
    print n

Here we have a generator that creates the sequence 1, 2, 3. If you run the code, you will see the print statements in the generator interleaved with the print statement in the for loop as the loop cycles through the generator.

We can make this code more explicit by creating the generator ourselves (inline-callbacks/gen-2.py):

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

gen = my_generator()

while True:
    try:
        n = gen.next()
    except StopIteration:
        break
    else:
        print n

Considered as a sequence, the generator is just an object for getting successive values. But we can also view things from the point of view of the generator itself:

  1. The generator function doesn’t start running until “called” by the loop (using the next method).
  2. Once the generator is running, it keeps running until it “returns” to the loop (using yield).
  3. When the loop is running other code (like the print statement), the generator is not running.
  4. When the generator is running, the loop is not running (it’s “blocked” waiting for the generator).
  5. Once a generator yields control to the loop, an arbitrary amount of time may pass (and an arbitrary amount of other code may execute) until the generator runs again.

This is very much like the way callbacks work in an asynchronous system. We can think of the while loop as the reactor, and the generator as a series of callbacks separated by yield statements, with the interesting fact that all the callbacks share the same local variable namespace, and the namespace persists from one callback to the next.

Furthermore, you can have multiple generators active at once (see the example in inline-callbacks/gen-3.py), with their “callbacks” interleaved with each other, just as you can have independent asynchronous tasks running in a system like Twisted.
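Here is a sketch of that idea (not necessarily the same as gen-3.py): a little round-robin loop that alternates between two generators, much as the reactor alternates between independent callback chains:

def my_generator(label, count):
    for i in range(count):
        print label, 'step', i
        yield i

# a toy "scheduler": keep cycling through the generators, giving
# each one a turn, until they have all finished
gens = [my_generator('gen-a', 3), my_generator('gen-b', 3)]

while gens:
    for gen in list(gens):
        try:
            gen.next()
        except StopIteration:
            gens.remove(gen)

Run it and you will see the two generators’ print statements interleaved with each other.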

Something is still missing, though. Callbacks aren’t just called by the reactor, they also receive information. When part of a deferred’s chain, a callback either receives a result, in the form of a single Python value, or an error, in the form of a Failure.

Starting with Python 2.5, generators were extended in a way that allows you to send information to a generator when you restart it, as illustrated in inline-callbacks/gen-4.py:

class Malfunction(Exception):
    pass

def my_generator():
    print 'starting up'

    val = yield 1
    print 'got:', val

    val = yield 2
    print 'got:', val

    try:
        yield 3
    except Malfunction:
        print 'malfunction!'

    yield 4

    print 'done'

gen = my_generator()

print gen.next() # start the generator
print gen.send(10) # send the value 10
print gen.send(20) # send the value 20
print gen.throw(Malfunction()) # raise an exception inside the generator

try:
    gen.next()
except StopIteration:
    pass

In Python 2.5 and later versions, the yield statement is an expression that evaluates to a value. And the code that restarts the generator can determine that value using the send method instead of next (if you use next the value is None). What’s more, you can actually raise an arbitrary exception inside the generator using the throw method. How cool is that?

Inline Callbacks

Given what we just reviewed about sending and throwing values and exceptions into a generator, we can envision a generator as a series of callbacks, like the ones in a deferred, which receive either results or failures. The callbacks are separated by yields and the value of each yield expression is the result for the next callback (or the yield raises an exception and that’s the failure). Figure 35 shows the correspondence:

Figure 35: generator as a callback sequence

Now when a series of callbacks is chained together in a deferred, each callback receives the result from the one prior. That’s easy enough to do with a generator — just send the value you got from the previous run of the generator (the value it yielded) the next time you restart it. But that also seems a bit silly. Since the generator computed the value to begin with, why bother sending it back? The generator could just save the value in a variable for the next time it’s needed. So what’s the point?

Recall the fact we learned in Part 13, that the callbacks in a deferred can return deferreds themselves. And when that happens, the outer deferred is paused until the inner deferred fires, and then the next callback (or errback) in the outer deferred’s chain is called with the result (or failure) from the inner deferred.

So imagine that our generator yields a deferred object instead of an ordinary Python value. The generator is now “paused”, and that’s automatic; generators always pause after every yield statement until they are explicitly restarted. So we can delay restarting the generator until the deferred fires, at which point we either send the value (if the deferred succeeds) or throw the exception (if the deferred fails). That would make our generator a genuine sequence of asynchronous callbacks and that’s the idea behind the inlineCallbacks function in twisted.internet.defer.
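To make that idea concrete, here is a stripped-down, hypothetical driver for such a generator. It is only a sketch, assuming the generator yields nothing but deferreds; the real machinery in twisted.internet.defer handles non-deferred yields, return values, and many other details:

from twisted.internet import defer
from twisted.python import failure

def drive(gen):
    # restart the generator each time the deferred it yielded fires
    def restart(result):
        try:
            if isinstance(result, failure.Failure):
                next_d = gen.throw(result.value) # the deferred failed
            else:
                next_d = gen.send(result) # the deferred succeeded
        except StopIteration:
            return # the generator is finished
        next_d.addBoth(restart)

    try:
        first_d = gen.next() # run the generator up to its first yield
    except StopIteration:
        return
    first_d.addBoth(restart)

def my_callbacks(d1, d2):
    result = yield d1
    print 'first deferred gave us:', result
    try:
        yield d2
    except Exception, e:
        print 'second deferred failed with:', e

d1, d2 = defer.Deferred(), defer.Deferred()
drive(my_callbacks(d1, d2))
d1.callback('a result')
d2.errback(Exception('a failure'))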

inlineCallbacks

Consider the example program in inline-callbacks/inline-callbacks-1.py:

from twisted.internet.defer import inlineCallbacks, Deferred

@inlineCallbacks
def my_callbacks():
    from twisted.internet import reactor

    print 'first callback'
    result = yield 1 # yielded values that aren't deferred come right back

    print 'second callback got', result
    d = Deferred()
    reactor.callLater(5, d.callback, 2)
    result = yield d # yielded deferreds will pause the generator

    print 'third callback got', result # the result of the deferred

    d = Deferred()
    reactor.callLater(5, d.errback, Exception(3))

    try:
        yield d
    except Exception, e:
        result = e

    print 'fourth callback got', repr(result) # the exception from the deferred

    reactor.stop()

from twisted.internet import reactor
reactor.callWhenRunning(my_callbacks)
reactor.run()

Run the example and you will see the generator execute to the end and then stop the reactor. The example illustrates several aspects of the inlineCallbacks function. First, inlineCallbacks is a decorator and it always decorates generator functions, i.e., functions that use yield. The whole purpose of inlineCallbacks is to turn a generator into a series of asynchronous callbacks according to the scheme we outlined before.

Second, when we invoke an inlineCallbacks-decorated function, we don’t need to call next or send or throw ourselves. The decorator takes care of those details for us and ensures the generator will run to the end (assuming it doesn’t raise an exception).

Third, if we yield a non-deferred value from the generator, it is immediately restarted with that same value as the result of the yield.

And finally, if we yield a deferred from the generator, it will not be restarted until that deferred fires. If the deferred succeeds, the result of the yield is just the result from the deferred. And if the deferred fails, the yield statement raises the exception. Note the exception is just an ordinary Exception object, rather than a Failure, and we can catch it with a try/except statement around the yield expression.

In the example we are just using callLater to fire the deferreds after a short period of time. While that’s a handy way to put a non-blocking delay into our callback chain, normally we would be yielding a deferred returned by some other asynchronous operation (e.g., get_poetry) invoked from our generator.

Ok, now we know how an inlineCallbacks-decorated function runs, but what return value do you get if you actually call one? As you might have guessed, you get a deferred. Since we can’t know exactly when that generator will stop running (it might yield one or more deferreds), the decorated function itself is asynchronous and a deferred is the appropriate return value. Note the deferred that is returned isn’t one of the deferreds the generator may yield. Rather, it’s a deferred that fires only after the generator has completely finished (or throws an exception).

If the generator throws an exception, the returned deferred will fire its errback chain with that exception wrapped in a Failure. But if we want the generator to return a normal value, we must “return” it using the defer.returnValue function. Like the ordinary return statement, it will also stop the generator (it actually raises a special exception). The inline-callbacks/inline-callbacks-2.py example illustrates both possibilities.
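Here is a small, self-contained sketch of both points (the names get_size and show_size are made up for illustration). Note that the deferred returned by the decorated function is a brand new one, distinct from the deferred the generator yields:

from twisted.internet.defer import Deferred, inlineCallbacks, returnValue

@inlineCallbacks
def get_size(d):
    value = yield d # pause until the deferred fires
    returnValue(len(value)) # "return" a result to whoever called us

def show_size(size):
    print 'the generator "returned":', size

d = Deferred()
outer = get_size(d) # outer fires only when the generator finishes
outer.addCallback(show_size)
d.callback('some poem') # now the generator can finish, firing outer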

Client 7.0

Let’s put inlineCallbacks to work with a new version of our poetry client. You can see the code in twisted-client-7/get-poetry.py. You may wish to compare it to client 6.0 in twisted-client-6/get-poetry.py. The relevant changes are in poetry_main:

def poetry_main():
    addresses = parse_args()

    xform_addr = addresses.pop(0)

    proxy = TransformProxy(*xform_addr)

    from twisted.internet import reactor

    results = []

    @defer.inlineCallbacks
    def get_transformed_poem(host, port):
        try:
            poem = yield get_poetry(host, port)
        except Exception, e:
            print >>sys.stderr, 'The poem download failed:', e
            raise

        try:
            poem = yield proxy.xform('cummingsify', poem)
        except Exception:
            print >>sys.stderr, 'Cummingsify failed!'

        defer.returnValue(poem)

    def got_poem(poem):
        print poem

    def poem_done(_):
        results.append(_)
        if len(results) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        d.addBoth(poem_done)

    reactor.run()

In our new version the inlineCallbacks generator function get_transformed_poem is responsible for both fetching the poem and then applying the transformation (via the transform service). Since both operations are asynchronous, we yield a deferred each time and then (implicitly) wait for the result. As in client 6.0, if the transformation fails we just return the original poem. Notice we can use try/except statements to handle asynchronous errors inside the generator.

We can test the new client out in the same way as before. First start up a transform server:

python twisted-server-1/transformedpoetry.py --port 10001

Then start a couple of poetry servers:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Now you can run the new client:

python twisted-client-7/get-poetry.py 10001 10002 10003

Try turning off one or more of the servers to see how the client handles errors.

Discussion

Like the Deferred object, the inlineCallbacks function gives us a new way of organizing our asynchronous callbacks. And, as with deferreds, inlineCallbacks doesn’t change the rules of the game. Specifically, our callbacks still run one at a time, and they are still invoked by the reactor. We can confirm that fact in our usual way by printing out a traceback from an inline callback, as in the example script inline-callbacks/inline-callbacks-tb.py. Run that code and you will get a traceback with reactor.run() at the top, lots of helper functions in between, and our callback at the bottom.

We can adapt Figure 29, which explains what happens when one callback in a deferred returns another deferred, to show what happens when an inlineCallbacks generator yields a deferred. See Figure 36:

Figure 36: flow control in an inlineCallbacks function

The same figure works in both cases because the idea being illustrated is the same — one asynchronous operation is waiting for another.

Since inlineCallbacks and deferreds solve many of the same problems, why choose one over the other? Here are some potential advantages of inlineCallbacks:

  • Since the callbacks share a namespace, there is no need to pass extra state around.
  • The callback order is easier to see, as they just execute from top to bottom.
  • With no function declarations for individual callbacks and implicit flow-control, there is generally less typing.
  • Errors are handled with the familiar try/except statement.

And here are some potential pitfalls:

  • The callbacks inside the generator cannot be invoked individually, which could make code re-use difficult. With a deferred, the code constructing the deferred is free to add arbitrary callbacks in an arbitrary order.
  • The compact form of a generator can obscure the fact that an asynchronous callback is even involved. Despite its visual similarity to an ordinary sequential function, a generator behaves in a very different manner. The inlineCallbacks function is not a way to avoid learning the asynchronous programming model.

As with any technique, practice will provide the experience necessary to make an informed choice.

Summary

In this Part we learned about the inlineCallbacks decorator and how it allows us to express a sequence of asynchronous callbacks in the form of a Python generator.

In Part 18 we will learn a technique for managing a set of “parallel” asynchronous operations.

Suggested Exercises

  1. Why is the inlineCallbacks function plural?
  2. Study the implementation of inlineCallbacks and its helper function _inlineCallbacks. Ponder the phrase “the devil is in the details”.
  3. How many callbacks are contained in a generator with N yield statements, assuming it has no loops or if statements?
  4. Poetry client 7.0 might have three generators running at once. Conceptually, how many different ways might they be interleaved with one another? Considering the way they are invoked in the poetry client and the implementation of inlineCallbacks, how many ways do you think are actually possible?
  5. Move the got_poem callback in client 7.0 inside the generator.
  6. Then move the poem_done callback inside the generator. Be careful! Make sure to handle all the failure cases so the reactor gets shut down no matter what. How does the resulting code compare to using a deferred to shut down the reactor?
  7. A generator with yield statements inside a while loop can represent a conceptually infinite sequence. What does such a generator decorated with inlineCallbacks represent?

Part 16: Twisted Daemonologie

This continues the introduction started here. You can find an index to the entire series here.

Introduction

The servers we have written so far have just run in a terminal window, with output going to the screen via print statements. This works alright for development, but it’s hardly a way to deploy services in production. A well-behaved production server ought to:

  1. Run as a daemon process, unconnected with any terminal or user session. You don’t want a service to shut down just because the administrator logs out.
  2. Send debugging and error output to a set of rotated log files, or to the syslog service.
  3. Drop excessive privileges, e.g., switching to a lower-privileged user before running.
  4. Record its pid in a file so that the administrator can easily send signals to the daemon.

We can get all of those features using the twistd script provided by Twisted. But first we’ll have to change our code a bit.

The Concepts

Understanding twistd will require learning a few new concepts in Twisted, the most important being a Service. As usual, several of the new concepts are accompanied by new Interfaces.

IService

The IService interface defines a named service that can be started and stopped. What does the service do? Whatever you like — rather than define the specific function of the service, the interface requires only that it provide a small set of generic attributes and methods.

There are two required attributes: name and running. The name attribute is just a string, like 'fastpoetry', or None if you don’t want to give your service a name. The running attribute is a Boolean value and is true if the service has been successfully started.

We’re only going to touch on some of the methods of IService. We’ll skip some that are obvious, and others that are more advanced and often go unused in simpler Twisted programs. The two principal methods of IService are startService and stopService:

    def startService():
        """
        Start the service.
        """

    def stopService():
        """
        Stop the service.

        @rtype: L{Deferred}
        @return: a L{Deferred} which is triggered when the service has
            finished shutting down. If shutting down is immediate, a
            value can be returned (usually, C{None}).
        """

Again, what these methods actually do will depend on the service in question. For example, the startService method might:

  • Load some configuration data, or
  • Initialize a database, or
  • Start listening on a port, or
  • Do nothing at all.

And the stopService method might:

  • Persist some state, or
  • Close open database connections, or
  • Stop listening on a port, or
  • Do nothing at all.

When we write our own custom services we’ll need to implement these methods appropriately. For some common behaviors, like listening on a port, Twisted provides ready-made services we can use instead.

Notice that stopService may optionally return a deferred, which is required to fire when the service has completely shut down. This allows our services to finish cleaning up after themselves before the entire application terminates. If your service shuts down immediately you can just return None instead of a deferred.
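For example, here is a hypothetical service (a sketch, not one of our poetry services) whose stopService returns a deferred that fires a couple of seconds later, as if some cleanup work were still in flight:

from twisted.application import service
from twisted.internet import defer

class SlowShutdownService(service.Service):
    """A hypothetical service that needs a moment to clean up."""

    def stopService(self):
        service.Service.stopService(self)
        from twisted.internet import reactor
        d = defer.Deferred()
        # pretend flushing state or closing connections takes two
        # seconds; shutdown won't complete until this deferred fires
        reactor.callLater(2, d.callback, None)
        return d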

Services can be organized into collections that get started and stopped together. The last IService method we’re going to look at, setServiceParent, adds a Service to a collection:

    def setServiceParent(parent):
        """
        Set the parent of the service.

        @type parent: L{IServiceCollection}
        @raise RuntimeError: Raised if the service already has a parent
            or if the service has a name and the parent already has a child
            by that name.
        """

Any service can have a parent, which means services can be organized in a hierarchy. And that brings us to the next Interface we’re going to look at today.

IServiceCollection

The IServiceCollection interface defines an object which can contain IService objects. A service collection is just a plain container class with methods to:

  • Add a service to the collection.
  • Remove a service from the collection.
  • Retrieve a contained service by name.
  • Iterate over the services in the collection.

Note that an implementation of IServiceCollection isn’t automatically an implementation of IService, but there’s no reason why one class can’t implement both interfaces (and we’ll see an example of that shortly).

Application

A Twisted Application is not defined by a separate interface. Rather, an Application object is required to implement both IService and IServiceCollection, as well as a few other interfaces we aren’t going to cover.

An Application is the top-level service that represents your entire Twisted application. All the other services in your daemon will be children (or grandchildren, etc.) of the Application object.

It is rare to actually implement your own Application. Twisted provides an implementation that we’ll use today.

Twisted Logging

Twisted includes its own logging infrastructure in the module twisted.python.log. The basic API for writing to the log is simple, so we’ll just include a short example located in basic-twisted/log.py, and you can skim the Twisted module for details if you are interested.

We won’t bother showing the API for installing logging handlers, since twistd will do that for us.
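Still, to give a flavor of the basic API, here is a minimal sketch (not necessarily the contents of basic-twisted/log.py). The startLogging call is only there so the sketch can be run on its own; when our code runs under twistd, the handler is installed for us:

import sys

from twisted.python import log

log.startLogging(sys.stdout) # install a stdout observer by hand

log.msg('this is an ordinary log message')

try:
    1 / 0
except ZeroDivisionError:
    log.err() # logs the current exception as a Failure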

FastPoetry 2.0

Alright, let’s look at some code. We’ve updated the fast poetry server to run with twistd. The source is located in twisted-server-3/fastpoetry.py. First we have the poetry protocol:

class PoetryProtocol(Protocol):

    def connectionMade(self):
        poem = self.factory.service.poem
        log.msg('sending %d bytes of poetry to %s'
                % (len(poem), self.transport.getPeer()))
        self.transport.write(poem)
        self.transport.loseConnection()

Notice instead of using a print statement, we’re using the twisted.python.log.msg function to record each new connection.

Here’s the factory class:

class PoetryFactory(ServerFactory):

    protocol = PoetryProtocol

    def __init__(self, service):
        self.service = service

As you can see, the poem is no longer stored on the factory, but on a service object referenced by the factory. Notice how the protocol gets the poem from the service via the factory. Finally, here’s the service class itself:

class PoetryService(service.Service):

    def __init__(self, poetry_file):
        self.poetry_file = poetry_file

    def startService(self):
        service.Service.startService(self)
        self.poem = open(self.poetry_file).read()
        log.msg('loaded a poem from: %s' % (self.poetry_file,))

As with many other Interface classes, Twisted provides a base class we can use to make our own implementations, with helpful default behaviors. Here we use the twisted.application.service.Service class to implement our PoetryService.

The base class provides default implementations of all required methods, so we only need to implement the ones with custom behavior. In this case, we just override startService to load the poetry file. Note we still call the base class method (which sets the running attribute for us).

Another point is worth mentioning. The PoetryService object doesn’t know anything about the details of the PoetryProtocol. The service’s only job is to load the poem and provide access to it for any object that might need it. In other words, the PoetryService is entirely concerned with the higher-level details of providing poetry, rather than the lower-level details of sending a poem down a TCP connection. So this same service could be used by another protocol, say UDP or XML-RPC. While the benefit is rather small for our simple service, you can imagine the advantage for a more realistic service implementation.

If this were a typical Twisted program, all the code we’ve looked at so far wouldn’t actually be in this file. Rather, it would be in some other module(s) (perhaps fastpoetry.protocol and fastpoetry.service). But following our usual practice of making these examples self-contained, we’ve included everything we need in a single script.

Twisted tac files

The rest of the script contains what would normally be the entire content — a Twisted tac file. A tac file is a Twisted Application Configuration file that tells twistd how to construct an application. As a configuration file it is responsible for choosing settings (like port numbers, poetry file locations, etc.) to run the application in some particular way. In other words, a tac file represents a specific deployment of our service (serve that poem on this port) rather than a general script for starting any poetry server.

If we were running multiple poetry servers on the same host, we would have a tac file for each one (so you can see why tac files normally don’t contain any general-purpose code). In our example, the tac file is configured to serve poetry/ecstasy.txt on port 10000 of the loopback interface:

# configuration parameters
port = 10000
iface = 'localhost'
poetry_file = 'poetry/ecstasy.txt'

Note that twistd doesn’t know anything about these particular variables; we just define them here to keep all our configuration values in one place. In fact, twistd only really cares about one variable in the entire file, as we’ll see shortly. Next we begin building up our application:

# this will hold the services that combine to form the poetry server
top_service = service.MultiService()

Our poetry server is going to consist of two services, the PoetryService we defined above, and a Twisted built-in service that creates the listening socket our poem will be served from. Since these two services are clearly related to each other, we’ll group them together using a MultiService, a Twisted class which implements both IService and IServiceCollection.

As a service collection, the MultiService will group our two poetry services together. And as a service, the MultiService will start both child services when the MultiService itself is started, and stop both child services when it is stopped. Let’s add the first poetry service to the collection:

# the poetry service holds the poem. it will load the poem when it is
# started
poetry_service = PoetryService(poetry_file)
poetry_service.setServiceParent(top_service)

This is pretty simple stuff. We just create the PoetryService and then add it to the collection with setServiceParent, a method we inherited from the Twisted base class. Next we add the TCP listener:

# the tcp service connects the factory to a listening socket. it will
# create the listening socket when it is started
factory = PoetryFactory(poetry_service)
tcp_service = internet.TCPServer(port, factory, interface=iface)
tcp_service.setServiceParent(top_service)

Twisted provides the TCPServer service for creating a TCP listening socket connected to an arbitrary factory (in this case our PoetryFactory). We don’t call reactor.listenTCP directly because the job of a tac file is to get our application ready to start, without actually starting it. The TCPServer will create the socket after it is started by twistd.

You might have noticed we didn’t bother to give any of our services names. Naming a service is not required; it’s an optional feature you can use if you want to ‘look up’ services at runtime, as sketched below. Since we don’t need to do that in our little application, we don’t bother with it here.
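For the curious, here is a minimal sketch of what naming buys you, using a throwaway service (the 'poetry' name here is arbitrary):

from twisted.application import service

top = service.MultiService()

child = service.Service()
child.setName('poetry') # names must be set before setServiceParent
child.setServiceParent(top)

# later, any code holding the collection can look the service up
assert top.getServiceNamed('poetry') is child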

Ok, now that we’ve got both our services combined into a collection, we just need to make our Application and add the collection to it:

# this variable has to be named 'application'
application = service.Application("fastpoetry")

# this hooks the collection we made to the application
top_service.setServiceParent(application)

The only variable in this script that twistd really cares about is the application variable. That is how twistd will find the application it’s supposed to start (and so the variable has to be named ‘application’). And when the application is started, all the services we added to it will be started as well.

Figure 34 shows the structure of the application we just built:

Figure 34: the structure of our fastpoetry application

Running the Server

Let’s take our new server for a spin. As a tac file, we need to start it with twistd. Of course, it’s also just a regular Python file, too. So let’s run it with Python first and see what happens:

python twisted-server-3/fastpoetry.py

If you do this, you’ll find that what happens is nothing! As we said before, the job of a tac file is to get an application ready to run, without actually running it. As a reminder of this special purpose of tac files, some people name them with a .tac extension instead of .py. But the twistd script doesn’t actually care about the extension.

Let’s run our server for real, using twistd:

twistd --nodaemon --python twisted-server-3/fastpoetry.py

After running that command, you should see some output like this:

2010-06-23 20:57:14-0700 [-] Log opened.
2010-06-23 20:57:14-0700 [-] twistd 10.0.0 (/usr/bin/python 2.6.5) starting up.
2010-06-23 20:57:14-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2010-06-23 20:57:14-0700 [-] __builtin__.PoetryFactory starting on 10000
2010-06-23 20:57:14-0700 [-] Starting factory <__builtin__.PoetryFactory instance at 0x14ae8c0>
2010-06-23 20:57:14-0700 [-] loaded a poem from: poetry/ecstasy.txt

Here are a few things to notice:

  1. You can see the output of the Twisted logging system, including the PoetryFactory’s call to log.msg. But we didn’t install a logger in our tac file, so twistd must have installed one for us.
  2. You can also see our two main services, the PoetryService and the TCPServer, starting up.
  3. The shell prompt never came back. That means our server isn’t running as a daemon. By default, twistd does run a server as a daemon process (that’s the main reason twistd exists), but if you include the --nodaemon option then twistd will run your server as a regular shell process instead, and will direct the log output to standard output as well. This is useful for debugging your tac files.

Now test out the server by fetching a poem, either with one of our poetry clients or just netcat:

netcat localhost 10000

That should fetch the poem from the server and you should see a new log line like this:

2010-06-27 22:17:39-0700 [__builtin__.PoetryFactory] sending 3003 bytes of poetry to IPv4Address(TCP, '127.0.0.1', 58208)

That’s from the call to log.msg in PoetryProtocol.connectionMade. As you make more requests to the server, you will see additional log entries for each request.

Now stop the server by pressing Ctrl-C. You should see some output like this:

^C2010-06-29 21:32:59-0700 [-] Received SIGINT, shutting down.
2010-06-29 21:32:59-0700 [-] (Port 10000 Closed)
2010-06-29 21:32:59-0700 [-] Stopping factory <__builtin__.PoetryFactory instance at 0x28d38c0>
2010-06-29 21:32:59-0700 [-] Main loop terminated.
2010-06-29 21:32:59-0700 [-] Server Shut Down.

As you can see, Twisted does not simply crash, but shuts itself down cleanly and tells you about it with log messages. Notice our two main services shutting themselves down as well.

Ok, now start the server up once more:

twistd --nodaemon --python twisted-server-3/fastpoetry.py

Then open another shell and change to the twisted-intro directory. A directory listing should show a file called twistd.pid. This file is created by twistd and contains the process ID of our running server. Try executing this alternative command to shut down the server:

kill `cat twistd.pid`

Notice that twistd cleans up the process ID file when our server shuts down.

A Real Daemon

Now let’s start our server as an actual daemon process, which is even simpler to do as it’s twistd’s default behavior:

twistd --python twisted-server-3/fastpoetry.py

This time we get our shell prompt back almost immediately. And if you list the contents of your directory you will see, in addition to the twistd.pid file for the server we just ran, a twistd.log file with the log entries that were formerly displayed at the shell prompt.

When starting a daemon process, twistd installs a log handler that writes entries to a file instead of standard output. The default log file is twistd.log, located in the same directory where you ran twistd, but you can change that with the --logfile option if you wish. The handler that twistd installs also rotates the log whenever the size exceeds one megabyte.
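For instance, a command along these lines (assuming the other defaults are fine) would send the log to a file of our choosing:

twistd --logfile=poetry.log --python twisted-server-3/fastpoetry.py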

You should be able to see the server running by listing all the processes on your system. Go ahead and test out the server by fetching another poem. You should see new entries appear in the log file for each poem you request.

Since the server is no longer connected to the shell (or any other process except init), you cannot shut it down with Ctrl-C. As a true daemon process, it will continue to run even if you log out. But we can still use the twistd.pid file to stop the process:

kill `cat twistd.pid`

And when that happens the shutdown messages appear in the log, the twistd.pid file is removed, and our server stops running. Neato.

It’s a good idea to check out some of the other twistd startup options. For example, you can tell twistd to switch to a different user or group account before starting the daemon (typically a way to drop privileges your server doesn’t need as a security precaution). We won’t bother going into those extra options; you can find them using the --help switch to twistd.

The Twisted Plugin System

Ok, now we can use twistd to start up our servers as genuine daemon processes. This is all very nice, and the fact that our “configuration” files are really just Python source files gives us a great deal of flexibility in how we set things up. But we don’t always need that much flexibility. For our poetry servers, we typically only have a few options we might care about:

  1. The poem to serve.
  2. The port to serve it from.
  3. The interface to listen on.

Making new tac files for simple variations on those values seems rather excessive. It would be nice if we could just specify those values as options on the twistd command line. The Twisted plugin system allows us to do just that.

Twisted plugins provide a way of defining named Applications, with a custom set of command-line options, that twistd can dynamically discover and run. Twisted itself comes with a set of built-in plugins. You can see them all by running twistd without any arguments. Try running it now, but outside of the twisted-intro directory. After the help section, you should see some output like this:

    ...
    ftp                An FTP server.
    telnet             A simple, telnet-based remote debugging service.
    socks              A SOCKSv4 proxy service.
    ...

Each line shows one of the built-in plugins that come with Twisted. And you can run any of them using twistd.

Each plugin also comes with its own set of options, which you can discover using --help. Let’s see what the options for the ftp plugin are:

twistd ftp --help

Note that you need to put the --help switch after the ftp command, since you want the options for the ftp plugin rather than for twistd itself.

We can run the ftp server with twistd just like we ran our poetry server. But since it’s a plugin, we just run it by name:

twistd --nodaemon ftp --port 10001

That command runs the ftp plugin in non-daemon mode on port 10001. Note the twistd option nodaemon comes before the plugin name, while the plugin-specific option port comes after the plugin name. As with our poetry server, you can stop that plugin with Ctrl-C.

Ok, let’s turn our poetry server into a Twisted plugin. First we need to introduce a couple of new concepts.

IPlugin

Any Twisted plugin must implement the twisted.plugin.IPlugin interface. If you look at the declaration of that Interface, you’ll find it doesn’t actually specify any methods. Implementing IPlugin is simply a way for a plugin to say “Hello, I’m a plugin!” so twistd can find it. Of course, to be of any use, it will have to implement some other interface and we’ll get to that shortly.

But how do you know if an object actually implements an empty interface? The zope.interface package includes a function called implements that you can use to declare that a particular class implements a particular interface. We’ll see an example of that in the plugin version of our poetry server.

IServiceMaker

In addition to IPlugin, our plugin will implement the IServiceMaker interface. An object which implements IServiceMaker knows how to create an IService that will form the heart of a running application. IServiceMaker specifies three attributes and a method:

  1. tapname: a string name for our plugin. The “tap” stands for Twisted Application Plugin. Note: an older version of Twisted also made use of pickled application files called “tapfiles”, but that functionality is deprecated.
  2. description: a description of the plugin, which twistd will display as part of its help text.
  3. options: an object which describes the command-line options this plugin accepts.
  4. makeService: a method which creates a new IService object, given a specific set of command-line options.

We’ll see how all this gets put together in the next version of our poetry server.

Fast Poetry 3.0

Now we’re ready to take a look at the plugin version of Fast Poetry, located in twisted/plugins/fastpoetry_plugin.py.

You might notice we’ve named these directories differently than any of the other examples. That’s because twistd requires plugin files to be located in a twisted/plugins directory, located in your Python module search path. The directory doesn’t have to be a package (i.e., you don’t need any __init__.py files), and you can have multiple twisted/plugins directories on your path; twistd will find them all. The actual filename you use for the plugin doesn’t matter either, but it’s still a good idea to name it according to the application it represents, like we have done here.

The first part of our plugin contains the same poetry protocol, factory, and service implementations as our tac file. And as before, this code would normally be in a separate module but we’ve placed it in the plugin to make the example self-contained.

Next comes the declaration of the plugin’s command-line options:

class Options(usage.Options):

    optParameters = [
        ['port', 'p', 10000, 'The port number to listen on.'],
        ['poem', None, None, 'The file containing the poem.'],
        ['iface', None, 'localhost', 'The interface to listen on.'],
        ]

This code specifies the plugin-specific options that a user can place after the plugin name on the twistd command line. We won’t go into details here as it should be fairly clear what is going on. Now we get to the main part of our plugin, the service maker class:

class PoetryServiceMaker(object):

    implements(service.IServiceMaker, IPlugin)

    tapname = "fastpoetry"
    description = "A fast poetry service."
    options = Options

    def makeService(self, options):
        top_service = service.MultiService()

        poetry_service = PoetryService(options['poem'])
        poetry_service.setServiceParent(top_service)

        factory = PoetryFactory(poetry_service)
        tcp_service = internet.TCPServer(int(options['port']), factory,
                                         interface=options['iface'])
        tcp_service.setServiceParent(top_service)

        return top_service

Here you can see how the zope.interface.implements function is used to declare that our class implements both IServiceMaker and IPlugin.

You should recognize the code in makeService from our earlier tac file implementation. But this time we don’t need to make an Application object ourselves, we just create and return the top level service that our application will run and twistd will take care of the rest. Notice how we use the options argument to retrieve the plugin-specific command-line options given to twistd.

After declaring that class, there’s only one thing left to do:

service_maker = PoetryServiceMaker()

The twistd script will discover that instance of our plugin and use it to construct the top level service. Unlike the tac file, the variable name we choose is irrelevant. What matters is that our object implements both IPlugin and IServiceMaker.

Now that we’ve created our plugin, let’s run it. Make sure that you are in the twisted-intro directory, or that the twisted-intro directory is in your Python module search path. Then try running twistd by itself. You should now see that “fastpoetry” is one of the plugins listed, along with the description text from our plugin file.

You will also notice that a new file called dropin.cache has appeared in the twisted/plugins directory. This file is created by twistd to speed up subsequent scans for plugins.

Now let’s get some help on using our plugin:

twistd fastpoetry --help

You should see the options that are specific to the fastpoetry plugin in the help text. Finally, let’s run our plugin:

twistd fastpoetry --port 10000 --poem poetry/ecstasy.txt

That will start a fastpoetry server running as a daemon. As before, you should see both twistd.pid and twistd.log files in the current directory. After testing out the server, you can shut it down:

kill `cat twistd.pid`

And that’s how you make a Twisted plugin.

Summary

In this Part we learned about turning our Twisted servers into long-running daemons. We touched on the Twisted logging system and on how to use twistd to start a Twisted application as a daemon process, either from a tac configuration file or a Twisted plugin. In Part 17 we’ll return to the more fundamental topic of asynchronous programming and look at another way of structuring our callbacks in Twisted.

Suggested Exercises

  1. Modify the tac file to serve a second poem on another port. Keep the services for each poem separate by using another MultiService object.
  2. Create a new tac file that starts a poetry proxy server.
  3. Modify the plugin file to accept an optional second poetry file and second port to serve it on.
  4. Create a new plugin for the poetry proxy server.

Part 15: Tested Poetry

This continues the introduction started here. You can find an index to the entire series here.

Introduction

We’ve written a lot of code in our exploration of Twisted, but so far we’ve neglected to write something important — tests. And you may be wondering how you can test asynchronous code using a synchronous framework like the unittest package that comes with Python. The short answer is you can’t. As we’ve discovered, synchronous and asynchronous code do not mix, at least not readily.

Fortunately, Twisted includes its own testing framework called trial that does support testing asynchronous code (and you can use it to test synchronous code, too).

We’ll assume you are already familiar with the basic mechanics of unittest and similar testing frameworks, in which you create tests by defining a class with a specific parent class (usually called something like TestCase), and each method of that class starting with the word “test” is considered a single test. The framework takes care of discovering all the tests, running them one after the other with optional setUp and tearDown steps, and then reporting the results.

The Example

You will find some example tests located in tests/test_poetry.py. To ensure all our examples are self-contained (so you don’t need to worry about PYTHONPATH settings), we have copied all the necessary code into the test module. Normally, of course, you would just import the modules you wanted to test.

The example is testing both the poetry client and server, by using the client to fetch a poem from a test server. To provide a poetry server for testing, we implement the setUp method in our test case:

class PoetryTestCase(TestCase):

    def setUp(self):
        factory = PoetryServerFactory(TEST_POEM)
        from twisted.internet import reactor
        self.port = reactor.listenTCP(0, factory, interface="127.0.0.1")
        self.portnum = self.port.getHost().port

The setUp method makes a poetry server with a test poem, and listens on a random, open port. We save the port number so the actual tests can use it, if they need to. And, of course, we clean up the test server in tearDown when the test is done:

    def tearDown(self):
        port, self.port = self.port, None
        return port.stopListening()

That brings us to our first test, test_client, where we use get_poetry to retrieve the poem from the test server and verify it’s the poem we expected:

    def test_client(self):
        """The correct poem is returned by get_poetry."""
        d = get_poetry('127.0.0.1', self.portnum)

        def got_poem(poem):
            self.assertEquals(poem, TEST_POEM)

        d.addCallback(got_poem)

        return d

Notice that our test function is returning a deferred. Under trial, each test method runs as a callback. That means the reactor is running and we can perform asynchronous operations as part of the test. We just need to let the framework know that our test is asynchronous and we do that in the usual Twisted way — return a deferred.

The trial framework will wait until the deferred fires before calling the tearDown method, and will fail the test if the deferred fails (i.e., if the last callback/errback pair fails). It will also fail the test if our deferred takes too long to fire, two minutes by default. And that means that if the test finishes, we know our deferred fired, and therefore our callback fired and ran the assertEquals test method.

Our second test, test_failure, verifies that get_poetry fails in the appropriate way if we can’t connect to the server:

    def test_failure(self):
        """The correct failure is returned by get_poetry when
        connecting to a port with no server."""
        d = get_poetry('127.0.0.1', 0)
        return self.assertFailure(d, ConnectionRefusedError)

Here we attempt to connect to an invalid port and then use the trial-provided assertFailure method. This method is like the familiar assertRaises method but for asynchronous code. It returns a deferred that succeeds if the given deferred fails with the given exception, and fails otherwise.

You can run the tests yourself using the trial script like this:

trial tests/test_poetry.py

And you should see some output showing each test case and an OK telling you each test passed.

Discussion

Because trial is so similar to unittest when it comes to the basic API, it’s pretty easy to get started writing tests. Just return a deferred if your test uses asynchronous code, and trial will take care of the rest. You can also return a deferred from the setUp and tearDown methods, if those need to be asynchronous as well.
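For example, a setUp that needs to wait on something asynchronous might look like this hypothetical sketch, which just uses task.deferLater to stand in for a real asynchronous resource:

from twisted.internet import task

class SlowSetupTestCase(TestCase):

    def setUp(self):
        from twisted.internet import reactor
        # trial waits for this deferred before running each test
        return task.deferLater(reactor, 0.1, self.make_resource)

    def make_resource(self):
        self.resource = 'ready'

    def test_resource_is_ready(self):
        self.assertEquals(self.resource, 'ready')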

Any log messages from your tests will be collected in a file inside a directory called _trial_temp that trial will create automatically if it doesn’t exist. In addition to the errors printed to the screen, the log is a useful starting point when debugging failing tests.

Figure 33 shows a hypothetical test run in progress:

Figure 33: a trial test in progress

If you’ve used similar frameworks before, this should be a familiar model, except that all the test-related methods may return deferreds.

The trial framework is also a good illustration of how “going asynchronous” involves changes that cascade throughout the program. In order for a test (or any function or method) to be asynchronous, it must:

  1. Not block and, usually,
  2. return a deferred.

But that means that whatever calls that function must be willing to accept a deferred, and also not block (and thus likely return a deferred as well). And so it goes up and up. Thus, the need for a framework like trial which can handle asynchronous tests that return deferreds.

Summary

That’s it for our look at unit testing. If you would like to see more examples of how to write unit tests for Twisted code, you need look no further than Twisted itself. The Twisted framework comes with a very large suite of unit tests, with new ones added in each release. Since these tests are scrutinized by Twisted experts during code reviews before being accepted into the codebase, they make excellent examples of how to test Twisted code the right way.

In Part 16 we will use a Twisted utility to turn our poetry server into a genuine daemon.

Suggested Exercises

  1. Change one of the tests to make it fail and run trial again to see the output.
  2. Read the online trial documentation.
  3. Write tests for some of the other poetry services we have created in this series.
  4. Explore some of the tests in Twisted.
Categories
Books Erlang Programming Software

Book: Concurrent Programming in Erlang (2nd Edition)

Although out of date in its technical details, the later chapters contain some excellent examples of distributed software construction in Erlang.

Categories
Books Erlang Programming Software

Book: Erlang Programming

This is a nice complement to the Armstrong book. In particular, it has a great discussion of the Erlang tracing mechanisms.

Categories
Blather Programming Python Software

When a Deferred Isn’t

Part 14: When a Deferred Isn’t

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this part we’re going to learn another aspect of the Deferred class. To motivate the discussion, we’ll add one more server to our stable of poetry-related services. Suppose we have a large number of internal clients who want to get poetry from the same external server. But this external server is slow and already over-burdened by the insatiable demand for poetry across the Internet. We don’t want to contribute to that poor server’s problems by sending all our clients there too.

So instead we’ll make a caching proxy server. When a client connects to the proxy, the proxy will either fetch the poem from the external server or return a cached copy of a previously retrieved poem. Then we can point all our clients at the proxy and our contribution to the external server’s load will be negligible. We illustrate this setup in Figure 30:

Figure 30: a caching proxy server

Consider what happens when a client connects to the proxy to get a poem. If the proxy’s cache is empty, the proxy must wait (asynchronously) for the external server to respond before sending a poem back. So far so good; we already know how to handle that situation with an asynchronous function that returns a deferred. On the other hand, if there’s already a poem in the cache, the proxy can send it back immediately, with no need to wait at all. So the proxy’s internal mechanism for getting a poem will sometimes be asynchronous and sometimes synchronous.

So what do we do if we have a function that is only asynchronous some of the time? Twisted provides a couple of options, and they both depend on a feature of the Deferred class we haven’t used yet: you can fire a deferred before you return it to the caller.

This works because, although you cannot fire a deferred twice, you can add callbacks and errbacks to a deferred after it has fired. And when you do so, the deferred simply continues firing the chain from where it last left off. One important thing to note is that an already-fired deferred may fire the new callback (or errback, depending on the state of the deferred) immediately, i.e., right when you add it.

Consider Figure 31, showing a deferred that has been fired:

Figure 31: a deferred that has been fired

If we were to add another callback/errback pair at this point, then the deferred would immediately fire the new callback, as in Figure 32:

Figure 32: the same deferred with a new callback

The callback (not the errback) is fired because the previous callback succeeded. If it had failed (raised an Exception or returned a Failure) then the new errback would have been called instead.

We can test out this new feature with the example code in twisted-deferred/defer-11.py. Read and run that script to see how a deferred behaves when you fire it and then add callbacks. Note how in the first example each new callback is invoked immediately (you can tell from the order of the print output).

The second example in that script shows how we can pause() a deferred so it doesn’t fire the callbacks right away. When we are ready for the callbacks to fire, we call unpause(). That’s actually the same mechanism the deferred uses to pause itself when one of its callbacks returns another deferred. Nifty!
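Here is a stripped-down sketch in the same spirit as defer-11.py (not the same code), showing both behaviors:

from twisted.internet import defer

def report(res, label):
    print 'callback %s got: %r' % (label, res)
    return res

# Fire first, then add callbacks -- each one runs as soon as we add it:
d = defer.Deferred()
d.callback('a result')
d.addCallback(report, 'one')   # prints immediately
d.addCallback(report, 'two')   # also prints immediately

# With pause(), callbacks added to an already-fired deferred must wait:
d2 = defer.Deferred()
d2.callback('another result')
d2.pause()
d2.addCallback(report, 'three')  # nothing printed yet
print 'About to unpause.'
d2.unpause()                     # now the 'three' callback fires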

Proxy 1.0

Now let’s look at the first version of the poetry proxy in twisted-server-1/poetry-proxy.py. Since the proxy acts as both a client and a server, it has two pairs of Protocol/Factory classes, one for serving up poetry, and one for getting a poem from the external server. We won’t bother looking at the code for the client pair, it’s the same as in previous poetry clients.

But before we look at the server pair, we’ll look at the ProxyService, which the server-side protocol uses to get a poem:

class ProxyService(object):

    poem = None # the cached poem

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            return self.poem

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

    def set_poem(self, poem):
        self.poem = poem
        return poem

The key method there is get_poem. If there’s already a poem in the cache, that method just returns the poem itself. On the other hand, if we haven’t got a poem yet, we initiate a connection to the external server and return a deferred that will fire when the poem comes back. So get_poem is a function that is only asynchronous some of the time.

How do you handle a function like that? Let’s look at the server-side protocol/factory pair:

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = maybeDeferred(self.factory.service.get_poem)
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

class PoetryProxyFactory(ServerFactory):

    protocol = PoetryProxyProtocol

    def __init__(self, service):
        self.service = service

The factory is straightforward — it’s just saving a reference to the proxy service so that protocol instances can call the get_poem method. The protocol is where the action is. Instead of calling get_poem directly, the protocol uses a wrapper function from the twisted.internet.defer module named maybeDeferred.

The maybeDeferred function takes a reference to another function, plus some optional arguments to call that function with (we aren’t using any here). Then maybeDeferred will actually call that function and:

  • If the function returns a deferred, maybeDeferred returns that same deferred, or
  • If the function returns a Failure, maybeDeferred returns a new deferred that has been fired (via .errback) with that Failure, or
  • If the function returns a regular value, maybeDeferred returns a deferred that has already been fired with that value as the result, or
  • If the function raises an exception, maybeDeferred returns a deferred that has already been fired (via .errback()) with that exception wrapped in a Failure.

In other words, the return value from maybeDeferred is guaranteed to be a deferred, even if the function you pass in never returns a deferred at all. This allows us to safely call a synchronous function (even one that fails with an exception) and treat it like an asynchronous function returning a deferred.

Note 1: There will still be a subtle difference, though. A deferred returned by a synchronous function has already been fired, so any callbacks or errbacks you add will run immediately, rather than in some future iteration of the reactor loop.

Note 2: In hindsight, perhaps naming a function that always returns a deferred “maybeDeferred” was not the best choice, but there you go.
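Here is a small standalone sketch of that behavior, using made-up helper functions rather than the proxy code:

from twisted.internet import defer

def sometimes_sync(cached_poem):
    """Return a poem synchronously if we have one, raise otherwise."""
    if cached_poem is None:
        raise Exception('no poem available')
    return cached_poem

def got_poem(poem):
    print 'Got poem:', poem

def poem_failed(err):
    print 'Poem failed:', err.getErrorMessage()

# a plain return value becomes an already-fired deferred:
d1 = defer.maybeDeferred(sometimes_sync, 'a cached poem')
d1.addCallbacks(got_poem, poem_failed)  # got_poem runs immediately

# an exception becomes an already-failed deferred:
d2 = defer.maybeDeferred(sometimes_sync, None)
d2.addCallbacks(got_poem, poem_failed)  # poem_failed runs immediately

# and a deferred returned by the wrapped function is passed through unchanged:
d3 = defer.Deferred()
assert defer.maybeDeferred(lambda: d3) is d3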

Once the protocol has a real deferred in hand, it can just add some callbacks that send the poem to the client and then close the connection. And that’s it for our first poetry proxy!

Running the Proxy

To try out the proxy, start up a poetry server, like this:

python twisted-server-1/fastpoetry.py --port 10001 poetry/fascination.txt

And now start a proxy server like this:

python twisted-server-1/poetry-proxy.py --port 10000 10001

It should tell you that it’s proxying poetry on port 10000 for the server on port 10001.

Now you can point a client at the proxy:

python twisted-client-4/get-poetry.py 10000

We’ll use an earlier version of the client that isn’t concerned with poetry transformations. You should see the poem appear in the client window and some text in the proxy window saying it’s fetching the poem from the server. Now run the client again and the proxy should confirm it is using the cached version of the poem, while the client should show the same poem as before.

Proxy 2.0

As we mentioned earlier, there’s an alternative way to implement this scheme. This is illustrated in Poetry Proxy 2.0, located in twisted-server-2/poetry-proxy.py. Since we can fire deferreds before we return them, we can make the proxy service return an already-fired deferred when there’s already a poem in the cache. Here’s the new version of the get_poem method on the proxy service:

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            # return an already-fired deferred
            return succeed(self.poem)

        print 'Fetching poem from server.'
        factory = PoetryClientFactory()
        factory.deferred.addCallback(self.set_poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

The defer.succeed function is just a handy way to make an already-fired deferred given a result. Read the implementation for that function and you’ll see it’s simply a matter of making a new deferred and then firing it with .callback(). If we wanted to return an already-failed deferred we could use defer.fail instead.
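In other words, defer.succeed and defer.fail are just small conveniences, roughly equivalent to this sketch:

from twisted.internet import defer

# roughly what defer.succeed(result) does:
d = defer.Deferred()
d.callback('a cached poem')

# and roughly what defer.fail(err) does:
f = defer.Deferred()
f.errback(Exception('no poem'))  # errback wraps a plain exception in a Failure
f.addErrback(lambda err: None)   # handle it so Twisted won't log an unhandled error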

In this version, since get_poem always returns a deferred, the protocol class no longer needs to use maybeDeferred (though it would still work if it did, as we learned above):

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        d = self.factory.service.get_poem()
        d.addCallback(self.transport.write)
        d.addBoth(lambda r: self.transport.loseConnection())

Other than these two changes, the second version of the proxy is just like the first, and you can run it in the same way we ran the original version.

Summary

In this Part we learned how deferreds can be fired before they are returned, and thus we can use them in code that is synchronous, or only sometimes asynchronous. And we have two ways to do that:

  • We can use maybeDeferred to handle a function that sometimes returns a deferred and other times returns a regular value (or throws an exception), or
  • We can pre-fire our own deferreds, using defer.succeed and defer.fail, so our “semi-synchronous” functions always return a deferred no matter what.

Which technique we choose is really up to us. The former emphasizes the fact that our functions aren’t always asynchronous while the latter makes the client code simpler. Perhaps there’s not a definitive argument for choosing one over the other.

Both techniques are made possible because we can add callbacks and errbacks to a deferred after it has fired. And that explains the curious fact we discovered in Part 9 and the twisted-deferred/defer-unhandled.py example. We learned that an “unhandled error” in a deferred, in which either the last callback or errback fails, isn’t reported until the deferred is garbage collected (i.e., there are no more references to it in user code). Now we know why — since we could always add another callback pair to a deferred which does handle that error, it’s not until the last reference to a deferred is dropped that Twisted can say the error was not handled.

Now that you’ve spent so much time exploring the Deferred class, which is located in the twisted.internet package, you may have noticed it doesn’t actually have anything to do with the Internet. It’s just an abstraction for managing callbacks. So what’s it doing there? That is an artifact of Twisted’s history. In the best of all possible worlds (where I am paid millions of dollars to play in the World Ultimate Frisbee League), the defer module would probably be in twisted.python. Of course, in that world you would probably be too busy fighting crime with your super-powers to read this introduction. I suppose that’s life.

So is that it for deferreds? Do we finally know all their features? For the most part, we do. But Twisted includes alternate ways of using deferreds that we haven’t explored yet (we’ll get there!). And in the meantime, the Twisted developers have been beavering away adding new stuff. In an upcoming release, the Deferred class will acquire a brand new capability. We’ll introduce it in a future Part, but first we’ll take a break from deferreds and look at some other aspects of Twisted, including testing in Part 15.

Suggested Exercises

  1. Modify the twisted-deferred/defer-11.py example to illustrate pre-failing deferreds using .errback(). Read the documentation and implementation of the defer.fail function.
  2. Modify the proxy so that a cached poem older than 2 hours is discarded, causing the next poetry request to re-request it from the server.
  3. The proxy is supposed to avoid contacting the server more than once, but if several client requests come in at the same time when there is no poem in the cache, the proxy will make multiple poetry requests. It’s easier to see if you use a slow server to test it out.

    Modify the proxy service so that only one request is generated. Right now the service only has two states: either the poem is in the cache or it isn’t. You will need to recognize a third state indicating a request has been made but not completed. When the get_poem method is called in the third state, add a new deferred to a list of ‘waiters’. That new deferred will be the result of the get_poem method. When the poem finally comes back, fire all the waiting deferreds with the poem and transition to the cached state. On the other hand, if the poem fails, fire the .errback() method of all the waiters and transition to the non-cached state.

  4. Add a transformation proxy to the proxy service. This service should work like the original transformation service, but use an external server to do the transformations.
  5. Consider this hypothetical piece of code:
    d = some_async_function() # d is a Deferred
    d.addCallback(my_callback)
    d.addCallback(my_other_callback)
    d.addErrback(my_errback)
    

    Suppose that when the deferred d is returned on line 1, it has not been fired. Is it possible for that deferred to fire while we are adding our callbacks and errback on lines 2-4? Why or why not?

Categories
Blather Programming Python Software

Deferred All The Way Down

Part 13: Deferred All The Way Down

This continues the introduction started here. You can find an index to the entire series here.

Introduction

Recall poetry client 5.1 from Part 10. The client used a Deferred to manage a callback chain that included a call to a poetry transformation engine. In client 5.1, the engine was implemented as a synchronous function defined in the client itself.

Now we want to make a new client that uses the networked poetry transformation service we wrote in Part 12. But here’s the wrinkle: since the transformation service is accessed over the network, we’ll need to use asynchronous I/O. And that means our API for requesting a transformation will have to be asynchronous, too. In other words, the try_to_cummingsify callback is going to return a Deferred in our new client.

So what happens when a callback in a deferred’s chain returns another deferred? Let’s call the first deferred the ‘outer’ deferred and the second the ‘inner’ one. Suppose callback N in the outer deferred returns the inner deferred. That callback  is saying “I’m asynchronous, my result isn’t here yet”. Since the outer deferred needs to call the next callback or errback in the chain with the result, the outer deferred needs to wait until the inner deferred is fired. Of course, the outer deferred can’t block either, so instead the outer deferred suspends the execution of the callback chain and returns control to the reactor (or whatever fired the outer deferred).

And how does the outer deferred know when to resume? Simple — by adding a callback/errback pair to the inner deferred. Thus, when the inner deferred is fired the outer deferred will resume executing its chain. If the inner deferred succeeds (i.e., it calls the callback added by the outer deferred), then the outer deferred calls its N+1 callback with the result. And if the inner deferred fails (calls the errback added by the outer deferred), the outer deferred calls the N+1 errback with the failure.

That’s a lot to digest, so let’s illustrate the idea in Figure 28:

Figure 28: outer and inner deferred processing

In this figure the outer deferred has 4 layers of callback/errback pairs. When the outer deferred fires, the first callback in the chain returns a deferred (the inner deferred). At that point, the outer deferred will stop firing its chain and return control to the reactor (after adding a callback/errback pair to the inner deferred). Then, some time later, the inner deferred fires and the outer deferred resumes processing its callback chain. Note the outer deferred does not fire the inner deferred itself. That would be impossible, since the outer deferred cannot know when the inner deferred’s result is available, or what that result might be. Rather, the outer deferred simply waits (asynchronously) for the inner deferred to fire.

Notice how the line connecting the callback to the inner deferred in Figure 28 is black instead of green or red. That’s because we don’t know whether the callback succeeded or failed until the inner deferred is fired. Only then can the outer deferred decide whether to call the next callback or the next errback in its own chain.

Figure 29 shows the same outer/inner deferred firing sequence in Figure 28 from the point of view of the reactor:

Figure 29: the thread of control in Figure 28

This is probably the most complicated feature of the Deferred class, so don’t worry if you need some time to absorb it. We’ll illustrate it one more way using the example code in twisted-deferred/defer-10.py. That example creates two outer deferreds, one with plain callbacks, and one where a single callback returns an inner deferred. By studying the code and the output you can see how the second outer deferred stops running its chain when the inner deferred is returned, and then starts up again when the inner deferred is fired.
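Here is a stripped-down sketch of the same idea (not the code in defer-10.py), which you can run without even starting a reactor:

from twisted.internet import defer

inner = defer.Deferred()

def wait_for_inner(res):
    print 'first callback got:', res
    return inner  # returning a deferred pauses the outer chain here

def resume_here(res):
    print 'second callback got:', res

outer = defer.Deferred()
outer.addCallback(wait_for_inner)
outer.addCallback(resume_here)

outer.callback('outer result')
print 'The outer chain is now waiting on the inner deferred.'
inner.callback('inner result')  # resumes the chain; resume_here gets 'inner result'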

Client 6.0

Let’s use our new knowledge of nested deferreds and re-implement our poetry client to use the network transformation service from Part 12. You can find the code in twisted-client-6/get-poetry.py. The poetry Protocol and Factory are unchanged from the previous version. But now we have a Protocol and Factory for making transformation requests. Here’s the transform client Protocol:

class TransformClientProtocol(NetstringReceiver):

    def connectionMade(self):
        self.sendRequest(self.factory.xform_name, self.factory.poem)

    def sendRequest(self, xform_name, poem):
        self.sendString(xform_name + '.' + poem)

    def stringReceived(self, s):
        self.transport.loseConnection()
        self.poemReceived(s)

    def poemReceived(self, poem):
        self.factory.handlePoem(poem)

Using the NetstringReceiver as a base class makes this implementation pretty simple. As soon as the connection is established we send the transform request to the server, retrieving the name of the transform and the poem from our factory. And when we get the poem back, we pass it on to the factory for processing. Here’s the code for the Factory:

class TransformClientFactory(ClientFactory):

    protocol = TransformClientProtocol

    def __init__(self, xform_name, poem):
        self.xform_name = xform_name
        self.poem = poem
        self.deferred = defer.Deferred()

    def handlePoem(self, poem):
        d, self.deferred = self.deferred, None
        d.callback(poem)

    def clientConnectionLost(self, _, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

    clientConnectionFailed = clientConnectionLost

This factory is designed for clients and handles a single transformation request, storing both the transform name and the poem for use by the Protocol. The Factory creates a single Deferred which represents the result of the transformation request. Notice how the Factory handles two error cases: a failure to connect and a connection that is closed before the poem is received. Also note the clientConnectionLost method is called even if we receive the poem, but in that case self.deferred will be None, thanks to the handlePoem method.

This Factory class creates the Deferred that it also fires. That’s a good rule to follow in Twisted programming, so let’s highlight it:

In general, an object that makes a Deferred should also be in charge of firing that Deferred.

This “you make it, you fire it” rule helps ensure a given deferred is only fired once and makes it easier to follow the flow of control in a Twisted program.

In addition to the transform Factory, there is also a Proxy class which hides the details of making the TCP connection to a particular transform server:

class TransformProxy(object):
    """
    I proxy requests to a transformation service.
    """

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def xform(self, xform_name, poem):
        factory = TransformClientFactory(xform_name, poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

This class presents a single xform() interface that other code can use to request transformations. That way, other code can simply request a transform and get a deferred back without mucking around with hostnames and port numbers.

The rest of the program is unchanged except for the try_to_cummingsify callback:

    def try_to_cummingsify(poem):
        d = proxy.xform('cummingsify', poem)

        def fail(err):
            print >>sys.stderr, 'Cummingsify failed!'
            return poem

        return d.addErrback(fail)

This callback now returns a deferred, but we didn’t have to change the rest of the main function at all, other than to create the Proxy instance. Since try_to_cummingsify was part of a deferred chain (the deferred returned by get_poetry), it was already being used asynchronously and nothing else need change.

You’ll note we are returning the result of d.addErrback(fail). That’s just a little bit of syntactic sugar. The addCallback and addErrback methods return the original deferred. We might just as well have written:

        d.addErrback(fail)
        return d

The first version is the same thing, just shorter.

Testing out the Client

The new client has a slightly different syntax than the others. If you have a transformation service running on port 10001 and two poetry servers running on ports 10002 and 10003, you would run:

python twisted-client-6/get-poetry.py 10001 10002 10003

To download two poems and transform them both. You can start the transform server like this:

python twisted-server-1/transformedpoetry.py --port 10001

And the poetry servers like this:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Then you can run the poetry client as above. After that, try crashing the transform server and re-running the client with the same command.

Wrapping Up

In this Part we learned how deferreds can transparently handle other deferreds in a callback chain, and thus we can safely add asynchronous callbacks to an ‘outer’ deferred without worrying about the details. That’s pretty handy since lots of our functions are going to end up being asynchronous.

Do we know everything there is to know about deferreds yet? Not quite! There’s one more important feature to talk about, but we’ll save it for Part 14.

Suggested Exercises

  1. Modify the client so we can ask for a specific kind of transformation by name.
  2. Modify the client so the transformation server address is an optional argument. If it’s not provided, skip the transformation step.
  3. The PoetryClientFactory currently violates the “you make it, you fire it” rule for deferreds. Refactor get_poetry and PoetryClientFactory to remedy that.
  4. Although we didn’t demonstrate it, the case where an errback returns a deferred is symmetrical. Modify the twisted-deferred/defer-10.py example to verify it.
  5. Find the place in the Deferred implementation that handles the case where a callback/errback returns another Deferred.
Categories
Blather Programming Python Software

A Poetry Transformation Server

Part 12: A Poetry Transformation Server

This continues the introduction started here. You can find an index to the entire series here.

One More Server

Alright, we’ve written one Twisted server so let’s write another, and then we’ll get back to learning some more about Deferreds.

In Parts 9 and 10 we introduced the idea of a poetry transformation engine. The one we eventually implemented, the cummingsifier, was so simple we had to add random exceptions to simulate a failure. But if the transformation engine was located on another server, providing a network “poetry transformation service”, then there is a much more realistic failure mode: the transformation server is down.

So in Part 12 we’re going to implement a poetry transformation server and then, in the next Part, we’ll update our poetry client to use the external transformation service and learn a few new things about Deferreds in the process.

Designing the Protocol

Up till now the interactions between client and server have been strictly one-way. The server sends a poem to the client while the client never sends anything at all to the server. But a transformation service is two-way — the client sends a poem to the server and then the server sends a transformed poem back. So we’ll need to use, or invent, a protocol to handle that interaction.

While we’re at it, let’s allow the server to support multiple kinds of transformations and allow the client to select which one to use. So the client will send two pieces of information: the name of the transformation and the complete text of the poem. And the server will return a single piece of information, namely the text of the transformed poem. So we’ve got a very simple sort of Remote Procedure Call.

Twisted includes support for several protocols we could use to solve this problem, including XML-RPC, Perspective Broker, and AMP.

But introducing any of these full-featured protocols would require us to go too far afield, so we’ll roll our own humble protocol instead. Let’s have the client send a string of the form (without the angle brackets):

<transform-name>.<text of the poem>

That’s just the name of the transform, followed by a period, followed by the complete text of the poem itself. And we’ll encode the whole thing in the form of a netstring. And the server will send back the text of the transformed poem, also in a netstring. Since netstrings use length-encoding, the client will be able to detect the case where the server fails to send back a complete result (maybe it crashed in the middle of the operation). If you recall, our original poetry protocol has trouble detecting aborted poetry deliveries.
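For example, asking for the cummingsify transform of the (very short) poem "MY POEM" would put the bytes below on the wire. Here is a little sketch of the framing, which NetstringReceiver normally handles for us:

def encode_netstring(s):
    # netstring framing: <length>:<payload>,
    return '%d:%s,' % (len(s), s)

print encode_netstring('cummingsify.MY POEM')  # 19:cummingsify.MY POEM,
print encode_netstring('my poem')              # 7:my poem,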

So much for the protocol design. It’s not going to win any awards, but it’s good enough for our purposes.

The Code

Let’s look at the code of our transformation server, located in twisted-server-1/transformedpoetry.py. First, we define a TransformService class:

class TransformService(object):

    def cummingsify(self, poem):
        return poem.lower()

The transform service currently implements one transformation, cummingsify, via a method of the same name. We could add additional algorithms by adding additional methods. Here’s something important to notice: the transformation service is entirely independent of the particular details of the protocol we settled on earlier. Separating the protocol logic from the service logic is a common pattern in Twisted programming. Doing so makes it easy to provide the same service via multiple protocols without duplicating code.

Now let’s look at the protocol factory (we’ll look at the protocol right after):

class TransformFactory(ServerFactory):

    protocol = TransformProtocol

    def __init__(self, service):
        self.service = service

    def transform(self, xform_name, poem):
        thunk = getattr(self, 'xform_%s' % (xform_name,), None)

        if thunk is None: # no such transform
            return None

        try:
            return thunk(poem)
        except:
            return None # transform failed

    def xform_cummingsify(self, poem):
        return self.service.cummingsify(poem)

This factory provides a transform method which a protocol instance can use to request a poetry transformation on behalf of a connected client. The method returns None if there is no such transformation or if the transformation fails. And like the TransformService, the protocol factory is independent of the wire-level protocol, the details of which are delegated to the protocol class itself.

One thing to notice is the way we guard access to the service through the xform_-prefixed methods. This is a pattern you will find in the Twisted sources, although the prefixes vary and they are usually on an object separate from the factory. It’s one way of preventing client code from executing an arbitrary method on the service object, since the client can send any transform name they want. It also provides a place to perform protocol-specific adaptation to the API provided by the service object.

Now we’ll take a look at the protocol implementation:

class TransformProtocol(NetstringReceiver):

    def stringReceived(self, request):
        if '.' not in request: # bad request
            self.transport.loseConnection()
            return

        xform_name, poem = request.split('.', 1)

        self.xformRequestReceived(xform_name, poem)

    def xformRequestReceived(self, xform_name, poem):
        new_poem = self.factory.transform(xform_name, poem)

        if new_poem is not None:
            self.sendString(new_poem)

        self.transport.loseConnection()

In the protocol implementation we take advantage of the fact that Twisted supports netstrings via the NetstringReceiver protocol. That base class takes care of decoding (and encoding) the netstrings and all we have to do is implement the stringReceived method. In other words, stringReceived is called with the content of a netstring sent by the client, without the extra bytes added by the netstring encoding. The base class also takes care of buffering the incoming bytes until we have enough to decode a complete string.

If everything goes ok (and if it doesn’t we just close the connection) we send the transformed poem back to the client using the sendString method provided by NetstringReceiver (and which ultimately calls transport.write()). And that’s all there is to it. We won’t bother listing the main function since it’s similar to the ones we’ve seen before.

Notice how we continue the Twisted pattern of translating the incoming byte stream to higher and higher levels of abstraction by defining the xformRequestReceived method, which is passed the name of the transform and the poem as two separate arguments.

A Simple Client

We’ll implement a Twisted client for the transformation service in the next Part. For now we’ll just make do with a simple script located in twisted-server-1/transform-test. It uses the netcat program to send a poem to the server and then prints out the response (which will be encoded as a netstring). Let’s say you run the transformation server on port 11000 like this:

python twisted-server-1/transformedpoetry.py --port 11000

Then you could run the test script against that server like this:

./twisted-server-1/transform-test 11000

And you should see some output like this:

15:here is my poem,

That’s the netstring-encoded transformed poem (the original is in all upper case).

Discussion

We introduced a few new ideas in this Part:

  1. Two-way communication.
  2. Building on an existing protocol implementation provided by Twisted.
  3. Using a service object to separate functional logic from protocol logic.

The basic mechanics of two-way communication are simple. We used the same techniques for reading and writing data in previous clients and servers; the only difference is we used them both together. Of course, a more complex protocol will require more complex code to process the byte stream and format outgoing messages. And that’s a great reason to use an existing protocol implementation like we did today.

Once you start getting comfortable writing basic protocols, it’s a good idea to take a look at the different protocol implementations provided by Twisted. You might start by perusing the twisted.protocols.basic module and going from there. Writing simple protocols is a great way to familiarize yourself with the Twisted style of programming, but in a “real” program it’s probably a lot more common to use a ready-made implementation, assuming there is one available for the protocol you want to use.

The last new idea we introduced, the use of a Service object to separate functional and protocol logic, is a really important design pattern in Twisted programming. Although the service object we made today is trivial, you can imagine a more realistic network service could be quite complex. And by making the Service independent of protocol-level details, we can quickly provide the same service on a new protocol without duplicating code.

Figure 27 shows a transformation server that is providing poetry transformations via two different protocols (the version of the server we presented above only has one protocol):

Figure 27: a transformation server with two protocols

Although we need two separate protocol factories in Figure 27, they might differ only in their protocol class attribute and would be otherwise identical. The factories would share the same Service object and only the Protocols themselves would require separate implementations. Now that’s code re-use!
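Sketching the main function for the two-protocol server in Figure 27 makes the point concrete. Here OtherTransformFactory is purely hypothetical, standing in for a factory whose protocol speaks the second wire format; everything else comes from the server we just wrote:

def main_with_two_protocols():
    service = TransformService()

    netstring_factory = TransformFactory(service)
    other_factory = OtherTransformFactory(service)  # hypothetical second protocol

    from twisted.internet import reactor
    reactor.listenTCP(10001, netstring_factory)
    reactor.listenTCP(10002, other_factory)
    reactor.run()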

Looking Ahead

So much for our transformation server. In Part 13, we’ll update our poetry client to use the transform server instead of implementing transformations in the client itself.

Suggested Exercises

  1. Read the source code for the NetstringReceiver class. What happens if the client sends a malformed netstring? What happens if the client tries to send a huge netstring?
  2. Invent another transformation algorithm and add it to the transformation service and the protocol factory. Test it out by modifying the netcat client.
  3. Invent another protocol for requesting poetry transformations and modify the server to handle both protocols (on two different ports). Use the same instance of the TransformService for both.
  4. How would the code need to change if the methods on the TransformService were asynchronous (i.e., they returned Deferreds)?
  5. Write a synchronous client for the transformation server.
  6. Update the original client and server to use netstrings when sending poetry.
Categories
Blather Programming Python Software

Your Poetry is Served

Part 11: Your Poetry is Served

This continues the introduction started here. You can find an index to the entire series here.

A Twisted Poetry Server

Now that we’ve learned so much about writing clients with Twisted, let’s turn around and re-implement our poetry server with Twisted too. And thanks to the generality of Twisted’s abstractions, it turns out we’ve already learned almost everything we need to know. Take a look at our Twisted poetry server located in twisted-server-1/fastpoetry.py. It’s called fastpoetry because this server sends the poetry as fast as possible, without any delays at all. Note there’s significantly less code than in the client!

Let’s take the pieces of the server one at a time. First, the PoetryProtocol:

class PoetryProtocol(Protocol):

    def connectionMade(self):
        self.transport.write(self.factory.poem)
        self.transport.loseConnection()

Like the client, the server uses a separate Protocol instance to manage each different connection (in this case, connections that clients make to the server). Here the Protocol is implementing the server-side portion of our poetry protocol. Since our wire protocol is strictly one-way, the server’s Protocol instance only needs to be concerned with sending data. If you recall, our wire protocol requires the server to start sending the poem immediately after the connection is made, so we implement the connectionMade method, a callback that is invoked after a Protocol instance is connected to a Transport.

Our method tells the Transport to do two things: send the entire text of the poem (self.transport.write) and close the connection (self.transport.loseConnection). Of course, both of those operations are asynchronous. So the call to write() really means “eventually send all this data to the client” and the call to loseConnection() really means “close this connection once all the data I’ve asked you to write has been written”.

As you can see, the Protocol retrieves the text of the poem from the Factory, so let’s look at that next:

class PoetryFactory(ServerFactory):

    protocol = PoetryProtocol

    def __init__(self, poem):
        self.poem = poem

Now that’s pretty darn simple. Our factory’s only real job, besides making PoetryProtocol instances on demand, is storing the poem that each PoetryProtocol sends to a client.

Notice that we are sub-classing ServerFactory instead of ClientFactory. Since our server is passively listening for connections instead of actively making them, we don’t need the extra methods ClientFactory provides. How can we be sure of that? Because we are using the listenTCP reactor method and the documentation for that method explains that the factory argument should be an instance of ServerFactory.

Here’s the main function where we call listenTCP:

def main():
    options, poetry_file = parse_args()

    poem = open(poetry_file).read()

    factory = PoetryFactory(poem)

    from twisted.internet import reactor

    port = reactor.listenTCP(options.port or 0, factory,
                             interface=options.iface)

    print 'Serving %s on %s.' % (poetry_file, port.getHost())

    reactor.run()

It basically does three things:

  1. Read the text of the poem we are going to serve.
  2. Create a PoetryFactory with that poem.
  3. Use listenTCP to tell Twisted to listen for connections on a port, and use our factory to make the protocol instances for each new connection.

After that, the only thing left to do is tell the reactor to start running the loop. You can use any of our previous poetry clients (or just netcat) to test out the server.
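For example, you could start the server and then fetch the poem with netcat (on some systems the netcat command is named nc instead):

python twisted-server-1/fastpoetry.py --port 10000 poetry/fascination.txt

And then, in another terminal:

netcat localhost 10000

You should see the text of the poem printed, and netcat should exit when the server closes the connection.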

Discussion

Recall Figure 8 and Figure 9 from Part 5. Those figures illustrated how a new Protocol instance is created and initialized after Twisted makes a new connection on our behalf. It turns out the same mechanism is used when Twisted accepts a new incoming connection on a port we are listening on. That’s why both connectTCP and listenTCP require factory arguments.

One thing we didn’t show in Figure 9 is that the connectionMade callback is also called as part of Protocol initialization. This happens no matter what, but we didn’t need to use it in the client code. And the Protocol methods that we did use in the client aren’t used in the server’s implementation. So if we wanted to, we could make a shared library with a single PoetryProtocol that works for both clients and servers. That’s actually the way things are typically done in Twisted itself. For example, the NetstringReceiver Protocol can both read and write netstrings from and to a Transport.

We skipped writing a low-level version of our server, but let’s think about what sort of things are going on under the hood. First, calling listenTCP tells Twisted to create a listening socket and add it to the event loop. An “event” on a listening socket doesn’t mean there is data to read; instead it means there is a client waiting to connect to us.

Twisted will automatically accept incoming connection requests, thus creating a new client socket that links the server directly to an individual client. That client socket is also added to the event loop, and Twisted creates a new Transport and (via the PoetryFactory) a new PoetryProtocol instance to service that specific client. So the Protocol instances are always connected to client sockets, never to the listening socket.

We can visualize all of this in Figure 26:

Figure 26: the poetry server in action

In the figure there are three clients currently connected to the poetry server. Each Transport represents a single client socket, and the listening socket makes a total of four file descriptors for the select loop to monitor. When a client is disconnected the associated Transport and PoetryProtocol will be dereferenced and garbage-collected (assuming we haven’t stashed a reference to one of them somewhere, a practice we should avoid to prevent memory leaks). The PoetryFactory, meanwhile, will stick around as long as we keep listening for new connections which, in our poetry server, is forever. Like the beauty of poetry. Or something. At any rate, Figure 26 certainly cuts a fine figure of a Figure, doesn’t it?

The client sockets and their associated Python objects won’t live very long if the poem we are serving is relatively short. But with a large poem and a really busy poetry server we could end up with hundreds or thousands of simultaneous clients. And that’s OK — Twisted has no built-in limits on the number of connections it can handle. Of course, as you increase the load on any server, at some point you will find it cannot keep up or some internal OS limit is reached. For highly-loaded servers, careful measurement and testing is the order of the day.

Twisted also imposes no limit on the number of ports we can listen on. In fact, a single Twisted process could listen on dozens of ports and provide a different service on each one (by using a different factory class for each listenTCP call). And with careful design, whether you provide multiple services with a single Twisted process or several is a decision you could potentially even postpone to the deployment phase.

There are a couple of things our server is missing. First of all, it doesn’t generate any logs that might help us debug problems or analyze our network traffic. Furthermore, the server doesn’t run as a daemon, making it vulnerable to death by accidental Ctrl-C (or just logging out). We’ll fix both those problems in a future Part but first, in Part 12, we’ll write another server to perform poetry transformation.

Suggested Exercises

  1. Write an asynchronous poetry server without using Twisted, like we did for the client in Part 2. Note that listening sockets need to be monitored for reading and a “readable” listening socket means we can accept a new client socket.
  2. Write a low-level asynchronous poetry server using Twisted, but without using listenTCP or protocols, transports, and factories, like we did for the client in Part 4. So you’ll still be making your own sockets, but you can use the Twisted reactor instead of your own select loop.
  3. Make the high-level version of the Twisted poetry server a “slow server” by using callLater or LoopingCall to make multiple calls to transport.write(). Add the --num-bytes and --delay command line options supported by the blocking server. Don’t forget to handle the case where the client disconnects before receiving the whole poem.
  4. Extend the high-level Twisted server so it can serve multiple poems (on different ports).
  5. What are some reasons to serve multiple services from the same Twisted process? What are some reasons not to?