Deferred Poetry

Part 8: Deferred Poetry

This continues the introduction started here. You can find an index to the entire series here.

Client 4.0

Now that we know something about deferreds, we can rewrite our Twisted poetry client to use them. You can find client 4.0 in twisted-client-4/get-poetry.py.

Our get_poetry function no longer needs callback or errback arguments. Instead, it returns a new deferred to which the user may attach callbacks and errbacks as needed.

def get_poetry(host, port):
    """
    Download a poem from the given host and port. This function
    returns a Deferred which will be fired with the complete text of
    the poem or a Failure if the poem could not be downloaded.
    """
    d = defer.Deferred()
    from twisted.internet import reactor
    factory = PoetryClientFactory(d)
    reactor.connectTCP(host, port, factory)
    return d

Our factory object is initialized with a deferred instead of a callback/errback pair. Once we have the poem, or we find out we couldn’t connect to the server, the deferred is fired with either a poem or a failure:

class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol

    def __init__(self, deferred):
        self.deferred = deferred

    def poem_finished(self, poem):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.callback(poem)

    def clientConnectionFailed(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

Notice the way we release our reference to the deferred after it is fired. This is a pattern found in several places in the Twisted source code and helps to ensure we do not fire the same deferred twice. It makes life a little easier for the Python garbage collector, too.

Once again, there is no need to change the PoetryProtocol; it’s just fine as is. All that remains is to update the poetry_main function:

def poetry_main():
    addresses = parse_args()

    from twisted.internet import reactor

    poems = []
    errors = []

    def got_poem(poem):
        poems.append(poem)

    def poem_failed(err):
        print >>sys.stderr, 'Poem failed:', err
        errors.append(err)

    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

    reactor.run()

    for poem in poems:
        print poem

Notice how we take advantage of the chaining capabilities of the deferred to refactor the poem_done invocation out of our primary callback and errback.

Because deferreds are used so much in Twisted code, it’s common practice to use the single-letter local variable d to hold the deferred you are currently working on. For longer term storage, like object attributes, the name “deferred” is often used.

Discussion

With our new client the asynchronous version of get_poetry accepts the same information as our synchronous version, just the address of the poetry server. The synchronous version returns a poem, while the asynchronous version returns a deferred. Returning a deferred is typical of the asynchronous APIs in Twisted and programs written with Twisted, and this points to another way of conceptualizing deferreds:

A Deferred object represents an “asynchronous result” or a “result that has not yet come”.

We can contrast these two styles of programming in Figure 13:

Figure 13: sync versus async

By returning a deferred, an asynchronous API is giving this message to the user:

I’m an asynchronous function. Whatever you want me to do might not be done yet. But when it is done, I’ll fire the callback chain of this deferred with the result. On the other hand, if something goes wrong, I’ll fire the errback chain of this deferred instead.

Of course, that function itself won’t literally fire the deferred; it has already returned. Rather, the function has set in motion a chain of events that will eventually result in the deferred being fired.

So deferreds are a way of “time-shifting” the results of functions to accommodate the needs of the asynchronous model. And a deferred returned by a function is a notice that the function is asynchronous, the embodiment of the future result, and a promise that the result will be delivered.

It is possible for a synchronous function to return a deferred, so technically a deferred return value means the function is potentially asynchronous. We’ll see examples of synchronous functions returning deferreds in future Parts.

Because the behavior of deferreds is well-defined and well-known (to folks with some experience programming with Twisted), by returning deferreds from your own APIs you are making it easier for other Twisted programmers to understand and use your code. Without deferreds, each Twisted program, or even each internal Twisted component, might have its own unique method for managing callbacks that you would have to learn in order to use it.

When You’re Using Deferreds, You’re Still Using Callbacks, and They’re Still Invoked by the Reactor

When first learning Twisted, it is a common mistake to attribute more functionality to deferreds than they actually have. Specifically, it is often assumed that adding a function to a deferred’s chain automatically makes that function asynchronous. This might lead you to think you could use, say, os.system with Twisted by adding it to a deferred with addCallback.

I think this mistake is caused by trying to learn Twisted without first learning the asynchronous model. Since typical Twisted code uses lots of deferreds and only occasionally refers to the reactor, it can appear that deferreds are doing all the work. If you have read this introduction from the beginning, it should be clear this is far from the case. Although Twisted is composed of many parts that work together, the primary responsibility for implementing the asynchronous model falls to the reactor. Deferreds are a useful abstraction, but we wrote several versions of our Twisted client without using them in any way.

Let’s look at a stack trace at the point when our first callback is invoked. Run the example program in twisted-client-4/get-poetry-stack.py with the address of a running poetry server. You should get some output like this:

  File "twisted-client-4/get-poetry-stack.py", line 129, in <module>
    poetry_main()
  File "twisted-client-4/get-poetry-stack.py", line 122, in poetry_main
    reactor.run()

  ... # some more Twisted function calls

    protocol.connectionLost(reason)
  File "twisted-client-4/get-poetry-stack.py", line 59, in connectionLost
    self.poemReceived(self.poem)
  File "twisted-client-4/get-poetry-stack.py", line 62, in poemReceived
    self.factory.poem_finished(poem)
  File "twisted-client-4/get-poetry-stack.py", line 75, in poem_finished
    d.callback(poem) # here's where we fire the deferred

  ... # some more methods on Deferreds

  File "twisted-client-4/get-poetry-stack.py", line 105, in got_poem
    traceback.print_stack()

That’s pretty similar to the stack trace we created for client 2.0. We can visualize the latest trace in Figure 14:

Figure 14: A callback with a deferred

Again, this is similar to our previous Twisted clients, though the visual representation is starting to become vaguely disturbing. We probably won’t be showing any more of these, for the sake of the children. One wrinkle not reflected in the figure: the callback chain above doesn’t return control to the reactor until the second callback in the deferred (poem_done) is invoked, which happens right after the first callback (got_poem) returns.

There’s one more difference with our new stack trace. The line separating “Twisted code” from “our code” is a little fuzzier, since the methods on deferreds are really Twisted code. This interleaving of Twisted and user code in a callback chain is common in larger Twisted programs which make extensive use of other Twisted abstractions.

By using a deferred we’ve added a few more steps in the callback chain that starts in the Twisted reactor, but we haven’t changed the fundamental mechanics of the asynchronous model. Recall these facts about callback programming:

  1. Only one callback runs at a time.
  2. When the reactor is running our callbacks are not.
  3. And vice-versa.
  4. If our callback blocks then the whole program blocks.

Attaching a callback to a deferred doesn’t change these facts in any way. In particular, a callback that blocks will still block if it’s attached to a deferred. So that deferred will block when it is fired (d.callback), and thus Twisted will block. And we conclude:

Deferreds are a solution (a particular one invented by the Twisted developers) to the problem of managing callbacks. They are neither a way of avoiding callbacks nor a way to turn blocking callbacks into non-blocking callbacks.

We can confirm the last point by constructing a deferred with a blocking callback. Consider the example code in twisted-deferred/defer-block.py. The second callback blocks using the time.sleep function. If you run that script and examine the order of the print statements, it will be clear that a blocking callback also blocks inside a deferred.

Summary

By returning a Deferred, a function tells the user “I’m asynchronous” and provides a mechanism (add your callbacks and errbacks here!) to obtain the asynchronous result when it arrives. Deferreds are used extensively throughout the Twisted codebase and as you explore Twisted’s APIs you are bound to keep encountering them. So it will pay to become familiar with deferreds and comfortable in their use.

Client 4.0 is the first version of our Twisted poetry client that’s truly written in the “Twisted style”, using a deferred as the return value of an asynchronous function call. There are a few more Twisted APIs we could use to make it a little cleaner, but I think it represents a pretty good example of how simple Twisted programs are written, at least on the client side. Eventually we’ll re-write our poetry server using Twisted, too.

But we’re not quite finished with deferreds. For a relatively short piece of code, the Deferred class provides a surprising number of features. We’ll talk about some more of those features, and their motivation, in Part 9.

Suggested Exercises

  1. Update client 4.0 to time out if the poem isn’t received after a given period of time. Fire the deferred’s errback with a custom exception in that case. Don’t forget to close the connection when you do.
  2. Update client 4.0 to print out the appropriate server address when a poem download fails, so the user can tell which server is the culprit. Don’t forget you can add extra positional and keyword arguments when you attach callbacks and errbacks.

53 replies on “Deferred Poetry”

Thank you for your article, I read it word by word.
I am waiting for Part 9 now. Maybe it will solve my puzzle: how to organize a deferred with your work implicitly.

“One wrinkle not reflected in the figure: the callback chain above doesn’t return control to the reactor until the second callback in the deferred (poem_done) is invoked, which happens right after the first callback (got_poem) returns.”

Does that mean that if one adds, say, twenty callbacks to one deferred, the reactor would pause briefly before processing each callback in the chain, to see if there are any other jobs to do?

It depends, but given what we’ve covered up to part 8, it’s actually the other way around. All twenty callbacks will run before the reactor gets control again. You’ll see in a later part you can return control to the reactor before the entire chain is finished. But it’s the deferred’s decision to make, not the reactor’s. The reactor doesn’t really know anything about deferreds; it’s just invoking callbacks, and a deferred is just a fancy callback.

Hello.

I don’t understand why you check, if the attribute self.deferred is a None-type, for example in “poem_finished” (see if-clause). How can it happen that the attribute becomes a None-type? And you said, that the release of the reference is good for the Python garbage collector…and why?

Another question: You explained, why you make a thing like “d, self.deferred = self.deferred, None” to ensure that this Deferred won’t be fired twice.

But I can’t imagine such a situation. I mean, after this statement, you fire the callback by calling “d.callback(…)”. Thus, after this we “leave” the factory and “switch” to the main-function, where the fired callback is (I know, it sounds a bit non-Twisted, but I hope, you understand what I mean^^). So, another firing of this deferred can’t happen.

Hi Pingu, as you discovered, the factory sets the deferred attribute to None before firing the deferred. That’s how it can become None.

In this simple program, assuming I’ve written it correctly, it probably can’t be fired twice. But in more complex scenarios it’s not always obvious that this is the case, especially if firing the deferred might cause another of the factory’s methods to fire. So it’s just pretty standard Twisted practice to use this pattern.

As far as the garbage collector goes, dropping references never hurts 🙂

Dave,
I’ve been working through your articles over the past few days and they are excellent and have provided the first real explanation of how asynchronous programming works that I’ve read.

So far the suggested exercises have been very good at reinforcing what I’ve learned. However, I was wondering if you could create more exercises, and also offer an example of how you think the problems should be solved, as I’m unsure whether my implementations are the best, or even the correct, way of tackling the problems.

Hey Thomas, glad you like the introduction. I plan on working up solutions for at least some of the exercises eventually.
Right now I’m taking a little break from it, I’ve been working on it for over a year and a half now 🙂

Hi, first of all, great work on this introduction 🙂 It makes me want to use twisted the next time I write something for which it would be remotely applicable, just to try it out 😀

There is one thing that I didn’t see immediately in this part (although it was a mental facepalm once I figured it out).

Why can’t the deferred fire after it is returned, but before callbacks are added to it?

I think that that would make a nice exercise, to “test” whether people really got what asynchronous means at this point. It seems trivial, but I don’t think it is for most people when they hear about asynchronous programming for the first time. Also, I’d guess that understanding the answer is a great way to understand the entire concept.

cheers, and thanks for the great work! 🙂

Thanks very much! That’s a good point you bring up. I will add an exercise for that thought experiment. There is one wrinkle you will discover as you read further, which is that a deferred may be fired before it is returned. Because of that, I’m going to add the exercise in a later Part.

PoetryClientFactory doesn’t seem to behave like a factory. (It’s treated as a single-use object, contrary to the purpose of factories.) The Twisted docs indicate this is indeed how ClientFactory is used. Looking around a little more produced http://comments.gmane.org/gmane.comp.python.twisted/22806 where they end up discussing this. What I got from it was that ClientFactory had a use once but isn’t necessary anymore.

That’s true. You definitely can use it like a factory, but the use I present does not. The ‘endpoint’ API, released
after I wrote this, is a new alternative.

In the solution to this part, in the method startedConnecting for the PoetryClientFactory, the timeout callback is given as self.timeout, but later in the class there is no timeout method – there is a cancel_timeout method though. Should the first arg to callLater be cancel_timeout?

It’s interesting to note that the ClientFactory winds up keeping track of two deferreds – one for the normal poetry processing (for lack of a better term) and one for the timeout. Continuing the observation, the normal poetry processing deferred is passed up into the application code, and the timeout one isn’t – it’s handled strictly by the factory. Actually an argument is passed in to the factory to make it configurable, but that’s it as far as interaction with the app code goes.

I was wondering if there was a way to make a single callLater that would apply to all the downloads (if there were more than one in a run). Hmm, I haven’t tried out this code with more than one download yet …

Hi Brenda, in the solution, the first argument to callLater is actually the number of seconds to delay the call. The second argument (self.on_timeout) is the actual callback.

Your observation on the ClientFactory is correct.

It would be possible to have a single callLater callback apply to all the downloads, with a bit of code refactoring.

When doing the second exercise I just realized that I don’t quite know what the “interface” for a callback in the deferred context is. It seems I have to take err as the first argument and return that as well, correct?

I simply used this:
def showServer(err, h, p):
    print "Server at %s:%s" % (h, p)
    return err

and added this line to the factory:
d.addErrback(showServer, h=host, p=port)

Not sure if this is appropriate.

You are correct, the first argument of an errback is always the ‘Failure’ object representing the error. And if you want to propagate that same error, you return it from the errback.

Your solution looks perfect to me!

I was looking at your solution to problem #1, and noticed that you handle the timeout as part of the PoetryClientFactory, rather than in the PoetryProtocol. In this small example your technique works well, because the factory has access to the deferred and the connector. However, I think this approach means the timeout held by the factory is shared by all the protocol objects built; meaning that you need a separate factory object for each protocol object built.

Am I wrong in this line of thought? Is it desirable to push the timeout logic into the protocol? And what’s the easiest way to do so?

That is correct, though it’s really the fact that the Factory has a single Deferred object which makes it a ‘one-shot’ object. The ‘connector’ instance is per-connection and that is passed to the timeout callback as an argument, so each different connection would in effect have a different timeout callback. Client factories are generally one-shot as you just need to make one connection anyway.

But let’s suppose you did want to make multiple connections with the factory. In the case of a server, there’s not going to be a Deferred instance on the factory anyway, so I think my solution would handle multiple connections just fine (each new connection would result in a new call to startedConnecting with a new connector instance).

You could move the timeout into the protocol instance, where you would probably start the timer in the connectionMade callback and call self.transport.loseConnection when the timeout expires. However, there is a wrinkle. The Protocol isn’t created until after the connection is made, and do you want the timeout to apply to the entire attempt, connection included, or only after the connection is made? The way I have written it, it applies to the whole process. By the way, there is a default timeout of 30 seconds for TCP connection attempts in Twisted, so technically I should really pass my configured timeout value to connectTCP as well. If you wanted to handle your own timeout in the Protocol but still have it apply to the entire process, you would need to set the timeout on connectTCP and then subtract the time it took to make a connection when you set the timer in the Protocol.

Your series continues to be a great help. Thanks again!

While I didn’t precisely know that what I needed was an introduction to asynchronous programming, apparently you did, so it all works out… 😉

So here’s my solution to exercise 1: http://pastebin.com/Yz4C4AA9

As it turns out, it’s pretty much what you describe in your response to Eric above. I was, in fact, conceiving of the timeout as applying to just the time spent receiving data, not to the whole connection attempt. So while I didn’t account for the wrinkle you mention (and I see why that might be a more usual conception), at least I did implement what I was trying to.

The solution I ended up with here is actually identical to the one I had for the timeout exercise in part 6 except that the call to self.factory.errback has become self.factory.deferred.errback. I tried to find ways to make more active use of the deferred, but I kept running up against the problem that connectionLost would always be called, and would always need some way of knowing the reason the connection closed.

I’ve looked through the solution you’ve provided, and (once I proved to myself that connector.closeConnection does not stop connectionLost from being fired) finally figured out what was going on.

I actually tried to implement something similar at one point, though I never thought it through to the point where it would have worked – probably because my instincts tell against it. I balk at the idea of setting up something that will result in a chain of function calls that necessarily do nothing. It’s sort of like Lucy and the football, only with attribute references. Also, since I was thinking of the timeout as applying to the time spent receiving data, the protocol seemed a more natural location for the timeout mechanism than the factory.

What I really wanted was a way to send a message to loseConnection that would end up in the reason sent to connectionLost. Is there a reason such a feature would be undesirable?

I guess the main question I have about your solution is this: is a pattern of leaving chains of function calls in place while pulling the rug out from under them a common pattern in asynchronous programming?

Looks good, but I guess I don’t understand what you mean by “a chain of calls that necessarily do nothing”.

Which chain of calls are you referring to? Whether the timeout is fired or not is indeterminate when you launch
the client, it is a function of the speed of the server and the network and the specific timeout, etc. There is
no dead code involved that I can see. In each case either the timeout is fired or not, but you can’t know which
it will be ahead of time unless you know all the other pieces involved. In our toy example we do, of course, but
in general you will not. It’s no different really than an if statement where only one clause will be
executed each time, but that doesn’t mean the other clauses serve no purpose at other times.

I’m sorry if I was unclear. I especially did not mean to imply that any of the existing functions were pointless.

What I was thinking was this. If the connection times out, on_timeout dereferences the factory’s timeout_call and deferred attributes. However, the entire chain of function calls connectionLost -> poemReceived -> poem_finished -> cancel_timeout is still in place, even though those attribute references have been yanked away and it now does nothing.

You can’t know whether the timeout will fire or not, but once it does, you can know that this chain of calls no longer does anything — indeed, they should do nothing (or at least, do nothing with the received data), since this implementation just drops unfinished downloads. My instinct is to stop the chain from firing.

I tried to find a way to stop connectionLost from calling poemReceived without setting an explicit flag attribute, but I could not. This is why I wanted to be able to attach an error message to loseConnection, and I am still puzzled as to why it is not possible to send a status flag through loseConnection to connectionLost.

The obvious virtue of your implementation is that the protocol class stays very simple and it does not need to be involved in any of the logic of the timeout process. However, this raises the question of how to think about what belongs in the protocol, in the factory, etc., which are the questions I started this series trying to get a grasp of.

Perhaps another way to put all this might be that I’m trying to figure out to what degree the difference between your approach and mine is a neutral matter of aesthetics, and to what degree it follows from principles that you are aware of and I am not.

Oh, no offense taken, I assure you 🙂 I think I see what you are getting at now. Basically this all comes from the fact that I chose to have the end of the poem represented by the close of the connection. Not really great practice, frankly, since there are so many reasons why that might happen. But it does make things simpler to start with. If there were an explicit way to mark the end of a poem then I think this issue would not even arise. I’m not claiming my answer is definitive by the way.

Hey Indradhanush, that is a common pattern in Twisted. A Deferred can only be fired once, so code that fires a deferred will often drop the reference to the Deferred (allowing it to be garbage collected quicker as well) and replace it with None. The if statement is checking to see if the Deferred has already been fired. Now in the particular case of this example, it should never happen that both poem_finished and clientConnectionFailed are called, and there is more than one school of thought on whether that sort of “defensive programming” is a good idea. But it is a common pattern in the Twisted source code.

Dave, thanks for your efforts. I would not have begun connecting the dots without your insight. I tried implementing my own script based on your model, but I want my connection to stay (or attempt to stay) connected forever. However, I don’t understand why you set self.deferred = None. After I receive my first data I get the message passed via the deferred, but then it gets set to None, so on subsequent passes the if always evaluates to False and I don’t get a passed message. I don’t understand what the intent is here. Please enlighten.

class DeviceClientFactory(ReconnectingClientFactory):

    protocol = DeviceProtocol

    maxDelay = 1
    initialDelay = 1
    factor = 1
    jitter = .1

    def __init__(self, deferred):
        self.deferred = deferred

    '''def buildProtocol(self, addr):
        print 'Connected.'
        print 'Resetting reconnection delay'
        self.resetDelay()
        return DeviceProtocol()'''

    def message_completed(self, message):
        if self.deferred is not None:
            d = self.deferred
            d, self.deferred = self.deferred, None
            d.callback(message)

    def clientConnectionLost(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

    def clientConnectionFailed(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

This seems to work for Exc 1:

class PoetryProtocol(Protocol):

    poem = ''

    def dataReceived(self, data):
        self.poem += data

    def connectionLost(self, reason):
        if not self.factory.deferred.called:
            self.poemReceived(self.poem)

    def poemReceived(self, poem):
        self.factory.poem_finished(poem)

class PoetryClientFactory(ClientFactory):

    protocol = PoetryProtocol
    callLaterReturn = 0
    timedout = False

    def __init__(self, deferred):
        self.deferred = deferred

    def poem_finished(self, poem):
        if not self.deferred.called:
            self.callLaterReturn.cancel()
            print 'Cancelling timeout'

        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.callback(poem)

    def clientConnectionFailed(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

def get_poetry(host, port):
    """
    Download a poem from the given host and port. This function
    returns a Deferred which will be fired with the complete text of
    the poem or a Failure if the poem could not be downloaded.
    """
    d = defer.Deferred()
    from twisted.internet import reactor
    factory = PoetryClientFactory(d)
    reactor.connectTCP(host, port, factory)
    return d, factory

def poetry_main():
    addresses = {}

    addresses[0] = ('localhost', 10000)

    from twisted.internet import reactor

    poems = []
    errors = []

    def got_poem(poem):
        poems.append(poem)

    def poem_failed(err):
        print >>sys.stderr, 'Poem failed:', err
        errors.append(err)

    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    def timeout(deferred, timedout):
        if not timedout:
            deferred.errback(Exception('Timed out'))
            timedout = True

    for address in addresses:
        host, port = addresses[0]

        d, factory = get_poetry(host, port)
        factory.callLaterReturn = reactor.callLater(3, timeout, d, factory.timedout)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

    reactor.run()
    for poem in poems:
        print poem

if __name__ == '__main__':
    poetry_main()

Thanks for this very well written tutorial. Best I had seen on Twisted & Deferreds. You have a talent for writing explanatory material.

You should write a book on Twisted, and anything else you teach. Your style is very very good, and gets to the bottom of “why”. The only twisted book I have read is nowhere close to what you have written. I would buy it yesterday if you had written a book. Extremely lucid, and deep at the same time.

Hi Dave, enjoying the tutorial and attempting the exercises. I find that I am not completely comfortable about what functionality should(?) go in the PoetryProtocol and what in the PoetryClientFactory. For example in my solution to client 4.0 exercises 1 plus 2, I left the factory unchanged from your client 4.0 code, and made changes only to the protocol, and to poetry_main(). It is here, Solution to client 4.0 exercise 1 plus 2 in https://pastebin.com/XWCNzRm5. It seems to perform correctly.

In general what are the benefits of and requirements for a factory class in twisted applications?

The main question is what state needs to be shared across individual client connections, if any. Cross-connection state is a good candidate for state on the factory. For example, keeping track of how many current client connections you are handling.
