Categories
Blather Programming Python Software

I Thought I Wanted It But I Changed My Mind

Part 19: I Thought I Wanted It But I Changed My Mind

This continues the introduction started here. You can find an index to the entire series here.

Introduction

Twisted is an ongoing project and the Twisted developers regularly add new features and extend old ones. With the release of Twisted 10.1.0, the developers added a new capability — cancellation — to the Deferred class which we’re going to investigate today.

Asynchronous programming decouples requests from responses and thus raises a new possibility: between asking for the result and getting it back you might decide you don’t want it anymore. Consider the poetry proxy server from Part 14. Here’s how the proxy worked, at least for the first request of a poem:

  1. A request for a poem comes in.
  2. The proxy contacts the real server to get the poem.
  3. Once the poem is complete, send it to the original client.

Which is all well and good, but what if the client hangs up before getting the poem? Maybe they requested the complete text of Paradise Lost and then decided they really wanted a haiku by Kojo. Now our proxy is stuck with downloading the first one and that slow server is going to take a while. Better to close the connection and let the slow server go back to sleep.

Recall Figure 15, a diagram that shows the conceptual flow of control in a synchronous program. In that figure we see function calls going down, and exceptions going back up. If we wanted to cancel a synchronous function call (and this is just hypothetical) the flow control would go in the same direction as the function call, from high-level code to low-level code as in Figure 38:

Figure 38: synchronous program flow, with hypothetical cancellation
Figure 38: synchronous program flow, with hypothetical cancellation

Of course, in a synchronous program that isn’t possible because the high-level code doesn’t even resume running until the low-level operation is finished, at which point there is nothing to cancel. But in an asynchronous program the high-level code gets control of the program before the low-level code is done, which at least raises the possibility of canceling the low-level request before it finishes.

In a Twisted program, the lower-level request is embodied by a Deferred object, which you can think of as a “handle” on the outstanding asynchronous operation. The normal flow of information in a deferred is downward, from low-level code to high-level code, which matches the flow of return information in a synchronous program. Starting in Twisted 10.1.0, high-level code can send information back the other direction — it can tell the low-level code it doesn’t want the result anymore. See Figure 39:

Figure 39: Information flow in a deferred, including cancellation
Figure 39: Information flow in a deferred, including cancellation

Canceling Deferreds

Let’s take a look at a few sample programs to see how canceling deferreds actually works. Note, to run the examples and other code in this Part you will need a version of Twisted 10.1.0 or later. Consider deferred-cancel/defer-cancel-1.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

d = defer.Deferred()
d.addCallback(callback)
d.cancel()
print 'done'

With the new cancellation feature, the Deferred class got a new method called cancel. The example code makes a new deferred, adds a callback, and then cancels the deferred without firing it. Here’s the output:

done
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError:

Ok, so canceling a deferred appears to cause the errback chain to run, and our regular callback is never called at all. Also notice the error is a twisted.internet.defer.CancelledError, a custom Exception that means the deferred was canceled (but keep reading!). Let’s try adding an errback in deferred-cancel/defer-cancel-2.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.cancel()
print 'done'

Now we get this output:

errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

So we can ‘catch’ the errback from a cancel just like any other deferred failure.

Ok, let’s try firing the deferred and then canceling it, as in deferred-cancel/defer-cancel-3.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.callback('result')
d.cancel()
print 'done'

Here we fire the deferred normally with the callback method and then cancel it. Here’s the output:

callback got: result
done

Our callback was invoked (just as we would expect) and then the program finished normally, as if cancel was never called at all. So it seems canceling a deferred has no effect if it has already fired (but keep reading!).

What if we fire the deferred after we cancel it, as in deferred-cancel/defer-cancel-4.py?

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.cancel()
d.callback('result')
print 'done'

In that case we get this output:

errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

Interesting! That’s the same output as the second example, where we never fired the deferred at all. So if the deferred has been canceled, firing the deferred normally has no effect. But why doesn’t d.callback('result') raise an error, since you’re not supposed to be able to fire a deferred more than once, and the errback chain has clearly run?

Consider Figure 39 again. Firing a deferred with a result or failure is the job of lower-level code, while canceling a deferred is an action taken by higher-level code. Firing the deferred means “Here’s your result”, while canceling a deferred means “I don’t want it any more”. And remember that canceling is a new feature, so most existing Twisted code is not written to handle cancel operations. But the Twisted developers have made it possible for us to cancel any deferred we want to, even if the code we got the deferred from was written before Twisted 10.1.0.

To make that possible, the cancel method actually does two things:

  1. Tell the Deferred object itself that you don’t want the result if it hasn’t shown up yet (i.e, the deferred hasn’t been fired), and thus to ignore any subsequent invocation of callback or errback.
  2. And, optionally, tell the lower-level code that is producing the result to take whatever steps are required to cancel the operation.

Since older Twisted code is going to go ahead and fire that canceled deferred anyway, step #1 ensures our program won’t blow up if we cancel a deferred we got from an older library.

This means we are always free to cancel a deferred, and we’ll be sure not to get the result if it hasn’t arrived (even if it arrives later). But canceling the deferred might not actually cancel the asynchronous operation. Aborting an asynchronous operation requires a context-specific action. You might need to close a network connection, roll back a database transaction, kill a sub-process, et cetera. And since a deferred is just a general-purpose callback organizer, how is it supposed to know what specific action to take when you cancel it? Or, alternatively, how could it forward the cancel request to the lower-level code that created and returned the deferred in the first place? Say it with me now:

I know, with a callback!

Canceling Deferreds, Really

Alright, take a look at deferred-cancel/defer-cancel-5.py:

from twisted.internet import defer

def canceller(d):
    print "I need to cancel this deferred:", d

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred(canceller) # created by lower-level code
d.addCallbacks(callback, errback) # added by higher-level code
d.cancel()
print 'done'

This code is basically like the second example, except there is a third callback (canceller) that’s passed to the Deferred when we create it, rather than added afterwards. This callback is in charge of performing the context-specific actions required to abort the asynchronous operation (only if the deferred is actually canceled, of course). The canceller callback is necessarily part of the lower-level code that returns the deferred, not the higher-level code that receives the deferred and adds its own callbacks and errbacks.

Running the example produces this output:

I need to cancel this deferred: <Deferred at 0xb7669d2cL>
errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

As you can see, the canceller callback is given the deferred whose result we no longer want. That’s where we would take whatever action we need to in order to abort the asynchronous operation. Notice that canceller is invoked before the errback chain fires. In fact, we may choose to fire the deferred ourselves at this point with any result or error of our choice (and thus preempting the CancelledError failure). Both possibilities are illustrated in deferred-cancel/defer-cancel-6.py and deferred-cancel/defer-cancel-7.py.

Let’s do one more simple test before we fire up the reactor. We’ll create a deferred with a canceller callback, fire it normally, and then cancel it. You can see the code in deferred-cancel/defer-cancel-8.py. By examining the output of that script, you can see that canceling a deferred after it has been fired does not invoke the canceller callback. And that’s as we would expect since there’s nothing to cancel.

The examples we’ve looked at so far haven’t had any actual asynchronous operations. Let’s make a simple program that invokes one asynchronous operation, then we’ll figure out how to make that operation cancellable. Consider the code in deferred-cancel/defer-cancel-9.py:

from twisted.internet.defer import Deferred

def send_poem(d):
    print 'Sending poem'
    d.callback('Once upon a midnight dreary')

def get_poem():
    """Return a poem 5 seconds later."""
    from twisted.internet import reactor
    d = Deferred()
    reactor.callLater(5, send_poem, d)
    return d


def got_poem(poem):
    print 'I got a poem:', poem

def poem_error(err):
    print 'get_poem failed:', err

def main():
    from twisted.internet import reactor
    reactor.callLater(10, reactor.stop) # stop the reactor in 10 seconds

    d = get_poem()
    d.addCallbacks(got_poem, poem_error)

    reactor.run()

main()

This example includes a get_poem function that uses the reactor’s callLater method to asynchronously return a poem five seconds after get_poem is called. The main function calls get_poem, adds a callback/errback pair, and then starts up the reactor. We also arrange (again using callLater) to stop the reactor in ten seconds. Normally we would do this by attaching a callback to the deferred, but you’ll see why we do it this way shortly.

Running the example produces this output (after the appropriate delay):

Sending poem
I got a poem: Once upon a midnight dreary

And after ten seconds our little program comes to a stop. Now let’s try canceling that deferred before the poem is sent. We’ll just add this bit of code to cancel the deferred after two seconds (well before the five second delay on the poem itself):

    reactor.callLater(2, d.cancel) # cancel after 2 seconds

The complete program is in deferred-cancel/defer-cancel-10.py, which produces the following output:

get_poem failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
Sending poem

This example clearly illustrates that canceling a deferred does not necessarily cancel the underlying asynchronous request. After two seconds we see the output from our errback, printing out the CancelledError as we would expect. But then after five seconds will still see the output from send_poem (but the callback on the deferred doesn’t fire).

At this point we’re just in the same situation as deferred-cancel/defer-cancel-4.py. “Canceling” the deferred causes the eventual result to be ignored, but doesn’t abort the operation in any real sense. As we learned above, to make a truly cancelable deferred we must add a cancel callback when the deferred is created.

What does this new callback need to do? Take a look at the documentation for the callLater method. The return value of callLater is another object, implementing IDelayedCall, with a cancel method we can use to prevent the delayed call from being executed.

That’s pretty simple, and the updated code is in deferred-cancel/defer-cancel-11.py. The relevant changes are all in the get_poem function:

def get_poem():
    """Return a poem 5 seconds later."""

    def canceler(d):
        # They don't want the poem anymore, so cancel the delayed call
        delayed_call.cancel()

        # At this point we have three choices:
        #   1. Do nothing, and the deferred will fire the errback
        #      chain with CancelledError.
        #   2. Fire the errback chain with a different error.
        #   3. Fire the callback chain with an alternative result.

    d = Deferred(canceler)

    from twisted.internet import reactor
    delayed_call = reactor.callLater(5, send_poem, d)

    return d

In this new version, we save the return value from callLater so we can use it in our cancel callback. The only thing our callback needs to do is invoke delayed_call.cancel(). But as we discussed above, we could also choose to fire the deferred ourselves. The latest version of our example produces this output:

get_poem failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]

As you can see, the deferred is canceled and the asynchronous operation has truly been aborted (i.e., we don’t see the print output from send_poem).

Poetry Proxy 3.0

As we discussed in the Introduction, the poetry proxy server is a good candidate for implementing cancellation, as it allows us to abort the poem download if it turns out that nobody wants it (i.e., the client closes the connection before we send the poem). Version 3.0 of the proxy, located in twisted-server-4/poetry-proxy.py, implements deferred cancellation. The first change is in the PoetryProxyProtocol:

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        self.deferred = self.factory.service.get_poem()
        self.deferred.addCallback(self.transport.write)
        self.deferred.addBoth(lambda r: self.transport.loseConnection())

    def connectionLost(self, reason):
        if self.deferred is not None:
            deferred, self.deferred = self.deferred, None
            deferred.cancel() # cancel the deferred if it hasn't fired

You might compare it to the older version. The two main changes are:

  1. Save the deferred we get from get_poem so we can cancel later if we need to.
  2. Cancel the deferred when the connection is closed. Note this also cancels the deferred after we actually get the poem, but as we discovered in the examples, canceling a deferred that has already fired has no effect.

Now we need to make sure that canceling the deferred actually aborts the poem download. For that we need to change the ProxyService:

class ProxyService(object):

    poem = None # the cached poem

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            # return an already-fired deferred
            return succeed(self.poem)

        def canceler(d):
            print 'Canceling poem download.'
            factory.deferred = None
            connector.disconnect()

        print 'Fetching poem from server.'
        deferred = Deferred(canceler)
        deferred.addCallback(self.set_poem)
        factory = PoetryClientFactory(deferred)
        from twisted.internet import reactor
        connector = reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

    def set_poem(self, poem):
        self.poem = poem
        return poem

Again, you may wish to compare this with the older version. This class has a few more changes:

  1. We save the return value from reactor.connectTCP, an IConnector object. We can use the disconnect method on that object to close the connection.
  2. We create the deferred with a canceler callback. That callback is a closure which uses the connector to close the connection. But first it sets the factory.deferred attribute to None. Otherwise, the factory might fire the deferred with a “connection closed” errback before the deferred itself fires with a CancelledError. Since this deferred was canceled, having the deferred fire with CancelledError seems more explicit.

You might also notice we now create the deferred in the ProxyService instead of the PoetryClientFactory. Since the canceler callback needs to access the IConnector object, the ProxyService ends up being the most convenient place to create the deferred.

And, as in one of our earlier examples, our canceler callback is implemented as a closure. Closures seem to be very useful when implementing cancel callbacks!

Let’s try out our new proxy. First start up a slow server. It needs to be slow so we actually have time to cancel:

python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt

Now we can start up our proxy (remember you need Twisted 10.1.0):

python twisted-server-4/poetry-proxy.py --port 10000 10001

Now we can start downloading a poem from the proxy using any client, or even just curl:

curl localhost:10000

After a few seconds, press Ctrl-C to stop the client, or the curl process. In the terminal running the proxy you should
see this output:

Fetching poem from server.
Canceling poem download.

And you should see the slow server has stopped printing output for each bit of poem it sends, since our proxy hung up. You can start and stop the client multiple times to verify each download is canceled each time. But if you let the poem run to completion, then the proxy caches the poem and sends it immediately after that.

One More Wrinkle

We said several times above that canceling an already-fired deferred has no effect. Well, that’s not quite true. In Part 13 we learned that the callbacks and errbacks attached to a deferred may return deferreds themselves. And in that case, the original (outer) deferred pauses the execution of its callback chains and waits for the inner deferred to fire (see Figure 28).

Thus, even though a deferred has fired the higher-level code that made the asynchronous request may not have received the result yet, because the callback chain is paused waiting for an inner deferred to finish. So what happens if the higher-level code cancels that outer deferred? In that case the outer deferred does not cancel itself (it has already fired after all); instead, the outer deferred cancels the inner deferred.

So when you cancel a deferred, you might not be canceling the main asynchronous operation, but rather some other asynchronous operation triggered as a result of the first. Whew!

We can illustrate this with one more example. Consider the code in deferred-cancel/defer-cancel-12.py:

from twisted.internet import defer

def cancel_outer(d):
    print "outer cancel callback."

def cancel_inner(d):
    print "inner cancel callback."

def first_outer_callback(res):
    print 'first outer callback, returning inner deferred'
    return inner_d

def second_outer_callback(res):
    print 'second outer callback got:', res

def outer_errback(err):
    print 'outer errback got:', err

outer_d = defer.Deferred(cancel_outer)
inner_d = defer.Deferred(cancel_inner)

outer_d.addCallback(first_outer_callback)
outer_d.addCallbacks(second_outer_callback, outer_errback)

outer_d.callback('result')

# at this point the outer deferred has fired, but is paused
# on the inner deferred.

print 'canceling outer deferred.'
outer_d.cancel()

print 'done'

In this example we create two deferreds, the outer and the inner, and have one of the outer callbacks return the inner deferred. First we fire the outer deferred, and then we cancel it. The example produces this output:

first outer callback, returning inner deferred
canceling outer deferred.
inner cancel callback.
outer errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

As you can see, canceling the outer deferred does not cause the outer cancel callback to fire. Instead, it cancels the inner deferred so the inner cancel callback fires, and then outer errback receives the CancelledError (from the inner deferred).

You may wish to stare at that code a while, and try out variations to see how they affect the outcome.

Discussion

Canceling a deferred can be a very useful operation, allowing our programs to avoid work they no longer need to do. And as we have seen, it can be a little bit tricky, too.

One very important fact to keep in mind is that canceling a deferred doesn’t necessarily cancel the underlying asynchronous operation. In fact, as of this writing, most deferreds won’t really “cancel”, since most Twisted code was written prior to Twisted 10.1.0 and hasn’t been updated. This includes many of the APIs in Twisted itself! Check the documentation and/or the source code to find out whether canceling the deferred will truly cancel the request, or simply ignore it.

And the second important fact is that simply returning a deferred from your asynchronous APIs will not necessarily make them cancelable in the complete sense of the word. If you want to implement canceling in your own programs, you should study the Twisted source code to find more examples. Cancellation is a brand new feature so the patterns and best practices are still being worked out.

Looking Ahead

At this point we’ve learned just about everything about Deferreds and the core concepts behind Twisted. Which means there’s not much more to introduce, as the rest of Twisted consists mainly of specific applications, like web programming or asynchronous database access. So in the next couple of Parts we’re going to take a little detour and look at two other systems that use asynchronous I/O to see how some of their ideas relate to the ideas in Twisted. Then, in the final Part, we will wrap up and suggest ways to continue your Twisted education.

Suggested Exercises

  1. Did you know you can spell canceled with one or two els? It’s true. It all depends on what sort of mood you’re in.
  2. Peruse the source code of the Deferred class, paying special attention to the implementation of cancellation.
  3. Search the Twisted 10.10 source code for examples of deferreds with cancel callbacks. Study their implementation.
  4. Make the deferred returned by the get_poetry method of one of our poetry clients cancelable.
  5. Make a reactor-based example that illustrates canceling an outer deferred which is paused on an inner deferred. If you use callLater you will need to choose the delays carefully to ensure the outer deferred is canceled at the right moment.
  6. Find an asynchronous API in Twisted that doesn’t support a true cancel and implement cancellation for it. Submit a patch to the Twisted project. Don’t forget unit tests!

7 replies on “I Thought I Wanted It But I Changed My Mind”

Thanks a ton for this article. I was updating my Twisted.Web code to support cancellation (or at least, not throw a bunch of errors on closed connections.) It was somewhat confusing, but your examples helped clarify the conditions under which each callback or error callback is run. I now have it properly working at my web request level, and hopefully soon will be able to actually have my actual database transactions abort properly on cancellation.

Okay this is more of a Python scoping question I suppose but I find it astonishing that your use of delayed_call in defer-cancel-11.py works.
It looks like a closure but delayed_call had not been defined get_poem when the function was created. I’m not even entirely sure why I find that weird but I do. :/
How does Python know to take this variable from the enclosing scope and store it with the closure if it has not been defined yet?
Does that make any sense? 🙂

It is definitely a closure and they can be a little brain-bendy 🙂 Closures
seem to work very well for the cancel callbacks.

I haven’t looked exactly at how Python implements closures, but evidently
it is doing some syntactic analysis of the closure code to identify the local
variables it will need to close over from the enclosing scope (I don’t think
it just blindly hangs on to all the locals, but I could be mistaken there).

Leave a Reply to IanCancel reply

Discover more from krondo

Subscribe now to keep reading and get access to the full archive.

Continue reading