An Introduction to Asynchronous Programming and Twisted

Part 17: Just Another Way to Spell “Callback”

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this Part we’re going to return to the subject of callbacks. We’ll introduce another technique for writing callbacks in Twisted that uses generators. We’ll show how the technique works and contrast it with using “pure” Deferreds. Finally we’ll rewrite one of our poetry clients using this technique. But first let’s review how generators work so we can see why they are a candidate for creating callbacks.

A Brief Review of Generators

As you probably know, a Python generator is a “restartable function” that you create by using the yield expression in the body of your function. By doing so, the function becomes a “generator function” that returns an iterator you can use to run the function in a series of steps. Each cycle of the iterator restarts the function, which proceeds to execute until it reaches the next yield.

Generators (and iterators) are often used to represent lazily-created sequences of values. Take a look at the example code in inline-callbacks/gen-1.py:

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

for n in my_generator():
    print n

Here we have a generator that creates the sequence 1, 2, 3. If you run the code, you will see the print statements in the generator interleaved with the print statement in the for loop as the loop cycles through the generator.

We can make this code more explicit by creating the generator ourselves (inline-callbacks/gen-2.py):

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

gen = my_generator()

while True:
    try:
        n = gen.next()
    except StopIteration:
        break
    else:
        print n

Considered as a sequence, the generator is just an object for getting successive values. But we can also view things from the point of view of the generator itself:

  1. The generator function doesn’t start running until “called” by the loop (using the next method).
  2. Once the generator is running, it keeps running until it “returns” to the loop (using yield).
  3. When the loop is running other code (like the print statement), the generator is not running.
  4. When the generator is running, the loop is not running (it’s “blocked” waiting for the generator).
  5. Once a generator yields control to the loop, an arbitrary amount of time may pass (and an arbitrary amount of other code may execute) until the generator runs again.

This is very much like the way callbacks work in an asynchronous system. We can think of the while loop as the reactor, and the generator as a series of callbacks separated by yield statements, with the interesting fact that all the callbacks share the same local variable namespace, and the namespace persists from one callback to the next.

Furthermore, you can have multiple generators active at once (see the example in inline-callbacks/gen-3.py), with their “callbacks” interleaved with each other, just as you can have independent asynchronous tasks running in a system like Twisted.

Something is still missing, though. Callbacks aren’t just called by the reactor, they also receive information. When part of a deferred’s chain, a callback either receives a result, in the form of a single Python value, or an error, in the form of a Failure.

Starting with Python 2.5, generators were extended in a way that allows you to send information to a generator when you restart it, as illustrated in inline-callbacks/gen-4.py:

class Malfunction(Exception):
    pass

def my_generator():
    print 'starting up'

    val = yield 1
    print 'got:', val

    val = yield 2
    print 'got:', val

    try:
        yield 3
    except Malfunction:
        print 'malfunction!'

    yield 4

    print 'done'

gen = my_generator()

print gen.next() # start the generator
print gen.send(10) # send the value 10
print gen.send(20) # send the value 20
print gen.throw(Malfunction()) # raise an exception inside the generator

try:
    gen.next()
except StopIteration:
    pass

In Python 2.5 and later versions, the yield statement is an expression that evaluates to a value. And the code that restarts the generator can determine that value using the send method instead of next (if you use next the value is None). What’s more, you can actually raise an arbitrary exception inside the generator using the throw method. How cool is that?

Inline Callbacks

Given what we just reviewed about sending and throwing values and exceptions into a generator, we can envision a generator as a series of callbacks, like the ones in a deferred, which receive either results or failures. The callbacks are separated by yields and the value of each yield expression is the result for the next callback (or the yield raises an exception and that’s the failure). Figure 35 shows the correspondence:

Figure 35: generator as a callback sequence
Figure 35: generator as a callback sequence

Now when a series of callbacks is chained together in a deferred, each callback receives the result from the one prior. That’s easy enough to do with a generator — just send the value you got from the previous run of the generator (the value it yielded) the next time you restart it. But that also seems a bit silly. Since the generator computed the value to begin with, why bother sending it back? The generator could just save the value in a variable for the next time it’s needed. So what’s the point?

Recall the fact we learned in Part 13, that the callbacks in a deferred can return deferreds themselves. And when that happens, the outer deferred is paused until the inner deferred fires, and then the next callback (or errback) in the outer deferred’s chain is called with the result (or failure) from the inner deferred.

So imagine that our generator yields a deferred object instead of an ordinary Python value. The generator is now “paused”, and that’s automatic; generators always pause after every yield statement until they are explicitly restarted. So we can delay restarting the generator until the deferred fires, at which point we either send the value (if the deferred succeeds) or throw the exception (if the deferred fails). That would make our generator a genuine sequence of asynchronous callbacks and that’s the idea behind the inlineCallbacks function in twisted.internet.defer.

inlineCallbacks

Consider the example program in inline-callbacks/inline-callbacks-1.py:

from twisted.internet.defer import inlineCallbacks, Deferred

@inlineCallbacks
def my_callbacks():
    from twisted.internet import reactor

    print 'first callback'
    result = yield 1 # yielded values that aren't deferred come right back

    print 'second callback got', result
    d = Deferred()
    reactor.callLater(5, d.callback, 2)
    result = yield d # yielded deferreds will pause the generator

    print 'third callback got', result # the result of the deferred

    d = Deferred()
    reactor.callLater(5, d.errback, Exception(3))

    try:
        yield d
    except Exception, e:
        result = e

    print 'fourth callback got', repr(result) # the exception from the deferred

    reactor.stop()

from twisted.internet import reactor
reactor.callWhenRunning(my_callbacks)
reactor.run()

Run the example and you will see the generator execute to the end and then stop the reactor. The example illustrates several aspects of the inlineCallbacks function. First, inlineCallbacks is a decorator and it always decorates generator functions, i.e., functions that use yield. The whole purpose of inlineCallbacks is turn a generator into a series of asynchronous callbacks according to the scheme we outlined before.

Second, when we invoke an inlineCallbacks-decorated function, we don’t need to call next or send or throw ourselves. The decorator takes care of those details for us and ensures the generator will run to the end (assuming it doesn’t raise an exception).

Third, if we yield a non-deferred value from the generator, it is immediately restarted with that same value as the result of the yield.

And finally, if we yield a deferred from the generator, it will not be restarted until that deferred fires. If the deferred succeeds, the result of the yield is just the result from the deferred. And if the deferred fails, the yield statement raises the exception. Note the exception is just an ordinary Exception object, rather than a Failure, and we can catch it with a try/except statement around the yield expression.

In the example we are just using callLater to fire the deferreds after a short period of time. While that’s a handy way to put in a non-blocking delay into our callback chain, normally we would be yielding a deferred returned by some other asynchronous operation (i.e., get_poetry) invoked from our generator.

Ok, now we know how an inlineCallbacks-decorated function runs, but what return value do you get if you actually call one? As you might have guessed, you get a deferred. Since we can’t know exactly when that generator will stop running (it might yield one or more deferreds), the decorated function itself is asynchronous and a deferred is the appropriate return value. Note the deferred that is returned isn’t one of the deferreds the generator may yield. Rather, it’s a deferred that fires only after the generator has completely finished (or throws an exception).

If the generator throws an exception, the returned deferred will fire its errback chain with that exception wrapped in a Failure. But if we want the generator to return a normal value, we must “return” it using the defer.returnValue function. Like the ordinary return statement, it will also stop the generator (it actually raises a special exception). The inline-callbacks/inline-callbacks-2.py example illustrates both possibilities.

Client 7.0

Let’s put inlineCallbacks to work with a new version of our poetry client. You can see the code in twisted-client-7/get-poetry.py. You may wish to compare it to client 6.0 in twisted-client-6/get-poetry.py. The relevant changes are in poetry_main:

def poetry_main():
    addresses = parse_args()

    xform_addr = addresses.pop(0)

    proxy = TransformProxy(*xform_addr)

    from twisted.internet import reactor

    results = []

    @defer.inlineCallbacks
    def get_transformed_poem(host, port):
        try:
            poem = yield get_poetry(host, port)
        except Exception, e:
            print >>sys.stderr, 'The poem download failed:', e
            raise

        try:
            poem = yield proxy.xform('cummingsify', poem)
        except Exception:
            print >>sys.stderr, 'Cummingsify failed!'

        defer.returnValue(poem)

    def got_poem(poem):
        print poem

    def poem_done(_):
        results.append(_)
        if len(results) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        d.addBoth(poem_done)

    reactor.run()

In our new version the inlineCallbacks generator function get_transformed_poem is responsible for both fetching the poem and then applying the transformation (via the transform service). Since both operations are asynchronous, we yield a deferred each time and then (implicitly) wait for the result. As in client 6.0, if the transformation fails we just return the original poem. Notice we can use try/except statements to handle asynchronous errors inside the generator.

We can test the new client out in the same way as before. First start up a transform server:

python twisted-server-1/transformedpoetry.py --port 10001

Then start a couple of poetry servers:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Now you can run the new client:

python twisted-client-7/get-poetry.py 10001 10002 10003

Try turning off one or more of the servers to see how the client handles errors.

Discussion

Like the Deferred object, the inlineCallbacks function gives us a new way of organizing our asynchronous callbacks. And, as with deferreds, inlineCallbacks doesn’t change the rules of the game. Specifically, our callbacks still run one at a time, and they are still invoked by the reactor. We can confirm that fact in our usual way by printing out a traceback from an inline callback, as in the example script inline-callbacks/inline-callbacks-tb.py. Run that code and you will get a traceback with reactor.run() at the top, lots of helper functions in between, and our callback at the bottom.

We can adapt Figure 29, which explains what happens when one callback in a deferred returns another deferred, to show what happens when an inlineCallbacks generator yields a deferred. See Figure 36:

Figure 36: flow control in an inlineCallbacks function
Figure 36: flow control in an inlineCallbacks function

The same figure works in both cases because the idea being illustrated is the same — one asynchronous operation is waiting for another.

Since inlineCallbacks and deferreds solve many of the same problems, why choose one over the other? Here are some potential advantages of inlineCallbacks:

  • Since the callbacks share a namespace, there is no need to pass extra state around.
  • The callback order is easier to see, as they just execute from top to bottom.
  • With no function declarations for individual callbacks and implicit flow-control, there is generally less typing.
  • Errors are handled with the familiar try/except statement.

And here are some potential pitfalls:

  • The callbacks inside the generator cannot be invoked individually, which could make code re-use difficult. With a deferred, the code constructing the deferred is free to add arbitrary callbacks in an arbitrary order.
  • The compact form of a generator can obscure the fact that an asynchronous callback is even involved. Despite its visually similar appearance to an ordinary sequential function, a generator behaves in a very different manner. The inlineCallbacks function is not a way to avoid learning the asynchronous programming model.

As with any technique, practice will provide the experience necessary to make an informed choice.

Summary

In this Part we learned about the inlineCallbacks decorator and how it allows us to express a sequence of asynchronous callbacks in the form of a Python generator.

In Part 18 we will learn a technique for managing a set of “parallel” asynchronous operations.

Suggested Exercises

  1. Why is the inlineCallbacks function plural?
  2. Study the implementation of inlineCallbacks and its helper function _inlineCallbacks. Ponder the phrase “the devil is in the details”.
  3. How many callbacks are contained in a generator with N yield statements, assuming it has no loops or if statements?
  4. Poetry client 7.0 might have three generators running at once. Conceptually, how many different ways might they be interleaved with one another? Considering the way they are invoked in the poetry client and the implementation of inlineCallbacks, how many ways do you think are actually possible?
  5. Move the got_poem callback in client 7.0 inside the generator.
  6. Then move the poem_done callback inside the generator. Be careful! Make sure to handle all the failure cases so the reactor gets shutdown no matter what. How does the resulting code compare to using a deferred to shutdown the reactor?
  7. A generator with yield statements inside a while loop can represent a conceptually infinite sequence. What does such a generator decorated with inlineCallbacks represent?

21 thoughts on “An Introduction to Asynchronous Programming and Twisted”

  1. Hello!

    I have some script which should call a function which is asynchronious and I do it like in example Client 7.0 i.e. I would call in sync script poetry_main() function and the problem is that sync script doesn’t stop after poetry_main() is done. If I interrupt it with ctrl+c I get
    twisted.internet.error.ReactorNotRunning: Can’t stop reactor that isn’t running.

    could you give me some hints what is wrong here?

    Thanks!

      1. Hi Pet, is the asynchronous function you are calling also stopping the reactor? You can only stop the reactor once. Check out client 8.0, which uses DeferredList to wait for multiple asynchronous calls to complete. I think that’s a cleaner solution.

        1. yes it stops reactor after its job is done, basically it downloads a page.

          Sometimes scripts download one page, sometimes more. So it is not possible to call async function which uses reactor twice, if it stops reactor?

          1. That’s right, you can only start the reactor once (so you can only stop it once). So you’ll need to refactor
            your code so the async download function doesn’t stop the reactor (most async functions shouldn’t stop
            the reactor, only very top-level code needs to do that generally).

            Notice in client 7.0 how it keeps track of how many poems it has received and only shuts down the reactor
            after they are all done. That’s the idea you want, but it’s easier to express with a DeferredList (see client 8.0).

  2. Hello Dave,

    actually I’ve wanted offer this function downloadPage(url) as utility, so client code could use it everywhere. With DeferredList I could pass a list of urls and download them, but limitation of calling the function only once remains or I should stop reactor in client code after all tasks are done. Looks like using synchronious function would much better in this case

    Thanks for your help!

    1. You want something like this:

      ds = []
      for url in urls:
      ds.append(downloadPage(url))
      dlist = DeferredList(ds)
      dlist.addCallback(lambda r: reactor.stop())

      Make sense?

      1. Yes, I know the way with deferredList. I think I’d better explain with example code: there are 2 scripts, the first one is executed and use function from second script. script.py doesn’t know that downloader function is asynchronious, that’s my idea :)

        script.py
        from downloader import downloadPage
        def main():

        print ‘calling async function’
        print ‘which will return after its job has been done’
        downloadPage(‘http://www.google.com’, ‘google.txt’)
        print ‘page has been downloaded’
        print ‘now i can do other stuff’
        downloadPage(‘http://www.yahoo.com’, ‘yahoo.txt’)
        print ‘second page has been downloaded, not printed’

        main()

        downloader.py
        from twisted.web.client import getPage
        from twisted.internet import defer

        def _get(url):
        #from twisted.internet import reactor
        #I’m using her reactor fo downloading a page with Agent.request()
        #and saving it to file, but for this example i’m using getPage
        return getPage(url)

        def downloadPage(url, filename):
        def done(_):
        from twisted.internet import reactor
        print ‘stopping reactor’
        reactor.stop()

        @defer.inlineCallbacks
        def _getPage(url, filename):
        page = ”
        try:
        page = yield _get(url)
        except Exception, e:
        print “Exception: “, e
        if page:
        print ‘got page len’, len(page)
        with open(filename, ‘w’) as f:
        f.write(page)
        defer.returnValue(page)
        else:
        print ‘no page sorry’
        defer.returnValue(None)

        d = _getPage(url, filename)
        d.addBoth(done)

        from twisted.internet import reactor
        reactor.run()

          1. Ah, you’re trying to pretend that Twisted is really synchronous :) Yeah, that won’t work, as you have discovered. You can’t
            mix sync and async code in the same Python process, at least not as easily as you are trying to do. Now you might be able
            to run Twisted in a separate thread, I believe that has been done before. Then your main thread can still be synchronous.

    1. An excellent point to keep in mind in general. Since Twisted itself is single-threaded,
      it’s not an issue for the inlineCallbacks use case.

  3. Hello, Mr Dave. Now I am writing an application which uses txmongo(asynchronous mongo api). Ok, I have an @inlinecalback function which sets some values in a collection of mongodb asynchronously. What I exactly do in that function:
    1) I check if it necessary to reset the values in a db: should_be_reset – yields true or false
    2) If the above function yields me true=> I reset it with reset_function which yields ‘OK’
    3) after that I increment necessary value by updating a collection in mongo db.
    So, finally, if collection resets, then collection value should be 1 after incrementing. But sometimes it is 0. Is it possible that case 3 is earlier than case 2 ? I can’t understand, because the generator should sleep while case 2 is executed and then go to case 3. In pseudocode i can represent my function as the following:

    @inlineCallbacks
    def should_be_reset(self, *a, **kw):
    yield True

    @inlineCallbacks
    def reset(self, *a, **kw):
    yield some update operation in mongo

    def my_function(self, …):
    should_reset = yield should_be_reset
    if should_reset:
    yield reset(…)
    yield my increment

    So, is it possible that ‘my increment’ is earlier than ‘reset()’ block ?

    1. Hm, it’s hard to say. I guess this is pseudo-code, right?
      Did you mean to put @inlineCallbacks over the first two
      but not the last one? It’s really the last one that needs it,
      since it is doing multiple non-blocking operations.

      If, say, reset() only does a single mongodb operation then
      it doesn’t really need to be decorated with @inlineCallbacks,
      you can just return the deferred that comes back from the
      txmongo function.

Leave a Reply