Category Archives: Software

An Introduction To Asynchronous Programming and Twisted

Part 21: Lazy is as Lazy Doesn’t: Twisted and Haskell

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In the last Part we compared Twisted with Erlang, giving most of our attention to some ideas they have in common. And that ended up being pretty simple, as asynchronous I/O and reactive programming are key components of the Erlang runtime and process model.

Today we are going to range further afield and look at Haskell, another functional language that is nevertheless quite different from Erlang (and, of course, Python). There won’t be as many parallels, but we will nevertheless find some asynchronous I/O hiding under the covers.

Functional with a Capital F

Although Erlang is also a functional language, its main focus is a reliable concurrency model. Haskell, on the other hand, is functional through and through, making unabashed use of concepts from category theory like functors and monads.

Don’t worry, we’re not going into any of that here (as if we could). Instead we’ll focus on one of Haskell’s more traditionally functional features: laziness. Like many functional languages (but unlike Erlang), Haskell supports lazy evaluation. In a lazily evaluated language the text of a program doesn’t so much describe how to compute something as what to compute. The details of actually performing the computation are generally left to the compiler and runtime system.

And, more to the point, as a lazily-evaluated computation proceeds the runtime may evaluate expressions only partially (lazily) instead of all at once. In general, the runtime will evaluate only as much of an expression as is needed to make progress on the current computation.

Here is a simple Haskell statement applying head, a function that retrieves the first element of a list, to the list [1,2,3] (Haskell and Python share some of their list syntax):

  head [1,2,3]

If you install the GHC Haskell runtime, you can try this out yourself:

[~] ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/  : ? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> head [1,2,3]
1
Prelude>

The result is the number 1, as expected.

The Haskell list syntax includes the handy ability to define a list from its first couple of elements. For example, the list [2,4 ..] is the sequence of even numbers starting with 2. Where does it end? Well, it doesn’t. The Haskell list [2,4 ..] and others like it represent (conceptually) infinite lists. You can see this if you try to evaluate one at the interactive Haskell prompt, which will attempt to print out the result of your expression:

Prelude> [2,4 ..]
[2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144,146,
...

You’ll have to press Ctrl-C to stop that computation as it will never actually terminate. But because of lazy evaluation, it is possible to use these infinite lists in Haskell with no trouble:

Prelude> head [2,4 ..]
2
Prelude> head (tail [2,4 ..])
4
Prelude> head (tail (tail [2,4 ..]))
6

Here we are accessing the first, second, and third elements of this infinite list respectively, with no infinite loop anywhere in sight. This is the essence of lazy evaluation. Instead of first evaluating the entire list (which would cause an infinite loop) and then giving that list to the head function, the Haskell runtime only constructs as much of the list as it needs for head to finish its work. The rest of the list is never constructed at all, because it is not needed to proceed with the computation.

When we bring the tail function into play, Haskell is forced to construct the list further, but again only as much as it needs to evaluate the next step of the computation. And once the computation is done, the (unfinished) list can be discarded.

Here’s some Haskell code that partially consumes three different infinite lists:

Prelude> let x = [1..]
Prelude> let y = [2,4 ..]
Prelude> let z = [3,6 ..]
Prelude> head (tail (tail (zip3 x y z)))
(3,6,9)

Here we zip all the lists together, then grab the head of the tail of the tail. Once again, Haskell has no trouble with this and only constructs as much of each list as it needs to finish evaluating our code. We can visualize the Haskell runtime “consuming” these infinite lists in Figure 46:

Figure 46: Haskell consuming some infinite lists
Figure 46: Haskell consuming some infinite lists

Although we’ve drawn the Haskell runtime as a simple loop, it might be implemented with multiple threads (and probably is if you are using the GHC version of Haskell). But the main point to notice is how this figure looks like a reactor loop consuming bits of data as they come in on network sockets.

You can think of asynchronous I/O and the reactor pattern as a very limited form of lazy evaluation. The asynchronous I/O motto is: “Only process as much data as you have”. And the lazy evaluation motto is: “Only process as much data as you need”. Furthermore, a lazily-evaluated language applies that motto almost everywhere, not just in the limited scope of I/O.

But the point is that, for a lazily-evaluated language, making use of asynchronous I/O is no big deal. The compiler and runtime are already designed to process data structures bit by bit, so lazily processing the incoming chunks of an I/O stream is just par for the course. And thus the Haskell runtime, like the Erlang runtime, simply incorporates asynchronous I/O as part of its socket abstractions. And we can show that by implementing a poetry client in Haskell.

Haskell Poetry

Our first Haskell poetry client is located in haskell-client-1/get-poetry.hs. As with Erlang, we’re going to jump straight to a finished client, and then suggest further reading if you’d like to learn more.

Haskell also supports light-weight threads or processes, though they aren’t as central to Haskell as they are to Erlang, and our Haskell client creates one process for each poem we want to download. The key function there is runTask which connects to a socket and starts the getPoetry function in a light-weight thread.

You’ll notice a lot of type declarations in this code. Haskell, unlike Python or Erlang, is statically typed. We don’t declare types for each and every variable because Haskell will automatically infer types not explicitly declared (or report an error if it can’t). A number of the functions include the IO type (technically a monad) because Haskell requires us to cleanly separate code with side-effects (i.e., code that performs I/O) from pure functions.

The getPoetry function includes this line:

poem <- hGetContents h

which appears to be reading the entire poem from the handle (i.e., the TCP socket) at once. But Haskell, as usual, is lazy. And the Haskell runtime includes one or more actual threads which perform asynchronous I/O in a select loop, thus preserving the possibilities for lazy evaluation of I/O streams.

Just to illustrate that asynchronous I/O is really going on, we have included a “callback” function, gotLine, that prints out some task information for each line in the poem. But it’s not really a callback function at all, and the program would use asynchronous I/O whether we included it or not. Even calling it “gotLine” reflects an imperative-language mindset that is out of place in a Haskell program. No matter, we’ll clean it up in a bit, but let’s take our first Haskell client out for a spin. Start up some slow poetry servers:

python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt
python blocking-server/slowpoetry.py --port 10002 poetry/science.txt
python blocking-server/slowpoetry.py --port 10003 poetry/ecstasy.txt --num-bytes 30

Now compile the Haskell client:

cd haskell-client-1/
ghc --make get-poetry.hs

This will create a binary called get-poetry. Finally, run the client against our servers:

./get-poetry 10001 10002 1000

And you should see some output like this:

Task 3: got 12 bytes of poetry from localhost:10003
Task 3: got 1 bytes of poetry from localhost:10003
Task 3: got 30 bytes of poetry from localhost:10003
Task 2: got 20 bytes of poetry from localhost:10002
Task 3: got 44 bytes of poetry from localhost:10003
Task 2: got 1 bytes of poetry from localhost:10002
Task 3: got 29 bytes of poetry from localhost:10003
Task 1: got 36 bytes of poetry from localhost:10001
Task 1: got 1 bytes of poetry from localhost:10001
...

The output is slightly different than previous asynchronous clients because we are printing one line for each line of poetry instead of each arbitrary chunk of data. But, as you can see, the client is clearly processing data from all the servers together, instead of one after the other. You’ll also notice that the client prints out the first poem as soon as it’s finished, without waiting for the others, which continue on at their own pace.

Alright, let’s clean the remaining bits of imperative cruft from our client and present a version which just grabs the poetry without bothering with task numbers. You can find it in haskell-client-2/get-poetry.hs. Notice that it’s much shorter and, for each server, just connects to the socket, grabs all the data, and sends it back.

Ok, let’s compile a new client:

cd haskell-client-2/
ghc --make get-poetry.hs

And run it against the same set of poetry servers:

./get-poetry 10001 10002 10003

And you should see the text of each poem appear, eventually, on the screen.

You will notice from the server output that each server is sending data to the client simultaneously. What’s more, the client prints out each line of the first poem as soon as possible, without waiting for the rest of the poem, even while it’s working on the other two. And then it quickly prints out the second poem, which it has been accumulating all along.

And all of that happens without us having to do much of anything. There are no callbacks, no messages being passed back and forth, just a concise description of what we want the program to do, and very little in the way of how it should go about doing it. The rest is taken care of by the Haskell compiler and runtime. Nifty.

Discussion and Further Reading

In moving from Twisted to Erlang to Haskell we can see a parallel movement, from the foreground to the background, of the ideas behind asynchronous programming. In Twisted, asynchronous programming is the central motivating idea behind Twisted’s existence. And Twisted’s implementation as a framework separate from Python (and Python’s lack of core asynchronous abstractions like lightweight threads) keeps the asynchronous model front and center when you write programs using Twisted.

In Erlang, asynchronicity is still very visible to the programmer, but the details are now part of the fabric of the language and runtime system, enabling an abstraction in which asynchronous messages are exchanged between synchronous processes.

And finally, in Haskell, asynchronous I/O is just another technique inside the runtime, largely unseen by the programmer, for providing the lazy evaluation that is one of Haskell’s central ideas.

We don’t have any profound insight into this situation, we’re just pointing out the many and interesting places where the asynchronous model shows up, and the many different ways it can be expressed.

And if any of this has piqued your interest in Haskell, then we can recommend Real World Haskell to continue your studies. The book is a model of what a good language introduction should be. And while I haven’t read it, I’ve heard good things about Learn You a Haskell.

This brings us to the end of our tour of asynchronous systems outside of Twisted, and the penultimate part in our series. In Part 22 we will conclude, and suggest ways to learn more about Twisted.

Suggested Exercises for the Startlingly Motivated

  1. Compare the Twisted, Erlang, and Haskell clients with each other.
  2. Modify the Haskell clients to handle failures to connect to a poetry server so they download all the poetry they can and output reasonable error messages for the poems they can’t.
  3. Write Haskell versions of the poetry servers we made with Twisted.

An Introduction To Asynchronous Programming and Twisted

Part 20: Wheels within Wheels: Twisted and Erlang

This continues the introduction started here. You can find an index to the entire series here.

Introduction

One fact we’ve uncovered in this series is that mixing synchronous “plain Python” code with asynchronous Twisted code is not a straightforward task, since blocking for an indeterminate amount of time in a Twisted program will eliminate many of the benefits you are trying to achieve using the asynchronous model.

If this is your first introduction to asynchronous programming it may seem as if the knowledge you have gained is of somewhat limited applicability. You can use these new techniques inside of Twisted, but not in the much larger world of general Python code. And when working with Twisted, you are generally limited to libraries written specifically for use as part of a Twisted program, at least if you want to call them directly from the thread running the reactor.

But asynchronous programming techniques have been around for quite some time and are hardly confined to Twisted. There are in fact a startling number of asynchronous programming frameworks in Python alone. A bit of searching around will probably yield a couple dozen of them. They differ from Twisted in their details, but the basic ideas (asynchronous I/O, processing data in small chunks across multiple data streams) are the same. So if you need, or choose, to use an alternative framework you will already have a head start having learned Twisted.

And moving outside of Python, there are plenty of other languages and systems that are either based around, or make use of, the asynchronous programming model. Your knowledge of Twisted will continue to serve you as you explore the wider areas of this subject.

In this Part we’re going to take a very brief look at Erlang, a programming language and runtime system that makes extensive use of asynchronous programming concepts, but does so in a unique way. Please note this is not meant as a general introduction to Erlang. Rather, it is a short exploration of some of the ideas embedded in Erlang and how they connect with the ideas in Twisted. The basic theme is the knowledge you have gained learning Twisted can be applied when learning other technologies.

Callbacks Reimagined

Consider Figure 6, a graphical representation of a callback. The principle callback in Poetry Client 3.0, introduced in Part 6, and all subsequent poetry clients is the dataReceived method. That callback is invoked each time we get a bit more poetry from one of the poetry servers we have connected to.

Let’s say our client is downloading three poems from three different servers. Looking at things from the point of view of the reactor (and that’s the viewpoint we’ve emphasized the most in this series), we’ve got a single big loop which makes one or more callbacks each time it goes around. See Figure 40:

Figure 40: callbacks from the reactor viewpoint
Figure 40: callbacks from the reactor viewpoint

This figure shows the reactor happily spinning around, calling dataReceived as the poetry comes in. Each invocation of dataReceived is applied to one particular instance of our PoetryProtocol class. And we know there are three instances because we are downloading three poems (and so there must be three connections).

Let’s think about this picture from the point of view of one of those Protocol instances. Remember each Protocol is only concerned with a single connection (and thus a single poem). That instance “sees” a stream of method calls, each one bearing the next piece of the poem, like this:

dataReceived(self, "When I have fears")
dataReceived(self, " that I may cease to be")
dataReceived(self, "Before my pen has glea")
dataReceived(self, "n'd my teeming brain")
...

While this isn’t strictly speaking an actual Python loop, we can conceptualize it as one:

for data in poetry_stream(): # pseudo-code
    dataReceived(data)

We can envision this “callback loop” in Figure 41:

Figure 41: A virtual callback loop
Figure 41: A virtual callback loop

Again, this is not a for loop or a while loop. The only significant Python loop in our poetry clients is the reactor. But we can think of each Protocol as a virtual loop that ticks around once each time some poetry for that particular poem comes in. With that in mind we can re-imagine the entire client in Figure 42:

Figure 42: the reactor spinning some virtual loops
Figure 42: the reactor spinning some virtual loops

In this figure we have one big loop, the reactor, and three virtual loops, the individual poetry protocol instances. The big loop spins around and, in so doing, causes the virtual loops to tick over as well, like a set of interlocking gears.

Enter Erlang

Erlang, like Python, is a general purpose dynamically typed programming language originally created in the 80’s. Unlike Python, Erlang is functional rather than object-oriented, and has a syntax reminiscent of Prolog, the language in which Erlang was originally implemented. Erlang was designed for building highly reliable distributed telephony systems, and thus Erlang contains extensive networking support.

One of Erlang’s most distinctive features is a concurrency model involving lightweight processes. An Erlang process is neither an operating system process nor an operating system thread. Rather, it is an independently running function inside the Erlang runtime with its own stack. Erlang processes are not lightweight threads because Erlang processes cannot share state (and most data types are immutable anyway, Erlang being a functional programming language). An Erlang process can interact with other Erlang processes only by sending messages, and messages are always, at least conceptually, copied and never shared.

So an Erlang program might look like Figure 43:

Figure 43: An Erlang program with three processes
Figure 43: An Erlang program with three processes

In this figure the individual processes have become “real”, since processes are first-class constructs in Erlang, just like objects are in Python. And the runtime has become “virtual”, not because it isn’t there, but because it’s not necessarily a simple loop. The Erlang runtime may be multi-threaded and, as it has to implement a full-blown programming language, it’s in charge of a lot more than handling asynchronous I/O. Furthermore, a language runtime is not so much an extra construct, like the reactor in Twisted, as the medium in which the Erlang processes and code execute.

So an even better picture of an Erlang program might be Figure 44:

Figure 44: An Erlang program with several processes
Figure 44: An Erlang program with several processes

Of course, the Erlang runtime does have to use asynchronous I/O and one or more select loops, because Erlang allows you to create lots of processes. Large Erlang programs can start tens or hundreds of thousands of Erlang processes, so allocating an actual OS thread to each one is simply out of the question. If Erlang is going to allow multiple processes to perform I/O, and still allow other processes to run even if that I/O blocks, then asynchronous I/O will have to be involved.

Note that our picture of an Erlang program has each process running “under its own power”, rather than being spun around by callbacks. And that is very much the case. With the job of the reactor subsumed into the fabric of the Erlang runtime, the callback no longer has a central role to play. What would, in Twisted, be solved by using a callback would, in Erlang, be solved by sending an asynchronous message from one Erlang process to another.

An Erlang Poetry Client

Let’s look at an Erlang poetry client. We’re going to jump straight to a working version instead of building up slowly like we did with Twisted. Again, this isn’t meant as a complete Erlang introduction. But if it piques your interest, we suggest some more in-depth reading at the end of this Part.

The Erlang client is listed in erlang-client-1/get-poetry. In order to run it you will, of course, need Erlang installed. Here’s the code for the main function, which serves a similar purpose as the main functions in our Python clients:

main([]) ->
    usage();

main(Args) ->
    Addresses = parse_args(Args),
    Main = self(),
    [erlang:spawn_monitor(fun () -> get_poetry(TaskNum, Addr, Main) end)
     || {TaskNum, Addr} <- enumerate(Addresses)],
    collect_poems(length(Addresses), []).

If you’ve never seen Prolog or a similar language before then Erlang syntax is going to seem a little odd. But some people say that about Python, too. The main function is defined by two separate clauses, separated by a semicolon. Erlang chooses which clause to run by matching the arguments, so the first clause only runs if we execute the client without providing any command line arguments, and it just prints out a help message. The second clause is where all the action is.

Individual statements in an Erlang function are separated by commas, and all functions end with a period. Let’s take each line in the second clause one at a time. The first line is just parsing the command line arguments and binding them to a variable (all variables in Erlang must be capitalized). The second line is using the Erlang self function to get the process ID of the currently running Erlang process (not OS process). Since this is the main function you can kind of think of it as the equivalent of the __main__ module in Python. The third line is the most interesting:

[erlang:spawn_monitor(fun () -> get_poetry(TaskNum, Addr, Main) end)
     || {TaskNum, Addr} <- enumerate(Addresses)],

This statement is an Erlang list comprehension, with a syntax similar to that in Python. It is spawning new Erlang processes, one for each poetry server we need to contact. And each process will run the same function (get_poetry) but with different arguments specific to that server. We also pass the PID of the main process so the new processes can send the poetry back (you generally need the PID of a process to send a message to it).

The last statement in main calls the collect_poems function which waits for the poetry to come back and for the get_poetry processes to finish. We’ll look at the other functions in a bit, but first you might compare this Erlang
main function to the equivalent main in one of our Twisted clients.

Now let’s look at the Erlang get_poetry function. There are actually two functions in our script called get_poetry. In Erlang, a function is identified by both name and arity, so our script contains two separate functions, get_poetry/3 and get_poetry/4 which accept three and four arguments respectively. Here’s get_poetry/3, which is spawned by main:

get_poetry(Tasknum, Addr, Main) ->
    {Host, Port} = Addr,
    {ok, Socket} = gen_tcp:connect(Host, Port,
                                   [binary, {active, false}, {packet, 0}]),
    get_poetry(Tasknum, Socket, Main, []).

This function first makes a TCP connection, just like the Twisted client get_poetry. But then, instead of returning, it proceeds to use that TCP connection by calling get_poetry/4, listed below:

get_poetry(Tasknum, Socket, Main, Packets) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Packet} ->
            io:format("Task ~w: got ~w bytes of poetry from ~s\n",
                      [Tasknum, size(Packet), peername(Socket)]),
            get_poetry(Tasknum, Socket, Main, [Packet|Packets]);
        {error, _} ->
            Main ! {poem, list_to_binary(lists:reverse(Packets))}
    end.

This Erlang function is doing the work of the PoetryProtocol from our Twisted client, except it does so using blocking function calls. The gen_tcp:recv function waits until some data arrives on the socket (or the socket is closed), however long that might be. But a “blocking” function in Erlang only blocks the process running the function, not the entire Erlang runtime. That TCP socket isn’t really a blocking socket (you can’t make a true blocking socket in pure Erlang code). For each of those Erlang sockets there is, somewhere inside the Erlang runtime, a “real” TCP socket set to non-blocking mode and used as part of a select loop.

But the Erlang process doesn’t have to know about any of that. It just waits for some data to arrive and, if it blocks, some other Erlang process can run instead. And even if a process never blocks, the Erlang runtime is free to switch execution from that process to another at any time. In other words, Erlang has a non-cooperative concurrency model.

Notice that get_poetry/4, after receiving a bit of poem, proceeds by recursively calling itself. To an imperative language programmer this might seem like a recipe for running out of memory, but the Erlang compiler can optimize “tail” calls (function calls that are the last statement in a function) into loops. And this highlights another curious parallel between the Erlang and Twisted clients. In the Twisted client, the “virtual” loops are created by the reactor calling the same function (dataReceived) over and over again. And in the Erlang client, the “real” processes running (get_poetry/4) form loops by calling themselves over and over again via tail-call optimization. How about that.

If the connection is closed, the last thing get_poetry does is send the poem to the main process. That also ends the process that get_poetry is running, as there is nothing left for it to do.

The remaining key function in our Erlang client is collect_poems:

collect_poems(0, Poems) ->
    [io:format("~s\n", [P]) || P <- Poems];
collect_poems(N, Poems) ->
    receive
        {'DOWN', _, _, _, _} ->
            collect_poems(N-1, Poems);
        {poem, Poem} ->
            collect_poems(N, [Poem|Poems])
    end.

This function is run by the main process and, like get_poetry, it recursively loops on itself. It also blocks. The receive statement tells the process to wait for a message to arrive that matches one of the given patterns, and
then extract the message from its “mailbox”.

The collect_poems function waits for two kinds of messages: poems and “DOWN” notifications. The latter is a message sent to the main process when one of the get_poetry processes dies for any reason (this is the monitor part of spawn_monitor). By counting DOWN messages, we know when all the poetry has finished. The former is a message from one of the get_poetry processes containing one complete poem.

Ok, let’s take the Erlang client out for a spin. First start up three slow poetry servers:

python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt
python blocking-server/slowpoetry.py --port 10002 poetry/science.txt
python blocking-server/slowpoetry.py --port 10003 poetry/ecstasy.txt --num-bytes 30

Now we can run the Erlang client, which has a similar command-line syntax as the Python clients. If you are on a Linux or other UNIX-like system, then you should be able to run the client directly (assuming you have Erlang installed and available in your PATH). On Windows you will probably need to run the escript program, with the path to the Erlang client as the first argument (with the remaining arguments for the Erlang client itself).

./erlang-client-1/get-poetry 10001 10002 10003

After that you should see output like this:

Task 3: got 30 bytes of poetry from 127:0:0:1:10003
Task 2: got 10 bytes of poetry from 127:0:0:1:10002
Task 1: got 10 bytes of poetry from 127:0:0:1:10001
...

This is just like one of our earlier Python clients where we print a message for each little bit of poetry we get. When all the poems have finished the client should print out the complete text of each one. Notice the client is switching back and forth between all the servers depending on which one has some poetry to send.

Figure 45 shows the process structure of our Erlang client:

Figure 45: Erlang poetry client
Figure 45: Erlang poetry client

This figure shows three get_poetry processes (one per server) and one main process. You can also see the messages that flow from the poetry processes to main process.

So what happens if one of those servers is down? Let’s try it:

./erlang-client-1/get-poetry 10001 10005

The above command contains one active port (assuming you left all the earlier poetry servers running) and one inactive port (assuming you aren’t running any server on port 10005). And we get some output like this:

Task 1: got 10 bytes of poetry from 127:0:0:1:10001

=ERROR REPORT==== 25-Sep-2010::21:02:10 ===
Error in process <0.33.0> with exit value: {{badmatch,{error,econnrefused}},[{erl_eval,expr,3}]}

Task 1: got 10 bytes of poetry from 127:0:0:1:10001
Task 1: got 10 bytes of poetry from 127:0:0:1:10001
...

And eventually the client finishes downloading the poem from the active server, prints out the poem, and exits. So how did the main function know that both processes were done? That error message is the clue. The error happens when get_poetry tries to connect to the server and gets a connection refused error instead of the expected value ({ok, Socket}). The resulting exception is called badmatch because Erlang “assignment” statements are really pattern-matching operations.

An unhandled exception in an Erlang process causes the process to “crash”, which means the process stops running and all of its resources are garbage collected. But the main process, which is monitoring all of the get_poetry processes, will receive a DOWN message when any of those processes stops running for any reason. And thus our client exits when it should instead of running forever.

Discussion

Let’s take stock of some of the parallels between the Twisted and Erlang clients:

  1. Both clients connect (or try to connect) to all the poetry servers at once.
  2. Both clients receive data from the servers as soon as it comes in, regardless of which server delivers the data.
  3. Both clients process the poetry in little bits, and thus have to save the portion of the poems received thus far.
  4. Both clients create an “object” (either a Python object or an Erlang process) to handle all the work for one particular server.
  5. Both clients have to carefully determine when all the poetry has finished, regardless of whether a particular download succeeded or failed.

And finally, the main functions in both clients asynchronously receive poems and “task done” notifications. In the Twisted client this information is delivered via a Deferred while the Erlang client receives inter-process messages.

Notice how similar both clients are, in both their overall strategy and the structure of their code. The mechanics are a bit different, with objects, deferreds, and callbacks on the one hand and processes and messages on the other. But the high-level mental models of both clients are quite similar, and it’s pretty easy to move from one to the other once you are familiar with both.

Even the reactor pattern reappears in the Erlang client in miniaturized form. Each Erlang process in our poetry client eventually turns into a recursive loop that:

  1. Waits for something to happen (a bit of poetry comes in, a poem is delivered, another process finishes), and
  2. Takes some appropriate action.

You can think of an Erlang program as a big collection of little reactors, each spinning around and occasionally sending a message to another little reactor (which will process that message as just another event).

And if you delve deeper into Erlang you will find callbacks making an appearance. The Erlang gen_server process is a generic reactor loop that you “instantiate” by providing a fixed set of callback functions, a pattern repeated elsewhere in the Erlang system.

So if, having learned Twisted, you ever decide to give Erlang a try I think you will find yourself in familiar mental territory.

Further Reading

In this Part we’ve focused on the similarities between Twisted and Erlang, but there are of course many differences. One particularly unique feature of Erlang is its approach to error handling. A large Erlang program is structured as a tree of processes, with “supervisors” in the higher branches and “workers” in the leaves. And if a worker process crashes, a supervisor process will notice and take some action (typically restarting the failed worker).

If you are interested in learning more Erlang then you are in luck. Several Erlang books have either been published recently, or will be published shortly:

  • Programming Erlang — written by one of Erlang’s inventors. A great introduction to the language.
  • Erlang Programming — this complements the Armstrong book and goes into more detail in several key areas.
  • Erlang and OTP in Action — this hasn’t been published yet, but I am eagerly awaiting my copy. Neither of the first two books really addresses OTP, the Erlang framework for building large apps. Full disclosure: two of the authors are friends of mine.

Well that’s it for Erlang. In the next Part we will look at Haskell, another functional language with a very different feel from either Python or Erlang. Nevertheless, we shall endeavor to find some common ground.

Suggested Exercises for the Highly Motivated

  1. Go through the Erlang and Python clients and identify where they are similar and where they differ. How do they each handle errors (like a failure to connect to a poetry server)?
  2. Simplify the Erlang client so it no longer prints out each bit of poetry that comes in (so you don’t need to keep track of task numbers either).
  3. Modify the Erlang client to measure the time it takes to download each poem.
  4. Modify the Erlang client to print out the poems in the same order as they were given on the command line.
  5. Modify the Erlang client to print out a more readable error message when we can’t connect to a poetry server.
  6. Write Erlang versions of the poetry servers we made with Twisted.

An Introduction To Asynchronous Programming and Twisted

Part 19: I Thought I Wanted It But I Changed My Mind

This continues the introduction started here. You can find an index to the entire series here.

Introduction

Twisted is an ongoing project and the Twisted developers regularly add new features and extend old ones. With the release of Twisted 10.1.0, the developers added a new capability — cancellation — to the Deferred class which we’re going to investigate today.

Asynchronous programming decouples requests from responses and thus raises a new possibility: between asking for the result and getting it back you might decide you don’t want it anymore. Consider the poetry proxy server from Part 14. Here’s how the proxy worked, at least for the first request of a poem:

  1. A request for a poem comes in.
  2. The proxy contacts the real server to get the poem.
  3. Once the poem is complete, send it to the original client.

Which is all well and good, but what if the client hangs up before getting the poem? Maybe they requested the complete text of Paradise Lost and then decided they really wanted a haiku by Kojo. Now our proxy is stuck with downloading the first one and that slow server is going to take a while. Better to close the connection and let the slow server go back to sleep.

Recall Figure 15, a diagram that shows the conceptual flow of control in a synchronous program. In that figure we see function calls going down, and exceptions going back up. If we wanted to cancel a synchronous function call (and this is just hypothetical) the flow control would go in the same direction as the function call, from high-level code to low-level code as in Figure 38:

Figure 38: synchronous program flow, with hypothetical cancellation
Figure 38: synchronous program flow, with hypothetical cancellation

Of course, in a synchronous program that isn’t possible because the high-level code doesn’t even resume running until the low-level operation is finished, at which point there is nothing to cancel. But in an asynchronous program the high-level code gets control of the program before the low-level code is done, which at least raises the possibility of canceling the low-level request before it finishes.

In a Twisted program, the lower-level request is embodied by a Deferred object, which you can think of as a “handle” on the outstanding asynchronous operation. The normal flow of information in a deferred is downward, from low-level code to high-level code, which matches the flow of return information in a synchronous program. Starting in Twisted 10.1.0, high-level code can send information back the other direction — it can tell the low-level code it doesn’t want the result anymore. See Figure 39:

Figure 39: Information flow in a deferred, including cancellation
Figure 39: Information flow in a deferred, including cancellation

Canceling Deferreds

Let’s take a look at a few sample programs to see how canceling deferreds actually works. Note, to run the examples and other code in this Part you will need a version of Twisted 10.1.0 or later. Consider deferred-cancel/defer-cancel-1.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

d = defer.Deferred()
d.addCallback(callback)
d.cancel()
print 'done'

With the new cancellation feature, the Deferred class got a new method called cancel. The example code makes a new deferred, adds a callback, and then cancels the deferred without firing it. Here’s the output:

done
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: twisted.internet.defer.CancelledError:

Ok, so canceling a deferred appears to cause the errback chain to run, and our regular callback is never called at all. Also notice the error is a twisted.internet.defer.CancelledError, a custom Exception that means the deferred was canceled (but keep reading!). Let’s try adding an errback in deferred-cancel/defer-cancel-2.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.cancel()
print 'done'

Now we get this output:

errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

So we can ‘catch’ the errback from a cancel just like any other deferred failure.

Ok, let’s try firing the deferred and then canceling it, as in deferred-cancel/defer-cancel-3.py:

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.callback('result')
d.cancel()
print 'done'

Here we fire the deferred normally with the callback method and then cancel it. Here’s the output:

callback got: result
done

Our callback was invoked (just as we would expect) and then the program finished normally, as if cancel was never called at all. So it seems canceling a deferred has no effect if it has already fired (but keep reading!).

What if we fire the deferred after we cancel it, as in deferred-cancel/defer-cancel-4.py?

from twisted.internet import defer

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred()
d.addCallbacks(callback, errback)
d.cancel()
d.callback('result')
print 'done'

In that case we get this output:

errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

Interesting! That’s the same output as the second example, where we never fired the deferred at all. So if the deferred has been canceled, firing the deferred normally has no effect. But why doesn’t d.callback('result') raise an error, since you’re not supposed to be able to fire a deferred more than once, and the errback chain has clearly run?

Consider Figure 39 again. Firing a deferred with a result or failure is the job of lower-level code, while canceling a deferred is an action taken by higher-level code. Firing the deferred means “Here’s your result”, while canceling a deferred means “I don’t want it any more”. And remember that canceling is a new feature, so most existing Twisted code is not written to handle cancel operations. But the Twisted developers have made it possible for us to cancel any deferred we want to, even if the code we got the deferred from was written before Twisted 10.1.0.

To make that possible, the cancel method actually does two things:

  1. Tell the Deferred object itself that you don’t want the result if it hasn’t shown up yet (i.e, the deferred hasn’t been fired), and thus to ignore any subsequent invocation of callback or errback.
  2. And, optionally, tell the lower-level code that is producing the result to take whatever steps are required to cancel the operation.

Since older Twisted code is going to go ahead and fire that canceled deferred anyway, step #1 ensures our program won’t blow up if we cancel a deferred we got from an older library.

This means we are always free to cancel a deferred, and we’ll be sure not to get the result if it hasn’t arrived (even if it arrives later). But canceling the deferred might not actually cancel the asynchronous operation. Aborting an asynchronous operation requires a context-specific action. You might need to close a network connection, roll back a database transaction, kill a sub-process, et cetera. And since a deferred is just a general-purpose callback organizer, how is it supposed to know what specific action to take when you cancel it? Or, alternatively, how could it forward the cancel request to the lower-level code that created and returned the deferred in the first place? Say it with me now:

I know, with a callback!

Canceling Deferreds, Really

Alright, take a look at deferred-cancel/defer-cancel-5.py:

from twisted.internet import defer

def canceller(d):
    print "I need to cancel this deferred:", d

def callback(res):
    print 'callback got:', res

def errback(err):
    print 'errback got:', err

d = defer.Deferred(canceller) # created by lower-level code
d.addCallbacks(callback, errback) # added by higher-level code
d.cancel()
print 'done'

This code is basically like the second example, except there is a third callback (canceller) that’s passed to the Deferred when we create it, rather than added afterwards. This callback is in charge of performing the context-specific actions required to abort the asynchronous operation (only if the deferred is actually canceled, of course). The canceller callback is necessarily part of the lower-level code that returns the deferred, not the higher-level code that receives the deferred and adds its own callbacks and errbacks.

Running the example produces this output:

I need to cancel this deferred: <Deferred at 0xb7669d2cL>
errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

As you can see, the canceller callback is given the deferred whose result we no longer want. That’s where we would take whatever action we need to in order to abort the asynchronous operation. Notice that canceller is invoked before the errback chain fires. In fact, we may choose to fire the deferred ourselves at this point with any result or error of our choice (and thus preempting the CancelledError failure). Both possibilities are illustrated in deferred-cancel/defer-cancel-6.py and deferred-cancel/defer-cancel-7.py.

Let’s do one more simple test before we fire up the reactor. We’ll create a deferred with a canceller callback, fire it normally, and then cancel it. You can see the code in deferred-cancel/defer-cancel-8.py. By examining the output of that script, you can see that canceling a deferred after it has been fired does not invoke the canceller callback. And that’s as we would expect since there’s nothing to cancel.

The examples we’ve looked at so far haven’t had any actual asynchronous operations. Let’s make a simple program that invokes one asynchronous operation, then we’ll figure out how to make that operation cancellable. Consider the code in deferred-cancel/defer-cancel-9.py:

from twisted.internet.defer import Deferred

def send_poem(d):
    print 'Sending poem'
    d.callback('Once upon a midnight dreary')

def get_poem():
    """Return a poem 5 seconds later."""
    from twisted.internet import reactor
    d = Deferred()
    reactor.callLater(5, send_poem, d)
    return d


def got_poem(poem):
    print 'I got a poem:', poem

def poem_error(err):
    print 'get_poem failed:', err

def main():
    from twisted.internet import reactor
    reactor.callLater(10, reactor.stop) # stop the reactor in 10 seconds

    d = get_poem()
    d.addCallbacks(got_poem, poem_error)

    reactor.run()

main()

This example includes a get_poem function that uses the reactor’s callLater method to asynchronously return a poem five seconds after get_poem is called. The main function calls get_poem, adds a callback/errback pair, and then starts up the reactor. We also arrange (again using callLater) to stop the reactor in ten seconds. Normally we would do this by attaching a callback to the deferred, but you’ll see why we do it this way shortly.

Running the example produces this output (after the appropriate delay):

Sending poem
I got a poem: Once upon a midnight dreary

And after ten seconds our little program comes to a stop. Now let’s try canceling that deferred before the poem is sent. We’ll just add this bit of code to cancel the deferred after two seconds (well before the five second delay on the poem itself):

    reactor.callLater(2, d.cancel) # cancel after 2 seconds

The complete program is in deferred-cancel/defer-cancel-10.py, which produces the following output:

get_poem failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
Sending poem

This example clearly illustrates that canceling a deferred does not necessarily cancel the underlying asynchronous request. After two seconds we see the output from our errback, printing out the CancelledError as we would expect. But then after five seconds will still see the output from send_poem (but the callback on the deferred doesn’t fire).

At this point we’re just in the same situation as deferred-cancel/defer-cancel-4.py. “Canceling” the deferred causes the eventual result to be ignored, but doesn’t abort the operation in any real sense. As we learned above, to make a truly cancelable deferred we must add a cancel callback when the deferred is created.

What does this new callback need to do? Take a look at the documentation for the callLater method. The return value of callLater is another object, implementing IDelayedCall, with a cancel method we can use to prevent the delayed call from being executed.

That’s pretty simple, and the updated code is in deferred-cancel/defer-cancel-11.py. The relevant changes are all in the get_poem function:

def get_poem():
    """Return a poem 5 seconds later."""

    def canceler(d):
        # They don't want the poem anymore, so cancel the delayed call
        delayed_call.cancel()

        # At this point we have three choices:
        #   1. Do nothing, and the deferred will fire the errback
        #      chain with CancelledError.
        #   2. Fire the errback chain with a different error.
        #   3. Fire the callback chain with an alternative result.

    d = Deferred(canceler)

    from twisted.internet import reactor
    delayed_call = reactor.callLater(5, send_poem, d)

    return d

In this new version, we save the return value from callLater so we can use it in our cancel callback. The only thing our callback needs to do is invoke delayed_call.cancel(). But as we discussed above, we could also choose to fire the deferred ourselves. The latest version of our example produces this output:

get_poem failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]

As you can see, the deferred is canceled and the asynchronous operation has truly been aborted (i.e., we don’t see the print output from send_poem).

Poetry Proxy 3.0

As we discussed in the Introduction, the poetry proxy server is a good candidate for implementing cancellation, as it allows us to abort the poem download if it turns out that nobody wants it (i.e., the client closes the connection before we send the poem). Version 3.0 of the proxy, located in twisted-server-4/poetry-proxy.py, implements deferred cancellation. The first change is in the PoetryProxyProtocol:

class PoetryProxyProtocol(Protocol):

    def connectionMade(self):
        self.deferred = self.factory.service.get_poem()
        self.deferred.addCallback(self.transport.write)
        self.deferred.addBoth(lambda r: self.transport.loseConnection())

    def connectionLost(self, reason):
        if self.deferred is not None:
            deferred, self.deferred = self.deferred, None
            deferred.cancel() # cancel the deferred if it hasn't fired

You might compare it to the older version. The two main changes are:

  1. Save the deferred we get from get_poem so we can cancel later if we need to.
  2. Cancel the deferred when the connection is closed. Note this also cancels the deferred after we actually get the poem, but as we discovered in the examples, canceling a deferred that has already fired has no effect.

Now we need to make sure that canceling the deferred actually aborts the poem download. For that we need to change the ProxyService:

class ProxyService(object):

    poem = None # the cached poem

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def get_poem(self):
        if self.poem is not None:
            print 'Using cached poem.'
            # return an already-fired deferred
            return succeed(self.poem)

        def canceler(d):
            print 'Canceling poem download.'
            factory.deferred = None
            connector.disconnect()

        print 'Fetching poem from server.'
        deferred = Deferred(canceler)
        deferred.addCallback(self.set_poem)
        factory = PoetryClientFactory(deferred)
        from twisted.internet import reactor
        connector = reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

    def set_poem(self, poem):
        self.poem = poem
        return poem

Again, you may wish to compare this with the older version. This class has a few more changes:

  1. We save the return value from reactor.connectTCP, an IConnector object. We can use the disconnect method on that object to close the connection.
  2. We create the deferred with a canceler callback. That callback is a closure which uses the connector to close the connection. But first it sets the factory.deferred attribute to None. Otherwise, the factory might fire the deferred with a “connection closed” errback before the deferred itself fires with a CancelledError. Since this deferred was canceled, having the deferred fire with CancelledError seems more explicit.

You might also notice we now create the deferred in the ProxyService instead of the PoetryClientFactory. Since the canceler callback needs to access the IConnector object, the ProxyService ends up being the most convenient place to create the deferred.

And, as in one of our earlier examples, our canceler callback is implemented as a closure. Closures seem to be very useful when implementing cancel callbacks!

Let’s try out our new proxy. First start up a slow server. It needs to be slow so we actually have time to cancel:

python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt

Now we can start up our proxy (remember you need Twisted 10.1.0):

python twisted-server-4/poetry-proxy.py --port 10000 10001

Now we can start downloading a poem from the proxy using any client, or even just curl:

curl localhost:10000

After a few seconds, press Ctrl-C to stop the client, or the curl process. In the terminal running the proxy you should
see this output:

Fetching poem from server.
Canceling poem download.

And you should see the slow server has stopped printing output for each bit of poem it sends, since our proxy hung up. You can start and stop the client multiple times to verify each download is canceled each time. But if you let the poem run to completion, then the proxy caches the poem and sends it immediately after that.

One More Wrinkle

We said several times above that canceling an already-fired deferred has no effect. Well, that’s not quite true. In Part 13 we learned that the callbacks and errbacks attached to a deferred may return deferreds themselves. And in that case, the original (outer) deferred pauses the execution of its callback chains and waits for the inner deferred to fire (see Figure 28).

Thus, even though a deferred has fired the higher-level code that made the asynchronous request may not have received the result yet, because the callback chain is paused waiting for an inner deferred to finish. So what happens if the higher-level code cancels that outer deferred? In that case the outer deferred does not cancel itself (it has already fired after all); instead, the outer deferred cancels the inner deferred.

So when you cancel a deferred, you might not be canceling the main asynchronous operation, but rather some other asynchronous operation triggered as a result of the first. Whew!

We can illustrate this with one more example. Consider the code in deferred-cancel/defer-cancel-12.py:

from twisted.internet import defer

def cancel_outer(d):
    print "outer cancel callback."

def cancel_inner(d):
    print "inner cancel callback."

def first_outer_callback(res):
    print 'first outer callback, returning inner deferred'
    return inner_d

def second_outer_callback(res):
    print 'second outer callback got:', res

def outer_errback(err):
    print 'outer errback got:', err

outer_d = defer.Deferred(cancel_outer)
inner_d = defer.Deferred(cancel_inner)

outer_d.addCallback(first_outer_callback)
outer_d.addCallbacks(second_outer_callback, outer_errback)

outer_d.callback('result')

# at this point the outer deferred has fired, but is paused
# on the inner deferred.

print 'canceling outer deferred.'
outer_d.cancel()

print 'done'

In this example we create two deferreds, the outer and the inner, and have one of the outer callbacks return the inner deferred. First we fire the outer deferred, and then we cancel it. The example produces this output:

first outer callback, returning inner deferred
canceling outer deferred.
inner cancel callback.
outer errback got: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>: 
]
done

As you can see, canceling the outer deferred does not cause the outer cancel callback to fire. Instead, it cancels the inner deferred so the inner cancel callback fires, and then outer errback receives the CancelledError (from the inner deferred).

You may wish to stare at that code a while, and try out variations to see how they affect the outcome.

Discussion

Canceling a deferred can be a very useful operation, allowing our programs to avoid work they no longer need to do. And as we have seen, it can be a little bit tricky, too.

One very important fact to keep in mind is that canceling a deferred doesn’t necessarily cancel the underlying asynchronous operation. In fact, as of this writing, most deferreds won’t really “cancel”, since most Twisted code was written prior to Twisted 10.1.0 and hasn’t been updated. This includes many of the APIs in Twisted itself! Check the documentation and/or the source code to find out whether canceling the deferred will truly cancel the request, or simply ignore it.

And the second important fact is that simply returning a deferred from your asynchronous APIs will not necessarily make them cancelable in the complete sense of the word. If you want to implement canceling in your own programs, you should study the Twisted source code to find more examples. Cancellation is a brand new feature so the patterns and best practices are still being worked out.

Looking Ahead

At this point we’ve learned just about everything about Deferreds and the core concepts behind Twisted. Which means there’s not much more to introduce, as the rest of Twisted consists mainly of specific applications, like web programming or asynchronous database access. So in the next couple of Parts we’re going to take a little detour and look at two other systems that use asynchronous I/O to see how some of their ideas relate to the ideas in Twisted. Then, in the final Part, we will wrap up and suggest ways to continue your Twisted education.

Suggested Exercises

  1. Did you know you can spell canceled with one or two els? It’s true. It all depends on what sort of mood you’re in.
  2. Peruse the source code of the Deferred class, paying special attention to the implementation of cancellation.
  3. Search the Twisted 10.10 source code for examples of deferreds with cancel callbacks. Study their implementation.
  4. Make the deferred returned by the get_poetry method of one of our poetry clients cancelable.
  5. Make a reactor-based example that illustrates canceling an outer deferred which is paused on an inner deferred. If you use callLater you will need to choose the delays carefully to ensure the outer deferred is canceled at the right moment.
  6. Find an asynchronous API in Twisted that doesn’t support a true cancel and implement cancellation for it. Submit a patch to the Twisted project. Don’t forget unit tests!

An Introduction to Asynchronous Programming and Twisted

Part 18: Deferreds En Masse

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In the last Part we learned a new way of structuring sequential asynchronous callbacks using a generator. Thus, including deferreds, we now have two techniques for chaining asynchronous operations together.

Sometimes, though, we want to run a group of asynchronous operations in “parallel”. Since Twisted is single-threaded they won’t really run concurrently, but the point is we want to use asynchronous I/O to work on a group of tasks as fast as possible. Our poetry clients, for example, download poems from multiple servers at the same time, rather than one server after another. That was the whole point of using Twisted for getting poetry, after all.

And, as a result, all our poetry clients have had to solve this problem: how do you know when all the asynchronous operations you have started are done? So far we have solved this by collecting our results into a list (like the results list in client 7.0) and checking the length of the list. We have to be careful to collect failures as well as successful results, otherwise a single failure will cause the program to run forever, thinking there’s still work left to do.

As you might expect, Twisted includes an abstraction you can use to solve this problem and we’re going to take a look at it today.

The DeferredList

The DeferredList class allows us to treat a list of deferred objects as a single deferred. That way we can start a bunch of asynchronous operations and get notified only when all of them have finished (regardless of whether they succeeded or failed). Let’s look at some examples.

In deferred-list/deferred-list-1.py you will find this code:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Empty List.'
d = defer.DeferredList([])
print 'Adding Callback.'
d.addCallback(got_results)

And if you run it, you will get this output:

Empty List.
Adding Callback.
We got: []

Some things to notice:

  • A DeferredList is created from a Python list. In this case the list is empty, but we’ll soon see that the list elements must all be Deferred objects.
  • A DeferredList is itself a deferred (it inherits from Deferred). That means you can add callbacks and errbacks to it just like you would a regular deferred.
  • In the example above, our callback was fired as soon as we added it, so the DeferredList must have fired right away. We’ll discuss that more in a second.
  • The result of the deferred list was itself a list (empty).

Now look at deferred-list/deferred-list-2.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'One Deferred.'
d1 = defer.Deferred()
d = defer.DeferredList([d1])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')

Now we are creating our DeferredList with a 1-element list containing a single deferred. Here’s the output we get:

One Deferred.
Adding Callback.
Firing d1.
We got: [(True, 'd1 result')]

More things to notice:

  • This time the DeferredList didn’t fire its callback until we fired the deferred in the list.
  • The result is still a list, but now it has one element.
  • The element is a tuple whose second value is the result of the deferred in the list.

Let’s try putting two deferreds in the list (deferred-list/deferred-list-3.py):

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Two Deferreds.'
d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')
print 'Firing d2.'
d2.callback('d2 result')

And here’s the output:

Two Deferreds.
Adding Callback.
Firing d1.
Firing d2.
We got: [(True, 'd1 result'), (True, 'd2 result')]

At this point it’s pretty clear the result of a DeferredList, at least for the way we’ve been using it, is a list with the same number of elements as the list of deferreds we passed to the constructor. And the elements of that result list contain the results of the original deferreds, at least if the deferreds succeed. That means the DeferredList itself doesn’t fire until all the deferreds in the original list have fired. And a DeferredList created with an empty list fires right away since there aren’t any deferreds to wait for.

What about the order of the results in the final list? Consider deferred-list/deferred-list-4.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

print 'Two Deferreds.'
d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2])
print 'Adding Callback.'
d.addCallback(got_results)
print 'Firing d2.'
d2.callback('d2 result')
print 'Firing d1.'
d1.callback('d1 result')

Now we are firing d2 first and then d1. Note the deferred list is still constructed with d1 and d2 in their original order. Here’s the output:

Two Deferreds.
Adding Callback.
Firing d2.
Firing d1.
We got: [(True, 'd1 result'), (True, 'd2 result')]

The output list has the results in the same order as the original list of deferreds, not the order those deferreds happened to fire in. Which is very nice, because we can easily associate each individual result with the operation that generated it (for example, which poem came from which server).

Alright, what happens if one or more of the deferreds in the list fails? And what are those True values doing there? Let’s try the example in deferred-list/deferred-list-5.py:

from twisted.internet import defer

def got_results(res):
    print 'We got:', res

d1 = defer.Deferred()
d2 = defer.Deferred()
d = defer.DeferredList([d1, d2], consumeErrors=True)
d.addCallback(got_results)
print 'Firing d1.'
d1.callback('d1 result')
print 'Firing d2 with errback.'
d2.errback(Exception('d2 failure'))

Now we are firing d1 with a normal result and d2 with an error. Ignore the consumerErrors option for now, we’ll get back to it. Here’s the output:

Firing d1.
Firing d2 with errback.
We got: [(True, 'd1 result'), (False, <twisted.python.failure.Failure <type 'exceptions.Exception'>>)]

Now the tuple corresponding to d2 has a Failure in slot two, and False in slot one. At this point it should be pretty clear how a DeferredList works (but see the Discussion below):

  • A DeferredList is constructed with a list of deferred objects.
  • A DeferredList is itself a deferred whose result is a list of the same length as the list of deferreds.
  • The DeferredList fires after all the deferreds in the original list have fired.
  • Each element of the result list corresponds to the deferred in the same position as the original list. If that deferred succeeded, the element is (True, result) and if the deferred failed, the element is (False, failure).
  • A DeferredList never fails, since the result of each individual deferred is collected into the list no matter what (but again, see the Discussion below).

Now let’s talk about that consumeErrors option we passed to the DeferredList. If we run the same code but without passing the option (deferred-list/deferred-list-6.py), we get this output:

Firing d1.
Firing d2 with errback.
We got: [(True, 'd1 result'), (False, >twisted.python.failure.Failure >type 'exceptions.Exception'<<)]
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: exceptions.Exception: d2 failure

If you recall, the “Unhandled error in Deferred” message is generated when a deferred is garbage collected and the last callback in that deferred failed. The message is telling us we haven’t caught all the potential asynchronous failures in our program. So where is it coming from in our example? It’s clearly not coming from the DeferredList, since that succeeds. So it must be coming from d2.

A DeferredList needs to know when each deferred it is monitoring fires. And the DeferredList does that in the usual way — by adding a callback and errback to each deferred. And by default, the callback (and errback) return the original result (or failure) after putting it in the final list. And since returning the original failure from the errback triggers the next errback, d2 remains in the failed state after it fires.

But if we pass consumeErrors=True to the DeferredList, the errback added by the DeferredList to each deferred will instead return None, thus “consuming” the error and eliminating the warning message. We could also handle the error by adding our own errback to d2, as in deferred-list/deferred-list-7.py.

Client 8.0

Version 8.0 of our Get Poetry Now! client uses a DeferredList to find out when all the poetry has finished (or failed). You can find the new client in twisted-client-8/get-poetry.py. Once again the only change is in poetry_main. Let’s look at the important changes:

    ...
    ds = []

    for (host, port) in addresses:
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        ds.append(d)

    dlist = defer.DeferredList(ds, consumeErrors=True)
    dlist.addCallback(lambda res : reactor.stop())

You may wish to compare it to the same section of client 7.0.

In client 8.0, we don’t need the poem_done callback or the results list. Instead, we put each deferred we get back from get_transformed_poem into a list (ds) and then create a DeferredList. Since the DeferredList won’t fire until all the poems have finished or failed, we just add a callback to the DeferredList to shutdown the reactor. In this case, we aren’t using the result from the DeferredList, we just need to know when everything is finished. And that’s it!

Discussion

We can visualize how a DeferredList works in Figure 37:

Figure 37: the result of a DeferredList
Figure 37: the result of a DeferredList

Pretty simple, really. There are a couple options to DeferredList we haven’t covered, and which change the behavior from what we have described above. We will leave them for you to explore in the Exercises below.

In the next Part we will cover one more feature of the Deferred class, a feature recently introduced in Twisted 10.1.0.

Suggested Exercises

  1. Read the source code for the DeferredList.
  2. Modify the examples in deferred-list to experiment with the optional constructor arguments fireOnOneCallback and fireOnOneErrback. Come up with scenarios where you would use one or the other (or both).
  3. Can you create a DeferredList using a list of DeferredLists? If so, what would the result look like?
  4. Modify client 8.0 so that it doesn’t print out anything until all the poems have finished downloading. This time you will use the result from the DeferredList.
  5. Define the semantics of a DeferredDict and then implement it.

An Introduction to Asynchronous Programming and Twisted

Part 17: Just Another Way to Spell “Callback”

This continues the introduction started here. You can find an index to the entire series here.

Introduction

In this Part we’re going to return to the subject of callbacks. We’ll introduce another technique for writing callbacks in Twisted that uses generators. We’ll show how the technique works and contrast it with using “pure” Deferreds. Finally we’ll rewrite one of our poetry clients using this technique. But first let’s review how generators work so we can see why they are a candidate for creating callbacks.

A Brief Review of Generators

As you probably know, a Python generator is a “restartable function” that you create by using the yield expression in the body of your function. By doing so, the function becomes a “generator function” that returns an iterator you can use to run the function in a series of steps. Each cycle of the iterator restarts the function, which proceeds to execute until it reaches the next yield.

Generators (and iterators) are often used to represent lazily-created sequences of values. Take a look at the example code in inline-callbacks/gen-1.py:

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

for n in my_generator():
    print n

Here we have a generator that creates the sequence 1, 2, 3. If you run the code, you will see the print statements in the generator interleaved with the print statement in the for loop as the loop cycles through the generator.

We can make this code more explicit by creating the generator ourselves (inline-callbacks/gen-2.py):

def my_generator():
    print 'starting up'
    yield 1
    print "workin'"
    yield 2
    print "still workin'"
    yield 3
    print 'done'

gen = my_generator()

while True:
    try:
        n = gen.next()
    except StopIteration:
        break
    else:
        print n

Considered as a sequence, the generator is just an object for getting successive values. But we can also view things from the point of view of the generator itself:

  1. The generator function doesn’t start running until “called” by the loop (using the next method).
  2. Once the generator is running, it keeps running until it “returns” to the loop (using yield).
  3. When the loop is running other code (like the print statement), the generator is not running.
  4. When the generator is running, the loop is not running (it’s “blocked” waiting for the generator).
  5. Once a generator yields control to the loop, an arbitrary amount of time may pass (and an arbitrary amount of other code may execute) until the generator runs again.

This is very much like the way callbacks work in an asynchronous system. We can think of the while loop as the reactor, and the generator as a series of callbacks separated by yield statements, with the interesting fact that all the callbacks share the same local variable namespace, and the namespace persists from one callback to the next.

Furthermore, you can have multiple generators active at once (see the example in inline-callbacks/gen-3.py), with their “callbacks” interleaved with each other, just as you can have independent asynchronous tasks running in a system like Twisted.

Something is still missing, though. Callbacks aren’t just called by the reactor, they also receive information. When part of a deferred’s chain, a callback either receives a result, in the form of a single Python value, or an error, in the form of a Failure.

Starting with Python 2.5, generators were extended in a way that allows you to send information to a generator when you restart it, as illustrated in inline-callbacks/gen-4.py:

class Malfunction(Exception):
    pass

def my_generator():
    print 'starting up'

    val = yield 1
    print 'got:', val

    val = yield 2
    print 'got:', val

    try:
        yield 3
    except Malfunction:
        print 'malfunction!'

    yield 4

    print 'done'

gen = my_generator()

print gen.next() # start the generator
print gen.send(10) # send the value 10
print gen.send(20) # send the value 20
print gen.throw(Malfunction()) # raise an exception inside the generator

try:
    gen.next()
except StopIteration:
    pass

In Python 2.5 and later versions, the yield statement is an expression that evaluates to a value. And the code that restarts the generator can determine that value using the send method instead of next (if you use next the value is None). What’s more, you can actually raise an arbitrary exception inside the generator using the throw method. How cool is that?

Inline Callbacks

Given what we just reviewed about sending and throwing values and exceptions into a generator, we can envision a generator as a series of callbacks, like the ones in a deferred, which receive either results or failures. The callbacks are separated by yields and the value of each yield expression is the result for the next callback (or the yield raises an exception and that’s the failure). Figure 35 shows the correspondence:

Figure 35: generator as a callback sequence
Figure 35: generator as a callback sequence

Now when a series of callbacks is chained together in a deferred, each callback receives the result from the one prior. That’s easy enough to do with a generator — just send the value you got from the previous run of the generator (the value it yielded) the next time you restart it. But that also seems a bit silly. Since the generator computed the value to begin with, why bother sending it back? The generator could just save the value in a variable for the next time it’s needed. So what’s the point?

Recall the fact we learned in Part 13, that the callbacks in a deferred can return deferreds themselves. And when that happens, the outer deferred is paused until the inner deferred fires, and then the next callback (or errback) in the outer deferred’s chain is called with the result (or failure) from the inner deferred.

So imagine that our generator yields a deferred object instead of an ordinary Python value. The generator is now “paused”, and that’s automatic; generators always pause after every yield statement until they are explicitly restarted. So we can delay restarting the generator until the deferred fires, at which point we either send the value (if the deferred succeeds) or throw the exception (if the deferred fails). That would make our generator a genuine sequence of asynchronous callbacks and that’s the idea behind the inlineCallbacks function in twisted.internet.defer.

inlineCallbacks

Consider the example program in inline-callbacks/inline-callbacks-1.py:

from twisted.internet.defer import inlineCallbacks, Deferred

@inlineCallbacks
def my_callbacks():
    from twisted.internet import reactor

    print 'first callback'
    result = yield 1 # yielded values that aren't deferred come right back

    print 'second callback got', result
    d = Deferred()
    reactor.callLater(5, d.callback, 2)
    result = yield d # yielded deferreds will pause the generator

    print 'third callback got', result # the result of the deferred

    d = Deferred()
    reactor.callLater(5, d.errback, Exception(3))

    try:
        yield d
    except Exception, e:
        result = e

    print 'fourth callback got', repr(result) # the exception from the deferred

    reactor.stop()

from twisted.internet import reactor
reactor.callWhenRunning(my_callbacks)
reactor.run()

Run the example and you will see the generator execute to the end and then stop the reactor. The example illustrates several aspects of the inlineCallbacks function. First, inlineCallbacks is a decorator and it always decorates generator functions, i.e., functions that use yield. The whole purpose of inlineCallbacks is turn a generator into a series of asynchronous callbacks according to the scheme we outlined before.

Second, when we invoke an inlineCallbacks-decorated function, we don’t need to call next or send or throw ourselves. The decorator takes care of those details for us and ensures the generator will run to the end (assuming it doesn’t raise an exception).

Third, if we yield a non-deferred value from the generator, it is immediately restarted with that same value as the result of the yield.

And finally, if we yield a deferred from the generator, it will not be restarted until that deferred fires. If the deferred succeeds, the result of the yield is just the result from the deferred. And if the deferred fails, the yield statement raises the exception. Note the exception is just an ordinary Exception object, rather than a Failure, and we can catch it with a try/except statement around the yield expression.

In the example we are just using callLater to fire the deferreds after a short period of time. While that’s a handy way to put in a non-blocking delay into our callback chain, normally we would be yielding a deferred returned by some other asynchronous operation (i.e., get_poetry) invoked from our generator.

Ok, now we know how an inlineCallbacks-decorated function runs, but what return value do you get if you actually call one? As you might have guessed, you get a deferred. Since we can’t know exactly when that generator will stop running (it might yield one or more deferreds), the decorated function itself is asynchronous and a deferred is the appropriate return value. Note the deferred that is returned isn’t one of the deferreds the generator may yield. Rather, it’s a deferred that fires only after the generator has completely finished (or throws an exception).

If the generator throws an exception, the returned deferred will fire its errback chain with that exception wrapped in a Failure. But if we want the generator to return a normal value, we must “return” it using the defer.returnValue function. Like the ordinary return statement, it will also stop the generator (it actually raises a special exception). The inline-callbacks/inline-callbacks-2.py example illustrates both possibilities.

Client 7.0

Let’s put inlineCallbacks to work with a new version of our poetry client. You can see the code in twisted-client-7/get-poetry.py. You may wish to compare it to client 6.0 in twisted-client-6/get-poetry.py. The relevant changes are in poetry_main:

def poetry_main():
    addresses = parse_args()

    xform_addr = addresses.pop(0)

    proxy = TransformProxy(*xform_addr)

    from twisted.internet import reactor

    results = []

    @defer.inlineCallbacks
    def get_transformed_poem(host, port):
        try:
            poem = yield get_poetry(host, port)
        except Exception, e:
            print >>sys.stderr, 'The poem download failed:', e
            raise

        try:
            poem = yield proxy.xform('cummingsify', poem)
        except Exception:
            print >>sys.stderr, 'Cummingsify failed!'

        defer.returnValue(poem)

    def got_poem(poem):
        print poem

    def poem_done(_):
        results.append(_)
        if len(results) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_transformed_poem(host, port)
        d.addCallbacks(got_poem)
        d.addBoth(poem_done)

    reactor.run()

In our new version the inlineCallbacks generator function get_transformed_poem is responsible for both fetching the poem and then applying the transformation (via the transform service). Since both operations are asynchronous, we yield a deferred each time and then (implicitly) wait for the result. As in client 6.0, if the transformation fails we just return the original poem. Notice we can use try/except statements to handle asynchronous errors inside the generator.

We can test the new client out in the same way as before. First start up a transform server:

python twisted-server-1/transformedpoetry.py --port 10001

Then start a couple of poetry servers:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Now you can run the new client:

python twisted-client-7/get-poetry.py 10001 10002 10003

Try turning off one or more of the servers to see how the client handles errors.

Discussion

Like the Deferred object, the inlineCallbacks function gives us a new way of organizing our asynchronous callbacks. And, as with deferreds, inlineCallbacks doesn’t change the rules of the game. Specifically, our callbacks still run one at a time, and they are still invoked by the reactor. We can confirm that fact in our usual way by printing out a traceback from an inline callback, as in the example script inline-callbacks/inline-callbacks-tb.py. Run that code and you will get a traceback with reactor.run() at the top, lots of helper functions in between, and our callback at the bottom.

We can adapt Figure 29, which explains what happens when one callback in a deferred returns another deferred, to show what happens when an inlineCallbacks generator yields a deferred. See Figure 36:

Figure 36: flow control in an inlineCallbacks function
Figure 36: flow control in an inlineCallbacks function

The same figure works in both cases because the idea being illustrated is the same — one asynchronous operation is waiting for another.

Since inlineCallbacks and deferreds solve many of the same problems, why choose one over the other? Here are some potential advantages of inlineCallbacks:

  • Since the callbacks share a namespace, there is no need to pass extra state around.
  • The callback order is easier to see, as they just execute from top to bottom.
  • With no function declarations for individual callbacks and implicit flow-control, there is generally less typing.
  • Errors are handled with the familiar try/except statement.

And here are some potential pitfalls:

  • The callbacks inside the generator cannot be invoked individually, which could make code re-use difficult. With a deferred, the code constructing the deferred is free to add arbitrary callbacks in an arbitrary order.
  • The compact form of a generator can obscure the fact that an asynchronous callback is even involved. Despite its visually similar appearance to an ordinary sequential function, a generator behaves in a very different manner. The inlineCallbacks function is not a way to avoid learning the asynchronous programming model.

As with any technique, practice will provide the experience necessary to make an informed choice.

Summary

In this Part we learned about the inlineCallbacks decorator and how it allows us to express a sequence of asynchronous callbacks in the form of a Python generator.

In Part 18 we will learn a technique for managing a set of “parallel” asynchronous operations.

Suggested Exercises

  1. Why is the inlineCallbacks function plural?
  2. Study the implementation of inlineCallbacks and its helper function _inlineCallbacks. Ponder the phrase “the devil is in the details”.
  3. How many callbacks are contained in a generator with N yield statements, assuming it has no loops or if statements?
  4. Poetry client 7.0 might have three generators running at once. Conceptually, how many different ways might they be interleaved with one another? Considering the way they are invoked in the poetry client and the implementation of inlineCallbacks, how many ways do you think are actually possible?
  5. Move the got_poem callback in client 7.0 inside the generator.
  6. Then move the poem_done callback inside the generator. Be careful! Make sure to handle all the failure cases so the reactor gets shutdown no matter what. How does the resulting code compare to using a deferred to shutdown the reactor?
  7. A generator with yield statements inside a while loop can represent a conceptually infinite sequence. What does such a generator decorated with inlineCallbacks represent?

An Introduction to Asynchronous Programming and Twisted

Part 16: Twisted Daemonologie

This continues the introduction started here. You can find an index to the entire series here.

Introduction

The servers we have written so far have just run in a terminal window, with output going to the screen via print statements. This works alright for development, but it’s hardly a way to deploy services in production. A well-behaved production server ought to:

  1. Run as a daemon process, unconnected with any terminal or user session. You don’t want a service to shut down just because the administrator logs out.
  2. Send debugging and error output to a set of rotated log files, or to the syslog service.
  3. Drop excessive privileges, e.g., switching to a lower-privileged user before running.
  4. Record its pid in a file so that the administrator can easily send signals to the daemon.

We can get all of those features using the twistd script provided by Twisted. But first we’ll have to change our code a bit.

The Concepts

Understanding twistd will require learning a few new concepts in Twisted, the most important being a Service. As usual, several of the new concepts are accompanied by new Interfaces.

IService

The IService interface defines a named service that can be started and stopped. What does the service do? Whatever you like — rather than define the specific function of the service, the interface requires only that it provide a small set of generic attributes and methods.

There are two required attributes: name and running. The name attribute is just a string, like 'fastpoetry', or None if you don’t want to give your service a name. The running attribute is a Boolean value and is true if the service has been successfully started.

We’re only going to touch on some of the methods of IService. We’ll skip some that are obvious, and others that are more advanced and often go unused in simpler Twisted programs. The two principle methods of IService are startService and stopService:

    def startService():
        """
        Start the service.
        """

    def stopService():
        """
        Stop the service.

        @rtype: L{Deferred}
        @return: a L{Deferred} which is triggered when the service has
            finished shutting down. If shutting down is immediate, a
            value can be returned (usually, C{None}).
        """

Again, what these methods actually do will depend on the service in question. For example, the startService method might:

  • Load some configuration data, or
  • Initialize a database, or
  • Start listening on a port, or
  • Do nothing at all.

And the stopService method might:

  • Persist some state, or
  • Close open database connections, or
  • Stop listening on a port, or
  • Do nothing at all.

When we write our own custom services we’ll need to implement these methods appropriately. For some common behaviors, like listening on a port, Twisted provides ready-made services we can use instead.

Notice that stopService may optionally return a deferred, which is required to fire when the service has completely shut down. This allows our services to finish cleaning up after themselves before the entire application terminates. If your service shuts down immediately you can just return None instead of a deferred.

Services can be organized into collections that get started and stopped together. The last IService method we’re going to look at, setServiceParent, adds a Service to a collection:

    def setServiceParent(parent):
        """
        Set the parent of the service.

        @type parent: L{IServiceCollection}
        @raise RuntimeError: Raised if the service already has a parent
            or if the service has a name and the parent already has a child
            by that name.
        """

Any service can have a parent, which means services can be organized in a hierarchy. And that brings us to the next Interface we’re going to look at today.

IServiceCollection

The IServiceCollection interface defines an object which can contain IService objects. A service collection is a just plain container class with methods to:

Note that an implementation of IServiceCollection isn’t automatically an implementation of IService, but there’s no reason why one class can’t implement both interfaces (and we’ll see an example of that shortly).

Application

A Twisted Application is not defined by a separate interface. Rather, an Application object is required to implement both IService and IServiceCollection, as well as a few other interfaces we aren’t going to cover.

An Application is the top-level service that represents your entire Twisted application. All the other services in your daemon will be children (or grandchildren, etc.) of the Application object.

It is rare to actually implement your own Application. Twisted provides an implementation that we’ll use today.

Twisted Logging

Twisted includes its own logging infrastructure in the module twisted.python.log. The basic API for writing to the log is simple, so we’ll just include a short example located in basic-twisted/log.py, and you can skim the Twisted module for details if you are interested.

We won’t bother showing the API for installing logging handlers, since twistd will do that for us.

FastPoetry 2.0

Alright, let’s look at some code. We’ve updated the fast poetry server to run with twistd. The source is located in twisted-server-3/fastpoetry.py. First we have the poetry protocol:

class PoetryProtocol(Protocol):

    def connectionMade(self):
        poem = self.factory.service.poem
        log.msg('sending %d bytes of poetry to %s'
                % (len(poem), self.transport.getPeer()))
        self.transport.write(poem)
        self.transport.loseConnection()

Notice instead of using a print statement, we’re using the twisted.python.log.msg function to record each new connection.
Here’s the factory class:

class PoetryFactory(ServerFactory):

    protocol = PoetryProtocol

    def __init__(self, service):
        self.service = service

As you can see, the poem is no longer stored on the factory, but on a service object referenced by the factory. Notice how the protocol gets the poem from the service via the factory. Finally, here’s the service class itself:

class PoetryService(service.Service):

    def __init__(self, poetry_file):
        self.poetry_file = poetry_file

    def startService(self):
        service.Service.startService(self)
        self.poem = open(self.poetry_file).read()
        log.msg('loaded a poem from: %s' % (self.poetry_file,))

As with many other Interface classes, Twisted provides a base class we can use to make our own implementations, with helpful default behaviors. Here we use the twisted.application.service.Service class to implement our PoetryService.

The base class provides default implementations of all required methods, so we only need to implement the ones with custom behavior. In this case, we just override startService to load the poetry file. Note we still call the base class method (which sets the running attribute for us).

Another point is worth mentioning. The PoetryService object doesn’t know anything about the details of the PoetryProtocol. The service’s only job is to load the poem and provide access to it for any object that might need it. In other words, the PoetryService is entirely concerned with the higher-level details of providing poetry, rather than the lower-level details of sending a poem down a TCP connection. So this same service could be used by another protocol, say UDP or XML-RPC. While the benefit is rather small for our simple service, you can imagine the advantage for a more realistic service implementation.

If this were a typical Twisted program, all the code we’ve looked at so far wouldn’t actually be in this file. Rather, it would be in some other module(s) (perhaps fastpoetry.protocol and fastpoetry.service). But following our usual practice of making these examples self-contained, we’ve including everything we need in a single script.

Twisted tac files

The rest of the script contains what would normally be the entire content — a Twisted tac file. A tac file is a Twisted Application Configuration file that tells twistd how to construct an application. As a configuration file it is responsible for choosing settings (like port numbers, poetry file locations, etc.) to run the application in some particular way. In other words, a tac file represents a specific deployment of our service (serve that poem on this port) rather than a general script for starting any poetry server.

If we were running multiple poetry servers on the same host, we would have a tac file for each one (so you can see why tac files normally don’t contain any general-purpose code). In our example, the tac file is configured to serve poetry/ecstasy.txt run on port 10000 of the loopback interface:

# configuration parameters
port = 10000
iface = 'localhost'
poetry_file = 'poetry/ecstasy.txt'

Note that twistd doesn’t know anything about these particular variables, we just define them here to keep all our configuration values in one place. In fact, twistd only really cares about one variable in the entire file, as we’ll see shortly. Next we begin building up our application:

# this will hold the services that combine to form the poetry server
top_service = service.MultiService()

Our poetry server is going to consist of two services, the PoetryService we defined above, and a Twisted built-in service that creates the listening socket our poem will be served from. Since these two services are clearly related to each other, we’ll group them together using a MultiService, a Twisted class which implements both IService and IServiceCollection.

As a service collection, the MultiService will group our two poetry services together. And as a service, the MultiService will start both child services when the MultiService itself is started, and stop both child services when it is stopped. Let’s add the first poetry service to the collection:

# the poetry service holds the poem. it will load the poem when it is
# started
poetry_service = PoetryService(poetry_file)
poetry_service.setServiceParent(top_service)

This is pretty simple stuff. We just create the PoetryService and then add it to the collection with setServiceParent, a method we inherited from the Twisted base class. Next we add the TCP listener:

# the tcp service connects the factory to a listening socket. it will
# create the listening socket when it is started
factory = PoetryFactory(poetry_service)
tcp_service = internet.TCPServer(port, factory, interface=iface)
tcp_service.setServiceParent(top_service)

Twisted provides the TCPServer service for creating a TCP listening socket connected to an arbitrary factory (in this case our PoetryFactory). We don’t call reactor.listenTCP directly because the job of a tac file is to get our application ready to start, without actually starting it. The TCPServer will create the socket after it is started by twistd.

You might have noticed we didn’t bother to give any of our services names. Naming services is not required, but only an optional feature you can use if you want to ‘look up’ services at runtime. Since we don’t need to do that in our little application, we don’t bother with it here.

Ok, now we’ve got both our services combined into a collection. Now we just make our Application and add our collection to it:

# this variable has to be named 'application'
application = service.Application("fastpoetry")

# this hooks the collection we made to the application
top_service.setServiceParent(application)

The only variable in this script that twistd really cares about is the application variable. That is how twistd will find the application it’s supposed to start (and so the variable has to be named ‘application’). And when the application is started, all the services we added to it will be started as well.

Figure 34 shows the structure of the application we just built:

Figure 34: the structure of our fastpoetry application
Figure 34: the structure of our fastpoetry application

Running the Server

Let’s take our new server for a spin. As a tac file, we need to start it with twistd. Of course, it’s also just a regular Python file, too. So let’s run it with Python first and see what happens:

python twisted-server-3/fastpoetry.py

If you do this, you’ll find that what happens is nothing! As we said before, the job of a tac file is to get an application ready to run, without actually running it. As a reminder of this special purpose of tac files, some people name them with a .tac extension instead of .py. But the twistd script doesn’t actually care about the extension.

Let’s run our server for real, using twistd:

twistd --nodaemon --python twisted-server-3/fastpoetry.py

After running that command, you should see some output like this:

2010-06-23 20:57:14-0700 [-] Log opened.
2010-06-23 20:57:14-0700 [-] twistd 10.0.0 (/usr/bin/python 2.6.5) starting up.
2010-06-23 20:57:14-0700 [-] reactor class: twisted.internet.selectreactor.SelectReactor.
2010-06-23 20:57:14-0700 [-] __builtin__.PoetryFactory starting on 10000
2010-06-23 20:57:14-0700 [-] Starting factory <__builtin__.PoetryFactory instance at 0x14ae8c0>
2010-06-23 20:57:14-0700 [-] loaded a poem from: poetry/ecstasy.txt

Here’s a few things to notice:

  1. You can see the output of the Twisted logging system, including the PoetryFactory‘s call to log.msg. But we didn’t install a logger in our tac file, so twistd must have installed one for us.
  2. You can also see our two main services, the PoetryService and the TCPServer starting up.
  3. The shell prompt never came back. That means our server isn’t running as a daemon. By default, twistd does run a server as a daemon process (that’s the main reason twistd exists), but if you include the --nodaemon option then twistd will run your server as a regular shell process instead, and will direct the log output to standard output as well. This is useful for debugging your tac files.

Now test out the server by fetching a poem, either with one of our poetry clients or just netcat:

netcat localhost 10000

That should fetch the poem from the server and you should see a new log line like this:

2010-06-27 22:17:39-0700 [__builtin__.PoetryFactory] sending 3003 bytes of poetry to IPv4Address(TCP, '127.0.0.1', 58208)

That’s from the call to log.msg in PoetryProtocol.connectionMade. As you make more requests to the server, you will see additional log entries for each request.

Now stop the server by pressing Ctrl-C. You should see some output like this:

^C2010-06-29 21:32:59-0700 [-] Received SIGINT, shutting down.
2010-06-29 21:32:59-0700 [-] (Port 10000 Closed)
2010-06-29 21:32:59-0700 [-] Stopping factory <__builtin__.PoetryFactory instance at 0x28d38c0>
2010-06-29 21:32:59-0700 [-] Main loop terminated.
2010-06-29 21:32:59-0700 [-] Server Shut Down.

As you can see, Twisted does not simply crash, but shuts itself down cleanly and tells you about it with log messages. Notice our two main services shutting themselves down as well.

Ok, now start the server up once more:

twistd --nodaemon --python twisted-server-3/fastpoetry.py

Then open another shell and change to the twisted-intro directory. A directory listing should show a file called twistd.pid. This file is created by twistd and contains the process ID of our running server. Try executing this alternative command to shut down the server:

kill `cat twistd.pid`

Notice that twistd cleans up the process ID file when our server shuts down.

A Real Daemon

Now let’s start our server as an actual daemon process, which is even simpler to do as it’s twistd‘s default behavior:

twistd --python twisted-server-3/fastpoetry.py

This time we get our shell prompt back almost immediately. And if you list the contents of your directory you will see, in addition to the twistd.pid file for the server we just ran, a twistd.log file with the log entries that were formerly displayed at the shell prompt.

When starting a daemon process, twistd installs a log handler that writes entries to a file instead of standard output. The default log file is twistd.log, located in the same directory where you ran twistd, but you can change that with the --logfile option if you wish. The handler that twistd installs also rotates the log whenever the size exceeds one megabyte.

You should be able to see the server running by listing all the processes on your system. Go ahead and test out the server by fetching another poem. You should see new entries appear in the log file for each poem you request.

Since the server is no longer connected to the shell (or any other process except init), you cannot shut it down with Ctrl-C. As a true daemon process, it will continue to run even if you log out. But we can still use the twistd.pid file to stop the process:

kill `cat twistd.pid`

And when that happens the shutdown messages appear in the log, the twistd.pid file is removed, and our server stops running. Neato.

It’s a good idea to check out some of the other twistd startup options. For example, you can tell twistd to switch to a different user or group account before starting the daemon (typically a way to drop privileges your server doesn’t need as a security precaution). We won’t bother going into those extra options, you can find them using the --help switch to twistd.

The Twisted Plugin System

Ok, now we can use twistd to start up our servers as genuine daemon processes. This is all very nice, and the fact that our “configuration” files are really just Python source files gives us a great deal of flexibility in how we set things up. But we don’t always need that much flexibility. For our poetry servers, we typically only have a few options we might care about:

  1. The poem to serve.
  2. The port to serve it from.
  3. The interface to listen on.

Making new tac files for simple variations on those values seems rather excessive. It would be nice if we could just specify those values as options on the twistd command line. The Twisted plugin system allows us to do just that.

Twisted plugins provide a way of defining named Applications, with a custom set of command-line options, that twistd can dynamically discover and run. Twisted itself comes with a set of built-in plugins. You can see them all by running twistd without any arguments. Try running it now, but outside of the twisted-intro directory. After the help section, you should see some output like this:

    ...
    ftp                An FTP server.
    telnet             A simple, telnet-based remote debugging service.
    socks              A SOCKSv4 proxy service.
    ...

Each line shows one of the built-in plugins that come with Twisted. And you can run any of them using twistd.
Each plugin also comes with its own set of options, which you can discover using --help. Let’s see what the options for the ftp plugin are:

twistd ftp --help

Note that you need to put the --help switch after the ftp command, since you want the options for the ftp plugin rather than for twistd itself.
We can run the ftp server with twistd just like we ran our poetry server. But since it’s a plugin, we just run it by name:

twistd --nodaemon ftp --port 10001

That command runs the ftp plugin in non-daemon mode on port 10001. Note the twistd option nodaemon comes before the plugin name, while the plugin-specific option port comes after the plugin name. As with our poetry server, you can stop that plugin with Ctrl-C.

Ok, let’s turn our poetry server into a Twisted plugin. First we need to introduce a couple of new concepts.

IPlugin

Any Twisted plugin must implement the twisted.plugin.IPlugin interface. If you look at the declaration of that Interface, you’ll find it doesn’t actually specify any methods. Implementing IPlugin is simply a way for a plugin to say “Hello, I’m a plugin!” so twistd can find it. Of course, to be of any use, it will have to implement some other interface and we’ll get to that shortly.

But how do you know if an object actually implements an empty interface? The zope.interface package includes a function called implements that you can use to declare that a particular class implements a particular interface. We’ll see an example of that in the plugin version of our poetry server.

IServiceMaker

In addition to IPlugin, our plugin will implement the IServiceMaker interface. An object which implements IServiceMaker knows how to create an IService that will form the heart of a running application. IServiceMaker specifies three attributes and a method:

  1. tapname: a string name for our plugin. The “tap” stands for Twisted Application Plugin. Note: an older version of Twisted also made use of pickled application files called “tapfiles”, but that functionality is deprecated.
  2. description: a description of the plugin, which twistd will display as part of its help text.
  3. options: an object which describes the command-line options this plugin accepts.
  4. makeService: a method which creates a new IService object, given a specific set of command-line options

We’ll see how all this gets put together in the next version of our poetry server.

Fast Poetry 3.0

Now we’re ready to take a look at the plugin version of Fast Poetry, located in twisted/plugins/fastpoetry_plugin.py.

You might notice we’ve named these directories differently than any of the other examples. That’s because twistd requires plugin files to be located in a twisted/plugins directory, located in your Python module search path. The directory doesn’t have to be a package (i.e., you don’t need any __init__.py files) and you can have multiple twisted/plugins directories on your path and twistd will find them all. The actual filename you use for the plugin doesn’t matter either, but it’s still a good idea to name it according to the application it represents, like we have done here.

The first part of our plugin contains the same poetry protocol, factory, and service implementations as our tac file. And as before, this code would normally be in a separate module but we’ve placed it in the plugin to make the example self-contained.

Next comes the declaration of the plugin’s command-line options:

class Options(usage.Options):

    optParameters = [
        ['port', 'p', 10000, 'The port number to listen on.'],
        ['poem', None, None, 'The file containing the poem.'],
        ['iface', None, 'localhost', 'The interface to listen on.'],
        ]

This code specifies the plugin-specific options that a user can place after the plugin name on the twistd command line. We won’t go into details here as it should be fairly clear what is going on. Now we get to the main part of our plugin, the service maker class:

class PoetryServiceMaker(object):

    implements(service.IServiceMaker, IPlugin)

    tapname = "fastpoetry"
    description = "A fast poetry service."
    options = Options

    def makeService(self, options):
        top_service = service.MultiService()

        poetry_service = PoetryService(options['poem'])
        poetry_service.setServiceParent(top_service)

        factory = PoetryFactory(poetry_service)
        tcp_service = internet.TCPServer(int(options['port']), factory,
                                         interface=options['iface'])
        tcp_service.setServiceParent(top_service)

        return top_service

Here you can see how the zope.interface.implements function is used to declare that our class implements both IServiceMaker and IPlugin.

You should recognize the code in makeService from our earlier tac file implementation. But this time we don’t need to make an Application object ourselves, we just create and return the top level service that our application will run and twistd will take care of the rest. Notice how we use the options argument to retrieve the plugin-specific command-line options given to twistd.

After declaring that class, there’s only on thing left to do:

service_maker = PoetryServiceMaker()

The twistd script will discover that instance of our plugin and use it to construct the top level service. Unlike the tac file, the variable name we choose is irrelevant. What matters is that our object implements both IPlugin and IServiceMaker.

Now that we’ve created our plugin, let’s run it. Make sure that you are in the twisted-intro directory, or that the twisted-intro directory is in your python module search path. Then try running twistd by itself. You should now see that “fastpoetry” is one of the plugins listed, along with the description text from our plugin file.

You will also notice that a new file called dropin.cache has appeared in the twisted/plugins directory. This file is created by twistd to speed up subsequent scans for plugins.

Now let’s get some help on using our plugin:

twistd fastpoetry --help

You should see the options that are specific to the fastpoetry plugin in the help text. Finally, let’s run our plugin:

twistd fastpoetry --port 10000 --poem poetry/ecstasy.txt

That will start a fastpoetry server running as a daemon. As before, you should see both twistd.pid and twistd.log files in the current directory. After testing out the server, you can shut it down:

kill `cat twistd.pid`

And that’s how you make a Twisted plugin.

Summary

In this Part we learned about turning our Twisted servers into long-running daemons. We touched on the Twisted logging system and on how to use twistd to start a Twisted application as a daemon process, either from a tac configuration file or a Twisted plugin. In Part 17 we’ll return to the more fundamental topic of asynchronous programming and look at another way of structuring our callbacks in Twisted.

Suggested Exercises

  1. Modify the tac file to serve a second poem on another port. Keep the services for each poem separate by using another MultiService object.
  2. Create a new tac file that starts a poetry proxy server.
  3. Modify the plugin file to accept an optional second poetry file and second port to serve it on.
  4. Create a new plugin for the poetry proxy server.

An Introduction to Asynchronous Programming and Twisted

Part 15: Tested Poetry

This continues the introduction started here. You can find an index to the entire series here.

Introduction

We’ve written a lot of code in our exploration of Twisted, but so far we’ve neglected to write something important — tests. And you may be wondering how you can test asynchronous code using a synchronous framework like the unittest package that comes with Python. The short answer is you can’t. As we’ve discovered, synchronous and asynchronous code do not mix, at least not readily.

Fortunately, Twisted includes its own testing framework called trial that does support testing asynchronous code (and you can use it to test synchronous code, too).

We’ll assume you are already familiar with the basic mechanics of unittest and similar testing frameworks, in which you create tests by defining a class with a specific parent class (usually called something like TestCase), and each method of that class starting with the word “test” is considered a single test. The framework takes care of discovering all the tests, running them one after the other with optional setUp and tearDown steps, and then reporting the results.

The Example

You will find some example tests located in tests/test_poetry.py. To ensure all our examples are self-contained (so you don’t need to worry about PYTHONPATH settings), we have copied all the necessary code into the test module. Normally, of course, you would just import the modules you wanted to test.

The example is testing both the poetry client and server, by using the client to fetch a poem from a test server. To provide a poetry server for testing, we implement the setUp method in our test case:

class PoetryTestCase(TestCase):

    def setUp(self):
        factory = PoetryServerFactory(TEST_POEM)
        from twisted.internet import reactor
        self.port = reactor.listenTCP(0, factory, interface="127.0.0.1")
        self.portnum = self.port.getHost().port

The setUp method makes a poetry server with a test poem, and listens on a random, open port. We save the port number so the actual tests can use it, if they need to. And, of course, we clean up the test server in tearDown when the test is done:

    def tearDown(self):
        port, self.port = self.port, None
        return port.stopListening()

That brings us to our first test, test_client, where we use get_poetry to retrieve the poem from the test server and verify it’s the poem we expected:

    def test_client(self):
        """The correct poem is returned by get_poetry."""
        d = get_poetry('127.0.0.1', self.portnum)

        def got_poem(poem):
            self.assertEquals(poem, TEST_POEM)

        d.addCallback(got_poem)

        return d

Notice that our test function is returning a deferred. Under trial, each test method runs as a callback. That means the reactor is running and we can perform asynchronous operations as part of the test. We just need to let the framework know that our test is asynchronous and we do that in the usual Twisted way — return a deferred.

The trial framework will wait until the deferred fires before calling the tearDown method, and will fail the test if the deferred fails (i.e., if the last callback/errback pair fails). It will also fail the test if our deferred takes too long to fire, two minutes by default. And that means if the test finished, we know our deferred fired, and therefore our callback fired and ran the assertEquals test method.

Our second test, test_failure, verifies that get_poetry fails in the appropriate way if we can’t connect to the server:

    def test_failure(self):
        """The correct failure is returned by get_poetry when
        connecting to a port with no server."""
        d = get_poetry('127.0.0.1', 0)
        return self.assertFailure(d, ConnectionRefusedError)

Here we attempt to connect to an invalid port and then use the trial-provided assertFailure method. This method is like the familiar assertRaises method but for asynchronous code. It returns a deferred that succeeds if the given deferred fails with the given exception, and fails otherwise.

You can run the tests yourself using the trial script like this:

trial tests/test_poetry.py

And you should see some output showing each test case and an OK telling you each test passed.

Discussion

Because trial is so similar to unittest when it comes to the basic API, it’s pretty easy to get started writing tests. Just return a deferred if your test uses asynchronous code, and trial will take care of the rest. You can also return a deferred from the setUp and tearDown methods, if those need to be asynchronous as well.

Any log messages from your tests will be collected in a file inside a directory called _trial_temp that trial will create automatically if it doesn’t exist. In addition to the errors printed to the screen, the log is a useful starting point when debugging failing tests.

Figure 33 shows a hypothetical test run in progress:

Figure 33: a trial test in progress
Figure 33: a trial test in progress

If you’ve used similar frameworks before, this should be a familiar model, except that all the test-related methods may return deferreds.

The trial framework is also a good illustration of how “going asynchronous” involves changes that cascade throughout the program. In order for a test (or any function or method) to be asynchronous, it must:

  1. Not block and, usually,
  2. return a deferred.

But that means that whatever calls that function must be willing to accept a deferred, and also not block (and thus likely return a deferred as well). And so it goes up and up. Thus, the need for a framework like trial which can handle asynchronous tests that return deferreds.

Summary

That’s it for our look at unit testing. If would like to see more examples of how to write unit tests for Twisted code, you need look no further than Twisted itself. The Twisted framework comes with a very large suite of unit tests, with new ones added in each release. Since these tests are scrutinized by Twisted experts during code reviews before being accepted into the codebase, they make excellent examples of how to test Twisted code the right way.

In Part 16 we will use a Twisted utility to turn our poetry server into a genuine daemon.

Suggested Exercises

  1. Change one of the tests to make it fail and run trial again to see the output.
  2. Read the online trial documentation.
  3. Write tests for some of the other poetry services we have created in this series.
  4. Explore some of the tests in Twisted.