Part 2: Slow Poetry and the Apocalypse
This continues the introduction started in Part 1. If you read that part, welcome back. Now we’re going to get our hands dirty and write some code. But first, let’s get some assumptions out of the way.
My Assumptions About You
I will proceed as if you have a basic working knowledge of writing synchronous programs in Python, and know at least a little bit about Python socket programming. If you have never used sockets before, you might read the socket module documentation now, especially the example code towards the end. If you’ve never used Python before, then the rest of this introduction is probably going to be rather opaque.
My Assumptions About Your Computer
My experience with Twisted is mainly on Linux systems, and it is a Linux system on which I developed the examples. And while I won’t intentionally make the code Linux-dependent, some of it, and some of what I say, may only apply to Linux and other UNIX-like systems (like Mac OSX or FreeBSD). Windows is a strange, murky place and, if you are hacking in it, I can’t offer you much more beyond my heartfelt sympathies.
Also, you can run all the examples on a single computer, although you can configure them to run on a network of systems as well. But for learning the basic mechanics of asynchronous programming, a single computer will do fine.
Getting the example code
The example code is available as a zip or tar file or as a clone of my public git repository. If you can use git or another version control system that can read git repositories, then I recommend using that method as I will update the examples over time and it will be easier for you to stay current. As a bonus, it includes the SVG source files used to generate the figures. Here is the git command to clone the repository:
git clone git://github.com/jdavisp3/twisted-intro.git
The rest of this tutorial will assume you have the latest copy of the example code and you have multiple shells open in its top-level directory (the one with the README file).
Although CPUs are much faster than networks, most networks are still a lot faster than your brain, or at least faster than your eyeballs. So it can be challenging to get the “CPU’s-eye view” of network latency, especially when there’s only one machine and the bytes are whizzing past at full speed on the loopback interface. What we need is a slow server, one with artificial delays we can vary to see the effect. And since servers have to serve something, ours will serve poetry. The example code includes a sub-directory called poetry with one poem each by John Donne, W.B. Yeats, and Edgar Allan Poe. Of course, you are free to substitute your own poems for the server to dish up.
The basic slow poetry server is implemented in blocking-server/slowpoetry.py. You can run one instance of the server like this:
python blocking-server/slowpoetry.py poetry/ecstasy.txt
That command will start up the blocking server with John Donne’s poem “Ecstasy” as the poem to serve. Go ahead and look at the source code to the blocking server now. As you can see, it does not use Twisted, only basic Python socket operations. It also sends a limited number of bytes at a time, with a fixed time delay between them. By default, it sends 10 bytes every 0.1 seconds, but you can change these parameters with the --num-bytes and --delay command line options. For example, to send 50 bytes every 5 seconds:
python blocking-server/slowpoetry.py --num-bytes 50 --delay 5 poetry/ecstasy.txt
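The following is a minimal sketch of a “N bytes every D seconds” sending loop, using only the standard socket and time modules. It illustrates the idea and is not the code in blocking-server/slowpoetry.py. The function name send_slowly and its parameters are hypothetical; they were chosen to mirror the --num-bytes and --delay options.

```python
import socket
import time


def send_slowly(sock: socket.socket, data: bytes, num_bytes: int = 10, delay: float = 0.1) -> int:
    """Hypothetical helper: send `data` in `num_bytes` chunks, pausing `delay` seconds between chunks.

    Returns the total number of bytes sent.
    """
    sent = 0
    while sent < len(data):
        chunk = data[sent:sent + num_bytes]
        sock.sendall(chunk)  # a blocking send: returns once the whole chunk is handed off
        sent += len(chunk)
        print(f"sent {len(chunk)} bytes")  # like the server, report each chunk sent
        if sent < len(data):
            time.sleep(delay)  # the artificial delay that makes the server "slow"
    return sent
```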
When the server starts up it prints out the port number it is listening on. By default, this is a random port that happens to be available on your machine. When you start varying the settings, you will probably want to use the same port number over again so you don’t have to adjust the client command. You can specify a particular port like this:
python blocking-server/slowpoetry.py --port 10000 poetry/ecstasy.txt
If you have the netcat program available, you could test the above command like this:
netcat localhost 10000
If the server is working, you will see the poem slowly crawl its way down your screen. Ecstasy! You will also notice the server prints out a line each time it sends some bytes. Once the complete poem has been sent, the server closes the connection.
By default, the server only listens on the local “loopback” interface. If you want to access the server from another machine, you can specify the interface to listen on with the --iface option.
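The random-port and loopback-only defaults match standard socket behavior. Binding to port 0 lets the operating system pick a free port, and binding to 127.0.0.1 accepts only connections from the same machine. The sketch below shows both. The helper make_listening_socket is hypothetical, and exactly how the real server maps --port and --iface onto bind is an assumption.

```python
import socket


def make_listening_socket(iface: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Hypothetical helper: a TCP socket listening on iface:port.

    Port 0 asks the OS for any free port. "127.0.0.1" limits connections to the
    loopback interface; "0.0.0.0" would accept them on every IPv4 interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow quick restarts on a fixed port
    sock.bind((iface, port))
    sock.listen(5)
    return sock


listener = make_listening_socket()
host, port = listener.getsockname()  # the port the OS actually chose
print(f"listening on {host}:{port}")
```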
Not only does the server send each poem slowly, but if you read the code you will also find that, while the server is sending poetry to one client, all other clients must wait for it to finish before getting even the first line. It is truly a slow server, and not much use except as a learning device.
Or is it?
On the other hand, if the more pessimistic of the Peak Oil folks are right and our world is heading for a global energy crisis and planet-wide societal meltdown, then perhaps one day soon a low-bandwidth, low-power poetry server could be just what we need. Imagine, after a long day of tending your self-sufficient gardens, making your own clothing, serving on your commune’s Central Organizing Committee, and fighting off the radioactive zombies that roam the post-apocalyptic wastelands, you could crank up your generator and download a few lines of high culture from a vanished civilization. That’s when our little server will really come into its own.
The Blocking Client
Also in the example code is a blocking client which can download poems from multiple servers, one after another. Let’s give our client three tasks to perform, as in Figure 1 from Part 1. First we’ll start three servers, serving three different poems. Run these commands in three different terminal windows:
python blocking-server/slowpoetry.py --port 10000 poetry/ecstasy.txt --num-bytes 30
python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt
python blocking-server/slowpoetry.py --port 10002 poetry/science.txt
You can choose different port numbers if one or more of the ones I chose above are already being used on your system. Note I told the first server to use chunks of 30 bytes instead of the default 10 since that poem is about three times as long as the others. That way they all finish around the same time.
Now we can use the blocking client in blocking-client/get-poetry.py to grab some poetry. Run the client like this:
python blocking-client/get-poetry.py 10000 10001 10002
Change the port numbers here, too, if you used different ones for your servers. Since this is the blocking client, it will download one poem from each port number in turn, waiting until a complete poem is received before starting the next. Instead of printing out the poems, the blocking client produces output like this:
Task 1: get poetry from: 127.0.0.1:10000
Task 1: got 3003 bytes of poetry from 127.0.0.1:10000 in 0:00:10.126361
Task 2: get poetry from: 127.0.0.1:10001
Task 2: got 623 bytes of poetry from 127.0.0.1:10001 in 0:00:06.321777
Task 3: get poetry from: 127.0.0.1:10002
Task 3: got 653 bytes of poetry from 127.0.0.1:10002 in 0:00:06.617523
Got 3 poems in 0:00:23.065661
This is basically a text version of Figure 1, where each task is downloading a single poem. Your times may be a little different, and will vary as you change the timing parameters of the servers. Try changing those parameters to see the effect on the download times.
You might take a look at the source code to the blocking server and client now, and locate the points in the code where each blocks while sending or receiving network data.
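As a reference point, the blocking spots in a simple poetry client usually look like the sketch below. Connecting blocks, and every recv blocks until bytes arrive or the server closes the connection. This is a simplified stand-in, not the actual blocking-client/get-poetry.py; the get_poetry(host, port) signature is an assumption.

```python
import socket


def get_poetry(host: str, port: int) -> bytes:
    """Download one complete poem, blocking at every step."""
    sock = socket.create_connection((host, port))  # blocks until connected
    try:
        poem = b""
        while True:
            data = sock.recv(1024)  # blocks until data arrives or the server closes
            if not data:  # empty read: the server closed the connection, so the poem is complete
                break
            poem += data
        return poem
    finally:
        sock.close()
```

Calling this once per server, one after another, gives exactly the sequential behavior shown in the output above.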
The Asynchronous Client
Now let’s take a look at a simple asynchronous client written without Twisted. First let’s run it. Get a set of three servers going on the same ports like we did above. If the ones you ran earlier are still going, you can just use them again. Now we can run the asynchronous client, located in async-client/get-poetry.py, like this:
python async-client/get-poetry.py 10000 10001 10002
And you should get some output like this:
Task 1: got 30 bytes of poetry from 127.0.0.1:10000
Task 2: got 10 bytes of poetry from 127.0.0.1:10001
Task 3: got 10 bytes of poetry from 127.0.0.1:10002
Task 1: got 30 bytes of poetry from 127.0.0.1:10000
Task 2: got 10 bytes of poetry from 127.0.0.1:10001
...
Task 1: 3003 bytes of poetry
Task 2: 623 bytes of poetry
Task 3: 653 bytes of poetry
Got 3 poems in 0:00:10.133169
This time the output is much longer because the asynchronous client prints a line each time it downloads some bytes from any server, and these slow poetry servers just dribble out the bytes little by little. Notice that the individual tasks are mixed together just like in Figure 3 from Part 1.
Try varying the delay settings for the servers (e.g., by making one server slower than the others) to see how the asynchronous client automatically “adjusts” to the speed of the slower servers while still keeping up with the faster ones. That’s asynchronicity in action.
Also notice that, for the server settings we chose above, the asynchronous client finishes in about 10 seconds while the synchronous client needs around 23 seconds to get all the poems. Now recall the differences between Figure 3 and Figure 4 in Part 1. By spending less time blocking, our asynchronous client can download all the poems in a shorter overall time. Now, our asynchronous client does block some of the time. Our slow server is slow. It’s just that the asynchronous client spends a lot less time blocking than the “blocking” client does, because it can switch back and forth between all the servers.
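The timing difference follows from simple arithmetic on the blocking client’s sample output. A blocking client runs its tasks back to back, so its total is roughly the sum of the per-task times. A client that overlaps the downloads can finish no sooner than its slowest task. This sketch reuses the blocking client’s reported per-task times and assumes each server takes about as long in the asynchronous run.

```python
from datetime import timedelta

# Per-task download times reported by the blocking client in the sample run.
task_times = [
    timedelta(seconds=10, microseconds=126361),
    timedelta(seconds=6, microseconds=321777),
    timedelta(seconds=6, microseconds=617523),
]

# Blocking client: tasks run one after another, so their times add up.
sequential_total = sum(task_times, timedelta())

# Asynchronous client: tasks overlap, so the slowest task sets a lower bound.
overlapped_lower_bound = max(task_times)

print(sequential_total)        # 0:00:23.065661
print(overlapped_lower_bound)  # 0:00:10.126361
```

The sum reproduces the blocking client’s reported total of 0:00:23.065661, and the asynchronous client’s 0:00:10.133169 lands just above the slowest single task.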
Technically, our asynchronous client is performing a blocking operation: it’s writing to the standard output file descriptor with those print statements.
A Closer Look
Now take a look at the source code for the asynchronous client. Notice the main differences between it and the synchronous client:
- Instead of connecting to one server at a time, the asynchronous client connects to all the servers at once.
- The socket objects used for communication are placed in non-blocking mode.
- The select method in the select module is used to wait (block) until any of the sockets are ready to give us some data.
- When reading data from the servers, we read only as much as we can until the socket would block, and then move on to the next socket with data to read (if any). This means we have to keep track of the poetry we’ve received from each server so far.
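In Python 3, a non-blocking socket signals “nothing to read right now” by raising BlockingIOError, and it signals a closed connection with an empty read. Below is a minimal sketch of the “read only as much as we can” step. It assumes the socket has already been switched to non-blocking mode with setblocking(False); read_available is a hypothetical helper, not a function from the example code.

```python
import socket
from typing import Tuple


def read_available(sock: socket.socket) -> Tuple[bytes, bool]:
    """Read whatever a non-blocking socket has right now.

    Returns (data, closed): the bytes read so far, and True if the peer closed.
    """
    chunks = []
    while True:
        try:
            data = sock.recv(1024)
        except BlockingIOError:
            # Nothing more right now; another recv would block, so stop here.
            return b"".join(chunks), False
        if not data:
            # Empty read: the other end closed the connection.
            return b"".join(chunks), True
        chunks.append(data)
```

The caller keeps the returned bytes for each server, which is the bookkeeping the last point above describes.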
The core of the asynchronous client is the top-level loop in the get_poetry function. This loop can be broken down into steps:
- Wait (block) on all open sockets using select until one (or more) of them has data to be read.
- For each socket with data to be read, read it, but only as much as is available now. Don’t block.
- Repeat, until all sockets have been closed.
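The three steps above can be sketched with select.select over sockets that are already connected. This is a trimmed-down illustration, not the actual get_poetry from async-client/get-poetry.py. The name download_all is hypothetical, and connecting and progress printing are left out.

```python
import select
import socket
from typing import List


def download_all(sockets: List[socket.socket]) -> List[bytes]:
    """Read every socket to EOF, interleaving them; results are in input order."""
    poems = [b""] * len(sockets)
    index = {sock: i for i, sock in enumerate(sockets)}
    open_socks = list(sockets)
    for sock in open_socks:
        sock.setblocking(False)

    while open_socks:  # step 3: repeat until every socket has been closed
        # Step 1: block until at least one socket is readable (data or EOF).
        readable, _, _ = select.select(open_socks, [], [])
        # Step 2: for each ready socket, read only what is available now.
        for sock in readable:
            while True:
                try:
                    data = sock.recv(1024)
                except BlockingIOError:
                    break  # drained for now; move on without blocking
                if not data:  # empty read: the server closed the connection
                    open_socks.remove(sock)
                    sock.close()
                    break
                poems[index[sock]] += data
    return poems
```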
The synchronous client had a loop as well (in the main function), but each iteration of the synchronous loop downloaded one complete poem. In one iteration of the asynchronous client we might download pieces of all the poems we are working on, or just some of them. And we don’t know which ones we will work on in a given iteration, or how much data we will get from each one. That all depends on the relative speeds of the servers and the state of the network. We just let select tell us which ones are ready to go, and then read as much data as we can from each socket without blocking.
If the synchronous client always contacted a fixed number of servers (say 3), it wouldn’t need an outer loop at all; it could just call its blocking get_poetry function three times in succession. But the asynchronous client can’t do without an outer loop: to gain the benefits of asynchronicity, we need to wait on all of our sockets at once, and only process as much data as each is capable of delivering in any given iteration.
This use of a loop which waits for events to happen, and then handles them, is so common that it has achieved the status of a design pattern: the reactor pattern. It is visualized in Figure 5 below:
The loop is a “reactor” because it waits for and then reacts to events. For that reason it is also known as an event loop. And since reactive systems are often waiting on I/O, these loops are also sometimes called select loops, since the select call is used to wait for I/O. So in a select loop, an “event” is when a socket becomes available for reading or writing. Note that select is not the only way to wait for I/O; it is just one of the oldest methods (and thus widely available). There are several newer APIs, available on different operating systems, that do the same thing as select but offer (hopefully) better performance. But leaving aside performance, they all do the same thing: take a set of sockets (really file descriptors) and block until one or more of them is ready to do I/O.
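Python’s standard-library selectors module wraps these alternatives. selectors.DefaultSelector chooses the most efficient mechanism available on the platform, for example epoll on Linux or kqueue on BSD and macOS. Here is a small sketch that uses a local socket pair in place of a real poetry server.

```python
import selectors
import socket

# DefaultSelector picks the best mechanism the platform offers
# (e.g. epoll on Linux, kqueue on BSD/macOS), falling back to poll or select.
sel = selectors.DefaultSelector()

reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, data="poem socket")

writer.sendall(b"Slow poetry")

# Block for at most 1 second until a registered socket is ready for reading.
ready = [(key.data, key.fileobj.recv(1024)) for key, events in sel.select(timeout=1.0)]
print(type(sel).__name__, ready)

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

The calling code stays the same whichever mechanism is chosen underneath, which is the point of treating these APIs as interchangeable.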
Note that it’s possible to use select and its brethren to simply check whether a set of file descriptors is ready for I/O without blocking. This feature permits a reactive system to perform non-I/O work inside the loop. But in reactive systems it is often the case that all work is I/O-bound, and thus blocking on all file descriptors conserves CPU resources.
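With Python’s select.select, that non-blocking check is simply a timeout of 0: the call returns immediately with whatever is ready at that moment. A minimal sketch, again with a socket pair standing in for a server:

```python
import select
import socket

reader, writer = socket.socketpair()

# A timeout of 0 makes select a pure readiness check: it returns immediately.
ready_before, _, _ = select.select([reader], [], [], 0)
print(len(ready_before))  # 0: nothing has been sent yet, and we did not block

writer.sendall(b"a line of verse")

# Check again after sending; a short timeout gives the data time to arrive.
ready_after, _, _ = select.select([reader], [], [], 1.0)
print(len(ready_after))  # 1: the reader now has data waiting

reader.close()
writer.close()
```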
Strictly speaking, the loop in our asynchronous client is not the reactor pattern because the loop logic is not implemented separately from the “business logic” that is specific to the poetry servers. They are all just mixed together. A real implementation of the reactor pattern would implement the loop as a separate abstraction with the ability to:
- Accept a set of file descriptors you are interested in performing I/O with.
- Tell you, repeatedly, when any file descriptors are ready for I/O.
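To make those two responsibilities concrete, here is a toy reactor that keeps the loop separate from the poetry-specific logic, which lives in callbacks. It is only a sketch under our own assumptions. MiniReactor, add_reader and make_poem_reader are hypothetical names, and Twisted’s real reactor has a different and far richer interface.

```python
import select
import socket
from typing import Callable, Dict

Callback = Callable[[socket.socket], None]


class MiniReactor:
    """A toy reactor: it owns the event loop; all protocol logic lives in callbacks."""

    def __init__(self) -> None:
        self._readers: Dict[socket.socket, Callback] = {}

    def add_reader(self, sock: socket.socket, callback: Callback) -> None:
        """Register interest in `sock`; `callback(sock)` runs whenever it is readable."""
        self._readers[sock] = callback

    def remove_reader(self, sock: socket.socket) -> None:
        """Stop watching `sock`."""
        self._readers.pop(sock, None)

    def run(self) -> None:
        """Block in select, dispatch each ready socket, repeat until nothing is registered."""
        while self._readers:
            readable, _, _ = select.select(list(self._readers), [], [])
            for sock in readable:
                callback = self._readers.get(sock)
                if callback is not None:  # an earlier callback may have removed it
                    callback(sock)


def make_poem_reader(reactor: MiniReactor, name: str, poems: Dict[str, bytes]) -> Callback:
    """Hypothetical "business logic": collect one poem and store it when the server closes."""
    chunks = []

    def on_readable(sock: socket.socket) -> None:
        data = sock.recv(1024)  # select said readable, so this will not block
        if data:
            chunks.append(data)
        else:  # empty read: the server closed the connection
            reactor.remove_reader(sock)
            sock.close()
            poems[name] = b"".join(chunks)

    return on_readable


# Usage: three socket pairs stand in for three slow poetry servers.
reactor = MiniReactor()
poems: Dict[str, bytes] = {}
for name in ("donne", "yeats", "poe"):
    server_end, client_end = socket.socketpair()
    server_end.sendall(name.encode() * 3)
    server_end.close()
    reactor.add_reader(client_end, make_poem_reader(reactor, name, poems))
reactor.run()
print(poems)
```

The loop in run knows nothing about poems, and the callbacks know nothing about select. That separation is what our mixed-together asynchronous client lacks.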
And a really good implementation of the reactor pattern would also:
- Handle all the weird corner cases that crop up on different systems.
- Provide lots of nice abstractions to help you use the reactor with the least amount of effort.
- Provide implementations of public protocols that you can use out of the box.
Well, that’s just what Twisted is: a robust, cross-platform implementation of the Reactor Pattern with lots of extras. And in Part 3 we will start writing some simple Twisted programs as we move towards a Twisted version of Get Poetry Now!
Suggested Exercises
- Do some timing experiments with the blocking and asynchronous clients by varying the number and settings of the poetry servers.
- Could the asynchronous client provide a get_poetry function that returned the text of the poem? Why not?
- If you wanted a get_poetry function in the asynchronous client that was analogous to the synchronous version of get_poetry, how could it work? What arguments and return values might it have?