A fantastic book about the observations of evolution in modern-day animals, centered around the study of Darwin’s finches on Galapagos by Rosemary and Peter grant.
Part 5: Twistier Poetry
In Part 4 we made our first poetry client that uses Twisted. It works pretty well, but there is definitely room for improvement.
First of all, the client includes code for mundane details like creating network sockets and receiving data from those sockets. Twisted provides support for these sorts of things so we don’t have to implement them ourselves every time we write a new program. This is especially helpful because asynchronous I/O requires a few tricky bits involving exception handling as you can see in the client code. And there are even more tricky bits if you want your code to work on multiple platforms. If you have a free afternoon, search the Twisted sources for “win32″ to see how many corner cases that platform introduces.
Another problem with the current client is error handling. Try running version 1.0 of the Twisted client and tell it to download from a port with no server. It just crashes. We could fix the current client, but error handling is easier with the Twisted APIs we’ll be using today.
Finally, the client isn’t particularly re-usable. How would another module get a poem with our client? How would the “calling” module know when the poem had finished downloading? We can’t write a function that simply returns the text of the poem as that would require blocking until the entire poem is read. This is a real problem but we’re not going to fix it today — we’ll save that for future Parts.
We’re going to fix the first and second problems using a higher-level set of APIs and Interfaces. The Twisted framework is loosely composed of layers of abstractions and learning Twisted means learning what those layers provide, i.e, what APIs, Interfaces, and implementations are available for use in each one. Since this is an introduction we’re not going to study each abstraction in complete detail or do an exhaustive survey of every abstraction that Twisted offers. We’re just going to look at the most important pieces to get a better feel for how Twisted is put together. Once you become familiar with the overall style of Twisted’s architecture, learning new parts on your own will be much easier.
In general, each Twisted abstraction is concerned with one particular concept. For example, the 1.0 client from Part 4 uses
IReadDescriptor, the abstraction of a “file descriptor you can read bytes from”. A Twisted abstraction is usually defined by an Interface specifying how an object embodying that abstraction should behave. The most important thing to keep in mind when learning a new Twisted abstraction is this:
Most higher-level abstractions in Twisted are built by using lower-level ones, not by replacing them.
So when you are learning a new Twisted abstraction, keep in mind both what it does and what it does not do. In particular, if some earlier abstraction A implements feature F, then F is probably not implemented by any other abstraction. Rather, if another abstraction B needs feature F, it will use A rather than implement F itself. (In general, an implementation of B will either sub-class an implementation of A or refer to another object that implements A).
Networking is a complex subject, and thus Twisted contains lots of abstractions. By starting with lower levels first, we are hopefully getting a clearer picture of how they all get put together in a working Twisted program.
Loopiness in the Brain
The most important abstraction we have learned so far, indeed the most important abstraction in Twisted, is the reactor. At the center of every program built with Twisted, no matter how many layers that program might have, there is a reactor loop spinning around and making the whole thing go. Nothing else in Twisted provides the functionality the reactor offers. Much of the rest of Twisted, in fact, can be thought of as “stuff that makes it easier to do X using the reactor” where X might be “serve a web page” or “make a database query” or some other specific feature. Although it’s possible to stick with the lower-level APIs, like the client 1.0 does, we have to implement more things ourselves if we do. Moving to higher-level abstractions generally means writing less code (and letting Twisted handle the platform-dependent corner cases).
But when we’re working at the outer layers of Twisted it can be easy to forget the reactor is there. In any Twisted program of reasonable size, relatively few parts of our code will actually use the reactor APIs directly. The same is true for some of the other low-level abstractions. The file descriptor abstractions we used in client 1.0 are so thoroughly subsumed by higher-level concepts that they basically disappear in real Twisted programs (they are still used on the inside, we just don’t see them as such).
As far as the file descriptor abstractions go, that’s not really a problem. Letting Twisted handle the mechanics of asynchronous I/O frees us to concentrate on whatever problem we are trying to solve. But the reactor is different. It never really disappears. When you choose to use Twisted you are also choosing to use the Reactor Pattern, and that means programming in the “reactive style” using callbacks and cooperative multi-tasking. If you want to use Twisted correctly, you have to keep the reactor’s existence (and the way it works) in mind. We’ll have more to say about this in Part 6, but for now our message is this:
We’ll keep using diagrams to illustrate new concepts, but those two Figures are the ones that you need to burn into your brain, so to speak. Those are the pictures I constantly have in mind while writing programs with Twisted.
Before we dive into the code, there are three new abstractions to introduce: Transports, Protocols, and Protocol Factories.
The Transport abstraction is defined by
ITransport in the main Twisted
interfaces module. A Twisted Transport represents a single connection that can send and/or receive bytes. For our poetry clients, the Transports are abstracting TCP connections like the ones we have been making ourselves in earlier versions. But Twisted also supports I/O over UNIX Pipes and UDP sockets among other things. The Transport abstraction represents any such connection and handles the details of asynchronous I/O for whatever sort of connection it represents.
If you scan the methods defined for
ITransport, you won’t find any for receiving data. That’s because Transports always handle the low-level details of reading data asynchronously from their connections, and give the data to us via callbacks. Along similar lines, the write-related methods of Transport objects may choose not to write the data immediately to avoid blocking. Telling a Transport to write some data means “send this data as soon as you can do so, subject to the requirement to avoid blocking”. The data will be written in the order we provide it, of course.
We generally don’t implement our own Transport objects or create them in our code. Rather, we use the implementations that Twisted already provides and which are created for us when we tell the reactor to make a connection.
Twisted Protocols are defined by
IProtocol in the same
interfaces module. As you might expect, Protocol objects implement protocols. That is to say, a particular implementation of a Twisted Protocol should implement one specific networking protocol, like FTP or IMAP or some nameless protocol we invent for our own purposes. Our poetry protocol, such as it is, simply sends all the bytes of the poem as soon as a connection is established, while the close of the connection signifies the end of the poem.
Strictly speaking, each instance of a Twisted Protocol object implements a protocol for one specific connection. So each connection our program makes (or, in the case of servers, accepts) will require one instance of a Protocol. This makes Protocol instances the natural place to store both the state of “stateful” protocols and the accumulated data of partially received messages (since we receive the bytes in arbitrary-sized chunks with asynchronous I/O).
So how do Protocol instances know what connection they are responsible for? If you look at the
IProtocol definition, you will find a method called
makeConnection. This method is a callback and Twisted code calls it with a Transport instance as the only argument. The Transport is the connection the Protocol is going to use.
Twisted includes a large number of ready-built Protocol implementations for various common protocols. You can find a few simpler ones in
twisted.protocols.basic. It’s a good idea to check the Twisted sources before you write a new Protocol to see if there’s already an implementation you can use. But if there isn’t, it’s perfectly OK to implement your own, as we will do for our poetry clients.
So each connection needs its own Protocol and that Protocol might be an instance of a class we implement ourselves. Since we will let Twisted handle creating the connections, Twisted needs a way to make the appropriate Protocol “on demand” whenever a new connection is made. Making Protocol instances is the job of Protocol Factories.
As you’ve probably guessed, the Protocol Factory API is defined by IProtocolFactory, also in the
interfaces module. Protocol Factories are an example of the Factory design pattern and they work in a straightforward way. The
buildProtocol method is supposed to return a new Protocol instance each time it is called. This is the method that Twisted uses to make a new Protocol for each new connection.
Get Poetry 2.0: First Blood.0
Alright, let’s take a look at version 2.0 of the Twisted poetry client. The code is in
twisted-client-2/get-poetry.py. You can run it just like the others and get similar output so I won’t bother posting output here. This is also the last version of the client that prints out task numbers as it receives bytes. By now it should be clear that all Twisted programs work by interleaving tasks and processing relatively small chunks of data at a time. We’ll still use
In client 2.0, sockets have disappeared. We don’t even import the
socket module and we never refer to a socket object, or a file descriptor, in any way. Instead, we tell the reactor to make the connections to the poetry servers on our behalf like this:
factory = PoetryClientFactory(len(addresses)) from twisted.internet import reactor for address in addresses: host, port = address reactor.connectTCP(host, port, factory)
connectTCP method is the one to focus on. The first two arguments should be self-explanatory. The third is an instance of our
PoetryClientFactory class. This is the Protocol Factory for poetry clients and passing it to the reactor allows Twisted to create instances of our
PoetryProtocol on demand.
Notice that we are not implementing either the Factory or the Protocol from scratch, unlike the
PoetrySocket objects in our previous client. Instead, we are sub-classing the base implementations that Twisted provides in
twisted.internet.protocol. The primary Factory base class is
twisted.internet.protocol.Factory, but we are using the
ClientFactory sub-class which is specialized for clients (processes that make connections instead of listening for connections like a server).
We are also taking advantage of the fact that the Twisted
Factory class implements
buildProtocol for us. We call the base class implementation in our sub-class:
def buildProtocol(self, address): proto = ClientFactory.buildProtocol(self, address) proto.task_num = self.task_num self.task_num += 1 return proto
How does the base class know what Protocol to build? Notice we are also setting the class attribute
class PoetryClientFactory(ClientFactory): task_num = 1 protocol = PoetryProtocol # tell base class what proto to build
Factory class implements
buildProtocol by instantiating the class we set on
i.e., PoetryProtocol) and setting the
factory attribute on that new instance to be a reference to its “parent” Factory. This is illustrated in Figure 8:
As we mentioned above, the
factory attribute on Protocol objects allows Protocols created with the same Factory to share state. And since Factories are created by “user code”, that same attribute allows Protocol objects to communicate results back to the code that initiated the request in the first place, as we will see in Part 6.
Note that while the
factory attribute on Protocols refers to an instance of a Protocol Factory, the
protocol attribute on the Factory refers to the class of the Protocol. In general, a single Factory might create many Protocol instances.
The second stage of Protocol construction connects a Protocol with a Transport, using the
makeConnection method. We don’t have to implement this method ourselves since the Twisted base class provides a default implementation. By default,
makeConnection stores a reference to the Transport on the
transport attribute and sets the
connected attribute to a True value, as depicted in Figure 9:
Once initialized in this way, the Protocol can start performing its real job — translating a lower-level stream of data into a higher-level stream of protocol messages (and vice-versa for 2-way connections). The key method for processing incoming data is
dataReceived, which our client implements like this:
def dataReceived(self, data): self.poem += data msg = 'Task %d: got %d bytes of poetry from %s' print msg % (self.task_num, len(data), self.transport.getPeer())
dataReceived is called we get a new sequence of bytes (
data) in the form of a string. As always with asynchronous I/O, we don’t know how much data we are going to get so we have to buffer it until we receive a complete protocol message. In our case, the poem isn’t finished until the connection is closed, so we just keep adding the bytes to our
Note we are using the
getPeer method on our Transport to identify which server the data is coming from. We are only doing this to be consistent with earlier clients. Otherwise our code wouldn’t need to use the Transport explicitly at all, since we never send any data to the servers.
Let’s take a quick look at what’s going on when the
dataReceived method is called. In the same directory as our 2.0 client, there is another client called twisted-client-2/get-poetry-stack.py. This is just like the 2.0 client except the
dataReceived method has been changed like this:
def dataReceived(self, data): traceback.print_stack() os._exit(0)
With this change the program will print a stack trace and then quit the first time it receives some data. You could run this version like so:
python twisted-client-2/get-poetry-stack.py 10000
And you will get a stack trace like this:
File "twisted-client-2/get-poetry-stack.py", line 125, in poetry_main() ... # I removed a bunch of lines here File ".../twisted/internet/tcp.py", line 463, in doRead # Note the doRead callback return self.protocol.dataReceived(data) File "twisted-client-2/get-poetry-stack.py", line 58, in dataReceived traceback.print_stack()
doRead callback we used in client 1.0! As we noted before, Twisted builds new abstractions by using the old ones, not by replacing them. So there is still an
IReadDescriptor implementation hard at work, it’s just implemented by Twisted instead of our code. If you are curious, Twisted’s implementation is in twisted.internet.tcp. If you follow the code, you’ll find that the same object implements
ITransport too. So the
IReadDescriptor is actually the Transport object in disguise. We can visualize a
dataReceived callback with Figure 10:
Once a poem has finished downloading, the
PoetryProtocol object notifies its
def connectionLost(self, reason): self.poemReceived(self.poem) def poemReceived(self, poem): self.factory.poem_finished(self.task_num, poem)
connectionLost callback is invoked when the transport’s connection is closed. The
reason argument is a
twisted.python.failure.Failure object with additional information on whether the connection was closed cleanly or due to an error. Our client just ignores this value and assumes we received the entire poem.
The factory shuts down the reactor after all the poems are done. Once again we assume the only thing our program is doing is downloading poems, which makes
PoetryClientFactory objects less reusable. We’ll fix that in the next Part, but notice how the
poem_finished callback keeps track of the number of poems left to go:
... self.poetry_count -= 1 if self.poetry_count == 0: ...
If we were writing a multi-threaded program where each poem was downloaded in a separate thread we would need to protect this section of code with a lock in case two or more threads invoked
poem_finished at the same time. Otherwise we might end up shutting down the reactor twice (and getting a traceback for our troubles). But with a reactive system we needn’t bother. The reactor can only make one callback at a time, so this problem just can’t happen.
Our new client also handles a failure to connect with more grace than the 1.0 client. Here’s the callback on the
PoetryClientFactory class which does the job:
def clientConnectionFailed(self, connector, reason): print 'Failed to connect to:', connector.getDestination() self.poem_finished()
Note the callback is on the factory, not on the protocol. Since a protocol is only created after a connection is made, the factory gets the news when a connection cannot be established.
A simpler client
Although our new client is pretty simple already, we can make it simpler if we dispense with the task numbers. The client should really be about the poetry, after all. There is a simplified 2.1 version in twisted-client-2/get-poetry-simple.py.
Client 2.0 uses Twisted abstractions that should be familiar to any Twisted hacker. And if all we wanted was a command-line client that printed out some poetry and then quit, we could even stop here and call our program done. But if we wanted some re-usable code, some code that we could embed in a larger program that needs to download some poetry but also do other things, then we still have some work to do. In Part 6 we’ll take a first stab at it.
callLaterto make the client timeout if a poem hasn’t finished after a given interval. Use the
loseConnectionmethod on the transport to close the connection on a timeout, and don’t forget to cancel the timeout if the poem finishes on time.
- Use the stacktrace method to analyze the callback sequence that occurs when