An Introduction to Asynchronous Programming and Twisted

Part 5: Twistier Poetry

This continues the introduction started here. You can find an index to the entire series here.

Abstract Expressionism

In Part 4 we made our first poetry client that uses Twisted. It works pretty well, but there is definitely room for improvement.

First of all, the client includes code for mundane details like creating network sockets and receiving data from those sockets. Twisted provides support for these sorts of things so we don’t have to implement them ourselves every time we write a new program. This is especially helpful because asynchronous I/O requires a few tricky bits involving exception handling, as you can see in the client code. And there are even more tricky bits if you want your code to work on multiple platforms. If you have a free afternoon, search the Twisted sources for “win32” to see how many corner cases that platform introduces.

Another problem with the current client is error handling. Try running version 1.0 of the Twisted client and tell it to download from a port with no server. It just crashes. We could fix the current client, but error handling is easier with the Twisted APIs we’ll be using today.

Finally, the client isn’t particularly re-usable. How would another module get a poem with our client? How would the “calling” module know when the poem had finished downloading? We can’t write a function that simply returns the text of the poem as that would require blocking until the entire poem is read. This is a real problem but we’re not going to fix it today — we’ll save that for future Parts.

We’re going to fix the first and second problems using a higher-level set of APIs and Interfaces. The Twisted framework is loosely composed of layers of abstractions and learning Twisted means learning what those layers provide, i.e., what APIs, Interfaces, and implementations are available for use in each one. Since this is an introduction we’re not going to study each abstraction in complete detail or do an exhaustive survey of every abstraction that Twisted offers. We’re just going to look at the most important pieces to get a better feel for how Twisted is put together. Once you become familiar with the overall style of Twisted’s architecture, learning new parts on your own will be much easier.

In general, each Twisted abstraction is concerned with one particular concept. For example, the 1.0 client from Part 4 uses IReadDescriptor, the abstraction of a “file descriptor you can read bytes from”. A Twisted abstraction is usually defined by an Interface specifying how an object embodying that abstraction should behave. The most important thing to keep in mind when learning a new Twisted abstraction is this:

Most higher-level abstractions in Twisted are built by using lower-level ones, not by replacing them.

So when you are learning a new Twisted abstraction, keep in mind both what it does and what it does not do. In particular, if some earlier abstraction A implements feature F, then F is probably not implemented by any other abstraction. Rather, if another abstraction B needs feature F, it will use A rather than implement F itself. (In general, an implementation of B will either sub-class an implementation of A or refer to another object that implements A.)

Networking is a complex subject, and thus Twisted contains lots of abstractions. By starting with lower levels first, we are hopefully getting a clearer picture of how they all get put together in a working Twisted program.

Loopiness in the Brain

The most important abstraction we have learned so far, indeed the most important abstraction in Twisted, is the reactor. At the center of every program built with Twisted, no matter how many layers that program might have, there is a reactor loop spinning around and making the whole thing go. Nothing else in Twisted provides the functionality the reactor offers. Much of the rest of Twisted, in fact, can be thought of as “stuff that makes it easier to do X using the reactor” where X might be “serve a web page” or “make a database query” or some other specific feature. Although it’s possible to stick with the lower-level APIs, like the client 1.0 does, we have to implement more things ourselves if we do. Moving to higher-level abstractions generally means writing less code (and letting Twisted handle the platform-dependent corner cases).

But when we’re working at the outer layers of Twisted it can be easy to forget the reactor is there. In any Twisted program of reasonable size, relatively few parts of our code will actually use the reactor APIs directly. The same is true for some of the other low-level abstractions. The file descriptor abstractions we used in client 1.0 are so thoroughly subsumed by higher-level concepts that they basically disappear in real Twisted programs (they are still used on the inside, we just don’t see them as such).

As far as the file descriptor abstractions go, that’s not really a problem. Letting Twisted handle the mechanics of asynchronous I/O frees us to concentrate on whatever problem we are trying to solve. But the reactor is different. It never really disappears. When you choose to use Twisted you are also choosing to use the Reactor Pattern, and that means programming in the “reactive style” using callbacks and cooperative multi-tasking. If you want to use Twisted correctly, you have to keep the reactor’s existence (and the way it works) in mind. We’ll have more to say about this in Part 6, but for now our message is this:

Figure 5 and Figure 6 are the most important diagrams in this introduction.

We’ll keep using diagrams to illustrate new concepts, but those two Figures are the ones that you need to burn into your brain, so to speak. Those are the pictures I constantly have in mind while writing programs with Twisted.

Before we dive into the code, there are three new abstractions to introduce: Transports, Protocols, and Protocol Factories.

Transports

The Transport abstraction is defined by ITransport in the main Twisted interfaces module. A Twisted Transport represents a single connection that can send and/or receive bytes. For our poetry clients, the Transports are abstracting TCP connections like the ones we have been making ourselves in earlier versions. But Twisted also supports I/O over UNIX Pipes and UDP sockets among other things. The Transport abstraction represents any such connection and handles the details of asynchronous I/O for whatever sort of connection it represents.

If you scan the methods defined for ITransport, you won’t find any for receiving data. That’s because Transports always handle the low-level details of reading data asynchronously from their connections, and give the data to us via callbacks. Along similar lines, the write-related methods of Transport objects may choose not to write the data immediately to avoid blocking. Telling a Transport to write some data means “send this data as soon as you can do so, subject to the requirement to avoid blocking”. The data will be written in the order we provide it, of course.
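To make that write contract a little more concrete, here is a toy model in plain Python. It is only a sketch and not Twisted's actual implementation: the class ToyTransport, its flush_once method, and the max_bytes_per_flush parameter are all made up for illustration. The point is just that write returns immediately, and the queued bytes go out later, in order, whenever the (imaginary) event loop says the socket is writable.

```python
from collections import deque


class ToyTransport:
    """Hypothetical stand-in for a Transport; not Twisted's real code."""

    def __init__(self, max_bytes_per_flush):
        # Pretend the OS accepts only this many bytes per write attempt.
        self.max_bytes_per_flush = max_bytes_per_flush
        self.pending = deque()  # chunks waiting to be sent, oldest first
        self.sent = b""         # bytes the fake "socket" has accepted

    def write(self, data):
        # Never blocks: remember the data and return immediately.
        self.pending.append(data)

    def flush_once(self):
        # Called by the imaginary event loop when the socket is writable.
        budget = self.max_bytes_per_flush
        while self.pending and budget > 0:
            chunk = self.pending.popleft()
            self.sent += chunk[:budget]
            if len(chunk) > budget:
                # Put the unsent tail back at the front to preserve order.
                self.pending.appendleft(chunk[budget:])
            budget -= min(budget, len(chunk))


transport = ToyTransport(max_bytes_per_flush=4)
transport.write(b"roses ")
transport.write(b"are red")
while transport.pending:
    transport.flush_once()
print(transport.sent)
```

Even though the fake socket only takes four bytes at a time, the two writes come out as one correctly ordered stream.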

We generally don’t implement our own Transport objects or create them in our code. Rather, we use the implementations that Twisted already provides and which are created for us when we tell the reactor to make a connection.

Protocols

Twisted Protocols are defined by IProtocol in the same interfaces module. As you might expect, Protocol objects implement protocols. That is to say, a particular implementation of a Twisted Protocol should implement one specific networking protocol, like FTP or IMAP or some nameless protocol we invent for our own purposes. Our poetry protocol, such as it is, simply sends all the bytes of the poem as soon as a connection is established, while the close of the connection signifies the end of the poem.

Strictly speaking, each instance of a Twisted Protocol object implements a protocol for one specific connection. So each connection our program makes (or, in the case of servers, accepts) will require one instance of a Protocol. This makes Protocol instances the natural place to store both the state of “stateful” protocols and the accumulated data of partially received messages (since we receive the bytes in arbitrary-sized chunks with asynchronous I/O).
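Here is a small sketch of why one object per connection is a convenient place to keep partially received messages. It does not use Twisted at all; ToyLineProtocol and its data_received method are hypothetical names, and the newline-terminated "protocol" is an assumption chosen only to keep the example short.

```python
class ToyLineProtocol:
    """Hypothetical per-connection object; does not use Twisted."""

    def __init__(self):
        self.buffer = b""  # bytes of a message we have not finished receiving
        self.lines = []    # complete messages seen on this connection

    def data_received(self, data):
        # Chunks arrive in arbitrary sizes, so accumulate before parsing.
        self.buffer += data
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.lines.append(line)


# Two connections, two instances, two independent buffers.
conn_a = ToyLineProtocol()
conn_b = ToyLineProtocol()
for chunk in (b"so mu", b"ch depends\nup", b"on\n"):
    conn_a.data_received(chunk)
conn_b.data_received(b"a red wheel")
print(conn_a.lines, conn_a.buffer)
print(conn_b.lines, conn_b.buffer)
```

The first connection ends up with two complete lines and an empty buffer, while the second is still waiting for its newline. Neither one can see the other's half-finished message.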

So how do Protocol instances know what connection they are responsible for? If you look at the IProtocol definition, you will find a method called makeConnection. This method is a callback and Twisted code calls it with a Transport instance as the only argument. The Transport is the connection the Protocol is going to use.

Twisted includes a large number of ready-built Protocol implementations for various common protocols. You can find a few simpler ones in twisted.protocols.basic. It’s a good idea to check the Twisted sources before you write a new Protocol to see if there’s already an implementation you can use. But if there isn’t, it’s perfectly OK to implement your own, as we will do for our poetry clients.

Protocol Factories

So each connection needs its own Protocol and that Protocol might be an instance of a class we implement ourselves. Since we will let Twisted handle creating the connections, Twisted needs a way to make the appropriate Protocol “on demand” whenever a new connection is made. Making Protocol instances is the job of Protocol Factories.

As you’ve probably guessed, the Protocol Factory API is defined by IProtocolFactory, also in the interfaces module. Protocol Factories are an example of the Factory design pattern and they work in a straightforward way. The buildProtocol method is supposed to return a new Protocol instance each time it is called. This is the method that Twisted uses to make a new Protocol for each new connection.

Get Poetry 2.0: First Blood.0

Alright, let’s take a look at version 2.0 of the Twisted poetry client. The code is in twisted-client-2/get-poetry.py. You can run it just like the others and get similar output so I won’t bother posting output here. This is also the last version of the client that prints out task numbers as it receives bytes. By now it should be clear that all Twisted programs work by interleaving tasks and processing relatively small chunks of data at a time. We’ll still use print statements to show what is going on at key moments, but the clients won’t be quite as verbose in the future.

In client 2.0, sockets have disappeared. We don’t even import the socket module and we never refer to a socket object, or a file descriptor, in any way. Instead, we tell the reactor to make the connections to the poetry servers on our behalf like this:

factory = PoetryClientFactory(len(addresses))

from twisted.internet import reactor

for address in addresses:
    host, port = address
    reactor.connectTCP(host, port, factory)

The connectTCP method is the one to focus on. The first two arguments should be self-explanatory. The third is an instance of our PoetryClientFactory class. This is the Protocol Factory for poetry clients and passing it to the reactor allows Twisted to create instances of our PoetryProtocol on demand.

Notice that we are not implementing either the Factory or the Protocol from scratch, unlike the PoetrySocket objects in our previous client. Instead, we are sub-classing the base implementations that Twisted provides in twisted.internet.protocol. The primary Factory base class is twisted.internet.protocol.Factory, but we are using the ClientFactory sub-class which is specialized for clients (processes that make connections instead of listening for connections like a server).

We are also taking advantage of the fact that the Twisted Factory class implements buildProtocol for us. We call the base class implementation in our sub-class:

def buildProtocol(self, address):
    proto = ClientFactory.buildProtocol(self, address)
    proto.task_num = self.task_num
    self.task_num += 1
    return proto

How does the base class know what Protocol to build? Notice we are also setting the class attribute protocol on PoetryClientFactory:

class PoetryClientFactory(ClientFactory):

    task_num = 1

    protocol = PoetryProtocol # tell base class what proto to build

The base Factory class implements buildProtocol by instantiating the class we set on protocol (i.e., PoetryProtocol) and setting the factory attribute on that new instance to be a reference to its “parent” Factory. This is illustrated in Figure 8:

Figure 8: a Protocol is born

As we mentioned above, the factory attribute on Protocol objects allows Protocols created with the same Factory to share state. And since Factories are created by “user code”, that same attribute allows Protocol objects to communicate results back to the code that initiated the request in the first place, as we will see in Part 6.

Note that while the factory attribute on Protocols refers to an instance of a Protocol Factory, the protocol attribute on the Factory refers to the class of the Protocol. In general, a single Factory might create many Protocol instances.
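The relationship between the two attributes can be modeled in a few lines of plain Python. This is a sketch of the behavior described above, not the source of Twisted's Factory class; ToyFactory and ToyProtocol are made-up names, and the address tuples are placeholders for whatever address object Twisted would pass in.

```python
class ToyProtocol:
    factory = None  # class-level default, replaced on each instance below


class ToyFactory:
    protocol = ToyProtocol  # a class, not an instance

    def buildProtocol(self, addr):
        # Instantiate the configured class and point the new
        # instance back at the factory that created it.
        proto = self.protocol()
        proto.factory = self
        return proto


factory = ToyFactory()
first = factory.buildProtocol(("localhost", 10000))
second = factory.buildProtocol(("localhost", 10001))
print(first is second, first.factory is second.factory)
```

One factory, two distinct protocol instances, and both instances can reach the same shared factory object.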

The second stage of Protocol construction connects a Protocol with a Transport, using the makeConnection method. We don’t have to implement this method ourselves since the Twisted base class provides a default implementation. By default, makeConnection stores a reference to the Transport on the transport attribute and sets the connected attribute to a True value, as depicted in Figure 9:

Figure 9: a Protocol meets its Transport

Once initialized in this way, the Protocol can start performing its real job — translating a lower-level stream of data into a higher-level stream of protocol messages (and vice-versa for 2-way connections). The key method for processing incoming data is dataReceived, which our client implements like this:

def dataReceived(self, data):
    self.poem += data
    msg = 'Task %d: got %d bytes of poetry from %s'
    print msg % (self.task_num, len(data), self.transport.getPeer())

Each time dataReceived is called we get a new sequence of bytes (data) in the form of a string. As always with asynchronous I/O, we don’t know how much data we are going to get so we have to buffer it until we receive a complete protocol message. In our case, the poem isn’t finished until the connection is closed, so we just keep adding the bytes to our .poem attribute.

Note we are using the getPeer method on our Transport to identify which server the data is coming from. We are only doing this to be consistent with earlier clients. Otherwise our code wouldn’t need to use the Transport explicitly at all, since we never send any data to the servers.

Let’s take a quick look at what’s going on when the dataReceived method is called. In the same directory as our 2.0 client, there is another client called twisted-client-2/get-poetry-stack.py. This is just like the 2.0 client except the dataReceived method has been changed like this:

def dataReceived(self, data):
    traceback.print_stack()
    os._exit(0)

With this change the program will print a stack trace and then quit the first time it receives some data. You could run this version like so:

python twisted-client-2/get-poetry-stack.py 10000

And you will get a stack trace like this:

File "twisted-client-2/get-poetry-stack.py", line 125, in <module>
    poetry_main()

... # I removed a bunch of lines here

File ".../twisted/internet/tcp.py", line 463, in doRead  # Note the doRead callback
    return self.protocol.dataReceived(data)
File "twisted-client-2/get-poetry-stack.py", line 58, in dataReceived
    traceback.print_stack()

There’s the doRead callback we used in client 1.0! As we noted before, Twisted builds new abstractions by using the old ones, not by replacing them. So there is still an IReadDescriptor implementation hard at work, it’s just implemented by Twisted instead of our code. If you are curious, Twisted’s implementation is in twisted.internet.tcp. If you follow the code, you’ll find that the same object implements IWriteDescriptor and ITransport too. So the IReadDescriptor is actually the Transport object in disguise. We can visualize a dataReceived callback with Figure 10:

Figure 10: the dataReceived callback

Once a poem has finished downloading, the PoetryProtocol object notifies its PoetryClientFactory:

def connectionLost(self, reason):
    self.poemReceived(self.poem)

def poemReceived(self, poem):
    self.factory.poem_finished(self.task_num, poem)

The connectionLost callback is invoked when the transport’s connection is closed. The reason argument is a twisted.python.failure.Failure object with additional information on whether the connection was closed cleanly or due to an error. Our client just ignores this value and assumes we received the entire poem.
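To see the whole callback sequence in one place, here is a toy version of a Protocol's life that we drive by hand instead of with a reactor. It is a sketch under stated assumptions: ToyBaseProtocol only imitates the default makeConnection behavior described earlier, ToyPoetryProtocol reports to a plain on_done callable (a hypothetical stand-in for notifying the factory), and the "transport" is just a string.

```python
class ToyBaseProtocol:
    """Imitates the default makeConnection behavior; not Twisted's code."""

    connected = False
    transport = None

    def makeConnection(self, transport):
        self.connected = True
        self.transport = transport
        self.connectionMade()

    def connectionMade(self):
        pass  # hook for sub-classes


class ToyPoetryProtocol(ToyBaseProtocol):
    poem = b""

    def __init__(self, on_done):
        # Hypothetical stand-in for reporting back to the factory.
        self.on_done = on_done

    def dataReceived(self, data):
        self.poem += data  # buffer until the connection closes

    def connectionLost(self, reason):
        self.on_done(self.poem)  # ignore reason, like the real client


received = []
proto = ToyPoetryProtocol(on_done=received.append)
proto.makeConnection(transport="fake transport")
for chunk in (b"Twas brillig, ", b"and the ", b"slithy toves"):
    proto.dataReceived(chunk)
proto.connectionLost(reason=None)
print(received)
```

In a real program the reactor and the Transport make these calls for us. Our code only supplies the bodies of the callbacks.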

The factory shuts down the reactor after all the poems are done. Once again we assume the only thing our program is doing is downloading poems, which makes PoetryClientFactory objects less reusable. We’ll fix that in the next Part, but notice how the poem_finished callback keeps track of the number of poems left to go:

...
    self.poetry_count -= 1

    if self.poetry_count == 0:
        ...

If we were writing a multi-threaded program where each poem was downloaded in a separate thread we would need to protect this section of code with a lock in case two or more threads invoked poem_finished at the same time. Otherwise we might end up shutting down the reactor twice (and getting a traceback for our troubles). But with a reactive system we needn’t bother. The reactor can only make one callback at a time, so this problem just can’t happen.
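A toy model shows why the countdown is safe without a lock. Nothing here comes from the real client: ToyCountdown, stop_calls, and run_callbacks are hypothetical names, and incrementing stop_calls is only a stand-in for whatever the elided code does when the count reaches zero.

```python
class ToyCountdown:
    """Hypothetical countdown; not the client's actual factory."""

    def __init__(self, poetry_count):
        self.poetry_count = poetry_count
        self.stop_calls = 0  # stand-in for "shut down the reactor"

    def poem_finished(self):
        self.poetry_count -= 1
        if self.poetry_count == 0:
            self.stop_calls += 1


def run_callbacks(callbacks):
    # A toy "reactor": each callback runs to completion before the
    # next one starts, so the decrement and the test never interleave.
    for callback in callbacks:
        callback()


countdown = ToyCountdown(poetry_count=3)
run_callbacks([countdown.poem_finished] * 3)
print(countdown.poetry_count, countdown.stop_calls)
```

Because only one callback is ever running, exactly one of them observes the count reaching zero.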

Our new client also handles a failure to connect with more grace than the 1.0 client. Here’s the callback on the PoetryClientFactory class which does the job:

def clientConnectionFailed(self, connector, reason):
    print 'Failed to connect to:', connector.getDestination()
    self.poem_finished()

Note the callback is on the factory, not on the protocol. Since a protocol is only created after a connection is made, the factory gets the news when a connection cannot be established.

A simpler client

Although our new client is pretty simple already, we can make it simpler if we dispense with the task numbers. The client should really be about the poetry, after all. There is a simplified 2.1 version in twisted-client-2/get-poetry-simple.py.

Wrapping Up

Client 2.0 uses Twisted abstractions that should be familiar to any Twisted hacker. And if all we wanted was a command-line client that printed out some poetry and then quit, we could even stop here and call our program done. But if we wanted some re-usable code, some code that we could embed in a larger program that needs to download some poetry but also do other things, then we still have some work to do. In Part 6 we’ll take a first stab at it.

Suggested Exercises

  1. Use callLater to make the client timeout if a poem hasn’t finished after a given interval. Use the loseConnection method on the transport to close the connection on a timeout, and don’t forget to cancel the timeout if the poem finishes on time.
  2. Use the stack trace technique shown above (traceback.print_stack) to analyze the callback sequence that occurs when connectionLost is invoked.

52 thoughts on “An Introduction to Asynchronous Programming and Twisted”

  1. Excellent! There is no excuse now for people interested in asynchronous programming to understand and accept Twisted as *the* framework in Python.

    This really helps visual learners too. Thank you very much.

  2. As a programmer which is new to Twisted I must say: Great tutorial!

    I’m looking forward to Part 6 AND at least one part explaining Deferreds!

  3. Great introduction to Twisted! Thank you very much for writing these articles up.

    As for exercise #1 in part 6, Twisted is returning a reason of type ConnectionDone in all cases, i.e. the connection was done cleanly. Am I doing something wrong?
    According to the docs, I should expect a ConnectionLost reason.

    http://pastebin.com/m564634f0

    To terminate the servers, I tried sending the interrupt and quit signals, as well as kill -9

    I am running Twisted 9.0.0 on Mac OS X 10.6 and system python 2.6.1

    1. Hey Olivier, you’re on the right track, it’s just that making tcp connections fail is harder than I realized. Since the OS will close the connection when the process exits, it still gets closed cleanly on the client side. You’d probably need to use two servers and then physically disconnect the cable to get a failed connection, and even then you would need to wait for the tcp connection to timeout.

      I’m going to take out this exercise, it’s not so good for a tutorial, I think. One thing, though, is when you are checking the class of ‘reason’: the ‘reason’ argument is a Failure, not the exception itself. Look at the ‘check’ method on Failure objects to see how you can test what type of exception it is wrapping.

      Glad you like the introduction, and thanks for helping me debug it :)

  4. Thanks for a great introduction! My head is bleeding slightly from trying to learn Python, Twisted and RabbitMQ all at the same time, but your tutorials have made a lot more sense than the Twisted docs.

  5. Hey, these tutorials are super helpful, thanks so much for writing them. My pathetic contribution is to alert you of a typo: “At the center of every program built with Twisted, no matter how may” should be “…how many”.

  6. OMG, I love your tutorial. Thanks to you, I lost my fear and browse the source code to learn more about Twisted. Thank you very much!

  7. First of all, thank you for this great work-tutorial.
    One small question, near the end of this part you write that ‘Since a protocol is only created after a connection is made, the factory gets the news when a connection cannot be established.’, but earlier you write that a protocol’s construction is made by the actions in figures 8 & 9, that have the reverse ‘chronological’ order (protocol is created firstly and after that it is connected with a transport object). So which order is the right?
    Thank you,
    George

    1. Hey George, the connection (Transport) is created before the Protocol object. The diagrams don’t really show the Transport being created at all, that is done by the connectTCP call. If I have some time, I’ll update them to make that clearer.

  8. You can probably tell how quickly I’m going through these :p. Ebullient thanks in order, yet again Dave.

    I got stuck on that point that George brought up, but like Pingu said, I lost my fear of the source code and dug in. For anyone suffering the same fate: I discovered that the Transport gets created before the protocol, but the protocol gets a None placeholder for its ‘protocol.transport’ attribute until makeConnection is called. Overriding this method lets you access the transport before the connection is started. See ***Spoiler*** http://pastebin.com/fv3yzzQu

    1. Nicely done! You might also check out the connectionMade method on protocol objects, which
      gets called right after the protocol has been hooked up to the transport.

  9. WOW! Learnt more here than I ever did from the 2 books on Twisted network programming. Question – I was working fine with the protocol.LineReceiver > dataReceived call-back until my client message exceeded the TCP frame size. Now, I get only partial messages from the client, is there a standard protocol to manage this or do I have to write my own protocol? Thanks mate.
    STAN
    Australia

    1. Hey Stan, glad you like the series. For the LineReceiver protocol you’ll want to
      use the lineReceived method (I may not have spelled it quite right) instead of
      the dataReceived method. The dataReceived method is the ‘raw’ stream of bytes
      which can come in arbitrary size chunks. The lineReceived method is specific to
      the LineReceiver and it is called for each complete line you get. Make sense?

  10. Great tutorials! I´m learning so much! :)

    I have one correction though. The method used to identify which server the data is coming from is getPeer(), while getHost() is for the local side of the connection.

  11. I too am thrilled at finding such an awesome course on using twisted – complete with proper diagrams and starting from the very beginning.

    One thing that I noticed when reading the twisted sources was that some of the classes are new-style classes and some are not. Newcomers to python may have learned to call the base class methods with super (classname, self).__the_superclass_method__(). Some of the classes in twisted (esp. the lower-level, longer-established classes) are calling the base class methods directly like baseclass.__the_superclass_method__() (if I’m understanding correctly).

    Looking forward to reading the rest …

    1. Hi Brenda, thanks for the kind words! Since Twisted has been around for a number
      of years now, it probably contains a mix of Python styles as Python has evolved over
      time. I imagine the core developers plan on eventually moving everything to the new
      style, but they take backwards compatibility very seriously (thankfully) so the changes
      tend to happen gradually.

  12. Hi Dave!

    I need to make a server with a custom protocol, which is extremely simple but also should be very reliable. All I need are prefix messages with size and following normal messages, two-way communication. So I’ve got a question. Is there any way to set a data size limit in the dataReceived method of the Protocol class? As far as I understood, this method just returns all the data that was passed to the socket. But what if someone throws huge files with malicious aims?

    1. Hey Alex, the protocol you describe sounds very much like netstrings. Twisted already includes a built-in Protocol class called NetstringReceiver (or something similar) which allows you to set a length limit. If the limit is passed, the Protocol will close the connection. Even if you decide not to use the built-in protocol, it is a nice, short example of how to do the kind of limiting you want.

  13. Hi Dave!
    I’ve already looked into the implementation of netstring provided by twisted, but it is just a usual class inherited from Protocol. Therefore, it is just kind of wrapper over dataReceived method. Nevertheless I decided to do some tests on this method, so I wrote a test server (based on Protocol class) and sent 3 Gb file there. It appeared that twisted automatically limits size of data which can be received with dataReceived method at one call. The size of data pieces were around 30 kb (but every time size was different though). So I don’t need to worry about this problem any more and just control the overall size inside dataReceived method.
    Thanks for immediate reply and willing for help, Dave :)

    1. Cool, not quite finished, right? For example, you seem to be calling os._exit, is that for debugging?
      Also, looks like you might have left the parentheses off the call to self.success…

    1. Glad you are planning on doing that, it is a great way to contribute.
      I would ask on the Twisted mailing list, the core folks would have the
      best ideas about what Twisted could use right now.

  14. Here is my solution for exercise one :
    http://pastebin.com/Y7NYG7Ls

    I have also some more questions focused on python rather on twisted itself.

    Class attributes vs instance attributes:

    Why in PoetryProtocol class there is class attribute “poem” ?
    Do all protocols need to share that ?

    Plus every Protocol which inherits from BaseProtocol has the class variable “transport”; I don’t get it.
    If transport is a connection then why all protocols need to share one connection?
    I am pretty sure I misunderstood something here.

    Please if you could explain it.

    1. So this is a little bit of Python trickery. When you access an attribute on a Python object and the object does not have that attribute, Python will look for the attribute on the class and, if it exists, return that instead. So far so good. But if you set an attribute on a Python object then Python will always set that attribute on the object itself, even if the attribute doesn’t currently exist on the object but it does exist on the class. Thus, Python programmers will often use class attributes as “default” values for object attributes, knowing that whenever they set a non-default value, it will be set on the object and thereafter override the value on the class.
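That lookup rule can be checked with a few lines of plain Python. This is a sketch with a made-up class name (Poet), not code from the poetry client:

```python
class Poet:
    poem = ""  # class attribute, acting as a default for every instance


first = Poet()
second = Poet()

# Reading first.poem finds the class default; the assignment half of
# "+=" then stores the result on the instance itself.
first.poem += "roses are red"

print(repr(first.poem), repr(second.poem), repr(Poet.poem))
print("poem" in vars(first), "poem" in vars(second))
```

Only the first object gains its own poem attribute; the second one, and the class, still hold the empty default. One caveat: this works cleanly for immutable defaults such as strings. A mutable default like a list would be shared by every instance if it were modified in place rather than reassigned.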

      1. Wow thanks for your quick answer!

        I get it now.

        Btw. it’s really rare to find such a good tutorial ( good, because of figures, code examples,not only text and chance to ask the author a question ).

        Best regards

          1. By the way I have question related to Twisted, but not exactly with your tutorial. However since you are experienced Twisted programmer I thought you might know. I am currently working on a network game ( in browser ) and I plan to use Twisted + Javascript, but I miss one thing – HTML templating, it is such a hassle to write html output in a form of a python string and return that in GET/POST handler. Do you maybe know any HTML templating solution for Twisted? I’ve done a small research and I found something called Nevow, which is already dead and not supported, and also a combination of Twisted + Flask/Django for templating, but I wonder if that will be truely asynchronous..
            I know this is not directly related to the topic, but I couldn’t find any contact information to send you that as private message.

          2. You can use pretty much any templating system you want to with
            Twisted Web, you just call the templating engine once you’ve got
            all the data you need back from your async requests.

      2. Hi Dave,
        thanks for the detailed and well to read tutorial first!
        I have a question regarding Python. I’d like to know what the benefit is of using a class attribute like:

        class PoetryProtocol(Protocol):
            poem = ''

        Instead of placing it in the __init__ method as an object attribute.

        class PoetryProtocol(Protocol):
            def __init__(self):
                self.poem = ''

        Is it just that I can then access it in the class object too?

    1. Thanks very much! I seem to recall some recent posts on the Twisted mailing list about people using Twisted for lower-level networking, but I don’t recall the details. I think it’s a safe bet the majority of Protocol implementations are at the application layer, with Transports abstracting away the rest.

  15. Hi Dave, like other already said, great tutorial, thank you for it. One question regarding this part.
    I would like to know if I good understand the sequence of execution:
    – all is started in line “reactor.connectTCP(host, port, factory)”
    – twisted is responsible for all execution (it knows when to create Protocol, start connection, readData etc). So, it knows which part from our implementation need to use and when.

    Is above correct or I miss something? I’m new in Python and it is hard to me to understand what exactly “connectTCP” do.

    1. Hi Jacek, I think you have a good sense of it. The call to reactor.connectTCP tells Twisted to make a TCP connection to the host and port and use the factory to make a protocol if the connection attempt succeeds. The call to reactor.run hands control to the Twisted event loop so all that can happen.
