Part 21: Lazy is as Lazy Doesn’t: Twisted and Haskell
In the last Part we compared Twisted with Erlang, giving most of our attention to some ideas they have in common. And that ended up being pretty simple, as asynchronous I/O and reactive programming are key components of the Erlang runtime and process model.
Today we are going to range further afield and look at Haskell, another functional language that is nevertheless quite different from Erlang (and, of course, Python). There won’t be as many parallels, but we will nevertheless find some asynchronous I/O hiding under the covers.
Functional with a Capital F
Although Erlang is also a functional language, its main focus is a reliable concurrency model. Haskell, on the other hand, is functional through and through, making unabashed use of concepts from category theory like functors and monads.
Don’t worry, we’re not going into any of that here (as if we could). Instead we’ll focus on one of Haskell’s more traditionally functional features: laziness. Like many functional languages (but unlike Erlang), Haskell supports lazy evaluation. In a lazily evaluated language the text of a program doesn’t so much describe how to compute something as what to compute. The details of actually performing the computation are generally left to the compiler and runtime system.
And, more to the point, as a lazily-evaluated computation proceeds the runtime may evaluate expressions only partially (lazily) instead of all at once. In general, the runtime will evaluate only as much of an expression as is needed to make progress on the current computation.
Here is a simple Haskell statement applying
head, a function that retrieves the first element of a list, to the list
[1,2,3] (Haskell and Python share some of their list syntax):
If you install the GHC Haskell runtime, you can try this out yourself:
[~] ghci GHCi, version 6.12.1: http://www.haskell.org/ghc/ : ? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> head [1,2,3] 1 Prelude>
The result is the number 1, as expected.
The Haskell list syntax includes the handy ability to define a list from its first couple of elements. For example, the list
[2,4 ..] is the sequence of even numbers starting with 2. Where does it end? Well, it doesn’t. The Haskell list [2,4 ..] and others like it represent (conceptually) infinite lists. You can see this if you try to evaluate one at the interactive Haskell prompt, which will attempt to print out the result of your expression:
Prelude> [2,4 ..] [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,130,132,134,136,138,140,142,144,146, ...
You'll have to press Ctrl-C to stop that computation as it will never actually terminate. But because of lazy evaluation, it is possible to use these infinite lists in Haskell with no trouble:
Prelude> head [2,4 ..] 2 Prelude> head (tail [2,4 ..]) 4 Prelude> head (tail (tail [2,4 ..])) 6
Here we are accessing the first, second, and third elements of this infinite list respectively, with no infinite loop anywhere in sight. This is the essence of lazy evaluation. Instead of first evaluating the entire list (which would cause an infinite loop) and then giving that list to the
head function, the Haskell runtime only constructs as much of the list as it needs for
head to finish its work. The rest of the list is never constructed at all, because it is not needed to proceed with the computation.
When we bring the
tail function into play, Haskell is forced to construct the list further, but again only as much as it needs to evaluate the next step of the computation. And once the computation is done, the (unfinished) list can be discarded.
Here’s some Haskell code that partially consumes three different infinite lists:
Prelude> let x = [1..] Prelude> let y = [2,4 ..] Prelude> let z = [3,6 ..] Prelude> head (tail (tail (zip3 x y z))) (3,6,9)
Here we zip all the lists together, then grab the head of the tail of the tail. Once again, Haskell has no trouble with this and only constructs as much of each list as it needs to finish evaluating our code. We can visualize the Haskell runtime “consuming” these infinite lists in Figure 46:
Although we’ve drawn the Haskell runtime as a simple loop, it might be implemented with multiple threads (and probably is if you are using the GHC version of Haskell). But the main point to notice is how this figure looks like a reactor loop consuming bits of data as they come in on network sockets.
You can think of asynchronous I/O and the reactor pattern as a very limited form of lazy evaluation. The asynchronous I/O motto is: “Only process as much data as you have”. And the lazy evaluation motto is: “Only process as much data as you need”. Furthermore, a lazily-evaluated language applies that motto almost everywhere, not just in the limited scope of I/O.
But the point is that, for a lazily-evaluated language, making use of asynchronous I/O is no big deal. The compiler and runtime are already designed to process data structures bit by bit, so lazily processing the incoming chunks of an I/O stream is just par for the course. And thus the Haskell runtime, like the Erlang runtime, simply incorporates asynchronous I/O as part of its socket abstractions. And we can show that by implementing a poetry client in Haskell.
Our first Haskell poetry client is located in haskell-client-1/get-poetry.hs. As with Erlang, we’re going to jump straight to a finished client, and then suggest further reading if you’d like to learn more.
Haskell also supports light-weight threads or processes, though they aren’t as central to Haskell as they are to Erlang, and our Haskell client creates one process for each poem we want to download. The key function there is
runTask which connects to a socket and starts the
getPoetry function in a light-weight thread.
You’ll notice a lot of type declarations in this code. Haskell, unlike Python or Erlang, is statically typed. We don’t declare types for each and every variable because Haskell will automatically infer types not explicitly declared (or report an error if it can’t). A number of the functions include the
IO type (technically a monad) because Haskell requires us to cleanly separate code with side-effects (i.e., code that performs I/O) from pure functions.
getPoetry function includes this line:
poem <- hGetContents h
which appears to be reading the entire poem from the handle (i.e., the TCP socket) at once. But Haskell, as usual, is lazy. And the Haskell runtime includes one or more actual threads which perform asynchronous I/O in a
select loop, thus preserving the possibilities for lazy evaluation of I/O streams.
Just to illustrate that asynchronous I/O is really going on, we have included a “callback” function,
gotLine, that prints out some task information for each line in the poem. But it’s not really a callback function at all, and the program would use asynchronous I/O whether we included it or not. Even calling it “gotLine” reflects an imperative-language mindset that is out of place in a Haskell program. No matter, we’ll clean it up in a bit, but let’s take our first Haskell client out for a spin. Start up some slow poetry servers:
python blocking-server/slowpoetry.py --port 10001 poetry/fascination.txt python blocking-server/slowpoetry.py --port 10002 poetry/science.txt python blocking-server/slowpoetry.py --port 10003 poetry/ecstasy.txt --num-bytes 30
Now compile the Haskell client:
cd haskell-client-1/ ghc --make get-poetry.hs
This will create a binary called
get-poetry. Finally, run the client against our servers:
./get-poetry 10001 10002 1000
And you should see some output like this:
Task 3: got 12 bytes of poetry from localhost:10003 Task 3: got 1 bytes of poetry from localhost:10003 Task 3: got 30 bytes of poetry from localhost:10003 Task 2: got 20 bytes of poetry from localhost:10002 Task 3: got 44 bytes of poetry from localhost:10003 Task 2: got 1 bytes of poetry from localhost:10002 Task 3: got 29 bytes of poetry from localhost:10003 Task 1: got 36 bytes of poetry from localhost:10001 Task 1: got 1 bytes of poetry from localhost:10001 ...
The output is slightly different than previous asynchronous clients because we are printing one line for each line of poetry instead of each arbitrary chunk of data. But, as you can see, the client is clearly processing data from all the servers together, instead of one after the other. You’ll also notice that the client prints out the first poem as soon as it’s finished, without waiting for the others, which continue on at their own pace.
Alright, let’s clean the remaining bits of imperative cruft from our client and present a version which just grabs the poetry without bothering with task numbers. You can find it in haskell-client-2/get-poetry.hs. Notice that it’s much shorter and, for each server, just connects to the socket, grabs all the data, and sends it back.
Ok, let’s compile a new client:
cd haskell-client-2/ ghc --make get-poetry.hs
And run it against the same set of poetry servers:
./get-poetry 10001 10002 10003
And you should see the text of each poem appear, eventually, on the screen.
You will notice from the server output that each server is sending data to the client simultaneously. What’s more, the client prints out each line of the first poem as soon as possible, without waiting for the rest of the poem, even while it’s working on the other two. And then it quickly prints out the second poem, which it has been accumulating all along.
And all of that happens without us having to do much of anything. There are no callbacks, no messages being passed back and forth, just a concise description of what we want the program to do, and very little in the way of how it should go about doing it. The rest is taken care of by the Haskell compiler and runtime. Nifty.
Discussion and Further Reading
In moving from Twisted to Erlang to Haskell we can see a parallel movement, from the foreground to the background, of the ideas behind asynchronous programming. In Twisted, asynchronous programming is the central motivating idea behind Twisted’s existence. And Twisted’s implementation as a framework separate from Python (and Python’s lack of core asynchronous abstractions like lightweight threads) keeps the asynchronous model front and center when you write programs using Twisted.
In Erlang, asynchronicity is still very visible to the programmer, but the details are now part of the fabric of the language and runtime system, enabling an abstraction in which asynchronous messages are exchanged between synchronous processes.
And finally, in Haskell, asynchronous I/O is just another technique inside the runtime, largely unseen by the programmer, for providing the lazy evaluation that is one of Haskell’s central ideas.
We don’t have any profound insight into this situation, we’re just pointing out the many and interesting places where the asynchronous model shows up, and the many different ways it can be expressed.
And if any of this has piqued your interest in Haskell, then we can recommend Real World Haskell to continue your studies. The book is a model of what a good language introduction should be. And while I haven’t read it, I’ve heard good things about Learn You a Haskell.
This brings us to the end of our tour of asynchronous systems outside of Twisted, and the penultimate part in our series. In Part 22 we will conclude, and suggest ways to learn more about Twisted.
Suggested Exercises for the Startlingly Motivated
- Compare the Twisted, Erlang, and Haskell clients with each other.
- Modify the Haskell clients to handle failures to connect to a poetry server so they download all the poetry they can and output reasonable error messages for the poems they can’t.
- Write Haskell versions of the poetry servers we made with Twisted.