Twisted Introduction

This multi-part series introduces Asynchronous Programming and the Twisted networking framework.

  1. In Which We Begin at the Beginning
  2. Slow Poetry and the Apocalypse
  3. Our Eye-beams Begin to Twist
  4. Twisted Poetry
  5. Twistier Poetry
  6. And Then We Took It Higher
  7. An Interlude, Deferred
  8. Deferred Poetry
  9. A Second Interlude, Deferred
  10. Poetry Transformed
  11. Your Poetry is Served
  12. A Poetry Transformation Server
  13. Deferred All The Way Down
  14. When a Deferred Isn’t
  15. Tested Poetry
  16. Twisted Daemonologie
  17. Just Another Way to Spell “Callback”
  18. Deferreds En Masse
  19. I Thought I Wanted It But I Changed My Mind
  20. Wheels within Wheels: Twisted and Erlang
  21. Lazy is as Lazy Doesn’t: Twisted and Haskell
  22. The End

This introduction has translations in other languages.

121 thoughts on “Twisted Introduction”

  1. hi dave,

    I’m Jayson Pryde, and I’m new to Twisted.
    I’ve been learning Twisted via your awesome tutorial, and I am already on part 13.
    I tried to implement my own client-server system, but I am running into hangs in the client.
    It’s probably because I messed up the callbacks on the deferred.
    I’ve asked about this on Stack Overflow:

    http://stackoverflow.com/questions/42418021/asynchronous-client-in-twisted-not-sending-receiving-request-using-netstringr

    It would be greatly appreciated if you could find the time to take a look and help me pinpoint where I went wrong.
    What happens in the client is that I am already able to send the request, but it seems the callback that processes the response never fires.

    Hope you can find time. Thanks a lot in advance!

          1. Thanks a lot, Dave. I already merged it.
            Before I received your reply, I had done roughly the same thing to make it work:
            I removed the other factory/class. But my question still stands: how come your examples work when I run them on my machine, even though they are quite old? And I’m still a bit confused by a deferred returning a deferred.

            Thanks a lot again, Dave! 🙂

          2. I’m not sure what you are asking: my examples work because the Twisted project has been very good about maintaining backwards compatibility. The code you posted didn’t work because of the bugs I tried to explain in the pull request. I recommend reading the source code for Deferred itself and reworking the examples in the relevant chapters of the tutorial. Learning a new kind of programming takes time; you have to go slowly in the beginning.

  2. Hi Dave,

    Thank you for your nice introduction to asynchronous programming; I’m reading chapter 4 now. I think the philosophy of async programming is what I need for my project, so I’d like to ask you a few questions.

    Let me explain myself first. Currently I’m working on an automated website-testing project, and I decided on a selenium + bs4 + unittest approach. The website I’m testing relies heavily on AJAX and iframes, so I want to develop something better than traditional unittest cases, namely to separate interaction (I/O) from content assertion (RAM).

    My original idea was to have a main thread do all the navigation and interaction, use bs4 to parse the DOM, and then create child threads to run assertions on the parsed DOM. However, after reading your blog, I realize I don’t really need threads to do this; I think the asynchronous way suits my purpose too.

    So my questions are: is my understanding of asynchronous programming correct for this purpose? Is this a feasible approach to automated web testing? And has anyone else already done, or is doing, something similar? (I really can’t find anything comparable.)

    Thank You
    Jack

    1. Hey Jack, thanks for the kind words! If I’m understanding you, I think Twisted could be used in lieu of threads. I wonder, though, if you need threads or async at all? Is it not possible to do the interaction and then parse the result in the same thread? Both threaded and async code are more complex than vanilla single-threading and for functional testing the performance of a single, non-async thread may be acceptable.

  3. Hi Dave, thank you for your answer!

    Since I’m using selenium to simulate a user visiting the website with a browser, it actually involves heavy I/O: fetching pages. In fact, since the data on that website is huge and the queries are not well optimized, some pages may take more than 30s to load.

    This is why I’m looking for a way to separate the I/O from the DOM assertions. You are right about threads; I don’t think I really need thread programming. But it would be nice if my code could run DOM assertions while the I/O fetches the next page.

    Furthermore, it’s also about code organization. The website is currently under development: although the main frame stays the same, the content of a single page may change frequently. That’s why I want to keep the two concerns separate, for better organization and easier change.

    Thank You
    Jack
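One way to get exactly that overlap, using the standard library rather than Twisted, is a single worker thread that fetches the next page while the main thread runs assertions on the previous one. A rough sketch, with hypothetical stand-ins for the selenium fetch and the bs4 assertions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(n):
    # Stand-in for a slow selenium page load.
    time.sleep(0.01)
    return '<html>page %d</html>' % n

def assert_page(html):
    # Stand-in for bs4 parsing plus content assertions.
    return 'html' in html

results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch_page, 0)        # start fetching the first page
    for n in range(1, 3):
        html = future.result()                 # wait for the page in flight
        future = pool.submit(fetch_page, n)    # kick off the next fetch
        results.append(assert_page(html))      # assert while the fetch runs
    results.append(assert_page(future.result()))

assert results == [True, True, True]
```

The fetch and the assertions overlap in time, but each runs in a single place, so the assertion code never has to be thread-safe itself.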

  4. Hey sir, I am new to learning Twisted. I have learned many things, but I am confused at one point in my script. I am writing a script where the user just needs to give the URL of a website; the script then extracts data from that website (Title, Description, and the links on the page) and, after getting the links, extracts the same three details from each linked page, and so on until no links remain (similar to a sitemap crawler). That is the idea.

    I wrote this code using Twisted in Python, but it completes only one cycle, and after that the reactor stops. Can you please help me with how to run the Twisted addCallback function in a loop until a certain condition is met? Thank you.

    Please look at the code. Thank you

    2) Extracting Code From Website

    def parseHtml(html):
        print("\t\t\t----------------------")
        print("\t\t\t Requesting Website ")
        print("\t\t\t----------------------")
        # Create this Deferred to hand the function's value to another function
        defered = defer.Deferred()
        # reactor.callLater takes 3 parameters: the delay, the callable to run, and its argument
        reactor.callLater(3, defered.callback, html)
        # defered will deliver the value to the next callback
        return defered

    3) Extracting Title, Description, Links (URLs) from Source Code

    def ExtractingData(response, url, crawling, Webname, tocrawl):
        crawled = set([])
        WebKeyword = []
        WebTitle = []
        WebDescription = []
        keywordregex = re.compile('<meta\sname=["\']keywords["\']\scontent=["\'](.*?)["\']\s/>')
        linkregex = re.compile('<a\s*href=[\'"](.*?)[\'"].*?>')
        print("\t\t\t--------------------------------")
        print("\t\t\t Extracting Data from Website ")
        print("\t\t\t--------------------------------")
        msg = response
        startPos = msg.find('<title>')
        print("Start position:{}".format(startPos))
        if startPos != -1:
            endPos = msg.find('</title>', startPos + 7)
            if endPos != -1:
                title = msg[startPos + 7:endPos]
                print("Title:{}".format(title))
                WebTitle.append(title)
            else:
                WebTitle.append("N/A")
        # Getting Description from the Website
        Soup = BeautifulSoup(msg, 'html.parser')
        Desc = Soup.findAll(attrs={"name": "description"})
        if len(Desc) <= 0:
            print("N/A")
            WebDescription.append("N/A")
        else:
            print("Description:{}".format(Desc[0]['content'].encode('utf-8')))
            WebDescription.append(Desc[0]['content'].encode('utf-8'))
        keywordlist = keywordregex.findall(msg)
        if len(keywordlist) > 0:
            keywordlist = keywordlist[0]
            keywordlist = keywordlist.split(", ")
            print("Keyword:{}".format(keywordlist))
            WebKeyword.append(keywordlist)
        else:
            WebKeyword.append("N/A")
        links = linkregex.findall(msg)
        # print("Links:{}".format(links))
        crawled.add(crawling)
        for link in (links.pop(0) for _ in xrange(len(links))):
            if link.startswith('/'):
                link = 'http://' + url[1] + link
            elif link.startswith('#'):
                link = 'http://' + url[1] + url[2] + link
            elif not link.startswith('http'):
                link = 'http://' + url[1] + '/' + link
            if link not in crawled:
                if Webname[0] in link:
                    print("Link:{}".format(link))
                    tocrawl.add(link)
        print("Crawled URL:{}".format(len(tocrawl)))
        defers = defer.Deferred()
        defers.callback(None)
        return

    1) Sitemap Crawler

    def SitemapCrawler(Link):
        tocrawl = {Link}
        Webname = str(Link).replace("http://", "").split(".")
        Iterator = False
        while Iterator is not True:
            try:
                crawling = tocrawl.pop()
            except:
                Iterator = True
            url = urlparse.urlparse(crawling)
            # downloadPage(crawling, "NewFile.txt")
            print("Calling:{}".format(crawling))
            d = getPage(crawling)
            # Extracting Code or Website Response from Pages
            d.addCallback(parseHtml)
            # Calling Finish/Stop Function at the END of Processes
            d.addCallback(ExtractingData, url=url, crawling=crawling, Webname=Webname, tocrawl=tocrawl)
            reactor.run()
        # StoringIntoDatabase(WebTitle, WebKeyword, crawledList, WebDescription)

    4) Finish Process

    def finishingProcess():
        print("\t\t\t Stopping the Process........")
        # 5 is the delay; without callLater we cannot stop the reactor cleanly
        reactor.callLater(5, reactor.stop)

    1. Hi there, it’s a bit difficult to read this code as it is not formatted. Do you have it in source control, say GitHub, where it would be a lot easier to read and comment upon?
