Twisted Introduction

This multi-part series introduces Asynchronous Programming and the Twisted networking framework.

  1. In Which We Begin at the Beginning
  2. Slow Poetry and the Apocalypse
  3. Our Eye-beams Begin to Twist
  4. Twisted Poetry
  5. Twistier Poetry
  6. And Then We Took It Higher
  7. An Interlude, Deferred
  8. Deferred Poetry
  9. A Second Interlude, Deferred
  10. Poetry Transformed
  11. Your Poetry is Served
  12. A Poetry Transformation Server
  13. Deferred All The Way Down
  14. When a Deferred Isn’t
  15. Tested Poetry
  16. Twisted Daemonologie
  17. Just Another Way to Spell “Callback”
  18. Deferreds En Masse
  19. I Thought I Wanted It But I Changed My Mind
  20. Wheels within Wheels: Twisted and Erlang
  21. Lazy is as Lazy Doesn’t: Twisted and Haskell
  22. The End

This introduction has translations in other languages.

121 thoughts on “Twisted Introduction”

  1. hi dave,

    I’m Jayson Pryde, and I’m new to Twisted.
    I’ve been learning Twisted via your awesome tutorial, and I am already on part 13.
    I tried to implement my own client-server system, but I am running into hangs in the client.
    It’s probably because I messed up the callbacks on the deferred.
    I’ve asked about this on Stack Overflow:

    http://stackoverflow.com/questions/42418021/asynchronous-client-in-twisted-not-sending-receiving-request-using-netstringr

    It would be greatly appreciated if you could find the time to take a look and help me pinpoint where I went wrong.
    What happens in the client is that I am already able to send the request, but it seems the callback that processes the response never fires.

    Hope you can find time. Thanks a lot in advance!

          1. Thanks a lot, Dave. I already merged it.
            Before I received your reply, I had done roughly the same thing to make it work:
            I removed the other factory/class. But my question still stands: how come your examples work when I run them on my machine, even though they are quite old? And I’m still a bit confused by a deferred returning a deferred.

            Thanks a lot again, Dave! 🙂

          2. I’m not sure what you are asking: my examples work because the Twisted project has been very good about maintaining backwards compatibility. The code you posted didn’t work because of the bugs I tried to explain in the pull request. I recommend reading the source code for Deferred itself and reworking the examples in the relevant chapters of the tutorial. Learning a new kind of programming takes time; you have to go slowly in the beginning.

  2. Hi Dave,

    Thank you for your nice introduction to asynchronous programming; I’m reading chapter 4 now. I think the philosophy of async programming is what I need for my project, so I’d like to ask you a few questions.

    Let me explain myself first. Currently I’m working on an automated website-testing project, and I decided on a selenium + bs4 + unittest approach. The website I’m testing relies heavily on AJAX and iframes, so I want to develop something better than traditional unittest cases, namely to separate interaction (I/O) from content assertion (RAM).

    My original idea was to have a main thread do all the navigation and interaction, use bs4 to parse the DOM, and then create child threads to run assertions on the parsed DOM. However, after reading your blog, I realize I don’t really need threads to do this; I think the asynchronous way suits my purpose too.

    So my questions are: is my understanding of asynchronous programming correct for this purpose? Is this a feasible approach to automated web testing? And has anyone else already done, or is doing, something similar? (I really can’t find anything comparable.)

    Thank You
    Jack

    1. Hey Jack, thanks for the kind words! If I’m understanding you, I think Twisted could be used in lieu of threads. I wonder, though, if you need threads or async at all? Is it not possible to do the interaction and then parse the result in the same thread? Both threaded and async code are more complex than vanilla single-threading and for functional testing the performance of a single, non-async thread may be acceptable.

  3. Hi Dave, thank you for your answer!

    Since I’m using selenium to simulate a user visiting the website with a browser, it actually involves heavy I/O: fetching pages. In fact, since the data on that website is huge and the queries are not well optimized, some pages may take more than 30s to load.

    This is why I’m looking for a way to separate the I/O from the DOM assertions. You are right about threads; I don’t think I really need thread programming. But it would be nice if my code could run DOM assertions while the I/O fetches the next page.

    Furthermore, it’s also about code organization. The website is currently under development: although the main frame stays the same, the content of a single page may change frequently. That’s why I want to keep the two concerns separate, for better organization and easier change.

    Thank You
    Jack
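One way to get exactly that overlap, using the standard library rather than Twisted, is a single worker thread that fetches the next page while the main thread runs assertions on the previous one. A rough sketch, with hypothetical stand-ins for the selenium fetch and the bs4 assertions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_page(n):
    # Stand-in for a slow selenium page load.
    time.sleep(0.01)
    return '<html>page %d</html>' % n

def assert_page(html):
    # Stand-in for bs4 parsing plus content assertions.
    return 'html' in html

results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch_page, 0)        # start fetching the first page
    for n in range(1, 3):
        html = future.result()                 # wait for the page in flight
        future = pool.submit(fetch_page, n)    # kick off the next fetch
        results.append(assert_page(html))      # assert while the fetch runs
    results.append(assert_page(future.result()))

assert results == [True, True, True]
```

The fetch and the assertions overlap in time, but each runs in a single place, so the assertion code never has to be thread-safe itself.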

  4. Hey sir, I am new to learning Twisted. I have learned many things, but I am confused at one point in my script. I am writing a script where the user just needs to give the URL of a website; the script then extracts data from that website (Title, Description, and the links on the page) and, after getting the links, extracts the same three details from each linked page, and so on until no links remain (similar to a sitemap crawler). That is the idea.

    I wrote this code using Twisted in Python, but it completes only one cycle, and after that the reactor stops. Can you please help me with how to run the Twisted addCallback function in a loop until a certain condition is met? Thank you.

    Please look at the code. Thank you

    2) Extracting Code From Website

    def parseHtml(html):
        print("\t\t\t----------------------")
        print("\t\t\t Requesting Website ")
        print("\t\t\t----------------------")
        # Create this Deferred to hand the function's value to another function
        defered = defer.Deferred()
        # reactor.callLater takes 3 parameters: the delay, the callable to run, and its argument
        reactor.callLater(3, defered.callback, html)
        # defered will deliver the value to the next callback
        return defered

    3) Extracting Title, Description, Links (URLs) from Source Code

    def ExtractingData(response, url, crawling, Webname, tocrawl):
        crawled = set([])
        WebKeyword = []
        WebTitle = []
        WebDescription = []
        keywordregex = re.compile('<meta\sname=["\']keywords["\']\scontent=["\'](.*?)["\']\s/>')
        linkregex = re.compile('<a\s*href=[\'"](.*?)[\'"].*?>')
        print("\t\t\t--------------------------------")
        print("\t\t\t Extracting Data from Website ")
        print("\t\t\t--------------------------------")
        msg = response
        startPos = msg.find('<title>')
        print("Start position:{}".format(startPos))
        if startPos != -1:
            endPos = msg.find('</title>', startPos + 7)
            if endPos != -1:
                title = msg[startPos + 7:endPos]
                print("Title:{}".format(title))
                WebTitle.append(title)
            else:
                WebTitle.append("N/A")
        # Getting Description from the Website
        Soup = BeautifulSoup(msg, 'html.parser')
        Desc = Soup.findAll(attrs={"name": "description"})
        if len(Desc) <= 0:
            print("N/A")
            WebDescription.append("N/A")
        else:
            print("Description:{}".format(Desc[0]['content'].encode('utf-8')))
            WebDescription.append(Desc[0]['content'].encode('utf-8'))
        keywordlist = keywordregex.findall(msg)
        if len(keywordlist) > 0:
            keywordlist = keywordlist[0]
            keywordlist = keywordlist.split(", ")
            print("Keyword:{}".format(keywordlist))
            WebKeyword.append(keywordlist)
        else:
            WebKeyword.append("N/A")
        links = linkregex.findall(msg)
        # print("Links:{}".format(links))
        crawled.add(crawling)
        for link in (links.pop(0) for _ in xrange(len(links))):
            if link.startswith('/'):
                link = 'http://' + url[1] + link
            elif link.startswith('#'):
                link = 'http://' + url[1] + url[2] + link
            elif not link.startswith('http'):
                link = 'http://' + url[1] + '/' + link
            if link not in crawled:
                if Webname[0] in link:
                    print("Link:{}".format(link))
                    tocrawl.add(link)
        print("Crawled URL:{}".format(len(tocrawl)))
        defers = defer.Deferred()
        defers.callback(None)
        return

    1) Sitemap Crawler

    def SitemapCrawler(Link):
        tocrawl = {Link}
        Webname = str(Link).replace("http://", "").split(".")
        Iterator = False
        while Iterator is not True:
            try:
                crawling = tocrawl.pop()
            except:
                Iterator = True
            url = urlparse.urlparse(crawling)
            # downloadPage(crawling, "NewFile.txt")
            print("Calling:{}".format(crawling))
            d = getPage(crawling)
            # Extracting Code or Website Response from Pages
            d.addCallback(parseHtml)
            # Calling Finish/Stop Function at the END of Processes
            d.addCallback(ExtractingData, url=url, crawling=crawling, Webname=Webname, tocrawl=tocrawl)
            reactor.run()
        # StoringIntoDatabase(WebTitle, WebKeyword, crawledList, WebDescription)

    4) Finish Process

    def finishingProcess():
        print("\t\t\t Stopping the Process........")
        # 5 is the delay; without callLater we cannot stop the reactor cleanly
        reactor.callLater(5, reactor.stop)

    1. Hi there, it’s a bit difficult to read this code as it is not formatted. Do you have it in source control, say GitHub, where it would be a lot easier to read and comment upon?
