Hmm. A thought: you need to fetch a bunch of URLs in parallel. Instead of
using one thread per URL, have you considered writing a handler of some kind
that sets up and runs select() on all the sockets fetching pages?

/Andy

Gabriel Ambuehl <gabriel_ambuehl at buz.ch> wrote:
| Hello Courageous,
|
| Sunday, April 15, 2001, 10:26:23 PM, you wrote:
| > Having completed both cores, and with the C++ core HIGHLY OPTIMIZED,
| > I was finally able to perform a performance test of the C++ system
| > versus the Python system. To my surprise, the C++ core only beat Python
| > by about 30%. Given the obvious inequities in coding time in both
| > efforts, plus whatever future coding time inequities I might project
| > onto users of either core by implication of the programming language,
| > I was quite surprised by these results.
|
| This is very interesting. I've got to implement a server resource
| monitoring system and had a shot at it in my beloved Python. While
| Python's threading obviously works (something I can't really say about
| C++, where the whole thing appears not to be very well thought out),
| I found it to be very slow. I'm now thinking about whether I should try
| to reimplement the whole URL stuff in C (being a C/C++ novice) to see
| whether this would speed up the whole process (or is there any C
| implementation of an httplib for Python that works with its threading?).
| The major PITA I keep stumbling across is the fact that I need to have
| concurrent service checks, so a single-threaded app with a large queue
| as scheduling mechanism isn't of much use. I've been thinking about a
| fork() based solution (AFAIK this is what NetSaint is doing) but the
| reporting of the results isn't doable in any halfway reliable or elegant
| way and it obviously requires way more resources than a threaded app.
| The original idea was to have a constantly running thread for every
| resource to monitor, which then schedules itself using sleep(). (This can
| get kind of problematic RAM-usage-wise in very big networks, but that
| isn't my problem just now as I can throw up to 1GB RAM at this even for
| a small number of hosts [2].) This appears to be working perfectly but
| slowly in Python and not at all (due to libcurl [3] related crashes) in
| C/C++.
|
| Ideally, I'd want to implement the whole stuff in C++ (or probably
| some wild mix of C and C++, which generally works pretty OK) with
| existing libraries, but obviously nobody thought about giving the
| threading stuff some flag that would take care of the data (so that
| pointers can't get fucked by non-thread-safe libs while something else
| is executed), and I clearly lack the programming experience to do such a
| complicated task myself (I think it would be possible but I have some
| worries about the performance penalties this could cause).
|
| But your report is pretty encouraging to try it again in Python with
| an httplib implemented in C (as said, any pointers to such a beast
| would be appreciated).
|
| Given that I might decide to use libcurl (http://curl.haxx.se) as a
| starting point (which doesn't appear to be thread safe at all to me,
| even if some other people state it is for their apps [1]), what does
| Python do with non-thread-safe modules in a threaded app? Crash? Do some
| magic to get the data consistent before switching threads? Not defined?
| Never tested? ANY comment (preferably from people who know) on this
| topic, as well as on the stability of the threading stuff (I sometimes
| had strange crashes during the loading of the program, but once it was
| running, it kept running), would be greatly appreciated.
|
| Best regards,
| Gabriel
|
| [1] Everything is fine as long as I don't try to do concurrent fetches,
| which I desperately need.
|
| [2] Python did some two hundred concurrent threads with about 30 MB
| RAM usage on FreeBSD, which would be very nice if I could only get
| CPU utilization way down.
|
| [3] Pointers to any thread-safe HTTP or, even better, HTTP and HTTPS
| libs are very welcome. Preferably code that isn't GPL'd so I can use
| it in a closed source project (but I'd be willing to deal with the
| author of a good lib to get a license for this).
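
To make the select() suggestion at the top of this message a bit more concrete, here is a rough sketch in present-day Python of one handler multiplexing many sockets instead of one thread per URL. The names start_fetch and collect_responses are made up for illustration and are not from any library. It also assumes plain HTTP/1.0 with no redirects or HTTPS, and for simplicity it connects and sends each request one after another, so only the reading of the responses is multiplexed.

```python
import select
import socket


def start_fetch(host, port=80, path="/"):
    """Connect, send a minimal HTTP/1.0 GET, and return a non-blocking socket.

    Hypothetical helper: the connect and send are blocking, for brevity.
    """
    sock = socket.create_connection((host, port), timeout=10)
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
    sock.sendall(request.encode("ascii"))
    sock.setblocking(False)
    return sock


def collect_responses(socks, timeout=30.0):
    """Run select() over all sockets until every one has reached EOF.

    socks maps a caller-chosen name to a connected socket.
    Returns a dict mapping each name to the raw bytes received.
    """
    names = {sock: name for name, sock in socks.items()}
    buffers = {name: b"" for name in socks}
    pending = set(socks.values())
    while pending:
        # Block until at least one socket has data (or has been closed).
        readable, _, _ = select.select(list(pending), [], [], timeout)
        if not readable:
            break  # nothing arrived within the timeout; give up on the rest
        for sock in readable:
            try:
                chunk = sock.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next pass
            except OSError:
                chunk = b""  # treat a broken connection like EOF
            if chunk:
                buffers[names[sock]] += chunk
            else:
                # An empty read means the peer closed the connection.
                pending.discard(sock)
                sock.close()
    for sock in pending:
        sock.close()  # clean up anything that timed out
    return buffers
```

Usage would look something like collect_responses({"site-a": start_fetch("example.com")}), with example.com standing in for whatever hosts are being monitored. Each value in the result is the raw response, headers included, so a real monitor would still have to parse the status line itself.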