--- In python-list at y..., Gabriel Ambuehl <gabriel_ambuehl at b...> wrote:
> Sunday, April 15, 2001, 10:26:23 PM, you wrote:
>> Having completed both cores, and with the C++ core HIGHLY
>> OPTIMIZED, I was finally able to perform a performance test of the
>> C++ system versus the Python system. To my surprise, the C++ core
>> only beat Python by about 30%. Given the obvious inequities in
>> coding time between the two efforts, plus whatever future
>> coding-time inequities I might project onto users of either core
>> by implication of the programming language, I was quite surprised
>> by these results.

Programs which are I/O bound, either because they talk to other slow
programs or because they do very little processing themselves, will
likely perform similarly in C and Python. The Python version will use
more memory. For example, at eGroups.com, Sam Rushing wrote an
outgoing mail sender called "Newman" entirely in Python on top of his
Medusa async I/O framework. It performs exceedingly well and is only
about 8000 lines of Python. If rewritten in C, it would use less
memory and probably perform slightly better.

> This is very interesting. I've got to implement a server resource
> monitoring system and had a shot at it in my beloved Python. While
> Python's threading obviously works (something I can't really say
> about C++, where the whole business appears to be not very well
> thought out), I found it to be very slow.

You can see some performance comparisons for operations common to
scripting languages at my ScriptPerf page:

http://www.chat.net/~jeske/Projects/ScriptPerf/

I'd say that C++ will perform much better threaded than Python, but
you have to be the smart one doing the locking in C++, whereas Python
helps you out a little bit.

> I'm now thinking about whether I should try to reimplement the
> whole URL stuff in C (being a C/C++ novice) to see whether this
> would speed up the whole process (or is there any C implementation
> of an httplib for Python that works with its threading?).

Last time I used httplib, it was terribly slow for two reasons.
First, it was calling write() on the socket for each piece of the
HTTP header. I made it build the request in memory and then do only
one write(), which resulted in a major speed increase. Second, it
does a DNS lookup every time you call it. Adding a small DNS cache
will get you another big speed win. (Sketches of both fixes are
below.)

> The major PITA I keep stumbling across is the fact that I need to
> have concurrent service checks, so a single-threaded app with a
> large queue as the scheduling mechanism isn't of much use.

Python threading has never performed very well for me. Usually this
is because it's built on Pthreads, and you may be using a user-space
implementation of Pthreads. There are usually ways to get around
single points of contention by allocating your units of work in
larger blocks. I recommend making a non-threaded test harness and
running the Python profiler on it (after you fix httplib); a sketch
of that harness follows below, too.

> I've been thinking about a fork() based solution (AFAIK this is
> what NetSaint is doing) but the reporting of the results isn't
> doable in any halfway reliable or elegant way, and it obviously
> requires way more resources than a threaded app.

Sure, you can report results. Just open pipes back to the main
process, and when a child dies, read the results off the pipe. If you
have lots of results, you might need to make the main process
async/non-blocking and read results continuously. You can even use
the Python marshal module to hand back complex data types (again,
sketch below).
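
To make the httplib suggestion concrete, here's a rough plain-socket
sketch of both fixes. resolve() and http_get() are names I just made
up; the real httplib is structured differently:

    import socket

    _dns_cache = {}                      # hostname -> IP address

    def resolve(host):
        # cache gethostbyname() results so repeated checks of the
        # same host skip the resolver round-trip
        ip = _dns_cache.get(host)
        if ip is None:
            ip = socket.gethostbyname(host)
            _dns_cache[host] = ip
        return ip

    def http_get(host, path="/"):
        # build the whole request in memory, then hand the socket
        # one buffer instead of doing one write() per header line
        request = ("GET %s HTTP/1.0\r\n"
                   "Host: %s\r\n"
                   "\r\n") % (path, host)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((resolve(host), 80))
        s.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:                 # HTTP/1.0: server closes
                break
            chunks.append(data)
        s.close()
        return b"".join(chunks)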
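
And here's the shape of the single-threaded harness I mean for the
profiler. check_one() is just a stand-in for your real per-resource
check, and the URLs are sample targets:

    import profile

    URLS = ["http://example.com/", "http://example.org/"]

    def check_one(url):
        # stand-in for your real service check; drop the (fixed)
        # httplib call in here
        return len(url)

    def check_all(urls):
        # run every check serially so the profiler shows where the
        # per-check time goes, with no thread-scheduling noise mixed in
        for url in urls:
            check_one(url)

    profile.run("check_all(URLS)")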
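
The pipe-and-marshal reporting really is only a few lines. A
Unix-only sketch, with a made-up check function and result dict:

    import os, marshal

    def run_check(check_fn):
        # fork a child to do the check; the child marshals its result
        # back up a pipe, and the parent reads it when the child exits
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                         # child
            os.close(r)
            os.write(w, marshal.dumps(check_fn()))
            os.close(w)
            os._exit(0)
        os.close(w)                          # parent
        data = b""
        while True:
            chunk = os.read(r, 4096)
            if not chunk:                    # EOF: child closed its end
                break
            data = data + chunk
        os.close(r)
        os.waitpid(pid, 0)
        return marshal.loads(data)

    print(run_check(lambda: {"status": "ok", "latency": 0.12}))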
Going multi-process does not have to mean using lots more resources.
In Linux, a thread is pretty close to a process. If you load up all
the code and then fork(), you'll have something which is pretty damn
close to the efficiency of threading, without the locking overhead.

> The original idea was to have a constantly running thread for every
> resource to monitor (which can get kind of problematic, RAM-wise,
> in very big networks, but this isn't my problem just now, as I can
> throw up to 1 GB of RAM at this even for a small number of
> hosts [2]), with each thread scheduling itself using sleep(). This
> appears to work perfectly, but slowly, in Python, and not at all
> (due to libcurl[3]-related crashes) in C/C++.

Sounds like you should look at the co-routine based version of the
Medusa async I/O library. It's basically select()-based cooperative
multitasking. If you go the next step and use Stackless Python, you
can really cut down on your memory usage. Generally I wouldn't
suggest having hundreds of concurrent threads, even if you were
writing your software in C. Just use async I/O with a few worker
threads.

> Ideally, I'd want to implement the whole thing in C++ (or probably
> some wild mix of C and C++, which generally works pretty OK) with
> existing libraries, but obviously nobody thought about giving the
> threading stuff some flag that would take care of the data (so that
> pointers can't get fucked by non-thread-safe libs while something
> else is executing), and I clearly lack the programming experience
> to do such a complicated task myself (I think it would be possible,
> but I have some worries about the performance penalties this could
> cause).

You certainly should learn how to keep data safe in a threaded
environment before you do threaded programming in C/C++.

> [2] Python did some two hundred concurrent threads with about 30 MB
> of RAM usage on FreeBSD, which would be very nice if I could only
> get CPU utilization way down.

Try:

1) optimizing httplib as I mentioned;

2) not spawning hundreds of threads: build an async I/O select()
   loop (possibly with Medusa) and use a small number of worker
   threads to handle the data (see the P.S. for a sketch);

3) running a Python profile of your code in a single-threaded test
   harness.

-- 
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske at chat.net
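
P.S. Here's the bare-bones shape of the select() loop I mean in (2).
It only tests whether TCP connects succeed, against made-up target
hosts, and it leaves out the worker threads; Medusa wraps this same
idea up properly:

    import select, socket

    HOSTS = [("www.python.org", 80), ("www.chat.net", 80)]

    def start_connect(host, port):
        # kick off a non-blocking connect(); completion shows up as
        # writability in select()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        try:
            s.connect((host, port))
        except OSError:
            pass            # "in progress" is normal for non-blocking
        return s

    pending = {}
    for host, port in HOSTS:
        pending[start_connect(host, port)] = (host, port)

    while pending:
        # one thread multiplexes all pending checks, instead of one
        # thread (or process) per check
        _, writable, _ = select.select([], list(pending), [], 5.0)
        if not writable:
            break                        # timed out; give up on the rest
        for s in writable:
            host, port = pending.pop(s)
            err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            print("%s:%d %s" % (host, port, "up" if err == 0 else "down"))
            s.close()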