Hello David,

Sunday, April 22, 2001, 7:52:52 PM, you wrote:

> Programs which are I/O bound, either because they talk to other slow
> programs, or because they do very little processing themselves, will
> likely perform similarly in C and Python.

That's what I thought. Unless you are a good programmer who can juggle
poll()/select() instead of sleeping, network programming is horrible to
get right once it comes down to speed.

> more memory. For example, at eGroups.com, Sam Rushing wrote an
> outgoing mailsender called "Newman", completely in Python and his
> Medusa async i/o framework. It performs exceedingly well and is only
> about 8000 lines of Python. If rewritten in C, it would use less
> memory, and probably perform slightly better.

That is about as specialized as the thing I want to implement.
Basically, I'm out to rewrite netsaint ;-) but several times faster
(especially for the HTTP checks).

> I'd say that C++ will perform much better threaded than Python, but
> you have to be the smart one doing the locking in C++, whereas Python
> helps you out a little bit.

Threading in either C or C++ is a major PITA. Since the compiler
doesn't take complete care of the memory management, you can seriously
fuck things up with threads (and as we all know, buffer overflows are
common even in normal apps).

> Last time I used httplib, it was terribly slow for two reasons. First,
> it was calling write() on the socket for each piece of the HTTP
> header. I made it build a string in memory and then only do one
> write() and it resulted in a major speed increase. Second, it does DNS
> lookups each time you call it. Adding a small DNS cache will get you
> another big speed win.

I just figured that Python, even with threads, won't run more than one
urllib query at a time: with 1500 concurrent threads the CPU usage
wasn't significantly higher than with one, and the Apache log showed a
comparable request rate. (I've tried to sketch both of your httplib
suggestions further down.)

> Python threading has never performed very well for me. Usually, this
> is because it's using Pthreads, and you may be using a user-space
> implementation of Pthreads.

Yup. FreeBSD does Pthreads in userspace (which can be fucking slow even
when used from C) but also offers kthreads. I just couldn't figure out
yet how to use those.

> There are usually ways to get around single points of contention
> by just allocating your units of work in larger blocks.

Normally, this is quite true. However, if you want to continuously
monitor some network resource, there is no other way than to connect
to it and perform your test. There isn't even a chance to batch the
tests together...

> I recommend making a non-threaded test-harness and running the Python
> profiler on it. (after you fix httplib)

As said, I'm beginning to doubt that the HTTP stuff even runs
threaded... I'll have to look at Doug's asynchttp.py as soon as I find
the time.

> Going multi-process does not have to mean using lots more
> resources. In Linux, a thread is pretty close to a process. If you
> load up all the code and then fork(), you'll have something which is
> pretty damn close to the efficiency of threading, without the locking
> overhead.

This depends a bit. As long as you only look at CPU usage, working with
fork() isn't such a problem; the trouble starts with a high number of
concurrent, blocking processes (which don't use much CPU during
select()) when you run out of RAM and the system begins to swap.
Python's threading takes care of the RAM usage but sacrifices enormous
amounts of CPU time.
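To make sure I understand the preload-then-fork idea, I'd picture it
roughly like this in Python (untested sketch; monitor() and the host
list are placeholders of mine):

    import os

    HOSTS = ['www.example.com', 'www.example.org']   # placeholder host list

    def monitor(host):
        # connect to the host and run the actual check here
        pass

    # Everything is loaded once in the parent; after fork() the children
    # share those pages copy-on-write instead of loading them all again.
    pids = []
    for host in HOSTS:
        pid = os.fork()
        if pid == 0:                # child: do one unit of work and exit
            monitor(host)
            os._exit(0)
        pids.append(pid)            # parent: remember the child

    for pid in pids:
        os.waitpid(pid, 0)          # reap the children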
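Coming back to your httplib point: if I understand the single-write fix
correctly, the pattern is roughly this (simplified, untested sketch,
nothing like the real httplib code):

    import socket

    def fetch(host, port, path):
        # build the complete request in memory first...
        lines = ['GET %s HTTP/1.0' % path,
                 'Host: %s' % host,
                 '',
                 '']
        request = '\r\n'.join(lines)

        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        while request:                 # ...then hand it to the kernel in as
            sent = s.send(request)     # few write()s as possible (usually one)
            request = request[sent:]

        chunks = []
        while 1:
            data = s.recv(8192)
            if not data:
                break
            chunks.append(data)
        s.close()
        return ''.join(chunks)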
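And the DNS cache you mention would basically be a dictionary in front
of gethostbyname(), something like:

    import socket

    _dns_cache = {}

    def gethostbyname_cached(host):
        # look the name up once and remember the answer; fine for a
        # monitor that polls the same hosts over and over
        try:
            return _dns_cache[host]
        except KeyError:
            addr = socket.gethostbyname(host)
            _dns_cache[host] = addr
            return addr

No TTL handling, of course, so entries go stale if a host moves, but
for a first measurement that shouldn't matter.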
> Generally I wouldn't suggest having hundreds of concurrent threads,
> even if you were writing your software in C. Just use async I/O with a
> few worker threads.

I have yet to dig into the async stuff. Stevens' Unix Network
Programming, vol. 1, is lying right beside me...

>> do such a complicated task myself (I think it would be possible but
>> I've some worries about the performance penalties this could cause).

> You certainly should learn how to keep data safe in a threaded
> environment before you do threaded programming in C/C++.

ACK.

>> [2] Python did some two hundred concurrent threads with about 30 MB
>> RAM usage on FreeBSD which would be very nice if I could only get
>> CPU utilization way down.

> Try:
> 1) optimizing httplib as I mentioned

I think I can build upon Doug's work here. But I'm still not entirely
sure whether I should use Python for this at all. C would give
lighter-weight code, since about every system out there has got a
libc...

> 2) don't spawn hundreds of threads, build an async I/O select loop
>    (possibly with Medusa), and use a small number of worker threads
>    to handle data.

See above. (I've put a first stab at such a loop in the PS.)

> 3) run a python profile of your code in a single-threaded test harness

ACK, see the PPS.

Anyway, thanks for your tips.

Best regards,
 Gabriel
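PS: To see whether I understood the select() loop idea, I scribbled
down a minimal version: one non-blocking socket per host, a single
select() loop driving all connects and reads, no worker threads yet.
Untested; the HEAD request and port 80 are my assumptions, and DNS
resolution still blocks inside connect(), so the cache from above would
apply:

    import socket, select

    def check(hosts, timeout=10.0):
        pending = {}                            # fd -> (socket, host, state)
        results = {}                            # host -> first reply chunk or None
        for host in hosts:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setblocking(0)
            try:
                s.connect((host, 80))           # gethostbyname() still blocks here!
            except socket.error:
                pass                            # EINPROGRESS is expected
            pending[s.fileno()] = (s, host, 'connecting')
            results[host] = None

        while pending:
            rlist = [fd for fd, (s, h, st) in pending.items() if st == 'reading']
            wlist = [fd for fd, (s, h, st) in pending.items() if st == 'connecting']
            r, w, e = select.select(rlist, wlist, [], timeout)
            if not r and not w:
                break                           # whatever is left has timed out
            for fd in w:                        # connect finished, send the request
                s, host, st = pending[fd]
                try:
                    s.send('HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n' % host)
                    pending[fd] = (s, host, 'reading')
                except socket.error:            # e.g. connection refused: host down
                    s.close()
                    del pending[fd]
            for fd in r:                        # first chunk of the reply is enough
                s, host, st = pending[fd]
                results[host] = s.recv(4096)
                s.close()
                del pending[fd]

        for fd, (s, host, st) in pending.items():   # clean up the stragglers
            s.close()
        return results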
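PPS: And the single-threaded harness from your point 3) would be as
simple as this, I guess (again just a sketch with a made-up workload
and test host):

    import profile, socket

    def check_once(host):
        # the single-threaded unit of work to be measured
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, 80))
        s.send('HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n' % host)
        s.recv(4096)
        s.close()

    def harness():
        for i in range(50):                  # fixed, repeatable workload
            check_once('www.example.com')    # made-up test host

    profile.run('harness()')                 # prints per-function timings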