Over the last couple of years, the company I work for has built an application server that we use in most of our customer projects. It embeds Python, and by now most project code is written in Python. The server is quite resource-hungry (several GB of RAM, MySQL databases of 50-100 GB), and it is multi-threaded -- at least originally, we hoped to make it utilize multiple processor cores. Which, as we all know, doesn't sit very well with Python.

Our application runs heavy background calculations (in Python) most of the time and simultaneously has to service a few GUI clients, also via Python. The problem was that a single background thread would increase the response time of the client threads by a factor of 10 or (usually) more. This led me to add a dirty hack to the Python core to make it switch threads more frequently. While this hack greatly improved response time for the GUI clients, it also slowed down the background threads quite a bit: top would often show significantly less CPU usage -- 80% instead of the more usual 100%.

The underlying problem with thread switching in Python is that the global semaphore used for the GIL is regularly released and immediately reacquired. Unfortunately, most of the time the very same thread wins the race on the semaphore again, so the other threads just accumulate more wait time. This is where my dirty patch intervened: after releasing the semaphore, it simply did a nanosleep() for a short amount of time (I used 1000 nsecs) so another thread had a chance to grab it.

I have since designed a better scheduling scheme and written a small test program that closely mimics what Python does, in order to gather some statistics. I call the new primitive the round-robin semaphore, because with it threads run in a more or less round-robin fashion. It is really just a semaphore with FIFO semantics. The one implementation wart is a __thread variable I had to use because I did not want to change the signature of the Enter() and Leave() methods.
For CPython, I have replaced that thread-local allocation with an additional field in the PyThreadState. Because of this, the patch for CPython is a bit more involved than the simple nanosleep() hack; consequently, it is not very polished yet and not nearly as portable as the rest of the Python core.

Attached are the results from the test program, which compares all three scheduling mechanisms -- standard Python, my dirty hack, and the new round-robin semaphore -- as well as the test program itself, with the three implementations nicely encapsulated. The program was run on a quad-core 1.86 GHz Xeon running Fedora 5 x86_64. The first three lines of the output (including the name of the algorithm) should be self-explanatory. The fourth and fifth lines show the distribution of wait times across the individual threads; the ideal distribution would put everything on the number of threads (2 in this case) and zero everywhere else. As you can see, the round-robin semaphore comes pretty close to that.

Also, because the thread switches now happen reliably, we could raise Python's checkinterval -- the jury is still out on the actual value, but it will likely be something between 1000 and 10000.

I can post my Python patch if there is enough interest. Thanks for your attention.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: results.txt
Url: http://mail.python.org/pipermail/python-dev/attachments/20080318/4b3d5d53/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p.cpp
Url: http://mail.python.org/pipermail/python-dev/attachments/20080318/4b3d5d53/attachment-0001.txt
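For comparison, the earlier nanosleep() hack amounts to something like the following sketch. The function names and the mutex are hypothetical stand-ins for the GIL machinery; CPython's actual code looks different.

```cpp
#include <time.h>
#include <mutex>

// Illustrative stand-in for the GIL's global lock.
static std::mutex gil_mutex;

void acquire_gil() {
    gil_mutex.lock();
}

// The dirty hack: after releasing the lock, sleep briefly so a
// waiting thread can win the race to reacquire it, instead of the
// releasing thread grabbing it right back.
void release_gil_and_yield() {
    gil_mutex.unlock();
    timespec pause = {0, 1000};  // 1000 nsecs, as described above
    nanosleep(&pause, nullptr);
}
```

This improves fairness only probabilistically -- the sleeping thread merely steps aside for a moment -- which is why it costs throughput, unlike the FIFO scheme, which hands the lock over deterministically.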