On 2011-01-30 21:43, "Martin v. Löwis" wrote:
> On 30.01.2011 17:54, Alexander Belopolsky wrote:
>> On Sun, Jan 30, 2011 at 11:35 AM, Victor Stinner
>> <victor.stinner at haypocalc.com> wrote:
>> ..
>>> We should find a compromise between speed (limit the number of system
>>> calls) and the usability of Python modules.
>>
>> Do you have measurements that show python spending significant time on
>> failing open calls?
>
> No; past measurements always showed that this is insignificant, probably
> thanks to operating system caching the relevant directory blocks (so
> it doesn't really matter whether you make one or ten lookups per
> directory; my guess is that it matters more if you look into ten
> directories instead of one).

Dear Python-developers,

I would like you to be aware of one particular problem related to system calls on massively parallel systems. We are developing the Python-based simulation software GPAW (https://wiki.fysik.dtu.dk/gpaw/) and have tested it with up to tens of thousands of CPU cores. The program uses MPI, so thousands of Python interpreters are launched at start-up time. As all these interpreters execute the same import statements, the huge number of (I/O-related) system calls puts extreme pressure on the file system, and as a result just starting the Python interpreter(s) can take ~45 minutes with ~30 000 CPU cores!

Currently, we have tried to work around the problem either by installing Python and the required additional modules (NumPy and GPAW) to a ramdisk, or by modifying the CPython source (at the moment version 2.6) in such a way that only a single process performs the system calls and uses MPI to broadcast the results to the other processes (preliminary work in progress).

As a related problem, dynamic linking can also be quite expensive (or even unavailable on some systems), and in some cases we have made a small hack to CPython to enable statically linked packages (simple modules can of course be included relatively easily in a static Python build).

I am not expecting that the problems can be solved easily for the general CPython interpreter, especially as massively parallel supercomputers are quite a small niche of Python usage. However, I think it would be good to be aware of the problems that a large number of system calls causes in this more specialized kind of Python usage.

Best regards,
Jussi

--
Jussi Enkovaara, Application Scientist, High Performance Computing, CSC
PO. BOX 405 02101 Espoo, Finland, Tel +358 9 457 2935, fax +358 9 457 2302
CSC - IT Center for Science, www.csc.fi, e-mail: jussi.enkovaara at csc.fi
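
The second workaround in the message above (one process performs the file system calls and the result is broadcast to the others) can be illustrated at the Python level. The following is only a rough sketch under stated assumptions, not the CPython patch described in the message, which is not shown in this thread. The names `import_via_broadcast`, `SingleProcessComm` and `demo` are hypothetical. The function expects a communicator object offering `Get_rank()` and `bcast(obj, root=0)`; with mpi4py installed, `MPI.COMM_WORLD` provides both methods, while the stand-in class below lets the sketch run in a single process without MPI. It is written for Python 3 and handles only a single plain source file, with no packages, bytecode caching or extension modules.

```python
import os
import tempfile
import types


class SingleProcessComm:
    """Hypothetical stand-in communicator so the sketch runs without MPI."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        # With one process there is nothing to send; return the object as-is.
        return obj


def import_via_broadcast(name, path, comm):
    """Build module `name` from `path`, reading the file on rank 0 only."""
    source = None
    if comm.Get_rank() == 0:
        # Only the root process touches the file system.
        with open(path, "r", encoding="utf-8") as f:
            source = f.read()
    # Every process receives the source text from rank 0.
    source = comm.bcast(source, root=0)
    module = types.ModuleType(name)
    module.__file__ = path
    # Compile and run the source inside the new module's namespace.
    exec(compile(source, path, "exec"), module.__dict__)
    return module


def demo():
    """Write a toy module to a temporary directory and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_module.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write("ANSWER = 6 * 7\n")
        mod = import_via_broadcast("toy_module", path, SingleProcessComm())
        return mod.ANSWER
```

In an actual MPI job one would presumably pass `MPI.COMM_WORLD` from mpi4py instead of `SingleProcessComm()`. Importing mpi4py itself still goes through the normal import machinery on every rank, so a sketch like this can only reduce the file system traffic for modules loaded afterwards.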