The following is the beginning of a PEP for a new memory model for Python. It currently contains only the motivation section and a description of a preliminary design. I'm submitting the PEP in its current form to get a feel for whether I should pursue this proposal, and to find out whether I am overlooking any details that would make it incompatible with Python's core implementation, i.e. whether implementing it would have too large an effect on Python's performance. I do plan to implement something along these lines, but may have to change my approach if the comments on this PEP argue against it.

Cheers,
Paul

PEP: XXX
Title: A New Memory Management Model for Python
Version: $Revision: 1.3 $
Last-Modified: $Date: 2001/08/20 23:59:26 $
Author: barrett@stsci.edu (Paul Barrett)
Status: Draft
Type: Standards Track
Created: 05-Sep-2001
Python-Version: 2.3
Post-History:
Replaces: PEP 42

Abstract

This PEP proposes a new memory management model to provide better support for the various types of memory found in modern operating systems. The proposed model separates the memory object from its access method. In simplest terms, memory objects only allocate memory, while access objects only provide access to that memory. This separation allows various types of memory to share a common interface or access object, and vice versa.

Motivation

There are three sequence objects which share similar interfaces, but have different intended uses. The first is the indispensable 'string' object. A 'string' is an immutable sequence of characters and supports slicing, indexing, concatenation, replication, and related string-type operations.

The second is the 'array' object. Like a 'list', it is a mutable sequence and supports slicing, indexing, concatenation, and replication, but its values are constrained to one of several basic types, namely characters, integers, and floating point numbers. This constraint enables efficient storage of the values.
The third object is the 'buffer', which behaves similarly to a 'string' object at the Python programming level: it supports slicing, indexing, concatenation, and related string-like operations. However, its data can come from either a block of memory or an object that exports the buffer interface, such as 'mmap', the memory-mapped file object, which is its prime justification.

Each of these objects has been used at one time or another as a way of allocating read-write memory from the heap. The 'string' object is often used at the C programming level because it is a standard Python object, but this use runs counter to its intended behavior of being immutable. The preferred way of allocating such memory is the 'array' object, but its insistence on returning a representation of itself for both the 'repr' and 'str' methods makes it cumbersome to use. In addition, the use of a 'string' as an initializer during 'array' creation is inefficient, because the memory is temporarily allocated twice, once for the 'string' and once for the 'array'. This is particularly onerous when allocating tens of megabytes of memory.

The 'buffer' object also has its problems, some of which have been discussed on python-dev. Some of the more important ones are:

(1) The 'buffer' object always returns a read-only 'buffer', even for read-write objects. This is apparently a bug in the 'buffer' object, which is fixable.

(2) The buffer API provides no guarantee about the lifetime of the base pointer, even if the 'buffer' object holds a reference to the base object, since there is no locking mechanism associated with the base pointer. For example, if the initial 'buffer' is deleted, the memory pointer of the derived 'buffer' will refer to freed memory. This happens most often at the C programming level, as in the following situation:

    /* initial buffer, allocating 100 bytes */
    PyObject *base = PyBuffer_New(100);
    /* derived buffer covering all of the initial buffer */
    PyObject *buffer = PyBuffer_FromObject(base, 0, Py_END_OF_BUFFER);
    /* release our reference to the initial buffer */
    Py_DECREF(base);

This problem is also fixable.
(3) The 'buffer' object cannot easily be used to allocate read-write memory at the Python programming level. The obvious approach is to use a 'string' as the base object of the 'buffer'. Yet a 'string' is immutable, which means the 'buffer' object derived from it is also immutable, even if problem (1) is fixed. The only alternatives at the Python programming level are to use the cumbersome 'array' object or to create your own version of the 'buffer' object to allocate a block of memory.

We feel that the solution to these and other problems is best illustrated by problem (3), which can essentially be described as the simple operation of allocating a block of read-write memory from the heap. Python currently provides no standard way of doing this. It is instead done by subterfuge at the C programming level using the 'string', 'array', or 'buffer' APIs. A solution to this specific problem is to include a 'malloc' object as part of standard Python. This object would be used to allocate a block of memory from the heap, and the 'buffer' object would be used to access this memory, just as it is used to access data from a memory-mapped file. Yet this hints at a more general solution: the creation of two classes of objects, one for memory allocation and one for memory access.

The Model

We propose a new memory-management model for Python which separates the allocation object from its access method. This mix-and-match memory model will enable various access objects, such as 'array', 'string', and 'file', to easily access data from different types of memory, namely heap, shared, and memory-mapped files. In other words, different types of memory can share a common interface (see figure below). It will also provide better support for the various types of memory found in modern operating systems.

    |---------------------------------------------------|
    |                  interface layer                  |
    | ------------------------------------------------- |
    |    array    |    string     |   file   |   ...    |
    |===================================================|
    |                    data layer                     |
    | ------------------------------------------------- |
    | heap memory | shared memory | memory mapped file  |
    |---------------------------------------------------|

Memory Objects

Modern operating systems, such as Unix and Windows, provide access to several different types of memory, namely heap, shared, and memory-mapped files. These memory types share two common attributes: a pointer to the memory and the size of the memory. This information is usually sufficient for objects whose data uses heap memory, since the object is expected to have sole control over that memory throughout the lifetime of the object. For objects whose data also uses shared memory and memory-mapped files, an additional attribute is necessary for access permission. How to handle memory persistence across processes, however, does not appear to be well defined in modern OSs and seems to be left to the programmer to implement. In any case, a fourth attribute to handle memory persistence seems imperative.

Access Objects

Consider the 'array', 'buffer', and 'string' objects. Each provides, more or less, the same string-like interface to its underlying data. They each support slicing, indexing, concatenation, and replication of the data. They differ primarily in the types of initializing data and the permissions associated with the underlying data. Currently, the 'array' initializer accepts only 'list' and 'string' objects. If this were extended to include objects that support the 'buffer interface', then the distinction between the 'array' and 'buffer' objects would disappear, since they would both support the sequence interface and the same set of base objects. The 'buffer' object would therefore be redundant and no longer necessary.

The 'string' and 'array' objects would still be distinct, since the 'array' object encompasses more data types than does the 'string' object.
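As a rough illustration only, the split between memory objects and access objects can be sketched in present-day Python. Everything in this sketch is an assumption that lies outside this draft: it uses 'memoryview', which postdates this PEP, as a stand-in for an access object, and 'bytearray' and an anonymous 'mmap' as stand-ins for memory objects. The names 'fill', 'heap', and 'mapped' are hypothetical.

```python
import mmap
import struct

ITEM = struct.calcsize("i")  # size in bytes of a C int on this platform
COUNT = 4


def fill(memory, value):
    """Access layer: view any writable memory object as C ints and fill it."""
    # Both views are released on exit, so the memory can be closed afterwards.
    with memoryview(memory) as raw, raw.cast("i") as ints:
        for i in range(len(ints)):
            ints[i] = value


# Data layer: two kinds of memory that do nothing but hold storage.
heap = bytearray(ITEM * COUNT)        # heap memory
mapped = mmap.mmap(-1, ITEM * COUNT)  # anonymous memory map

# The same access code works on both kinds of memory.
fill(heap, 7)
fill(mapped, 7)
same_bytes = bytes(heap) == mapped[:]

# Permission comes from the memory: an immutable byte string
# yields a read-only view.
with memoryview(b"abcd") as ro:
    read_only = ro.readonly

mapped.close()
```

The access code never needs to know which kind of memory it was handed, which is the mix-and-match property the model is after. Persistence and shared memory are not covered by this sketch.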
The 'array' object is also mutable, requiring its underlying data to be read-write, while the 'string' object is immutable, requiring read-only data. This new memory-management model therefore suggests that the 'string' object support the 'buffer interface', with the proviso that the data have read-only permission.

Implementation

References

Copyright

This document has been placed in the public domain.

--
Paul Barrett, PhD       Space Telescope Science Institute
Phone: 410-338-4475     ESS/Science Software Group
FAX:   410-338-4767     Baltimore, MD 21218