
Showing content from https://lwn.net/Articles/941090/ below:

A per-interpreter GIL [LWN.net]


"Subinterpreters", which are separate Python interpreters running in the same process that can be created using the C API, have been a part of Python since the previous century (version 1.5 in 1997), but they are largely unknown and unused. Eric Snow has been on something of a quest, since 2015 or so, to bring better multicore processing to Python by way of subinterpreters (or "multiple interpreters"). He has made it part of the way there, with the adoption of a separate global interpreter lock (GIL) for each subinterpreter, which was added for Python 3.12. Back in April, Snow gave a talk (YouTube video) at PyCon about multiple interpreters, their status, and his plans for the feature in the future.

We have looked in on the subinterpreter work a few times along the way; beyond the article when Snow started his quest, he presented a status update at the 2018 Python Language Summit and there were some further discussions back in 2020. On April 7, 2023, two weeks before Snow gave his talk, the steering council accepted PEP 684 ("A Per-Interpreter GIL") for inclusion into Python 3.12.

Background

Snow began by defining "two key concepts: the GIL and multiple interpreters". The GIL is used by the Python runtime to "protect a lot of important global state". On the positive side, the GIL helps keep the Python implementation simpler than it would be otherwise, which is important for a project where most of its contributors are volunteers, he said. On the negative side, the GIL negatively impacts the performance of threaded, CPU-bound Python code. The GIL is not perfect, but it is not quite as big of a problem for most people as it is made out to be; that is not particularly obvious, however, so Python "ends up getting a lot of bad press" about the GIL.

To understand "subinterpreters", one needs to know what he means by "interpreter". The term has multiple meanings, depending on context: the python executable, the full runtime including the standard library and the user's code, or the bytecode virtual machine (VM). For his talk, he would be using it to mean the Python VM and the run-time state to support it; it may be helpful to think of it as the "interpreter state", but it is often called a "subinterpreter" as well, he said.

In practice, subinterpreters are "like a cross between using threads and processes"; Nick Coghlan once described them as "threads with opt-in sharing", Snow said. He continued: "In a way, using multiple interpreters takes all the painful parts out of using threads." CPython has long supported multiple interpreter states in a single process, running at the same time in different threads, but it is not a well-known feature.

Attendees may or may not have noticed, but there has been "quite a bit of negativity for a long time around the GIL and Python's support for multiple cores" in the tech world, he said. Back in 2014, he talked himself into doing something, he was not sure what, to change that narrative. Quoting from his 2015 python-ideas post on multiple interpreters with their own GILs, he said, "I knew it would have to make Python's multicore support 'obvious, unmistakable, and undeniable'". Multiple interpreters was exactly what was needed to do so, he believes, but he did not realize "it would turn into a project that would consume a lot of my spare time over the next eight-and-a-half years".

If separate subinterpreters no longer shared a single GIL, that would mean that each thread in a process could run an interpreter with its own GIL. The threads would no longer block each other in the Python interpreter. Each thread could truly run in parallel with the others on multiple cores in the system.

CPython was written with the idea that every interpreter's execution was isolated and independent from any other's. In theory, that meant that the interpreter state would be isolated; "in practice, that was far from the truth". Since the subinterpreter feature was so little-known, thousands of global variables holding run-time state had been added to CPython by the time he got started on the project. The shared GIL protected all of that state; in order to stop sharing the GIL, the interpreter would "have to mostly stop sharing any global state".

Plan

So he set out on a plan to isolate interpreters from each other; this amounted to thousands of small, relatively uncomplicated tasks to isolate the state. The best part was that the isolation work was valuable even if it turned out that not sharing the GIL was too difficult to do—or did not provide the multicore benefits that he expected. That outcome was, he said, a real possibility when he started out, but he was successful in following the plan fully. It remains to be seen, though, what impact it has on the multicore parallelism story for Python.

"We'll have a per-interpreter GIL in Python 3.12", Snow said, to applause. It has been a long project for him, with several pauses, lots of collaboration, some obstacles, and "occasional burnout"; it required eight PEPs, three of which he authored. Most of the work has been on his own time, though he is grateful to his employer Microsoft, which has given him 20% of his time to work on open source for the last five years—and full time on open source starting earlier in 2023. In addition, many members of the Python community, Victor Stinner in particular, have provided help, support, encouragement, and more.

All of that work would be for naught if no one takes advantage of it, he said. So he wanted to ensure that he provided attendees with enough information to do so. Before that, however, he wanted to show "why you should care".

There are four different mechanisms for Python concurrency: threads, multiple processes, distributed processing, and async. David Beazley gave an excellent talk about those mechanisms at PyCon 2015, Snow said; it had a live demo ("of course") that built up a simple network server that computed requested Fibonacci numbers using several of the concurrency techniques.
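A threaded server along those lines might be sketched as follows; this is a minimal illustration in the spirit of Beazley's demo, not his actual code, and the port number and line-based protocol are assumptions:

```python
# Minimal sketch of a threaded Fibonacci server; the line-based
# protocol and port are illustrative assumptions.
import socket
import threading

def fib(n):
    # Deliberately naive, CPU-bound recursion
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def handle(conn):
    # Read one integer per line, reply with fib(n) per line
    with conn, conn.makefile('r') as lines:
        for line in lines:
            conn.sendall(f'{fib(int(line))}\n'.encode())

def serve(port=25000):
    with socket.socket() as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(('', port))
        sock.listen()
        while True:
            conn, _ = sock.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

Because every handler thread contends for the same GIL while computing fib(), this version is exactly the shape of server whose throughput collapses as CPU-bound clients are added.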

Snow adapted the threaded version of Beazley's server in order to add multiple interpreters into the mix. He ran those programs on his laptop and said that he was excited to show the results, which can be seen in his slides as well as in the video. For long-running requests (fib(30)), the numbers for multiple interpreters are slightly better than those for multiple processes. Unlike with threads, though, a server running with multiple interpreters or processes does not have its performance degrade when adding a second or third client. Multiple interpreters took 0.9s for one, two, or three clients; because of the GIL, threads took 0.9s, 1.8s, and 2.5s for those three scenarios (processes were 0.9s, 1.0s, and 1.1s).

Looking at the requests-per-second (rps) numbers for short-running requests (fib(1)) on multiple interpreters shows only a modest drop-off when adding clients, from 13,100 for one down to 11,800 for three. The threaded version dropped off more quickly (13,000 to 8,100). In addition, requesting a single fib(40) caused the threads to drop to 100 rps, while the multiple-interpreters version was still maintaining 11,500 rps. "A long-running request does not starve the short-running ones; it's like magic", Snow said.

He put up a slide with both tables of results; "isn't that beautiful?", he asked, to another round of applause. He put up some graphs of the results, which is "exactly what I had hoped for all these years". While he expected results to be similar to what he got, "actually seeing them just blows my mind".

For those who want to try this out for themselves, he pointed to a branch that he was maintaining. By now, though, the first beta of Python 3.12, which was released in May, a month after his talk, contains the per-interpreter GIL work; several other releases have been made since, including the rc1 release on August 6. There is no tutorial on using the feature available yet, but that should be coming, he said. He also could not resist displaying the numbers one more time, to chuckles and laughter.

Currently, multiple interpreters can only be created and used from the C API; there is no Python interface to the feature. In 2017, he proposed adding a standard library module to expose multiple interpreters in the language in order to fix that lack. PEP 554 ("Multiple Interpreters in the Stdlib") would start to provide access to multiple interpreters from Python code. The Python API would be basic at first; for example, the only way to run code would be to pass it as a string, as with exec().

PEP 554 outlines many additional features that could be added to the interpreters module once the basic features are in. The implementation of the PEP was completed a few years ago, he said, though there were some minor tweaks needed along the way. Because it was strongly tied to the per-interpreter GIL feature in people's minds, PEP 554 was not considered for adoption independently. When it looked likely that the per-interpreter GIL work would land in Python 3.12, the final discussion on PEP 554 did not complete in time to have that PEP considered for the release. So there will be a per-interpreter GIL soon, but no easy way to access multiple interpreters from Python code until 3.13 in 2024.

He has released an interpreters module on PyPI that can be used to access the feature in the meantime. He said that others are welcome to create their own; it is pretty straightforward to do so as there are only three C API calls that need to be used.

Examples

He then presented some examples of using multiple interpreters in Python code; for the most part, the examples would follow the API proposed in PEP 554. Creating an interpreter is pretty straightforward:

    interp = interpreters.create()

That call actually does quite a bit of work; it creates the new interpreter state, populates the builtins and sys modules, imports "a bunch of other modules", and initializes the interpreter's __main__ module. Interpreter creation is not super cheap, Snow said, but in practice it should not matter to most users. On the debug build he was using for testing, it took around 35ms to create an interpreter; destroying one took around 5ms.

He was curious about how many interpreters he could create. On his laptop with 9GB of RAM, he ran out of memory after creating around 5,000 interpreters. That suggests that each takes 1.5MB or so. He is pretty sure that can be improved.

He showed four ways to execute the canonical "hello world" program in the current Python: the REPL, the command line (i.e. python -c), from a file, and via exec() on a string argument. With each of those, the interpreter gets the source, compiles it, then uses the bytecode interpreter to execute it using the __dict__ of the __main__ module as the execution environment. The exact same thing happens with:

    script = '''
    print('Hello world!')
    '''

    interp.run(script)

The difference is that the execution is done in the execution environment of interp; it uses the __main__.__dict__ from there instead of the one in the interpreter where the interp.run() call is being made.
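That separation of namespaces can be mimicked with plain exec() and two distinct dicts, each standing in for an interpreter's __main__.__dict__; this is a conceptual sketch of the isolation, not the real mechanism:

```python
# Two separate dicts stand in for two interpreters' __main__
# namespaces; a name bound in one is invisible in the other.
ns_a, ns_b = {}, {}
exec("answer = 42", ns_a)
exec("answer = 'spam'", ns_b)
print(ns_a['answer'], ns_b['answer'])  # 42 spam
```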

Starting a new interpreter in its own thread is not difficult either, he said. The usual way to start a thread looks something like:

    def task():
        print('Hello world!')

    t = Thread(target=task)
    t.start()

    # alternatively
    t = Thread(target=exec, args=(script,))
    t.start()

The second version (which uses script from above) can be modified to run the interpreter in its own thread by simply switching the target:

    t = Thread(target=interp.run, args=(script,))
    t.start()

It may make sense to create the interpreter in the thread, thus in task(), for some applications.

Meanwhile, an interpreter can be used multiple times, just by calling its run() method. Importantly, the execution environment does not get reset between runs, so the state of the interpreter is preserved:

    >>> interp.run('answer = 42')
    >>> interp.run('print(answer)')
    42

One consequence is that it will be possible to create interpreters ahead of time, pre-populating them by importing needed modules, thus paying any import price early. Interpreter pools could be created, with interpreters getting assigned to work as it becomes available.
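A pool along those lines can be sketched with ordinary threads, using exec() on per-worker namespaces as a stand-in for real subinterpreters; the pre-import of math and the result variable name are illustrative assumptions:

```python
# Each worker thread pre-warms its own namespace (standing in for a
# subinterpreter), then pulls code strings from a shared queue.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    ns = {}                      # created ahead of time...
    exec('import math', ns)      # ...paying the import price early
    while (src := tasks.get()) is not None:
        exec(src, ns)
        results.put(ns.get('result'))

pool = [threading.Thread(target=worker) for _ in range(2)]
for t in pool:
    t.start()
tasks.put('result = math.factorial(5)')
for _ in pool:
    tasks.put(None)              # one shutdown sentinel per worker
for t in pool:
    t.join()
val = results.get()
print(val)  # 120
```

With real interpreters, the body of worker() would create an interpreter once and call its run() method per task instead of exec().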

Communication

Multiple interpreters are a powerful feature, he said, "but that power is limited if we don't have an easy way to send data to an interpreter or to get data back". Because of the isolation between these interpreters, you cannot simply send and receive objects between them—"at least for now". Data can be serialized and deserialized for communication. One simple way to do that is by using an f-string for the string to be passed:

    data1, data2 = 42, 'spam'

    interp.run(f'''call_something({data1})
    call_something_else({data2!r})
    ''')

Reading and writing data from an interpreter could be done using os.pipe(); the interpreters are all in the same process, so the file descriptor numbers can simply be passed as integers. "However, we can certainly be more efficient than using pipes." The pickle module could be used as an alternative to just passing bytes around; it can serialize and deserialize most objects, but is, again, not all that efficient.
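Those two fallbacks combine naturally: pickled bytes can travel through an os.pipe(), and since the descriptors are plain integers they could be handed to another interpreter in the same process. This sketch stays within a single interpreter:

```python
# One end "sends" a pickled object, the other reads it back; in a
# multi-interpreter setup the integer descriptors r and w could
# simply be passed to the other interpreter.
import os
import pickle

r, w = os.pipe()
payload = pickle.dumps({'answer': 42, 'spam': 'eggs'})
os.write(w, payload)
received = pickle.loads(os.read(r, 65536))
print(received['answer'])  # 42
```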

At one point, PEP 554 proposed adding a mechanism to efficiently pass objects between interpreters, based on the idea of communicating sequential processes (CSP). That mechanism was called "channels" after its counterpart in the Go language. He implemented it "and it works"; he thinks it makes for an elegant solution to the problem, even if his implementation was lacking in some ways.

His point was not that channels should necessarily be added to the standard library, but that it is not too hard to implement ways to send data efficiently rather than using pipes or pickle. The current state is that sending messages or objects to other interpreters is not all that efficient, which is not something that will change for Python 3.12. He is hopeful that an efficient and easy-to-use solution will be adopted in the next year for inclusion into Python 3.13. In the meantime, the gap may be filled by PyPI packages; "that depends on a lot of you".
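As a rough illustration of how little code a naive in-process version takes, a channel can be reduced to a thread-safe queue with send/recv endpoints; this is purely a stand-in, since a real cross-interpreter channel could not pass arbitrary objects this way:

```python
# A Go-style channel reduced to a thread-safe queue; real
# cross-interpreter channels would restrict what can be sent.
import queue

class Channel:
    def __init__(self):
        self._queue = queue.Queue()

    def send(self, obj):
        self._queue.put(obj)

    def recv(self):
        # Blocks until something is available, like Go's <-ch
        return self._queue.get()

ch = Channel()
ch.send('spam')
msg = ch.recv()
print(msg)  # spam
```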

Beyond just passing data back and forth, there is also value in sharing objects directly; this would be particularly useful for applications that use large data structures, such as massive arrays or in-memory databases. He is hopeful that the community will help fill that gap and that there will be some sharing-safe data types available for 3.13. In theory, large, read-only arrays should be safe to share between interpreters due to the design of the buffer protocol; he prototyped support for that at one point and it seemed to work, though he did not do a lot of testing.
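The buffer-protocol point can be seen with memoryview, which exposes the same underlying bytes without copying; this single-interpreter example illustrates the zero-copy, read-only properties that make such buffers plausible to share:

```python
# A read-only buffer can be viewed and sliced without copying the
# underlying data; no bytes are duplicated here.
data = bytes(range(100))     # stand-in for a large read-only array
view = memoryview(data)
sliced = view[10:20]         # a view of the same memory, not a copy
print(sliced[0], view.readonly)  # 10 True
```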

The difficult part of any kind of concurrency is "how to safely share data and to do so efficiently". In a free-threaded interpreter, all data in the process is potentially shared, which means that all of it must be protected from simultaneous access. In contrast, any data sharing must be explicit for isolated interpreters, so the protection only needs to be applied to those objects or types. The narrow focus of explicit sharing "helps reduce potential development and maintenance costs for everyone ... and that's a critical distinction".

He thinks that multiple interpreters are now ready to take their place among the Python concurrency options. The feature can be used to implement the "relatively ancient and still relatively popular idea of isolated threads of execution that pass data back and forth explicitly". It is, effectively, the actor model using CSP; two languages that natively support this kind of concurrency are Go and Erlang. He looks forward to seeing what the community will do with multiple interpreters in the coming years.

Plans

Snow ended his talk with a look at his plans, which start with getting PEP 554 adopted for Python 3.13. It just provides the foundation, though, so there will be more work needed after that. Some work on more efficient ways to pass data is needed, as is work on ways for sharing data between interpreters. Another area he would like to work on is the ability to pass callables, instead of only strings, to the run() method. There has not been much work done on improving the performance of creating and destroying interpreters, which is another thing he plans to look into.

He was asked about global state in extension modules. Snow said that extension modules that want to support multiple interpreters need to implement PEP 489 ("Multi-phase extension module initialization"); there is some excellent documentation describing how to do so, he said. Part of the work to switch to multi-phase initialization will move any global state to module-specific state so that the extension can operate with multiple interpreters.

Snow gave his talk when the status of PEP 703 ("Making the Global Interpreter Lock Optional in CPython") was not yet known. It will provide a free-threaded interpreter, though possibly at the cost of some single-threaded performance. PEP 703, which also requires extensions to implement multi-phase initialization for obvious reasons, was not reintroduced until May; it was not until the end of July that the steering council announced its intent to approve the PEP after several lengthy threads along the way.

Now, the two features will coexist, eventually—or at least that is the plan. The existence of a free-threaded Python may dampen some of the enthusiasm around multiple interpreters, though the explicit-sharing model definitely does have its appeal. In any case, the Python concurrency picture has gotten rather larger over the past few months; it will be interesting to see where it all goes.



