Parallelizing the Python Interpreter: An Alternate Approach to Async


This is the short overview presentation I gave to Python developers at the PyCon 2013 Language Summit.


Transcript
1. Background/Motivation

   •  Two parts:
      •  “Parallelizing” the interpreter: allowing multiple threads to run CPython internals concurrently
      •  True “asynchronicity”: leverage the parallel facilities above to expose a “truly asynchronous” async API to Python code (allow Python code to run concurrently within the same interpreter)
   •  I’ve had a vague idea for how to go about the parallel aspect for a while
   •  The async discussions on python-ideas last year provided the motivation to try and tie both things together
   •  Also figured it would be a great pet project to better familiarize myself with CPython internals

2. Details

   •  Have been working on it since late-December-ish (full time until mid-Feb-ish)
   •  It all lives in hg.python.org/sandbox/trent, px branch
      •  Include/object.h
      •  Include/pyparallel.h
      •  Python/pyparallel.c & pyparallel_private.h (vast majority of the code)
   •  The code quality is… ultra proof-of-concept/prototype/hackfest
      •  Think of how a homeless hoarder treats his shopping cart: everything gets added, nothing gets removed
      •  …so if you’re hoping to quickly grok how it all works by reviewing the code, you’re going to have a bad time.
   •  Only works on Vista+ (for now)
      •  Aim is to get everything working first, then refactor/review and look at support for other platforms, etc.

3. How it’s exposed to Python: The Async Façade

   •  Submission of arbitrary “work”:
         async.submit_work(func, args, kwds, callback=None, errback=None)
      •  Calls func(args, kwds) from a parallel thread*
   •  Submission of timers:
         async.submit_timer(time, func, args, kwds, …)
         async.submit_timer(interval, func, args, kwds, …)
      •  Calls func(args, kwds) from a parallel thread some ‘time’ in the future, or every interval.
   •  Submission of “waits”:
         async.submit_wait(obj, func, args, kwds, …)
      •  Calls func(args, kwds) from a parallel thread when ‘obj’ is signalled
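A minimal sketch of how these primitives might be wired together, assuming the signatures above. The callback/errback semantics and the timer units are my reading of the slide, not documented API; the async module only exists in the px branch (Python 3.3 era, Vista+), and `async` later became a Python keyword, so this only parses on old interpreters:

    import async  # px-branch module, not the stdlib

    def work(args, kwds):
        return sum(args)

    def on_done(result):             # hypothetical callback signature
        print('work finished:', result)

    def on_error(exc):               # hypothetical errback signature
        print('work failed:', exc)

    # Run work((1, 2, 3), {}) on a parallel thread; callbacks fire on
    # parallel threads too.
    async.submit_work(work, (1, 2, 3), {}, callback=on_done, errback=on_error)

    # One-shot timer: call work(...) some 'time' in the future.
    async.submit_timer(5, work, (4, 5, 6), {})

    async.run()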

4. The Async Façade (cont.)

   •  Asynchronous file I/O:
         f = async.open(filename, 'wb')
         async.write(f, buf, callback=…, errback=…)
   •  Asynchronous client/server services:
         client = async.client(host, port)
         server = async.server(host, port)
         async.register(transport=client, protocol=…)
         async.register(transport=server, protocol=…)
         async.run()
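Filling in the elided callbacks with hypothetical ones, an asynchronous write might look roughly like this (the callback signatures are assumptions for illustration; the deck elides them):

    import async

    def on_write_complete(f):          # assumed signature
        print('buffer flushed')

    def on_write_error(f, exc):        # assumed signature
        print('write failed:', exc)

    f = async.open('output.bin', 'wb')
    async.write(f, b'some payload',
                callback=on_write_complete,
                errback=on_write_error)
    async.run()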


5. Things to note with the demo…

   •  Constant memory use
   •  Only one python_d.exe process…
   •  …exploiting all cores
   •  …without any explicit “thread” usage (i.e. no threading.Thread instances)
      •  (Not that threading.Thread instances would automatically exploit all cores)

6. “If we’re a parallel thread, do X, if not, do Y”

   •  “Thread-sensitive” calls are ubiquitous
   •  The challenge then becomes how quickly you can detect whether we’re in a parallel thread
   •  The quicker you can detect it, the less overhead is incurred by the whole approach (see the sketch below)
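The pattern itself, expressed as a Python sketch for illustration only — in reality these checks are C macros inside the interpreter, and every name here is invented:

    import threading

    _MAIN_THREAD_ID = threading.get_ident()  # recorded at startup

    def is_parallel_thread():
        # The faster this test, the lower the per-call overhead of the
        # whole approach -- hence the TEB trick on the next slide.
        return threading.get_ident() != _MAIN_THREAD_ID

    def thread_sensitive_call(do_x, do_y, *args):
        # "Do X" from a parallel thread, "do Y" (the normal CPython
        # behaviour) from the main thread.
        if is_parallel_thread():
            return do_x(*args)
        return do_y(*args)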

7. Windows solution: the TEB

   /* Read the current process/thread IDs straight out of the Thread
      Environment Block via the GS (x64) or FS (x86) segment register:
      no function call, no syscall. */
   #ifdef WITH_INTRINSICS
   #  ifdef MS_WINDOWS
   #    include <intrin.h>
   #    if defined(MS_WIN64)
   #      pragma intrinsic(__readgsdword)
   #      define _Py_get_current_process_id() (__readgsdword(0x40))
   #      define _Py_get_current_thread_id()  (__readgsdword(0x48))
   #    elif defined(MS_WIN32)
   #      pragma intrinsic(__readfsdword)
   #      define _Py_get_current_process_id() (__readfsdword(0x20))
   #      define _Py_get_current_thread_id()  (__readfsdword(0x24))
   #    endif
   #  endif
   #endif

8. “Send Complete”: Clarification

   •  Called when a send() (WSASend()) call completes (either synchronously or asynchronously)
   •  What it doesn’t mean:
      •  The other side definitely got it (they could have closed the connection)
   •  What it does mean:
      •  All the data you tried to send successfully became bytes on a wire
      •  The send socket buffer is empty
   •  What it implies:
      •  You’re free to send more data if you’ve got it.
   •  Why it’s useful:
      •  Eliminates the need for a producer/consumer relationship (i.e. pause_producing()/stop_consuming())
      •  No need to buffer anything internally
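As a sketch of why this removes producer/consumer plumbing: a protocol can stream a large payload chunk by chunk, handing over the next chunk only once told the previous one has left the socket buffer. The hook names follow the Chargen example on the next slides; the chunk arithmetic and the idea that returning None means “nothing more to send” are my assumptions for illustration:

    import async

    CHUNK = 8192
    PAYLOAD = b'x' * (64 * 1024 * 1024)   # 64 MiB to stream

    class Streamer:
        def initial_bytes_to_send(self):
            return PAYLOAD[:CHUNK]

        def send_complete(self, transport, send_id):
            # The previous chunk is on the wire and the send buffer is
            # empty, so it's safe to hand over the next chunk. No
            # internal buffering, no pause_producing()/stop_consuming().
            offset = send_id * CHUNK
            next_chunk = PAYLOAD[offset:offset + CHUNK]
            return next_chunk or None     # assumed end-of-stream signal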

9. Things to note: def chargen

   def chargen(lineno, nchars=72):
       start = ord(' ')
       end = ord('~')
       c = lineno + start
       while c > end:
           c = (c % end) + start
       b = bytearray(nchars)
       for i in range(0, nchars-2):
           if c > end:
               c = start
           b[i] = c
           c += 1
       b[nchars-1] = ord('\n')
       return b

   •  No blocking calls
   •  Will happily consume all CPU when called in a loop
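For a feel of the output, the function above can be exercised in any plain interpreter; there is nothing async about it (this driver loop is mine, not the deck’s):

    for lineno in range(5):
        line = chargen(lineno)
        print(line.decode('ascii'), end='')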

10. Things to note: class Chargen

   •  No explicit send()
   •  “Sending” is achieved by returning a “sendable” object:
      •  PyBytesObject
      •  PyByteArray
      •  PyUnicode
      •  Callable PyObject that returns one of the above

   import async

   class Chargen:
       def initial_bytes_to_send(self):
           return chargen(0)
       def send_complete(self, transport, send_id):
           return chargen(send_id)

   server = async.server('10.211.55.3', 20019)
   async.register(transport=server, protocol=Chargen)
   async.run()
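To drive the server above, the demo pairs it with a plain netcat client on the same box (something like nc 10.211.55.3 20019), which just reads the character stream as fast as it can; the “Why chargen is such a good demo” slide below leans on exactly this setup.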

11. Things to note: class Chargen

   •  Next “send” is initiated from send_complete()…
   •  …which generates another send_complete()…
   •  …and so on
   •  It’s an “IO hog”: always wants to send something when given the chance

   import async

   class Chargen:
       def initial_bytes_to_send(self):
           return chargen(0)
       def send_complete(self, transport, send_id):
           return chargen(send_id)

   server = async.server('10.211.55.3', 20019)
   async.register(transport=server, protocol=Chargen)
   async.run()

12. Why chargen is such a good demo

   •  You’re only sending 73 bytes at a time
   •  The CPU time required to generate those 73 bytes is not negligible (compared to the cost of sending 73 bytes)
   •  Good simulator of real-world conditions, where the CPU time to process a client request would dwarf the IO overhead of communicating the result back to the client
   •  With a default send socket buffer size of 8192 bytes and a local netcat client, you’re never going to block during send()
   •  Thus, processing a single request will immediately throw you into a tight back-to-back send/callback loop, with no opportunity to service other clients (when doing synchronous sends)

13. Summary

   •  I suuuuuuucked at CPython internals when I started
      •  “Why do we bother INCREF/DECREFing Py_None?!”
   •  Probably should have picked an easier “pet project” than parallelizing the interpreter
   •  I’m really happy with progress so far though: exploit multiple cores without impacting single-threaded performance!
   •  Lots of work still to do
   •  The “do X” part (the thread-safe alternatives to “do Y”) is a huge topic that I’m planning on tackling in subsequent presentations
   •  How the async client/server stuff is implemented is a huge separate topic too

14. Pre-emptive Answers to your Questions

   •  I’ve encountered the problem, spent weeks debugging it, and come up with a solid solution
   •  I’ve got a temporary solution in place and a better long-term solution in mind
   •  I’ve thought about the problem, it can probably be exploited to instantly crash the interpreter, and I’m currently dealing with it by not doing that
      •  “Doctor, it hurts when I do this.”
   •  I have no idea, and the very problem you speak of could threaten the viability of the entire approach

