

heartbeat failure on long gil-holding operation (Issue #1260, ipython/ipython)

While exploring the recent heart-failure discussion on the mailing list, I discovered that I can reliably cause a heartbeat failure by grabbing the GIL and sleeping for longer than the heartbeat period:

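```python
import sys
import time

from cython import inline


def csleep(t):
    """GIL-holding sleep with cython.inline.

    The C-level unistd.sleep() call does not release the GIL, so no other
    Python thread can run for t seconds.
    """
    code = '\n'.join([
        'from posix cimport unistd',
        'unistd.sleep(t)',
    ])
    while True:
        inline(code, quiet=True, t=t)
        print(time.time())
        sys.stdout.flush()  # this is important (in the kernel: IOStream.flush())


csleep(5)
```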

This reliably triggers the heart-failure dialog in both the notebook and the qtconsole. Strangely, the stdout.flush() (which results in IOStream.flush(), and ultimately ZMQStream.flush()) appears to be necessary to trigger the failure.

It turns out that the culprit is our use of pyzmq's non-copying sends.

For non-copying sends to work, pyzmq connects libzmq's free-function callback (its equivalent of dealloc when the refcount hits zero) to Python's decref. This means that when libzmq is done with a message, it must acquire the GIL to release the underlying Python object. By default a context has a single io_thread (the rule of thumb is one io_thread per GB/s of throughput), so while that thread is waiting on the GIL, all other libzmq activity in the context is blocked, including the heartbeat.
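To make the head-of-line blocking concrete, here is a toy model in plain Python. It is not libzmq or pyzmq: a `threading.Lock` named `gil` stands in for the GIL, and one worker thread stands in for a context's single io_thread, running queued callbacks in FIFO order. The names `io_thread`, `free_callback` and `measure_heartbeat_delay` are hypothetical. A zero-copy free callback queued ahead of a heartbeat ping forces the ping to wait for as long as the main thread holds the "GIL".

```python
import queue
import threading
import time

# Toy model only: `gil` stands in for the GIL, and a single worker thread
# plays the role of a context's one io_thread, running callbacks in FIFO order.
gil = threading.Lock()


def io_thread(tasks):
    while True:
        task = tasks.get()
        if task is None:  # shutdown sentinel
            return
        task()


def free_callback():
    # Zero-copy free hook: the io_thread must take the 'GIL' to decref the buffer.
    with gil:
        pass


def measure_heartbeat_delay(hold):
    """Seconds until a heartbeat ping queued behind a free callback is handled."""
    tasks = queue.Queue()
    worker = threading.Thread(target=io_thread, args=(tasks,))
    worker.start()
    echoed = threading.Event()

    gil.acquire()             # main thread starts a long GIL-holding operation
    tasks.put(free_callback)  # a zero-copy send completes: free hook is queued
    start = time.monotonic()
    tasks.put(echoed.set)     # heartbeat ping queued behind it on the same thread
    time.sleep(hold)
    gil.release()

    echoed.wait()
    delay = time.monotonic() - start
    tasks.put(None)
    worker.join()
    return delay


print(f"heartbeat echoed after {measure_heartbeat_delay(1.0):.2f}s")
```

The ping is only handled once the lock is released, so its latency tracks the length of the GIL-holding operation rather than the heartbeat's own work.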

Potential answers include:

  1. stop using non-copying sends in the kernel - perfectly GIL-free, but with the obvious and probably unacceptable cost of copying everything.
  2. give the heartbeat thread its own context - the most obvious and certain solution to this problem, but it may require some bookkeeping at shutdown to make sure everything actually halts.
  3. use more than one io_thread in kernel contexts - this doesn't actually guarantee a fix unless only one non-copying message is outstanding at a time.

I think option 2 is the most reasonable answer, and I've tested it to confirm that it does work for this case. We may also want to use Context(io_threads=2) anyway, to guard against potential GIL-grabbing issues in the main app, though I've never seen any such issues in single-threaded code.
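Option 2 can be sketched with the same kind of toy model, under the simplifying assumption that each context owns exactly one io_thread that services its tasks in order. `ToyContext` and `heartbeat_delay` are hypothetical names, not pyzmq API; in the kernel the actual change would be giving the heartbeat thread its own `zmq.Context`. With a shared context the ping waits behind the blocked free callback; with a separate heartbeat context it is answered while the "GIL" is still held. The explicit `term()` calls stand in for the extra shutdown bookkeeping that option 2 mentions.

```python
import queue
import threading
import time

# Toy model only: `gil` stands in for the GIL, and each ToyContext runs one
# worker thread that plays the role of that context's single io_thread.
gil = threading.Lock()


class ToyContext:
    """Hypothetical stand-in for a zmq context with io_threads=1."""

    def __init__(self):
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def _run(self):
        # Service tasks strictly in FIFO order, one at a time.
        while True:
            task = self.tasks.get()
            if task is None:  # shutdown sentinel
                return
            task()

    def submit(self, task):
        self.tasks.put(task)

    def term(self):
        # Shutdown bookkeeping: each context's thread must be stopped and joined.
        self.tasks.put(None)
        self.thread.join()


def free_callback():
    # Zero-copy free hook: needs the 'GIL' to decref the Python buffer.
    with gil:
        pass


def heartbeat_delay(shared, hold):
    """Seconds until a ping is echoed while the 'GIL' is held for `hold` seconds."""
    kernel_ctx = ToyContext()
    hb_ctx = kernel_ctx if shared else ToyContext()
    echoed = threading.Event()

    gil.acquire()                                  # long GIL-holding operation begins
    releaser = threading.Timer(hold, gil.release)  # ... and ends after `hold` seconds
    releaser.start()
    kernel_ctx.submit(free_callback)               # kernel io_thread now waits on the 'GIL'
    start = time.monotonic()
    hb_ctx.submit(echoed.set)                      # heartbeat ping arrives
    echoed.wait()
    delay = time.monotonic() - start

    releaser.join()
    kernel_ctx.term()
    if hb_ctx is not kernel_ctx:
        hb_ctx.term()
    return delay


for shared in (True, False):
    label = "shared context" if shared else "separate heartbeat context"
    print(f"{label}: ping echoed after {heartbeat_delay(shared, 1.0):.2f}s")
```

The cost of the separate context shows up in `heartbeat_delay` itself: two threads have to be torn down instead of one, which mirrors the shutdown concern raised for option 2.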

Potential optimizations:

