While exploring the recent heart failure discussion on-list, I discovered that I can reliably cause a heartbeat failure by grabbing the GIL and sleeping for longer than the heartbeat period:
    def csleep(t):
        """gil-holding sleep with cython.inline"""
        from cython import inline
        import sys
        import time

        # Cython snippet: unistd.sleep() blocks without releasing the GIL
        code = '\n'.join([
            'from posix cimport unistd',
            'unistd.sleep(t)',
        ])
        while True:
            inline(code, quiet=True, t=t)
            print(time.time())
            sys.stdout.flush()  # this is important

    csleep(5)
This reliably triggers the heart-failure dialog in both the notebook and the qtconsole. Strangely, the stdout.flush() (which results in IOStream.flush(), and ultimately ZMQStream.flush()) appears to be necessary to trigger the failure.
It turns out the culprit is that we are using pyzmq's non-copying sends.
For non-copying sends to work, pyzmq links up zmq's free function callback (their equivalent of dealloc @ refcount=0) and Python's decref. This means that when libzmq is done with the message, it must grab the GIL. The default behavior is for a context to have one io_thread (the canonical measure is 1 io_thread per GBps of throughput), but this means that if libzmq is waiting on the GIL, all other libzmq action in that context is blocked, including the heartbeat.
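The blocking described above can be illustrated with a toy model that uses only the standard library. This is a sketch under stated assumptions, not pyzmq or libzmq code: a `threading.Lock` stands in for the GIL, worker threads stand in for libzmq io threads, and the names `heartbeat_answered_while_gil_held` and `dedicated_heartbeat_worker` are hypothetical. With a single worker, the queued "free" task waits on the lock and the heartbeat ping behind it goes unanswered; with a second worker that never needs the lock, the ping is answered while the lock is still held.

```python
import queue
import threading


def heartbeat_answered_while_gil_held(dedicated_heartbeat_worker, hold=0.5):
    """Toy model: return True if a ping is answered while the 'GIL' is held.

    Everything here is a stand-in: `gil` is an ordinary lock, and the
    workers only imitate io threads processing tasks in FIFO order.
    """
    gil = threading.Lock()            # stand-in for the GIL
    io_queue = queue.Queue()          # tasks for the first io thread
    hb_queue = queue.Queue() if dedicated_heartbeat_worker else io_queue
    answered = threading.Event()

    def worker(tasks):
        while True:
            task = tasks.get()
            if task == "stop":
                return
            if task == "free":
                # Releasing a non-copied message needs the GIL (decref).
                with gil:
                    pass
            elif task == "ping":
                # Answering the heartbeat needs no GIL at all.
                answered.set()

    queues = [io_queue, hb_queue] if dedicated_heartbeat_worker else [io_queue]
    threads = [threading.Thread(target=worker, args=(q,), daemon=True)
               for q in queues]
    for t in threads:
        t.start()

    with gil:                         # main thread holds the GIL, like csleep
        io_queue.put("free")          # message is done; free callback is due
        hb_queue.put("ping")          # heartbeat ping arrives afterwards
        in_time = answered.wait(timeout=hold)

    answered.wait()                   # after release, the ping is served
    for q in queues:
        q.put("stop")
    for t in threads:
        t.join()
    return in_time


if __name__ == "__main__":
    print(heartbeat_answered_while_gil_held(dedicated_heartbeat_worker=False))
    print(heartbeat_answered_while_gil_held(dedicated_heartbeat_worker=True))
```

In this model the first call models the single io thread stuck behind the free callback, and the second models a heartbeat that is serviced by a thread that never waits on the lock. Real libzmq decides for itself which io thread serves which socket, so the second case is only an idealization.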
Potential answers include:
I think 2. is the most reasonable answer to all this, and I've tested it to confirm that it does indeed work for this case. We may also want to use Context(io_threads=2) anyway, to allow for potential issues with GIL-grabbing in the main app, though I've never seen any such issues in single-threaded code.
Potential optimizations: