[Guido]
> I'm much more confident about the getc_unlocked() approach than about
> fgets() -- with the latter we need much more faith in the C library
> implementers.  (E.g. that fgets() never writes beyond the null bytes
> it promises, and that it locks/unlocks only once.)  Also, you're
> relying on blindingly fast memchr() and memset() implementations.

Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a
bit quicker on Solaris, despite that it's paying an extra layer of function
call per line, to keep it out of get_line proper).  That tells me the
assumptions are indeed mild.  The business about not writing beyond the null
byte is a concern only I would have raised:  the possibility is an
aggressively paranoid reading of the std (I do *lots* of things with libc
I'm paranoid about <0.9 wink>).  If even *Microsoft* didn't blow these
things, it's hard to imagine any other vendor exploding ...

Still, I'd rather get rid of ms_getline_hack if I could, because the code is
so much more complicated.

>> Both methods lack a refinement I would like to see, but can't
>> achieve in "the Windows way":  ensure that consistency is on no
>> worse than a per-line basis.  [Example omitted]

> The only portable way to ensure this that I can see, is to have a
> separate mutex in the Python file object.  Since this is hardly a
> common thing to do, I think it's better to let the application manage
> that lock if they need it.

Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the
file locked until the line was complete, and I wouldn't be opposed to making
life saner on platforms that allow it.

But there's another problem here:  part of the reason we release Python
threads around the fgets is in case some other thread is trying to write the
data we're trying to read, yes?  But since FLOCKFILE is in effect, other
threads *trying* to write to the stream we're reading will get blocked
anyway.  Seems to give us potential for deadlocks.
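
To make the deadlock worry concrete, here is a minimal sketch in modern Python 3 (not the 2.0-era code under discussion).  It only models the situation:  `stream_lock` is a hypothetical stand-in for the C library's per-stream lock, and timeouts replace what would otherwise be indefinite blocking, so the demo finishes instead of hanging.

```python
import threading

def demo():
    stream_lock = threading.Lock()       # hypothetical stand-in for the stream lock
    reader_holding = threading.Event()   # set once the reader owns the lock
    data_ready = threading.Event()       # set once the writer has produced data
    result = {}

    def reader():
        # Models a blocking read that keeps the stream locked while it waits.
        with stream_lock:
            reader_holding.set()
            # The data can only arrive if the writer gets the same lock.
            result["reader_got_data"] = data_ready.wait(timeout=0.5)

    def writer():
        reader_holding.wait()
        # A timeout stands in for blocking forever on the locked stream.
        got = stream_lock.acquire(timeout=0.2)
        result["writer_got_lock"] = got
        if got:
            data_ready.set()
            stream_lock.release()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

if __name__ == "__main__":
    print(demo())
```

In this model neither side makes progress:  the writer never gets the lock, so the reader never sees its data.  Whether a real stdio behaves this way depends on how long the library holds the stream lock during a blocking read, which is the open question above.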

> (Then why are we bothering with flockfile(), you may ask?

I wouldn't ask that, no <wink>.

> Because otherwise, accidental multithreaded reading from the same
> file could cause core dumps.)

Ugh ... turns out that on my box I can provoke core dumps anyway, with this
program.  Blows up under released 2.0 and CVS Pythons (so it's not due to
anything new):

import thread

def read(f):
    import time
    time.sleep(.01)
    n = 0
    while n < 1000000:
        x = f.readline()
        n += len(x)
        print "r",
    print "read " + `n`
    m.release()

m = thread.allocate_lock()
f = open("ga", "w+")
print "opened"
m.acquire()
thread.start_new_thread(read, (f,))
n = 0
x = "x" * 113 + "\n"
while n < 1000000:
    f.write(x)
    print "w",
    n += len(x)
m.acquire()
print "done"

Typical run:

C:\Python20>\code\python\dist\src\pcbuild\python temp.py
opened
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w
w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w w r w r w r w r
w r w r w r w r w r w r w r w r w r w r w r w r w r w r r w r w r w r w r w r

and then it dies in msvcrt.dll with a bad pointer.  Also dies under the
debugger (yay!) ... always dies like so:

+ We (Python) call the MS fwrite, from fileobject.c file_write.

+ MS fwrite succeeds with its _lock_str(stream) call.

+ MS fwrite then calls MS _fwrite_lk.

+ MS _fwrite_lk calls memcpy, which blows up for a non-obvious reason.

Looks like the stream's _cnt member has gone mildly negative, which
_fwrite_lk casts to unsigned and so treats like a giant positive count, and
so memcpy eventually runs off the end of the process address space.
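
The arithmetic behind that last step can be sketched in a few lines of modern Python 3.  Assumptions:  a 32-bit unsigned count, and -3 as a made-up example of "mildly negative" (the actual value of _cnt was not recorded); `as_unsigned_count` is a hypothetical helper, not anything in the MS runtime.

```python
def as_unsigned_count(cnt, bits=32):
    """Reinterpret a signed integer as unsigned, the way a C cast to an
    unsigned type of the given width would (the width is an assumption)."""
    return cnt & ((1 << bits) - 1)

if __name__ == "__main__":
    # -3 is an illustrative value only.
    print(as_unsigned_count(-3))   # 4294967293
```

So a count that is only slightly below zero turns into a request to copy nearly 4 GB, which is more than enough for memcpy to walk off the end of the address space.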

Only thing I can conclude from this is that MS's internal stream-locking
implementation is buggy.  At least on W98SE.  Other flavors of Windows?
Other platforms?

Note that I don't claim the program above is *sensible*, just that it
shouldn't blow up.  Alas, short of indeed adding a separate mutex in Python
file objects -- or writing our own stdio -- I don't believe I can fix this.

the-best-thing-to-do-with-threads-is-don't-ly y'rs  - tim
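
For reference, the "let the application manage that lock" route could look roughly like the following in modern Python 3.  This is a sketch under stated assumptions, not a proposed patch:  `LockedFile` is a hypothetical wrapper, and the reader tracks its own offset because the two threads share one file position.

```python
import tempfile
import threading
import time

class LockedFile:
    """Hypothetical wrapper: one mutex serializes every call on the file."""

    def __init__(self, f):
        self._f = f
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            self._f.seek(0, 2)          # always append at end of file
            self._f.write(data)

    def readline(self, pos):
        # The caller passes in and gets back its own read offset.
        with self._lock:
            self._f.seek(pos)
            line = self._f.readline()
            return line, self._f.tell()

def demo(lines=200):
    line = b"x" * 113 + b"\n"
    expected = lines * len(line)
    total = {"read": 0}
    with tempfile.TemporaryFile() as raw:   # binary "w+b" mode by default
        f = LockedFile(raw)

        def reader():
            pos = 0
            while total["read"] < expected:
                data, pos = f.readline(pos)
                if not data:
                    time.sleep(0.001)   # at end of file; let the writer run
                total["read"] += len(data)

        t = threading.Thread(target=reader)
        t.start()
        for _ in range(lines):
            f.write(line)
        t.join()
    return total["read"]

if __name__ == "__main__":
    print(demo())
```

Because each write and each readline runs entirely under the lock, the reader only ever sees whole lines, and the underlying stdio-level locking is never exercised by two threads at once.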