[Guido]
> Are you sure Perl still uses stdio at all?

Pretty sure, but there are so many layers of macros the code is
undecipherable, and I can't step thru macros in the debugger either
(that's assuming I wanted to devote N hours to building Perl from
source too -- which I don't).  Perl also makes heavy use of macroizing
std library names, so e.g. when I see "fopen" (which I do!), that
doesn't mean I'm getting the fopen I'm thinking of.  But the MSVC
config files define all sorts of macros to get at the MS stdio _cnt and
_ptr (and most other) FILE* fields, and the version of fopen in the
Win32 stuff appears to defer to the platform fopen (after doing Perlish
stuff, like if someone passed "/dev/null" as the file name, Perl
changes it to "NUL").

This is what it's like:  the first line of Perl's win32_fopen is this:

    dTHXo;

That's conditionally defined in perl.h, either as

    #define dTHXo dTHXoa(PERL_GET_THX)

or, if pTHXo is not defined, as

    # define dTHXo dTHX

dTHX in turn is #defined in 4 different places across 3 different files
in 2 different directories.  I'll skip those.  OTOH, dTHXoa is easy!
It's only defined once:

    #define dTHXoa(a) pTHXo = a

Ah, *that* clears it up <wink>.  Etc.  20 years ago I may have thought
this was fun.  I thought debugging large systems of m4 macros was fun
then, and I'm not sure this is either better or worse than that --
well, it's worse, because I understood m4's implementation.

> If so, does it open the file in binary or in text mode?

Sorry, but I really don't know and it's a pit to pursue.  If it's not
native text mode, they do a good job of faking it (e.g., Ctrl-Z acts
like an EOF when reading a text file from Perl on Windows -- not
something even Larry would be likely to do on his own <wink>).

> Based on the APIs in MS's libc, I presume that the crlf->lf
> translation is not done by stdio proper but by the Unix I/O
> emulation just underneath it (open() has an O_BINARY option
> flag, so read() probably does the translation).

Yes; and late in the last release cycle, import.c's open_exclusive had
a Windows bug related to this (fdopen() used "wb", but the earlier
open() didn't use O_BINARY, and fdopen *acted* like it had used "w").
Also, the MS setmode() function works on file handles, not streams.

> That comes down to copying most bytes an extra time.

Understood.  But the CRLF are stored physically on disk, so unless the
disk controller is converting them, *someone's* software (whether MS's
or Perl's) is doing it.  By the time Perl is doing its fast line-input
stuff, and doing what sure looks like a straight copy out of an IO
buffer, it's clear from the code that CRLF has already been translated
to LF.

> (To test this hypothesis, you could try to open the test file
> with mode "rb" and see if it makes a difference.)

In Python, that saved about 10% (but got the wrong answers <wink>).  In
Perl, about 15-20%.  But I don't think that tells us who's doing the
translation.  Assuming that the translation takes about the same total
time for each, it makes sense that the percentage would be higher for
Perl (since its total runtime is lower: same-sized slice of a smaller
pie).
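If you want to repeat the "rb" experiment without dragging Python or
Perl into it, a throwaway C sketch like the following does the job (the
filename is a placeholder and the whole thing is mine, not part of any
patch):  read the same file once in text mode and once in binary mode,
and compare the byte counts.  On the MS CRT the text-mode total comes
up short by one byte per CRLF, and anything after an embedded Ctrl-Z
vanishes.  That confirms the translation happens in text mode; as noted
above, it doesn't say which layer is doing it.

    #include <stdio.h>

    /* Count how many bytes stdio hands back for a file opened with
     * the given mode ("r" vs "rb").  "test.txt" below is just a
     * placeholder name. */
    static long
    count_bytes(const char *path, const char *mode)
    {
        FILE *fp = fopen(path, mode);
        char buf[8192];
        size_t got;
        long total = 0;

        if (fp == NULL)
            return -1L;
        while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
            total += (long)got;
        fclose(fp);
        return total;
    }

    int main(void)
    {
        printf("text mode:   %ld bytes\n",
               count_bytes("test.txt", "r"));
        printf("binary mode: %ld bytes\n",
               count_bytes("test.txt", "rb"));
        return 0;
    }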
> My biggest worry: thread-safety.  There must be a way to lock
> the file (you indicated that fgets() uses it).

Yes, via the unadvertised _lock_str and _unlock_str macros defined in
MS mtdll.h, which is not on the include path:

    /*
     * This is an internal C runtime header file.  It is used when
     * building the C runtimes only.  It is not to be used as a public
     * header file.
     */

The routines and macros it calls are also unadvertised.  After an hour
of thrashing I wasn't able to successfully link any code trying to call
these routines.  Doesn't mean it's impossible, does mean they're
internal to MS libc and aren't meant to be called by anything else.
That's why it's called "cheating" <wink>.  Perl appears to ignore the
whole issue (but Perl's thread story is muddy at best).

[... ungetc ...]

Not worried here either.

> ...
> You probably don't have many lines longer than 1000 characters.

None, in fact.

>> + It drops (line-at-a-time) to a little under 13 seconds if I
>> comment out the thread macros.

> If you mean the Py_BLOCK_THREADS around the resize, that can be
> safely dropped.

I meant *all* thread-related macros -- was just trying to get a feel
for how much that fiddling cost (it's an expense Perl doesn't seem to
have -- yet).  Was measurable but not substantial.  WRT the resize,
there's now a "fast path" that avoids it.

> (If/when we introduce Vladimir's malloc, we'll have to decide whether
> it is threadsafe by itself or whether it requires the global
> interpreter lock.  I vote to make it threadsafe by itself.)

As feared, this thread is going to consume my life <0.5 wink>.

> ...
> Doesn't it make more sense to delay the resize until this point?  I
> don't know how much the character copying accounts for, but I could
> imagine a strategy based on memchr() and memcpy() that first searches
> for a \n, and if found, allocates to the right size before copying.
> Typically, the buffer contains many lines, so this could be optimized
> into requiring a single exactly-sized malloc() call in the common
> case (where the buffer doesn't wrap).  But possibly scanning the
> buffer for \n and then copying the bytes separately, even with
> memchr() and memcpy(), slows things down too much for this to be
> faster.

Turns out that Perl does very much what I was doing; the Perl code is
actually more burdensome, because its routine is trying to deal not
only with \n-termination, but also arbitrary-string termination (Perl's
Awk-like input record separator), and "paragraph mode", and fixed-size
reads, and some other stuff I can't figure out from the macro names.
In all cases with a terminator, though, it's doing the same business of
both copying and testing in a very tight inner loop.  It doesn't appear
to make any serious attempt to avoid resizing the buffer.  But Perl has
its own malloc routines, and I'm guessing they're highly tuned for this
stuff.  Since we're stuck with the MS malloc -- and Win9x's in
particular seems lame -- adding this near the start of my stuff did
yield a nice speedup:

    if (fp->_cnt > 0 &&
        (pBuf = (char *)memchr(fp->_ptr, '\n', fp->_cnt)) != NULL) {
        /* it's all in the buffer so don't bother releasing the
         * global lock */
        total_buf_size = pBuf - fp->_ptr + 1;
        v = PyString_FromStringAndSize(fp->_ptr, (int)total_buf_size);
        if (v != NULL) {
            pBuf = BUF(v) + total_buf_size;
            fp->_cnt -= total_buf_size;
            fp->_ptr += total_buf_size;
        }
        goto done;
    }

So that builds the result string directly from the stdio buffer when it
can.
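For anyone who wants to poke at the idea outside of fileobject.c,
here's a stripped-down standalone sketch of the same fast path.  It
assumes the old MSVC FILE layout (the visible _ptr and _cnt fields),
the function name is made up, and it simply punts (returns NULL)
whenever the next line isn't entirely in the buffer, so a real caller
still needs the slow character-at-a-time loop as a fallback:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return a malloc'ed, NUL-terminated copy of the next line
     * (including its '\n') if that line lies entirely within fp's
     * stdio buffer; else return NULL and leave the stream alone.
     * Relies on the pre-UCRT MSVC FILE fields _ptr (next unread
     * byte) and _cnt (bytes remaining in the buffer). */
    static char *
    fast_readline(FILE *fp)
    {
        char *nl, *line;
        size_t n;

        if (fp->_cnt <= 0)
            return NULL;            /* buffer empty: no fast path */
        nl = (char *)memchr(fp->_ptr, '\n', (size_t)fp->_cnt);
        if (nl == NULL)
            return NULL;            /* line spans buffers: punt */
        n = (size_t)(nl - fp->_ptr) + 1;   /* include the '\n' */
        line = (char *)malloc(n + 1);
        if (line == NULL)
            return NULL;
        memcpy(line, fp->_ptr, n);
        line[n] = '\0';
        fp->_ptr += n;              /* consume the bytes directly */
        fp->_cnt -= (int)n;         /* from the stdio buffer      */
        return line;
    }

On a libc that keeps FILE opaque there's nothing to poke at, which is
the whole reason this counts as "cheating".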
Times dropped from (before this particular small hack)

    count_chars_lines     14.880  14.854
    readlines_sizehint     9.280   9.302
    using_fileinput       48.610  48.589
    while_readline        13.450  13.451

to

    count_chars_lines     14.780  14.784
    readlines_sizehint     9.550   9.514
    using_fileinput       43.560  43.584
    while_readline        10.600  10.578

Since I have no long lines in this test data, and the stdio buffer
typically contains thousands of chars, most calls should be satisfied
by the fast path.  Compared to the previous code, the fast path (1)
avoids global lock fiddling (but that didn't account for much in a
distinct test); (2) crawls over the buffer twice instead of once; and
(3) avoids one (shrinking!) realloc.  So crawling over the buffer an
extra time costs nothing compared to the cost of a resize; and that's
likely just more evidence that malloc/realloc suck on this platform.

CAUTION:  no file locking is going on now (because I haven't found a
way to do it).  My previous claim that the MS getc macro did no locking
was wrong, as I discovered by stepping thru the generated machine code.
stdio.h #defines getc without locking, but in _MT mode it later gets
#undef'ed and turned into a function call.

>> /* XXX unclear to me why MS's getc macro does "& 0xff" */
>> *pBuf++ = ch = *ms_ptr++ & 0xff;

> I know why.  getchar() returns an int in the range [-1, 255].  If
> chars are signed the &0xff is needed else you would get a return in
> the range [-128, 127] and -1 would be ambiguous (EOF==-1).

Bingo -- MS chars are signed.  (There's a tiny standalone demo of this
at the end of the msg.)

> ...
> But here since you're copying to another character, it's pointless.

Yup!  Gone.

> ....
> Note that get_line() with negative n could be implemented as
> get_line(0) with some post processing.

Andrew's glibc getline code appears to have wanted to do that, but it
looks to me like it's unreachable (unless I'm hallucinating, the
"n < 0" test after return from glibc getline can't succeed, because the
enclosing block is guarded by an "n == 0" test).

> This should be done completely separately, in PyFile_GetLine.

I assume you have an editor <wink>.

> The negative n case is only used by raw_input() -- it means strip
> the \n and raise EOFError for EOF, and I expect that this is rarely
> if ever used in a speed-conscious situation.

I've never seen raw_input used except when stdin and stdout were
connected to a tty.  When I tried raw_input from a DOS box under the
debugger, it never called get_line.  Something trickier is going on
there; I suspect it's actually calling fgets (eventually) instead in
that case.

more-mysteries-than-i-really-need-ly y'rs  - tim
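PS:  Here's the promised two-minute standalone demo of the signed-char
point.  It's my own toy, nothing lifted from MS's headers:  it pushes a
raw 0xFF byte through the same kind of dereference getc does, with and
without the mask.  Without the mask the byte sign-extends to -1 and is
indistinguishable from EOF; with it you get 255 back.

    #include <stdio.h>

    int main(void)
    {
        /* A raw 0xFF byte, as it might sit in a stdio buffer.  Using
         * "signed char" explicitly to mimic MS, where plain char is
         * signed. */
        signed char buf[1];
        int without_mask, with_mask;

        buf[0] = (signed char)0xFF;
        without_mask = buf[0];        /* sign-extends to -1 == EOF */
        with_mask = buf[0] & 0xff;    /* 255, distinct from EOF    */

        printf("without mask: %d (EOF is %d)\n", without_mask, EOF);
        printf("with mask:    %d\n", with_mask);
        return 0;
    }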