[Mark Favas]
> ...
> The lines range in length from 96 to 747 characters, with
> 11% @ 233, 17% @ 252 and 52% @ 254 characters, so #1 [a vendor
> who actually optimized fgets()] looks promising - most lines are
> long enough to trigger a realloc.

Plus as soon as you spill over the stack buffer, I make you pay for filling
1024 new bytes with newlines before the next fgets() call, and almost all of
those are irrelevant to you.  It doesn't degrade gracefully.  Alas, I tried
several "adaptive" schemes (adjusting how much of the initial segment of a
larger stack buffer they would use, based on the actual line lengths seen in
the past), but the costs always exceeded the savings on my box.

> Cranking up INITBUFSIZE in ms_getline_hack to 260 from 200
> improves things again, by another 25%:
> total 131426612 chars and 514216 lines
> count_chars_lines      5.081   5.066
> readlines_sizehint     3.743   3.717
> using_fileinput       11.113  11.100
> while_readline         6.100   6.083
> for_xreadlines         3.027   3.033

Well, I couldn't let you forgo *all* of 25%.  The current fileobject.c has a
stack buffer of 300 bytes, but only uses 100 of them on the first fgets()
call.  On a very quiet machine, that saved 3-4% of the runtime on *my* test
case, whose line lengths are typical of the text files I crunch over, so I'm
happy for me.  If 100 bytes aren't enough, it must call fgets() again, but
that second call just appends into the rest of the full 300-byte buffer.  So
it saves the realloc for lines under 300 chars.

> Apart from the name <grin>, I like ms_getline_hack...

Ya, it's now the non-pejorative getline_via_fgets().  I hate that I became a
grown-up <0.9 wink>.

time-to-pick-wings-off-of-flies-ly y'rs  - tim
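
For readers who want to see the arithmetic behind the two-stage stack buffer, here is a rough Python model.  It is not the C code in fileobject.c: the function name, the labels it returns and the exact byte accounting are all assumptions made for illustration, and line lengths are taken to include the trailing newline.  It only classifies a line as needing one fgets() call, two calls inside the stack buffer, or a spill that would force a realloc.

```python
def classify_line(line_len, first_chunk=100, stack_size=300):
    """Classify how a line of line_len bytes (newline included) is read.

    Hypothetical model, not the real getline_via_fgets():
      "one_call"  - fits in the part of the stack buffer used first
      "two_calls" - needs a second fgets(), but stays in the stack buffer
      "spill"     - too long for the stack buffer, heap growth needed
    """
    # fgets(buf, n, fp) stores at most n - 1 bytes plus a terminating NUL.
    if line_len <= first_chunk - 1:
        return "one_call"
    # The second call appends into the rest of the stack buffer, so both
    # calls together hold at most stack_size - 1 bytes of line data.
    if line_len <= stack_size - 1:
        return "two_calls"
    return "spill"


if __name__ == "__main__":
    # Line lengths quoted in the message above.
    for n in (96, 233, 252, 254, 747):
        print(n, classify_line(n))
```

If the earlier ms_getline_hack is modeled as a single buffer used in full on the first call (again an assumption), then classify_line(254, 200, 200) reports a spill while classify_line(254, 260, 260) reports one call, which is consistent with the jump Mark saw when raising INITBUFSIZE from 200 to 260.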