Guido van Rossum <guido@python.org>:
> Eric, before we go further, can you give an exact definition of
> EOFness to me?

A file is at EOF when attempts to read more data from it will fail, returning no data.

> What's wrong with just setting the parser loose on the input and
> letting it deal with EOF?

Nothing wrong in theory, but it's a problem in practice. I don't want to import the second parser unless it's actually needed, because it's much larger than the first one.

> In your example, apparently a line
> containing the word "history" signals that the rest of the file must
> be parsed by the second parser. What if "history" is the last line of
> the file? The eof() test can't tell you *that*!

Right. That case never happens. I mean it *really* never happens :-).

What we're talking about is a game system. The first parser recognizes a spec language for describing games of a particular class (variants of Diplomacy, if that's meaningful to you). The system keeps logfiles which consist of a section in the game description language, optionally followed by the token "history" and an order log. The parser for the order log language is a *lot* larger than the one for the description language. This is why I said I don't want the first parser to just call the second. I want to test for EOF to know whether I have to import the second parser at all!

Here's the beginning of my problem: the first parser can't export a line buffer, because it doesn't *have* a line buffer. It's a subclass of shlex and does single-character reads.

There are two ways I can cope with this. One is to do a nonzero-length read after the first parser exits; the other is to have the first parser set a state flag controlling whether the second parser loads.

This is where it bites that I can't test for EOF with a read(0). The second shlex parser only has token-level pushback! If I do a nonzero-length read and I get data, I'm screwed, because there is no way to push the characters I consumed back onto the stream.
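The read(0) problem described above can be shown in a few lines: a zero-length read returns an empty result whether or not data remains, so it cannot detect EOF, and any nonzero read consumes what it inspects. One way around this, sketched here under the assumption of a binary buffered stream, is `io.BufferedReader.peek()` from Python's later `io` module, a technique that was not available when this thread was written. The helper name `at_eof` is hypothetical.

```python
import io

def at_eof(stream: io.BufferedReader) -> bool:
    # Hypothetical helper (not part of Python's API): peek() returns
    # buffered bytes without advancing the stream position, performing
    # at most one raw read if the buffer is empty. An empty result
    # therefore means the underlying stream has no more data.
    return len(stream.peek(1)) == 0

f = io.BufferedReader(io.BytesIO(b"history\n"))

# read(0) returns b"" both before and after the data is consumed,
# so it cannot distinguish "more data" from "end of file".
assert f.read(0) == b""
assert not at_eof(f)               # data remains...
assert f.read() == b"history\n"    # ...and peek() consumed none of it
assert f.read(0) == b""
assert at_eof(f)
```

On a blocking pipe or socket, `peek()` waits until data arrives or the writer closes, which is usually acceptable when the input is a finished logfile.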
On the other hand (as I said before), setting a lexer state flag seems wrong, because EOFness is a property of the underlying stream rather than the parser. I'd be duplicating state that exists in the stdio stream structure anyway; it ought to be accessible.

> > Now, another and more general way to handle this would be to make an
> > equivalent of the old FIONREAD ioctl part of Python's standard set of
> > file object methods -- a way to ask "how many bytes are ready to be
> > read in this stream?"
>
> There's no portable way to do that.

Actually, fstat(2) is portable enough to support a very useful approximation of FIONREAD. I know, because I tried it.

Last night I coded up a "waiting" method for file objects that calls fstat(2) on the associated file descriptor. For a plain file, it then subtracts the result of ftell() from the fstat size field and returns that; for other files, it simply returns the size field.

I then tested this on plain files, FIFOs, and sockets under Linux. It turns out fstat(2) gives useful information in all three cases (a count of characters waiting in the buffer in the latter two). I expected this; it should be true under all current Unixes.

fstat(2) does not give useful size-field results for Linux block devices. I didn't test the character (terminal) devices. (I documented my results in Python's Doc/lib/stat.tex, in a patch I have already submitted to SourceForge.)

I would be quite surprised if the plain-file case didn't work on Mac and Windows. I would be a little surprised if the socket case failed, because all three probably inherited fstat(2) from the ancestral BSD TCP/IP stack.

Just having the plain-file case work would, IMHO, be justification enough for this method. If it turns out to be portable across Mac and Windows sockets as well, *huge* win.

Could this be tested by someone with access to Windows and Mac systems?
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

An armed society is a polite society.
Manners are good when one may have to back up his acts with his life. -- Robert A. Heinlein, "Beyond This Horizon", 1942
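The "waiting" method described in the message is not shown in the thread; a rough Python rendering of its logic, under stated assumptions, might look like the sketch below. The function name `waiting` and the requirement of a binary-mode file object are illustrative assumptions, not the submitted patch; `os.fstat`, `stat.S_ISREG`, and `f.tell()` stand in for fstat(2), the plain-file check, and ftell().

```python
import os
import stat
import tempfile

def waiting(f) -> int:
    # Hypothetical sketch of the proposed "waiting" method, written as a
    # plain function over a binary-mode Python file object.
    st = os.fstat(f.fileno())
    if stat.S_ISREG(st.st_mode):
        # Plain file: bytes between the current position and the end.
        return st.st_size - f.tell()
    # Pipes, sockets, etc.: return st_size as-is. POSIX leaves st_size
    # unspecified for these types, so this branch is platform-dependent.
    return st.st_size

with tempfile.TemporaryFile() as tmp:
    tmp.write(b"abcdef")
    tmp.flush()          # make sure fstat() sees the written bytes
    tmp.seek(2)
    print(waiting(tmp))  # prints 4
```

The plain-file branch is the portable part; the other branch only reports pending bytes on platforms whose fstat(2) happens to fill in st_size for FIFOs and sockets, which is exactly the question the message asks others to test.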