--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I have been researching the question of how to ask a file descriptor
how much data it has waiting for the next sequential read, with a view
to discovering what cross-platform behavior we could count on for a
hypothetical `waiting' method in Python's built-in file class.

1: Why bother?

I have these main applications in mind:

1. Detecting EOF on a static plain file.
2. Non-blocking poll of a socket opened in non-blocking mode.
3. Non-blocking poll of a FIFO opened in non-blocking mode.
4. Non-blocking poll of a terminal device opened in non-blocking mode.

These are all frequently requested capabilities on C newsgroups -- how
often have *you* seen the "how do I detect an individual keypress"
question from beginning programmers? I believe having these
capabilities would substantially enhance Python's appeal.

2: What would be under the hood?

Summary: We can do this portably, and we can do it with only one (1)
new #ifdef.

Our tools for this purpose will be the fstat(2) st_size field and the
FIONREAD ioctl(2) call. They are complementary.

In all supposedly POSIX-conformant environments I know of, the st_size
field has a documented meaning for plain files (S_IFREG) and may or may
not give a meaningful number for FIFOs, sockets, and tty devices. The
Single Unix Specification is silent on the meaning of st_size for file
types other than regular files (S_IFREG). I have filed a defect report
about this with OpenGroup and am discussing appropriate language with
them.

(The last sentence of the Inferno operating system's language on
stat(2) is interesting: "If the file resides on permanent storage and
is not a directory, the length returned by stat is the number of bytes
in the file. For directories, the length returned is zero.
Some devices report a length that is the number of bytes that may be
read from the device without blocking.")

The FIONREAD ioctl(2) call, on the other hand, returns bytes waiting on
stream devices such as FIFOs, sockets, or ttys -- but does not return a
useful value for files or directories or block devices. The FIONREAD
ioctl was supported in both SVr4 and 4.2BSD. It's present in all the
open-source Unixes, SunOS, Solaris, and AIX. Via Google search I have
discovered that it's also supported in the Windows Sockets API and the
GUSI POSIX libraries for the Macintosh. Thus, it can be considered
portable for Python's purposes even though it's rather sparsely
documented.

I was able to obtain confirming information on Linux from Linus
Torvalds himself. My information on Windows and the Mac is from
Gavriel State, formerly a lead developer on Corel's WINE team and a
programmer with extensive cross-platform experience. Gavriel reported
on the MSCRT POSIX environment, on the Metrowerks Standard Library
POSIX implementation for the Mac, and on the GUSI POSIX implementation
for the Mac.

2.1: Plain files

Torvalds and State confirm that for plain files (S_IFREG) the st_size
field is reliable on all three platforms. On the Mac it gives the
file's data fork size.

One apparent difficulty with the plain-file case is that POSIX does not
guarantee anything about off_t quantities, such as the value lseek(2)
returns and the st_size field, except that they can be compared for
equality. Thus, under the strict letter of POSIX law, `waiting' can be
used to detect EOF but not to get a reliable read-size return in any
other file position.

Fortunately, this is less an issue than it appears. The weakness of
the POSIX language was a 1980s-era concession to a generation of
mainframe operating systems with record-oriented file structures -- all
of which are now either thoroughly obsolete or (in the case of IBM
VM/CMS) have become Linux emulators :-).
On modern operating systems under which files have character
granularity, stat(2) emulations can be and are written to give the
right result.

2.2: Block devices

The directory case (S_IFDIR) is a complete loss. Under Unixes,
including Linux, the fstat(2) size field gives the allocated size of
the directory as if it were a plain file. Under MSCRT POSIX the
meaning is undocumented and unclear. Metrowerks returns garbage. GUSI
POSIX returns the number of files in the directory! FIONREAD cannot be
used on directories.

Block devices (S_IFBLK) are a mess again. Linus points out that a
system with removable or unmountable volumes *cannot* return a useful
st_size field -- what happens when the device is dismounted?

2.3: Character devices

Pipes and FIFOs (S_IFIFO) look better. On MSCRT the fstat(2) size
field returns the number of bytes waiting to be read. This is also
true under current Linuxes, though Torvalds says it is "an
implementation detail" and recommends polling with the FIONREAD ioctl
instead. Fortunately, FIONREAD is available under Unix, Windows, and
the Mac.

Sockets (S_IFSOCK) look better too. Under Linux, the fstat(2) size
field gives the number of bytes waiting. Torvalds again says this is
"an implementation detail" and recommends polling with the FIONREAD
ioctl. Neither MSCRT POSIX nor Metrowerks has direct support for
sockets. GUSI POSIX returns 1 (!) in the st_size field. But FIONREAD
is available under Unix, Windows, and the GUSI POSIX libraries on the
Mac.

Character devices (S_IFCHR) can be polled with FIONREAD. This
technique has a long history of use with tty devices under Unix. I
don't know whether it will work with the equivalents of terminal
devices for Windows and the Mac. Fortunately this is not a very
important question, as those are GUI environments in which terminal
devices are rarely if ever used.

3: How does this turn into Python?

The upshot of our portability analysis is that by using FIONREAD and
fstat(2), we can get useful results for plain files, pipes, and sockets
on all three platforms. Directories and block devices are a complete
loss. Character devices (in particular, ttys) we can poll reliably
under Unix. What we'll get polling the equivalents of tty or character
devices under Windows and the Mac is presently unknown, but also
unimportant.

My proposed semantics for a Python `waiting' method is that it reports
the amount of data that would be returned by a read() call at the time
of the waiting-method invocation. The interpreter throws OSError if
such a report is impossible or forbidden.

I have enclosed a patch against the current CVS sources, including
documentation. This patch is tested and working against plain files,
sockets, and FIFOs under Linux. I have also attached the Python test
program I used under Linux.

I would appreciate it if those of you on Windows and Macintosh machines
would test the waiting method. The test program will take some
porting, because it needs to write to a FIFO in background. Under
Linux I do it this way:

    (echo -n '%s' >testfifo; echo 'Data written to FIFO.') &

I don't know how to do the equivalent under Windows or Mac.

When you run this program, it will try to mail me your test results.
--
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes it is said that man cannot be trusted with the government
of himself. Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address

--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Description: Patch implementing the waiting method
Content-Disposition: attachment; filename="waiting.patch"

Index: fileobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.108
diff -c -r2.108 fileobject.c
*** fileobject.c	2001/01/18 03:03:16	2.108
--- fileobject.c	2001/01/25 16:16:10
***************
*** 35,40 ****
--- 35,44 ----
  #include <errno.h>
  #endif
  
+ #ifndef DONT_HAVE_IOCTL_H
+ #include <sys/ioctl.h>
+ #endif
+ 
  typedef struct {
  	PyObject_HEAD
***************
*** 423,428 ****
--- 427,513 ----
  }
  
  static PyObject *
+ file_waiting(PyFileObject *f, PyObject *args)
+ {
+ 	struct stat stbuf;
+ #ifdef HAVE_FSTAT
+ 	int ret;
+ #endif
+ 
+ 	if (f->f_fp == NULL)
+ 		return err_closed();
+ 	if (!PyArg_NoArgs(args))
+ 		return NULL;
+ #ifndef HAVE_FSTAT
+ 	PyErr_SetString(PyExc_OSError, "fstat(2) is not available.");
+ 	clearerr(f->f_fp);
+ 	return NULL;
+ #else
+ 	Py_BEGIN_ALLOW_THREADS
+ 	errno = 0;
+ 	ret = fstat(fileno(f->f_fp), &stbuf);
+ 	Py_END_ALLOW_THREADS
+ 	if (ret == -1) {	/* the fstat failed */
+ 		PyErr_SetFromErrno(PyExc_IOError);
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	} else if (S_ISDIR(stbuf.st_mode) || S_ISBLK(stbuf.st_mode)) {
+ 		PyErr_SetString(PyExc_IOError,
+ 				"Can't poll a block device or directory.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	} else if (S_ISREG(stbuf.st_mode)) {	/* plain file */
+ #if defined(HAVE_LARGEFILE_SUPPORT) && SIZEOF_OFF_T < 8 && SIZEOF_FPOS_T >= 8
+ 		fpos_t pos;
+ #else
+ 		off_t pos;
+ #endif
+ 		Py_BEGIN_ALLOW_THREADS
+ 		errno = 0;
+ 		pos = _portable_ftell(f->f_fp);
+ 		Py_END_ALLOW_THREADS
+ 		if (pos == -1) {
+ 			PyErr_SetFromErrno(PyExc_IOError);
+ 			clearerr(f->f_fp);
+ 			return NULL;
+ 		}
+ #if !defined(HAVE_LARGEFILE_SUPPORT)
+ 		return PyInt_FromLong(stbuf.st_size - pos);
+ #else
+ 		return PyLong_FromLongLong(stbuf.st_size - pos);
+ #endif
+ 	} else if (S_ISFIFO(stbuf.st_mode)
+ 		   || S_ISSOCK(stbuf.st_mode)
+ 		   || S_ISCHR(stbuf.st_mode)) {	/* stream device */
+ #ifndef FIONREAD
+ 		PyErr_SetString(PyExc_OSError,
+ 				"FIONREAD is not available.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ #else
+ 		int waiting;
+ 
+ 		Py_BEGIN_ALLOW_THREADS
+ 		errno = 0;
+ 		ret = ioctl(fileno(f->f_fp), FIONREAD, &waiting);
+ 		Py_END_ALLOW_THREADS
+ 		if (ret == -1) {
+ 			PyErr_SetFromErrno(PyExc_IOError);
+ 			clearerr(f->f_fp);
+ 			return NULL;
+ 		}
+ 
+ 		return Py_BuildValue("i", waiting);
+ #endif /* FIONREAD */
+ 	} else {	/* should never happen! */
+ 		PyErr_SetString(PyExc_OSError, "Unknown file type.");
+ 		clearerr(f->f_fp);
+ 		return NULL;
+ 	}
+ #endif /* HAVE_FSTAT */
+ }
+ 
+ static PyObject *
  file_fileno(PyFileObject *f, PyObject *args)
  {
  	if (f->f_fp == NULL)
***************
*** 1263,1268 ****
--- 1348,1354 ----
  	{"truncate",	(PyCFunction)file_truncate, 1},
  #endif
  	{"tell",	(PyCFunction)file_tell, 0},
+ 	{"waiting",	(PyCFunction)file_waiting, 0},
  	{"readinto",	(PyCFunction)file_readinto, 0},
  	{"readlines",	(PyCFunction)file_readlines, 1},
  	{"xreadlines",	(PyCFunction)file_xreadlines, 1},

--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Description: Test program for the waiting method
Content-Disposition: attachment; filename="waiting_test.py"

#!/usr/bin/env python
import sys, os, random, string, time, socket, smtplib, readline

print "This program tests the `waiting' method of file objects."

fp = open("waiting_test.py")
if hasattr(fp, "waiting"):
    print "Good, you're running a patched Python with `waiting' available."
else:
    print "You haven't installed the `waiting' patch yet. This won't work."
    sys.exit(1)

successes = ""
failures = ""
nogo = ""

print ""
print "First, plain files:"
filesize = fp.waiting()
print "There are %d bytes waiting to be read in this file." % filesize
if os.name == 'posix':
    os.system("ls -l waiting_test.py")
    print "That should match the number in the ls listing above."
else:
    print "Please check this with your OS's directory tools."
get = random.randrange(fp.waiting())
print "I'll now read a random number (%d) of bytes." % get
fp.read(get)
print "The waiting method sees %d bytes left." % fp.waiting()
if get + fp.waiting() == filesize:
    print "%d + %d = %d. That's consistent. Test passed." % \
          (get, fp.waiting(), filesize)
    successes += "Plain file random-read test passed.\n"
else:
    print "That's not consistent. Test failed."
    failures += "Plain file random-read test failed\n"
print "Now let's see if we can detect EOF reliably."
fp.read()
left = fp.waiting()
print "I'll do a read()...the waiting method now returns %d" % left
if left == 0:
    print "That looks like EOF."
    successes += "Plain file EOF test passed.\n"
else:
    print "%d bytes left. Test failed." % left
    failures += "Plain file EOF test failed\n"
fp.close()

print ""
print "Now sockets:"
print "Connecting to imap.netaxs.com's IMAP server now..."
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
file = sock.makefile('rb')
sock.connect(("imap.netaxs.com", 143))
print "Waiting a few seconds to avoid a race condition..."
time.sleep(3)
greetsize = file.waiting()
print "There appear to be %d bytes waiting..." % greetsize
greeting = file.readline()
print "I just read the greeting line..."
sys.stdout.write(greeting)
if len(greeting) == greetsize:
    print "...and the size matches. Test passed."
    successes += "Socket test passed.\n"
else:
    print "That's not right. Test failed."
    failures += "Socket test failed.\n"
sock.close()

print ""
if not hasattr(os, "mkfifo"):
    print "Your platform doesn't have FIFOs (mkfifo() is absent), so I can't test them."
    nogo = "FIFO test could not be performed."
else:
    print "Now FIFOs:"
    print "I'm making a FIFO named testfifo."
    os.mkfifo("testfifo")
    str = string.letters[:random.randrange(len(string.letters))]
    print "I'm going to send it the following string '%s' of random length %d:" \
          % (str, len(str),)
    # Note: Unix dependency here!
    os.system("(echo -n '%s' >testfifo; echo 'Data written to FIFO.') &" % str)
    fp = open("testfifo", "r")
    print "Waiting a few seconds to avoid a race condition..."
    time.sleep(3)
    ready = fp.waiting()
    print "I see %d bytes waiting in the FIFO." % ready
    if ready == len(str):
        print "That's consistent. Test passed."
        successes += "FIFO test passed.\n"
    else:
        print "That's not consistent. Test failed."
        failures += "FIFO test failed\n"
    os.remove("testfifo")

print "\nSummary:"
report = "Platform is: %s, version is %s\n" % (sys.platform, sys.version)
if successes:
    report += "The following tests succeeded:\n" + successes
if failures:
    report += "The following tests failed:\n" + failures
if nogo:
    report += "The following tests could not be performed:\n" + nogo
if not nogo:
    report += "No tests were skipped.\n"
if not failures:
    report += "All tests succeeded.\n"
print report

if os.name == 'posix':
    me = os.environ["USER"] + "@" + socket.getfqdn()
else:
    me = raw_input("Enter your email address, please: ")
try:
    server = smtplib.SMTP('localhost')
    report = ("From: %s\nTo: esr@thyrsus.com\nSubject: waiting_test\n\n" % me) + report
    server.sendmail(me, ["esr@thyrsus.com"], report)
    server.quit()
except:
    print "The attempt to mail your test result failed.\n"

--UugvWAfsgieZRqgk--
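
For anyone who wants to try the fstat(2)/FIONREAD dispatch from section 2
without building a patched interpreter, here is a minimal sketch in
present-day Python. It is not the patch above: the function name `waiting`
is a hypothetical stand-in, it takes a raw file descriptor rather than a
file object (so stdio buffering is not accounted for), and it assumes a
Unix-like platform where the `termios` module exposes `FIONREAD`.

```python
import fcntl
import os
import stat
import struct
import termios


def waiting(fd):
    """Hypothetical helper: bytes a read on descriptor fd could return now.

    Mirrors the dispatch in the patch: fstat(2) st_size for plain files,
    the FIONREAD ioctl for FIFOs, sockets, and character devices.
    Assumes a Unix-like system where termios.FIONREAD is defined.
    """
    st = os.fstat(fd)
    mode = st.st_mode
    if stat.S_ISDIR(mode) or stat.S_ISBLK(mode):
        raise OSError("can't poll a block device or directory")
    if stat.S_ISREG(mode):
        # Plain file: file size minus the current read offset.
        return st.st_size - os.lseek(fd, 0, os.SEEK_CUR)
    if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode) or stat.S_ISCHR(mode):
        # Stream device: the kernel fills in a C int with the byte count.
        buf = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]
    raise OSError("unknown file type")
```

Calling `waiting(r)` on the read end of an `os.pipe()` after writing five
bytes to the write end should report 5 on Linux. As noted in section 2.3,
what a character device reports depends on the driver, and some will
simply fail the ioctl with an OSError.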