Tom Brown wrote:
> (Continuing thread started at
> http://mail.python.org/pipermail/csv/2008-October/000688.html)
>
> On Sun, Oct 19, 2008 at 16:46, Andrew McNamara
> <andrewm at object-craft.com.au> wrote:
>
> >I downloaded the 2.6 source tar ball, but is it too late for new
> >features to get into versions <3?
>
> Yep.
>
> >How would you feel about adding the following tests to
> >Lib/test/test_csv.py and getting them to pass?
> >
> >Also http://www.python.org/doc/2.5.2/lib/csv-fmt-params.html says
> >"*skipinitialspace* When True, whitespace immediately following the
> >delimiter is ignored."
> >but my tests show whitespace at the start of any field is ignored,
> >including the first field.
>
> I suspect (but I haven't checked) that it means "after the delimiter and
> before any quoted field (or some variation on that).
>
> I agree that whitespace after the delimiter and before any quoted field
> is skipped. Also whitespace after the start of the line and before any
> quoted field is skipped.
>
> All of the "dialect" parameters are there to allow parsing of a specific
> common form of CSV file. Because there is no formal definition of the
> format, the module simply aims to parse (and produce the same result)
> as common applications such as Excel and Access. Changing the behaviour
> in any non-backwards compatible way is sure to get screams of anguish
> from many users. Even when the behaviour appears to be a bug, you can
> be sure people are counting on it working like that.
>
> skipinitialspace defaults to false and by the same logic skipfinalspace
> should default to false to preserve compatibility with the csv module in
> 2.6. On the other hand, the switch to version 3 is as good a time as any
> to break backwards compatibility to adopt something that works better
> for new users.

Read Andrew's lips: They don't want "better", they want "the same as MS".
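For reference, the skipinitialspace behaviour discussed above (leading spaces skipped at the start of every field, the first one included, and before a quoted field) can be checked with a short sketch. It assumes Python 3 and the standard csv module; the helper name parse is made up for illustration. Trailing spaces are left alone, since skipfinalspace is only a proposal in this thread and not an existing parameter.

```python
import csv


def parse(line, **fmtparams):
    """Parse a single CSV line and return its fields as a list.

    'parse' is a hypothetical helper for this example, not part of csv.
    """
    return next(csv.reader([line], **fmtparams))


if __name__ == "__main__":
    line = '  "a b",  c , d'
    # Default dialect: leading spaces are kept, and a quote that does not
    # start the field is treated as ordinary data.
    print(parse(line))
    # skipinitialspace=True: spaces at the start of each field are skipped,
    # including the first field; trailing spaces are still kept.
    print(parse(line, skipinitialspace=True))
```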
> Based on my experience parsing several hundred csv generated by many
> different people I think it would be nice to at least have a dialect
> that is excel + skipinitialspace=True + skipfinalspace=True.

Based on my experience extracting data from innumerable csv files (and
infinite varieties thereof), spreadsheet files, and database tables, in
99.99% of cases one should automatically apply the following
transformations to each text field:

  * strip leading whitespace
  * strip trailing whitespace
  * replace embedded runs of whitespace by a single space

and one needs to ensure that the definition of whitespace includes the
no-break space (NBSP) character.

As this "space normalisation" is needed for all input sources, the csv
module is IMHO the wrong place to put it. A string method would be a
better idea.

Cheers,
John
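The three space-normalisation steps listed above can be sketched in a few lines. This is a minimal illustration, not an existing string method: the name normalise_space is hypothetical, and it assumes Python 3, where str.split() with no arguments splits on runs of Unicode whitespace, NBSP (U+00A0) included.

```python
def normalise_space(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space.

    Hypothetical helper for illustration; assumes Python 3 str semantics.
    """
    # split() with no arguments splits on runs of Unicode whitespace
    # (NBSP included) and discards empty strings at both ends, so joining
    # the pieces with a single space performs all three steps at once.
    return " ".join(text.split())


if __name__ == "__main__":
    row = ["  Smith ", "John\u00a0\u00a0 Q.\t Public", "\u00a0"]
    print([normalise_space(field) for field in row])
```

Applied after csv.reader has produced a row, this keeps the normalisation independent of the input source, which is the point made above.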