"Bill Janssen" <janssen at parc.com> wrote in message news:04Jun29.224113pdt."58612"@synergy1.parc.xerox.com... > is that the byte vectors we tend to call strings in Python have no > string-ness, as understood in the 21st century. Python strings are sequences of 0 to n chars from an abstract 256-char alphabet. This meets my understanding of the standard 20th century CS definition of string. Has there been a significant change in the last few years? > There is no character set associated with them, The byte set is intentionally not any *particular* natural language char set, but a possible carrier for any of them. Perhaps unfortunately, it lacks a single standard glyph set or graphic representation., but I believe Unicode also differentiates between characters (code points?) and glyphs (which are also not standardized). The byte set also (fortunately) lacks the complications of letters, capitals, signs, marks, ligatures, symbols, and so on, which complications usually make the chararacter set for a particular language somewhat fuzzy. > documentation, particularly the language manual, is extremely > confusing on this point, in classifying "string" and "Unicode" objects > as the same sort of thing. I think it a matter a viewpoint whether one emphasizes the similarities or differences. > And then not documenting them clearly. The subject of strings, Unicode, internationalization, and Python could use a manual in itself. > Unicode ... is not integrated with the file streams support. Reading numbers other than bytes is also not integrated with the file type. Adding a 'bytes' parameters to file(), or a readbytes(n) method, would be generally helpful for anyone wanting to iterate thru a file in chunks other than 'lines'. Terry J. Reedy