Source: http://mail.python.org/pipermail/python-dev/2010-June/101090.html

[Python-Dev] thoughts on the bytes/string discussion

Ian Bicking ianb at colorstudy.com
Sat Jun 26 00:26:20 CEST 2010
On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was, in
> > every respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects", if you prefer py3 nomenclature ;-))
> > do, but wouldn't actually copy any of that memory unless it really needed
> > to (for example, to pass to a C API that expected native wide
> > characters), and that would hold on to the original bytes so that it
> > could produce them on demand if encoded to the same encoding again. So,
> > as others in this thread have mentioned, the 'ABC' really implies some
> > stuff about C APIs as well.
> >
> > I'm not sure about the exact performance impact of such a class, which
> > is why I'd like the ability to implement it *outside* of the stdlib and
> > see how it works on a project, and return with a proposal along with
> > some data. There are also different ways to implement this, and other
> > optimizations (like ropes) which might be better.
> >
> > You can almost do this today, but the lack of things like the
> > hypothetical "__rcontains__" does make it impossible to be totally
> > transparent about it.
>
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.
>
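A minimal pure-Python sketch of the kind of wrapper described in the quoted text. The class name LazyText, its methods, and the chunk size are hypothetical, not a stdlib or proposed API. It validates eagerly (the point Guido raises), decodes lazily, and returns the original bytes object when re-encoded to the same codec. Because it is not a real str, it also shows the gap Glyph mentions: `"x" in obj` and `"a" + obj` can be routed to the wrapper, but `obj in "some str"` has no reflected hook and fails.

```python
import codecs

CHUNK = 64 * 1024  # validation chunk size (arbitrary choice)


class LazyText:
    """Hypothetical sketch, not a stdlib or proposed API: a text-like object
    that keeps its original bytes and decodes them only on demand."""

    def __init__(self, data: bytes, encoding: str = "utf-8") -> None:
        # Validate eagerly so invalid input fails here, not later. The
        # incremental decoder copes with multi-byte sequences split across
        # chunks; its output is discarded, so no full str copy is kept.
        decoder = codecs.getincrementaldecoder(encoding)()
        for start in range(0, len(data), CHUNK):
            decoder.decode(data[start:start + CHUNK])
        decoder.decode(b"", final=True)  # raises if the input is truncated
        self._data = data
        self._encoding = codecs.lookup(encoding).name  # normalized codec name
        self._text = None  # decoded str, created lazily

    def __str__(self) -> str:
        # Decode at most once, only when a real str is needed.
        if self._text is None:
            self._text = self._data.decode(self._encoding)
        return self._text

    def encode(self, encoding: str = "utf-8") -> bytes:
        # Fast path: re-encoding to the original codec returns the
        # original bytes object without decoding or copying.
        if codecs.lookup(encoding).name == self._encoding:
            return self._data
        return str(self).encode(encoding)

    def __len__(self) -> int:
        return len(str(self))

    def __contains__(self, item: str) -> bool:
        # Handles `"x" in lazy`; Python has no reflected hook for
        # `lazy in "some str"`, which str.__contains__ rejects.
        return item in str(self)

    def __radd__(self, other: str) -> str:
        # `"prefix" + lazy` does fall back to this reflected method.
        return other + str(self)


if __name__ == "__main__":
    t = LazyText("héllo wörld".encode("utf-8"))
    print("wör" in t)       # uses LazyText.__contains__
    print("say: " + t)      # uses LazyText.__radd__
    try:
        t in "héllo wörld"  # str.__contains__ needs a str operand
    except TypeError as exc:
        print("TypeError:", exc)
```

The same-codec fast path is the round trip asked for above; what a Python-level wrapper cannot do is satisfy C APIs that insist on a real str, which is why the discussion turns to C-level support.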

It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically
benchmarked ElementTree with all-unicode and sometimes-ASCII-bytes strings,
and found that using Python 2 strs in some cases provided notable
advantages. I know Stefan copied ElementTree's approach in lxml; maybe he
also did a benchmark or knows of one?
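For the kind of hard memory data Guido asks for, one rough approach is to compare the allocations of many short ASCII tokens (similar to XML tag names) stored as bytes versus str. This is a hypothetical harness (traced_allocation is not from the thread), not Fredrik's or Stefan's benchmark. The numbers depend heavily on the interpreter: CPython builds of 2010 stored str as 2 or 4 bytes per character, while CPython 3.3 and later store ASCII-only strings at 1 byte per character.

```python
import sys
import tracemalloc


def traced_allocation(make, n=100_000):
    """Bytes still allocated after building a list of n objects via make(i).

    Hypothetical helper; assumes tracemalloc is not already in use elsewhere,
    since it stops tracing when done.
    """
    tracemalloc.start()
    try:
        base, _ = tracemalloc.get_traced_memory()
        objs = [make(i) for i in range(n)]  # kept alive while measuring
        current, _ = tracemalloc.get_traced_memory()
        return current - base
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    # Same ASCII content, once as bytes and once as str.
    as_bytes = traced_allocation(lambda i: b"tag%d" % i)
    as_text = traced_allocation(lambda i: "tag%d" % i)
    print(f"bytes: {as_bytes} B, str: {as_text} B")
    print("per object:", sys.getsizeof(b"tag1"), sys.getsizeof("tag1"))
```

Both totals include the same list overhead, so the difference between them approximates the per-object cost of the two representations on the interpreter being measured.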

-- 
Ian Bicking  |  http://blog.ianbicking.org