Showing content from https://mail.python.org/pipermail/python-dev/2011-August/113336.html below:

[Python-Dev] PEP 393 review

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 30 08:20:46 CEST 2011
> I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding b'abc' 
> from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if yes: 
> why?

No, that makes no difference.

> 
> Your patch replaces PyUnicode_New(size, 255) ...  memcpy(), by 
> PyUnicode_FromUCS1().

You compared to the wrong revision. PyUnicode_New is already a PEP 393
function, and this version you have been comparing to is indeed faster
than the current version. However, it is also incorrect, as it fails
to compute the maxchar, and hence fails to detect pure-ASCII strings.

See below for the actual diff. It should be obvious why the 393 version
is faster: 3.3 currently needs to widen each char (to 16 or 32 bits).

Regards,
Martin

@@ -5569,41 +5569,8 @@
                       Py_ssize_t size,
                       const char *errors)
 {
-    PyUnicodeObject *v;
-    Py_UNICODE *p;
-    const char *e, *unrolled_end;
-
     /* Latin-1 is equivalent to the first 256 ordinals in Unicode. */
-    if (size == 1) {
-        Py_UNICODE r = *(unsigned char*)s;
-        return PyUnicode_FromUnicode(&r, 1);
-    }
-
-    v = _PyUnicode_New(size);
-    if (v == NULL)
-        goto onError;
-    if (size == 0)
-        return (PyObject *)v;
-    p = PyUnicode_AS_UNICODE(v);
-    e = s + size;
-    /* Unrolling the copy makes it much faster by reducing the looping
-       overhead. This is similar to what many memcpy() implementations do. */
-    unrolled_end = e - 4;
-    while (s < unrolled_end) {
-        p[0] = (unsigned char) s[0];
-        p[1] = (unsigned char) s[1];
-        p[2] = (unsigned char) s[2];
-        p[3] = (unsigned char) s[3];
-        s += 4;
-        p += 4;
-    }
-    while (s < e)
-        *p++ = (unsigned char) *s++;
-    return (PyObject *)v;
-
-  onError:
-    Py_XDECREF(v);
-    return NULL;
+    return PyUnicode_FromUCS1((unsigned char*)s, size);
 }

 /* create or adjust a UnicodeEncodeError */
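
To illustrate the point about widening versus a plain copy, here is a minimal sketch in C. It is not the CPython implementation of PyUnicode_FromUCS1; the helper names (latin1_maxchar, widen_latin1, copy_latin1) are hypothetical, and uint32_t merely stands in for a 32-bit Py_UNICODE. It assumes only what the message states: the old decoder widens every byte, while the PEP 393 path can keep one byte per character but has to compute the maximum character to detect pure-ASCII input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: largest byte value in a Latin-1 buffer (0 if empty). */
static unsigned char latin1_maxchar(const unsigned char *s, size_t n)
{
    unsigned char max = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] > max)
            max = s[i];
    }
    return max;
}

/* Old-style decode: every input byte is widened to a 32-bit code unit,
   so the loop writes four bytes for each byte it reads. */
static void widen_latin1(const unsigned char *s, size_t n, uint32_t *dst)
{
    assert(s != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = s[i];
}

/* PEP 393-style decode (sketch): the data stays one byte per character,
   so a memcpy is enough. The extra scan for the maximum character is what
   the faster-but-incorrect revision skipped. Returns 1 if the input is
   pure ASCII, 0 otherwise. */
static int copy_latin1(const unsigned char *s, size_t n, unsigned char *dst)
{
    assert(s != NULL && dst != NULL);
    memcpy(dst, s, n);
    return latin1_maxchar(s, n) < 128;
}
```

In this sketch, decoding "abc" and "ab\xff" both take the same copy-and-scan path; only the returned ASCII flag differs.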
