Previously, in Python 2.6, I made a lot of use of `urllib.urlopen` to capture web page content and then post-process the data from the site I was downloading. Now those routines, and the new routines I am trying to use for Python 3.2, are running into what seems to be a Windows-only (maybe even Windows 7-only) problem. Using the following code with Python 3.2.2 (64-bit) on Windows 7 ...

    import urllib.request
    fp = urllib.request.urlopen(URL_string_that_I_use)
    string = fp.read()
    fp.close()
    print(string.decode("utf8"))

I get the following message:

    Traceback (most recent call last):
      File "TATest.py", line 5, in <module>
        string = fp.read()
      File "d:\python32\lib\http\client.py", line 489, in read
        return self._read_chunked(amt)
      File "d:\python32\lib\http\client.py", line 553, in _read_chunked
        self._safe_read(2)  # toss the CRLF at the end of the chunk
      File "d:\python32\lib\http\client.py", line 592, in _safe_read
        raise IncompleteRead(b''.join(s), amt)
    http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

Using the following code instead ...

    import urllib.request
    fp = urllib.request.urlopen(URL_string_that_I_use)
    for Line in fp:
        print(Line.decode("utf8").rstrip('\n'))
    fp.close()

I get a fair amount of the web page's content, but then the rest of the capture is thwarted by ...

    Traceback (most recent call last):
      File "TATest.py", line 9, in <module>
        for Line in fp:
      File "d:\python32\lib\http\client.py", line 489, in read
        return self._read_chunked(amt)
      File "d:\python32\lib\http\client.py", line 545, in _read_chunked
        self._safe_read(2)  # toss the CRLF at the end of the chunk
      File "d:\python32\lib\http\client.py", line 592, in _safe_read
        raise IncompleteRead(b''.join(s), amt)
    http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

Trying to read another page yields ...

    Traceback (most recent call last):
      File "TATest.py", line 11, in <module>
        print(Line.decode("utf8").rstrip('\n'))
      File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>

I do believe this is a Windows issue, but can Python be made more robust to deal with what is causing it? When trying similar code on Linux, we do not encounter the problem.
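As a sketch of the kind of robustness I have in mind (assuming `http://example.com/page` as a stand-in for the real URL, and assuming it is acceptable to keep whatever partial data did arrive), something like this would catch both the `IncompleteRead` from the chunked response and the `UnicodeEncodeError` that comes from printing to a cp1252 console:

    import sys
    import http.client
    import urllib.request

    url = "http://example.com/page"  # placeholder for URL_string_that_I_use

    fp = urllib.request.urlopen(url)
    try:
        data = fp.read()
    except http.client.IncompleteRead as e:
        # Keep whatever arrived before the chunked transfer broke off.
        data = e.partial
    finally:
        fp.close()

    # Decode leniently ('replace' substitutes undecodable bytes), then
    # re-encode for whatever the console uses (cp1252 on this Windows box),
    # replacing characters the console code page cannot represent.
    text = data.decode("utf8", "replace")
    out_enc = sys.stdout.encoding or "utf-8"
    print(text.encode(out_enc, "replace").decode(out_enc))

This does not address whatever is truncating the chunked response in the first place; it only stops the partial read from killing the script and stops the console code page from choking on characters it cannot map.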