<Patrick.Bussi at space.alcatel.fr> wrote...
>
> Hello,
>
> I cannot understand why I get the following error.
>
> 1) Here is an extract of the web page content I want to parse :
> -------------
> function RedirectPage()
> {
> document.location ="http://www.nicewebsite.com";
> }
> -------------
>
>
> 2) Here is the program I use to extract the URL to which I need to redirect my
> browsing (ugly, I agree)
> -------------
> q=re.compile("document.location =.*")
> t2=q.search(f3) # f3 contains the full string of the web page
> b= t2.group()
> q1=re.compile('[^document.location ="].*[^";]')
> t4=q1.search(b)
> redirectURL=t4.group()
> f4=urllib.urlopen(redirectURL).read()
> --------------
> The program successfully extracts the name of the Url. I can print the exact
> text "http://www.nicewebsite.com" (without the quotes).
>
>
> 3) And finally here is the traceback I get :
> -----------------------
> Traceback (innermost last):
>   File "./MyProgram", line 157, in ?
>     f4=urllib.urlopen(redirectURL).read()
>   File "/usr/lib/python1.5/urllib.py", line 59, in urlopen
>     return _urlopener.open(url)
>   File "/usr/lib/python1.5/urllib.py", line 157, in open
>     return getattr(self, name)(url)
>   File "/usr/lib/python1.5/urllib.py", line 272, in open_http
>     return self.http_error(url, fp, errcode, errmsg, headers)
>   File "/usr/lib/python1.5/urllib.py", line 285, in http_error
>     result = method(url, fp, errcode, errmsg, headers)
>   File "/usr/lib/python1.5/urllib.py", line 456, in http_error_302
>     return self.open(newurl, data)
>   File "/usr/lib/python1.5/urllib.py", line 157, in open
>     return getattr(self, name)(url)
>   File "/usr/lib/python1.5/urllib.py", line 247, in open_http
>     if not host: raise IOError, ('http error', 'no host given')
> IOError: [Errno http error] no host given
> ---------------------------
>
>
> For professional reason, I had to change the web site name above. Could in any
> case urllib() be sensitive to the content of the requested URL (full of "&"
> chars) ?
>
> Thank you for any help.
>

Patrick:

Maybe it's the URL:

>>> import re, urllib
>>> q=re.compile("document.location =.*")
>>> t2=q.search(f3)
>>> b= t2.group()
>>> q1=re.compile('[^document.location ="].*[^";]')
>>> t4=q1.search(b)
>>> redirectURL=t4.group()
>>> redirectURL
'http://www.nicewebsite.com'
>>> f4=urllib.urlopen(redirectURL).read()
>>> f4
'<!-- frames -->\015\012<frameset rows="50%,75" border="0" framespacing="0"> \015\012 <frame name="destframe" src="http://futuresite.register.com/futuresite.shtml" marginwidth="0" marginheight="0" scrolling="auto" frameborder="0">\015\012 <frame src="http://futuresite.register.com/us?www.nicewebsite.com" name="bottom" frameborder="0" scrolling="NO" marginwidth="0" marginheight="0" noresize>\015\012</frameset><noframes></noframes>\015\012'
>>>

Seems to work fine with the URL you gave, so have you checked that the URL
you are using really is a valid one? It looks like you might have missed a
slash out:

>>> urllib.urlopen("http:/www.nosuchsitehere.com")
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
  File "d:\python20\lib\urllib.py", line 61, in urlopen
    return _urlopener.open(url)
  File "d:\python20\lib\urllib.py", line 166, in open
    return getattr(self, name)(url)
  File "d:\python20\lib\urllib.py", line 261, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given

Hope you track it down.

regards
 Steve
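
As a rough sketch of the check suggested above: the snippet below pulls the quoted URL out with a single capture group and then tests whether it parses with a host part before anything tries to open it. It is written for current Python 3 (urllib.parse) rather than the Python 1.5/2.0 urllib used in the thread, and the names LOCATION_RE, extract_redirect_url and has_host are made up for illustration. Note that the original pattern '[^document.location ="]' is a character class (any one character not in that set), not a literal prefix, so it only happens to work for this sample. Also, the quoted traceback passes through http_error_302, so the host-less URL may have come from the server's redirect rather than from the extracted string; the same has_host check could be applied to that URL too.

```python
import re
from urllib.parse import urlsplit

# Capture whatever sits between the double quotes after "document.location =".
# (Hypothetical helper names; not part of the original program.)
LOCATION_RE = re.compile(r'document\.location\s*=\s*"([^"]*)"')


def extract_redirect_url(page_text):
    """Return the quoted URL assigned to document.location, or None."""
    match = LOCATION_RE.search(page_text)
    return match.group(1) if match else None


def has_host(url):
    """True when the URL parses with a non-empty network location."""
    return bool(urlsplit(url).netloc)


# Sample page text taken from the extract quoted in the question.
page = '''function RedirectPage()
{
document.location ="http://www.nicewebsite.com";
}'''

url = extract_redirect_url(page)
print(repr(url))                                 # 'http://www.nicewebsite.com'
print(has_host(url))                             # True
print(has_host("http:/www.nosuchsitehere.com"))  # False: one slash, so no host
```

Printing repr() of the extracted string, as in the interactive session above, also shows any stray trailing characters (a quote, semicolon or carriage return) that a plain print would hide.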