"Sammy Mannaert" <nstalkie at tvd.be> wrote in message news:3AD5E147.6133A8DD at tvd.be... > hi, > > i'm trying to use httplib to fetch a file > automatically. it's basically just the > example from the 2.0 httplib GET example. > > it works for all urls except for > http://www.deathinjune.net/html/news/news.htm > > does anyone know why it won't work for this > url ? is there an easy way to fix it ? > i tried surfinf to the url in netscape and > lynx. both worked fine. > > sammy > > > --- program follows --- > #! /usr/bin/env python > > import httplib > > def fetch(domain, path): > h = httplib.HTTP(domain) > h.putrequest('GET', path) > h.putheader('Accept', 'text/html') > h.putheader('Accept', 'text/plain') > h.endheaders() > errcode, errmsg, headers = h.getreply() > print errcode > f = h.getfile() > data = f.read() > f.close() > print len(data) > > def main(): > fetch('www.brainwashed.com', '/c93/news1.html') > fetch('www.deathinjune.net', '/html/news/news.htm') > > main() > > --- program end --- > sammy: Here, I got the following output: 200 40605 404 212 Not sure why we are getting the 404 since, like you, I can read the page fine in a browser. Still got a 404 if I reversed the order of the page acesses, too. Just as a matter of interest, is there a particular reason to use httplib? urllib is almost always more convenient for this type of operation, and I had no trouble with: >>> import urllib >>> u = urllib.urlopen("http://www.deathinjune.net/html/news/news.htm") >>> c = u.read() >>> len(c) 4133 regards Steve