Sammy Mannaert wrote:
> hi,
>
> i'm trying to use httplib to fetch a file
> automatically. it's basically just the
> example from the 2.0 httplib GET example.
>
> it works for all urls except for
> http://www.deathinjune.net/html/news/news.htm
>
> does anyone know why it won't work for this
> url ? is there an easy way to fix it ?
> i tried surfing to the url in netscape and
> lynx. both worked fine.
>

If you just want to fetch files, you'd be better off using the
higher-level urllib module, since it can follow redirects and can
handle "virtual hosts".

To fetch a file from a name-based virtual host with low-level code
like httplib, you have to do what all modern browsers do: send a
"Host" header with the full name of the server you want to reach:

    import httplib

    def fetch(domain, path):
        h = httplib.HTTP(domain)
        h.putrequest('GET', path)
        h.putheader('Accept', 'text/html')
        h.putheader('Accept', 'text/plain')
        h.putheader('Host', domain)  # <------------- add this line
        h.endheaders()
        errcode, errmsg, headers = h.getreply()
        print errcode
        f = h.getfile()
        data = f.read()
        f.close()
        print len(data)

--
Romuald Texier
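For anyone trying this on a current Python, here is a rough sketch of the same idea with http.client, the Python 3 successor to httplib. It is only an illustration under stated assumptions: the host names a.example and b.example, the SITES table, the VirtualHostHandler class and the demo() helper are all made up for the example, and the "server" is a toy one bound to 127.0.0.1. It shows one address serving two sites and picking between them purely from the "Host" header, which is why a request without that header can land on the wrong site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sites that share one IP address (name-based virtual hosts).
SITES = {"a.example": b"site A\n", "b.example": b"site B\n"}


class VirtualHostHandler(BaseHTTPRequestHandler):
    """Toy server: chooses the content from the Host header alone."""

    def do_GET(self):
        # Drop an optional ":port" suffix before looking the name up.
        host = (self.headers.get("Host") or "").split(":")[0]
        body = SITES.get(host)
        status = 200 if body is not None else 404
        if body is None:
            body = b"unknown host\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(address, port, path, host=None):
    """GET path from address:port; send a Host header only if host is given."""
    conn = http.client.HTTPConnection(address, port, timeout=10)
    try:
        # skip_host=True stops http.client from adding its own Host header,
        # so the header is exactly what we choose to send (or absent, which
        # mimics the old example and is not valid HTTP/1.1).
        conn.putrequest("GET", path, skip_host=True)
        if host is not None:
            conn.putheader("Host", host)
        conn.putheader("Accept", "text/html, text/plain")
        conn.endheaders()
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def demo():
    """Start the toy server on a free local port and query it three ways."""
    server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return [fetch("127.0.0.1", port, "/", host)
                for host in ("a.example", "b.example", None)]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body in demo():
        print(status, body)
```

In normal use you would not pass skip_host=True at all: http.client adds a "Host" header built from the host name given to HTTPConnection, and urllib.request does the same while also following redirects.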