iddwb <iddwb at imap1.asu.edu> wrote in comp.lang.python:
> I've been going through the diveintopython stuff. Mostly excellent
> material. However, there is syntax I don't understand; I was hoping someone
> could enlighten me.
>
> Here's the code I've been working with. I borrowed the code to create my
> own HTMLProc class. The difficulties begin with the unknown_starttag
> function. I keep getting a syntax error on the strattrs assignment. I
> thought I copied the code verbatim but it still generates a syntax
> error. Now, if I understood the syntax, I could fix it (probably). But I
> am wondering if someone might explain the problem and the fix.

If it's not the indentation problem, maybe you're using an old Python
version? That assignment uses string methods and a list comprehension,
and you need Python 2.0 at least for that.

But 2.1 came out today, time for an upgrade anyway :).

-- 
Remco Gerlich

> #!/usr/local/bin/python
> # first test to open web pages using urlopen2
> from sgmllib import SGMLParser
> import sys
>
> class HTMLProc(SGMLParser):
>     def reset(self):
>         # from diveintopython.org, extends SGMLParser
>         SGMLParser.reset(self)
>         self.parts = []
>
>     def unknown_starttag(self, tag, attrs):
>         strattrs = "".join([' %(key)s="%(value)s"' % locals() for key, value in attrs])
>         self.parts.append("<%(tag)s%(strattrs)s>" % locals())
>
>     def unknown_endtag(self, tag):
>         self.parts.append("</%(tag)s>" % locals())
>
>     def output(self):
>         return "".join(self.parts)
>
> def do_body(fd):
>     try:
>         gmlbuffer = HTMLProc()
>         gmlbuffer.feed(fd.read())
>         fd.close()
>         gmlbuffer.close()
>     except AttributeError:
>         gmlbuffer.unknown_starttag("body", "bgcolor")
>         print "Attribute Error"
>         return -1
>     print "done with body"
>     return gmlbuffer
>
> if __name__ == '__main__':
>     # print sys.argv[1:]
>     try:
>         f = open("dean.html")
>     except IOError:
>         print "couldn't open ", sys.argv[1:]
>         sys.exit(1)
>     htmlbuff = do_body(f)
>     print htmlbuff.parts
>
> David Bear
> College of Public Programs/ASU
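
If upgrading is not an option, the strattrs assignment can probably be written without string methods or a list comprehension. What follows is only a minimal sketch, not the Dive Into Python code: it assumes attrs is a list of (key, value) pairs, which is what sgmllib hands to unknown_starttag, and the helper name build_strattrs is made up for illustration.

```python
def build_strattrs(attrs):
    # Hypothetical helper, not part of sgmllib or Dive Into Python.
    # Produces the same string as the one-liner
    #   "".join([' %(key)s="%(value)s"' % locals() for key, value in attrs])
    # but uses only a plain for loop and the % operator, so it does not
    # depend on string methods or list comprehensions.
    strattrs = ""
    for key, value in attrs:
        # Each attribute becomes: space, name, equals sign, quoted value.
        strattrs = strattrs + ' %s="%s"' % (key, value)
    return strattrs
```

Inside the class, unknown_starttag could then call strattrs = build_strattrs(attrs) and keep the following self.parts.append line unchanged.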