Hi,

I've run into a problem matching a regular expression in Python, and I hope one of you can help. Here are the details.

I have many tags like these:

    xxx<a href="http://xxx.xxx.xxx" xxx>xxx
    xxx<a href="wap://xxx.xxx.xxx" xxx>xxx
    xxx<a href="http://xxx.xxx.xxx" xxx>xxx
    .....

I want to extract every "http://xxx.xxx.xxx" URL, so I did this:

    import re
    httpPat = re.compile("(<a )(href=\")(http://.*)(\")")
    result = httpPat.findall(data)

and used this to inspect the output:

    for i in result:
        print(i[2])

Surprisingly, some of the output looks like this:

    http://xxx.xxx.xxx">xxx</a>xxx

It actually comes from source like this:

    <a href="http://xxx.xxx.xxx">xxx</a>xxx"

Other results are correct. How can I get all of them clean, like "http://xxx.xxx.xxx"?

Thanks for your help.

Regards,
Johnny
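The overshoot is consistent with `.*` being greedy: it grabs as much of the line as it can, and the regex engine then backtracks only as far as the *last* `"` on that line, so any line with a later quote (such as the stray `"` after `</a>xxx`) yields an over-long match. Below is a minimal sketch, assuming Python 3 and a made-up sample string (the `example.com` hosts and the `title` attributes are hypothetical stand-ins for the `xxx` placeholders). It compares the original pattern with two common fixes: a non-greedy `.*?` and a negated character class `[^"]*`.

```python
import re

# Hypothetical sample input modelled on the tags in the question.
# The first line ends with a stray '"', like the problematic source.
data = (
    'xxx<a href="http://a.example.com" title="x">xxx</a>xxx"\n'
    'xxx<a href="wap://b.example.com" title="y">xxx\n'
    'xxx<a href="http://c.example.com">xxx</a>xxx\n'
)

# The original pattern: ".*" is greedy, so it runs to the LAST '"' on the
# line ('.' does not match newlines, so each match stays within one line).
greedy = re.compile(r'(<a )(href=")(http://.*)(")')

# Fix 1: the non-greedy ".*?" stops at the FIRST '"' that completes the match.
lazy = re.compile(r'<a href="(http://.*?)"')

# Fix 2: a negated character class [^"]* can never cross a quote at all.
strict = re.compile(r'<a href="(http://[^"]*)"')

# With several groups findall() returns tuples; with one group, plain strings.
greedy_urls = [t[2] for t in greedy.findall(data)]
lazy_urls = lazy.findall(data)
strict_urls = strict.findall(data)

for url in greedy_urls:
    print(url)  # the first URL overshoots past its closing quote
for url in strict_urls:
    print(url)  # both URLs come out clean
```

Here the greedy pattern returns `http://a.example.com" title="x">xxx</a>xxx` for the first line but a clean URL for the third, which would explain why only some results were wrong. Either fix returns just the two `http://` URLs and skips the `wap://` one. `[^"]*` is often preferred because it states the intent directly and cannot run past the attribute; for messier real-world HTML, a parser such as the standard library's `html.parser` is more robust than any single regex.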