Jeremy> Are you scraping the full SF bug report or just the summary
Jeremy> page? Perhaps we should make a more concerted effort to share
Jeremy> our scraping code. It's likely that we didn't make the same
Jeremy> mistakes, so we'll either be able to cut the bugs in half by
Jeremy> looking for divergences or double the number of bugs by taking
Jeremy> the worst from each.

I just scrape the summary page for the time being. I have a separate script that allows me to add more tag info to my local database (but no way to display that stuff yet). For that I do grab the detail page.

Are you parsing the HTML or tearing it apart with regular expressions? I make a couple of simple transformations to the HTML before matching, and they make the regular expressions a hell of a lot easier to write.

I'll shoot you a copy in private mail. I doubt most of the python-dev readership is interested in this to any great degree.

Skip
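For illustration only, here is a minimal Python sketch of the "simple transformations before matching" idea. It is not Skip's script (that went out by private mail). The three transformations, the hypothetical `normalize_html`/`extract_bugs` helpers, the `ROW_RE` pattern and the sample table layout are all assumptions, not SourceForge's actual markup.

```python
import re

def normalize_html(html: str) -> str:
    """Hypothetical pre-pass that makes HTML friendlier to regexes."""
    # Collapse every run of whitespace (newlines included) to one space.
    text = re.sub(r"\s+", " ", html)
    # Lowercase tag names so patterns need not handle <TR> vs <tr>.
    text = re.sub(r"</?[A-Za-z][A-Za-z0-9]*", lambda m: m.group(0).lower(), text)
    # Start a new line before each table row, so one row == one line.
    text = re.sub(r"<tr\b", "\n<tr", text)
    return text

# Assumed layout: first cell is a numeric bug id, second is the summary.
ROW_RE = re.compile(r"<tr[^>]*> ?<td[^>]*> ?(\d+) ?</td> ?<td[^>]*> ?(.*?) ?</td>")

def extract_bugs(html: str) -> list[tuple[int, str]]:
    """Return (bug_id, summary) pairs from a normalized summary page."""
    rows = []
    for line in normalize_html(html).splitlines():
        m = ROW_RE.search(line)
        if m:
            rows.append((int(m.group(1)), m.group(2)))
    return rows

# Made-up sample input, deliberately messy: mixed case, wrapped cell text.
SAMPLE = """<TABLE>
<TR BGCOLOR="#eeeeee">
  <TD>101</TD>
  <TD>Crash in
      parser</TD>
</TR>
<TR><TD>102</TD><TD>Typo in docs</TD></TR>
</TABLE>"""

print(extract_bugs(SAMPLE))  # [(101, 'Crash in parser'), (102, 'Typo in docs')]
```

Without the normalization pass, the pattern would have to cope with case differences and cell text wrapped across lines; after it, each row is one predictable line.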