> I'm interested in contributing to GBayes ..
>
> I'm thinking of trying word stemming and adding other types of token
> indicators. How can I contribute?

Pretty soon, a SF project will be created (Barry has already gotten the
request in). We'll gladly add you to the list of developers.

> Btw, I have been saving up my spam for a year or so.. I have about
> 31,238 spam messages saved up now. These are categorized as spam
> based on my reading of the subject, or examining the body when in
> doubt. There are probably 10% dups in the corpus. Some of them have
> viruses, likely klez.

Cool.

> I'd like to replicate Tim's test rig so I can compare my results
> with existing ones. My spam isn't in mbox format, but I can convert
> it..

If you can't wait for the SF project, you can find all the code in the
Python CVS tree:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/

> I'm particularly interested in how to allow html-only messages
> (reduce false positives). I'm getting a lot of personal mail in
> that format, unfortunately.

You train it with an equal number of spam and non-spam ("ham") that
you received. Just make sure the ham training messages contain enough
representatives of the html-only mail.

--Guido van Rossum (home page: http://www.python.org/~guido/)
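The training advice in the reply (equal numbers of spam and ham, with enough html-only ham among them) could be sketched roughly as below. This is a minimal illustration using only the standard library, assuming each message is already parsed into an `email.message.Message`. The helper names `is_html_only` and `balanced_training_set`, the `min_html_fraction` knob, and the definition of "html-only" (a text/html part and no text/plain part) are hypothetical choices made here, not part of the spambayes code in the CVS tree.

```python
import email
import random
from email.message import Message


def is_html_only(msg: Message) -> bool:
    """Hypothetical test: the message has a text/html part but no text/plain part."""
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types


def balanced_training_set(spam, ham, min_html_fraction=0.2, seed=0):
    """Hypothetical helper: pick equal numbers of spam and ham, reserving
    up to min_html_fraction of the ham slots for html-only messages."""
    rng = random.Random(seed)
    n = min(len(spam), len(ham))  # equal counts of each class

    # Reserve some ham slots for html-only mail (as many as are available).
    html_idx = [i for i, m in enumerate(ham) if is_html_only(m)]
    n_html = min(len(html_idx), int(n * min_html_fraction))
    picked = set(rng.sample(html_idx, n_html))

    # Fill the remaining ham slots from everything not yet picked.
    rest = [i for i in range(len(ham)) if i not in picked]
    picked.update(rng.sample(rest, n - n_html))

    chosen_ham = [ham[i] for i in sorted(picked)]
    chosen_spam = rng.sample(spam, n)
    return chosen_spam, chosen_ham


if __name__ == "__main__":
    spam = [email.message_from_string("Subject: s%d\n\nbuy now" % i) for i in range(10)]
    ham = [email.message_from_string("Subject: h%d\n\nhello" % i) for i in range(20)]
    ham += [email.message_from_string("Content-Type: text/html\n\n<p>hi</p>") for _ in range(5)]
    s, h = balanced_training_set(spam, ham, min_html_fraction=0.5, seed=1)
    print(len(s), len(h), sum(is_html_only(m) for m in h))
```

Sampling by index rather than by message avoids relying on `Message` equality, and a fixed seed keeps a training run reproducible when comparing results against someone else's test rig.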