Showing content from http://mail.python.org/pipermail/python-dev/2002-September/028596.html below:

[Python-Dev] Getting started with GBayes testing

Guido van Rossum guido@python.org
Wed, 04 Sep 2002 20:24:29 -0400
> I'm interested in contributing to GBayes ..
> 
> I'm thinking of trying word stemming and adding other types of token
> indicators. How can I contribute?

Pretty soon, a SF project will be created (Barry has already gotten
the request in).  We'll gladly add you to the list of developers.

> Btw, I have been saving up my spam for a year or so.. I have about
> 31,238 spam messages saved up now. These are categorized as spam
> based on my reading of the subject, or examining the body when in
> doubt. There are probably 10% dups in the corpus. Some of them have
> viruses, likely klez.

Cool.

> I'd like to replicate Tim's test rig so I can compare my results
> with existing ones. My spam isn't in mbox format, but I can convert
> it..

If you can't wait for the SF project, you can find all the code in the
Python CVS tree:

  http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/
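On the mbox conversion mentioned in the quoted text above, here is a minimal sketch using the standard-library mailbox module. It assumes the saved spam is stored as one raw message per file in a single directory, which the original message does not say; the function name directory_to_mbox and both paths are hypothetical.

```python
import mailbox
from pathlib import Path


def directory_to_mbox(src_dir, mbox_path):
    """Append every file in src_dir to an mbox file; return the message count.

    Assumption: each file in src_dir holds exactly one raw RFC 2822 message.
    mbox_path should live outside src_dir so it is not picked up as input.
    """
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for path in sorted(Path(src_dir).iterdir()):
            if path.is_file():
                # mailbox.mbox.add() accepts raw bytes and writes the
                # "From " separator line itself.
                box.add(path.read_bytes())
                count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Something like directory_to_mbox("spam", "spam.mbox") would then produce a single file to feed to whatever test driver you end up using.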

> I'm particularly interested in how to allow html only messages
> (reduce false positives).  I'm getting a lot of personal mail in
> that format, unfortunately.

You train it with an equal number of spam and non-spam ("ham") that
you received.  Just make sure the ham training messages contain enough
representatives of the html-only mail.
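
To make that advice concrete, here is a rough sketch of picking a balanced training set. It is not code from the spambayes sandbox; the names is_html_only and balanced_training_set, the min_html_ham parameter, and the definition of "html-only" (a text/html part with no text/plain part) are all assumptions made for illustration.

```python
import email
import random


def is_html_only(raw_message):
    """True if the message has a text/html part but no text/plain part."""
    msg = email.message_from_string(raw_message)
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types


def balanced_training_set(spam, ham, min_html_ham, seed=0):
    """Return (spam_sample, ham_sample) of equal size.

    The ham sample holds at least min_html_ham html-only messages,
    or as many as are available if there are fewer than that.
    """
    rng = random.Random(seed)
    n = min(len(spam), len(ham))  # equal number of each class

    html_ham = [m for m in ham if is_html_only(m)]
    other_ham = [m for m in ham if not is_html_only(m)]
    rng.shuffle(html_ham)

    # Reserve the html-only ham first, then fill up from everything else.
    k = min(min_html_ham, len(html_ham), n)
    picked = html_ham[:k]
    remainder = html_ham[k:] + other_ham
    picked += rng.sample(remainder, n - k)

    return rng.sample(spam, n), picked
```

How many html-only ham messages count as "enough" is something to find out by testing; min_html_ham is just a knob for experimenting with that.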

--Guido van Rossum (home page: http://www.python.org/~guido/)


