Tim Peters <tim.one@comcast.net>:
> a. There are other fudges in the code that may rely on this fudge
>    to cancel out, intentionally or unintentionally. I'm loathe to
>    type more about this instead of working on the code, because I've
>    already typed about it. See a later msg for a concrete example of
>    how the factor-of-2 "good count" bias acts in part to counter the
>    distortion here. Take one away, and the other(s) may well become
>    "a problem".

I was thinking of shooting that "goodness bias" through the head and
seeing what happens, actually. I've been unhappy with that fudge in
Paul's original formula from the beginning.

> b. Unless the proportion of spam to not-spam in the training sets
>    is a good approximation to the real-life ratio of spam to not-
>    spam, it's also dubious to train the system with bogus P(S) and
>    P(not-S) values.

Right -- which is why I want to experiment with actually *using* the
real-life running ratio.

> c. I'll get back to this when our testing infrastructure is trustworthy.
>    At the moment I'm hosed because the spam corpus I pulled off the
>    web turns out to be trivial to recognize in contrast to Barry's
>    corpus of good msgs from python.org mailing lists:

Ouch. That's a trap I'll have to watch out for in handling other
people's corpora.
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
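[Editorial note for readers outside the thread: the "goodness bias" being discussed is the factor-of-2 weighting of good-token counts in Paul Graham's original word-probability formula from "A Plan for Spam". A minimal sketch of that per-token computation follows; the function name, the illustrative counts, and the `good_bias` parameter (exposed here so the fudge can be switched off, as ESR proposes) are this note's own assumptions, not code from the project.]

```python
def graham_token_prob(bad_count, good_count, nbad, ngood, good_bias=2.0):
    """Per-token spam probability in the style of Graham's original formula.

    good_bias=2.0 reproduces the factor-of-2 "good count" fudge;
    good_bias=1.0 is the unbiased variant under discussion.
    nbad/ngood are the number of spam/ham messages trained on.
    """
    g = good_bias * good_count
    b = bad_count
    if g + b < 1:
        # Token seen too rarely to be informative: treat as neutral.
        return 0.5
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    p = bad_ratio / (good_ratio + bad_ratio)
    # Clamp away from 0 and 1 so no single token is ever conclusive.
    return max(0.01, min(0.99, p))
```

For a token seen 5 times in each corpus (100 spams, 100 hams), the doubled good count pulls the probability down to 1/3, while the unbiased variant leaves it at the neutral 0.5 -- which is exactly why removing the fudge can unmask other distortions it was cancelling.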