Showing content from http://mail.python.org/pipermail/python-dev/2002-August/028437.html below:

[Python-Dev] The first trustworthy <wink> GBayes results

Tim Peters tim@zope.com
Thu, 29 Aug 2002 13:54:30 -0400

[Eric S. Raymond]
> Bogofilter throws out words of length one and two.

Right, I saw that.  It's something I'll run experiments against later.  I'm
running a 5x5 test grid (skipping the diagonal), and as was also true in
speech recognition, if I had been running against just one spam+ham training
corpus and just one spam+ham prediction set, I would have erroneously
concluded that various things either are improvements, are regressions, or
don't matter.  But some ideas obtained from staring at mistakes from one
test run turn out to be irrelevant, or even counter-productive, if applied
to other test runs.  The idea that some notion of "word" is important seems
highly defensible <wink>, but beyond that I discount claims that aren't
derived from a similarly paranoid testing setup.
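
For readers who want to try a similar setup, here is a minimal sketch of an N x N test grid that skips the diagonal: train on each set, predict against every other set, and never score a classifier on the data it was trained on. The names run_grid, compare, train, and evaluate are hypothetical and are not taken from the GBayes code; evaluate is assumed to return an error rate, so lower is better.

```python
def run_grid(sets, train, evaluate):
    """Train on each set and score against every *other* set.

    `train` and `evaluate` are caller-supplied callables (hypothetical
    names): train(data) -> model, evaluate(model, data) -> error rate.
    """
    results = {}
    for i, training_set in enumerate(sets):
        model = train(training_set)          # one classifier per training set
        for j, test_set in enumerate(sets):
            if i == j:
                continue                     # skip the diagonal
            results[(i, j)] = evaluate(model, test_set)
    return results


def compare(base, variant):
    """Count grid cells where a variant beat, lost to, or tied the baseline.

    Both arguments are dicts as returned by run_grid; lower is better.
    """
    wins = sum(1 for k in base if variant[k] < base[k])
    losses = sum(1 for k in base if variant[k] > base[k])
    ties = len(base) - wins - losses
    return wins, losses, ties
```

With five sets this gives 20 train/predict pairs. Running compare on the grids from two tokenizer variants shows whether a change helps across the board or only on the one run that suggested it.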

> ...
> And bogofilter includes the headers.  This is important, since
> otherwise you don't rate things like spamhaus addresses and sender
> names.

Of course -- the reasons I'm not using headers in these particular tests
have been spelled out several times.  They'll get added later, but for now I
don't have a large enough test set where doing so doesn't render the
classifier's job trivial.
