Showing content from https://mail.python.org/pipermail/python-dev/2002-August/028416.html below:

[Python-Dev] The first trustworthy <wink> GBayes results

Tim Peters tim.one@comcast.net
Thu, 29 Aug 2002 00:18:04 -0400
FYI, about counting multiple instances of a word multiple times, or only
once, when scoring.  Changing it to count words only once did fix the
specific false positive examples I mentioned.  However, across 20 test runs
(training on one of five pairs of corpora, and then for each such training
pair running predictions across the remaining four pairs), it was a mixed
bag.  On some runs it appeared to be a real improvement, on others a real
regression.  Overall, the results didn't support concluding it made a
significant difference to the false positive rate, but weakly supported
concluding that it increased the false negative rate.
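To make the two counting policies concrete, here is a minimal sketch of Graham-style score combining. It is not the GBayes code: the function name, the count_once flag, the 0.4 unknown-word probability, the 15-clue limit, and the toy probability table are all assumptions chosen for illustration.

```python
def combined_score(tokens, spamprob, unknown=0.4, max_clues=15, count_once=True):
    """Combine per-word spam probabilities into one message score.

    spamprob maps word -> estimated spam probability (assumed strictly
    between 0 and 1); words missing from it get the `unknown` value.
    """
    if count_once:
        # Each distinct word contributes at most one clue.
        tokens = set(tokens)
    probs = [spamprob.get(t, unknown) for t in tokens]
    # Keep only the clues farthest from the neutral value 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    clues = probs[:max_clues]
    spam_product = 1.0
    ham_product = 1.0
    for p in clues:
        spam_product *= p
        ham_product *= 1.0 - p
    return spam_product / (spam_product + ham_product)


# Toy table and message: one spammy word repeated, one hammy word.
table = {"free": 0.9, "python": 0.1}
message = ["free", "free", "free", "python"]

once = combined_score(message, table, count_once=True)
many = combined_score(message, table, count_once=False)
```

In this toy case the repeated word dominates when every instance is counted, while counting it once leaves the message balanced between the two clues. That is the kind of effect that can remove a false positive, and equally the kind that can let a spam slip through.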

That's very tentative -- I didn't stare at the actual misclassifications, I
just ran it while sleeping off a flu, then woke up and crunched the numbers.
This ignorant-of-MIME tokenization scheme is ridiculously bad for the false
negative rate anyway (an entire line of base64 or obfuscated
quoted-printable looks like a ham-favoring single "unknown word" to it), so
there are bigger fish to fry first.
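The base64 problem can be sketched as follows, assuming the tokenizer simply splits on whitespace (the naive_tokenize helper and the sample text are hypothetical, not taken from the GBayes source). An encoded line contains no whitespace, so it collapses into a single token that the classifier has never seen.

```python
import base64


def naive_tokenize(text):
    # MIME-ignorant tokenization: split on whitespace and nothing else.
    return text.split()


plain = "Buy cheap pills now, click here for a free offer!"
encoded = base64.b64encode(plain.encode("ascii")).decode("ascii")

encoded_tokens = naive_tokenize(encoded)   # the whole payload is one token
decoded_tokens = naive_tokenize(
    base64.b64decode(encoded).decode("ascii")
)                                          # the words a MIME-aware pass would see
```

Decoding before tokenizing recovers the individual words; without that step, the single opaque token only receives the unknown-word probability.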



