Showing content from https://mail.python.org/pipermail/python-dev/2002-September/028630.html below:

[Python-Dev] Getting started with GBayes testing

Skip Montanaro skip@pobox.com
Thu, 5 Sep 2002 09:57:45 -0500
    Brad> My feeling is that the presentation of "the message" is
    Brad> independent of the message itself, so if I get a message in Text,
    Brad> HTML, RTF only the actual content is important, not the markup
    Brad> method. Though I suppose using lots of red and large fonts might
    Brad> be an indicator of spam, the text of the message should still
    Brad> suffice.

You might be surprised.  In Paul Graham's "A New Plan for Spam" he writes:

    I don't know why I avoided trying the statistical approach for so
    long.  I think it was because I got addicted to trying to identify
    spam features myself, as if I were playing some kind of
    competitive game with the spammers.  (Nonhackers don't often
    realize this, but most hackers are very competitive.)  When I did
    try statistical analysis, I found immediately that it was much
    cleverer than I had been.  It discovered, of course, that terms
    like "virtumundo" and "teens" were good indicators of spam.  But
    it also discovered that "per" and "FL" and "ff0000" are good
    indicators of spam.  In fact, "ff0000" (html for bright red) turns
    out to be as good an indicator of spam as any pornographic term.

As Tim has pointed out several times, intuition and hunches about this
stuff often turn out to be incorrect.
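
To make the point concrete, here is a minimal sketch of how a markup token
such as "ff0000" can surface on its own when the raw message, markup
included, is tokenized and counted.  This is not Graham's code and not
GBayes: the function names, the two tiny corpora, and the 0.01/0.99 clamp
are assumptions chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: keeps letters, digits, and a few symbols, so
# markup fragments such as "ff0000" survive as ordinary tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split raw message text, markup included, into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def token_spam_scores(spam_msgs, ham_msgs):
    """Return {token: rough estimate of P(spam | token)}.

    Assumes both corpora are non-empty.
    """
    # Count each token at most once per message.
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    n_spam, n_ham = len(spam_msgs), len(ham_msgs)

    scores = {}
    for tok in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[tok] / n_spam
        ham_rate = ham_counts[tok] / n_ham
        p = spam_rate / (spam_rate + ham_rate)
        # Clamp so that no single token is treated as absolute proof.
        scores[tok] = min(0.99, max(0.01, p))
    return scores


# Made-up example messages, not real data.
spam = [
    '<font color="ff0000" size="7">FREE offer</font> only $9 per month',
    '<font color="ff0000">Act now</font>, teens online',
]
ham = [
    "The patch for the tokenizer is attached, please review.",
    "I ran the test suite again and the offer of help still stands.",
]

scores = token_spam_scores(spam, ham)
for tok in ("ff0000", "offer", "patch"):
    print(tok, scores[tok])
```

Nobody told the scorer that bright red matters; "ff0000" ranks high simply
because it shows up in the spam samples and never in the ham, while a word
like "offer" that appears equally often in both lands in the middle.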

Skip


