[Delaney, Timothy]
> Pretty darned good advice too ... but you won't object if I waste
> some time playing with this stuff anyway I hope. Only one way to accumulate
> experience after all ;)

Not at all! Knock yourself out -- it's really a lot of fun, except when it
gets so tedious you start punching the wall just to watch your knuckles
bleed <wink>.

> Personally, I considered that you were already well past the point of
> diminishing returns,

Not yet -- false positives are a horrible thing, and the false negative rate
still lets a lot of spam through. Cutting the f-n rate, e.g., in half, would
mean half as much spam to deal with; generalization left to the reader.

> and anything further was of academic interest to those who felt a desire to
> tinker ...

The best hope for reducing f-n lies in exploiting more header lines than I
can test with my mixed corpora, and there's *tons* of room for improvement
there (note that the f-n rate is more than 20x greater than the f-p rate
now). Anyone who wants to tackle that with tedious experiment should first
pick Neil Schemenauer's brain: he had a good start on that early last week.

> (i.e. the hard work has been done, and everything else is just fun and
> games :) If enough people (or just one dedicated person) waste enough time,
> who knows what may come out. Hey - it worked for timsort didn't it ...? ;)

Indeed so, and it works for this too -- never underestimate the power of
working yourself sick. If you also *write* about it, you can make everyone
else ill too by proxy <wink>.

sharing-the-pain-ly y'rs - tim