Raymond Hettinger wrote:
> Is it too late to challenge a core design decision?
>
> Instead of multiplying probabilities, use fuzzy logic methods.
> Classify the indicators into damning, strong, weak, neutral, ...
>
> After counting the number of indicators in each class, make
> a spam/ham decision that can be easily tweaked. This would
> make it easy to implement variations of Tim's recent clear
> win, where additional indicators are gathered until the
> balance shifts sharply to one side.
>
> Some other advantages are:
> -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... )
> -- avoids mathematical issues with indicators not being independent
> -- allows the addition of non-token based indicators. for instance,
>    a preponderance of caps would be a weak indicator. the presence
>    of caps separated by spaces would be a strong indicator.
> -- the decision logic would be more intuitive
> -- avoids the issue of having equal amounts of spam and ham in
>    the sample
>
> The core concept would stay the same -- it's really just a shift from
> continuous to discrete.

Hmm, there's nothing discrete about fuzzy logic (ok, this claim
is 0.65% true ;-)

The problem is more about multi-dimensional optimization, where
you are interested in distilling several different inputs into
one value. A weighted average is the simplest form to use here,
and there are various multi-dimensional optimization algorithms
around to aid in finding the "optimal" weights.

Another approach would be to use a shallow neural network.

The only "problem" with these is that Tim generates a variable
number of inputs, AFAICT, so you'd have to use some
preprocessing to make the number of inputs constant.

Would make a nice internship project, I guess :-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
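
A rough sketch of the weighted-average idea, for illustration only. It
assumes the classifier hands back a variable-length list of per-token spam
probabilities, and it makes the number of inputs constant by binning them
into a fixed set of classes (in the spirit of "damning, strong, weak,
neutral"). The bin edges, the weights, and the names to_fixed_inputs and
weighted_score are all made up here, not taken from Tim's code; in practice
the weights would come from an optimizer rather than being picked by hand.

```python
# Hypothetical bin edges and weights; neither comes from the thread.
# Bin i covers BIN_EDGES[i] <= p < BIN_EDGES[i + 1]; the last bin also
# includes p == 1.0.
BIN_EDGES = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
BIN_WEIGHTS = [0.0, 0.25, 0.5, 0.75, 1.0]  # one weight per bin


def to_fixed_inputs(token_probs):
    """Turn a variable-length list of per-token spam probabilities
    into a fixed-length vector holding the fraction of tokens per bin."""
    nbins = len(BIN_WEIGHTS)
    counts = [0] * nbins
    for p in token_probs:
        for i in range(nbins):
            if p < BIN_EDGES[i + 1] or i == nbins - 1:
                counts[i] += 1
                break
    total = len(token_probs)
    if total == 0:
        return [0.0] * nbins
    return [c / total for c in counts]


def weighted_score(inputs, weights=BIN_WEIGHTS):
    """Distill the fixed-length inputs into one value via a weighted average."""
    total = sum(inputs)
    if total == 0:
        return 0.5  # no evidence either way
    return sum(w * x for w, x in zip(weights, inputs)) / total


if __name__ == "__main__":
    probs = [0.05, 0.95, 0.99, 0.5]  # made-up per-token probabilities
    inputs = to_fixed_inputs(probs)
    print(inputs, weighted_score(inputs))
```

The same fixed-length vector could just as well be fed to a small neural
network instead of the weighted average; only weighted_score would change.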