Un-Bayesing SpamAssassin
It was getting crazy last week. I was getting more and more and more spam. I would go to bed, wake up 8 hours later and have 50+ messages waiting for me. My SpamAssassin was just completely falling down on the job.
At first I lowered the hit level, and that didn't seem to help. Then I went through the .spamassassin/user_prefs and added points for the following:
score DISGUISE_PORN 3.0
score PORN_16 4.0
score PORN_MEMBERSHIP 4.0
score PORN_6 4.0
score PORN_PASSWORD 4.0
score MUST_BE_18 4.0
score ADULT_SITE 4.0
score BEST_PORN 4.0
score ITS_LEGAL 4.0
score MICROSOFT_EXECUTABLE 4.0
score X_OSIRU_DUL 0.0
score X_OSIRU_DUL_FH 0.0
score X_OSIRU_OPEN_RELAY 0.0
score X_OSIRU_SPAMWARE_SITE 0.0
score X_OSIRU_SPAM_SRC 0.0
score RCVD_IN_OSIRUSOFT_COM 0
score HTML_WEB_BUGS 4.0
score HTML_IMAGE_ONLY_02 4.0
score HTML_IMAGE_ONLY_04 4.0
score HTML_IMAGE_ONLY_06 4.0
score FORGED_YAHOO_RCVD 4.0
score HTML_MIME_NO_HTML_TAG 4.0
Nothing seemed to be helping. I spent all day while I was working with a tail -f of .procmail.log in a window, trying to monitor what was happening and comparing the spam headers I got to what was supposed to happen, but I couldn't figure it out. *Then* I noticed that many of the headers had a BAYES_0 in them. What that means is that the Bayesian filter had determined there was a 0% chance that the email was spam. Unlike what I first thought, instead of just leaving the hit count alone, that low Bayes probability actually *subtracted points* from it, thus putting the message under my spam limit.
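To make the arithmetic concrete, here is a minimal Python sketch of what I was doing by eye. The sample header, the `parse_status` and `total` helpers, and every score in `SCORES` are made up for illustration (in particular the negative Bayes score is an assumed value, not the real default), and the header layout assumes the older `hits=` style.

```python
import re

# Hypothetical X-Spam-Status value; the numbers and rule list are made up.
SAMPLE = "No, hits=3.5 required=5.0 tests=BAYES_00,HTML_WEB_BUGS,FORGED_YAHOO_RCVD"

# Hypothetical per-rule scores, chosen only to show the effect.
SCORES = {"BAYES_00": -4.5, "HTML_WEB_BUGS": 4.0, "FORGED_YAHOO_RCVD": 4.0}


def parse_status(value):
    """Return (hits, required, tests) from an X-Spam-Status header value."""
    # Undo header folding (newline followed by indentation) before matching.
    value = re.sub(r"\r?\n[ \t]+", "", value)
    hits = float(re.search(r"(?:hits|score)=(-?[\d.]+)", value).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", value).group(1))
    match = re.search(r"tests=(\S*)", value)
    tests = [t for t in match.group(1).split(",") if t] if match else []
    return hits, required, tests


def bayes_rules(tests):
    """Pick out the Bayes rules that fired on a message."""
    return [t for t in tests if t.startswith("BAYES_")]


def total(tests, scores):
    """Sum the scores of the rules that fired; unknown rules count as 0."""
    return sum(scores.get(t, 0.0) for t in tests)


hits, required, tests = parse_status(SAMPLE)
with_bayes = total(tests, SCORES)
without_bayes = total([t for t in tests if not t.startswith("BAYES_")], SCORES)
print("bayes rules:", bayes_rules(tests))
print("with bayes:", with_bayes, "spam" if with_bayes >= required else "not spam")
print("without bayes:", without_bayes, "spam" if without_bayes >= required else "not spam")
```

With these invented numbers, the two content rules alone would clear the limit, but the negative Bayes score drags the total back under it, which is the pattern I kept seeing in the headers.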
Ahh. So first I modified the scores for the BAYES_xx rules but started getting false positives, which is bad. I was stumped a bit until my coworker Vineet told me that probably what had happened was that the Bayes filtering had "learned badly". Ahhhh. *That* made sense.
So instead of trying to untrain it, or whatever, I whacked the bayes_seen and bayes_toks files. I'm *sure* there are more elegant ways of doing it, but I figured I'd start from scratch and see if that helped. It definitely helped. I'm still getting *a ton* of spam, but Thunderbird is also helping.
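For anyone who wants to do the same reset a bit more carefully, here is a rough Python sketch. It assumes the Bayes database sits in ~/.spamassassin under file names starting with bayes_ (the usual per-user layout, but check yours); `reset_bayes` is a name I made up, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path


def reset_bayes(sa_dir, dry_run=True):
    """Remove the bayes_* files in sa_dir and return the paths affected.

    Assumes the Bayes database uses the 'bayes_' file name prefix.
    With dry_run=True nothing is deleted; the paths are only listed.
    """
    targets = sorted(p for p in Path(sa_dir).glob("bayes_*") if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


if __name__ == "__main__":
    # Dry run against the assumed default location; prints what would go.
    for path in reset_bayes(Path.home() / ".spamassassin"):
        print("would remove", path)
```

Calling `reset_bayes(Path.home() / ".spamassassin", dry_run=False)` does the actual deletion and leaves user_prefs alone. As far as I know, newer SpamAssassin releases also offer `sa-learn --clear`, which is probably the more elegant way if your version has it.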
Does anyone know the right score modifier for attachments in general? I'm *sooo* fucking sick of that virus or whatever it is with insanely stupid text and a .zip file attachment. "I hate cleartext. Password is 21341254". AAAAhhh.
Anyways, that's my suggestion. I cannot wait until there are some solid solutions for spam. It's just gotten to a crazy level lately.
-Russ