dspam Hackery 101, converting messages to mbox

Tuesday, July 24th, 2007 at 10:18 pm

dspam; go ahead, send me Viagra

I’ve been running dspam for many years here at gnu-designs, inc. We replaced SpamAssassin with it several years ago, because SA was just not getting any better at keeping spam out of users’ mailboxes.

After a few weeks of using dspam, our filtering accuracy was over 95%, and rising steadily. Three years later, we weren’t seeing a single spam slip through to any user’s mailbox. It was very impressive.

Later, I added graymilter in front of dspam to ward off even more spam. You can see the difference it made:

[Image: graymilter results]

After a few years, Jonathan A. Zdziarski (author/maintainer of dspam) sold the dspam project to Sensory Networks, and it continues to be updated on a regular basis today.

But there’s one thing that has always bugged me about dspam… the catchall for messages is stored in $DSPAM_HOME/dspam.messages, and it contains a concatenated list of all messages processed by dspam.

So far, so good… until you need to retrieve one back out of there.

The file is actually a literal concatenation of every message, with nothing separating one message from the next, so a mail client can’t open it as a mailbox. With the amount of mail we receive, that file grows very large, very fast. I stumbled across a bug today with dspam where messages in the web interface were just vanishing after being forwarded back into the user’s mailbox as non-spam. I needed a way to go back in and retrieve the messages.

Enter the Swiss-Army Chainsaw again: Perl!

With a simple Perl one-liner, I was able to turn this “useless text file” of concatenated messages into an mbox-format file I could load up in pine and read like a normal mailbox. From there, I could forward the false positives back to the users whenever the web UI eats them for lunch. It looks like this:

perl -pi.$$ -e '$time=scalar(gmtime); s,^(Return-Path: .*)$,From dspam $time\n$1,g' dspam.messages

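In simple terms, all this does is take the “Return-Path” line that appears at the start of each message, and prepend a “From ” line on the line right before it. Note that this is the “From ” (space after) line, not the “From:” (colon after) line. They are different: “From ” is the separator that marks the start of each message in an mbox file, while “From:” is the ordinary header naming the sender. The -i.$$ switch edits the file in place and keeps a backup copy whose suffix is the shell’s process ID.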
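For anyone who would rather not edit the live file in place, here is a rough sketch of the same idea in Python instead of Perl. It is not the one-liner itself, just the same transformation under the same assumption that every message begins with a “Return-Path: ” line. The function names (add_mbox_separators, convert) are made up for illustration, and it writes a converted copy rather than touching dspam.messages.

```python
import time


def add_mbox_separators(lines, sender="dspam", when=None):
    """Yield the input lines, inserting an mbox "From " separator
    before every line that starts with "Return-Path: ".

    Assumption (same as the Perl one-liner): each message in the
    file begins with a Return-Path header line.
    """
    # asctime-style UTC timestamp, the same shape Perl's scalar(gmtime) gives
    stamp = time.asctime(time.gmtime() if when is None else when)
    separator = f"From {sender} {stamp}\n".encode("ascii")
    for line in lines:
        if line.startswith(b"Return-Path: "):
            yield separator
        yield line


def convert(src_path, dst_path):
    """Write an mbox-style copy of src_path to dst_path (hypothetical helper)."""
    # Binary mode, so whatever bytes the mail contains pass through untouched
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(add_mbox_separators(src))
```

Something like convert("dspam.messages", "dspam.mbox") would then leave the original alone and give you a separate file to open. Like the one-liner, this will also fire on a “Return-Path: ” line that happens to start a line inside a message body, such as a forwarded message pasted in whole.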

That’s it. Now I can just do:

pine -f $DSPAM_HOME/dspam.messages -i

And away we go!
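If you want a quick sanity check without firing up a mail client, Python’s standard mailbox module can read the converted file too. This is only a sketch: summarize_mbox is a name I made up, and it assumes the file already has the “From ” separator lines in it.

```python
import mailbox


def summarize_mbox(path):
    """Return a (from_line, subject) pair for each message found in the mbox."""
    # create=False: raise an error instead of creating an empty mailbox
    box = mailbox.mbox(path, create=False)
    try:
        # get_from() is the text of the "From " separator line, minus "From "
        return [(msg.get_from(), msg["Subject"]) for msg in box]
    finally:
        box.close()
```

The length of the returned list tells you how many messages the parser found, which is a decent check against a count of the Return-Path lines in the original file.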

I’ll end up patching the source to produce this output at some point, but for now, this solved an immediate need I had to fix a critical problem.

I love dspam for what it does. The installation is not for the faint of heart, but after you get it set up, what it does is pure magic, and your users will love you for it. The web interface removes the bulk of the work of maintaining whitelists, filtering scores and other things, and delegates it to each user, where they can customize their own filtering however they choose.

Ending Spam, by Jonathan A. Zdziarski

Jonathan also wrote a book on dspam and filtering, and I highly recommend picking up a copy if you can.

Last Modified: Tuesday, July 24th, 2007 @ 22:18
