Some time back, I documented a move from Spamassassin to Dspam, noting that

"Spamassassin works well, but it can be resource hungry, and I have had 
to draw up various scripts over the years to keep the accuracy up to 
scratch.  To give an idea of the amount of memory, for example, that is 
required, in our case, on a server running everything needed for an 
office, spamassassin uses a quarter of the used memory (not total 
memory, but still a big chunk.)"

I reluctantly went back to Spamassassin, however, as Dspam is no longer supported at all by Debian/Raspbian and is not available in Stretch.  The problem still exists, though, and in particular the resources Spamassassin uses on a Raspberry Pi are considerable.  At times, too, mail could take 20 seconds or more to get through Spamassassin.  While that is usually no big deal, if spam or general email volumes should increase, the server could be overwhelmed.

The obvious alternative to Dspam is Rspamd.  This looks as though it uses a more mathematical approach than a rules-based one, and it is clearly effective, running as it does on email systems serving several million users. However, the versions in Debian/Raspbian Jessie and Stretch are very out of date (versions 0.6 and 0.9), whereas at the time of writing version 1.6.3 has been released.  Further, the maintainer, who is clearly proud of his work, does not see small systems as his target, and gives non-x86 systems less support: no up-to-date ARM ports are available.  It may be possible to compile Rspamd oneself, but I came across some posts suggesting that its build system may cause issues on Raspberry Pi-like systems.

Bogofilter was another option.  I had always considered it really just for use on client systems rather than at the server level, but the various READMEs suggest that it is actually best run on a server, because the greater throughput and variety of mail there give it more training.  It is written in C and does not have a daemon (which causes some issues with permissions; see later), but it is fast.  Some long out-of-date reviews comparing it with other anti-spam systems, especially on the Linux Weekly News site, a much trusted source of information, suggested it was a capable alternative.

I will not go into the mechanisms used by anti-spam systems, but broadly speaking, anti-spam systems other than Spamassassin's rules-based approaches work by analysing mathematically the way words relate to each other in a corpus of good and bad emails.  I won't pretend to begin to understand the maths, but it seems that such statistical mechanisms really can work.  As I had good experience with Dspam, which uses such an approach, I was willing to give it a go.

But I could not find much information about how to use it with my choice of email software, namely postfix and dovecot.  Eventually searches led me to Colin Stewart's site, owlfish.com.  His write-ups helped a great deal, although some of the ways he chose to implement bogofilter differed from the approach I wanted to take, my decisions being shaped by what I had learnt installing dspam.  The really helpful web page was this one.  What was more, it was a relatively recent posting, when most of the other information I found was years out of date.  So, standing on Colin's giant digital shoulders, the following is what I ended up with.  It must be considered a work-in-progress, but it is currently working as expected.

As usual with articles like this, make sure you understand what you are doing rather than just copying what I have done. Some of the advice here may cause butterflies to do whatever they do in rain forests if just done blindly, so I take no responsibility for whatever results you get from this article.

The big design difference between Colin's intention and mine was that Colin wanted each user on his system to have their own wordlist database to determine spam probability. This is a good idea, because different users receive different types of email from different sources. I would have been happy to try this approach, but ran into a couple of problems.  The biggest one was that I do not run postfix with virtual users, relying on real local users instead, and I could not work out how to tell bogofilter who the real user was, as opposed to the recipient of the email, which might be an alias. The "user=vmail" line that makes it all happen in Colin's setup would therefore not be relevant.  So I decided instead on a simpler (I thought) approach of having a single wordlist database. Such an approach should also make permissions less of an issue, but now that I have bodged something together, I am not sure that's necessarily true.

So I started with a working postfix system, which uses dovecot's deliver agent as the LDA.  When mail arrives, postfix is configured to reject spammy-looking email, such as dodgy addresses, addresses on RBL lists and so on, quite aggressively and up front.  That means that most (and I mean around 90% or more) of the email sent to the server gets no further than this point.  There are many tutorials about setting up suitable rules and block lists, so I will not cover that here.  Interestingly, this approach was not well known in the early days of developing anti-spam defences.  I came across one site suggesting such an approach, but the comment was that it was probably not possible.  Well, it is, and it cuts out by far the majority of spam immediately.  Once past those guards, emails went through Spamassassin and were then delivered by dovecot.  If Spamassassin flagged a message as spam, dovecot's sieve sorting would deliver it to the Junk folder.  Dovecot also had the antispam plugin installed and configured so that Spamassassin could learn automatically when email was moved manually into or out of the Junk folder.

I installed the sqlite version of bogofilter, partly because I have a long-standing dread of Berkeley databases, and partly because some reading suggested the sqlite option was more robust and looked after itself.  The configuration defaults seem OK, and nothing really needed to be done to /etc/bogofilter.cf.

I adapted Colin's suggestion slightly, and added this line to postfix's master.cf:-

bogofilter      unix    -    n    n    -    -    pipe
        flags=Rq user=filter argv=/usr/local/bin/bogofilter-filter.sh -f ${sender} -- ${recipient}

You will notice that the user specified is "filter".  That means creating a user called filter, with a group called filter.  The "filter" user has no login rights, but does have rights to the /var/spool/bogofilter directory, which is where bogofilter wants to keep its sqlite database.  You need to set these rights manually.
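On Debian or Raspbian, the user, group and directory could be created with something like the following. It is a minimal sketch, assuming Debian's adduser; the nologin shell, the use of /var/spool/bogofilter as the user's home and the setgid mode 2770 (so that files created there stay in the "filter" group) are my assumptions rather than requirements.

```shell
#!/bin/sh
# Sketch: create the "filter" system user/group and the bogofilter spool.
# Assumes Debian's adduser; the shell, home and mode are assumptions.

# Create a no-login system user with a same-named group, if it is missing.
create_filter_user() {
    user="$1"
    home="$2"
    id "$user" >/dev/null 2>&1 && return 0
    adduser --system --group --no-create-home \
        --home "$home" --shell /usr/sbin/nologin "$user"
}

# Create the spool directory, owned by the user and same-named group when
# running as root, with the setgid bit so new files inherit the group.
prepare_spool() {
    dir="$1"
    owner="$2"
    mkdir -p "$dir" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        chown "$owner:$owner" "$dir" || return 1
    fi
    chmod 2770 "$dir"
}

# Only do the real setup when run as root.
if [ "$(id -u)" -eq 0 ]; then
    create_filter_user filter /var/spool/bogofilter &&
        prepare_spool /var/spool/bogofilter filter
fi
```

Run it as root; the two functions can also be reused on their own.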

I then followed Colin's suggestion of a file in /etc/postfix called bogofilter_access containing:-

/./   FILTER bogofilter:bogofilter

At the end of /etc/postfix/main.cf I have a section starting:-

      smtpd_recipient_restrictions =

This is where all the restrictions that stop the majority of spam are set out.  At the end of the list, I added:-

     check_client_access pcre:/etc/postfix/bogofilter_access,

That left just the script mentioned in master.cf to create.  Here I used the script suggested in bogofilter's document on integrating with postfix, found in /usr/share/doc/bogofilter-common:-

/usr/local/bin/bogofilter-filter.sh
    #!/bin/sh

    FILTER=/usr/bin/bogofilter
    FILTER_DIR=/var/spool/bogofilter
    # WARNING! The -i is crucial, else you may see
    # messages truncated at the first period that is alone on a line
    # (which can happen with several kinds of messages, particularly
    # quoted-printable)
    # -G is ignored before Postfix 2.3 and tells it that the message
    # does not originate on the local system (Gateway submission),
    # so Postfix avoids some of the local expansions that can leave
    # misleading traces in headers, such as local address
    # canonicalizations.
    POSTFIX="/usr/sbin/sendmail -G -i"
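    # Keep the shared wordlist in the same spool directory (the upstream
    # example uses /home/bogofilter, which would not match this setup)
    export BOGOFILTER_DIR=$FILTER_DIR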

    # Exit codes from <sysexits.h>
    EX_TEMPFAIL=75
    EX_UNAVAILABLE=69

    cd $FILTER_DIR || \
        { echo $FILTER_DIR does not exist; exit $EX_TEMPFAIL; }

    # Clean up when done or when aborting.
    trap "rm -f msg.$$ ; exit $EX_TEMPFAIL" 0 1 2 3 15

    # bogofilter -e returns: 0 for OK, nonzero for error
    rm -f msg.$$ || exit $EX_TEMPFAIL
    # Stevan's note: "-l" makes bogofilter log to syslog,
    # which is probably needed for mailgraph too
    $FILTER -l -p -u -e > msg.$$ || exit $EX_TEMPFAIL

    exec <msg.$$ || exit $EX_TEMPFAIL
    rm -f msg.$$ # safe, we hold the file descriptor
    exec $POSTFIX "$@"
    exit $EX_TEMPFAIL

Don't forget to make the script executable (chmod 755).  Colin's approach is a little different, but the key aspect here is to run the message through the filter so that the X-Bogosity header is added.
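To check by hand that a message picks up the header, something like the following could be used. It is a sketch, assuming the shared wordlist lives in /var/spool/bogofilter; BOGOFILTER_DB is a made-up variable name for illustration. It deliberately leaves out -u, so checking a message does not train the database.

```shell
#!/bin/sh
# Sketch: pass a test message through bogofilter, as the delivery script
# does, and print only the X-Bogosity header it adds.
# BOGOFILTER_DB is a hypothetical variable; the default path is an assumption.
BOGOFILTER_DB="${BOGOFILTER_DB:-/var/spool/bogofilter}"

# Print any X-Bogosity header from a message on stdin. Headers end at the
# first empty line, so stop reading there and ignore the body.
extract_bogosity() {
    sed -n -e '/^$/q' -e '/^X-Bogosity:/p'
}

# Classify a message file (-p passes it through with the header added)
# without updating the wordlist, since -u is not given.
check_message() {
    /usr/bin/bogofilter -d "$BOGOFILTER_DB" -p -e < "$1" | extract_bogosity
}
```

For example, check_message /tmp/test.eml should print a single X-Bogosity line, which will probably say "Unsure" until the wordlist has been trained.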

I could now run a test to ensure that email was run through bogofilter and that it added an "X-Bogosity" header, indicating whether the message was ham, spam or unsure, along with a probability.  That seemed to work, but at first all messages were marked "Unsure", because bogofilter had not yet been trained.  We don't have much spam lying around, but I pointed bogofilter at the spam we had, as well as at the good email, to train it.  As we use maildir, an example command is

bogofilter -d /var/spool/bogofilter -s -B /mail/<user>/.maildir/.Junk/cur/

for spam and

bogofilter -d /var/spool/bogofilter -n -B /mail/<user>/.maildir/cur/

for ham.  In the early days of bogofilter, the advice was that you needed at least 1000 messages to train it properly.  We don't have that many spams, although the system has tens of thousands of good messages.  Time will improve this, though.
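With more than a couple of mailboxes, the two training commands can be wrapped in a loop. This is a sketch, assuming the /mail/<user>/.maildir layout shown above with spam kept in .Junk; MAIL_ROOT, WORDLIST_DIR and BOGOFILTER are hypothetical variables added so the paths can be changed. The -d option points bogofilter at the shared database; without it, bogofilter would use the ~/.bogofilter directory of whoever runs the command. If the loop is run as root, the wordlist will need handing back to the "filter" user afterwards.

```shell
#!/bin/sh
# Sketch: bulk-train the shared wordlist from every user's maildir.
# MAIL_ROOT, WORDLIST_DIR and BOGOFILTER are hypothetical overrides;
# the defaults assume the layout described in the text.
MAIL_ROOT="${MAIL_ROOT:-/mail}"
WORDLIST_DIR="${WORDLIST_DIR:-/var/spool/bogofilter}"
BOGOFILTER="${BOGOFILTER:-/usr/bin/bogofilter}"

train_all() {
    for maildir in "$MAIL_ROOT"/*/.maildir; do
        # If the glob matched nothing, skip the literal pattern
        [ -d "$maildir" ] || continue
        if [ -d "$maildir/.Junk/cur" ]; then
            # -s registers as spam; -B treats the argument as a mail folder
            "$BOGOFILTER" -d "$WORDLIST_DIR" -s -B "$maildir/.Junk/cur"
        fi
        if [ -d "$maildir/cur" ]; then
            # -n registers as ham (non-spam)
            "$BOGOFILTER" -d "$WORDLIST_DIR" -n -B "$maildir/cur"
        fi
    done
}

train_all
```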

Now on to the dovecot part.  As mentioned, dovecot's delivery agent gets the message to the right user.  We run sieve filtering, so I left in place the sieve scripts that file Spamassassin's spam, since they work on a header.  I will alter this by adding system-wide scripts, run before the users' own scripts, that catch bogofilter-marked spam.  Colin's web page describes this well, so I will not repeat it here.
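For readers without Colin's page to hand, a before-script along these lines is one way to do it. It is a sketch under my own assumptions, not Colin's code: the path /etc/dovecot/sieve/before.d/bogofilter.sieve is made up, and the rule relies on bogofilter's default header beginning "X-Bogosity: Spam," for spam.

```shell
#!/bin/sh
# Sketch: write a global sieve script that files bogofilter-flagged spam
# into Junk before users' own scripts run. The path is an assumption.

write_spam_sieve() {
    target="$1"
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
require ["fileinto"];
# bogofilter's header starts "X-Bogosity: Spam, tests=bogofilter, ..."
if header :contains "X-Bogosity" "Spam," {
    fileinto "Junk";
}
EOF
    # Precompile if Pigeonhole's sievec is installed (optional)
    if command -v sievec >/dev/null 2>&1; then
        sievec "$target"
    fi
}

# Only write to /etc when run as root.
if [ "$(id -u)" -eq 0 ]; then
    write_spam_sieve /etc/dovecot/sieve/before.d/bogofilter.sieve
fi
```

Dovecot would then need pointing at the file, for example with sieve_before = /etc/dovecot/sieve/before.d/bogofilter.sieve in the plugin section of its sieve configuration (90-sieve.conf on Debian), followed by a dovecot reload.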

But the real problem arose with the antispam plugin.  This plugin allows an anti-spam system to learn when an incorrectly flagged email is moved manually into or out of the Junk folder.  Colin's approach assumed that each user has their own wordlist database, so bogofilter would be run against that user's database.  But I wanted a single database, and I wanted the "filter" user (remember that?) to run bogofilter.  This meant that the relevant section of /etc/dovecot/conf.d/90-plugins.conf became:-

plugin {
  antispam_backend = pipe
  antispam_debug_target = syslog
  antispam_signature = X-Bogosity
  antispam_signature_missing = move
  antispam_spam = Junk
  antispam_trash = trash;Trash;Deleted Items;Deleted Messages
  antispam_verbose_debug = 1

  antispam_pipe_program = /usr/local/bin/bogofilter-dovecot-antispam.sh
  # No per-user database path: the script only gets "spam" or "ham"
  # antispam_pipe_program_args = /var/spool/bogofilter/%u
  antispam_pipe_program_spam_arg = spam
  antispam_pipe_program_notspam_arg = ham

  antispam_pipe_tmpdir = /tmp
}

The script to be run, which does the work, is as follows:-

/usr/local/bin/bogofilter-dovecot-antispam.sh

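#!/bin/bash

# Called by the dovecot antispam plugin with a single argument,
# "spam" or "ham" (antispam_pipe_program_args is not used here).
# The message arrives on stdin; bogofilter runs as the "filter" user
# against the shared wordlist in /var/spool/bogofilter.
# Note: -S and -N undo an earlier registration, which assumes the
# message was auto-registered (-u) at delivery; "Unsure" ones are not.

if [ "$1" == "ham" ]; then
  # Wrongly marked as spam: unregister as spam, register as ham
  sudo -u filter /usr/bin/bogofilter -d /var/spool/bogofilter -e -p -l -Sn
  exit 0
fi

if [ "$1" == "spam" ]; then
  # Wrongly passed as ham: unregister as ham, register as spam
  sudo -u filter /usr/bin/bogofilter -d /var/spool/bogofilter -e -p -l -Ns
  exit 0
fi

# Any other argument means the plugin is misconfigured
exit 1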

Now came the tricky part.  Dovecot calls the script as the user whose mailbox it is, but I needed to run bogofilter as the "filter" user.  This meant changing the sudoers file by running "visudo" and adding:-

Cmnd_Alias BOGO=/usr/bin/bogofilter
ALL ALL=(filter) NOPASSWD: BOGO

I know this is a slight security issue, but only from users with a login on the system, and the risk is low enough to be acceptable. It means that the sudo line in the script executes without needing a password, and that any user can therefore run bogofilter as the "filter" user.  I did have to make sure that the wordlist created in /var/spool/bogofilter was writeable by the "filter" user or group.
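If the wordlist has been trained as root, its ownership can be handed back afterwards, and bogoutil should then be able to show how many messages it has learnt from. This is a sketch, assuming the sqlite wordlist is the default wordlist.db in /var/spool/bogofilter; .MSG_COUNT is the special token in which bogofilter keeps its message totals, and the group-writable mode is my choice rather than a requirement.

```shell
#!/bin/sh
# Sketch: give the shared wordlist back to the "filter" user and group,
# keep it group-writable, and show the trained message counts.
# WORDLIST_DIR is a hypothetical override; the default is an assumption.
WORDLIST_DIR="${WORDLIST_DIR:-/var/spool/bogofilter}"

fix_wordlist_perms() {
    dir="$1"
    user="$2"
    group="$3"
    chown -R "$user:$group" "$dir" || return 1
    # g+rwX: group read/write, plus execute on directories only
    chmod -R g+rwX "$dir"
}

show_counts() {
    # Prints the spam and ham message totals stored under .MSG_COUNT
    bogoutil -w "$1/wordlist.db" .MSG_COUNT
}

# Only change ownership when run as root and the directory exists.
if [ "$(id -u)" -eq 0 ] && [ -d "$WORDLIST_DIR" ]; then
    fix_wordlist_perms "$WORDLIST_DIR" filter filter &&
        show_counts "$WORDLIST_DIR"
fi
```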

--------

The result is that around 50MB of memory on the little Raspberry Pi is freed by not having to run spamd, the Spamassassin daemon. Emails whizz through much faster than with Spamassassin.  It is too early to tell how effective this will all be, but to be honest, because of the excellent job postfix does right at the beginning, and the ability of Thunderbird, our preferred email client, to mop up the spams that slip through, we should not notice much increase in spam in our inboxes.  Time will tell.

It has not been as straightforward as setting up Spamassassin: there are not as many sources of information, and there are clearly many different ways of running bogofilter with the postfix-dovecot combination. But it seems a necessary step when you choose to run an email system on as lightweight a platform as a Raspberry Pi.