
MailFilter Class Reference

#include <MailFilter.h>


Public Types

enum  classification {
  BAD_VALUE, UNKNOWN, EMAIL, SUSPECT,
  GARBAGE
}

Public Member Functions

 MailFilter (SpamParameters &param)
 ~MailFilter ()

Private Member Functions

bool isFromLine (const char *buf)
bool copyToTempFiles ()
const char * getNewTempFileName ()
FILE * openFile (const char *fileName, const char *mode, const char *callingFunc)
void closeFile (FILE *fp, const char *fileName, const char *callingFunc)
bool writeLine (const char *buf, FILE *fp, const char *fileName, const char *callingFunc)
const char * readLine (char *buf, const size_t bufSize, FILE *fp, const char *fileName, const char *callingFunc)
void append_file (const char *srcfile, const char *destfile)
void error_append_file (const char *srcfile, const char *destfile)
classification checkMail (const char *tempFileName, SpamParameters &params, HeaderInfo &headInfo)

Private Attributes

size_t mFileCount
Logger log
std::vector< char * > fileNames


Detailed Description

The MailFilter class constructor is the entry point for email processing. The constructor calls MailFilter support routines that read the email from stdin into one or more temporary files (if there is more than one email). The checkMail function is called to analyze the email. Then MailFilter functions are called to append the email to the "inbox" file, the junk_mail file, the garbage_mail file or discard the email (if it is marked as garbage).

Definition at line 60 of file MailFilter.h.


Constructor & Destructor Documentation

MailFilter::MailFilter (SpamParameters &params)
 

Filter email read from stdin.

Email is read from stdin. If there is more than one email, each email will be placed in a unique temporary file. Each file is then processed in an attempt to determine if it is valid email or spam.

Email is categorized as EMAIL (or UNKNOWN), SUSPECT, or GARBAGE. Email is marked as garbage when a "kill_word" is found. If the "kill_base64" flag is included in the SpamFilterParams file, email that contains base64 encoded data will be marked as garbage. Email that is marked as garbage is only placed in the garbage file when the SpamFilterParams flag "keep_garbage" is included. Otherwise email marked as garbage is discarded. This keeps the number of headers that must be reviewed in the junk_mail file as low as possible.
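The routing rules described above can be summarized as a small pure function. This is only a sketch: destinationFor is a hypothetical helper that is not part of MailFilter, and the file names are the ones used by the constructor ("inbox", "junk_mail", "garbage_mail"). An empty string stands for "discard the email".

```cpp
#include <string>

// Mirrors MailFilter::classification from this class reference.
enum classification { BAD_VALUE, UNKNOWN, EMAIL, SUSPECT, GARBAGE };

// Hypothetical helper: returns the name of the file an email should be
// appended to, or an empty string when the email should be discarded.
std::string destinationFor(classification kind, bool keepGarbage)
{
    switch (kind) {
    case EMAIL:
    case UNKNOWN:    // unexpected result, but the email must not be lost
    case BAD_VALUE:  // processing error, keep the email in the inbox
        return "inbox";
    case SUSPECT:
        return "junk_mail";
    case GARBAGE:
        return keepGarbage ? "garbage_mail" : "";
    }
    // Assumption of this sketch: an out-of-range value falls back to the
    // inbox. The constructor itself only logs an error in that case.
    return "inbox";
}
```

Keeping the decision separate from the file I/O makes the "never lose mail" rule easy to check in isolation.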

Parameters:
params a reference to a SpamParameters object containing information from the SpamFilterParams file.

Definition at line 513 of file MailFilter.C.

References append_file(), checkMail(), copyToTempFiles(), error_append_file(), Logger::errorFound(), fileNames, Logger::getLogger(), SpamParameters::hasFlag(), Logger::log(), log, mFileCount, and HeaderInfo::subject().

00514 {
00515   // file for email
00516   const char* INBOX          = "inbox";
00517   // file for email that is suspected of being spam
00518   const char* SPAM           = "junk_mail";
00519   // File for email that is "garbage".
00520   const char* GARBAGE_MAIL   = "garbage_mail";
00521 
00522   mFileCount = 0;
00523   log = pLogger->getLogger("MailFilter");
00524   log.log(Logger::DEBUG, "MailFilter", "enter");
00525 
00526   bool doGarbageTrace = params.hasFlag("trace_garbage") &&
00527                         (! params.hasFlag("keep_garbage"));
00528 
00529   // read mail file from stdin into one or more temporary files
00530   if (copyToTempFiles()) {
00531     size_t numFiles = fileNames.size();
00532     for (int i = 0; i < numFiles; i++) {
00533       const char *tempFileName = fileNames[i];
00534 
00535       HeaderInfo headInfo( doGarbageTrace );
00536 
00537       char msg[256];
00538       classification kind = checkMail(tempFileName, 
00539                                       params,
00540                                       headInfo);
00541 
00542       Logger::LogLevel mode;
00543 
00544       switch (kind) {
00545       case UNKNOWN:
00546         {
00547           // If the email is classified as "UNKNOWN" then something is
00548           // wrong.  But we don't want to lose the email, so append it
00549           // to the inbox.
00550           sprintf(msg, "email classified as UNKNOWN");
00551           append_file( tempFileName, INBOX );
00552           mode = Logger::ERROR;
00553         }
00554         break;
00555       case EMAIL: 
00556         {
00557           sprintf(msg, "Subject: %s added to mail in %s", 
00558                   headInfo.subject(), INBOX );
00559           append_file( tempFileName, INBOX );
00560           mode = Logger::DEBUG;
00561         }
00562         break;
00563       case SUSPECT: {
00564         sprintf(msg, "Subject: %s added to suspected spam in %s", 
00565                 headInfo.subject(), SPAM );
00566         append_file( tempFileName, SPAM );
00567         mode = Logger::DEBUG;
00568       }
00569         break;
00570       case GARBAGE: {
00571         if (params.hasFlag("keep_garbage")) {
00572           sprintf(msg, "Subject: %s is garbage, copied to %s", 
00573                   headInfo.subject(), GARBAGE_MAIL );
00574           append_file( tempFileName, GARBAGE_MAIL );
00575         }
00576         else {
00577           sprintf(msg, "Subject: %s deleted", headInfo.subject() );
00578         }
00579         mode = Logger::DEBUG;
00580       }
00581         break;
00582       case BAD_VALUE: { // something went wrong processing the e-mail
00583         sprintf(msg, "Mail filter error: Subject = %s", headInfo.subject() );
00584         // Append it to the inbox so it is not lost.  The error_append_file
00585         // function will add a marker to the file to indicate that there
00586         // was an error
00587         error_append_file( tempFileName, INBOX );
00588         mode = Logger::ERROR;
00589       }
00590         break;
00591       default: {
00592         sprintf(msg, "bad classification value" );
00593         mode = Logger::ERROR;
00594       }
00595         break;
00596       } // switch
00597 
00598       log.log( mode, "MailFilter", msg );
00599 
00600       if (! log.errorFound()) {
00601         // remove temporary file
00602         sprintf(msg, "removing %s", tempFileName );
00603         log.log(Logger::DEBUG, "MailFilter", msg );
00604         int unlinkRslt = unlink( tempFileName );
00605         if (unlinkRslt != 0) {
00606           sprintf(msg, "error unlinking %s.  Error = %s\n",
00607                   tempFileName, strerror(errno));
00608           log.log(Logger::ERROR, "MailFilter", msg );
00609         }
00610       }
00611       else {
00612         sprintf(msg, "email that caused the error is in %s", tempFileName );
00613         log.log(Logger::ERROR, "MailFilter", msg );
00614       }
00615     } // for
00616   } // if copyToTempFiles
00617 
00618   log.log(Logger::DEBUG, "MailFilter", "exit");
00619 } // MailFilter constructor

MailFilter::~MailFilter ()
 

Recover the memory allocated for the temporary file names in fileNames.

Definition at line 166 of file MailFilter.C.

References fileNames.

00167 {
00168   size_t numFiles = fileNames.size();
00169   for (int i = 0; i < numFiles; i++) {
00170     char *pStr = fileNames[i];
00171     delete [] pStr;
00172   }
00173 } // ~MailFilter


Member Function Documentation

void MailFilter::append_file (const char *srcfile, const char *destfile) [private]
 

append_file

Append srcfile to destfile. This is used when the destination of the email is decided. The email will either be appended to the junk file or back to the email box.

A newline (blank line) is added between the e-mails. This avoids having e-mails run together.
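The append-with-separator behavior can be sketched with standard C++ streams. This is an illustration only, not the MailFilter implementation: appendFile is a hypothetical helper, and it reports failure through its return value instead of writing to the Logger.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends the contents of srcPath to destPath,
// preceded by a newline so that consecutive emails do not run together.
// Returns false if either file cannot be opened or a write fails.
bool appendFile(const std::string &srcPath, const std::string &destPath)
{
    std::ifstream in(srcPath.c_str(), std::ios::binary);
    if (!in) {
        return false;
    }
    std::ofstream out(destPath.c_str(), std::ios::binary | std::ios::app);
    if (!out) {
        return false;
    }

    out << '\n';  // separator between emails

    char buf[4096];
    // read() fails on the final partial block, so gcount() is checked too
    while (in.read(buf, sizeof(buf)) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
    out.flush();
    return out.good();
}
```

Opening the destination in append mode means an existing inbox is never truncated, which matches the intent of the "a+" mode used by append_file.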

Definition at line 397 of file MailFilter.C.

References Logger::log(), and log.

Referenced by MailFilter().

00399 {
00400   char msgbuf[ 128];
00401   const char *read_only = "r";
00402   const char *append = "a+";
00403   FILE *read_fp;
00404   FILE *write_fp;
00405 
00406   log.log(Logger::DEBUG, "append_file", "enter");
00407   
00408   if ((read_fp = fopen( srcfile, read_only )) != NULL) {
00409     if ((write_fp = fopen( destfile, append )) != NULL) {
00410       char buf[ 4096 ];
00411       size_t amt_read;
00412       size_t amt_written;
00413 
00414       fprintf(write_fp, "\n");  // add a carriage return (blank line)
00415 
00416       while ((amt_read = fread(buf, 1, sizeof(buf), read_fp)) > 0) {
00417         amt_written = fwrite(buf, 1, amt_read, write_fp );
00418         if (amt_written < amt_read) {
00419           char *err_reason = strerror( errno );
00420           sprintf(msgbuf, "error writing file %s.  Reason = %s", destfile, err_reason);
00421           log.log(Logger::ERROR, "append_file", msgbuf );
00422         }
00423       } // while 
00424 
00425       fclose( write_fp );
00426     }
00427     else {
00428       char *err_reason = strerror( errno );
00429       sprintf(msgbuf, "append_file: error opening file %s.  Reason = %s", 
00430               destfile, err_reason );
00431       log.log(Logger::ERROR, "append_file", msgbuf );
00432     }
00433     fclose( read_fp );
00434   }
00435   else {
00436     char *err_reason = strerror( errno );
00437     sprintf( msgbuf, "append_file: error opening file %s.  Reason = %s", 
00438              srcfile, err_reason );
00439     log.log(Logger::ERROR, "append_file", msgbuf );
00440   }
00441   log.log(Logger::DEBUG, "append_file", "exit");
00442 }  // append_file

MailFilter::classification MailFilter::checkMail (const char *tempFileName, SpamParameters &params, HeaderInfo &headInfo) [private]
 

Attempt to determine if the email is valid or if it is spam.

The email header is checked first. After checking the email header, if it is still unknown whether the email is valid or spam, the email body is checked. If the email is not found to be "guilty" (i.e., spam) it is assumed to be innocent (valid email).
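The header-first, body-second order can be sketched as a two-stage check in which the body is only examined when the header result is inconclusive. The names below (classifyTwoStage and its two callable parameters) are hypothetical stand-ins for MailHeader::checkHeader() and MailBody::checkBody(); this is not the class's actual code.

```cpp
// Mirrors MailFilter::classification from this class reference.
enum classification { BAD_VALUE, UNKNOWN, EMAIL, SUSPECT, GARBAGE };

// Hypothetical sketch: checkHeader and checkBody are any callables that
// return a classification. The body check runs only when the header
// check could not decide (UNKNOWN).
template <typename HeaderCheck, typename BodyCheck>
classification classifyTwoStage(HeaderCheck checkHeader, BodyCheck checkBody)
{
    classification result = checkHeader();
    if (result == UNKNOWN) {
        result = checkBody();
    }
    return result;
}
```

Skipping the body check when the header already decides avoids scanning the full message for mail that is clearly valid or clearly spam.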

Parameters:
tempFileName the name of the file containing the email to be checked
params a reference to the SpamParameters object containing information from the SpamFilterParams file.
headInfo a reference to a HeaderInfo object which is used to encapsulate information about the email header. The HeaderInfo object is used to generate a garbage_trace entry (if garbage tracing is turned on and the email is discarded). HeaderInfo is also used to generate debug log messages.

Definition at line 466 of file MailFilter.C.

References MailBody::checkBody(), MailHeader::checkHeader(), MailHeader::getBoundaryStr(), HeaderInfo::klass(), Logger::log(), and log.

Referenced by MailFilter().

00469 {
00470   const char *mode = "r";
00471   classification mailClass = EMAIL;
00472   char msgbuf[256];
00473   log.log(Logger::DEBUG, "checkMail", "enter");
00474 
00475   FILE *fp = openFile(tempFileName, mode, "checkMail");
00476   if (fp != NULL) {
00477     MailHeader headFilter( params, headInfo );
00478     mailClass = headFilter.checkHeader(fp);
00479     if (mailClass == UNKNOWN) {
00480       MailBody bodyFilter( params, headInfo );
00481       const char *boundaryStr = headFilter.getBoundaryStr();
00482       mailClass = bodyFilter.checkBody(boundaryStr, fp);
00483       headInfo.klass(mailClass);
00484     }
00485     fclose( fp );
00486   }
00487 
00488   log.log(Logger::DEBUG, "checkMail", "exit");
00489   return mailClass;
00490 } // checkMail

bool MailFilter::copyToTempFiles () [private]
 

copyToTempFiles

Read one or more emails from stdin. Each email will be copied into a temporary file, whose name will be inserted into the fileNames vector.

When I started writing this software, I thought that it was one email per invocation of the mail filter. While testing the mail filter, I found that more than one email may arrive at one time. I don't know why this is. It could be the way mail is handled by my ISP. It could be that so much spam is sent out that spam clumps together. Or it could be that spammers include two emails in one mail transaction. Whatever the case, this function will separate each email into its own temporary file.

Mail tools find the start of an email via the "From" line. The format for this line is:

From <address> <date>

The date is in UNIX/Linux ctime(3) format. If the date is not included mail software cannot find the start of the email. The same technique is used here.
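The splitting technique can be sketched in memory, without temporary files. This is a simplified illustration: splitMessages and startsMessage are hypothetical names, and startsMessage only checks the "From " prefix, whereas isFromLine() also validates the date that follows.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for isFromLine(): only checks the "From " prefix.
static bool startsMessage(const std::string &line)
{
    return line.compare(0, 5, "From ") == 0;
}

// Hypothetical helper: splits a stream into one string per email.
// Lines that appear before the first "From " line are dropped, just as
// copyToTempFiles() has no open file until a From line has been seen.
std::vector<std::string> splitMessages(std::istream &in)
{
    std::vector<std::string> messages;
    std::string line;
    while (std::getline(in, line)) {
        if (startsMessage(line)) {
            messages.push_back(std::string());  // start a new email
        }
        if (!messages.empty()) {
            messages.back() += line;
            messages.back() += '\n';
        }
    }
    return messages;
}
```

Each element of the returned vector plays the role of one temporary file in the real function.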

Definition at line 285 of file MailFilter.C.

References getNewTempFileName(), isFromLine(), Logger::log(), log, readLine(), and writeLine().

Referenced by MailFilter().

00286 {
00287   const char *mode = "w";
00288   char buf[ 1024 ];
00289   char msgbuf[ 128];
00290   bool copyOK = true;
00291   const char *inLine = 0;
00292   const char *fileName = 0;
00293   FILE *fp = 0;
00294   bool firstFrom = true;
00295 
00296   log.log(Logger::DEBUG, "copyToTempFiles", "enter");
00297 
00298   do {
00299     if ((inLine = readLine(buf, sizeof(buf), stdin, 0, "copyToTempFiles"))) {
00300       if (isFromLine( buf )) {
00301         if (firstFrom) {
00302           firstFrom = false;
00303         }
00304         else {
00305           closeFile( fp, fileName, "copyToTempFiles");
00306         }
00307         fileName = getNewTempFileName();
00308         fp = openFile( fileName, mode, "copyToTempFiles" );
00309       } // isFromLine
00310       if (fp) {
00311         if (! writeLine(buf, fp, fileName, "copyToTempFiles")) {
00312           copyOK = false;
00313           break;
00314         }
00315       }
00316     } // inLine = readLine
00317     else {
00318       if (! feof(stdin)) {
00319         copyOK = false;
00320       }
00321       else {
00322         closeFile( fp, fileName, "copyToTempFiles");
00323       }
00324     }
00325   } while (inLine != 0);
00326 
00327   log.log(Logger::DEBUG, "copyToTempFiles", "exit");
00328 
00329   return copyOK;
00330 } // copyToTempFiles

void MailFilter::error_append_file (const char *srcfile, const char *destfile) [private]
 

Append the email to the "inbox" file and include an error line in the email header.

Something has gone wrong. The email should not be lost, so it is appended to the "inbox". An error line is included in the header.
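The header marking can be sketched as a line-by-line copy that emits an extra header line after the subject line. This is an illustration under stated assumptions: copyWithErrorMarker and isSubjectLine are hypothetical names, and the case-insensitive prefix match is a guess at what SpamUtil::match does, which this page does not document.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed behavior: case-insensitive test for a line starting with "subject".
static bool isSubjectLine(const std::string &line)
{
    static const std::string key = "subject";
    if (line.size() < key.size()) {
        return false;
    }
    for (std::size_t i = 0; i < key.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(line[i])) != key[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: copies every line from in to out and writes an
// "X-MailFilterError:" header line after each subject line.
void copyWithErrorMarker(std::istream &in, std::ostream &out)
{
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
        if (isSubjectLine(line)) {
            out << "X-MailFilterError:\n";
        }
    }
}
```

Because the marker is an "X-" header, mail readers ignore it, but it can still be found with a text search of the inbox.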

Definition at line 341 of file MailFilter.C.

References Logger::log(), and log.

Referenced by MailFilter().

00343 {
00344   static const char *SUBJECT = "subject";
00345   static size_t SUBJECT_LEN = strlen( SUBJECT );
00346   char msgbuf[ 128];
00347   const char *read_only = "r";
00348   const char *append = "a+";
00349   FILE *read_fp;
00350   FILE *write_fp;
00351 
00352   log.log(Logger::DEBUG, "error_append_file", "enter");
00353   
00354   if ((read_fp = fopen( srcfile, read_only )) != NULL) {
00355     if ((write_fp = fopen( destfile, append )) != NULL) {
00356       char line[ 4096 ];
00357       size_t amt_read;
00358       size_t amt_written;
00359 
00360       fprintf(write_fp, "\n");  // add a carriage return (blank line)
00361 
00362       while (fgets(line, sizeof(line), read_fp) != 0) {
00363         fputs(line, write_fp);
00364         // append "X-MailFilterError:" after the "Subject:" line
00365         if (SpamUtil().match(line, SUBJECT_LEN, SUBJECT)) {
00366           fprintf(write_fp, "X-MailFilterError:\n");
00367         }
00368       } // while 
00369 
00370       fclose( write_fp );
00371     }
00372     else {
00373       sprintf(msgbuf, "error opening file %s", destfile );
00374       log.log(Logger::ERROR, "error_append_file", msgbuf );
00375     }
00376     fclose( read_fp );
00377   }
00378   else {
00379     sprintf( msgbuf, "error opening file %s", srcfile );
00380     log.log(Logger::ERROR, "error_append_file", msgbuf );
00381   }
00382   log.log(Logger::DEBUG, "error_append_file", "exit");  
00383 } // error_append_file

const char * MailFilter::getNewTempFileName () [private]
 

Create a new file name and enter it into the fileNames vector.
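The naming scheme (a fixed root, the process ID and a running count) can be sketched with a bounded format call. makeTempName is a hypothetical helper, not part of MailFilter; the caller is assumed to supply the process ID (for example from getpid()) and the current count.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<root>_<pid>_<count>". snprintf bounds the
// write to the buffer size, so an unusually long root is truncated
// instead of overflowing the buffer.
std::string makeTempName(const std::string &root, long pid, std::size_t count)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s_%ld_%lu",
                  root.c_str(), pid, static_cast<unsigned long>(count));
    return std::string(buf);
}
```

Storing the results in a std::vector<std::string> instead of a std::vector<char *> would also remove the need to free each name by hand in the destructor.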

Definition at line 50 of file MailFilter.C.

References fileNames, and mFileCount.

Referenced by copyToTempFiles().

00051 {
00052   const char* TEMP_NAME_ROOT = "mail_temp";
00053   const size_t BUF_SIZE = 64;
00054   char *pBuf = new char[ BUF_SIZE ];
00055   
00056   int pid = getpid();
00057   // create a unique temporary file name
00058   mFileCount++;
00059   sprintf(pBuf, "%s_%d_%lu", TEMP_NAME_ROOT, pid, (unsigned long)mFileCount );
00060   fileNames.push_back( pBuf );
00061   return pBuf;
00062 } // getNewTempFileName

bool MailFilter::isFromLine (const char *buf) [private]
 

Check to see if the line in buf is a From line that starts an email.

The start of an email is recognized by the leading From line. This line has the format:

From some@address.com <date>

Where <date> is <dow> <mon> dd hh:mm:ss yyyy (UNIX/Linux ctime(3) format)

<dow> = a three letter day of the week: Sun, Mon, Tue, Wed, Thu, Fri, Sat
<mon> = a three letter month: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug,...
dd = numeric day of the month (1..31)
hh = hour (00..23)
mm = minute
ss = second
yyyy = year

This function looks for "From ", followed by a date. A date is recognized by looking for a valid day of the week, a valid month and a valid day of the month. While it is still possible that another line in an email will be accidentally recognized, it is very unlikely.
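The same check can be sketched with sscanf, which tokenizes the line instead of searching it for substrings. This is a stricter, hypothetical variant (looksLikeFromLine and inTable are not part of MailFilter): it requires the day of the week and the month to be the second and third tokens after "From".

```cpp
#include <cstdio>
#include <cstring>

// Returns true if name equals one of the entries in table
// (the table is terminated by a null pointer).
static bool inTable(const char *name, const char *const *table)
{
    for (; *table != 0; ++table) {
        if (std::strcmp(name, *table) == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical, stricter variant of isFromLine().
bool looksLikeFromLine(const char *buf)
{
    static const char *const dow[] = {"Sun", "Mon", "Tue", "Wed",
                                      "Thu", "Fri", "Sat", 0};
    static const char *const mon[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", 0};
    char day[4] = "";
    char month[4] = "";
    int date = 0;

    if (buf == 0 || std::strncmp(buf, "From ", 5) != 0) {
        return false;
    }
    // %*s skips the sender address; %3s reads at most three characters.
    if (std::sscanf(buf, "From %*s %3s %3s %d", day, month, &date) != 3) {
        return false;
    }
    return inTable(day, dow) && inTable(month, mon) && date >= 1 && date <= 31;
}
```

A header line such as "From: some@address.com" is rejected by the prefix test, because the fifth character is a colon and not a space.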

Definition at line 206 of file MailFilter.C.

References Logger::log(), and log.

Referenced by copyToTempFiles().

00207 {
00208   static const char *FROM = "From ";
00209   static const size_t FROM_LEN = strlen( FROM );
00210   static const char *dow[] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", 0 };
00211   static const char *mon[] =  {"Jan", "Feb", "Mar", "Apr",
00212                                "May", "Jun",  "Jul",  "Aug",  
00213                                "Sep",  "Oct",  "Nov", "Dec", 0 };
00214   char msgbuf[128];
00215   bool isFrom = false;
00216 
00217   if (buf != 0) {
00218     // find "From "
00219     if (strncmp(buf, FROM, FROM_LEN) == 0) {
00220       sprintf(msgbuf, "Found from line: %s", buf );
00221       log.log(Logger::DEBUG, "isFromLine", msgbuf);
00222       // find a day-of-the-week
00223       const char *pDOW = 0;
00224       for (const char **pDay = dow; *pDay != 0; pDay++) {
00225         if ((pDOW = strstr(buf, *pDay)) != 0) {
00226           break;
00227         }
00228       } // for
00229       if (pDOW != 0) {// if a day-of-the-week was found, find a month
00230         const char *pStartMonth = 0;
00231         for (const char **pMon = mon; *pMon != 0; pMon++) {
00232           if ((pStartMonth = strstr(pDOW+3, *pMon)) != 0) {
00233             break;
00234           }
00235         } // for
00236         // look for the day of the month (1..31)
00237         if (pStartMonth != 0) {
00238           const char *pDate = pStartMonth + 3;
00239           pDate = SpamUtil().skipWhiteSpace( pDate );
00240           if (isdigit(*pDate)) {
00241             int date = atoi( pDate );
00242             if (date >= 1 && date <= 31) {
00243               log.log(Logger::DEBUG, "isFromLine", "Found start of email");
00244               isFrom = true;
00245             }
00246           }
00247         }
00248       } // if pDow != 0
00249     }
00250   }
00251   return isFrom;
00252 } // isFromLine

const char * MailFilter::readLine (char *buf, const size_t bufSize, FILE *fp, const char *fileName, const char *callingFunc) [private]
 

Read a line of text. Print an error message to the log file if there is an error.

Definition at line 137 of file MailFilter.C.

References Logger::log(), and log.

Referenced by copyToTempFiles().

00142 {
00143   char *inLine = 0;
00144   *buf = '\0';
00145   if ((inLine = fgets( buf, bufSize, fp )) == 0) {
00146     if (! feof(fp)) {
00147       char msgbuf[128];
00148       char *err_reason = strerror( errno );
00149       if (fileName != 0) {
00150         sprintf(msgbuf, "Error reading from %s.  Reason = %s", fileName, err_reason );
00151       }
00152       else if (fp == stdin) {
00153         sprintf(msgbuf, "Error reading from stdin.  Reason = %s", err_reason );
00154       }
00155       log.log(Logger::ERROR, callingFunc, msgbuf );
00156     }
00157   }
00158   return inLine;
00159 } // readLine

bool MailFilter::writeLine (const char *buf, FILE *fp, const char *fileName, const char *callingFunc) [private]
 

Write a line of text. Print a log message to the log file if there is an error. Note that the fputs result is compared to EOF, rather than zero. This is necessary for portability: fputs returns EOF on failure, but on success it is only guaranteed to return a non-negative value, not necessarily zero.
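The portable form of the check can be sketched in a few lines. putLine is a hypothetical helper that leaves out the logging done by writeLine.

```cpp
#include <cstdio>

// Hypothetical helper: writes buf to fp and reports success.
// fputs returns EOF on failure and some non-negative value on success,
// so the result is compared against EOF and never against zero.
bool putLine(const char *buf, std::FILE *fp)
{
    return std::fputs(buf, fp) != EOF;
}
```

Testing for "== 0" instead would report spurious write errors on platforms where fputs returns a positive value on success.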

Definition at line 115 of file MailFilter.C.

References Logger::log(), and log.

Referenced by copyToTempFiles().

00119 {
00120   bool writeOK = true;
00121 
00122   if (fputs( buf, fp ) == EOF) {
00123     writeOK = false;
00124     char msgbuf[128];
00125     char *err_reason = strerror( errno );
00126     sprintf(msgbuf, "Error writing to %s.  Reason = %s", fileName, err_reason );
00127     log.log(Logger::ERROR, callingFunc, msgbuf );
00128   }
00129   return writeOK;
00130 } // writeLine


Member Data Documentation

std::vector<char *> MailFilter::fileNames [private]
 

The names of the temporary files created for the emails read from stdin

Definition at line 72 of file MailFilter.h.

Referenced by getNewTempFileName(), MailFilter(), and ~MailFilter().

Logger MailFilter::log [private]
 

Logger object

Definition at line 70 of file MailFilter.h.

Referenced by append_file(), checkMail(), copyToTempFiles(), error_append_file(), isFromLine(), MailFilter(), readLine(), and writeLine().

size_t MailFilter::mFileCount [private]
 

A count for the temporary files created for the emails read from stdin

Definition at line 68 of file MailFilter.h.

Referenced by getNewTempFileName(), and MailFilter().


The documentation for this class was generated from the following files:
MailFilter.h
MailFilter.C
Generated on Sat Mar 27 13:07:38 2004 for Mail Filter by doxygen 1.3.3