Sunday, December 20, 2009

Research paper in NDSS 2010: Improving Spam Blacklisting Through Dynamic Thresholding and Speculative Aggregation

Our blacklist paper titled:

Improving Spam Blacklisting Through Dynamic Thresholding and Speculative Aggregation

and authored by:

Sushant Sinha, Michael Bailey, and Farnam Jahanian
University of Michigan, Ann Arbor, MI - 48109.

is going to be presented at the Network and Distributed System Security (NDSS) Symposium 2010, held from 28th February to 3rd March in San Diego, California.

Here is the abstract:


Unsolicited bulk e-mail (UBE) or spam constitutes a significant
fraction of all e-mail connection attempts and routinely frustrates
users, consumes resources, and serves as an infection vector for
malicious software. In an effort to scalably and effectively reduce
the impact of these e-mails, e-mail system designers have increasingly
turned to blacklisting. Blacklisting (blackholing, block listing) is a
form of coarse-grained, reputation-based, dynamic policy enforcement
in which real-time feeds of spam sending hosts are sent to networks so
that the e-mail from these hosts may be rejected. Unfortunately,
current spam blacklist services are highly inaccurate and exhibit
both false positives and significant false negatives. In this paper, we
explore the root causes of blacklist inaccuracy and show that the
trend toward stealthier spam exacerbates the existing tension between
false positives and false negatives when assigning spamming IP
reputation. We argue that to relieve this tension, global aggregation
and reputation assignment should be replaced with local aggregation
and reputation assignment, utilizing preexisting global spam
collection, with the addition of local usage, policy, and reachability
information. We propose two specific techniques based on this premise,
dynamic thresholding and speculative aggregation, whose
goal is to improve the accuracy of blacklist generation. We
evaluate the performance and accuracy of these solutions in the
context of our own deployment consisting of 2.5 million production
e-mails and 14 million e-mails from spamtraps deployed in 11 domains
over a month-long period. We show that the proposed approaches
significantly improve the false positive and false negative rates when
compared to existing approaches.