Internet Security Systems - AlertCon(TM)

PDF: The new spam frontier?

Posted by Ralf Iffert on July 17, 2007 at 10:41 AM EDT.

In 2005, spammers stumbled onto a new technique to evade text-based spam detection systems: image-based spam. What began as a trickle of activity in 2005 had become a dominant technique by the end of 2006, when image-based spam accounted for about one third of all spam.

Our researchers have been tracking a new trend: the use of PDFs in spam. As with image-based spam, this evasion technique relies on the fact that many antispam programs do not parse PDF files and have no other heuristic for determining that an email with a PDF attachment is spam.

Judging by the current small trickle of activity, spammers appear to be only experimenting with the technique for now. However, as you have probably seen in your own inbox, they are having some success!

Here's a detailed breakdown of the emerging threat as documented by our researchers in Kassel, Germany...

  • On Wednesday, June 20th, 2007, we saw the first PDF spam threat. Each spam in this threat carried a single PDF attachment, and the attachment was always the same file (binary-identical across the whole threat). This spam threat ran for two days.
  • On Tuesday, June 26th, 2007, two new types of PDF spam started:
      • One was similar to the first threat: it used the same PDF attachment, but the email body differed slightly.
      • The other contained varying PDF attachments (still only one attachment per spam). Each of those PDF files contained a single image with the typical image spam characteristics (a colored background and colored text written in wavy lines).
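Because the first threat reused one binary-identical PDF, a content hash of the attachment is enough to group those messages together. The following is a minimal sketch of that idea in Python, not a description of how our own filters work: the function names (pdf_attachment_digests, count_repeated_pdfs, build_sample) are hypothetical, and the choice of SHA-256 is an assumption made for illustration.

```python
import hashlib
from collections import Counter
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def pdf_attachment_digests(raw_message):
    """Return the SHA-256 hex digest of each PDF attachment in one raw email."""
    msg = message_from_bytes(raw_message, policy=default)
    digests = []
    for part in msg.iter_attachments():
        name = (part.get_filename() or "").lower()
        # Treat a part as a PDF if either the MIME type or the filename says so.
        if part.get_content_type() == "application/pdf" or name.endswith(".pdf"):
            payload = part.get_payload(decode=True) or b""
            digests.append(hashlib.sha256(payload).hexdigest())
    return digests


def count_repeated_pdfs(raw_messages):
    """Count how many messages carry each distinct PDF (keyed by digest)."""
    counts = Counter()
    for raw in raw_messages:
        # set() so a message attaching the same file twice is counted once.
        counts.update(set(pdf_attachment_digests(raw)))
    return counts


def build_sample(body, pdf_bytes):
    """Hypothetical helper: build a raw test email with one PDF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content(body)
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename="doc.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    same_pdf = b"%PDF-1.4 placeholder bytes"
    samples = [build_sample("first body", same_pdf),
               build_sample("slightly different body", same_pdf)]
    for digest, seen in count_repeated_pdfs(samples).items():
        print(digest[:16], seen)
```

A digest that shows up across many messages with differing bodies matches the pattern of the first threat and of the first June 26th variant. It would not catch the second variant, where the PDF attachment varies from message to message; that case needs the kind of content analysis already used against image spam.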

These first spam threats reached 3-4% of all spam we recorded during that time. Over the following days, the PDF spam threats declined and the overall percentage hovered at about 2%.

  • On Friday, July 6th, 2007, PDF spam picked up again and reached 6-8% of all spam. In some accounts, it has spiked as high as 20%. The PDF files still look similar to those of the first threats, and most of the PDF spam is stock spam. However, we have also seen other types, such as health-related spam. The size of the PDF attachments has, for the most part, varied between 5 KB and 90 KB.

If PDF spam evolves like image-based spam did, then we have to prepare for the possibility that PDF spam could account for 20% or more of all spam.  In fact, we may see this kind of volume increase happen much faster than the two-year rise of image-based spam.

Our researchers will be keeping an eye on this new vector. Just as file-format vulnerabilities have spread across document types, they anticipate that spam using other document types may also be on the horizon.
