Computerworld

12 spam research projects that might make a difference

The latest developments in thwarting spammers, phishers and other cyber criminals

Those who commit cybercrime know they need to stay on the cutting edge of technology to come up with new and different ways to swindle people. Luckily, the good guys are also spending time in research labs developing ways to thwart the latest tricks employed by spammers, phishers and other criminals.

Below is a list of a dozen research projects underway that focus on new technology and techniques to stop spam. While in many cases these projects are reacting to exploits already in use, such as image spam and phishing, the work by these researchers is designed to counter spammers' current developments and may also lead to prevention of future ones. The list is by no means complete; it covers selected papers recently made public.

Image spam

Spam filter makers were stumped when image spam made its debut last spring; by hiding the spam message inside an image that filters couldn't read, spammers got their messages through to in-boxes.

"Learning Fast Classifiers for Image Spam" is the name of a research paper from the University of Pennsylvania that describes how filters can be tweaked to quickly determine whether or not an inbound message containing an image is spam. The paper discusses techniques that focus on simple properties of the image to make classifications as fast as possible, the development of an algorithm that can select features for classification based on speed and predictive power, and a just-in-time feature extraction that "creates features at classification time as needed by the classifier," according to the paper. Researchers claim a 90% to 99% success rate using real-world data in their own tests.

Another project, "Filtering Image Spam with Near-Duplicate Detection," from Princeton University, also targets spam hidden in pictures. According to the researchers behind the project, image spam is often sent in batches of visually similar images that differ only through the application of randomization algorithms. The researchers propose a near-duplicate detection system that relies on traditional antispam filtering to whittle inbound mail down to a subset of spam images, then applies multiple image-spam filters to flag all the images that look like the spam caught by traditional means. The prototype, its developers say, has reached "high detection rates" and a false-positive rate (legitimate mail classified as spam) of less than 0.001%.
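To make the near-duplicate idea concrete, here is a minimal sketch. The article does not describe the detectors the Princeton prototype uses, so this example swaps in a simple, well-known technique (an average hash compared by Hamming distance); the function names and the `max_distance` threshold are assumptions.

```python
from typing import List

Gray = List[List[int]]  # small grayscale image, pixel values 0-255


def average_hash(gray: Gray) -> List[int]:
    """One bit per pixel: 1 if the pixel is at least as bright as the image mean."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of bit positions where the two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(candidate: Gray, known_spam_hashes: List[List[int]],
                      max_distance: int = 2) -> bool:
    """True if the candidate hashes close to any image already caught as spam."""
    h = average_hash(candidate)
    return any(hamming(h, k) <= max_distance for k in known_spam_hashes)


# An image caught by a traditional filter, and a randomized variant of it.
caught = [[200, 200, 10, 10], [200, 200, 10, 10],
          [10, 10, 200, 200], [10, 10, 200, 200]]
variant = [[205, 195, 15, 5], [198, 202, 12, 8],
           [14, 6, 190, 210], [9, 11, 203, 197]]
known = [average_hash(caught)]
print(is_near_duplicate(variant, known))
```

Small pixel-level noise leaves the hash unchanged here, which is the property a near-duplicate detector relies on; a real system would first scale images to a common size and would need to cope with heavier randomization.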

Out of Georgia Tech comes "A Discriminative Classifier Learning Approach to Image Modeling and Spam Image Identification." By analyzing images extracted from a body of spam messages, the researchers identified four key image properties: color moment, color heterogeneity, conspicuousness and self-similarity. They then apply multiclass characterization to model the images and propose a maximal figure-of-merit learning algorithm to design classifiers for identifying image spam. The researchers say that in testing, this approach classified 81.5% of spam images correctly.

Another approach is discussed in "Image Spam Filtering by Content Obscuring Detection," from researchers at the University of Cagliari in Italy. This paper reviews low-level image processing techniques that can recognize the content-obscuring tricks spammers use -- namely, character breaking and character interference via background noise -- to fool detection tools based on optical character recognition.

Phishing

The practice of scamming e-mail recipients by convincing them to enter personal or financial information into a Web site that then steals it is nothing new, but it continues to be of particular interest as phishers relentlessly modify their tactics to net more victims.

Carnegie Mellon University (CMU) has been researching why phishing attacks work and learned that a little bit of education regarding online fraud goes a long way. Early findings of the research, presented in October at the Anti-Phishing Working Group's eCrime Researchers Summit in Pittsburgh, showed that phishers are often successful because e-mail users ignore information that could help them recognize fraud.

Researchers at the university even developed an online game designed to teach Internet users about the dangers of phishing. Featuring a cartoon fish named Phil, the game, called Anti-Phishing Phil, has been tested in CMU's Privacy and Security Laboratory. Officials with the lab say users who spent 15 minutes playing the interactive, online game were better able to discern fraudulent Web sites than those who simply read tutorials about the threat.

Blacklisting

Blacklisting is the practice of publicizing known IP addresses that send spam so message-transfer agents won't accept connection requests from these senders; it's also used with Web sites that download malicious code so that inbound messages with URLs to these sites are blocked. Blacklisting has been around as long as Internet exploits, but because of the practice's inherently reactive nature (one must know that an IP address or Web site is "bad" before it can be blocked), researchers continue to try to perfect it.

From Dartmouth comes "Blacklistable Anonymous Credentials: Blocking Misbehaving Users without TTPs" (TTPs are trusted third parties). Published at the end of September, this paper suggests an anonymous credential system that can blacklist misbehaving users without requiring the involvement of a TTP. Because blacklisted users would remain anonymous, "misbehaviors can be judged subjectively without users fearing arbitrary deanonymization by a TTP," the paper states.

Researchers at Georgia Tech are looking into behavioral blacklisting. In the paper "Filtering Spam with Behavioral Blacklisting," they propose blacklisting techniques that adapt to changes in spam senders. With a filtering system they call SpamTracker, e-mail senders are classified based on their sending behavior, rather than their identity. The filter uses fast clustering algorithms that react quickly to changes in sending behavior, the researchers say.
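Classifying senders by behavior rather than identity might look something like the following sketch. It is not SpamTracker's algorithm, which the article does not detail. As an assumption, each sender is summarized by how many messages it sends to each recipient domain, and a new sender is compared with the average pattern of known spammers using cosine similarity; the domain names and the 0.8 threshold are invented.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def behavior_vector(recipient_domains: Iterable[str]) -> Counter:
    """Summarize a sender as message counts per recipient domain."""
    return Counter(recipient_domains)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Counter]) -> Dict[str, float]:
    """Average sending pattern of a group of senders."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def looks_like_spammer(sender: Counter, spam_centroid: Dict[str, float],
                       threshold: float = 0.8) -> bool:
    """Flag a sender whose pattern resembles the known-spammer cluster."""
    return cosine(sender, spam_centroid) >= threshold


known_spammers = [
    behavior_vector(["a.example", "b.example", "c.example", "d.example"]),
    behavior_vector(["a.example", "b.example", "c.example", "e.example"]),
]
spam_pattern = centroid(known_spammers)
new_ip = behavior_vector(["a.example", "b.example", "c.example", "d.example"])
print(looks_like_spammer(new_ip, spam_pattern))
```

Because the comparison never looks at the sender's address, a spammer that moves to a fresh IP address but keeps the same sending pattern is still flagged in this sketch.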

Beyond Spam: Spit and gray mail

Also at Georgia Tech, researchers are looking into how to prevent Spit, or Spam over Internet Telephony. Their paper, "CallRank: Combating Spit using Call Duration, Social Networks, and Global Reputation," describes a system called CallRank that uses call duration to establish social-network linkages and reputations for callers. Based on this information, VoIP users can decide whether a caller is legitimate, the researchers say.
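A rough sketch of how call duration could feed a reputation follows. The article does not spell out CallRank's actual mechanism, so the rule used here is an assumption: a caller earns one point for each distinct callee who has spent at least `min_seconds` in total on calls from that caller. The names and the 30-second threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Call = Tuple[str, str, float]  # (caller, callee, duration in seconds)


def build_credit(call_log: Iterable[Call]) -> Dict[str, Dict[str, float]]:
    """For each callee, total seconds spent on calls from each caller."""
    credit: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for caller, callee, seconds in call_log:
        credit[callee][caller] += seconds
    return credit


def reputation(credit: Dict[str, Dict[str, float]],
               min_seconds: float = 30.0) -> Dict[str, int]:
    """Count, per caller, the distinct callees who stayed on the line long enough."""
    rep: Dict[str, int] = defaultdict(int)
    for callers in credit.values():
        for caller, total in callers.items():
            if total >= min_seconds:
                rep[caller] += 1
    return dict(rep)


log = [
    ("alice", "bob", 300.0),
    ("alice", "carol", 120.0),
    ("spitbot", "bob", 4.0),    # recipients hang up on unwanted calls quickly
    ("spitbot", "carol", 3.0),
    ("spitbot", "dave", 5.0),
]
print(reputation(build_credit(log)))
```

The intuition is that people stay on the line with callers they know and hang up on automated pitches, so long calls act as an endorsement that is hard for a bulk caller to fake.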

Other research looks at detecting gray mail: messages that some recipients consider spam and others do not. In a paper called "Improve Spam Filtering by Detecting Gray Mail," Microsoft researchers propose three ways to detect gray mail and then compare their performance.

Out of IBM Research comes a paper that examines the efficacy of combining global and personal antispam filtering systems. The paper, called "Combining Global and Personal Anti-Spam Filtering," examines the advantages of personally trained antispam filters, which do better at statistical text learning because they are tuned to the characteristics of the individual user's e-mail, the researcher says.

Yet classifiers trained across a large number of users can leverage the information provided by each user across the whole group. The paper discusses how combining the two approaches improves overall spam detection.
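One simple way to combine a global and a personal filter is sketched below. The article does not say how the IBM paper combines them, so the blending rule is purely an assumption: the personal filter's weight grows with the number of messages the user has labeled, and the constant `k` (hypothetical) sets how quickly.

```python
def combined_spam_score(global_p: float, personal_p: float,
                        personal_examples: int, k: int = 50) -> float:
    """Blend two spam probabilities.

    global_p: spam probability from a filter trained across all users.
    personal_p: spam probability from this user's own trained filter.
    personal_examples: how many messages this user has labeled so far.
    k: assumed constant; with k labeled messages the two filters weigh equally.
    """
    weight = personal_examples / (personal_examples + k)
    return (1 - weight) * global_p + weight * personal_p


# A message the global filter distrusts but this user's history says is wanted.
for labeled in (0, 50, 950):
    print(labeled, round(combined_spam_score(0.9, 0.1, labeled), 3))
```

With no personal training data the score equals the global filter's verdict, and it moves toward the personal filter's verdict as the user labels more mail.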

Researchers aren't just looking into the technology behind spam, but also the economics. A project headed by researchers from CMU and the University of California at San Diego called "An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants" studies the underground economy that the Internet has fostered. Examining activities such as credit-card fraud, identity theft, spamming, phishing, online credential theft, and the sale of compromised hosts, the paper attempts to explain how Internet abuse once considered a hobby has now turned into a multimillion-dollar business.