
File Type Identification and E-Mail Spam Filtering


Affiliations
1 Department of Computer Science and Engineering, Anna University, Chennai, India
     

Abstract

The widespread use of e-mail has given malicious users an easy way to deliver harmful content into an organization's internal network. Hackers can easily circumvent the protection offered by a firewall by tunneling through the e-mail protocol, because the firewall does not analyze e-mail content. Organizations often fail to recognize the serious risk of critical data being stolen from within the company. Identifying the true type of a computer file is a difficult but important problem, because hackers and malicious users either use non-standard file formats or change file extensions when storing files or transmitting them over a network, so that the files escape firewall filtering. This makes it difficult to recover data from such files and allows confidential data to leave the organization disguised as permitted file formats. Previous methods of file type recognition rely on fixed file extensions, fixed “magic numbers” stored with the files, or proprietary descriptive file wrappers. All of these methods have significant limitations; for example, an extension can simply be renamed, and many formats carry no magic number at all. Hence, a content-based approach is proposed: “fingerprints” of file types are generated from a set of known input files, and these fingerprints are then used to recognize the true type of unknown files from their content rather than from the metadata associated with them.
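
The abstract does not state how the fingerprints are built or compared. One common content-based scheme, assumed here purely for illustration, takes the normalized byte-frequency distribution of a file as its fingerprint, averages it over the known files of each type, and assigns an unknown file to the type whose fingerprint is nearest. In this minimal sketch the helper names (`byte_histogram`, `build_fingerprints`, `identify`) and the use of L1 distance are hypothetical choices, not the authors' method.

```python
from __future__ import annotations


def byte_histogram(data: bytes) -> list[float]:
    """Normalized byte-frequency distribution (256 bins) of a file's content."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def build_fingerprints(samples: dict[str, list[bytes]]) -> dict[str, list[float]]:
    """Average the histograms of the known files of each type into one fingerprint.

    Assumption: a fingerprint is simply the mean byte-frequency vector.
    """
    prints: dict[str, list[float]] = {}
    for ftype, files in samples.items():
        if not files:
            raise ValueError(f"no sample files for type {ftype!r}")
        hists = [byte_histogram(f) for f in files]
        # zip(*hists) walks the 256 byte values column by column
        prints[ftype] = [sum(col) / len(hists) for col in zip(*hists)]
    return prints


def identify(data: bytes, prints: dict[str, list[float]]) -> str:
    """Return the type whose fingerprint is closest (L1 distance) to the file's content.

    The declared extension is never consulted; only the bytes matter.
    """
    h = byte_histogram(data)
    return min(prints, key=lambda t: sum(abs(a - b) for a, b in zip(h, prints[t])))
```

In practice the training bytes would be read from disk, for example with `pathlib.Path(p).read_bytes()`, and a renamed file is classified the same way as before renaming because its extension is ignored. More elaborate variants could, for instance, weight each byte value by its variance across the training files; that refinement is omitted here.
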

E-mail spam has become an epidemic problem that can undermine the usefulness of electronic mail as a means of communication. Besides wasting users’ time and effort in scanning and deleting the massive volume of junk e-mail they receive, spam consumes network bandwidth and storage space, slows down e-mail servers, and provides a medium for distributing harmful or offensive content. Inspired by the success of fuzzy similarity in text classification and document retrieval, this work investigates its effectiveness for filtering spam based on the textual content of e-mail messages.
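
The abstract likewise leaves the fuzzy similarity measure unspecified. This sketch assumes one plausible formulation: each term receives a membership degree in each category (here "spam" and "ham") from its class-normalized frequency, a message is treated as a fuzzy set of its terms, and similarity is the ratio of summed minima to summed maxima of the two membership functions. The category names, the tokenizer, and the helpers `train`, `similarity`, and `classify` are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(corpora: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Membership mu_c(t) of every training term t in every category c.

    Frequencies are first normalized by category size so an unbalanced
    training set does not favor the larger class; each term's memberships
    are then scaled to sum to 1 across categories.
    """
    freq: dict[str, dict[str, float]] = {}
    for cat, msgs in corpora.items():
        counts = Counter(t for m in msgs for t in tokenize(m))
        total = sum(counts.values()) or 1
        freq[cat] = {t: n / total for t, n in counts.items()}
    vocab = set().union(*(f.keys() for f in freq.values()))
    members: dict[str, dict[str, float]] = {cat: {} for cat in corpora}
    for t in vocab:
        z = sum(f.get(t, 0.0) for f in freq.values())  # > 0: t occurs somewhere
        for cat, f in freq.items():
            members[cat][t] = f.get(t, 0.0) / z
    return members


def similarity(text: str, mu_c: dict[str, float]) -> float:
    """Fuzzy min/max similarity between one message and one category, in [0, 1]."""
    counts = Counter(tokenize(text))
    if not counts:
        return 0.0
    top = max(counts.values())
    num = den = 0.0
    for t, n in counts.items():
        mu_m = n / top            # message membership, scaled to [0, 1]
        mu = mu_c.get(t, 0.0)     # terms unseen in training have membership 0
        num += min(mu_m, mu)
        den += max(mu_m, mu)
    return num / den


def classify(text: str, members: dict[str, dict[str, float]]) -> str:
    """Assign the message to the category it is most similar to."""
    return max(members, key=lambda c: similarity(text, members[c]))
```

Because a term's memberships sum to 1 across categories, no category gains an advantage merely from having a larger vocabulary, and a term never seen in training contributes zero similarity to every category.
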

Keywords

Fingerprint (File Print), Spam, Signature.

Authors

R. Dhanalakshmi
Department of Computer Science and Engineering, Anna University, Chennai, India
C. Chellappan
Department of Computer Science and Engineering, Anna University, Chennai, India
