File Type Identification and E-Mail Spam Filtering
The widespread use of email has given malicious users an easy way to distribute harmful content to an internal network. Because a firewall does not analyze email content, hackers can easily circumvent its protection by tunneling through the email protocol. Organizations also often fail to acknowledge the considerable risk of crucial data being stolen from within the company. Identifying the true type of a computer file is a difficult and important problem, because hackers and malicious users either use non-standard file formats or change file extensions when storing or transmitting files over a network, and so evade firewall filtering. This makes it difficult to recover data from such files, and it allows confidential data to be sent out of an organization disguised in permitted file formats. Previous methods of file type recognition include fixed file extensions, fixed “magic numbers” stored with the files, and proprietary descriptive file wrappers. All of these methods have significant limitations. Hence, a content-based approach is proposed: “fingerprints” of file types are generated from a set of known input files, and the fingerprints are then used to recognize the true type of unknown files from their content rather than from the metadata associated with them.
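The fingerprinting idea described above can be illustrated with a minimal sketch. The abstract does not say how fingerprints are computed, so this example assumes one common content-based choice: a normalized byte-frequency histogram averaged over known sample files, compared to an unknown file by cosine similarity. All function names here (`byte_histogram`, `build_fingerprint`, `identify`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / total for value in range(256)]


def build_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the byte histograms of known files of a single type.

    Assumption: the fingerprint is a plain mean of histograms; the paper
    may use a different statistic.
    """
    histograms = [byte_histogram(sample) for sample in samples]
    return [sum(column) / len(histograms) for column in zip(*histograms)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(data: bytes, fingerprints: dict[str, list[float]]) -> str:
    """Return the known type whose fingerprint is closest to the file content."""
    histogram = byte_histogram(data)
    return max(fingerprints, key=lambda name: cosine(histogram, fingerprints[name]))


# Illustrative training data only; real fingerprints would need many files per type.
fingerprints = {
    "text": build_fingerprint([
        b"the quick brown fox jumps over the lazy dog",
        b"confidential report for internal use only",
    ]),
    "binary": build_fingerprint([bytes(range(256)), bytes(range(256)) * 3]),
}
guess = identify(b"hello world, this is plain text", fingerprints)
```

Because the decision depends only on the bytes of the file, renaming `report.txt` to `report.jpg` would not change the result in this sketch, which is the property the content-based approach relies on.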
E-mail spam has become an epidemic problem that can negatively affect the usability of electronic mail as a means of communication. Besides wasting users’ time and effort in scanning and deleting the massive amount of junk e-mail they receive, spam consumes network bandwidth and storage space, slows down e-mail servers, and provides a medium for distributing harmful and/or offensive content. Inspired by the success of fuzzy similarity in text classification and document retrieval, the proposed approach investigates its effectiveness in filtering spam based on the textual content of e-mail messages.
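A fuzzy-similarity filter of this kind can be sketched as follows. The abstract gives no formulas, so this example makes two assumptions: the membership of a term in a category is its share of that term's occurrences across all categories, and a message is compared to a category with the min/max (fuzzy Jaccard) ratio. The names `train`, `similarity`, and `classify` are hypothetical, and the training messages are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labeled: list[tuple[str, str]]) -> tuple[dict[str, dict[str, float]], list[str]]:
    """Build term-to-category fuzzy memberships from (text, label) pairs.

    Assumption: membership(term, label) is the fraction of the term's
    occurrences that fall in that label's training messages.
    """
    counts: dict[str, Counter] = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(tokenize(text))
    membership: dict[str, dict[str, float]] = {}
    for label, term_counts in counts.items():
        for term, n in term_counts.items():
            total = sum(c[term] for c in counts.values())
            membership.setdefault(term, {})[label] = n / total
    return membership, list(counts)


def similarity(tokens: list[str], label: str, membership: dict[str, dict[str, float]]) -> float:
    """Fuzzy Jaccard similarity between a message and a category."""
    frequencies = Counter(tokens)
    if not frequencies:
        return 0.0
    peak = max(frequencies.values())
    numerator = denominator = 0.0
    for term, n in frequencies.items():
        doc_mu = n / peak  # membership of the term in the message
        cat_mu = membership.get(term, {}).get(label, 0.0)
        numerator += min(doc_mu, cat_mu)    # fuzzy intersection
        denominator += max(doc_mu, cat_mu)  # fuzzy union
    return numerator / denominator if denominator else 0.0


def classify(text: str, membership: dict[str, dict[str, float]], labels: list[str]) -> str:
    """Assign the message to the category with the highest fuzzy similarity."""
    tokens = tokenize(text)
    return max(labels, key=lambda label: similarity(tokens, label, membership))


membership, labels = train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
])
verdict = classify("claim your free money", membership, labels)
```

A practical filter would also need stop-word handling, a much larger training corpus, and a decision threshold tuned to keep false positives low; none of these are specified in the abstract.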
Keywords
Fingerprint (File Print), Spam, Signature.