
A Schematic Representation of User Model Transfer for Email Virus Detection



Systems that learn to detect anomalous email behavior, such as that caused by worms and viruses, tend to build either per-user models or a single global model. Global models leverage a larger training corpus but often model individual users poorly. Per-user models capture fine-grained behaviors but can take a long time to accumulate sufficient training data. Approaches that combine global and per-user information have the potential to address both limitations. We use the Latent Dirichlet Allocation model to transition smoothly from the global prior to a particular user’s empirical model as the amount of user data grows. Preliminary results demonstrate long-term accuracy comparable to that of per-user models, while also showing near-ideal performance almost immediately on new users.
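
The transition from a global prior to a user's empirical model can be illustrated with a small sketch. This is not the model evaluated in the paper: it is a simplified Dirichlet-smoothed estimate over a single categorical feature, and the feature names, the `blended_model` function, and the `prior_strength` parameter are all hypothetical choices made for illustration. With no user data the estimate equals the global model, and as user observations accumulate it approaches the user's own empirical frequencies.

```python
from collections import Counter


def blended_model(global_probs, user_counts, prior_strength=10.0):
    """Posterior-mean estimate under a Dirichlet prior centred on the global model.

    global_probs   : dict mapping feature value -> global probability (sums to 1)
    user_counts    : dict mapping feature value -> count observed for this user
    prior_strength : hypothetical pseudo-count weight given to the global model
    """
    total = sum(user_counts.values())
    return {
        value: (user_counts.get(value, 0) + prior_strength * p)
        / (total + prior_strength)
        for value, p in global_probs.items()
    }


# Hypothetical global model: 20% of all outgoing messages carry an attachment.
global_probs = {"has_attachment": 0.2, "no_attachment": 0.8}

# A brand-new user with no history falls back to the global model.
new_user = blended_model(global_probs, Counter())

# A user with 1000 observed messages is dominated by their own behaviour.
veteran = blended_model(
    global_probs, Counter({"has_attachment": 900, "no_attachment": 100})
)

print(new_user)
print(veteran)
```

A larger `prior_strength` keeps the estimate close to the global model for longer, while a smaller value lets the per-user data take over sooner; the right setting would have to be chosen empirically.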

Keywords

Vulnerability, Bayes Classification, Latent Dirichlet Allocation, Per-User Mixture Model, Global Mixture Model, SMTP Engines.

Authors

M. Sreedhar Reddy
Dept. of CSE, Sai Sudhir Institute of Engineering & Technology, Hyderabad, India
Manoj Alimilla
Dept of CSE, Tirumala Engineering College, Bogaram, Hyderabad, India
P. Viswanath Raghava
GITAM University, Vizag, India
