
Contribution au classement statistique mutualisé de messages électroniques (spam)

Abstract : Since the 1990s, various machine learning methods have been investigated and applied to the email classification problem (spam filtering), with very good but not perfect results. These methods have generally been considered well suited to filtering messages for a single user, but not for a large set of users such as a community. Our approach was first to seek a better understanding of the data being handled, using a corpus of real messages, before studying new algorithms. Using a logistic regression classifier with online active learning, we show empirically that a simple classification algorithm, coupled with a learning strategy well adapted to the real context, can achieve results as good as those obtained with more complex algorithms. We also show empirically, using messages from a small group of users, that the efficiency loss is not very high when the classifier is shared by a group of users.
Contributor : Bibliothèque MINES ParisTech
Submitted on : Monday, October 31, 2011 - 9:36:24 AM
Last modification on : Wednesday, November 17, 2021 - 12:30:54 PM
Long-term archiving on : Wednesday, February 1, 2012 - 2:20:46 AM


  • HAL Id : pastel-00637173, version 1


José Márcio Martins da Cruz. Contribution au classement statistique mutualisé de messages électroniques (spam). Other [cs.OH]. École Nationale Supérieure des Mines de Paris, 2011. French. ⟨NNT : 2011ENMP0027⟩. ⟨pastel-00637173⟩


