Alexa.com, Alexa Top Global Sites

A. Dell'Aera, Thug: a new low-interaction honeyclient, 2012.

Anonymous, How to join Anonymous. http://anoninsiders.net/how-to-join-anonymous-1527/index.html, 2013.

J. Bau, E. Bursztein, D. Gupta, and J. Mitchell, State of the Art: Automated Black-Box Web Application Vulnerability Testing, 2010 IEEE Symposium on Security and Privacy, pp.332-345, 2010.
DOI : 10.1109/SP.2010.27

C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, 2006.

R. Böhme and G. Schwartz, Modeling cyber-insurance: Towards a unifying framework, Ninth Workshop on the Economics of Information Security, 2010.

K. Borgolte, C. Kruegel, and G. Vigna, Delta, Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, CCS '13, 2013.
DOI : 10.1145/2508859.2516725

J. Caballero, C. Grier, C. Kreibich, and V. Paxson, Measuring pay-per-install: The commoditization of malware distribution, Proceedings of the USENIX Security Symposium, 2011.

X. Chen, B. Francia, M. Li, B. McKinnon, and A. Seker, Shared information and program plagiarism detection, IEEE Transactions on Information Theory, vol.50, issue.7, pp.1545-1551, 2004.

StopBadware and Commtouch, Compromised Websites - An Owner's Perspective. http://stopbadware.org/pdfs/compromisedwebsites-an-owners-perspective.pdf, 2012.

M. Cova, C. Kruegel, and G. Vigna, Detection and analysis of drive-by-download attacks and malicious JavaScript code, Proceedings of the 19th international conference on World wide web, WWW '10, 2010.
DOI : 10.1145/1772690.1772720

N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods, 2000.
DOI : 10.1017/CBO9780511801389

C. Curtsinger, B. Livshits, B. G. Zorn, and C. Seifert, Zozzle: Fast and precise in-browser JavaScript malware detection, USENIX Security Symposium, pp.33-48, 2011.

W. de Vries, Hosting provider Antagonist automatically fixes vulnerabilities in customers' websites. https://www.antagonist.nl/bloghosting-provider-antagonist-automatically-fixes-vulnerabilities-in-customers-websites, 2012.

J. Delgado and R. Davidson, Knowledge Bases and User Profiling in Travel and Hospitality Recommender Systems, Proceedings of the ENTER 2002 Conference, pp.1-16, 2002.
DOI : 10.1016/S0160-7383(98)00010-3

B. Eshete, A. Villafiorita, and K. Weldemariam, BINSPECT: Holistic Analysis and Detection of Malicious Web Pages, Security and Privacy in Communication Networks, pp.149-166, 2010.
DOI : 10.1016/j.eswa.2009.05.023

European Cybercrime Centre, European Cybercrime Centre (EC3) calls on young crime fighters everywhere, 2013.

B. Feinstein and D. Peck, Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript, Proceedings of the Black Hat Security Conference, 2007.

S. Garera, N. Provos, M. Chew, and A. D. Rubin, A framework for detection and measurement of phishing attacks, Proceedings of the 2007 ACM workshop on Recurring malcode, WORM '07, 2007.
DOI : 10.1145/1314389.1314391

D. Goodin, SQL injection taints BusinessWeek.com, 2008.

D. Goodin, Potent malware link infects almost 300,000 webpages, 2009.

G. Security and T. , Making the web safer

G. Keizer, Is Stuxnet the 'best' malware ever? http://www.infoworld.com/print, 2010.

W. Hobson, Cyber-criminals use SEO on topical trends, 2010.

A. Ikinci, T. Holz, and F. Freiling, Monkey-Spider: Detecting Malicious Websites with Low-Interaction Honeyclients, Proceedings of Sicherheit, Schutz und Zuverlässigkeit, 2008.

L. Invernizzi, P. M. Comparetti, S. Benvenuti, C. Kruegel, M. Cova et al., EvilSeed: A Guided Approach to Finding Malicious Web Pages, 2012 IEEE Symposium on Security and Privacy, pp.428-442, 2012.
DOI : 10.1109/SP.2012.33

G. Jacob, P. M. Comparetti, M. Neugschwandtner, C. Kruegel, and G. Vigna, A Static, Packer-Agnostic Filter to Detect Similar Malware Samples, Detection of Intrusions and Malware, and Vulnerability Assessment, pp.102-122, 2013.
DOI : 10.1007/978-3-642-37300-8_6

J. Jang, M. Woo, and D. Brumley, Towards Automatic Software Lineage Inference, Proceedings of the USENIX Security Symposium, 2013.

J. P. John, F. Yu, Y. Xie, A. Krishnamurthy, and M. Abadi, deSEO: Combating Search-Result Poisoning, Proceedings of the USENIX Security Symposium, 2011.

J. P. John, F. Yu, Y. Xie, A. Krishnamurthy, and M. Abadi, Heat-seeking honeypots, Proceedings of the 20th international conference on World wide web, WWW '11, pp.207-216, 2011.
DOI : 10.1145/1963405.1963437

Jsunpack.

Kaspersky, Kaspersky Security Bulletin, 2010.

C. Ke, J. Oliver, and Y. Xiang, Analysis of the Australian Web Threat Landscape, 2013.

J. Kornblum, Identifying almost identical files using context triggered piecewise hashing, Digital Investigation, pp.91-97, 2006.
DOI : 10.1016/j.diin.2006.06.015

L. Ullman, Understand your hosting, five critical e-commerce security tips in five days

C. Leita and M. Dacier, SGNET: A Worldwide Deployable Framework to Support the Analysis of Malware Threat Models, 2008 Seventh European Dependable Computing Conference, 2008.
DOI : 10.1109/EDCC-7.2008.15

F. L. Lévesque, J. Nsiempba, J. M. Fernandez, S. Chiasson, and A. Somayaji, A clinical study of risk factors related to malware infections, Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, CCS '13, 2013.
DOI : 10.1145/2508859.2516747

A. Liaw and M. Wiener, Classification and regression by randomForest, R News, p.18, 2002.

P. Likarish, E. Jung, and I. Jo, Obfuscated malicious JavaScript detection using classification techniques, 2009 4th International Conference on Malicious and Unwanted Software (MALWARE), 2009.
DOI : 10.1109/MALWARE.2009.5403020

L. Lu, R. Perdisci, and W. Lee, SURF, Proceedings of the 18th ACM conference on Computer and communications security, CCS '11, pp.467-476, 2011.
DOI : 10.1145/2046707.2046762
URL : https://hal.archives-ouvertes.fr/hal-00480752

J. Ma, L. Saul, S. Savage, and G. Voelker, Beyond blacklists, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, 2009.
DOI : 10.1145/1557019.1557153

G. Maier, A. Feldmann, V. Paxson, R. Sommer, and M. Vallentin, An assessment of overt malicious activity manifest in residential networks, Detection of Intrusions and Malware, and Vulnerability Assessment, pp.144-163, 2011.

Expertise recommender: a flexible recommendation system and architecture, Proceedings of the 2000 ACM conference on Computer supported cooperative work, 2000.

S. E. Middleton, N. R. Shadbolt, and D. C. De Roure, Ontological user profiling in recommender systems, ACM Transactions on Information Systems, vol.22, issue.1, pp.54-88, 2004.
DOI : 10.1145/963770.963773

T. Moore and R. Clayton, Examining the impact of website take-down on phishing, Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit, eCrime '07, pp.1-13, 2007.
DOI : 10.1145/1299015.1299016

T. Moore and R. Clayton, The consequence of non-cooperation in the fight against phishing, 2008 eCrime Researchers Summit, pp.1-14, 2008.
DOI : 10.1109/ECRIME.2008.4696968

T. Moore and R. Clayton, Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing, Financial Cryptography, pp.256-272, 2009.
DOI : 10.1007/978-3-642-03549-4_16

A. Moshchuk, T. Bragin, S. Gribble, and H. Levy, A Crawler-based Study of Spyware in the Web, Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.

M. Müter, F. Freiling, T. Holz, and J. Matthews, A generic toolkit for converting web applications into high-interaction honeypots, 2007.

J. Nazario, PhoneyC: A Virtual Client Honeypot, Proceedings of the USENIX Workshop on Large-Scale Exploits and Emergent Threats, 2009.

V. Nicomette, M. Kaâniche, E. Alata, and M. Herrb, Set-up and deployment of a high-interaction honeypot: experiment and lessons learned, Journal in Computer Virology, vol.39, issue.5, 2010.
DOI : 10.1007/s11416-010-0144-2
URL : https://hal.archives-ouvertes.fr/hal-00762596

L. Olejnik, C. Castelluccia, and A. Janc, Why Johnny Can't Browse in Peace: On the Uniqueness of Web Browsing History Patterns, 5th Workshop on Hot Topics in Privacy Enhancing Technologies, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747841

K. Onarlioglu, U. O. Yilmaz, E. Kirda, and D. Balzarotti, Insights into User Behavior in Dealing with Internet Attacks, Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2012.

OWASP Foundation and Trustwave SpiderLabs, OWASP ModSecurity Core Rule Set Project. https://www.owasp.org/index, 2012.

Y. Peng, G. Wang, G. Kou, and Y. Shi, An empirical study of classification algorithm evaluation for financial risk prediction, The Impact of Soft Computing for the Progress of Artificial Intelligence, pp.2906-2915, 2011.
DOI : 10.1016/j.asoc.2010.11.028

F. Pouget, M. Dacier, and V. H. Pham, Leurre.com: on the advantages of deploying a large scale distributed honeypot platform, ECCE 2005, E-Crime and Computer Conference, pp.29-30, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00129109

Open Directory Project, DMOZ Open Directory Project, 2013.

N. Provos, A virtual honeypot framework, Proceedings of the USENIX Security Symposium, pp.1-14, 2004.

N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, All Your iFrames Point to Us, Proceedings of the USENIX Security Symposium, 2008.

N. Provos, D. McNamee, P. Mavrommatis, K. Wang, and N. Modadugu, The ghost in the browser: analysis of web-based malware, Proceedings of the First Workshop on Hot Topics in Understanding Botnets, HotBots'07, pp.4-4, 2007.

N. Provos, M. A. Rajab, and P. Mavrommatis, Cybercrime 2.0, Communications of the ACM, vol.52, issue.4, pp.46-47, 2009.
DOI : 10.1145/1498765.1498782

Q. Research and . Inc, Data leak probe to PI industry

J. Quinlan, C4.5: Programs for machine learning, 1993.

D. Ramsbrock, R. Berthier, and M. Cukier, Profiling Attacker Behavior Following SSH Compromises, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), 2007.
DOI : 10.1109/DSN.2007.76

P. Ratanaworabhan, B. Livshits, and B. Zorn, Nozzle: a defense against heap-spraying code injection attacks, Proceedings of the USENIX Security Symposium, 2009.

K. Rieck, T. Krueger, and A. Dewald, Cujo, Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, 2010.
DOI : 10.1145/1920261.1920267

M. Roesch, Snort - Lightweight Intrusion Detection for Networks, Proceedings of LISA '99: 13th Systems Administration Conference, 1999.

V. Roussev, Data Fingerprinting with Similarity Digests, Advances in Digital Forensics VI, IFIP Advances in Information and Communication Technology, pp.207-226, 2010.
DOI : 10.1007/978-3-642-15506-2_15
URL : https://hal.archives-ouvertes.fr/hal-01060620

A. Saebjornsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, Detecting code clones in binary executables, Proceedings of the eighteenth international symposium on Software testing and analysis, ISSTA '09, pp.117-128, 2009.
DOI : 10.1145/1572272.1572287

Security Ninja, Share prices and data breaches. https

C. Seifert and R. Steenson, Capture-HPC. https://projects.honeynet.org/capture-hpc, 2008.

C. Seifert, I. Welch, and P. Komisarczuk, Identification of malicious web pages through analysis of underlying DNS and web server relationships, 2008 33rd IEEE Conference on Local Computer Networks (LCN), 2008.
DOI : 10.1109/LCN.2008.4664306

C. Seifert, I. Welch, and P. Komisarczuk, Identification of Malicious Web Pages with Static Heuristics, 2008 Australasian Telecommunication Networks and Applications Conference, 2008.
DOI : 10.1109/ATNAC.2008.4783302

S. Shin and G. Gu, Conficker and beyond, Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, pp.151-160, 2010.
DOI : 10.1145/1920261.1920285

R. Sommer and V. Paxson, Outside the Closed World: On Using Machine Learning for Network Intrusion Detection, 2010 IEEE Symposium on Security and Privacy, 2010.
DOI : 10.1109/SP.2010.25

S. Stigler, Fisher and the 5% Level, CHANCE, vol.21, issue.4, pp.12-12, 2008.
DOI : 10.1037/h0074554

B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski et al., Your botnet is my botnet, Proceedings of the 16th ACM conference on Computer and communications security, CCS '09, 2009.
DOI : 10.1145/1653662.1653738

B. Stone-Gross, M. Cova, C. Kruegel, and G. Vigna, Peering through the iframe, 2011 Proceedings IEEE INFOCOM, pp.411-415, 2011.
DOI : 10.1109/INFCOM.2011.5935193

S. J. Vaughan-Nichols, How the Syrian Electronic Army took out the New York Times and Twitter sites. http://www.zdnet.com/how-the-syrian-electronic-army-took-out-the-new-york-times-and-twitter-sites-7000019989, 2013.

Y. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski et al., Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites that Exploit Browser Vulnerabilities, Proceedings of the Symposium on Network and Distributed System Security (NDSS), 2006.

F. Wilcoxon, Individual comparisons by ranking methods, Breakthroughs in Statistics, pp.196-202, 1992.

G. Wondracek, T. Holz, C. Platzer, E. Kirda, and C. Kruegel, Is the internet for porn? An insight into the online adult industry, WEIS 2010, 9th Workshop on the Economics of Information Security, 2010.

H. Zhang, The Optimality of Naive Bayes, FLAIRS2004 conference, 2004.

J. Zhang, C. Seifert, J. W. Stokes, and W. Lee, ARROW, Proceedings of the 20th international conference on World wide web, WWW '11, pp.187-196, 2011.
DOI : 10.1145/1963405.1963435
URL : https://hal.archives-ouvertes.fr/hal-00762014