
S. Bay, Multivariate Discretization for Set Mining, Knowledge and Information Systems, vol.3, issue.4, pp.491-512, 2001.
DOI : 10.1007/PL00011680

P. Bertier and J. M. Bouroche, Analyse des données multidimensionnelles, 1981.

C. L. Blake and C. J. Merz, UCI Repository of machine learning databases, 1998.

M. Boullé, Khiops: A Discretization Method of Continuous Attributes with Guaranteed Resistance to Noise, Proceedings of the Third International Conference on Machine Learning and Data Mining in Pattern Recognition, pp.50-64, 2003.
DOI : 10.1007/3-540-45065-3_5

M. Boullé, Khiops: A Statistical Discretization Method of Continuous Attributes, Machine Learning, pp.53-69, 2004.
DOI : 10.1023/B:MACH.0000019804.29836.05

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Wadsworth International, California, 1984.

J. Catlett, On changing continuous attributes into ordered discrete attributes, Proceedings of the European Working Session on Learning, pp.87-102, 1991.
DOI : 10.1007/BFb0017012

J. Dougherty, R. Kohavi, and M. Sahami, Supervised and Unsupervised Discretization of Continuous Features, Proceedings of the 12th International Conference on Machine Learning, pp.194-202, 1995.
DOI : 10.1016/B978-1-55860-377-6.50032-3

T. Elomaa and J. Rousu, Finding optimal multi-splits for numerical attributes in decision tree learning, 1996.

T. Elomaa and J. Rousu, General and efficient multisplitting of numerical attributes, Machine Learning, pp.201-244, 1999.

U. Fayyad and K. Irani, On the handling of continuous-valued attributes in decision tree generation, Machine Learning, pp.87-102, 1992.
DOI : 10.1007/BF00994007

W. D. Fischer, On Grouping for Maximum Homogeneity, Journal of the American Statistical Association, vol.53, issue.284, pp.789-798, 1958.
DOI : 10.2307/1907923

T. Fulton, S. Kasif, and S. Salzberg, Efficient Algorithms for Finding Multi-way Splits for Decision Trees, Proceedings of the Twelfth International Conference on Machine Learning, 1995.
DOI : 10.1016/B978-1-55860-377-6.50038-4

R. C. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, pp.63-90, 1993.

G. V. Kass, An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics, vol.29, issue.2, pp.119-127, 1980.
DOI : 10.2307/2986296

R. Kerber, Chimerge discretization of numeric attributes, Proceedings of the 10th International Conference on Artificial Intelligence, pp.123-128, 1991.

R. Kohavi and M. Sahami, Error-based and entropy-based discretization of continuous features, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp.114-119, 1996.

I. Kononenko, I. Bratko, and E. Roskar, Experiments in automatic learning of medical diagnostic rules, 1984.

Y. Lechevallier, Recherche d'une partition optimale sous contrainte d'ordre total, 1990.
URL : https://hal.archives-ouvertes.fr/inria-00075311

H. Liu, F. Hussain, C. L. Tan, and M. Dash, Discretization: An enabling technique, Data Mining and Knowledge Discovery, vol.6, issue.4, pp.393-423, 2002.
DOI : 10.1023/A:1016304305535

J. R. Quinlan, C4.5: Programs for machine learning, 1993.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

P. M. Vitanyi and M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Transactions on Information Theory, vol.46, issue.2, pp.446-464, 2000.
DOI : 10.1109/18.825807
URL : http://arxiv.org/abs/cs/9901014

D. A. Zighed, S. Rabaseda, and R. Rakotomalala, FUSINTER: A Method for Discretization of Continuous Attributes, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.06, issue.03, pp.307-326, 1998.
DOI : 10.1142/S0218488598000264

D. A. Zighed, S. Rabaseda, R. Rakotomalala, and F. Feschet, Discretization methods in supervised learning, In Encyclopedia of Computer Science and Technology, vol.40, pp.35-50, 1999.

D. A. Zighed and R. Rakotomalala, Graphes d'induction, pp.327-359, 2000.

M. Asseraf, Metric on decision trees and optimal partition problem, International Conference on Human System Learning, Proceedings of CAPS'3, 2000.

N. C. Berckman, Value grouping for binary decision trees, 1995.

C. L. Blake and C. J. Merz, UCI Repository of machine learning databases, 1998.

M. Boullé, Khiops: A Statistical Discretization Method of Continuous Attributes, Machine Learning, pp.53-69, 2004.
DOI : 10.1023/B:MACH.0000019804.29836.05

M. Boullé, A robust method for partitioning the values of categorical attributes, Revue des Nouvelles Technologies de l'Information (Extraction et gestion des connaissances), pp.173-182, 2004.

M. Boullé, A Bayesian Approach for Supervised Discretization, Data Mining V, Zanasi et al. (Eds.), pp.199-208, 2004.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Wadsworth International, California, 1984.

B. Cestnik, I. Kononenko, and I. Bratko, ASSISTANT 86: A knowledge-elicitation tool for sophisticated users, Progress in Machine Learning, 1987.

P. A. Chou, Optimal partitioning for classification and regression trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.13, issue.4, pp.340-354, 1991.
DOI : 10.1109/34.88569

T. G. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Methods, Neural Computation, vol.10, issue.7, 1998.

J. Dougherty, R. Kohavi, and M. Sahami, Supervised and Unsupervised Discretization of Continuous Features, Proceedings of the Twelfth International Conference on Machine Learning, pp.194-202, 1995.
DOI : 10.1016/B978-1-55860-377-6.50032-3

T. Fulton, S. Kasif, and S. Salzberg, Efficient Algorithms for Finding Multi-way Splits for Decision Trees, Proceedings of the Twelfth International Conference on Machine Learning, pp.244-251, 1995.
DOI : 10.1016/B978-1-55860-377-6.50038-4

D. J. Hand and K. Yu, Idiot's Bayes - not so stupid after all?, International Statistical Review, vol.69, pp.385-398, 2001.
DOI : 10.1111/j.1751-5823.2001.tb00465.x

C. N. Hsu, H. J. Huang, and T. Wong, Implications of the Dirichlet Assumption for Discretization of Continuous Variables in Naive Bayesian Classifiers, Machine Learning, pp.235-263, 2003.

G. V. Kass, An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics, vol.29, issue.2, pp.119-127, 1980.
DOI : 10.2307/2986296

R. Kass and A. Raftery, Bayes Factors, Journal of the American Statistical Association, vol.90, issue.430, pp.773-795, 1995.
DOI : 10.1080/01621459.1995.10476572

R. Kerber, Chimerge discretization of numeric attributes, Proceedings of the 10th International Conference on Artificial Intelligence, pp.123-128, 1991.

S. Kullback, Information Theory and Statistics, 1959.

Annexe B: A Bayes Optimal Approach for Value Partitioning (references)

P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, Proceedings of the 10th national conference on Artificial Intelligence, pp.223-228, 1992.

Y. Lechevallier, Recherche d'une partition optimale sous contrainte d'ordre total, 1990.
URL : https://hal.archives-ouvertes.fr/inria-00075311

D. Pyle, Data Preparation for Data Mining, 1999.

J. R. Quinlan, Induction of decision trees, Machine Learning, pp.81-106, 1986.
DOI : 10.1007/BF00116251

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

G. Ritschard, D. A. Zighed, and N. Nicoloyannis, Maximizing association by grouping rows or columns of a crosstable, Mathématiques et sciences humaines, issue.154-155, p.81, 2001.
DOI : 10.4000/msh.2841

G. Ritschard, Partition BIC optimale de l'espace des prédicteurs, pp.99-110, 2003.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

SPSS Inc., AnswerTree 3.0 User's Guide, 2001.

Y. Yang and G. Webb, On Why Discretization Works for Naive-Bayes Classifiers, Proceedings of the 16th Australian Joint Conference on Artificial Intelligence (AI), 2003.
DOI : 10.1007/978-3-540-24581-0_37

D. A. Zighed and R. Rakotomalala, Graphes d'induction, pp.327-359, 2000.

S. Bay, Multivariate discretization for set mining, Knowledge and Information Systems, vol.3, issue.4, pp.491-512, 2001.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Wadsworth International, California, 1984.

C. L. Blake and C. J. Merz, UCI repository of machine learning databases, 1998.

M. Boullé, A Bayes optimal approach for partitioning the values of categorical attributes, Journal of Machine Learning Research, vol.6, pp.1431-1452, 2005.

M. Boullé, MODL: A Bayes optimal discretization method for continuous attributes, Machine Learning, pp.131-165, 2006.

M. Boullé, Regularization and averaging of the selective naive Bayes classifier, Proceedings of the International Joint Conference on Neural Networks, pp.2989-2997, 2006.

M. Boullé, Optimization algorithms for bivariate evaluation of data grid models, Advances in Data Analysis and Classification, 2007.

P. Chapman, J. Clinton, R. Kerber et al., CRISP-DM 1.0: step-by-step data mining guide, 2000.

D. B. Carr, R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield, Scatterplot Matrix Techniques for Large N, Journal of the American Statistical Association, vol.82, issue.398, pp.424-436, 1987.
DOI : 10.2307/2289444

N. Friedman, D. Geiger, and M. Goldszmidt, Bayesian network classifiers, Machine Learning, vol.29, pp.131-163, 1997.
DOI : 10.1023/A:1007465528199

U. Fayyad and K. Irani, On the handling of continuous-valued attributes in decision tree generation, Machine Learning, pp.87-102, 1992.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror, Design and analysis of the NIPS2003 challenge, in Feature Extraction: Foundations and Applications, pp.237-263, 2006.
DOI : 10.1007/978-3-540-35488-8

T. S. Han, Multiple mutual informations and multiple interactions in frequency data, Information and Control, vol.46, pp.26-45, 1980.

P. Hansen and N. Mladenovic, Variable neighborhood search: principles and applications, European Journal of Operational Research, vol.130, issue.3, pp.449-467, 2001.

G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied Statistics, vol.29, issue.2, pp.119-127, 1980.

R. Kohavi and G. John, Wrappers for feature subset selection, Artificial Intelligence, vol.97, issue.1-2, pp.273-324, 1997.

W. Kwedlo and M. Kretowski, An evolutionary algorithm using multivariate discretization for decision rule induction, Principles of Data Mining and Knowledge Discovery, pp.392-397, 1999.

P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, Proceedings of the 10th national conference on Artificial Intelligence, pp.223-228, 1992.

S. Monti and G. F. Cooper, A latent variable model for multivariate discretization, The Seventh International Workshop on Artificial Intelligence and Statistics, 1999.

W. J. McGill, Multivariate information transmission, pp.93-111, 1954.

G. Govaert and M. Nadif, Block clustering of contingency table and mixture model.

D. Pyle, Data preparation for data mining, 1999.

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

C. E. Shannon, A mathematical theory of communication, Bell systems technical journal, 1948.

H. Steck and T. Jaakkola, Predictive discretization during model selection.

R. Vilalta and I. Rish, A decomposition of classes via clustering to explain and improve naive Bayes, Proceedings of the 14th European Conference on Machine Learning, 2003.

G. I. Webb, J. R. Boughton, and Z. Wang, Not So Naive Bayes: Aggregating One-Dependence Estimators, Machine Learning, pp.5-24, 2005.
DOI : 10.1007/s10994-005-4258-6

D. A. Zighed, G. Ritschard, W. Erray, and V. M. Scuturici, Decision trees with optimal joint partitioning, International Journal of Intelligent Systems, vol.20, issue.7, pp.693-718, 2005.
DOI : 10.1002/int.20091

Annexe C: Optimal Bivariate Evaluation for Supervised Learning (references)

G. M. Adelson-Velskii and E. M. Landis, An algorithm for the organization of information, Doklady Akademii Nauk SSSR, vol.146, issue.3, pp.263-266, 1962.

C. L. Blake and C. J. Merz, UCI repository of machine learning databases, 1998.

M. Boullé, Khiops: A Statistical Discretization Method of Continuous Attributes, Machine Learning, pp.53-69, 2004.
DOI : 10.1023/B:MACH.0000019804.29836.05

M. Boullé, A Bayes optimal approach for partitioning the values of categorical attributes, Journal of Machine Learning Research, vol.6, pp.1431-1452, 2005.

M. Boullé, MODL: A Bayes optimal discretization method for continuous attributes, Machine Learning, pp.131-165, 2006.
DOI : 10.1007/s10994-006-8364-x

M. Boullé, Optimal bivariate evaluation for supervised learning using data grid models, Advances in Data Analysis and Classification, 2007 (submitted).

J. Catlett, On changing continuous attributes into ordered discrete attributes, Proceedings of the European Working Session on Learning, pp.87-102, 1991.

J. Dougherty, R. Kohavi, and M. Sahami, Supervised and unsupervised discretization of continuous features, Proceedings of the 12th International Conference on Machine Learning, pp.194-202, 1995.

T. Elomaa and J. Rousu, Finding optimal multi-splits for numerical attributes in decision tree learning, NeuroCOLT, 1996.

U. Fayyad and K. Irani, On the handling of continuous-valued attributes in decision tree generation, Machine Learning, pp.87-102, 1992.

Annexe D: Optimization Algorithms for Bivariate Data Grid Models (references)

P. Hansen and N. Mladenovic, Variable neighborhood search: principles and applications, European Journal of Operational Research, vol.130, issue.3, pp.449-467, 2001.

R. C. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, pp.63-90, 1993.

G. V. Kass, An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics, vol.29, issue.2, pp.119-127, 1980.
DOI : 10.2307/2986296

I. Kononenko, I. Bratko, and E. Roskar, Experiments in automatic learning of medical diagnostic rules, 1984.

R. Kerber, Chimerge discretization of numeric attributes, Proceedings of the 10th International Conference on Artificial Intelligence, pp.123-128, 1991.

H. Liu, F. Hussain, C. L. Tan, and M. Dash, Discretization: An enabling technique, Data Mining and Knowledge Discovery, vol.6, issue.4, pp.393-423, 2002.

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

T. Fulton, S. Kasif, and S. Salzberg, Efficient algorithms for finding multiway splits for decision trees, Proceedings of the Twelfth International Conference on Machine Learning, pp.244-251, 1995.

D. A. Zighed, S. Rabaseda, and R. Rakotomalala, FUSINTER: A Method for Discretization of Continuous Attributes, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.06, issue.03, pp.307-326, 1998.
DOI : 10.1142/S0218488598000264

S. L. Horowitz and T. Pavlidis, Picture segmentation by a tree traversal algorithm, Journal of the ACM, vol.23, issue.2, pp.368-388, 1976.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

U. Ndili, R. Nowak, and M. Figueiredo, Coding theoretic approach to image segmentation, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205), 2001.
DOI : 10.1109/ICIP.2001.958055

V. Venkatachalam, R. D. Nowak, R. G. Baraniuk, and M. A. Figueiredo, Unsupervised SAR image segmentation using recursive partitioning, Proc. SPIE, Algorithms for Synthetic Aperture Radar Imagery VII, pp.121-129, 2000.

C. E. Shannon, A mathematical theory of communication, Bell systems technical journal, 1948.

M. Boullé, MODL: A Bayes optimal discretization method for continuous attributes, Machine Learning, pp.131-165, 2006.
DOI : 10.1007/s10994-006-8364-x

G. M. Adelson-Velskii and E. M. Landis, An algorithm for the organization of information, Doklady Akademii Nauk SSSR, vol.146, issue.3, pp.263-266, 1962.

P. Adriaans and P. Vitányi, The power and perils of MDL, ArXiv Computer Science e-prints, 2006.

R. Agrawal et al., Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD conference on management of data, pp.207-216, 1993.

R. Agrawal et al., Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, pp.94-105, 1998.

S. D. Bay and M. J. Pazzani, Detecting change in categorical data, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '99, pp.302-306, 1999.
DOI : 10.1145/312129.312263

S. Bay, Multivariate Discretization for Set Mining, Knowledge and Information Systems, vol.3, issue.4, pp.491-512, 2001.
DOI : 10.1007/PL00011680

R. Bekkerman et al., Multi-way distributional clustering via pairwise interactions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.41-48, 2005.
DOI : 10.1145/1102351.1102357

N. C. Berckman, Value grouping for binary decision trees, 1995.

J. Berger, The case for objective Bayesian analysis, Bayesian Analysis, vol.1, issue.3, pp.385-402, 2006.

J. M. Bernardo and A. F. M. Smith, Bayesian theory, 2000.
DOI : 10.1002/9780470316870

L. Birgé and Y. Rozenholc, How many bins should be put in a regular histogram, Laboratoire de Probabilités et Modèles Aléatoires, 2002.
DOI : 10.1051/ps:2006001

C. L. Blake and C. J. Merz, UCI repository of machine learning databases, 1996.

H. H. Bock, Simultaneous clustering of objects and variables, Analyse des Données et Informatique, pp.187-203, 1979.

M. Boullé and C. Hue, Optimal Bayesian 2D-Discretization for Variable Ranking in Regression, Ninth international conference on discovery science, pp.53-64, 2006.
DOI : 10.1007/11893318_9

M. Boullé and A. Larrue, Segmentation d'image couleur par grille de rectangles optimale selon une approche Bayesienne, 2007.

M. Boullé, Khiops: A Discretization Method of Continuous Attributes with Guaranteed Resistance to Noise, Proceedings of the Third International Conference on Machine Learning and Data Mining in Pattern Recognition, volume 2734 of LNAI, pp.50-64, 2003.
DOI : 10.1007/3-540-45065-3_5

M. Boullé, Khiops: A Statistical Discretization Method of Continuous Attributes, Machine Learning, pp.53-69, 2004.
DOI : 10.1023/B:MACH.0000019804.29836.05

M. Boullé, MODL : une méthode quasi-optimale de discrétisation supervisée, 2004.

M. Boullé, A Bayes optimal approach for partitioning the values of categorical attributes, Journal of Machine Learning Research, vol.6, pp.1431-1452, 2005.

M. Boullé, Optimal bin number for equal frequency discretizations in supervized learning, Journal of intelligent data analysis, vol.9, issue.2, pp.175-188, 2005.

M. Boullé, MODL: A Bayes optimal discretization method for continuous attributes, Machine Learning, pp.131-165, 2006.
DOI : 10.1007/s10994-006-8364-x

M. Boullé, Compression-based averaging of selective naive Bayes classifiers, Journal of Machine Learning Research, vol.8, pp.1659-1685, 2007.

M. Boullé, Optimal bivariate evaluation for supervised learning using data grid models, Advances in Data Analysis and Classification, 2007.

M. Boullé, Optimization algorithms for bivariate evaluation of data grid models, Advances in Data Analysis and Classification, 2007.

M. Boullé, Report on preliminary experiments with data grid models in the agnostic learning vs. prior knowledge challenge, Proceedings of the International Joint Conference on Neural Networks, 2007.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Wadsworth International, California, 1984.

L. Breiman, Bagging predictors, Machine Learning, pp.123-140, 1996.

G. Castellan, Modified Akaike's criterion for histogram density estimation, 1999.

J. Catlett, On changing continuous attributes into ordered discrete attributes, Proceedings of the European Working Session on Learning, pp.87-102, 1991.

G. C. Cawley et al., Predictive uncertainty in environmental modelling, Neural Networks, 2007.
DOI : 10.1016/j.neunet.2007.04.024

B. Cestnik, I. Kononenko, and I. Bratko, ASSISTANT 86: A knowledge-elicitation tool for sophisticated users, Proceedings of the 2nd European Working Session on Learning, 1987.

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.324.3950

S. Chao and Y. Li, Multivariate Interdependent Discretization for Continuous Attribute, Third International Conference on Information Technology and Applications (ICITA'05), pp.167-172, 2005.
DOI : 10.1109/ICITA.2005.188

P. Chapman, J. Clinton, R. Kerber et al., CRISP-DM 1.0: step-by-step data mining guide, 2000.

P. Chaudhuri and W. Y. Loh, Nonparametric estimation of conditional quantiles using quantile regression trees, Bernoulli, pp.561-576, 2002.

P. Chaudhuri et al., Piecewise-polynomial regression trees, Statistica Sinica, vol.4, pp.143-167, 1994.

B. S. Chlebus and S. H. Nguyen, On finding optimal discretizations for two attributes, Rough Sets and Current Trends in Computing, pp.537-544, 1998.

P. A. Chou, Optimal partitioning for classification and regression trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.13, issue.4, pp.340-354, 1991.
DOI : 10.1109/34.88569

W. Chu and Z. Ghahramani, Gaussian processes for ordinal regression, Journal of Machine Learning Research, vol.6, pp.1019-1041, 2005.

W. Chu and S. S. Keerthi, New approaches to support vector ordinal regression, ICML '05: Proceedings of the 22nd international conference on Machine Learning, 2005.

W. G. Cochran, Some methods for strengthening the common χ2 tests, Biometrics, pp.417-451, 1954.
DOI : 10.2307/3001616

J. Connor-Linton, Chi square tutorial, 2003.

K. Crammer and Y. Singer, Pranking with ranking, Proceedings of the Fourteenth Annual Conference on Neural Information Processing Systems (NIPS), 2001.

I. S. Dhillon et al., Information-theoretic co-clustering, Proceedings of The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.89-98, 2003.

B. Dom, MDL estimation for small sample sizes and its application to segmenting binary strings, CVPR '97: Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition, 1997.

J. Dougherty, R. Kohavi, and M. Sahami, Supervised and unsupervised discretization of continuous features, Proceedings of the 12th International Conference on Machine Learning, pp.194-202, 1995.

R. El-Yaniv and O. Souroujon, Iterative double clustering for unsupervised and semi-supervised learning, Proceedings of ECML-01, 12th European Conference on Machine Learning, 2001.

T. Elomaa et al., Approximation algorithms for minimizing empirical error by axis-parallel hyperplanes, Machine Learning: ECML 2005, 2005.

J. Fan, Q. Yao, and H. Tong, Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems, Biometrika, 1996.

T. Fawcett, ROC graphs: Notes and practical considerations for researchers, 2003.

U. Fayyad and K. Irani, On the handling of continuous-valued attributes in decision tree generation, Machine Learning, pp.87-102, 1992.

U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From data mining to knowledge discovery: An overview, in Advances in Knowledge Discovery and Data Mining, 1996.

S. Ferrandiz and M. Boullé, Supervised evaluation of Voronoi partitions, Journal of intelligent data analysis, pp.269-284, 2006.

S. Ferrandiz, Apprentissage supervisé à partir de données séquentielles, thèse de doctorat, 2006.

W. D. Fischer, On grouping for maximum homogeneity, Journal of the American Statistical Association, pp.789-798, 1958.

R. A. Fisher, The use of multiple measurements in taxonomic problems, 1936.

E. Frank, L. Trigg, G. Holmes, and I. Witten, Naive Bayes for regression, Machine Learning, 2000.

S. W. Frey, D. J. Frey, and . Slate, Letter recognition using Holland-style adaptive classifiers, Machine Learning, vol.2, issue.2, pp.161-182, 1991.
DOI : 10.1007/BF00114162

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.466.6265

N. Friedman and M. Goldszmidt, Discretizing continuous attributes while learning Bayesian networks, International Conference on Machine Learning, pp.157-165, 1996.

T. Fulton, S. Kasif, and S. Salzberg, Efficient algorithms for finding multi-way splits for decision trees, Proceedings of the Twelfth International Conference on Machine Learning, pp.244-251, 1995.

M. Goldstein, Subjective Bayesian analysis: principles and practice, Bayesian Analysis, vol.1, issue.3, pp.403-420, 2006.

G. Govaert and M. Nadif, Clustering with block mixture models, Pattern Recognition, 2003.

G. Govaert and M. Nadif, An EM algorithm for the block mixture model, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.4, pp.643-647, 2005.
DOI : 10.1109/TPAMI.2005.69

P. Grünwald, A tutorial introduction to the minimum description length principle, in Advances in Minimum Description Length: Theory and Applications, 2005.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh (Eds.), Feature Extraction: Foundations And Applications, 2006.

I. Guyon et al., Performance prediction challenge, International Joint Conference on Neural Networks, 2006.

I. Guyon et al., Agnostic learning vs. prior knowledge challenge, International Joint Conference on Neural Networks, 2007.

I. Guyon, Design of experiments of the NIPS 2003 variable selection benchmark, 2003.

D. J. Hand and K. Yu, Idiot's Bayes - not so stupid after all?, International Statistical Review, vol.69, pp.385-398, 2001.

M. Hansen, N. Hansen, and . Mladenovic, Variable neighborhood search: Principles and applications, European Journal of Operational Research, vol.130, issue.3, pp.449-467, 2001.
DOI : 10.1016/S0377-2217(00)00100-4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.1769

J. A. Hartigan, Direct clustering of a data matrix, Journal of the American Statistical Association, vol.67, pp.123-129, 1972.

R. Herbrich, T. Graepel, and K. Obermayer, Large margin rank boundaries for ordinal regression, in Advances in Large Margin Classifiers, 2000.

T. P. Hettmansperger and J. W. McKean, Robust Nonparametric Statistical Methods, 1998.

J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, Bayesian model averaging: A tutorial, Statistical Science, 1999.

R. C. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, pp.63-90, 1993.

C. Hue and M. Boullé, A new probabilistic approach in rank regression with optimal Bayesian partitioning, accepted for publication.

C. Hue and M. Boullé, Une approche non paramétrique bayesienne pour l'estimation de densité conditionnelle sur les rangs, Extraction et gestion des connaissances.

E. T. Jaynes, Probability Theory: The Logic of Science, 2003.

T. Joachims, A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization, Proceedings of 14th International Conference on Machine Learning, 1997.

G. V. Kass, An exploratory technique for investigating large quantities of categorical data, Applied Statistics, vol.29, issue.2, pp.119-127, 1980.

R. Kerber, Chimerge discretization of numeric attributes, Proceedings of the 10th International Conference on Artificial Intelligence, pp.123-128, 1991.

K. Kira and L. A. Rendell, A practical approach to feature selection, ML92: Proceedings of the ninth international workshop on Machine learning, pp.249-256, 1992.

D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching.

J. Kohavi, G. Kohavi, and . John, Wrappers for feature subset selection, Artificial Intelligence, vol.97, issue.1-2, pp.273-324, 1997.
DOI : 10.1016/S0004-3702(97)00043-X

R. Kohavi and M. Sahami, Error-based and entropy-based discretization of continuous features, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp.114-119, 1996.

R. Kohavi, The power of decision tables, Proceedings of the European Conference on Machine Learning, pp.174-189, 1995.
DOI : 10.1007/3-540-59286-5_57

I. Kononenko, I. Bratko, and E. Roskar, Experiments in automatic learning of medical diagnostic rules, 1984.

L. A. Kurgan and K. J. Cios, CAIM discretization algorithm, IEEE Transactions on Knowledge and Data Engineering, vol.16, issue.2, pp.145-153, 2004.
DOI : 10.1109/TKDE.2004.1269594

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.9104

W. Kwedlo and M. Kretowski, An Evolutionary Algorithm Using Multivariate Discretization for Decision Rule Induction, Principles of Data Mining and Knowledge Discovery, pp.392-397, 1999.
DOI : 10.1007/978-3-540-48247-5_48

P. Langley and S. Sage, Induction of Selective Bayesian Classifiers, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pp.399-406, 1994.
DOI : 10.1016/B978-1-55860-332-5.50055-9

P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, Proceedings of the 10th national conference on Artificial Intelligence, pp.223-228, 1992.

Y. Lechevallier, Recherche d'une partition optimale sous contrainte d'ordre total, 1990.
URL : https://hal.archives-ouvertes.fr/inria-00075311

Y. LeCun et al., Comparison of learning algorithms for handwritten digit recognition, International Conference on Artificial Neural Networks, pp.53-60, 1995.

M. Li and P. M. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 1997.

H. Liu, F. Hussain, C. L. Tan, and M. Dash, Discretization: An enabling technique, Data Mining and Knowledge Discovery, vol.6, issue.4, pp.393-423, 2002.

W. Maass, Efficient agnostic PAC-learning with simple hypotheses, Proceedings of the seventh annual conference on Computational learning theory, COLT '94, pp.67-75, 1994.
DOI : 10.1145/180139.181016

M. Refaat, Data Preparation for Data Mining Using SAS, 2006.

P. McCullagh, Regression models for ordinal data (with discussion), Journal of the Royal Statistical Society B, vol.42, pp.109-142, 1980.

N. Meinshausen, Quantile regression forests, Journal of Machine Learning Research, vol.7, pp.983-999, 2006.

S. Monti and G. F. Cooper, A latent variable model for multivariate discretization, The Seventh International Workshop on Artificial Intelligence and Statistics, 1999.

F. Muhlenbach and R. Rakotomalala, Multivariate supervised discretization, a neighborhood graph approach, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), pp.249-254, 2002.

P. Munteanu, Extraction de connaissances dans les bases de données Parole : apport de l'apprentissage symbolique, 1996.

G. Govaert and M. Nadif, Block clustering of contingency table and mixture model, Intelligent Data Analysis, pp.249-259.

H. Nagesh, S. Goil, and A. Choudhary, A scalable parallel subspace clustering algorithm for massive data sets, International Conference on Parallel Processing, 2000.

R. Neal, Probabilistic inference using Markov chain Monte Carlo methods, 1993.

B. Pfahringer, Compression-based discretization of continuous attributes, International Conference on Machine Learning, pp.456-463, 1995.

F. Provost, T. Fawcett, and R. Kohavi, The case against accuracy estimation for comparing induction algorithms, Proceedings of the Fifteenth International Conference on Machine Learning, 1998.

D. Pyle, Data preparation for data mining, 1999.

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

A. Raftery and Y. Zheng, Discussion: Performance of Bayesian model averaging, Journal of the American Statistical Association, 2003.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, pp.465-471, 1978.

G. Ritschard, D. A. Zighed, and N. Nicoloyannis, Maximisation de l'association par regroupement de lignes ou de colonnes d'un tableau croisé, 2001.

G. Ritschard, Performance d'une heuristique d'agrégation optimale bidimensionnelle, Extraction et gestion des connaissances, 2003.

G. Ritschard, Partition BIC optimale de l'espace des prédicteurs, Revue des Nouvelles Technologies de l'Information, 2003.

C. Robert, Le choix bayésien : Principes et pratique, 2006.

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.

D. W. Scott, Averaged shifted histograms: Effective nonparametric density estimators in several dimensions, The Annals of Statistics, 1985.

C. E. Shannon, A mathematical theory of communication, Bell systems technical journal, 1948.

A. Shashua and A. Levin, Ranking with large margin principles: two approaches, Proceedings of the Fifteenth Annual Conference on Neural Information Processing Systems (NIPS), 2002.

N. Slonim and N. Tishby, Document clustering using word clusters via the information bottleneck method, Research and Development in Information Retrieval, pp.208-215, 2000.

H. A. Sturges, The choice of a class interval, Journal of the American Statistical Association, vol.21, pp.65-66, 1926.

I. Takeuchi et al., Nonparametric quantile estimation, Journal of Machine Learning Research, 2006.

V. Vapnik, The nature of statistical learning theory, 1995.

P. M. Vitányi and M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Transactions on Information Theory, vol.46, issue.2, pp.446-464, 2000.
DOI : 10.1109/18.825807

I. H. Witten and E. Frank, Data Mining, 2000.

Y. Yang and G. Webb, A comparative study of discretization methods for naive-Bayes classifiers, Proceedings of the Pacific Rim Knowledge Acquisition Workshop, pp.159-173, 2002.

D. A. Zighed and R. Rakotomalala, Sipina-w(c) for windows: User's guide, 1996.

D. A. Zighed and R. Rakotomalala, Graphes d'induction, 2000.

D. A. Zighed, S. Rabaseda, and R. Rakotomalala, FUSINTER: a method for discretization of continuous attributes for supervised learning, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol.6, issue.3, pp.307-326, 1998.

D. A. Zighed, G. Ritschard, W. Erray, and V. M. Scuturici, Decision trees with optimal joint partitioning, International Journal of Intelligent Systems, vol.20, issue.7, pp.693-718, 2005.