H. Akaike, Information theory and an extension of the maximum likelihood principle, Proc. 2nd International Symposium on Information Theory, 1973.

Y. Altun, M. Johnson, and T. Hofmann, Investigating loss functions and optimization methods for discriminative learning of label sequences, Proceedings of the 2003 conference on Empirical methods in natural language processing, 2003.
DOI : 10.3115/1119355.1119374

Y. Altun, D. McAllester, and M. Belkin, Maximum margin semi-supervised learning for structured variables, Advances in Neural Information Processing Systems (NIPS), 2005.

G. Andrew and J. Gao, Scalable training of l1-regularized log-linear models, Proceedings of the 24th international conference on Machine learning (ICML), pp.33-40, 2007.

F. Bach, Active learning for misspecified generalized linear models, Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS), 2006.

M. Balcan, A. Blum, and S. Vempala, Kernels as features: On kernels, margins, and low-dimensional mappings, Machine Learning, vol.65, issue.1, pp.79-94, 2006.
DOI : 10.1007/s10994-006-7550-1

L. E. Baum, T. Petrie, G. Soules, and N. Weiss, A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, The Annals of Mathematical Statistics, vol.41, issue.1, pp.164-171, 1970.
DOI : 10.1214/aoms/1177697196

Y. Benajiba and P. Rosso, Arabic named entity recognition using conditional random fields, Arabic Language and local languages processing: Status Updates and Prospects, 6th Int. Conf. on Language Resources and Evaluation, 2008.

O. Bender, F. J. Och, and H. Ney, Maximum entropy models for named entity recognition, Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp.148-151, 2003.
DOI : 10.3115/1119176.1119196

A. L. Berger, V. J. Della Pietra, and S. A. Della Pietra, A maximum entropy approach to natural language processing, Computational Linguistics, vol.22, issue.1, pp.39-71, 1996.

J. Besag, Statistical analysis of non-lattice data, The Statistician, pp.179-195, 1975.

S. Bickel, M. Brückner, and T. Scheffer, Discriminative learning for differing training and test distributions, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273507

J. A. Bilmes, A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, 1998.

A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, pp.92-100, 1998.
DOI : 10.1145/279943.279962

P. Blunsom, Maximum entropy markov models for semantic role labelling, Proceedings of the Australasian Language Technology Workshop, 2004.

P. Blunsom and T. Cohn, Discriminative word alignment with conditional random fields, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL, ACL '06, pp.65-72, 2006.
DOI : 10.3115/1220175.1220184

L. Bottou, Une Approche théorique de l'Apprentissage Connexionniste: Applications à la Reconnaissance de la Parole, 1991.

L. Bottou, Stochastic Learning, Advanced Lectures on Machine Learning, pp.146-168, 2004.
DOI : 10.1007/978-1-4757-2440-0

L. Bottou, Stochastic gradient descent (SGD) implementation, 2007.

G. Bouchard and B. Triggs, The trade-off between generative and discriminative classifiers, IASC 16th International Symposium on Computational Statistics, pp.721-728, 2004.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

U. Brefeld and T. Scheffer, Semi-supervised learning for structured output variables, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.145-152, 2006.
DOI : 10.1145/1143844.1143863

O. Cappé and E. Moulines, Recursive computation of the score and observed information matrix in hidden Markov models, IEEE/SP 13th Workshop on Statistical Signal Processing, 2005.
DOI : 10.1109/SSP.2005.1628685

O. Cappé, E. Moulines, and T. Rydén, Inference in Hidden Markov Models, 2005.

X. Carreras, L. Màrquez, and L. Padró, Named Entity Extraction using AdaBoost, Proceedings of the 6th conference on Natural language learning, COLING-02, pp.167-170, 2002.
DOI : 10.3115/1118853.1118857

X. Carreras and L. Màrquez, Phrase recognition by filtering and ranking with perceptrons, Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2003.
DOI : 10.1075/cilt.260.22car

V. Castelli and T. Cover, The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter, IEEE Transactions on Information Theory, vol.42, issue.6, pp.2102-2117, 1996.
DOI : 10.1109/18.556600

O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, 2006.
DOI : 10.7551/mitpress/9780262033589.001.0001

O. Chapelle and A. Zien, Semi-supervised classification by low density separation, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.

E. Charniak and M. Johnson, Coarse-to-fine n-best parsing and MaxEnt discriminative reranking, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pp.173-180, 2005.
DOI : 10.3115/1219840.1219862

S. F. Chen and R. Rosenfeld, A survey of smoothing techniques for maximum entropy models, IEEE transactions on Speech and Audio Processing, pp.37-50, 2000.

X. Chen, S. Chen, X. , and K. , K-similar conditional random fields for semisupervised sequence labeling, Advanced Language Processing and Web Information Technology, pp.21-26, 2008.

I. Cohen, F. G. Cozman, N. Sebe, M. C. Cirelo et al., Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.12, pp.1553-1567, 2004.
DOI : 10.1109/TPAMI.2004.127

T. Cohn, Efficient Inference in Large Conditional Random Fields, Proceedings of the 17th European Conference on Machine Learning, pp.606-613, 2006.
DOI : 10.1007/11871842_58

T. Cohn and P. Blunsom, Semantic role labelling with tree conditional random fields, Proceedings of the Ninth Conference on Computational Natural Language Learning, CoNLL '05, pp.169-172, 2005.
DOI : 10.3115/1706543.1706573

T. Cohn, A. Smith, and M. Osborne, Scaling conditional random fields using error-correcting codes, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL '05, pp.10-17, 2005.
DOI : 10.3115/1219840.1219842

M. Collins and N. Duffy, New ranking algorithms for parsing and tagging, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.489-496, 2002.
DOI : 10.3115/1073083.1073128

M. Collins, A. Globerson, T. Koo, X. Carreras, and P. L. Bartlett, Exponentiated gradient algorithms for conditional random fields and max-margin markov networks, J. Mach. Learn. Res, vol.9, pp.1775-1822, 2008.

A. Corduneanu and T. Jaakkola, On information regularization, Proceedings of the 19th conference on Uncertainty in Artificial Intelligence (UAI), 2003.

C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh, Sample Selection Bias Correction Theory, Proceedings of The 19th International Conference on Algorithmic Learning Theory, 2008.
DOI : 10.1007/978-3-540-87987-9_8

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.
DOI : 10.1007/BF00994018

A. Culotta, D. Kulp, and A. McCallum, Gene prediction with conditional random fields, 2005.

A. Culotta, A. McCallum, and J. Betz, Integrating probabilistic extraction models and data mining to discover relations and patterns in text, Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp.296-303, 2006.
DOI : 10.3115/1220835.1220873

J. N. Darroch and D. Ratcliff, Generalized Iterative Scaling for Log-Linear Models, The Annals of Mathematical Statistics, vol.43, issue.5, pp.1470-1480, 1972.
DOI : 10.1214/aoms/1177692379

H. Daumé III, Semi-supervised or semi-unsupervised?, NAACL Workshop on Semi-supervised Learning for NLP, 2009.

J. Davis and M. Goadrich, The relationship between Precision-Recall and ROC curves, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143874

S. Della Pietra, V. J. Della Pietra, and J. D. Lafferty, Inducing features of random fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.19, issue.4, pp.380-393, 1997.
DOI : 10.1109/34.588021

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B, vol.39, issue.1, pp.1-38, 1977.

F. Denis, R. Gilleron, A. Laurent, and M. Tommasi, Co-training from positive and unlabeled examples, Proceedings of the ICML Workshop: the Continuum from Labeled Data to Unlabeled Data in Machine Learning and Data Mining, pp.80-87, 2003.

S. J. DeRose, Grammatical category disambiguation by statistical optimization, Computational Linguistics, vol.14, issue.1, pp.31-39, 1988.

T. G. Dietterich, A. Ashenfelter, and Y. Bulatov, Training conditional random fields via gradient tree boosting, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015428

M. Dudík, S. J. Phillips, and R. E. Schapire, Performance Guarantees for Regularized Maximum Entropy Density Estimation, Proceedings of the 17th annual Conference on Learning Theory, pp.472-486, 2004.
DOI : 10.1007/978-3-540-27819-1_33

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Annals of Statistics, vol.32, issue.2, pp.407-499, 2004.

D. Elworthy, Does Baum-Welch re-estimation help taggers?, Proceedings of the fourth conference on Applied natural language processing, 1994.
DOI : 10.3115/974358.974371

J. R. Finkel, A. Kleeman, and C. D. Manning, Efficient, feature-based, conditional random field parsing, Proceedings of ACL-08: HLT, pp.959-967, 2008.

R. Florian, A. Ittycheriah, H. Jing, and T. Zhang, Named entity recognition through classifier combination, Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp.168-171, 2003.
DOI : 10.3115/1119176.1119201

Y. Freund and R. E. Schapire, Experiments with a new boosting algorithm, Proceedings of the thirteenth International Conference on Machine Learning, pp.148-156, 1996.

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, Pathwise coordinate optimization, The Annals of Applied Statistics, vol.1, issue.2, pp.302-332, 2007.
DOI : 10.1214/07-AOAS131

URL : http://arxiv.org/abs/0708.1485

J. Friedman, T. Hastie, and R. Tibshirani, Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, vol.33, issue.1, 2008.
DOI : 10.18637/jss.v033.i01

M. Galley, A skip-chain conditional random field for ranking meeting utterances by importance, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, 2006.
DOI : 10.3115/1610075.1610126

V. Goel and W. J. Byrne, Minimum Bayes-risk automatic speech recognition, Computer Speech & Language, vol.14, issue.2, pp.115-135, 2000.
DOI : 10.1006/csla.2000.0138

Y. Grandvalet and Y. Bengio, Semi-supervised learning by entropy minimization, Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS), pp.529-536, 2004.

S. S. Gross, O. Russakovsky, C. B. Do, and S. Batzoglou, Training conditional random fields for maximum labelwise accuracy, Advances in Neural Information Processing Systems, 2006.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2001.

M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, vol.49, issue.6, pp.409-437, 1952.
DOI : 10.6028/jres.049.044

A. Holub and P. Perona, A Discriminative Framework for Modelling Object Classes, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.664-671, 2005.
DOI : 10.1109/CVPR.2005.25

C. Ireland and S. Kullback, Contingency tables with given marginals, Biometrika, vol.55, issue.1, 1968.
DOI : 10.1093/biomet/55.1.179

T. Jebara, Machine Learning: Discriminative And Generative, 2004.
DOI : 10.1007/978-1-4419-9011-2

F. Jiao, S. Wang, C. H. Lee, R. Greiner, and D. Schuurmans, Semi-supervised conditional random fields for improved sequence segmentation and labeling, Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL, ACL '06, 2006.
DOI : 10.3115/1220175.1220202

T. Joachims, Transductive inference for text classification using support vector machines, Proceedings of the International Conference on Machine Learning (ICML), pp.200-209, 1999.

M. I. Jordan, Learning in graphical models, 1999.
DOI : 10.1007/978-94-011-5014-9

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An Introduction to Variational Methods for Graphical Models, Machine Learning, p.183, 1999.
DOI : 10.1007/978-94-011-5014-9_5

F. Jousse, R. Gilleron, I. Tellier, and M. Tommasi, Champs conditionnels aléatoires pour l'annotation d'arbres, 8ème Conférence francophone sur l'Apprentissage automatique (CAp'2006), pp.171-186, 2006.

F. Jousse, R. Gilleron, I. Tellier, and M. Tommasi, Conditional random fields for xml trees, Proceedings of the ECML Workshop on Mining and Learning in Graphs, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00118761

T. Kanamori and H. Shimodaira, Active learning algorithm using the maximum weighted log-likelihood estimator, Journal of Statistical Planning and Inference, vol.116, issue.1, pp.149-162, 2003.
DOI : 10.1016/S0378-3758(02)00234-3

J. Kazama and J. Tsujii, Evaluation and extension of maximum entropy models with inequality constraints, Proceedings of the 2003 conference on Empirical methods in natural language processing, pp.137-144, 2003.
DOI : 10.3115/1119355.1119373

Y. Kim, Application of maximum entropy markov models on the protein secondary structure prediction, 2001.

R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications, 1980.
DOI : 10.1090/conm/001

D. Klein and C. D. Manning, Conditional structure versus conditional estimation in NLP models, Proceedings of the ACL-02 conference on Empirical methods in natural language processing, EMNLP '02, pp.9-16, 2002.
DOI : 10.3115/1118693.1118695

D. Klein and C. D. Manning, Corpus-based induction of syntactic structure, Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, p.478, 2004.
DOI : 10.3115/1218955.1219016

T. Koo, A. Globerson, X. Carreras, and M. Collins, Structured prediction models via the matrix-tree theorem, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp.141-150, 2007.

T. Koski, Hidden Markov models for bioinformatics, 2001.
DOI : 10.1007/978-94-010-0612-5

B. Krishnapuram, L. Carin, M. A. Figueiredo, and A. J. Hartemink, Sparse multinomial logistic regression: fast algorithms and generalization bounds, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.6, pp.957-968, 2005.
DOI : 10.1109/TPAMI.2005.127

T. Kudo, CRF++: Yet another CRF toolkit, 2005.

T. Kudo, K. Yamamoto, and Y. Matsumoto, Applying conditional random fields to Japanese morphological analysis, Proceedings of EMNLP 2004, pp.230-237, 2004.

J. Kupiec, Robust part-of-speech tagging using a hidden Markov model, Computer Speech & Language, vol.6, issue.3, pp.225-242, 1992.
DOI : 10.1016/0885-2308(92)90019-Z

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, Proceedings of the International Conference on Machine Learning (ICML), pp.282-289, 2001.

J. A. Lasserre, C. M. Bishop, and T. P. Minka, Principled Hybrids of Generative and Discriminative Models, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1 (CVPR'06), pp.87-94, 2006.
DOI : 10.1109/CVPR.2006.227

S. S. Lauritzen, Graphical Models, 1996.

C. Lee, M. Schmidt, A. Murtha, A. Bistritz, J. Sander et al., Segmenting Brain Tumors with Conditional Random Fields and Support Vector Machines, International Conference on Computer Vision workshop (ICCV CVBIA), 2005.
DOI : 10.1007/11569541_47

S. Lee, H. Lee, P. Abbeel, and A. Ng, Efficient l1 regularized logistic regression, Proceedings of the Twenty-first National Conference on Artificial Intelligence (AAAI- 06), pp.1-9, 2006.

D. D. Lewis, Naive (Bayes) at forty: The independence assumption in information retrieval, ECML, 1998.
DOI : 10.1007/BFb0026666

P. Liang and M. Jordan, An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.584-591, 2008.
DOI : 10.1145/1390156.1390230

Y. Liu, J. Carbonell, P. Weigele, and V. Gopalakrishnan, Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition, Proc. of the 9th Ann. Intl. Conf. on Comput, pp.14-18, 2005.
DOI : 10.1007/11415770_31

R. Malouf, A comparison of algorithms for maximum entropy parameter estimation, Proceedings of the 6th conference on Natural language learning, COLING-02, 2002.
DOI : 10.3115/1118853.1118871

G. Mann and A. McCallum, Efficient computation of entropy gradient for semisupervised conditional random fields, NAACL/HLT, pp.109-112, 2007.

G. Mann and A. McCallum, Simple, robust, scalable semi-supervised learning via expectation regularization, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.593-600, 2007.
DOI : 10.1145/1273496.1273571

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.3681

G. Mann and A. McCallum, Generalized expectation criteria for semi-supervised learning of conditional random fields, Proceedings of Association of Computational Linguistics, 2008.

Y. Mao and G. Lebanon, Isotonic conditional random fields and local sentiment flow, Advances in Neural Information Processing Systems 19, pp.961-968, 2007.

A. McCallum, Efficiently inducing features of conditional random fields, Proceedings of the conference Uncertainty in Artificial Intelligence (UAI), 2003.

A. McCallum, D. Freitag, and F. Pereira, Maximum entropy Markov models for information extraction and segmentation, Proc. 17th International Conf. on Machine Learning (ICML), 2000.

A. McCallum and W. Li, Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp.188-191, 2003.
DOI : 10.3115/1119176.1119206

L. Meier, S. van de Geer, and P. Bühlmann, The group lasso for logistic regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.70, issue.1, pp.53-71, 2008.
DOI : 10.1111/j.1467-9868.2007.00627.x

B. Mérialdo, Tagging English text with a probabilistic model, Computational Linguistics, 1993.

C. Merz, D. St. Clair, and W. Bond, SeMi-supervised adaptive resonance theory (SMART2), Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks, pp.851-856, 1992.
DOI : 10.1109/IJCNN.1992.227046

T. Minka, Discriminative models, not discriminative training, 2005.

A. Ng and M. Jordan, On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, NIPS, 2002.

K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning, pp.103-134, 2000.

J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, vol.35, issue.151, pp.773-782, 1980.
DOI : 10.1090/S0025-5718-1980-0572855-7

J. Nocedal and S. Wright, Numerical Optimization, 2006.
DOI : 10.1007/b98874

F. J. Och and H. Ney, A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics, vol.29, issue.1, pp.19-51, 2003.
DOI : 10.1109/89.817451

N. Okazaki, CRFsuite: A fast implementation of conditional random fields (CRFs), 2007.

T. J. O'Neill, Normal Discrimination with Unclassified Observations, Journal of the American Statistical Association, vol.73, issue.364, pp.821-826, 1978.
DOI : 10.1080/01621459.1978.10480106

C. Pal, C. Sutton, and A. McCallum, Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training Of Conditional Random Fields, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1661342

F. Peng, F. Feng, and A. McCallum, Chinese segmentation and new word detection using conditional random fields, Proceedings of the 20th international conference on Computational Linguistics, COLING '04, 2004.
DOI : 10.3115/1220355.1220436

S. Perkins, K. Lacker, and J. Theiler, Grafting: Fast, incremental feature selection by gradient descent in function space, Journal of Machine Learning Research (JMLR), vol.3, pp.1333-1356, 2003.

W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C, 1992.

Y. A. Qi, M. Szummer, and T. P. Minka, Bayesian conditional random fields, Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2005.

Y. A. Qi, M. Szummer, and T. P. Minka, Diagram structure recognition by bayesian conditional random fields, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

X. Qian, X. Jiang, Q. Zhang, X. Huang, and L. Wu, Sparse higher order conditional random fields for improved sequence labeling, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.849-856, 2009.
DOI : 10.1145/1553374.1553483

A. Quattoni, M. Collins, and T. Darrell, Conditional random fields for object recognition, Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS), 2004.

A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell, Hidden Conditional Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.10, pp.1848-1852, 2007.
DOI : 10.1109/TPAMI.2007.1124

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, pp.257-286, 1989.

F. Ramos, D. Fox, and H. Durrant-Whyte, CRF-Matching: Conditional Random Fields for Feature-Based Scan Matching, Robotics: Science and Systems III, 2007.
DOI : 10.15607/RSS.2007.III.026

A. Ratnaparkhi, Maximum Entropy Models for Natural Language Ambiguity Resolution, 1998.

S. Riezler and A. Vasserman, Incremental feature selection and l1 regularization for relaxed maximum-entropy modeling, Proceedings of EMNLP 2004, pp.174-181, 2004.

P. Rigollet, Generalization error bounds in semi-supervised classification under the cluster assumption, J. Mach. Learn. Res, vol.8, pp.1369-1392, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00022528

L. Rigouste, O. Cappé, and F. Yvon, Inference and evaluation of the multinomial mixture model for text clustering, Information Processing & Management, vol.43, issue.5, pp.1260-1280, 2007.
DOI : 10.1016/j.ipm.2006.11.001

URL : https://hal.archives-ouvertes.fr/hal-00080133

R. Rosenfeld, A maximum entropy approach to adaptive statistical language modelling, Computer Speech & Language, vol.10, issue.3, pp.187-228, 1996.
DOI : 10.1006/csla.1996.0011

A. Rozenknop, Modèles syntaxiques probabilistes non-génératifs, 2002.

Y. D. Rubinstein and T. Hastie, Discriminative vs informative learning, KDD, pp.49-53, 1997.

S. Sarawagi and W. W. Cohen, Semi-markov conditional random fields for information extraction, Advances in Neural Information Processing Systems (NIPS*18), 2004.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

H. Scudder, Probability of error of some adaptive pattern-recognition machines, IEEE Transactions on Information Theory, vol.11, issue.3, pp.363-371, 1965.
DOI : 10.1109/TIT.1965.1053799

M. Seeger, Learning with labeled and unlabeled data, 2002.

T. J. Sejnowski and C. R. Rosenberg, Parallel networks that learn to pronounce english text, Complex Systems, vol.1, 1987.

F. Sha and F. Pereira, Shallow parsing with conditional random fields, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, pp.213-220, 2003.
DOI : 10.3115/1073445.1073473

F. Sha and L. K. Saul, Analysis and extension of spectral methods for nonlinear dimensionality reduction, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.785-792, 2005.
DOI : 10.1145/1102351.1102450

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of Statistical Planning and Inference, vol.90, issue.2, pp.227-244, 2000.
DOI : 10.1016/S0378-3758(00)00115-4

S. M. Siddiqi and A. W. Moore, Fast inference and learning in large-state-space HMMs, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.800-807, 2005.
DOI : 10.1145/1102351.1102452

A. Smith and M. Osborne, Using gazetteers in discriminative information extraction, Proceedings of the Tenth Conference on Computational Natural Language Learning, CoNLL-X '06, 2006.
DOI : 10.3115/1596276.1596302

J. C. Spall, Introduction to Stochastic Search and Optimization, 2003.
DOI : 10.1002/0471722138

M. Stanke and S. Waack, Gene prediction with a hidden Markov model and a new intron submodel, Bioinformatics, vol.19, issue.Suppl 2, pp.215-225, 2003.
DOI : 10.1093/bioinformatics/btg1080

M. Sugiyama, M. Krauledat, and K. Müller, Covariate shift adaptation by importance weighted cross validation, Journal of Machine Learning Research, vol.8, pp.985-1005, 2007.
DOI : 10.1007/978-3-642-21551-3_31

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.386.3425

Y. Sung, C. Boulis, C. Manning, and D. Jurafsky, Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification, IEEE Automatic Speech Recognition and Understanding Workshop, 2007.

C. Sutton and A. McCallum, Piecewise training for undirected models, UAI, 2005.

C. Sutton and A. McCallum, An introduction to conditional random fields for relational learning, Introduction to Statistical Relational Learning, 2006.

C. Sutton and A. McCallum, Piecewise pseudolikelihood for efficient training of conditional random fields, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.863-870, 2007.
DOI : 10.1145/1273496.1273605

C. Sutton, K. Rohanimanesh, and A. McCallum, Dynamic conditional random fields, Twenty-first international conference on Machine learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015422

J. Suzuki, A. Fujino, and H. Isozaki, Semi-supervised structured output learning based on a hybrid generative and discriminative approach, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.

J. Suzuki and H. Isozaki, Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data, Proceedings of ACL-08, 2008.

J. Suzuki, H. Isozaki, X. Carreras, and M. Collins, An empirical study of semi-supervised structured conditional models for dependency parsing, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.551-560, 2009.

M. Szafranski, Y. Grandvalet, and P. Morizet-Mahoudeaux, Hierarchical penalization, Advances in Neural Information Processing Systems, pp.1457-1464, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00267338

M. Szummer and T. Jaakkola, Information regularization with partially labeled data, NIPS, 2002.

M. F. Tappen, C. Liu, E. H. Adelson, and W. T. Freeman, Learning Gaussian Conditional Random Fields for Low-Level Vision, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.382979

R. Tibshirani, Regression shrinkage and selection via the lasso, J.R.Statist.Soc.B, vol.58, issue.1, pp.267-288, 1996.

E. F. Tjong Kim Sang and S. Buchholz, Introduction to the CoNLL-2000 shared task, Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning, pp.127-132, 2000.
DOI : 10.3115/1117601.1117631

E. F. Tjong Kim Sang and F. De Meulder, Introduction to the CoNLL-2003 shared task, Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp.155-158, 2003.
DOI : 10.3115/1119176.1119195

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp.173-180, 2003.

K. Toutanova and C. D. Manning, Enriching the knowledge sources used in a maximum entropy part-of-speech tagger, Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, pp.63-70, 2000.
DOI : 10.3115/1117794.1117802

H. Tseng, P. Chang, G. Andrew, D. Jurafsky, and C. Manning, A conditional random field word segmenter, Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, pp.168-171, 2005.

Y. Tsuboi, H. Kashima, S. Mori, H. Oda, and Y. Matsumoto, Training conditional random fields using incomplete annotations, Proceedings of the 22nd International Conference on Computational Linguistics, COLING '08, 2008.
DOI : 10.3115/1599081.1599194

Z. Tu, Learning Generative Models via Discriminative Approaches, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383035

V. Vapnik, Statistical Learning Theory, 1998.

S. V. Vishwanathan, N. N. Schraudolph, M. Schmidt, and K. Murphy, Accelerated training of conditional random fields with stochastic gradient methods, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.969-976, 2006.
DOI : 10.1145/1143844.1143966

A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, vol.13, issue.2, pp.260-269, 1967.
DOI : 10.1109/TIT.1967.1054010

M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, vol.1, issue.1-2, 2003.
DOI : 10.1561/2200000001

H. M. Wallach, Efficient Training of Conditional Random Fields, 2002.

Y. Watanabe, M. Asahara, and Y. Matsumoto, Graph-based approach to named entity categorization in wikipedia using conditional random fields, Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp.649-657, 2007.

H. White, Maximum Likelihood Estimation of Misspecified Models, Econometrica, vol.50, issue.1, pp.1-25, 1982.
DOI : 10.2307/1912526

Y. Yang and J. O. Pedersen, A comparative study on feature selection in text categorization, ICML, pp.412-420, 1997.

D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, Proceedings of the 33rd annual meeting on Association for Computational Linguistics, pp.189-196, 1995.
DOI : 10.3115/981658.981684

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.58, issue.1, pp.49-67, 2005.
DOI : 10.1198/016214502753479356

T. Zhang, F. Damerau, and D. Johnson, Text chunking using regularized Winnow, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL '01, 2001.
DOI : 10.3115/1073012.1073081

URL : http://acl.ldc.upenn.edu/P/P01/P01-1069.pdf

T. Zhang and D. Johnson, A robust risk minimization based named entity recognition system, Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, 2003.
DOI : 10.3115/1119176.1119210

X. Zhang, D. Aberdeen, and S. V. Vishwanathan, Conditional random fields for multi-agent reinforcement learning, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.1143-1150, 2007.
DOI : 10.1145/1273496.1273640

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, 2009.
DOI : 10.1214/07-AOS584

X. Zhu, Semi-supervised learning literature survey, 2005.

X. Zhu, Semi-Supervised Learning with Graphs, 2005.

X. Zhu and Z. Ghahramani, Learning from labeled and unlabeled data with label propagation, 2002.

X. Zhu, J. Kandola, Z. Ghahramani, and J. Lafferty, Nonparametric transforms of graph kernels for semi-supervised learning, Advances in Neural Information Processing Systems (NIPS) 17, 2005.

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.
DOI : 10.1073/pnas.201162998