
S. S. Lin and F. Yvon, Discriminative Training of Finite State Decoding Graphs, Proc. InterSpeech, pp.733-736, 2005.

L. R. Bahl, F. Jelinek, and R. L. Mercer, A Maximum Likelihood Approach to Continuous Speech Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.5, issue.2, pp.179-190, 1983.

D. Povey and P. C. Woodland, Minimum Phone Error and I-Smoothing for Improved Discriminative Training, Proc. ICASSP, pp.105-108, 2002.

J. Gao, H. Yu, W. Yuan, and P. Xu, Minimum sample risk methods for language modeling, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05, 2005.
DOI : 10.3115/1220575.1220602

B. Roark, M. Saraclar, and M. Collins, Corrective language modeling for large vocabulary ASR with the perceptron algorithm, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
DOI : 10.1109/ICASSP.2004.1326094

M. Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics, vol.23, issue.2, pp.269-311, 1997.

H. K. Kuo, B. Kingsbury, and G. Zweig, Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, 2007.
DOI : 10.1109/ICASSP.2007.367159

S. J. Young, N. H. Russell, and J. H. Thornton, Token Passing: a Simple Conceptual Model for Connected Speech Recognition Systems, 1989.

M. Mohri and M. Riley, A Weight Pushing Algorithm for Large Vocabulary Speech Recognition, Proc. EuroSpeech, pp.1603-1606, 2001.

G. Gravier, J. Bonastre, S. Galliano, E. Geoffrois, K. McTait et al., The ESTER evaluation campaign of Rich Transcription of French Broadcast News, Proc. LREC, 2004.

G. Saon, D. Povey, and G. Zweig, Anatomy of an Extremely Fast LVCSR Decoder, Proc. InterSpeech, pp.549-552, 2005.

H. K. Kuo, E. Fosler-Lussier, H. Jiang, and C. H. Lee, Discriminative Training of Language Models for Speech Recognition, Proc. ICASSP'02, 2002.

A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques and Tools, 1986.

J. Aldrich, R.A. Fisher and the Making of Maximum Likelihood, Statistical Science, pp.162-176, 1997.

C. Allauzen, M. Mohri, and B. Roark, Generalized algorithms for constructing statistical language models, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, ACL '03, pp.40-47, 2003.
DOI : 10.3115/1075096.1075102

Y. Altun, M. Johnson, and T. Hofmann, Investigating loss functions and optimization methods for discriminative learning of label sequences, Proceedings of the 2003 conference on Empirical methods in natural language processing, pp.145-152, 2003.
DOI : 10.3115/1119355.1119374

L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics, vol.16, issue.1, pp.1-3, 1966.
DOI : 10.2140/pjm.1966.16.1

Z. Aydin, T. Akgun, and Y. Altunbasak, A Modified Stack Decoder for Protein Secondary Structure Prediction, IEEE International Conference on Acoustics, Speech and Signal Processing, 2005.

L. Azzopardi, M. Girolami, and K. van Rijsbergen, Investigating the relationship between language model perplexity and IR precision-recall measures, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '03, 2003.
DOI : 10.1145/860435.860505

M. Bacchiani and M. Ostendorf, Joint lexicon, acoustic unit inventory and model design, Speech Communication, pp.99-114, 1999.
DOI : 10.1016/S0167-6393(99)00033-3

L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.49-52, 1986.
DOI : 10.1109/ICASSP.1986.1169179

L. R. Bahl, F. Jelinek, and R. L. Mercer, A Maximum Likelihood Approach to Continuous Speech Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.5, issue.2, pp.179-190, 1983.

L. E. Baum, An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes, pp.1-8, 1972.

L. E. Baum and J. A. Eagon, An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology, Bulletin of the American Mathematical Society, vol.73, issue.3, pp.360-363, 1967.
DOI : 10.1090/S0002-9904-1967-11751-8

J. R. Bellegarda, Statistical language model adaptation: review and perspectives, Speech Communication, 2004.
DOI : 10.1016/j.specom.2003.08.002
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.4893

R. Bellman, On the Theory of Dynamic Programming, Proceedings of the National Academy of Sciences of the USA, pp.716-719, 1952.

K. Beulen, S. Ortmanns, and C. Elting, Dynamic programming search techniques for across-word modelling in speech recognition, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.609-612, 1999.
DOI : 10.1109/ICASSP.1999.759740

A. Biem, Minimum classification error training for online handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.7, pp.1041-1051, 2006.
DOI : 10.1109/TPAMI.2006.146

C. M. Bishop, Neural Networks for Pattern Recognition, 1995.

J. R. Blum, Multidimensional Stochastic Approximation Methods, Annals of Mathematical Statistics, pp.737-744, 1954.
DOI : 10.1214/aoms/1177728659

D. Caseiro and I. Trancoso, Using Dynamic WFST Composition for Recognizing Broadcast News, Proceedings of International Conference on Spoken Language Processing, 2002.

S. F. Chen, Compiling Large-Context Phonetic Decision Trees into Finite-State Transducers, pp.1169-1172, 2003.

S. F. Chen, D. Beeferman, and R. Rosenfeld, Evaluation Metrics For Language Models, DARPA Broadcast News Transcription and Understanding Workshop, 1998.

S. F. Chen and J. Goodman, An Empirical Study of Smoothing Techniques for Language Modeling, Proceedings of the 34th Annual Meeting of the ACL, pp.310-318, 1996.

S. F. Chen and R. Rosenfeld, A survey of smoothing techniques for ME models, IEEE Transactions on Speech and Audio Processing, pp.37-50, 2000.
DOI : 10.1109/89.817452

Z. Chen, M. J. Li, and K. F. Lee, Discriminative Training on Language Model, International Conference on Spoken Language Processing, 2000.

J. T. Chien, C. H. Huang, K. Shinoda, and S. Furui, Towards Optimal Bayes Decision for Speech Recognition, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp.45-48, 2006.

G. Chollet, J. L. Cochard, P. Langlais, and R. van Kommer, Swiss-French Polyphone: a Telephone Speech Database to Develop Interactive Voice Servers, Linguistic Databases, 1995.

W. Chou, Topics on Minimum Classification Error Rate Based Discriminant Function Approach to Speech Recognition, International Symposium on Chinese Spoken Language Processing, 2000.

Y. L. Chow, Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm, International Conference on Acoustics, Speech, and Signal Processing, pp.701-704, 1990.
DOI : 10.1109/ICASSP.1990.115863

P. Clarkson and R. Rosenfeld, Statistical Language Modeling Using The CMU-Cambridge Toolkit, Proceedings of EUROSPEECH, pp.2707-2710, 1997.

P. R. Clarkson and A. J. Robinson, Language model adaptation using mixtures and an exponentially decaying cache, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.799-802, 1997.
DOI : 10.1109/ICASSP.1997.596049

M. Collins, Discriminative training methods for hidden Markov models, Proceedings of the ACL-02 conference on Empirical methods in natural language processing, EMNLP '02, pp.1-8, 2002.
DOI : 10.3115/1118693.1118694

M. Collins and N. Duffy, New ranking algorithms for parsing and tagging, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.263-270, 2002.
DOI : 10.3115/1073083.1073128

M. Collins and T. Koo, Discriminative Reranking for Natural Language Parsing, Proc. 17th International Conf. on Machine Learning, pp.175-182, 2000.

C. Darken and J. Moody, Towards Faster Stochastic Gradient Search, Neural Information Processing Systems, pp.1009-1016, 1992.
DOI : 10.1109/nnsp.1992.253713
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2884

H. Davis, R. Biddulph, and S. Balashek, Automatic Recognition of Spoken Digits, The Journal of the Acoustical Society of America, vol.24, issue.6, pp.637-642, 1952.
DOI : 10.1121/1.1906946

V. Digalakis, P. Monaco, and H. Murveit, Genones: generalized mixture tying in continuous hidden Markov model-based speech recognizers, IEEE Transactions on Speech and Audio Processing, pp.281-289, 1996.
DOI : 10.1109/89.506931

E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik, vol.4, issue.1, pp.269-271, 1959.
DOI : 10.1007/BF01386390

B. J. Driessen, N. Sadegh, and K. S. Kwok, A Robust Line Search for Learning Control, Proceedings of the 37th IEEE Conference on Decision and Control, pp.3888-3892, 1998.

R. O. Duda and P. E. Hart, Bayes Decision Theory, Pattern Classification and Scene Analysis, pp.10-43, 1973.

S. Finette, A. Bleier, W. Swindel, and K. Haber, Breast Tissue Classification Using Diagnostic Ultrasound and Pattern Recognition Techniques: I. Methods of Pattern Recognition, Ultrasonic Imaging, pp.55-70, 1983.

S. Galliano, E. Geoffrois, D. Mostefa, K. Choukri, J. F. Bonastre et al., The ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News, Proceedings of INTERSPEECH, pp.1149-1152, 2005.

J. Gao and M. Zhang, Improving Language Model Size Reduction Using Better Pruning Criteria, 2002.

I. J. Good, The Population Frequencies of Species and the Estimation of Population Parameters, Biometrika, pp.237-264, 1953.
DOI : 10.1093/biomet/40.3-4.237

J. Goodman, Putting it all together: language model combination, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), pp.1647-1650, 2000.
DOI : 10.1109/ICASSP.2000.862064

J. Goodman, Exponential Priors for Maximum Entropy Models, Proceedings of HLT-NAACL, pp.305-312, 2004.

J. Goodman and J. Gao, Language Model Size Reduction by Pruning and Clustering, Proceedings of International Conference on Spoken Language Processing, 2000.

P. S. Gopalakrishnan, D. Kanevsky, A. Nadas, and D. Nahamoo, An inequality for rational functions with applications to some statistical estimation problems, IEEE Transactions on Information Theory, vol.37, issue.1, pp.107-113, 1991.
DOI : 10.1109/18.61108

G. Gravier, J. F. Bonastre, S. Galliano, E. Geoffrois, K. McTait et al., The ESTER evaluation campaign of Rich Transcription of French Broadcast News, Proceedings of the Language Resources and Evaluation Conference, 2004.

R. Haeb-Umbach and H. Ney, Improvements in beam search for 10000-word continuous-speech recognition, IEEE Transactions on Speech and Audio Processing, pp.353-356, 1994.
DOI : 10.1109/89.279287

P. E. Hart, N. J. Nilsson, and B. Raphael, A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Transactions on Systems Science and Cybernetics, vol.4, issue.2, pp.100-107, 1968.
DOI : 10.1109/TSSC.1968.300136

C. A. Hoare, ACM Algorithm 64: Quicksort, Communications of the ACM, p.321, 1961.

T. Imai, A. Ando, and E. Miyasaka, A new method for automatic generation of speaker-dependent phonological rules, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.864-867, 1995.
DOI : 10.1109/ICASSP.1995.479831

R. Iyer, M. Ostendorf, and M. Meteer, Analyzing and predicting language model improvements, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
DOI : 10.1109/ASRU.1997.659013

F. Jelinek, Fast Sequential Decoding Algorithm Using a Stack, IBM J. Res. Develop, 1969.
DOI : 10.1147/rd.136.0675

F. Jelinek, Continuous speech recognition by statistical methods, Proceedings of the IEEE, pp.532-556, 1976.
DOI : 10.1109/PROC.1976.10159

F. Jelinek, Statistical Methods for Speech Recognition, 1997.

F. Jelinek and R. L. Mercer, Interpolated Estimation of Markov Source Parameters from Sparse Data, Proceedings of the Workshop on Pattern Recognition in Practice, pp.381-397, 1980.

F. Jelinek, R. L. Mercer, and L. R. Bahl, Design of a linguistic statistical decoder for the recognition of continuous speech, IEEE Transactions on Information Theory, pp.250-256, 1975.
DOI : 10.1109/TIT.1975.1055384

F. T. Johansen, A Comparison of Hybrid HMM Architectures Using Global Discriminative Training, Proceedings of the Fourth International Conference on Spoken Language Processing, pp.498-501, 1996.

B. H. Juang, W. Chou, and C. H. Lee, Minimum Classification Error Rate Methods for Speech Recognition, IEEE Transactions on Speech and Audio Processing, pp.257-265, 1997.

B. H. Juang and S. Katagiri, Discriminative Learning for Minimum Error Classification, IEEE Transactions on Signal Processing, pp.3043-3054, 1992.

B. H. Juang and L. R. Rabiner, Automatic Speech Recognition - A Brief History of the Technology Development, Encyclopedia of Language and Linguistics, 2005.

J. C. Junqua and J. P. Haton, Robustness in Automatic Speech Recognition: Fundamentals and Applications, 1995.

S. Kanthak, H. Ney, M. Riley, and M. Mohri, A Comparison of Two LVR Search Optimization Techniques, Proceedings of International Conference on Spoken Language Processing, 2002.

S. Kanthak, A. Sixtus, S. Molau, R. Schluter, and H. Ney, Fast Search for Large Vocabulary Speech Recognition, Verbmobil: Foundations of Speech-to-Speech Translation, pp.63-78, 2000.

S. Katagiri, C. H. Lee, and B. H. Juang, A Generalized Probabilistic Descent Method, Proceedings of the Acoustical Society of Japan, pp.141-142, 1990.

S. M. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.3, pp.400-401, 1987.
DOI : 10.1109/TASSP.1987.1165125

P. Kenny, R. Hollan, G. Boulianne, H. Garudadri, M. Lennig et al., An A* algorithm for very large vocabulary continuous speech recognition, Proceedings of the workshop on Speech and Natural Language, HLT '91, pp.333-338, 1992.
DOI : 10.3115/1075527.1075606

R. Kneser and H. Ney, Improved backing-off for M-gram language modeling, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.181-184, 1995.
DOI : 10.1109/ICASSP.1995.479394

H. K. Kuo, E. Fosler-Lussier, H. Jiang, and C. H. Lee, Discriminative Training of Language Models for Speech Recognition, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.325-328, 2002.

C. R. Lai, S. L. Lu, and Q. W. Zhao, Performance Analysis of Speech Recognition Software, Proceedings of the Fifth Workshop on Computer Architecture Evaluation using Commercial Workloads, 2002.

J. Le Roux and E. McDermott, Optimization Methods for Discriminative Training, pp.3341-3344, 2005.

S. S. Lin and F. Yvon, Discriminative Training of Finite State Decoding Graphs, pp.733-736, 2005.

R. P. Lippmann, Review of Neural Networks for Speech Recognition, Neural Computation, pp.1-38, 1989.

B. Lowerre, The Harpy Speech Understanding System, 1976.
DOI : 10.1016/B978-0-08-051584-7.50053-X

B. Lowerre and R. Reddy, The Harpy Speech Understanding System, Readings in Speech Recognition, pp.576-586, 1990.

D. G. Luenberger, Linear and Nonlinear Programming.

R. Iyer and M. Ostendorf, Modeling Long Distance Dependence in Language: Topic Mixtures versus Dynamic Cache Models, IEEE Transactions on Speech and Audio Processing, pp.30-39, 1999.

G. D. Magoulas, M. N. Vrahatis, and G. S. Androulakis, Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods, Neural Computation, pp.1769-1796, 1999.

M. Mahajan, D. Beeferman, and X. D. Huang, Improved topic-dependent language modeling using information retrieval techniques, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), p.144, 1999.
DOI : 10.1109/ICASSP.1999.758182

M. Gross, The Use of Finite Automata in the Lexical Representation of Natural Language, Lecture Notes in Computer Science, vol.377, 1987.

E. McDermott, A. Biem, S. Tenpaku, and S. Katagiri, Discriminative training for large vocabulary telephone-based name recognition, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000.
DOI : 10.1109/ICASSP.2000.860215

E. McDermott and T. J. Hazen, Minimum classification error training of landmark models for real-time continuous speech recognition, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.937-940, 2004.
DOI : 10.1109/ICASSP.2004.1326141

E. McDermott, T. J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.1, pp.203-223, 2007.
DOI : 10.1109/TASL.2006.876778

E. McDermott and S. Katagiri, Prototype-based minimum classification error/generalized probabilistic descent training for various speech units, Computer Speech and Language, pp.351-368, 1994.
DOI : 10.1006/csla.1994.1018

E. McDermott and S. Katagiri, Minimum Classification Error for Large Scale Speech Recognition Tasks using Weighted Finite State Transducers, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), pp.113-116, 2005.
DOI : 10.1109/ICASSP.2005.1415063

M. Mohri, On some applications of finite-state automata theory to natural language processing, Natural Language Engineering, vol.2, issue.1, pp.61-80, 1996.
DOI : 10.1017/S135132499600126X

M. Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics, pp.269-311, 1997.

M. Mohri, Minimization algorithms for sequential transducers, Theoretical Computer Science, pp.177-201, 2000.
DOI : 10.1016/S0304-3975(98)00115-7

M. Mohri, Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers, 129-143. Speech Processing Extended Finite State Models of Language: Proceedings of the ECAI Workshop, pp.46-50, 2002.

M. Mohri, F. Pereira, and M. Riley, The design principles of a weighted finite-state transducer library, Theoretical Computer Science, pp.17-32, 2000.
DOI : 10.1016/S0304-3975(99)00014-6

M. Mohri, F. Pereira, and M. Riley, Weighted Finite-State Transducers in Speech Recognition, ISCA ITRW Automatic Speech Recognition: Challenges for the Millenium, pp.97-106, 2000.

M. Mohri and M. Riley, Weighted Determinization and Minimization for Large Vocabulary Speech Recognition, pp.131-134, 1997.

M. Mohri and M. Riley, Network optimizations for large-vocabulary speech recognition, Speech Communication, pp.1-12, 1998.
DOI : 10.1016/S0167-6393(98)00043-0

M. Mohri and M. Riley, Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition, pp.811-814, 1999.

M. Mohri and M. Riley, A Weight Pushing Algorithm for Large Vocabulary Speech Recognition, Proceedings of EUROSPEECH, pp.1603-1606, 2001.

J. Moré and D. Thuente, Line Search Algorithms with Guaranteed Sufficient Decrease, ACM Transactions on Mathematical Software, vol.20, pp.286-307, 1994.

A. Nadas, D. Nahamoo, and M. Picheny, On a model-robust training method for speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.36, issue.9, pp.1432-1436, 1988.
DOI : 10.1109/29.90371

Z. V. Nekrylova, Rate of convergence of the stochastic gradient method, Cybernetics, vol.11, issue.2, pp.218-222, 1975.
DOI : 10.1007/BF01069860

H. Ney and X. Aubert, A Word-Graph Algorithm for Large-Vocabulary Continuous-Speech Recognition, Proceedings of International Conference on Spoken Language Processing, pp.1355-1358, 1994.

H. Ney and S. Ortmanns, Extensions to the word graph method for large vocabulary continuous speech recognition, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1791-1794, 1997.
DOI : 10.1109/ICASSP.1997.598883

N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann, 1997.

Y. Normandin, Hidden Markov Models, Maximum Mutual Information Estimation and the Speech Recognition Problem, 1991.

Y. Normandin, Optimal splitting of HMM Gaussian mixture components with MMIE training, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.449-452, 1995.
DOI : 10.1109/ICASSP.1995.479625

F. J. Och, Minimum error rate training in statistical machine translation, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, ACL '03, pp.160-167, 2003.
DOI : 10.3115/1075096.1075117

J. Odell, V. Valtchev, P. Woodland, and S. Young, A one pass decoder design for large vocabulary recognition, Proceedings of the workshop on Human Language Technology, HLT '94, pp.405-410, 1994.
DOI : 10.3115/1075812.1075905

M. Oerder and H. Ney, Word graphs: an efficient interface between continuous-speech recognition and language understanding, IEEE International Conference on Acoustics Speech and Signal Processing, pp.119-122, 1993.
DOI : 10.1109/ICASSP.1993.319246

U. Ohler, S. Harbeck, and H. Niemann, Discriminative Training of Language Model Classifiers, pp.1607-1610, 1999.

S. Ortmanns, A. Eiden, H. Ney, and N. Coenen, Look-Ahead Techniques for Fast Beam-Search, International Conference on Acoustics, Speech, and Signal Processing, pp.1783-1786, 1997.

S. Ortmanns, H. Ney, and X. Aubert, A word graph algorithm for large vocabulary continuous speech recognition, Computer, Speech and Language, pp.43-72, 1997.
DOI : 10.1006/csla.1996.0022

S. Ortmanns, H. Ney, and A. Eiden, Language-model look-ahead for large vocabulary speech recognition, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, pp.2095-2098, 1996.
DOI : 10.1109/ICSLP.1996.607215

S. Ortmanns, H. Ney, A. Eiden, and N. Coenen, Look-Ahead Techniques for Improved Beam Search, Proceedings of CRIM-FORWISS Workshop, pp.10-22, 1996.

S. Ortmanns, H. Ney, F. Seide, and I. Lindam, A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, pp.2091-2094, 1996.
DOI : 10.1109/ICSLP.1996.607214

C. Paciorek and R. Rosenfeld, Minimum Classification Error Training in Exponential Language Models, Proceedings of the NIST/DARPA Speech Transcription Workshop, 2000.

D. B. Paul, An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp.25-28, 1992.

F. Pereira, M. Riley, and R. Sproat, Weighted rational transductions and their application to human language processing, Proceedings of the workshop on Human Language Technology, HLT '94, pp.249-254, 1994.
DOI : 10.3115/1075812.1075870
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.3157

F. C. Pereira and M. Riley, Speech Recognition by Composition of Weighted Finite Automata, Finite-State Language Processing, pp.431-453, 1997.

S. Della Pietra, V. Della Pietra, R. L. Mercer, and S. Roukos, Adaptive Language Modeling Using Minimum Discriminant Estimation, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp.633-636, 1992.

D. Povey and P. C. Woodland, Minimum Phone Error and I-Smoothing for Improved Discriminative Training, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp.105-108, 2002.

H. Printz and P. Olsen, Theory and practice of acoustic confusability, Computer Speech & Language, vol.16, issue.1, pp.77-84, 2000.
DOI : 10.1006/csla.2001.0188

L. R. Rabiner, A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, pp.257-286, 1989.

B. Roark, M. Saraclar, and M. Collins, Corrective language modeling for large vocabulary ASR with the perceptron algorithm, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
DOI : 10.1109/ICASSP.2004.1326094

E. Roche and Y. Schabes, Finite-State Language Processing, 1997.

L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri, Are Loss Functions All the Same?, Neural Computation, vol.16, issue.5, pp.1063-1076, 2004.

F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, pp.386-408, 1958.
DOI : 10.1037/h0042519

R. Rosenfeld, Two decades of statistical language modeling: where do we go from here?, Proceedings of the IEEE, 2000.
DOI : 10.1109/5.880083

S. Sakti, S. Nakamura, and K. Markov, Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework, Special Section of Statistical Modeling for Speech Processing, pp.946-953, 2006.
DOI : 10.1093/ietisy/e89-d.3.946

E. Sandness and I. Hetherington, Keyword-Based Discriminative Training of Acoustic Models, Proceedings of International Conference on Spoken Language Processing, 2000.

A. Sankar and C. H. Lee, A maximum-likelihood approach to stochastic matching for robust speech recognition, IEEE Transactions on Speech and Audio Processing, pp.190-202, 1996.
DOI : 10.1109/89.496215

G. Saon, D. Povey, and G. Zweig, Anatomy of an Extremely Fast LVCSR Decoder, pp.549-552, 2005.

R. Schluter and W. Macherey, Comparison of discriminative training criteria, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), pp.493-496, 1998.
DOI : 10.1109/ICASSP.1998.674475

R. Schluter, W. Macherey, B. Müller, and H. Ney, Comparison of discriminative training criteria and optimization methods for speech recognition, Speech Communication, vol.34, issue.3, pp.287-310, 2001.
DOI : 10.1016/S0167-6393(00)00035-2

R. Schluter, W. Macherey, S. Kanthak, H. Ney, and L. Welling, Comparison Of Optimization Methods For Discriminative Training Criteria, pp.15-18, 1997.

T. Schultz and A. Waibel, Adaptation of Pronunciation Dictionaries for Recognition of Unseen Languages, 1998.

L. Shen, A. Sarkar, and F. J. Och, Discriminative Reranking for Machine Translation, Proceedings of HLT-NAACL, 2004.

H. Shimodaira, J. Rokui, and M. Nakai, Improving the Generalization Performance of the MCE/GPD Learning, Proceedings of International Conference on Spoken Language Processing, 1998.

R. Singh, B. Raj, and R. M. Stern, Automatic Generation of Sub-Word Units for Speech Recognition Systems, IEEE Transactions on Speech and Audio Processing, pp.89-99, 2002.

T. Sloboda and A. Waibel, Dictionary learning for spontaneous speech recognition, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
DOI : 10.1109/ICSLP.1996.607274

F. K. Soong and E. F. Huang, A tree-trellis based fast search for finding the N Best sentence hypotheses in continuous speech recognition, Proceedings of the workshop on Speech and Natural Language, HLT '90, pp.705-708, 1991.
DOI : 10.3115/116580.116591

J. C. Spall, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, 2003.
DOI : 10.1002/0471722138

V. Steinbiss, B. H. Tran, and H. Ney, Improvements in Beam Search, Proceedings of International Conference on Spoken Language Processing, pp.2143-2146, 1994.

A. Stolcke, SRILM - An Extensible Language Modeling Toolkit, Proceedings of International Conference on Spoken Language Processing, 2002.

A. J. Viterbi, Error bounds for Convolutional Codes and an Asymptotically Optimal Decoding Algorithm, IEEE Transactions on Information Theory, pp.260-269, 1967.

V. Warnke, S. Harbeck, E. Noth, H. Niemann, and M. Levit, Discriminative estimation of interpolation parameters for language model classifiers, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.525-528, 1999.
DOI : 10.1109/ICASSP.1999.758178

B. Widrow and M. E. Hoff, Adaptive Switching Circuits, IRE WESCON Convention Record Part IV, pp.96-104, 1960.

I. H. Witten and T. C. Bell, The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression, IEEE Transactions on Information Theory, pp.1085-1094, 1991.
DOI : 10.1109/18.87000

P. C. Woodland, C. J. Leggetter, J. J. Odell, V. Valtchev, S. J. Young et al., The 1994 HTK large vocabulary speech recognition system, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.73-76, 1995.
DOI : 10.1109/ICASSP.1995.479276

P. C. Woodland and D. Povey, Large scale discriminative training of hidden Markov models for speech recognition, Computer Speech & Language, vol.16, issue.1, pp.25-47, 2002.
DOI : 10.1006/csla.2001.0182

C. A. Ynoguti, E. S. Morais, and F. Violaro, A comparison between HMM and hybrid ANN-HMM-based systems for continuous speech recognition, ITS'98 Proceedings. SBT/IEEE International Telecommunications Symposium (Cat. No.98EX202), pp.135-140, 1998.
DOI : 10.1109/ITS.1998.713105

S. J. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell et al., The HTK Book, 2002.

S. J. Young, J. Odell, and P. Woodland, Tree-based state tying for high accuracy acoustic modelling, Proceedings of the workshop on Human Language Technology, HLT '94, pp.307-312, 1994.
DOI : 10.3115/1075812.1075885

S. J. Young, N. H. Russell, and J. H. Thornton, Token Passing: a Simple Conceptual Model for Connected Speech Recognition Systems, 1989.

X. H. Yu, G. A. Chen, and S. X. Cheng, Dynamic Learning Rate Optimization of the Backpropagation Algorithm, IEEE Transactions on Neural Networks, pp.669-677, 1995.

J. Zheng, J. Butzberger, H. Franco, and A. Stolcke, Improved Maximum Mutual Information Estimation Training of Continuous Density HMMs, Proceedings of EUROSPEECH, 2001.

D. X. Zhou, The covering number in learning theory, Journal of Complexity, vol.18, issue.3, pp.739-767, 2002.
DOI : 10.1006/jcom.2002.0635

X. Zhu and R. Rosenfeld, Improving Trigram Language Modeling with the World Wide Web, Proceedings of International Conference on Acoustics, Speech and Signal Processing, pp.592-597, 2001.