P. Alquier, J. Ridgway, and N. Chopin, On the properties of variational approximations of Gibbs posteriors, The Journal of Machine Learning Research, vol.17, issue.1, pp.8374-8414, 2016.

T. Ando, Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra and its Applications, vol.26, pp.203-241, 1979.

J. Baker, P. Fearnhead, E. B. Fox, and C. Nemeth, Control variates for stochastic gradient MCMC, Statistics and Computing, 2018.

S. Barthelmé and N. Chopin, The Poisson transform for unnormalised statistical models, Statistics and Computing, vol.25, issue.4, pp.767-780, 2015.

E. Bernton, Langevin Monte Carlo and JKO splitting, Proceedings of the 31st Conference On Learning Theory, vol.75, pp.1777-1798, 2018.

J. M. Borwein and O. Chan, Uniform bounds for the complementary incomplete gamma function, Mathematical Inequalities and Applications, vol.12, pp.115-121, 2009.

N. Bou-Rabee and M. Hairer, Nonasymptotic mixing of the MALA algorithm, IMA J. Numer. Anal., vol.33, issue.1, pp.80-110, 2013.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

R. C. Bradley, Basic properties of strong mixing conditions. A survey and some open questions, Probability Surveys, vol.2, pp.107-144, 2005.

H. J. Brascamp and E. H. Lieb, On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation, Journal of Functional Analysis, vol.22, issue.4, pp.366-389, 1976.

S. Brazitikos, A. Giannopoulos, P. Valettas, and B. Vritsiou, Geometry of isotropic convex bodies, vol.196, 2014.

N. Brosse, A. Durmus, É. Moulines, and M. Pereyra, Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.319-342, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01648665

S. Bubeck, Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, vol.8, pp.231-357, 2015.

A. Caimo and N. Friel, Bayesian inference for exponential random graph models, Social Networks, vol.33, issue.1, pp.41-55, 2011.

O. Catoni, PAC-Bayesian supervised classification: the thermodynamics of statistical learning, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

N. Chatterji, N. Flammarion, Y. Ma, P. Bartlett, and M. I. Jordan, On the theory of variance reduction for stochastic gradient Monte Carlo, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.764-773, 2018.

X. Cheng and P. Bartlett, Convergence of Langevin MCMC in KL-divergence, Proceedings of the Conference on Algorithmic Learning Theory (ALT), 2018.

X. Cheng, N. S. Chatterji, Y. Abbasi-Yadkori, P. L. Bartlett, and M. I. Jordan, Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018.

X. Cheng, N. S. Chatterji, P. L. Bartlett, and M. I. Jordan, Underdamped Langevin MCMC: A non-asymptotic analysis, Proceedings of the Conference on Learning Theory, 2018.

O. Collier and A. S. Dalalyan, Minimax estimation of a p-dimensional linear functional in sparse Gaussian models and robust estimation of the mean, 2017.

A. Dalalyan, Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.678-689, 2017.

A. Dalalyan and A. B. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning, vol.72, pp.39-61, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00265651

A. S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.79, issue.3, pp.651-676, 2017.

A. S. Dalalyan and A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Processes and their Applications, 2019.

A. S. Dalalyan and L. Riou-Durand, On sampling from a log-concave density using kinetic Langevin diffusions, 2018.

A. S. Dalalyan and A. B. Tsybakov, Sparse regression learning by aggregation and Langevin Monte Carlo, COLT 2009 - The 22nd Conference on Learning Theory, pp.1-10, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00773553

B. Delyon, Estimation paramétrique [Parametric estimation]. Unpublished lecture notes, 2018.

L. Desvillettes and C. Villani, On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: the linear Fokker-Planck equation, Comm. Pure Appl. Math, vol.54, issue.1, pp.1-42, 2001.

A. Dieuleveut, A. Durmus, and F. Bach, Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01565514

J. Dolbeault, C. Mouhot, and C. Schmeiser, Hypocoercivity for linear kinetic equations conserving mass, Trans. Amer. Math. Soc, vol.367, issue.6, pp.3807-3828, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00482286

R. Douc, E. Moulines, and J. S. Rosenthal, Quantitative bounds on convergence of time-inhomogeneous Markov chains, Ann. Appl. Probab, vol.14, issue.4, pp.1643-1665, 2004.

S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, Hybrid Monte Carlo, Physics Letters B, vol.195, issue.2, pp.216-222, 1987.

A. Durmus, S. Majewski, and B. Miasojedow, Analysis of Langevin Monte Carlo via convex optimization, 2018.

A. Durmus and E. Moulines, High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01304430

A. Durmus and E. Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, Ann. Appl. Probab, vol.27, issue.3, pp.1551-1587, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01176132

R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu, Log-concave sampling: Metropolis-Hastings algorithms are fast!, Proceedings of the 31st Conference On Learning Theory, vol.75, pp.793-797, 2018.

A. Eberle, Error bounds for Metropolis-Hastings algorithms applied to perturbations of Gaussian measures in high dimensions, The Annals of Applied Probability, vol.24, issue.1, pp.337-377, 2014.

A. Eberle, A. Guillin, and R. Zimmer, Couplings and quantitative contraction rates for Langevin dynamics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01484275

A. Eberle and M. B. Majka, Quantitative contraction rates for Markov chains on general state spaces, Electronic Journal of Probability, vol.24, 2019.

C. J. Geyer, On the convergence of Monte Carlo maximum likelihood calculations, Journal of the Royal Statistical Society. Series B (Methodological), vol.56, issue.1, pp.261-274, 1994.

C. J. Geyer, The Wald consistency theorem, 2012.

C. J. Geyer and E. A. Thompson, Constrained Monte Carlo maximum likelihood for dependent data, Journal of the Royal Statistical Society: Series B (Methodological), vol.54, issue.3, pp.657-683, 1992.

M. G. Gu and H. Zhu, Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.63, issue.2, pp.339-355, 2001.

M. Gutmann and A. Hyvärinen, Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.297-304, 2010.

M. U. Gutmann and A. Hyvärinen, Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics, J. Mach. Learn. Res, vol.13, pp.307-361, 2012.

G. Hargé, A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces, Probability Theory and Related Fields, vol.130, pp.415-440, 2004.

B. Helffer and F. Nier, Hypoelliptic estimates and spectral theory for Fokker-Planck operators and Witten Laplacians, Lecture Notes in Mathematics, vol.1862, 2005.

P. Jain and P. Kar, Non-convex optimization for machine learning. Foundations and Trends in Machine Learning, vol.10, pp.142-336, 2017.

G. L. Jones, On the Markov chain central limit theorem, Probab. Surv, vol.1, pp.299-320, 2004.

D. Lamberton and G. Pagès, Recursive computation of the invariant distribution of a diffusion, Bernoulli, vol.8, issue.3, pp.367-405, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00104799

D. Lamberton and G. Pagès, Recursive computation of the invariant distribution of a diffusion: the case of a weakly mean reverting drift, Stoch. Dyn, vol.3, issue.4, pp.435-451, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00104799

M. Ledoux, The concentration of measure phenomenon, 2001.

T. D. Luu, J. Fadili, and C. Chesneau, Sampling from non-smooth distribution through Langevin diffusion, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01492056

A. Lyne, M. Girolami, Y. Atchadé, H. Strathmann, and D. Simpson, On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods, Statistical Science, vol.30, issue.4, pp.443-467, 2015.

D. A. Mcallester, Some PAC-Bayesian theorems, Machine Learning, vol.37, pp.355-363, 1999.

K. L. Mengersen and R. L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, The Annals of Statistics, vol.24, issue.1, pp.101-121, 1996.

S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, 2012.

A. Mnih and Y. W. Teh, A fast and simple algorithm for training neural probabilistic language models, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp.1751-1758, 2012.

J. Møller, A. N. Pettitt, R. Reeves, and K. K. Berthelsen, An efficient Markov Chain Monte Carlo method for distributions with intractable normalising constants, Biometrika, vol.93, issue.2, pp.451-458, 2006.

I. Murray, Z. Ghahramani, and D. MacKay, MCMC for doubly-intractable distributions, 2012.

P. Natalini and B. Palumbo, Inequalities for the incomplete gamma function, Math. Inequal. Appl, vol.3, issue.1, pp.69-77, 2000.

R. M. Neal, MCMC using Hamiltonian dynamics, in Handbook of Markov Chain Monte Carlo, 2011.

E. Nelson, Dynamical Theories of Brownian Motion, 1967.

G. A. Pavliotis, Stochastic processes and applications: Diffusion processes, the Fokker-Planck and Langevin equations, vol.60, 2014.

N. S. Pillai, A. M. Stuart, and A. H. Thiéry, Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions, Ann. Appl. Probab, vol.22, issue.6, pp.2320-2356, 2012.

F. Qi and Q. Luo, Bounds for the ratio of two gamma functions: from Wendel's and related inequalities to logarithmically completely monotonic functions, Banach Journal of Mathematical Analysis, vol.6, issue.2, pp.132-158, 2012.

M. Raginsky, A. Rakhlin, and M. Telgarsky, Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.1674-1703, 2017.

L. Riou-Durand and N. Chopin, Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE, Electron. J. Statist., vol.12, issue.2, pp.3473-3518, 2018.

G. O. Roberts and J. S. Rosenthal, Optimal scaling of discrete approximations to Langevin diffusions, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.60, issue.1, pp.255-268, 1998.

G. O. Roberts and J. S. Rosenthal, General state space Markov chains and MCMC algorithms, Probab. Surv, vol.1, pp.20-71, 2004.

G. O. Roberts and O. Stramer, Langevin diffusions and Metropolis-Hastings algorithms, Methodol. Comput. Appl. Probab, vol.4, issue.4, pp.337-357, 2002.

G. O. Roberts and R. L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, vol.2, issue.4, pp.341-363, 1996.

G. Robins, T. Snijders, P. Wang, M. Handcock, and P. Pattison, Recent developments in exponential random graph (p*) models for social networks, Social Networks, vol.29, issue.2, pp.192-215, 2007.

R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol.317, 2009.

R. Salakhutdinov and G. Hinton, Deep Boltzmann machines, Artificial Intelligence and Statistics, pp.448-455, 2009.

A. Saumard and J. A. Wellner, Log-concavity and strong log-concavity: a review, Statistics Surveys, vol.8, pp.45-114, 2014.

O. Stramer and R. L. Tweedie, Langevin-type models. I. Diffusions with given stationary distributions and their discretizations, Methodol. Comput. Appl. Probab, vol.1, issue.3, pp.283-306, 1999.

O. Stramer and R. L. Tweedie, Langevin-type models. II. Self-targeting candidates for MCMC algorithms, Methodol. Comput. Appl. Probab, vol.1, issue.3, pp.307-328, 1999.

Y. T. Lee, Z. Song, and S. S. Vempala, Algorithmic Theory of ODEs and Sampling from Well-conditioned Logconcave Densities, arXiv e-prints, 2018.

S. T. Tokdar and R. E. Kass, Importance sampling: a review, Wiley Interdisciplinary Reviews: Computational Statistics, vol.2, issue.1, pp.54-60, 2010.

V. Vapnik, Principles of risk minimization for learning theory, Advances in Neural Information Processing Systems, pp.831-838, 1992.

C. Villani, Optimal transport: old and new, vol.338, 2008.

A. Wald, Note on the consistency of the Maximum Likelihood Estimate, Ann. Math. Statistics, vol.20, pp.595-601, 1949.

S. G. Walker, Posterior sampling when the normalizing constant is unknown, Comm. Statist. Simulation Comput, vol.40, issue.5, pp.784-792, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00680015

C. Wang, N. Komodakis, and N. Paragios, Markov random field modeling, inference & learning in computer vision & image understanding: A survey, Computer Vision and Image Understanding, vol.117, issue.11, pp.1610-1627, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00858390

S. S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, The Annals of Mathematical Statistics, vol.9, issue.1, pp.60-62, 1938.

T. Xifara, C. Sherlock, S. Livingstone, S. Byrne, and M. Girolami, Langevin diffusions and the Metropolis-adjusted Langevin algorithm, Statist. Probab. Lett, vol.91, pp.14-19, 2014.

P. Xu, J. Chen, D. Zou, and Q. Gu, Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017.

Y. Zhang, P. Liang, and M. Charikar, A hitting time analysis of stochastic gradient Langevin dynamics, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.1980-2022, 2017.

List of Figures

2.1 Estimates and confidence intervals of the Mean Square Error ratios of MC-MLE (left) and NCE (right), compared to the MLE. The MSE ratio depends both on the variance of the proposal distribution ? and the number of artificial data-points m = ? × n (n = 1000). A log-scale is used for both axes.

2.2 Estimates and confidence intervals of the Mean Square Error ratios of MC-MLE, compared to the NCE. The MSE ratio depends both on the variance of the proposal distribution ? and the number of artificial data-points m = ? × n (n = 1000). A log-scale is used for both axes.

2.3 Estimates and confidence intervals of the probability of existence of the MC-MLE (left) and NCE (right) estimators. For a fixed n = 1000, the probability of belonging to ? is lower for MC-MLE, especially for small values of the variance of the proposal distribution ? and of the number of artificial data-points m = ? × n. A log-scale is used for both axes.

This plot represents, in the plane defined by coordinates (p/m?², ?), the regions where LMC leads to a smaller error than KLMC (in gray).