On the properties of variational approximations of Gibbs posteriors, The Journal of Machine Learning Research, vol.17, issue.1, pp.8374-8414, 2016.
Concavity of certain maps on positive definite matrices and applications to Hadamard products, Linear Algebra and its Applications, vol.26, pp.203-241, 1979.
Control variates for stochastic gradient MCMC, Statistics and Computing, 2018.
The Poisson transform for unnormalised statistical models, Statistics and Computing, vol.25, issue.4, pp.767-780, 2015.
Langevin Monte Carlo and JKO splitting, Proceedings of the 31st Conference on Learning Theory, vol.75, pp.1777-1798, 2018.
Uniform bounds for the complementary incomplete gamma function, Mathematical Inequalities and Applications, vol.12, pp.115-121, 2009.
Nonasymptotic mixing of the MALA algorithm, IMA J. Numer. Anal, vol.33, issue.1, pp.80-110, 2013.
Convex optimization, 2004.
Basic properties of strong mixing conditions. A survey and some open questions, Probability Surveys, vol.2, pp.107-144, 2005.
On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation, Journal of Functional Analysis, vol.22, issue.4, pp.366-389, 1976.
Geometry of isotropic convex bodies, vol.196, 2014.
Sampling from a logconcave distribution with compact support with proximal Langevin Monte Carlo, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.319-342, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01648665
Convex optimization: Algorithms and complexity, Foundations and Trends in Machine Learning, vol.8, pp.231-357, 2015.
Bayesian inference for exponential random graph models, Social Networks, vol.33, issue.1, pp.41-55, 2011.
PAC-Bayesian supervised classification: the thermodynamics of statistical learning, 2007.
URL: https://hal.archives-ouvertes.fr/hal-00206119
On the theory of variance reduction for stochastic gradient Monte Carlo, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.764-773, 2018.
Convergence of Langevin MCMC in KL-divergence, Proceedings of ALT 2018, 2018.
Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018.
Underdamped Langevin MCMC: A non-asymptotic analysis, Proceedings of the Conference on Learning Theory, 2018.
Minimax estimation of a p-dimensional linear functional in sparse Gaussian models and robust estimation of the mean, 2017.
Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.678-689, 2017.
Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning, vol.72, pp.39-61, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00265651
Theoretical guarantees for approximate sampling from smooth and log-concave densities, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.79, issue.3, pp.651-676, 2017.
User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, Stochastic Processes and their Applications, 2019.
On sampling from a log-concave density using kinetic Langevin diffusions, 2018.
Sparse regression learning by aggregation and Langevin Monte Carlo, COLT 2009, the 22nd Conference on Learning Theory, pp.1-10, 2009.
URL: https://hal.archives-ouvertes.fr/hal-00773553
Estimation paramétrique [Parametric estimation], unpublished lecture notes, 2018.
On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: the linear Fokker-Planck equation, Comm. Pure Appl. Math, vol.54, issue.1, pp.1-42, 2001.
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01565514
Hypocoercivity for linear kinetic equations conserving mass, Trans. Amer. Math. Soc, vol.367, issue.6, pp.3807-3828, 2015.
URL: https://hal.archives-ouvertes.fr/hal-00482286
Quantitative bounds on convergence of time-inhomogeneous Markov chains, Ann. Appl. Probab, vol.14, issue.4, pp.1643-1665, 2004.
Hybrid Monte Carlo, Physics Letters B, vol.195, issue.2, pp.216-222, 1987.
Analysis of Langevin Monte Carlo via convex optimization, 2018.
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01304430
Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, Ann. Appl. Probab, vol.27, issue.3, pp.1551-1587, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01176132
Log-concave sampling: Metropolis-Hastings algorithms are fast!, Proceedings of the 31st Conference on Learning Theory, vol.75, pp.793-797, 2018.
Error bounds for Metropolis-Hastings algorithms applied to perturbations of Gaussian measures in high dimensions, The Annals of Applied Probability, vol.24, issue.1, pp.337-377, 2014.
Couplings and quantitative contraction rates for Langevin dynamics, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01484275
Quantitative contraction rates for Markov chains on general state spaces, Electronic Journal of Probability, vol.24, 2019.
On the convergence of Monte Carlo maximum likelihood calculations, Journal of the Royal Statistical Society. Series B (Methodological), pp.261-274, 1994.
The Wald consistency theorem, 2012.
Constrained Monte Carlo maximum likelihood for dependent data, Journal of the Royal Statistical Society: Series B (Methodological), vol.54, issue.3, pp.657-683, 1992.
Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.63, issue.2, pp.339-355, 2001.
Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.297-304, 2010.
Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics, J. Mach. Learn. Res, vol.13, pp.307-361, 2012.
A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces, Probability Theory and Related Fields, vol.130, pp.415-440, 2004.
Hypoelliptic estimates and spectral theory for Fokker-Planck operators and Witten Laplacians, Lecture Notes in Mathematics, vol.1862, 2005.
Non-convex optimization for machine learning, Foundations and Trends in Machine Learning, vol.10, pp.142-336, 2017.
On the Markov chain central limit theorem, Probab. Surv, vol.1, pp.299-320, 2004.
Recursive computation of the invariant distribution of a diffusion, Bernoulli, vol.8, issue.3, pp.367-405, 2002.
URL: https://hal.archives-ouvertes.fr/hal-00104799
Recursive computation of the invariant distribution of a diffusion: the case of a weakly mean reverting drift, Stoch. Dyn, vol.3, issue.4, pp.435-451, 2003.
The concentration of measure phenomenon, 2001.
Sampling from non-smooth distribution through Langevin diffusion, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01492056
On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods, Statistical Science, vol.30, issue.4, pp.443-467, 2015.
Some PAC-Bayesian theorems, Machine Learning, vol.37, pp.355-363, 1999.
Rates of convergence of the Hastings and Metropolis algorithms, The Annals of Statistics, vol.24, issue.1, pp.101-121, 1996.
Markov chains and stochastic stability, 2012.
A fast and simple algorithm for training neural probabilistic language models, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp.1751-1758, 2012.
An efficient Markov Chain Monte Carlo method for distributions with intractable normalising constants, Biometrika, vol.93, issue.2, pp.451-458, 2006.
MCMC for doubly-intractable distributions, 2012.
Inequalities for the incomplete gamma function, Math. Inequal. Appl, vol.3, issue.1, pp.69-77, 2000.
MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo, vol.2, p.2, 2011.
Dynamical Theories of Brownian Motion, 1967.
Stochastic processes and applications: Diffusion processes, the Fokker-Planck and Langevin equations, vol.60, 2014.
Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions, Ann. Appl. Probab, vol.22, issue.6, pp.2320-2356, 2012.
Bounds for the ratio of two gamma functions: from Wendel's and related inequalities to logarithmically completely monotonic functions, Banach Journal of Mathematical Analysis, vol.6, issue.2, pp.132-158, 2012.
Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.1674-1703, 2017.
Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE, Electron. J. Statist, vol.12, issue.2, pp.3473-3518, 2018.
Optimal scaling of discrete approximations to Langevin diffusions, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.60, issue.1, pp.255-268, 1998.
General state space Markov chains and MCMC algorithms, Probab. Surv, vol.1, pp.20-71, 2004.
Langevin diffusions and Metropolis-Hastings algorithms, Methodol. Comput. Appl. Probab, vol.4, issue.4, pp.337-357, 2002.
Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, vol.2, issue.4, pp.341-363, 1996.
Recent developments in exponential random graph (p*) models for social networks, Social Networks, vol.29, issue.2, pp.192-215, 2007.
Variational analysis, vol.317, 2009.
Deep Boltzmann machines, Artificial Intelligence and Statistics, pp.448-455, 2009.
Log-concavity and strong log-concavity: a review, Statistics Surveys, vol.8, p.45, 2014.
Langevin-type models. I. Diffusions with given stationary distributions and their discretizations, Methodol. Comput. Appl. Probab, vol.1, issue.3, pp.283-306, 1999.
Langevin-type models. II. Self-targeting candidates for MCMC algorithms, Methodol. Comput. Appl. Probab, vol.1, issue.3, pp.307-328, 1999.
Algorithmic Theory of ODEs and Sampling from Well-conditioned Logconcave Densities, arXiv e-prints, 2018.
Importance sampling: a review, Wiley Interdisciplinary Reviews: Computational Statistics, vol.2, issue.1, pp.54-60, 2010.
Principles of risk minimization for learning theory, Advances in Neural Information Processing Systems, pp.831-838, 1992.
Optimal transport: old and new, vol.338, 2008.
Note on the consistency of the Maximum Likelihood Estimate, Ann. Math. Statistics, vol.20, pp.595-601, 1949.
Posterior sampling when the normalizing constant is unknown, Comm. Statist. Simulation Comput, vol.40, issue.5, pp.784-792, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00680015
Markov random field modeling, inference & learning in computer vision & image understanding: A survey, Computer Vision and Image Understanding, vol.117, issue.11, pp.1610-1627, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00858390
The large-sample distribution of the likelihood ratio for testing composite hypotheses, The Annals of Mathematical Statistics, vol.9, issue.1, pp.60-62, 1938.
Langevin diffusions and the Metropolis-adjusted Langevin algorithm, Statist. Probab. Lett, vol.91, pp.14-19, 2014.
Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017.
A hitting time analysis of stochastic gradient Langevin dynamics, Proceedings of the 2017 Conference on Learning Theory, vol.65, pp.1980-2022, 2017.
List of Figures

2.1 Estimates and confidence intervals of the Mean Square Error ratios of MC-MLE (left) and NCE (right), compared to the MLE. The MSE ratio depends both on the variance of the proposal distribution ? and on the number of artificial data points m = ? × n (n = 1000). A log scale is used for both axes.
Estimates and confidence intervals of the Mean Square Error ratios of MC-MLE, compared to NCE. The MSE ratio depends both on the variance of the proposal distribution ? and on the number of artificial data points m = ? × n (n = 1000). A log scale is used for both axes.
Estimates and confidence intervals of the probability of existence of the MC-MLE (left) and NCE (right) estimators. For a fixed n = 1000, the probability of belonging to ? is lower for MC-MLE, especially for small values of the variance of the proposal distribution ? and of the number of artificial data points m = ? × n. A log scale is used for both axes.
Regions, in the plane with coordinates (p/m?², ?), where LMC leads to a smaller error than KLMC (shown in gray).