Exploring weight symmetry in deep neural networks, 2017. ,
Emergence of invariance and disentanglement in deep representations, 2017. ,
Information dropout: Learning optimal representations through noisy computation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. ,
The IM algorithm: a variational approach to information maximization, Advances in Neural Information Processing Systems, vol.16, 2003. ,
Deep variational information bottleneck, 2016. ,
An information-theoretic analysis of deep latent-variable models, vol.5, 2017. ,
Alibaba's AI outguns humans in reading test, 2018. ,
Meta-learning by adjusting priors based on extended pac-bayes theory, International Conference on Machine Learning, pp.205-214, 2018. ,
An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Transactions on Information Theory, vol.18, issue.1, pp.14-20, 1972. ,
Stronger generalization bounds for deep nets via a compression approach, 2018. ,
Tabula rasa: Model transfer for object category detection, Computer Vision (ICCV), 2011 IEEE International Conference on, pp.2252-2259 ,
Do deep nets really need to be deep?, Advances in neural information processing systems, pp.2654-2662, 2014. ,
Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research, vol.18, issue.19, pp.1-53, 2017. ,
Weight quantization in boltzmann machines, Neural Networks, vol.4, issue.3, pp.405-409, 1991. ,
Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003. ,
On the convergence of block coordinate descent type methods, SIAM Journal on Optimization, vol.23, 2013. ,
Structured prediction energy networks, International Conference on Machine Learning, vol.2, pp.983-992, 2016. ,
Marginal inference in mrfs using frank-wolfe, NIPS Workshop on Greedy Optimization ,
MINE: mutual information neural estimation, 2018. ,
Learning a synaptic learning rule, IJCNN-91-Seattle International Joint Conference on Neural Networks, vol.2, p.969, 1991. ,
Learning long-term dependencies with gradient descent is difficult, IEEE transactions on neural networks, vol.5, issue.2, pp.157-166, 1994. ,
Convex neural networks, Advances in neural information processing systems, vol.4, pp.123-130, 2006. ,
Estimating or propagating gradients through stochastic neurons for conditional computation, 2013. ,
Statistical Decision Theory and Bayesian Analysis, vol.6, 1985. ,
Nonserial dynamic programming, vol.2, 1972. ,
Learning feed-forward one-shot learners, Advances in Neural Information Processing Systems, pp.523-531, 2016. ,
Metalearning with differentiable closed-form solvers, ArXiv, pp.6-11, 2018. ,
The method of multipliers for equality constraints, Constrained optimization and Lagrange Multiplier methods ,
Nonlinear programming. Athena scientific, 1999. ,
Mixture density networks, Citeseer, 1994. ,
Computation of channel capacity and rate-distortion functions, IEEE transactions on Information Theory, vol.18, issue.4, pp.460-473, 1972. ,
Learning to localize objects with structured output regression, European conference on computer vision, pp.2-15, 2008. ,
Latent dirichlet allocation, Journal of machine Learning research, vol.3, pp.993-1022, 2003. ,
Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017. ,
Learnability and the vapnik-chervonenkis dimension, Journal of the ACM (JACM), vol.36, issue.4, pp.929-965, 1989. ,
Weight uncertainty in neural networks, 2015. ,
First order methods beyond convexity and lipschitz gradient continuity with applications to quadratic inverse problems, SIAM Journal on Optimization, vol.28, issue.3, pp.2131-2151, 2018. ,
Sharesnet: reducing residual network parameter number by sharing weights, Proceedings of the International Conference on Learning Representations, 2017. ,
Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, vol.3, pp.1-122, 2011. ,
Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.11, pp.1222-1239, 2001. ,
Invariant scattering convolution networks, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.1872-1886, 2013. ,
Getting started with GANs part 2: Colorful MNIST. ,
An analysis of deep neural network models for practical applications, 2016. ,
Domain generalization by solving jigsaw puzzles, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6-11, 2019. ,
Model compression as constrained optimization, with application to neural nets, 2017. ,
Learning many related tasks at the same time with backpropagation, Advances in neural information processing systems, pp.657-664, 1995. ,
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks, 2018 Information Theory and Applications Workshop (ITA), pp.1-10, 2018. ,
The direct extension of admm for multi-block convex minimization problems is not necessarily convergent, Mathematical Programming, vol.155, issue.1-2, pp.57-79, 2016. ,
Coupled end-to-end transfer learning with generalized Fisher information, Computer Vision and Pattern Recognition, 2018. ,
A closer look at few-shot classification, 2019. ,
Compressing neural networks with the hashing trick, International Conference on Machine Learning, pp.2285-2294, 2015. ,
Compressing convolutional neural networks in the frequency domain, Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp.1475-1484, 2016. ,
On the properties of neural machine translation: Encoder-decoder approaches, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), vol.8, 2014. ,
Fast and scalable structural svm with slack rescaling, Artificial Intelligence and Statistics, pp.667-675, 2016. ,
Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms, Proceedings of the ACL-02 conference on Empirical methods in natural language processing, vol.10, pp.1-8, 2002. ,
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks, JMLR, vol.9, pp.1775-1822, 2008. ,
Binaryconnect: Training deep neural networks with binary weights during propagations, Advances in neural information processing systems, pp.3123-3131, 2015. ,
Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1, 2016.
Elements of information theory ,
Probability, frequency and reasonable expectation, American journal of physics, vol.14, issue.1, pp.1-13, 1946. ,
Online passive-aggressive algorithms, Journal of Machine Learning Research, vol.7, pp.551-585, 2006. ,
Approximation by superpositions of a sigmoidal function, Mathematics of control, signals and systems, vol.2, issue.4, pp.303-314, 1989. ,
Compressing neural networks using the variational information bottleneck, 2018.
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, NIPS, pp.1646-1654, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016843
Imagenet: A large-scale hierarchical image database, Computer Vision and Pattern Recognition, pp.248-255, 2009. ,
Predicting parameters in deep learning, Advances in Neural Information Processing Systems, vol.26, pp.2148-2156, 2013. ,
Exploiting linear structure within convolutional networks for efficient evaluation, Advances in Neural Information Processing Systems, vol.27, pp.1269-1277, 2014. ,
BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.
First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.146, issue.1-2, pp.37-75 ,
Sharp minima can generalize for deep nets, 2017. ,
Flownet: Learning optical flow with convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, vol.7, pp.2758-2766, 2015. ,
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011. ,
Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data, 2017. ,
Depth map prediction from a single image using a multi-scale deep network, Advances in neural information processing systems, vol.7, pp.2366-2374, 2014. ,
Weight discretization paradigm for optical neural networks, Optical interconnections and networks, vol.1281, pp.164-174, 1990. ,
Training structural SVMs when exact inference is intractable, International Conference on Machine Learning (ICML), pp.304-311, 2008. ,
Model-agnostic meta-learning for fast adaptation of deep networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1126-1135, 2017. ,
An algorithm for quadratic programming, Naval research logistics quarterly, vol.3, issue.1-2, pp.95-110, 1956. ,
Born again neural networks, International Conference on Machine Learning, 2018.
Information theoretical properties of Tsallis entropies, Journal of Mathematical Physics, vol.47, issue.2, p.23302, 2006. ,
Conditional neural processes, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.1704-1713 ,
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.6, pp.721-741, 1984. ,
Pacbayesian theory meets bayesian inference, Advances in Neural Information Processing Systems, pp.1884-1892 ,
URL : https://hal.archives-ouvertes.fr/hal-01324072
Dynamic few-shot visual learning without forgetting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4367-4375, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01829985
Unsupervised representation learning by predicting image rotations, International Conference on Learning Representations, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01832768
Boosting few-shot visual learning with self-supervision, 2019. ,
Frank-wolfe splitting via augmented lagrangian method, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, vol.84, pp.1456-1465, 2018. ,
Fast r-cnn, Proceedings of the IEEE international conference on computer vision, vol.7, pp.1440-1448, 2015. ,
Deep neural networks with random gaussian weights: A universal classification strategy?, IEEE Trans. Signal Processing, vol.64, issue.13, pp.3444-3457, 2016. ,
Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations, NIPS, 2007. ,
Convergent propagation algorithms via oriented trees, UAI, pp.133-140, 2007. ,
Evolving modular fast-weight networks for control, International Conference on Artificial Neural Networks, pp.383-389, 2005. ,
Compressing deep convolutional networks using vector quantization, 2014. ,
Some history of the hierarchical bayesian methodology. Trabajos de estadística y de investigación operativa, vol.31, p.489, 1980. ,
Decision-theoretic meta-learning: Versatile and efficient amortization of few-shot learning, 2018. ,
High-performance implementation of the level-3 blas, ACM Trans. Math. Softw, vol.35, issue.1, 2008. ,
Accurate, large minibatch SGD: training ImageNet in 1 hour, 2017.
Recasting gradient-based meta-learning as hierarchical bayes, 2018. ,
Practical variational inference for neural networks, Advances in neural information processing systems, pp.2348-2356, 2011. ,
Highway and residual networks learn unrolled iterative estimation, Proceedings of the International Conference on Learning Representations, vol.8, 2017. ,
A tutorial introduction to the minimum description length principle ,
Dynamic network surgery for efficient dnns, Advances In Neural Information Processing Systems, pp.1379-1387, 2016. ,
Global optimality in neural network training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol.4, pp.7331-7339, 2017. ,
Learning both weights and connections for efficient neural network, Advances in neural information processing systems, pp.1135-1143 ,
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, International Conference on Learning Representations (ICLR), 2016. ,
DSD: Regularizing deep neural networks with dense-sparse-dense training flow, International Conference on Learning Representations, 2017.
Nearly-tight vc-dimension bounds for piecewise linear neural networks, Conference on Learning Theory, pp.1064-1068 ,
Optimal brain surgeon and general network pruning, IEEE International Conference on, pp.293-299, 1993. ,
Monte carlo sampling methods using markov chains and their applications, Biometrika, vol.57, issue.1, pp.97-109, 1970. ,
Artificial intelligence: The very idea, 1989. ,
A primal-dual message-passing algorithm for approximated large scale structured prediction, NIPS, pp.838-846, 2010. ,
Norm-product belief propagation: Primal-dual message-passing for approximate inference, IEEE Transactions on Information Theory, vol.56, issue.12, pp.6294-6316, 2010. ,
Direct loss minimization for structured prediction, Advances in Neural Information Processing Systems, pp.1594-1602, 2010. ,
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, Proceedings of the IEEE international conference on computer vision, vol.6, pp.1026-1034, 2015. ,
Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016. ,
Convexity arguments for efficient minimization of the bethe and kikuchi free energies, J. Artif. Intell. Res.(JAIR), vol.26, pp.153-190, 2006. ,
Deep belief networks, Scholarpedia, vol.4, p.5947, 2009.
Distilling the knowledge in a neural network, 2015. ,
Keeping the neural networks simple by minimizing the description length of the weights, Proceedings of the sixth annual conference on Computational learning theory, pp.5-13, 1993. ,
Improving neural networks by preventing co-adaptation of feature detectors, 2012. ,
Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997. ,
Flat minima, Neural Computation, vol.9, issue.1, pp.1-42, 1997. ,
On the linear convergence of the alternating direction method of multipliers, Mathematical Programming, vol.162, issue.1-2, pp.165-199, 2017. ,
A block successive upper bound minimization method of multipliers for linearly constrained convex optimization, 2014. ,
Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017. ,
Like what you like: Knowledge distill via neuron selectivity transfer, 2017. ,
Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, vol.18, issue.1, pp.6869-6898, 2017. ,
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, 2016. ,
Level-3 blas on a GPU : Picking the low hanging fruit. FLAME working note #37, 2009. ,
Tying word vectors and word classifiers: A loss framework for language modeling, vol.8, 2016. ,
Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.448-456, 2015. ,
Speeding up convolutional neural networks with low rank expansions, Proceedings of the British Machine Vision Conference (BMVC), 2014. ,
Decoupled neural interfaces using synthetic gradients, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.1627-1635 ,
Information theory and statistical mechanics, Physical review, vol.106, issue.4, p.620, 1957. ,
Probability theory: the logic of science, 1996. ,
Theory of Probability, The Clarendon Press, 1939.
Dynamic filter networks, Advances in Neural Information Processing Systems, pp.667-675, 2016. ,
Flattened convolutional neural networks for feedforward acceleration, 2014. ,
Cutting-plane training of structural svms, Machine Learning, vol.77, pp.27-59, 2009. ,
Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches, 2002. ,
Accelerating stochastic gradient descent using predictive variance reduction, Advances in neural information processing systems, pp.315-323, 2013. ,
Artificial intelligence: The revolution hasn't happened yet, 2019. ,
An introduction to variational methods for graphical models, Machine learning, vol.37, issue.2, pp.183-233, 1999. ,
A comparative study of modern inference techniques for discrete energy minimization problems, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1328-1335 ,
URL : https://hal.archives-ouvertes.fr/hal-00865699
Deep learning without poor local minima, Advances in Neural Information Processing Systems, vol.4, pp.586-594, 2016. ,
The cutting-plane method for solving convex programs, Journal of the society for Industrial and Applied Mathematics, vol.8, issue.4, pp.703-712, 1960. ,
On large-batch training for deep learning: Generalization gap and sharp minima, 2016. ,
A treatise on probability, Courier Corporation, 1921. ,
Attentive neural processes, International Conference on Learning Representations, 2019.
Sequence-level knowledge distillation, EMNLP, vol.8, 2016. ,
Character-aware neural language models, AAAI, vol.8, 2016. ,
Adam: A method for stochastic optimization, 2014. ,
Auto-encoding variational bayes, 2013. ,
Variational dropout and the local reparameterization trick, Advances in Neural Information Processing Systems, pp.2575-2583, 2015. ,
Improved variational inference with inverse autoregressive flow, Advances in Neural Information Processing Systems, vol.7, pp.4743-4751, 2016. ,
Overcoming catastrophic forgetting in neural networks, Proceedings of the national academy of sciences, pp.201611835-2017 ,
Probabilistic graphical models: principles and techniques, 2009. ,
Foundations of the Theory of Probability, 1933. ,
Rademacher penalties and structural risk minimization, IEEE Transactions on Information Theory, vol.47, issue.5, pp.1902-1914, 2001. ,
MRF optimization via dual decomposition: Message-passing revisited, ICCV, pp.1-8, 2007. ,
Efficient training for pairwise or higher order CRFs via dual decomposition, Computer Vision and Pattern Recognition (CVPR), pp.1841-1848, 2011. ,
Structured prediction models via the matrix-tree theorem, Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp.141-150, 2007. ,
Barrier Frank-Wolfe for marginal inference, Advances in Neural Information Processing Systems, pp.532-540, 2015. ,
Imagenet classification with deep convolutional neural networks, NIPS, 2012. ,
Learning multiple layers of features from tiny images, 2009. ,
Factor graphs and the sum-product algorithm, IEEE Transactions on information theory, vol.47, issue.2, pp.498-519, 2001. ,
, Population empirical bayes, 2014.
Structured learning with approximate inference, Advances in Neural Information Processing Systems, pp.785-792, 2007. ,
On gradients of functions definable in o-minimal structures, Annales de l'institut Fourier, pp.769-783, 1998. ,
Block-coordinate Frank-Wolfe optimization for structural SVMs, ICML, pp.53-61, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00720158
Conditional random fields: Probabilistic models for segmenting and labeling sequence data, International Conference on Machine Learning, 2001. ,
Iteration-complexity of first-order augmented Lagrangian methods for convex programming, Mathematical Programming, vol.155, issue.1-2, pp.511-547 ,
Machine learning systems design, vol.8, 2019. ,
Adaptive stochastic dual coordinate ascent for conditional random fields, 2018. ,
Speeding-up convolutional neural networks using fine-tuned cp-decomposition, International Conference on Learning Representations (ICLR), 2016. ,
Optimal brain damage, Advances in Neural Information Processing Systems, vol.2, pp.598-605, 1990. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998. ,
Deep learning. nature, vol.521, p.436, 2015. ,
Metalearning with differentiable convex optimization, CVPR, 2019. ,
Measuring the intrinsic dimension of objective landscapes, 2018. ,
Deeper, broader and artier domain generalization, Proceedings of the IEEE International Conference on Computer Vision, pp.5542-5550 ,
Visualizing the loss landscape of neural nets, 2017.
Finding Task-Relevant Features for Few-Shot Learning by Category Traversal, CVPR, pp.6-11, 2019. ,
Meta-sgd: Learning to learn quickly for few-shot learning, vol.6, 2017. ,
Learning without forgetting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. ,
QuickeNing: A generic quasi-Newton algorithm for faster gradient-based optimization ,
Network in network, 2013.
How far can we go without convolution: Improving fully-connected networks, 2015. ,
An application of the principle of maximum information preservation to linear systems, Advances in neural information processing systems, pp.186-194, 1989. ,
Learning to propagate labels: Transductive propagation network for few-shot learning, 2018. ,
Learning efficient convolutional networks through network slimming, 2017 IEEE International Conference on, pp.2755-2763 ,
Sur la géométrie semi-et sous-analytique, Ann. Inst. Fourier, vol.43, issue.5, pp.1575-1595, 1993. ,
The benefits of learning with strongly convex approximate inference, ICML, pp.410-418, 2015. ,
Bayesian compression for deep learning, Advances in Neural Information Processing Systems, pp.3288-3298 ,
Depth creates no bad local minima, vol.4, 2017. ,
Shufflenet V2: practical guidelines for efficient CNN architecture design, Computer Vision and Pattern Recognition (CVPR), pp.122-138, 2018. ,
Networks of spiking neurons: the third generation of neural network models, Neural networks, vol.10, issue.9, pp.1659-1671, 1997. ,
Stochastic gradient descent as approximate bayesian inference, The Journal of Machine Learning Research, vol.18, issue.1, pp.4873-4907 ,
Building a large annotated corpus of english: The penn treebank, Computational Linguistics, vol.19, pp.313-330, 1993. ,
AD3: Alternating directions dual decomposition for MAP inference in graphical models, JMLR, vol.16, issue.2, pp.495-545, 2015. ,
Fast training of convolutional networks through ffts, International Conference on Learning Representations, 2014. ,
Pac-bayesian stochastic model selection, Machine Learning, vol.51, pp.5-21, 2003. ,
Programs with common sense. RLE and MIT computation center, 1960. ,
A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics, vol.5, issue.4, pp.115-133, 1943. ,
Pointer sentinel mixture models ,
Convexifying the bethe free energy, UAI, 2009. ,
Convergence rate analysis of MAP coordinate minimization algorithms, NIPS, 2012. ,
Smooth and strong: MAP inference with linear convergence, NIPS, pp.298-306 ,
Efficient training of structured SVMs via soft constraints, AISTATS, pp.699-707 ,
Learning efficiently with approximate inference via dual losses, ICML, pp.783-790, 2010. ,
Train and test tightness of lp relaxations in structured prediction, Journal of Machine Learning Research, vol.20, issue.13, pp.1-34 ,
Equation of state calculations by fast computing machines. The journal of chemical physics, vol.21, pp.1087-1092, 1953. ,
Discriminative models, not discriminative training ,
Logical versus analogical or symbolic versus connectionist or neat versus scruffy, AI magazine, vol.12, issue.2, pp.34-34, 1991. ,
A simple neural attentive meta-learner, 2017. ,
Machine Learning: A Probabilistic Perspective, 2012.
Optimizing symmetric dense matrix-vector multiplication on gpus, High Performance Computing, Networking, Storage and Analysis (SC), pp.11-2011 ,
Smooth minimization of non-smooth functions, Mathematical Programming, vol.103, issue.1, pp.127-152, 2002. ,
Introductory lectures on convex optimization: A basic course, vol.87, p.4, 2013. ,
Gps, a program that simulates human thought, 1961. ,
Exploring generalization in deep learning, Advances in Neural Information Processing Systems, pp.5947-5956 ,
Towards understanding the role of over-parametrization in generalization of neural networks, 2018. ,
On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, Advances in neural information processing systems, pp.841-848, 2002. ,
Optimization landscape and expressivity of deep cnns, International Conference on Machine Learning, vol.4, pp.3727-3736, 2018. ,
On first-order meta-learning algorithms, 2018. ,
Sensitivity and generalization in neural networks: an empirical study, 2018.
Simplifying neural networks by soft weightsharing, Neural Comput, vol.4, issue.4, pp.473-493, 1992. ,
Structured learning and prediction in computer vision, Foundations and Trends® in Computer Graphics and Vision, vol.6, issue.3-4, pp.185-365, 2011. ,
Coordinate descent converges faster with the Gauss-Southwell rule than random selection, ICML, vol.3, pp.1632-1641, 2015. ,
Necessary and sufficient conditions for existence of the lu factorization of an arbitrary matrix, 2005. ,
WaveNet: A generative model for raw audio, 2016.
OpenAI Dota 2 1v1 bot, 2017. ,
Tadam: Task dependent adaptive metric for improved few-shot learning, Advances in Neural Information Processing Systems (NIPS), 2018. ,
Skip connections eliminate singularities, 2017.
A survey on transfer learning, IEEE Transactions on knowledge and data engineering, vol.22, issue.10, pp.1345-1359, 2010. ,
Statistical field theory, 1988. ,
Bayesian networks: A model of self-activated memory for evidential reasoning, Proc. of Cognitive Science Society (CSS-7), 1985. ,
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988. ,
Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018. ,
A mean field theory learning algorithm for neural networks, Complex systems, vol.1, pp.995-1019, 1987. ,
Spanning tree approximations for conditional random fields, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.408-415, 2009. ,
Entropy and margin maximization for structured output learning, ECML, pp.83-98, 2010. ,
Discriminability-based transfer between neural networks, Advances in neural information processing systems, vol.4, pp.204-211, 1993. ,
Few-shot image recognition by predicting parameters from activations, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6-11, 2018. ,
Recognizing indoor scenes, Computer Vision and Pattern Recognition, pp.413-420, 2009. ,
On the expressive power of deep neural networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2847-2854 ,
Optimizing nondecomposable loss functions in structured prediction, IEEE transactions on pattern analysis and machine intelligence, vol.35, pp.911-924, 2013. ,
Semi-supervised learning with ladder networks, Advances in Neural Information Processing Systems, pp.3546-3554, 2015. ,
Xnor-net: Imagenet classification using binary convolutional neural networks, European Conference on Computer Vision, pp.525-542, 2016. ,
Amortized bayesian meta-learning, International Conference on Learning Representations (ICLR), 2018. ,
Optimization as a model for few-shot learning, International Conference on Learning Representation, vol.6, 2016. ,
Quadratic programming relaxations for metric labeling and Markov random field MAP estimation, Proceedings of the 23rd International Conference on Machine Learning, pp.737-744, 2006. ,
Message-passing for graph-structured linear programs: Proximal methods and rounding schemes, Journal of Machine Learning Research, vol.11, pp.1043-1080, 2010. ,
Stochastic backpropagation and approximate inference in deep generative models, vol.6, 2014. ,
Markov logic networks, Machine learning, vol.62, issue.1-2, pp.107-136, 2006. ,
Stochastic reformulations of linear systems: algorithms and convergence theory, 2017.
Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978. ,
An empirical bayes approach to statistics, Herbert Robbins Selected Papers, pp.41-47, 1985. ,
A stochastic approximation method. The annals of mathematical statistics, pp.400-407, 1951. ,
Monte Carlo statistical methods, 2013. ,
FitNets: Hints for thin deep nets, 2014.
Learning structured models with the AUC loss and its generalizations, Artificial Intelligence and Statistics, pp.841-849, 2014.
A stochastic gradient method with an exponential convergence rate for finite training sets, NIPS, pp.2663-2671, 2012.
Learning representations by back-propagating errors, Nature, vol.323, issue.6088, p.533, 1986.
ImageNet large scale visual recognition challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.
Artificial Intelligence: A Modern Approach, 2009.
Meta-learning with latent embedding optimization, International Conference on Learning Representations (ICLR), 2019.
Empirical analysis of the Hessian of over-parametrized neural networks, 2017.
Deep Boltzmann machines, Artificial Intelligence and Statistics, pp.448-455, 2009.
A simple neural network module for relational reasoning, NIPS, 2017.
Few-shot learning with graph neural networks, ArXiv, abs/1711.04043, 2017.
A study of Nesterov's scheme for Lagrangian decomposition and MAP labeling, CVPR, pp.1817-1823, 2011.
On the information bottleneck theory of deep learning, International Conference on Learning Representations (ICLR), 2018.
Unsupervised pre-training across image domains improves lung tissue classification, International MICCAI Workshop on Medical Computer Vision, pp.82-93, 2014.
Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook, 1987.
Learning to control fast-weight memories: An alternative to dynamic recurrent networks, Neural Computation, vol.4, issue.1, pp.131-139, 1992.
Deep learning in neural networks: An overview, Neural Networks, vol.61, pp.85-117, 2015.
Convergence rates of inexact proximal-gradient methods for convex optimization, NIPS, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00618152
Non-uniform stochastic average gradient method for training conditional random fields, AIStats, pp.819-828, 2015.
Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, ICML, pp.64-72, 2014.
Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, Mathematical Programming, vol.155, pp.105-145, 2016.
Pegasos: Primal estimated sub-gradient solver for SVM, Mathematical Programming, vol.127, pp.3-30, 2011.
A mathematical theory of communication, Bell System Technical Journal, vol.27, issue.3, pp.379-423, 1948.
TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation, European Conference on Computer Vision, pp.1-15, 2006.
Mastering the game of Go without human knowledge, Nature, vol.550, issue.7676, pp.354-359, 2017.
Very deep convolutional networks for large-scale image recognition, ICLR, 2015.
Prototypical networks for few-shot learning, Advances in Neural Information Processing Systems, pp.4077-4087, 2017.
Alesis Novik, Abinash Panda, Evangelos Anagnostopoulos, Liang Pang, Alex Binder, serialhex, and Björn Esser. shogun-toolbox/shogun: Shogun 6.1.0, 2017.
Tree block coordinate descent for MAP in graphical models, Artificial Intelligence and Statistics, pp.544-551, 2009.
Tightening LP relaxations for MAP using message passing, UAI, pp.503-510, 2008.
Introduction to dual decomposition for inference, Optimization for Machine Learning, 2011.
No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016.
Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2017.
Striving for simplicity: The all convolutional net, 2014.
Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.
Training very deep networks, Advances in Neural Information Processing Systems, vol.28, pp.2377-2385, 2015.
Conditional Markov processes, Non-linear Transformations of Stochastic Processes, pp.427-453, 1965.
Deep learning face representation from predicting 10,000 classes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1891-1898, 2014.
Learning to compare: Relation network for few-shot learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9, 2015.
Rethinking the inception architecture for computer vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2818-2826, 2016.
Bethe learning of graphical models via MAP decoding, AIStats, pp.1096-1104, 2016.
Max-margin Markov networks, Proceedings of the 16th International Conference on Neural Information Processing Systems, pp.25-32, 2003.
Bayesian uncertainty estimation for batch normalized deep networks, International Conference on Machine Learning (ICML), 2018.
Learning to learn: Introduction and overview, Learning to Learn, pp.3-17, 1998.
Deep learning and the information bottleneck principle, Information Theory Workshop (ITW), pp.1-5, 2015.
The information bottleneck method, 1999.
VAE with a VampPrior, 2017.
DeepPose: Human pose estimation via deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1653-1660, 2014.
Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.
Soft weight-sharing for neural network compression, International Conference on Learning Representations (ICLR), 2017.
Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, and Rich Caruana. Do deep convolutional nets really need to be deep and convolutional? In ICLR, 2017.
A theory of the learnable, Proceedings of the sixteenth annual ACM Symposium on Theory of Computing, pp.436-445, 1984.
Measuring the VC-dimension of a learning machine, Neural Computation, vol.6, issue.5, pp.851-876, 1994.
, Alexandros Kouris, and Christos-Savvas Bouganis. Toolflows for mapping convolutional neural networks on fpgas: A survey and future directions, vol.51, p.56, 2018.
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.
Matching networks for one shot learning, Advances in Neural Information Processing Systems, vol.29, pp.3630-3638, 2016.
Estimating the wrong graphical model: Benefits in the computation-limited setting, JMLR, vol.7, pp.1829-1859, 2006.
Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning, vol.1, pp.1-305, 2008.
MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches, IEEE Transactions on Information Theory, 2005.
A new class of upper bounds on the log partition function, IEEE Transactions on Information Theory, vol.51, issue.7, pp.2313-2335, 2005.
Predicting the future: A connectionist approach, International Journal of Neural Systems, vol.1, issue.3, pp.193-209, 1990.
Caltech-UCSD Birds 200, 2010.
Intelligence per kilowatt-hour, 2018.
Do we still need models or just more data and compute?, 2019.
Beyond regression: New tools for prediction and analysis in the behavioral sciences, 1974.
A linear programming approach to max-sum problem: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.7, pp.1165-1179, 2007.
No spurious local minima in a two hidden unit ReLU network, International Conference on Learning Representations Workshop, 2018.
Towards understanding generalization of deep learning: Perspective of loss landscapes, 2017.
Training and inference with integers in deep neural networks, 2018.
Aggregated residual transformations for deep neural networks, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp.5987-5995, 2017.
Information-theoretic limitations of distributed information processing, 2016.
Information-theoretic analysis of generalization capability of learning algorithms, Advances in Neural Information Processing Systems, pp.2524-2533, 2017.
Self-supervised domain adaptation for computer vision tasks, IEEE Access, vol.7, pp.156694-156706, 2019.
Cross-domain video concept detection using adaptive SVMs, Proceedings of the 15th ACM International Conference on Multimedia, pp.188-197, 2007.
Deep fried convnets, Proceedings of the IEEE International Conference on Computer Vision, pp.1476-1483, 2015.
Linear programming relaxations and belief propagation: An empirical study, Journal of Machine Learning Research, vol.7, pp.1887-1907, 2006.
Understanding belief propagation and its generalizations, vol.8, pp.236-239, 2001.
Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory, vol.51, issue.7, pp.2282-2312, 2005.
Dual decomposed learning with factorwise oracle for structural SVM of large output domain, NIPS, pp.5024-5032, 2016.
A gift from knowledge distillation: Fast optimization, network minimization and transfer learning, Computer Vision and Pattern Recognition, pp.4133-4141, 2017.
How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems, pp.3320-3328, 2014.
The Lovász hinge: A novel convex surrogate for submodular losses, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
Wide residual networks, BMVC, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503
Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832769
Recurrent neural network regularization, 2014.
Understanding deep learning requires rethinking generalization, 2016.
On the convergence rate of stochastic mirror descent for nonsmooth nonconvex optimization, 2018.
ShuffleNet: An extremely efficient convolutional neural network for mobile devices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6848-6856, 2018.
Theoretical properties for neural networks with weight matrices of low displacement rank, International Conference on Machine Learning, pp.4082-4090, 2017.
Compressibility and generalization in large-scale deep learning, 2018.
Semi-supervised learning using Gaussian fields and harmonic functions, Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp.912-919, 2003.
Learning transferable architectures for scalable image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.8697-8710, 2018.