[Table rows: Cosine Classifier & Avg. Weight Gen; Cosine Classifier & Att. Weight Gen; Dot Product & Avg. Weight Gen (60.30 ± 0); Ablations: Cosine w/ ReLU & Avg. Weight Gen; Cosine Classifier & Att. Weight Gen]

Bibliography

P. Agrawal, J. Carreira, and J. Malik, Learning to see by moving, Proceedings of the IEEE International Conference on Computer Vision, pp.37-45, 2015.

B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image windows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.11, pp.2189-2202, 2012.

M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau et al., Learning to learn by gradient descent by gradient descent, Advances in Neural Information Processing Systems, pp.3981-3989, 2016.

P. Arbeláez, J. Pont-tuset, J. Barron, F. Marques, and J. Malik, Multiscale combinatorial grouping, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.328-335, 2014.

S. Bell, L. Zitnick, K. Bala, and R. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2874-2883, 2016.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems, pp.153-160, 2007.

P. Bojanowski and A. Joulin, Unsupervised learning by predicting noise, Proceedings of the 34th International Conference on Machine Learning, ICML, pp.517-526, 2017.

J. C. Caicedo and S. Lazebnik, Active object localization with deep reinforcement learning, Proceedings of the IEEE International Conference on Computer Vision, pp.2488-2496, 2015.

J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik, Human pose estimation with iterative error feedback, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4733-4742, 2016.

N. Chavali, H. Agrawal, A. Mahendru, and D. Batra, Object-proposal evaluation protocol is 'gameable', Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.835-844, 2016.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs, International Conference on Learning Representations, 2015.

L. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, Learning deep structured models, Proceedings of the 32nd International Conference on Machine Learning, ICML, pp.1785-1794, 2015.

X. Chen, H. Ma, X. Wang, and Z. Zhao, Improving object proposals with multi-thresholding straddling expansion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2587-2595, 2015.

M. Cheng, Z. Zhang, W. Lin, and P. Torr, Bing: Binarized normed gradients for objectness estimation at 300fps, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3286-3293, 2014.

R. G. Cinbis, J. Verbeek, and C. Schmid, Weakly supervised object localization with multi-fold multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, pp.189-203, 2017.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The cityscapes dataset for semantic urban scene understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3213-3223, 2016.

J. Dai, K. He, Y. Li, S. Ren, and J. Sun, Instance-sensitive fully convolutional networks, European Conference on Computer Vision, pp.534-549, 2016.

J. Dai, K. He, and J. Sun, Convolutional feature masking for joint object and stuff segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3992-4000, 2015.

J. Dai, K. He, and J. Sun, Instance-aware semantic segmentation via multi-task network cascades, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3150-3158, 2016.

J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang et al., Deformable convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, pp.764-773, 2017.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.886-893, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00548512

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., Imagenet: A large-scale hierarchical image database, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009.

C. Doersch, A. Gupta, and A. A. Efros, Unsupervised visual representation learning by context prediction, Proceedings of the IEEE International Conference on Computer Vision, pp.1422-1430, 2015.

C. Doersch and A. Zisserman, Multi-task self-supervised visual learning, Proceedings of the IEEE International Conference on Computer Vision, pp.2051-2060, 2017.

J. Donahue, P. Krähenbühl, and T. Darrell, Adversarial feature learning, International Conference on Learning Representations, 2017.

J. Dong, Q. Chen, S. Yan, and A. Yuille, Towards unified object detection and semantic segmentation, European Conference on Computer Vision, pp.299-314, 2014.

A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox, Discriminative unsupervised feature learning with convolutional neural networks, Advances in Neural Information Processing Systems, pp.766-774, 2014.

D. Eigen and R. Fergus, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, Proceedings of the IEEE International Conference on Computer Vision, pp.2650-2658, 2015.

N. Einecke and J. Eggert, A multi-block-matching approach for stereo, IEEE Intelligent Vehicles Symposium (IV), pp.585-592, 2015.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The pascal visual object classes (voc) challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The pascal visual object classes challenge, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00548597

M. Everingham, L. Van-gool, C. K. Williams, J. Winn, and A. Zisserman, The pascal visual object classes challenge, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00548597

C. Farabet, C. Couprie, L. Najman, and Y. Lecun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1915-1929, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00742077

P. F. Felzenszwalb, R. B. Girshick, D. Mcallester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.

C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, Proceedings of the 34th International Conference on Machine Learning, ICML, pp.1126-1135, 2017.

P. Fischer, A. Dosovitskiy, E. Ilg, P. Häusser, C. Hazırbaş et al., Flownet: Learning optical flow with convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, pp.2758-2766, 2015.

V. Garcia and J. Bruna, Few-shot learning with graph neural networks, International Conference on Learning Representations, 2018.

A. Ghodrati, A. Diba, M. Pedersoli, T. Tuytelaars, and L. Van-gool, Deepproposal: Hunting objects by cascading deep convolutional layers, Proceedings of the IEEE International Conference on Computer Vision, pp.2578-2586, 2015.

S. Gidaris and N. Komodakis, Object detection via a multi-region & semantic segmentation-aware cnn model, Proceedings of the IEEE International Conference on Computer Vision, pp.1134-1142, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01245664

S. Gidaris and N. Komodakis, Locnet: Improving localization accuracy for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.789-798, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832507

R. Girshick, Fast r-cnn, Proceedings of the IEEE International Conference on Computer Vision, pp.1440-1448, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.580-587, 2014.

A. Gonzalez-garcia, A. Vezhnevets, and V. Ferrari, An active search strategy for efficient object class detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3022-3031, 2015.

I. Goodfellow, J. Pouget-abadie, M. Mirza, B. Xu, D. Warde-farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch sgd: Training imagenet in 1 hour, 2017.

F. Guney and A. Geiger, Displets: Resolving stereo ambiguities using object knowledge, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4165-4175, 2015.

J. Guo and S. Gould, Deep cnn ensemble with data augmentation for object detection, 2015.

S. Gupta and J. Malik, Visual semantic role labeling, 2015.

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Simultaneous detection and segmentation, European Conference on Computer Vision, pp.297-312, 2014.

B. Hariharan and R. Girshick, Low-shot visual recognition by shrinking and hallucinating features, Proceedings of the IEEE International Conference on Computer Vision, pp.3037-3046, 2017.

M. Havaei, A. Davy, D. Warde-farley, A. Biard, A. Courville et al., Brain tumor segmentation with deep neural networks, Medical Image Analysis, 2016.

Z. Hayder, X. He, and M. Salzmann, Learning to co-generate object proposals with a deep structured network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2565-2573, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, Proceedings of the IEEE International Conference on Computer Vision, pp.1026-1034, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.9, pp.1904-1916, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.

G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, 2006.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

E. Hoffer and N. Ailon, Deep metric learning using triplet network, International Workshop on Similarity-Based Pattern Recognition, pp.84-92, 2015.

D. Hoiem, Y. Chodpathumwan, and Q. Dai, Diagnosing error in object detectors, European Conference on Computer Vision, pp.340-353, 2012.

B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, vol.17, issue.1-3, pp.185-203, 1981.

J. Hosang, R. Benenson, and B. Schiele, How good are detection proposals, really?, Proceedings of the British Machine Vision Conference, 2014.

J. Hosang, R. Benenson, P. Dollár, and B. Schiele, What makes for effective detection proposals?, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.38, pp.814-830, 2016.

M. Ranzato, F. J. Huang, Y. Boureau, and Y. Lecun, Unsupervised learning of invariant feature hierarchies with applications to object recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2007.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, ICML, pp.448-456, 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional architecture for fast feature embedding, Proceedings of the ACM International Conference on Multimedia, 2014.

Ł. Kaiser, O. Nachum, A. Roy, and S. Bengio, Learning to remember rare events, International Conference on Learning Representations, 2017.

V. Kantorov, M. Oquab, M. Cho, and I. Laptev, Contextlocnet: Context-aware deep network models for weakly supervised localization, European Conference on Computer Vision, pp.350-365, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01421772

A. Karpathy and L. Fei-fei, Deep visual-semantic alignments for generating image descriptions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3128-3137, 2015.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

G. Koch, R. Zemel, and R. Salakhutdinov, Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop, vol.2, 2015.

P. Krähenbühl and V. Koltun, Efficient inference in fully connected crfs with gaussian edge potentials, Advances in Neural Information Processing Systems, pp.109-117, 2011.

P. Krähenbühl, C. Doersch, J. Donahue, and T. Darrell, Data-dependent initializations of convolutional neural networks, International Conference on Learning Representations, 2016.

P. Krähenbühl and V. Koltun, Geodesic object proposals, European Conference on Computer Vision, pp.725-739, 2014.

P. Krähenbühl and V. Koltun, Learning to propose objects, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1574-1582, 2015.

A. Krizhevsky and G. Hinton, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

W. Kuo, B. Hariharan, and J. Malik, Deepbox: Learning objectness with convolutional networks, Proceedings of the IEEE International Conference on Computer Vision, pp.2479-2487, 2015.

J. Lafferty, A. Mccallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, ICML, pp.282-289, 2001.

G. Larsson, M. Maire, and G. Shakhnarovich, Learning representations for automatic colorization, European Conference on Computer Vision, pp.577-593, 2016.

G. Larsson, M. Maire, and G. Shakhnarovich, Colorization as a proxy task for visual understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6874-6883, 2017.

Y. Lecun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation applied to handwritten zip code recognition, Neural Computation, 1989.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

H. Lee, R. Grosse, R. Ranganath, and A. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th International Conference on Machine Learning, ICML, pp.609-616, 2009.

K. Lenc and A. Vedaldi, R-cnn minus r, Proceedings of the British Machine Vision Conference, 2015.

M. Leordeanu, A. Radu, and R. Sukthankar, Features in concert: Discriminative feature selection meets unsupervised clustering, 2014.

K. Li, B. Hariharan, and J. Malik, Iterative instance segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3659-3667, 2016.

R. Liao, A. Schwing, R. Zemel, and R. Urtasun, Learning deep parsimonious representations, Advances in Neural Information Processing Systems, pp.5076-5084, 2016.

M. Lin, Q. Chen, and S. Yan, Network in network, 2013.

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft coco: Common objects in context, European Conference on Computer Vision, pp.740-755, 2014.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.

Y. Lu, T. Javidi, and S. Lazebnik, Adaptive object detection using adjacency and zoom prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2351-2359, 2016.

C. Luo, J. Zhan, X. Xue, L. Wang, R. Ren et al., Cosine normalization: Using cosine similarity instead of dot product in neural networks, International Conference on Artificial Neural Networks, pp.382-391, 2018.

W. Luo, A. G. Schwing, and R. Urtasun, Efficient deep learning for stereo matching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5695-5703, 2016.

A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, Proceedings of the 30th International Conference on Machine Learning, ICML, 2013.

L. Van-der-maaten and G. Hinton, Visualizing data using t-sne, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, Stacked convolutional auto-encoders for hierarchical feature extraction, Artificial Neural Networks and Machine Learning-ICANN 2011, pp.52-59, 2011.

F. Massa, B. C. Russell, and M. Aubry, Deep exemplar 2d-3d detection by adapting from real to rendered views, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6024-6033, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01801049

N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers et al., A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4040-4048, 2016.

T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka, Metric learning for large scale image classification: Generalizing to new classes at near-zero cost, European Conference on Computer Vision, pp.488-501, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00722313

M. Menze and A. Geiger, Object scene flow for autonomous vehicles, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3061-3070, 2015.

M. Menze, C. Heipke, and A. Geiger, Joint 3d estimation of vehicles and scene flow, ISPRS Workshop on Image Sequence Analysis, 2015.

N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel, Meta-learning with temporal convolutions, 2017.

R. Mottaghi, X. Chen, X. Liu, N. Cho, S. Lee et al., The role of context for object detection and semantic segmentation in the wild, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.891-898, 2014.

T. Munkhdalai and H. Yu, 2017.

V. Nair and G. E. Hinton, Rectified linear units improve restricted boltzmann machines, Proceedings of the 27th International Conference on Machine Learning, ICML, pp.807-814, 2010.

A. Newell, K. Yang, and J. Deng, Stacked hourglass networks for human pose estimation, European Conference on Computer Vision, pp.483-499, 2016.

H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, Proceedings of the IEEE International Conference on Computer Vision, pp.1520-1528, 2015.

M. Noroozi and P. Favaro, Unsupervised learning of visual representations by solving jigsaw puzzles, European Conference on Computer Vision, pp.69-84, 2016.

M. Noroozi, H. Pirsiavash, and P. Favaro, Representation learning by learning to count, Proceedings of the IEEE International Conference on Computer Vision, pp.5898-5906, 2017.

W. Ouyang, P. Luo, X. Zeng, S. Qiu, Y. Tian et al., Deepid-net: multi-stage and deformable deep convolutional neural networks for object detection, 2014.

E. Oyallon, E. Belilovsky, and S. Zagoruyko, Scaling the scattering transform: Deep hybrid networks, Proceedings of the IEEE International Conference on Computer Vision, pp.5618-5627, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01495734

E. Oyallon and S. Mallat, Deep roto-translation scattering for object classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2865-2873, 2015.

G. Papandreou, I. Kokkinos, and P. Savalle, Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.390-399, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263611

D. Pathak, R. Girshick, P. Dollár, T. Darrell, and B. Hariharan, Learning features by watching objects move, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2701-2710, 2017.

D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros, Context encoders: Feature learning by inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2536-2544, 2016.

X. Peng and C. Schmid, Multi-region two-stream r-cnn for action detection, European Conference on Computer Vision, pp.744-759, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349107

P. O. Pinheiro, R. Collobert, and P. Dollár, Learning to segment object candidates, Advances in Neural Information Processing Systems, pp.1990-1998, 2015.

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to refine object segments, European Conference on Computer Vision, pp.75-91, 2016.

H. Qi, M. Brown, and D. Lowe, Learning with imprinted weights, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5822-5830, 2018.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, International Conference on Learning Representations, 2016.

S. Ravi and H. Larochelle, Optimization as a model for few-shot learning, International Conference on Learning Representations, 2017.

S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, icarl: Incremental classifier and representation learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2001-2010, 2017.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You only look once: Unified, real-time object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.779-788, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems, pp.91-99, 2015.

S. Ren, K. He, R. Girshick, X. Zhang, and J. Sun, Object detection networks on convolutional feature maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.7, pp.1476-1481, 2017.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision, vol.115, issue.3, pp.211-252, 2015.

L. Ladický, C. Russell, P. Kohli, and P. H. S. Torr, Associative hierarchical crfs for object class image segmentation, Proceedings of the IEEE International Conference on Computer Vision, pp.739-746, 2009.

A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, Meta-learning with memory-augmented neural networks, Proceedings of the 33rd International Conference on Machine Learning, ICML, pp.1842-1850, 2016.

A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, One-shot learning with memory-augmented neural networks, 2016.

D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić et al., High-resolution stereo datasets with subpixel-accurate ground truth, German Conference on Pattern Recognition, pp.31-42, 2014.

J. Schmidhuber, J. Zhao, and M. Wiering, Shifting inductive bias with success-story algorithm, adaptive levin search, and incremental self-improvement, Machine Learning, vol.28, pp.105-130, 1997.

A. G. Schwing and R. Urtasun, Fully connected deep structured networks, 2015.

A. Seki and M. Pollefeys, Patch based confidence prediction for dense disparity map, Proceedings of the British Machine Vision Conference, 2016.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus et al., Overfeat: Integrated recognition, localization and detection using convolutional networks, International Conference on Learning Representations, 2014.

K. J. Shih, S. Singh, and D. Hoiem, Where to look: Focus regions for visual question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4613-4621, 2016.

J. Shotton, M. Johnson, and R. Cipolla, Semantic texton forests for image categorization and segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.

J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio et al., Real-time human pose recognition in parts from single depth images, Communications of the ACM, vol.56, pp.116-124, 2013.

J. Shotton, J. Winn, C. Rother, and A. Criminisi, Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context, International Journal of Computer Vision, vol.81, issue.1, pp.2-23, 2009.

A. Shrivastava, A. Gupta, and R. Girshick, Training region-based object detectors with online hard example mining, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.761-769, 2016.

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from rgbd images, European Conference on Computer Vision, pp.746-760, 2012.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations, 2015.

J. Snell, K. Swersky, and R. S. Zemel, Prototypical networks for few-shot learning, Advances in Neural Information Processing Systems, pp.4077-4087, 2017.

R. K. Srivastava, K. Greff, and J. Schmidhuber, 2015.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9, 2015.

C. Szegedy, S. Reed, D. Erhan, and D. Anguelov, Scalable, high-quality object detection, 2014.

C. Szegedy, A. Toshev, and D. Erhan, Deep neural networks for object detection, Advances in Neural Information Processing Systems, pp.2553-2561, 2013.

O. Teboul, I. Kokkinos, L. Simon, P. Koutsourakis, and N. Paragios, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2273-2280, 2011.

S. Thrun, Lifelong learning algorithms, Learning to learn, vol.8, pp.181-209, 1998.

K. E. A. Van-de-sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, Segmentation as selective search for object recognition, Proceedings of the IEEE International Conference on Computer Vision, pp.1879-1886, 2011.

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, IEEE 12th International Conference on Computer Vision, 2009.

O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, Matching networks for one shot learning, Advances in Neural Information Processing Systems, pp.3630-3638, 2016.

X. Wang and A. Gupta, Unsupervised learning of visual representations using videos, Proceedings of the IEEE International Conference on Computer Vision, pp.2794-2802, 2015.

Y. Wang, R. Girshick, M. Hebert, and B. Hariharan, Low-shot learning from imaginary data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7278-7286, 2018.

K. Yamaguchi, D. Mcallester, and R. Urtasun, Efficient joint segmentation, occlusion labeling, stereo and flow estimation, European Conference on Computer Vision, pp.756-771, 2014.

J. Yang, D. Parikh, and D. Batra, Joint unsupervised learning of deep representations and image clusters, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5147-5156, 2016.

D. Yoo, S. Park, J. Lee, A. S. Paek, and I. S. Kweon, Attentionnet: Aggregating weak directions for accurate object detection, Proceedings of the IEEE International Conference on Computer Vision, pp.2659-2667, 2015.

F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, International Conference on Learning Representations, 2016.

S. Zagoruyko and N. Komodakis, Learning to compare image patches via convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4353-4361, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01246261

S. Zagoruyko and N. Komodakis, Wide residual networks, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01832503

S. Zagoruyko, A. Lerer, T. Lin, P. O. Pinheiro, S. Gross, S. Chintala, and P. Dollár, A multipath network for object detection, Proceedings of the British Machine Vision Conference, 2016.

J. Zbontar and Y. Lecun, Computing the stereo matching cost with a convolutional neural network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1592-1599, 2015.

J. Žbontar and Y. Lecun, Stereo matching by training a convolutional neural network to compare image patches, Journal of Machine Learning Research, vol.17, issue.1, pp.2287-2318, 2016.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conference on Computer Vision, pp.818-833, 2014.

X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao et al., Crafting gbd-net for object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.40, issue.9, pp.2109-2123, 2018.

X. Zeng, W. Ouyang, B. Yang, J. Yan, and X. Wang, Gated bi-directional cnn for object detection, European Conference on Computer Vision, pp.354-369, 2016.

R. Zhang, P. Isola, and A. A. Efros, Colorful image colorization, European Conference on Computer Vision, pp.649-666, 2016.

R. Zhang, P. Isola, and A. A. Efros, Split-brain autoencoders: Unsupervised learning by cross-channel prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1058-1067, 2017.

Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, Improving object detection with deep convolutional networks via bayesian optimization and structured prediction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.249-258, 2015.

Q. Zhao, Z. Liu, and B. Yin, Cracking bing and beyond, Proceedings of the British Machine Vision Conference, 2014.

S. Zheng, S. Jayasumana, B. Romera-paredes, V. Vineet, Z. Su et al., Conditional random fields as recurrent neural networks, Proceedings of the IEEE International Conference on Computer Vision, pp.1529-1537, 2015.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning deep features for scene recognition using places database, Advances in Neural Information Processing Systems, pp.487-495, 2014.

Y. Zhu, R. Urtasun, R. Salakhutdinov, and S. Fidler, segdeepm: Exploiting segmentation and context in deep neural networks for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4703-4711, 2015.

L. Zitnick and P. Dollár, Edge boxes: Locating object proposals from edges, European Conference on Computer Vision, pp.391-405, 2014.