]. S. Agarwal and D. Roth, Learning a Sparse Representation for Object Detection, Proc. European Conference on Computer Vision, pp.113-130, 2002.
DOI : 10.1007/3-540-47979-1_8

]. F. Aherne, N. Thacker, and P. Rockett, The Bhattacharyya metric as an absolute similarity measure for frequency coded data, Kybernetika, vol.34, issue.4, pp.363-368, 1998.

]. A. Alahi, R. Ortiz, and P. Vandergheynst, FREAK: Fast Retina Keypoint, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247715
URL : http://infoscience.epfl.ch/record/175537

]. X. Anguera, J. Xu, and N. Oliver, Multimodal photo annotation and retrieval on a mobile phone, Proceeding of the 1st ACM international conference on Multimedia information retrieval, MIR '08, pp.188-194, 2008.
DOI : 10.1145/1460096.1460127

R. Arandjelovic and A. Zisserman, Three things everyone should know to improve object retrieval, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2911-2918, 2012.
DOI : 10.1109/CVPR.2012.6248018

]. P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, From contours to regions: An empirical evaluation, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206707

]. S. Ayache, G. Quénot, and A. Tseng, The LIGVID system for video retrieval and concept annotation, Proc. International Conference on Multimedia Information Retrieval, pp.385-388, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00953840

X. Bai and G. Sapiro, Geodesic Matting: A Framework for Fast Interactive Image and Video Segmentation and Matting, Proc. IEEE International Conference on Computer Vision, 2007.
DOI : 10.1007/s11263-008-0191-z

]. X. Bai, J. Wang, D. Simons, and G. Sapiro, Video SnapCut: robust video object cutout using localized classifiers, Proc. ACM SIGGRAPH Conference, 2009.

]. W. Bailer, W. Weiss, C. Schober, and G. Thallinger, A Video Browsing Tool for Content Management in Media Post-Production, Proc. International Conf. on Advances in Multimedia Modelling, pp.658-659, 2012.
DOI : 10.1007/978-3-642-27355-1_69

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, Proc. European Conference on Computer Vision, pp.404-417, 2006.
DOI : 10.1007/11744023_32
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.679.3046

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, vol.110, issue.3, pp.346-359, 2008.
DOI : 10.1016/j.cviu.2007.09.014
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.205.738

]. A. Berg, T. L. Berg, and J. Malik, Shape Matching and Object Recognition Using Low Distortion Correspondences, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.26-33, 2005.
DOI : 10.1109/CVPR.2005.320

M. Bertini, A. D. Bimbo, A. Ferracani, L. Landucci, and D. Pezzatini, Interactive multi-user video retrieval systems, Multimedia Tools and Applications, pp.1-27, 2011.
DOI : 10.1007/s11042-011-0888-9

J. Besag, Statistical analysis of dirty pictures, Journal of Applied Statistics, vol.6, issue.5-6, pp.259-302, 1986.
DOI : 10.1016/0031-3203(83)90012-2

D. Blei, A. Ng, and M. Jordan, Latent Dirichlet allocation, Proc. Neural Information Processing Systems Conf, 2002.

]. A. Bosch, A. Zisserman, and X. Munoz, Representing shape with a spatial pyramid kernel, Proceedings of the 6th ACM international conference on Image and video retrieval, CIVR '07, 2007.
DOI : 10.1145/1282280.1282340

]. A. Bosch, A. Zisserman, and X. Munoz, Scene Classification Using a Hybrid Generative/Discriminative Approach, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.4, pp.712-727, 2008.
DOI : 10.1109/TPAMI.2007.70716

B. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, COLT '92, pp.144-152, 1992.
DOI : 10.1145/130385.130401

]. Y. Boureau, F. Bach, Y. Lecun, and J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539963

]. Y. Boykov, O. Veksler, and R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.11, pp.1222-1239, 2001.
DOI : 10.1109/34.969114

]. Y. Boykov and M. P. Jolly, Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2001.
DOI : 10.1109/ICCV.2001.937505

Y. Boykov and V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.26, issue.9, pp.1124-1137, 2004.

Y. Boykov and G. Funka-Lea, Graph Cuts and Efficient N-D Image Segmentation, International Journal of Computer Vision, vol.18, issue.9, pp.109-131, 2006.
DOI : 10.1007/s11263-006-7934-5

D. Brainard, Color Appearance and Color Difference Specification, in The Science of Color, pp.191-216, 2003.

M. Bréhinier, S. Campion, and G. Gravier, Texmix, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, 2012.
DOI : 10.1145/2324796.2324868

M. Brown and D. Lowe, Recognizing panoramas, Proc. IEEE International Conference on Computer Vision, pp.1218-1225, 2003.

]. G. Burghouts and J. M. Geusebroek, Performance evaluation of local colour invariants, Computer Vision and Image Understanding, vol.113, issue.1, pp.48-62, 2009.
DOI : 10.1016/j.cviu.2008.07.003

]. A. Bursuc, T. Zaharia, and F. Prêteux, Mobile video browsing and retrieval with the OVIDIUS platform, Proceedings of the international conference on Multimedia, MM '10, 2010.
DOI : 10.1145/1873951.1874315
URL : https://hal.archives-ouvertes.fr/hal-00625819

A. Bursuc, T. Zaharia, and O. Martinot, ARTEMIS-UBIMEDIA at TRECVid 2011: Instance Search, Proc. TRECVid 2011 - Text REtrieval Conference TRECVid Workshop, 2011.

]. A. Bursuc, T. Zaharia, and F. Prêteux, OVIDIUS: A Web Platform for Video Browsing and Search, Proc. International Conf. on Advances in Multimedia Modelling, pp.649-651, 2012.
DOI : 10.1007/978-3-642-27355-1_66

]. T. Caetano, J. J. Mcauley, L. Cheng, Q. V. Le, and A. J. Smola, Learning Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.6, pp.1048-1058, 2009.
DOI : 10.1109/TPAMI.2009.28
URL : http://arxiv.org/abs/0806.2890

M. Calonder, V. Lepetit, C. Strecha, and P. Fua, BRIEF: Binary Robust Independent Elementary Features, Proc. 11th European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15561-1_56
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.2122

]. J. Cao, Y. Zhang, J. Guo, L. Bao, and J. Li, VideoMap, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646458

S. Cheng and . Chen, Image classification using color, texture and regions, Image and Vision Computing, vol.21, issue.9, pp.759-776, 2003.
DOI : 10.1016/S0262-8856(03)00069-6

]. F. Chevalier, J. P. Domenger, J. Benois-pineau, and M. Delest, Retrieval of objects in video by similarity based on graph matching, Pattern Recognition Letters, vol.28, issue.8, pp.939-949, 2007.
DOI : 10.1016/j.patrec.2006.12.009
URL : https://hal.archives-ouvertes.fr/hal-00307872

M. Cho, J. Lee, and K. M. Lee, Feature Correspondence & Deformable Object Matching via Agglomerative Correspondence Clustering, Proc. 12th IEEE International Conference on Computer Vision, 2009.

L. Chiariglione, Introduction to MPEG-7: Multimedia Content Description Interface, pp.3-6, 2002.

M. Cho and K. M. Lee, Bilateral Symmetry Detection and Segmentation via Symmetry-Growing, Proc. 20th British Machine Vision Conference, 2009.
DOI : 10.5244/c.23.4

M. Cho and K. M. Lee, Progressive Graph Matching: Making a Move of Graphs via Probabilistic Voting, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2012.

C. M. Christoudias, B. Georgescu, and P. Meer, Synergism in low level vision, Object recognition supported by user interaction for service robots, 2002.
DOI : 10.1109/ICPR.2002.1047421

]. O. Chum, J. Matas, and J. Kittler, Locally Optimized RANSAC, Proc. DAGM Symposium, pp.236-243, 2003.
DOI : 10.1007/978-3-540-45243-0_31

O. Chum and J. Matas, Matching with PROSAC - progressive sample consensus, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2005.
DOI : 10.1109/cvpr.2005.221
URL : https://dspace.cvut.cz/bitstream/10467/9496/1/2005-Matching-with-PROSAC-progressive-sample-consensus.pdf

]. O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, 2007 IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.
DOI : 10.1109/ICCV.2007.4408891

]. O. Chum and J. Matas, Optimal Randomized RANSAC, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.8, pp.1472-1482, 2008.
DOI : 10.1109/TPAMI.2007.70787

O. Chum, J. Philbin, and A. Zisserman, Near Duplicate Image Detection: min-Hash and tf-idf Weighting, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.50

]. O. Chum, A. Mikulik, M. Perdoch, and J. Matas, Total recall II: Query expansion revisited, CVPR 2011, pp.889-896, 2011.
DOI : 10.1109/CVPR.2011.5995601

]. D. Comaniciu and P. Meer, Mean shift analysis and applications, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1197-1203, 1999.
DOI : 10.1109/ICCV.1999.790416
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.2524

]. D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.5, pp.603-619, 2002.
DOI : 10.1109/34.1000236

]. C. Cortes and V. N. Vapnik, Support-vector networks, Machine Learning, 1995.
DOI : 10.1007/BF00994018

]. T. Cour, P. Srinivasan, and J. Shi, Balanced graph matching, Proc. Neural Information Processing Systems Conf, pp.313-320, 2006.

]. N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

B. Delezoide, G. Pitel, H. Le-borgne, G. Grefenstette, P. A. Moellic et al., Object/Background Scene Joint Classification in Photographs Using Linguistic Statistics from the Web, Proc. 2nd International Language Resources for Content-Based Image Retrieval Workshop, 2008.

]. A. Delong, Advances in Graph-Cut Optimization: Multi-Surface Models, Label Costs, and Hierarchical Costs, 2011.

]. O. Duchenne, F. Bach, I. Kweon, and J. Ponce, A tensor based algorithm for high-order graph matching, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01063322

]. O. Duchenne, A. Joulin, and J. Ponce, A graph-matching kernel for object categorization, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126445
URL : https://hal.archives-ouvertes.fr/hal-00650345

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.73, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

]. M. Fabro and L. Böszörmenyi, AAU video browser: non-sequential hierarchical video browsing without content analysis, Proc. International Conf. on Advances in Multimedia Modelling, 2012.

L. Fei-Fei and P. Perona, A Bayesian Hierarchical Model for Learning Natural Scene Categories, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.16

P. F. Felzenszwalb and D. P. Huttenlocher, Efficient Graph-Based Image Segmentation, International Journal of Computer Vision, vol.59, issue.2, pp.167-181, 2004.
DOI : 10.1023/B:VISI.0000022288.19776.77

P. F. Felzenszwalb, R. B. Girshick, D. Mcallester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/TPAMI.2009.167

V. Ferrari, T. Tuytelaars, and L. Van Gool, Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views, International Journal of Computer Vision, vol.24, issue.3, pp.159-188, 2006.
DOI : 10.1007/s11263-005-3964-7

R. Fergus, P. Perona, and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp.264-271, 2003.
DOI : 10.1109/cvpr.2003.1211479
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.7863

]. R. Fergus, P. Perona, and A. Zisserman, Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition, International Journal of Computer Vision, vol.20, issue.1, pp.273-303, 2007.
DOI : 10.1007/s11263-006-8707-x

]. M. Fischler and R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, vol.24, issue.6, pp.381-395, 1981.
DOI : 10.1145/358669.358692

]. D. Freedman and P. Drineas, Energy Minimization via Graph Cuts: Settling What is Possible, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.939-946, 2005.
DOI : 10.1109/CVPR.2005.143

J. Friedman, J. L. Bentley, and R. A. Finkel, An Algorithm for Finding Best Matches in Logarithmic Expected Time, ACM Transactions on Mathematical Software, vol.3, issue.3, pp.209-226, 1977.
DOI : 10.1145/355744.355745

B. Fulkerson, A. Vedaldi, and S. Soatto, Class segmentation and object localization with superpixel neighborhoods, Proc. IEEE International Conference on Computer Vision, 2009.
DOI : 10.1109/iccv.2009.5459175
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4613

P. Gabriel, J. Hayet, J. Piater, and J. Verly, Object tracking using color interest points, Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2005.
DOI : 10.1109/AVSS.2005.1577260
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.2548

J. Garding and T. Lindeberg, Direct computation of shape cues using scale-adapted spatial derivative operators, International Journal of Computer Vision, vol.8, issue.8, pp.163-191, 1996.
DOI : 10.1007/BF00058750

]. M. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

]. S. Geman and D. Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.6, issue.6, pp.721-741, 1984.

]. J. Van-gemert, C. J. Veenman, A. W. Smeulders, and J. Geusebroek, Visual Word Ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.7, pp.1271-1283, 2010.
DOI : 10.1109/TPAMI.2009.132

]. A. Girgensohn, F. Shipman, and L. Wilcox, Adaptive clustering and interactive visualizations to support the selection of video clips, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR '11, 2011.
DOI : 10.1145/1991996.1992030

]. A. Goldberg and R. Tarjan, A new approach to the maximum-flow problem, Journal of the ACM, vol.35, issue.4, pp.921-940, 1988.
DOI : 10.1145/48014.61051

A. Goldberg, S. Hed, H. Kaplan, R. E. Tarjan, and R. F. Werneck, Maximum Flows by Incremental Breadth-First Search, Algorithms - ESA 2011, 2011.
DOI : 10.1007/978-3-642-23719-5_39

]. D. Gorisse, IRIM at TRECVID 2010: Semantic Indexing and Instance Search, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00591099

]. K. Grauman and T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.239

]. D. Greig, B. Porteous, and A. Seheult, Exact Maximum A Posteriori Estimation for Binary Images, Journal of Royal Statistical Society, Series B, vol.51, issue.2, pp.271-279, 1989.

]. C. Gu, J. Lim, P. Arbelaez, and J. Malik, Recognition using regions, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.

]. J. Hafner, H. S. Sawhney, W. Equitz, M. Flickner, and W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.17, issue.7, pp.729-736, 1995.
DOI : 10.1109/34.391417

]. P. Halvorsen, D. Johansen, B. Olstad, T. Kupka, and S. Tennøe, vESP, Proceedings of the international conference on Multimedia, MM '10, pp.1603-1604, 2010.
DOI : 10.1145/1873951.1874298

]. A. Hauptmann, R. Yan, W. Lin, M. Christel, and H. Wactlar, Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News, IEEE Transactions on Multimedia, vol.9, issue.5, pp.958-966, 2007.
DOI : 10.1109/TMM.2007.900150

]. R. Hummel and S. W. Zucker, On the foundations of relaxation labeling processes, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.5, issue.3, pp.267-287, 1983.

]. T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, Proc. Neural Information Processing Systems Conf, 1998.

]. M. Jansen, W. Heeren, and B. Van-dijk, Videotrees: Improving video surrogate presentation using hierarchy, 2008 International Workshop on Content-Based Multimedia Indexing, 2008.
DOI : 10.1109/CBMI.2008.4564997

]. S. Järvinen, J. Peltola, J. Lahti, and A. Sachinopoulou, Multimedia service creation platform for mobile experience sharing, Proceedings of the 8th International Conference on Mobile and Ubiquitous Multimedia, MUM '09, 2009.
DOI : 10.1145/1658550.1658556

]. H. Jégou, H. Harzallah, and C. Schmid, A contextual dissimilarity measure for accurate and efficient image search, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.382970

]. H. Jégou, M. Douze, and C. Schmid, Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, Proc. European Conference on Computer Vision, pp.304-317, 2008.
DOI : 10.1007/978-3-540-88682-2_24

]. H. Jégou, M. Douze, and C. Schmid, Improving Bag-of-Features for Large Scale Image Search, International Journal of Computer Vision, vol.42, issue.3, pp.316-336, 2010.
DOI : 10.1007/s11263-009-0285-2

H. Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Pérez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235

R. Jesus, R. Dias, R. Frias, and N. Correia, Geographic image retrieval in mobile guides, Proceedings of the 4th ACM workshop on Geographical information retrieval, GIR '07, 2007.
DOI : 10.1145/1316948.1316958

M. Jia, Photo-to-Search: Using camera phones to inquire of the surrounding world, Proc. 7th International Conference on Mobile Data Management, 2006.

]. H. Jiang, M. S. Drew, and Z. Li, Matching by Linear Programming and Successive Convexification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.6, pp.959-975, 2007.
DOI : 10.1109/TPAMI.2007.1048

]. D. Joshi, R. Datta, Z. Zhuang, W. P. Weiss, M. Friedenberg et al., Paragrab: a comprehensive architecture for web image management and multimodal querying, Proc. International Conference on Very Large Data Bases, pp.1163-1166, 2006.

]. F. Jurie and B. Triggs, Creating efficient codebooks for visual recognition, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.66
URL : https://hal.archives-ouvertes.fr/inria-00548511

E. Kalogerakis, O. Vesselova, J. Hays, A. Efros, and A. Hertzmann, Image sequence geolocation with human travel priors, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459259

H. Kang, M. Hebert, and T. Kanade, Discovering object instances from scenes of Daily Living, Proc. IEEE International Conference on Computer Vision, pp.762-769, 2011.

R. Karp, Reducibility among combinatorial problems, in R. E. Miller and J. W. Thatcher (eds.), Complexity of Computer Computations, pp.85-103, 1972.

Y. Ke and R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2004.

. Kim, VISCORS: A visual-content recommender for the mobile web, IEEE Intelligent Systems, vol.19, issue.6, pp.32-39, 2004.

S. Kim, Y. Tak, Y. Nam, and E. Hwang, CLOVER, Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA '05, pp.215-216, 2005.
DOI : 10.1145/1101149.1101183

]. K. Kim and K. Grauman, Boundary Preserving Dense Local Regions, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2011.
DOI : 10.1109/cvpr.2011.5995526
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.362.9043

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science, vol.220, 1983.

]. P. Kohli and P. H. Torr, Dynamic Graph Cuts for Efficient Inference in Markov Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2079-2088, 2007.
DOI : 10.1109/TPAMI.2007.1128

V. Kolmogorov and R. Zabih, What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.26, issue.2, pp.147-159, 2004.
DOI : 10.1109/TPAMI.2004.1262177
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.1823

]. C. Lampert, M. B. Blaschko, and T. Hofmann, Beyond sliding windows: Object localization by efficient subwindow search, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587586

J. Law-To, G. Grefenstette, and J. L. Gauvain, VoxaleadNews: robust automatic segmentation of video into browsable content, Proc. 17th ACM International Conference on Multimedia, pp.1119-1120, 2009.

]. S. Lazebnik, C. Schmid, and J. Ponce, Affine-invariant local descriptors and neighborhood statistics for texture recognition, Proceedings Ninth IEEE International Conference on Computer Vision, pp.649-655, 2003.
DOI : 10.1109/ICCV.2003.1238409
URL : https://hal.archives-ouvertes.fr/inria-00548231

]. S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

]. H. Le-borgne and N. Honnorat, Fast shared boosting: Application to large-scale visual concept detection, 2010 International Workshop on Content Based Multimedia Indexing (CBMI), 2010.
DOI : 10.1109/CBMI.2010.5529912

]. J. Lee, M. Cho, and K. M. Lee, Hyper-graph matching via reweighted random walks, CVPR 2011, pp.1633-1640, 2011.
DOI : 10.1109/CVPR.2011.5995387

]. B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, vol.73, issue.2, pp.259-289, 2008.
DOI : 10.1007/s11263-007-0095-3

A. Levinshtein, A. Stere, K. Kutulakos, D. Fleet, S. Dickinson et al., TurboPixels: Fast Superpixels Using Geometric Flows, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.12, pp.2290-2297, 2009.
DOI : 10.1109/TPAMI.2009.96

]. M. Leordeanu and M. Hebert, A spectral technique for correspondence problems using pairwise constraints, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.20

M. Leordeanu and M. Hebert, Unsupervised learning for graph matching, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.

M. Leordeanu and M. Hebert, An integer projected fixed point method for graph matching and MAP inference, Proc. Neural Information Processing Systems Conf, 2009.

]. V. Lepetit, P. Lagger, and P. Fua, Randomized Trees for Real-Time Keypoint Recognition, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.288

]. T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, International Journal of Computer Vision, vol.43, issue.1, pp.29-44, 2001.
DOI : 10.1023/A:1011126920638

S. Leutenegger, M. Chli, and R. Siegwart, BRISK: Binary Robust invariant scalable keypoints, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126542
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.371.1343

]. Y. Li, J. Sun, C. Tang, and H. Shum, Lazy Snapping, Proc. ACM SIGGRAPH Conference, pp.303-308, 2004.
DOI : 10.1145/1186562.1015719

]. H. Li, E. Kim, X. Huang, and L. He, Object matching with a locally affine-invariant constraint, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1641-1648, 2010.
DOI : 10.1109/CVPR.2010.5539776

H. Li, J. Huang, S. Zhang, and X. Huang, Optimal object matching via convexification and composition, 2011 International Conference on Computer Vision, pp.33-40, 2011.
DOI : 10.1109/ICCV.2011.6126222
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.6033

]. Y. Liu, Z. Yang, X. Deng, J. Bu, and C. Chen, Media Browsing for Mobile Devices Based on Resolution Adaptive Recommendation, 2009 WRI International Conference on Communications and Mobile Computing, pp.285-290, 2009.
DOI : 10.1109/CMC.2009.124

]. D. Lowe, Perceptual Organization and Visual Recognition, 1985.
DOI : 10.1007/978-1-4613-2551-2
URL : http://www.dtic.mil/get-tr-doc/pdf?AD=ADA150826

]. D. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1150-1157, 1999.
DOI : 10.1109/ICCV.1999.790410
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.4065

]. D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.4931

M. Lundy and A. Mees, Convergence of an annealing algorithm, Mathematical Programming, pp.111-124, 1986.
DOI : 10.1007/BF01582166

S. Mahamud and M. Hebert, The optimal distance measure for object detection, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003.
DOI : 10.1109/CVPR.2003.1211361

]. M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik, Using contours to detect and localize junctions in natural images, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587420

T. Malisiewicz and A. Efros, Improving Spatial Support for Objects via Multiple Segmentations, Proceedings of the British Machine Vision Conference 2007, 2007.
DOI : 10.5244/C.21.55

K. Manske, Video browsing using 3D video content trees, Proceedings of the 1998 workshop on New paradigms in information visualization and manipulation, NPIV '98, 1998.
DOI : 10.1145/324332.324336

]. J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, Proc. British Machine Vision Conference, pp.384-393, 2002.
DOI : 10.5244/c.16.36
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.8241

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of State Calculations by Fast Computing Machines, The Journal of Chemical Physics, vol.21, issue.6, pp.1087-1092, 1953.
DOI : 10.1063/1.1699114

K. Mikolajczyk and C. Schmid, Indexing based on scale invariant interest points, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, pp.525-531, 2001.
DOI : 10.1109/ICCV.2001.937561
URL : https://hal.archives-ouvertes.fr/inria-00548276

K. Mikolajczyk and C. Schmid, An Affine Invariant Interest Point Detector, Proc
DOI : 10.1007/3-540-47969-4_9
URL : https://hal.archives-ouvertes.fr/inria-00548252

K. Mikolajczyk and C. Schmid, Scale & Affine Invariant Interest Point Detectors, International Journal of Computer Vision, vol.60, issue.1, pp.63-86, 2004.
DOI : 10.1023/B:VISI.0000027790.02288.f2
URL : https://hal.archives-ouvertes.fr/inria-00548554

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas et al., A Comparison of Affine Region Detectors, International Journal of Computer Vision, vol.65, issue.1-2, pp.43-72, 2005.
DOI : 10.1007/s11263-005-3848-x
URL : https://hal.archives-ouvertes.fr/inria-00548528

]. G. Miller, S. Fels, M. Finke, W. Motz, W. Eagleston et al., MiniDiver: A Novel Mobile Media Playback Interface for Rich Video Content on an iPhoneTM, Proc, 2009.
DOI : 10.1006/ijhc.2001.0459

]. A. Mishra and Y. Aloimonos, Active segmentation with fixation, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459254

]. H. Moravec, Towards automatic visual obstacle avoidance, Proc. International Joint Conference on Artificial Intelligence, 1977.

M. Muja and D. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, Proc. International Conference on Computer Vision Theory and Applications, 2009.

R. Nevatia and T. O. Binford, Description and recognition of curved objects, Artificial Intelligence, vol.8, issue.1, pp.77-98, 1977.
DOI : 10.1016/0004-3702(77)90006-6

]. K. Ni, H. Jin, and F. Dellaert, GroupSAC: Efficient Consensus in the Presence of Groupings, Proc. IEEE International Conference on Computer Vision, 2009.

W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman et al., QBIC project: querying images by content, using color, texture, and shape, Storage and Retrieval for Image and Video Databases, 1993.
DOI : 10.1117/12.143648

]. D. Nister and H. Stewenius, Scalable Recognition with a Vocabulary Tree, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.264

T. Ojala, M. Pietikäinen, and T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.24, issue.7, pp.971-987, 2002.

]. A. Oliva and A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision, vol.42, issue.3, pp.145-175, 2001.
DOI : 10.1023/A:1011139631724

L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: bringing order to the web, 1999.

C. Pantofaru, G. Dorko, C. Schmid, and M. Hebert, Combining Regions and Patches for Object Class Localization, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 2006.
DOI : 10.1109/CVPRW.2006.57
URL : https://hal.archives-ouvertes.fr/inria-00548581

]. C. Pantofaru, C. Schmid, and M. Hebert, Object Recognition by Integrating Multiple Image Segmentations, Proc. European Conference on Computer Vision, 2008.
DOI : 10.1007/978-3-540-88690-7_36
URL : https://hal.archives-ouvertes.fr/inria-00548655

]. M. Perdoch, O. Chum, and J. Matas, Efficient representation of local geometry for large scale object retrieval, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206529

F. Pereira and B. Koenen, Context, Goals and Procedures, in Introduction to MPEG-7: Multimedia Content Description Interface, pp.7-30, 2002.

S. Pfeiffer, The Definitive Guide to HTML5 Video, Apress, 2010.
DOI : 10.1007/978-1-4302-3091-5

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383172

J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, Lost in quantization: Improving particular object retrieval in large scale image databases, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587635

J. Philbin, Scalable Object Retrieval in Very Large Image Collections, 2010.

J. Philbin, J. Sivic, and A. Zisserman, Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets, International Journal of Computer Vision, vol.62, issue.2, pp.138-153, 2011.
DOI : 10.1007/s11263-010-0363-5
URL : https://hal.archives-ouvertes.fr/hal-01064717

J. Philbin, FASTCLUSTER: A library for fast, distributed clustering, 2012.

R. Potts, Some Generalized Order-Disorder Transformations, Proc. Cambridge Philosophical Society, 1952.

R. Raguram, J. M. Frahm, and M. Pollefeys, A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus, Proc. European Conference on Computer Vision, pp.500-513, 2008.

X. Ren and J. Malik, Learning a classification model for segmentation, Proc. IEEE International Conference on Computer Vision, 2003.

S. Robertson, The Probability Ranking Principle in IR, Journal of Documentation, vol.33, issue.4, pp.294-304, 1977.
DOI : 10.1108/eb026647

O. Rooij, C. G. Snoek, and M. Worring, Query on demand video browsing, Proceedings of the 15th international conference on Multimedia, MULTIMEDIA '07, pp.811-814, 2007.
DOI : 10.1145/1291233.1291417

O. Rooij, C. G. Snoek, and M. Worring, Balancing thread based navigation for targeted video search, Proceedings of the 2008 international conference on Content-based image and video retrieval, CIVR '08, pp.485-494, 2008.
DOI : 10.1145/1386352.1386414

O. Rooij, C. G. Snoek, and M. Worring, MediaMill, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646454

O. Rooij and M. Worring, MediaTable, Proceedings of the international conference on Multimedia, MM '10, pp.1633-1636, 2010.
DOI : 10.1145/1873951.1874307

E. Rosten and T. Drummond, Machine learning for high-speed corner detection, Proc. European Conference on Computer Vision, 2006.
DOI : 10.1007/11744023_34
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.3991

S. Roy and I. Cox, A maximum-flow formulation of the N-camera stereo correspondence problem, Sixth International Conference on Computer Vision, 1998.
DOI : 10.1109/ICCV.1998.710763

E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ORB: An efficient alternative to SIFT or SURF, 2011 International Conference on Computer Vision, pp.2564-2571, 2011.
DOI : 10.1109/ICCV.2011.6126544

B. Russell, A. Torralba, K. Murphy, and W. T. Freeman, LabelMe: A Database and Web-Based Tool for Image Annotation, International Journal of Computer Vision, vol.3, issue.1, pp.157-173, 2008.
DOI : 10.1007/s11263-007-0090-8

M. Sadeghi and A. Farhadi, Recognition using visual phrases, CVPR 2011, pp.1745-1752, 2011.
DOI : 10.1109/CVPR.2011.5995711
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.226.5551

P. Salembier and O. Avaro, MPEG-7: Multimedia Content Description Interface.

G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, Information Processing and Management, pp.513-523, 1988.

K. Van-de-sande, T. Gevers, and C. G. Snoek, Evaluating Color Descriptors for Object and Scene Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1582-1596, 2010.
DOI : 10.1109/TPAMI.2009.154

K. Van-de-sande, J. R. Uijlings, and A. W. Smeulders, Segmentation as selective search for object recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126456

F. Schaffalitzky and A. Zisserman, Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?", Proc. 7th European Conference on Computer Vision, pp.414-431, 2002.
DOI : 10.1007/3-540-47969-4_28

B. Schiele and J. Crowley, Recognition without correspondence using multidimensional receptive field histograms, International Journal of Computer Vision, vol.36, issue.1, pp.31-50, 2000.

H. Schneiderman and T. Kanade, Object Detection Using the Statistics of Parts, International Journal of Computer Vision, vol.56, issue.3, pp.151-177, 2004.
DOI : 10.1023/B:VISI.0000011202.85607.00

K. Schoeffmann and M. D. Fabro, Hierarchical video browsing with a 3D carousel, Proceedings of the 19th ACM international conference on Multimedia, MM '11, pp.827-828, 2011.
DOI : 10.1145/2072298.2072479

K. Schoeffmann, D. Ahlström, and L. Böszörmenyi, Video Browsing with a 3D Thumbnail Ring Arranged by Color Similarity, Proc. International Conf. on Advances in Multimedia Modelling, pp.646-648, 2012.
DOI : 10.1007/978-3-642-27355-1_70

K. Schoeffmann and W. Bailer, Video browser showdown, ACM SIGMultimedia Records, vol.4, issue.2, pp.1-2, 2012.
DOI : 10.1145/2350204.2350205

D. Scott, J. Guo, H. Wang, Y. Yang, F. Hopfgartner et al., Clipboard: A Visual Search and Browsing Engine for Tablet and PC, Proc. International Conf. on Advances in Multimedia Modelling, pp.646-648, 2012.

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence, vol.22, issue.8, pp.888-905, 2000.

M. Shindler, A. Meyerson, and A. Wong, Fast and accurate k-means for large datasets, Advances in Neural Information Processing Systems, pp.2375-2383, 2011.

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

J. Sivic, F. Schaffalitzky, and A. Zisserman, Object Level Grouping for Video Shots, International Journal of Computer Vision, vol.2, issue.3, pp.189-210, 2006.
DOI : 10.1007/s11263-005-4264-y

J. Sivic and A. Zisserman, Efficient Visual Search of Videos Cast as Text Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.4, pp.591-606, 2009.
DOI : 10.1109/TPAMI.2008.111

J. Smith and S. F. Chang, VisualSEEk, Proceedings of the fourth ACM international conference on Multimedia, MULTIMEDIA '96, 1996.
DOI : 10.1145/244130.244151

C. Snoek, M. Worring, D. Koelma, and A. W. Smeulders, A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval, IEEE Transactions on Multimedia, vol.9, issue.2, pp.280-292, 2007.
DOI : 10.1109/TMM.2006.886275

C. Snoek, M. Worring, O. D. Rooij, K. E. Van-de-sande et al., VideOlympics: Real-Time Evaluation of Multimedia Retrieval Systems, IEEE Multimedia, vol.15, issue.1, pp.86-91, 2008.
DOI : 10.1109/MMUL.2008.21

C. Snoek and M. Worring, Concept-Based Video Retrieval, Foundations and Trends in Information Retrieval, vol.2, issue.4, pp.215-322, 2008.
DOI : 10.1561/1500000014
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.5031

K. Sparck Jones, S. Walker, and S. E. Robertson, A probabilistic model of information retrieval: development and comparative experiments, Information Processing & Management, vol.36, issue.6, pp.779-840, 2000.
DOI : 10.1016/S0306-4573(00)00015-7

H. Zheng, C. Yu, H. Jin, X. Lu, and . Xue, Fudan University: hierarchical video retrieval with adaptive multi-modal fusion, Proc. ACM International Conference on Content-Based Image and Video Retrieval, pp.549-550, 2008.

R. Tapu and T. Zaharia, A complete framework for temporal video segmentation, 2011 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2011.
DOI : 10.1109/ICCE-Berlin.2011.6031875
URL : https://hal.archives-ouvertes.fr/hal-00625886

P. Tirilly, V. Claveau, and P. Gros, Distances and weighting schemes for bag of visual words image retrieval, Proceedings of the international conference on Multimedia information retrieval, MIR '10, pp.323-332, 2010.
DOI : 10.1145/1743384.1743438
URL : https://hal.archives-ouvertes.fr/inria-00523975

E. Tola, V. Lepetit, and P. Fua, A fast local descriptor for dense matching, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587673

P. Torr and A. Zisserman, Feature based methods for structure and motion estimation, Vision Algorithms: Theory and Practice, pp.278-294, 2000.

T. Tuytelaars and L. Van-gool, Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions, Proceedings of the British Machine Vision Conference 2000, pp.412-425, 2000.
DOI : 10.5244/C.14.38
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.2860

T. Tuytelaars and C. Schmid, Vector Quantizing Feature Space with a Regular Lattice, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408924
URL : https://hal.archives-ouvertes.fr/inria-00548675

T. Tuytelaars and K. Mikolajczyk, Local Invariant Feature Detectors: A Survey, Foundations and Trends in Computer Graphics and Vision, vol.3, issue.3, pp.177-280, 2008.
DOI : 10.1561/0600000017
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.377.3635

S. Ullman, E. Sali, and M. Vidal-naquet, A Fragment-Based Approach to Object Representation and Classification, Proc. International Workshop on Visual Form, pp.85-100, 2001.
DOI : 10.1007/3-540-45129-3_7

M. Varma and A. Zisserman, Unifying statistical texture classification frameworks, Image and Vision Computing, vol.22, issue.14, pp.1175-1183, 2005.
DOI : 10.1016/j.imavis.2004.03.012
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.9116

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman, Multiple kernels for object detection, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459183

O. Veksler, Y. Boykov, and P. Mehrani, Superpixels and Supervoxels in an Energy Optimization Framework, Proc. European Conference on Computer Vision, 2010.
DOI : 10.1007/978-3-642-15555-0_16

R. Vieux, J. Benois-pineau, and J. P. Domenger, Content based image retrieval using bag-of-regions, Proc. International Conference on Advances in Multimedia Modeling, pp.507-517, 2012.

S. Vijayanarasimhan and K. Grauman, Efficient region search for object detection, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995545
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.486.2854

S. Vrochidis, P. King, L. Makris, A. Moumtzidou, V. Mezaris et al., MKLab interactive video retrieval system, Proc. ACM International Conference on Content-Based Image and Video Retrieval, pp.563-564, 2008.
DOI : 10.1145/1386352.1386432

X. Wang, D. Q. Zhang, T. Gu, and H. K. Pung, Ontology based context modeling and reasoning using OWL, Proc. 2nd IEEE Conference on Pervasive Computing and Communications Workshops, pp.18-22, 2004.

L. Wang, D. Tjondronegoro, and Y. Liu, Clustering and visualizing audiovisual dataset on mobile devices in a topic-oriented manner, Proc. 9th International Conference on Advances in Visual Information Systems, pp.310-321, 2007.

J. Van-de-weijer, T. Gevers, and A. Bagdanov, Boosting color saliency in image feature detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.1, pp.150-156, 2006.
DOI : 10.1109/TPAMI.2006.3
URL : https://hal.archives-ouvertes.fr/inria-00548615

C. Wengert, M. Douze, and H. Jégou, Bag-of-colors for improved image search, Proceedings of the 19th ACM international conference on Multimedia, MM '11, pp.1437-1440, 2011.
DOI : 10.1145/2072298.2072034
URL : https://hal.archives-ouvertes.fr/inria-00614523

J. Wills, S. Agarwal, and S. Belongie, A Feature-based Approach for Dense Segmentation and Estimation of Large Disparity Motion, International Journal of Computer Vision, vol.II, issue.12, pp.125-143, 2006.
DOI : 10.1007/s11263-006-6660-3

Z. Wu, Q. Ke, M. Isard, and J. Sun, Bundling features for large scale partial-duplicate web image search, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp.25-32, 2009.

L. Yang, P. Meer, and D. Foran, Multiple Class Segmentation Using A Unified Framework over Mean-Shift Patches, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383229

J. Yang, Y. G. Jiang, A. G. Hauptmann, and C. W. Ngo, Evaluating bag-of-visual-words representations in scene classification, Proceedings of the international workshop on multimedia information retrieval, MIR '07, pp.197-206, 2007.
DOI : 10.1145/1290082.1290111

N. Yang, W. H. Chang, C. M. Kuo, and T. H. Li, A fast MPEG-7 dominant color extraction with new similarity measure for image retrieval, Journal of Visual Communication and Image Representation, vol.19, issue.2, pp.92-105, 2008.
DOI : 10.1016/j.jvcir.2007.05.003

T. Yeh, K. Grauman, K. Tollmar, and T. Darrell, A picture is worth a thousand keywords, CHI '05 extended abstracts on Human factors in computing systems, CHI '05, pp.2025-2028, 2005.
DOI : 10.1145/1056808.1057083

J. Yuen, B. Russell, C. Liu, and A. Torralba, LabelMe video: Building a video database with human annotations, 2009 IEEE 12th International Conference on Computer Vision, pp.1451-1458, 2009.
DOI : 10.1109/ICCV.2009.5459289

T. Zaharia, A. Vaucelle, T. Laquet, and F. Preteux, INVENIO: An MPEG-7 image indexing platform for content re-use within audio-visual production chains, 2010 International Workshop on Content Based Multimedia Indexing (CBMI), 2010.
DOI : 10.1109/CBMI.2010.5529887
URL : https://hal.archives-ouvertes.fr/hal-00625813

S. Zheng, X. Neo, T. Chen, and . Chua, VisionGo, Proceeding of the ACM International Conference on Image and Video Retrieval, CIVR '09, 2009.
DOI : 10.1145/1646396.1646456

C. Zhu, K. Li, Q. Lv, L. Shang, and R. P. Dick, iScope, Proceedings of the 7th international conference on Mobile systems, applications, and services, Mobisys '09, pp.277-290, 2009.
DOI : 10.1145/1555816.1555845

C. Zhu and S. Satoh, Large vocabulary quantization for searching instances from videos, Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, 2012.
DOI : 10.1145/2324796.2324856

T. Zin, P. Tin, T. Toriu, and H. Hama, Dominant Color Embedded Markov Chain Model for Object Image Retrieval, 2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2009.
DOI : 10.1109/IIH-MSP.2009.281