Multidimensional Hidden Markov Model Applied to Image and Video Analysis

Joakim Jitén Söderberg

Abstract

Recent progress and prospects in cognitive vision, multimedia, human-computer interaction, communications and the Web call for, and can profit from, applications of advanced image and video analysis. Adaptive, robust systems are required for the analysis, indexing and summarization of large amounts of audio-visual data. Image classification is perhaps the most important part of digital image analysis. The objective is to identify and portray the visual features occurring in an image in terms of differentiated classes or themes. Applications can be found in a wide range of domains such as medical image understanding, surveillance, remote sensing and interactive TV. Traditional image classification methods analyze independent blocks of an image, which results in a context-free formalism. However, there is fairly widespread agreement that observations should be presented as collections of features that appear in a given mutual position or shape (e.g. sun in the sky, sky above landscape, or boat in the water) [20], [21]. When only the local features in a small region of an image are analyzed, it is sometimes difficult even for a human to tell what the image is about. In this dissertation we apply a statistical machine learning approach to model context in sequential data. With a statistical model in hand, we can perform several tasks important to image analysis, such as estimation, classification and segmentation. We employ a new, efficient algorithm that models images by a two-dimensional hidden Markov model (HMM). The HMM treats each observation as statistically dependent on neighboring observations through transition probabilities organized in a Markov mesh, giving a dependency in two dimensions. The main difficulty with applying a 2-D HMM to images is the computational complexity, which grows exponentially with the number of image blocks.
The main technical contribution of this thesis is a way of estimating the parameters of a 2-D HMM in O(whN^2) complexity instead of O(wN^(2h)), where N is the number of states in the model and w and h are the width and height of the image, respectively. We investigate the performance of our proposed model (DT HMM) and search for its operating point. Application to the classification of TV broadcast frames reveals intrinsic weaknesses of HMMs, for which we propose remedies. In an effort to introduce both global and local context in images, the DT HMM was extended to model multiple image resolutions. The results indicate that the previously observed deficiency can be overcome and that the model's performance is comparable to that of other known algorithms. Finally, we illustrate that the DT HMM formalism is open to a great variety of extensions and directions. Since 3-D HMMs have been little studied, we investigate the extension of the framework to three dimensions. We consider the case of video data, where two dimensions are spatial and the third is temporal. To investigate the impact of the time-dimension dependency, we explore the ability of the model to track objects that cross each other or pass behind a static object.
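
To give a feel for the gap between the two complexity figures quoted above, the short sketch below simply evaluates both expressions for one illustrative grid. The function names and the chosen values of w, h and N are assumptions made for illustration only; the constants hidden by the O-notation are ignored, and nothing here implements the estimation algorithm itself.

```python
def dt_hmm_cost(w: int, h: int, n_states: int) -> int:
    """Evaluate the w * h * N^2 term (constants ignored)."""
    return w * h * n_states ** 2


def full_2d_hmm_cost(w: int, h: int, n_states: int) -> int:
    """Evaluate the w * N^(2h) term (constants ignored)."""
    return w * n_states ** (2 * h)


if __name__ == "__main__":
    # Hypothetical example: a 16 x 12 grid of blocks and a 4-state model.
    w, h, n_states = 16, 12, 4
    print("w*h*N^2  =", dt_hmm_cost(w, h, n_states))
    print("w*N^(2h) =", full_2d_hmm_cost(w, h, n_states))
```

Even for this small, assumed configuration the second expression is many orders of magnitude larger than the first, which is why the exponential term is described as the main obstacle to applying a 2-D HMM to images.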
