
Deep learning methods for visual content creation and understanding

Abstract : The goal of this thesis is to develop algorithms that help visual artists create and manipulate images easily with deep learning and computer vision tools. Advances in AI, in particular generative models, have opened new possibilities that can be leveraged in the artistic domain to simplify the manipulation of digital visual content and to assist artists in finding inspiring ideas. Progress in this domain could democratize access to visual content manipulation software, which still requires time, money and expert skills.

The first contribution of this thesis is the introduction of two methods for generating novel and surprising images: one for generating new fashion designs and one for creating unexpected visual blends. First, we show how generative adversarial networks can be used as an inspirational tool for fashion designers to create realistic and novel designs. While most image generation models aim to generate realistic images that cannot be distinguished from real ones, they tend to reproduce the training examples. We instead focus on designing models that encourage novelty and surprise in the generated images. Second, we develop a visual blending model that generates compositions by blending objects into uncommon contexts based on visual similarity. Using recent advances in image retrieval, completion and blending, our simple model produces realistic and surprising visual blends. We study how the selection of the foreground object influences the novelty and realism of the blend.

In the rest of the thesis, we focus on improving the image generation methods presented, by exploring how generative models can be extended to resolution-independent image generation and by studying the quality of the image features used in image retrieval from a training data perspective.

The second contribution is a new layered image decomposition and generation model aimed at representing images in a resolution-independent and easily editable way. Generating higher-resolution images is challenging in terms of both training time and stability. To alleviate these difficulties, we design the first deep learning based image generation model using vector mask layers. We formulate vector mask generation as a parametric function (a multi-layer perceptron) applied to a regular coordinate grid, which yields mask values at the input pixel positions. Our model reconstructs images by predicting vector masks and their corresponding colors, and then iteratively blending the colored masks. We train our model to reconstruct natural images, from face images to more diverse ones, and show that it captures interesting mask embeddings that can be used for image editing and vectorization. Furthermore, we present an adversarially trained setup of our vector image generation model.

The third contribution focuses on image retrieval and few-shot classification. Indeed, a large part of the artistic work and effort when creating visual blends goes into searching for relevant images to use. To simplify this tedious image search step, deep features can be used as similarity measures to retrieve images. While there has been considerable work on learning image representations for image classification, particularly with self-supervised techniques, the impact of the training dataset on the quality of the learned features has not been extensively explored. Thus, we study the impact of the base dataset composition on the quality of features from a few-shot classification perspective. We show that designing the base training dataset is crucial for improving the features for few-shot classification. For instance, careful relabeling of the dataset considerably increases performance with a simple, competitive baseline model.
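
The vector mask idea described in the abstract (a multi-layer perceptron evaluated on a regular coordinate grid, followed by iterative blending of colored masks) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the network size, the tanh and sigmoid activations, the random weights, the black background and all function names below are assumptions chosen only for illustration. The point it shows is that the same mask parameters can be rendered at any resolution, because the mask is a function of continuous coordinates rather than a fixed pixel array.

```python
import numpy as np


def coordinate_grid(height, width):
    """Return an (height * width, 2) array of (x, y) coordinates in [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)


def random_params(rng, hidden=16):
    """Hypothetical stand-in for mask parameters that a trained model would predict."""
    return (
        rng.normal(size=(2, hidden)),
        rng.normal(size=hidden),
        rng.normal(size=(hidden, 1)),
        rng.normal(size=1),
    )


def mlp_mask(coords, params):
    """Evaluate a one-hidden-layer MLP at each coordinate; mask values lie in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(coords @ w1 + b1)
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


def render(layers, height, width, background=(0.0, 0.0, 0.0)):
    """Blend colored masks one after another onto a background canvas."""
    coords = coordinate_grid(height, width)
    canvas = np.tile(np.asarray(background, dtype=float), (height * width, 1))
    for params, color in layers:
        mask = mlp_mask(coords, params)  # shape (height * width, 1)
        canvas = mask * np.asarray(color, dtype=float) + (1.0 - mask) * canvas
    return canvas.reshape(height, width, 3)


rng = np.random.default_rng(0)
layers = [
    (random_params(rng), (1.0, 0.0, 0.0)),  # red layer
    (random_params(rng), (0.0, 0.0, 1.0)),  # blue layer
]
low_res = render(layers, 8, 8)     # same layers, coarse grid
high_res = render(layers, 32, 32)  # same layers, finer grid
```

In this sketch, changing the output resolution only changes the grid passed to the mask function; the layer parameters and colors stay untouched, which is what makes the representation editable layer by layer.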
Contributor : ABES STAR
Submitted on : Monday, December 6, 2021 - 5:45:09 PM
Last modification on : Thursday, September 29, 2022 - 10:47:06 AM
Long-term archiving on : Monday, March 7, 2022 - 7:44:18 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03467925, version 1


Othman Sbai. Deep learning methods for visual content creation and understanding. Artificial Intelligence [cs.AI]. École des Ponts ParisTech, 2021. English. ⟨NNT : 2021ENPC0020⟩. ⟨tel-03467925⟩


