Abstract

In this project we attempt to create a unique artistic image by composing two images, so that the new image has the content of one image and the style of the other. This is a kind of painting that artists master only through extensive practice over time. We use a Deep Neural Network (the VGG network) to accomplish the task. This work gives an algorithmic view of how humans create and perceive artistic imagery.

1 Problem statement

Artists constantly look for new ways to create art. One technique they have mastered is to compose a unique image from two images, such that the content of one image is rendered in the style of the other. We have worked on creating such artistic images using the approach of the research paper cited in the references, which exploits the properties of Deep Neural Networks to achieve this task. Here, we combine the content and style of two different images using neural representations. This gives an algorithmic understanding of how we create and perceive artistic images. The technique has already been used for a wide variety of applications and has the potential to open new possibilities, especially in graphic design and the textile industry.

2 Proposed solution

The research paper focuses on a class of Deep Neural Networks that is very powerful for image processing: Convolutional Neural Networks (CNNs). A CNN consists of layers of computational units, where each unit processes some visual information or feature from the input image, and the output of each layer is a set of feature maps containing the extracted features. We have used the VGG-19 architecture in our implementation.

Source: https://medium.com/machine-learning-algorithms/image-style-transfer-740d08f8c1bd
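As a concrete illustration (not the exact code of this project), the sketch below loads a pre-trained VGG-19 from torchvision and collects feature maps from chosen layers; the layer indices, image size, and preprocessing values are common defaults and are assumptions here.

```python
import torchvision.models as models
import torchvision.transforms as T

# Load the convolutional part of a pre-trained VGG-19 and freeze its weights;
# during style transfer only the generated image is optimised, not the network.
vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Standard ImageNet preprocessing for VGG inputs.
preprocess = T.Compose([
    T.Resize(512),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_features(image, layers):
    """Run `image` through VGG-19 and collect the feature maps at the given layer indices."""
    feats = {}
    x = image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in layers:
            feats[idx] = x
    return feats
```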

Convolutional Neural Networks are typically trained for object recognition. In that process they develop a representation of the image that makes object information increasingly explicit along the processing hierarchy. Higher layers in the network capture the high-level content in terms of objects and their arrangement, whereas lower layers largely reproduce the exact pixel values of the original image. We therefore choose a feature map from a higher layer of the network as the content representation. The content is essentially the macro-structure of the input image.
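A minimal sketch of such a content loss, assuming feature maps extracted with a helper like `get_features` above; the particular layer used is an illustrative choice.

```python
import torch.nn.functional as F

def content_loss(gen_features, content_features):
    # Both tensors come from the same higher VGG layer (e.g. conv4_2),
    # one for the generated image and one for the content photograph.
    return F.mse_loss(gen_features, content_features)
```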

Using a feature space designed to capture texture information, we obtain a style representation of the input image. Visual patterns, textures, and colours form the style of the input image. Content and style representations in a CNN are separable, although they cannot be completely disentangled. Both can then be manipulated independently to produce a new artistic, meaningful image.

Inspired by deep learning optimisation, we minimise a joint loss function over content and style during synthesis of the new image. The total loss is given by:

L_total(p, a, x) = α · L_content(p, x) + β · L_style(a, x)

Here, α and β are user-defined weighting parameters for the reconstruction of content and style respectively, p is the photograph (content image), a is the artwork (style image), and x is the generated image. The ratio α/β is typically of the order of 10^-1 to 10^-5.
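A one-line sketch of this weighted combination; the default weights below merely keep α/β within the range mentioned above and are assumptions.

```python
def total_loss(c_loss, s_loss, alpha=1.0, beta=1e4):
    # alpha weights the content reconstruction, beta the style reconstruction;
    # with alpha = 1 and beta = 1e4 the ratio alpha/beta is 1e-4.
    return alpha * c_loss + beta * s_loss
```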

The content and style loss functions are mean squared errors between features of the input images and those of the generated image. As discussed earlier, the content loss is taken from a higher layer, whereas the style loss is calculated over multiple layers of the network. The correlations used for the style loss are obtained by multiplying a flattened feature map with its transpose, which yields the Gram matrix.

Image source: https://github.com/Adi-iitd/AI-Art
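A sketch of the Gram-matrix computation described above; the normalisation by the number of elements is a common convention and is an assumption here.

```python
import torch

def gram_matrix(features):
    b, c, h, w = features.size()            # batch, channels, height, width
    f = features.view(b, c, h * w)          # flatten each feature map
    gram = torch.bmm(f, f.transpose(1, 2))  # (b, c, c) matrix of filter correlations
    return gram / (c * h * w)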

To identify the style of an image we compare the correlations of its feature maps across different layers, so we compute the Gram matrix of the feature maps at each chosen layer to obtain the style representation. Taking the squared difference between the Gram matrices of the style image and the generated image at each layer, and summing over the layers, gives the final style cost; a sketch of this computation is given below.
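This sketch accumulates the style cost over several layers using the `gram_matrix` helper above; the equal per-layer weighting and the choice of layers are assumptions, not fixed by the text.

```python
import torch.nn.functional as F

def style_loss(gen_feats, style_feats, layers):
    # gen_feats and style_feats are dicts of feature maps keyed by layer index.
    loss = 0.0
    for layer in layers:
        g_gen = gram_matrix(gen_feats[layer])
        g_style = gram_matrix(style_feats[layer])
        loss = loss + F.mse_loss(g_gen, g_style) / len(layers)
    return loss
```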

3 Implementation details
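The sketch below shows how the pieces above could be wired into a synthesis loop, assuming the Adam optimiser and typical VGG-19 layer choices (conv4_2 for content; conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 for style); the iteration count and learning rate are illustrative, not our exact setup.

```python
import torch

content_layers = [21]               # conv4_2 in torchvision's VGG-19 indexing
style_layers = [0, 5, 10, 19, 28]   # conv1_1, conv2_1, conv3_1, conv4_1, conv5_1

def run_style_transfer(content_img, style_img, steps=300):
    # Start the generated image from the content photograph and optimise its pixels.
    target = content_img.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([target], lr=0.02)

    content_feats = get_features(content_img, content_layers)
    style_feats = get_features(style_img, style_layers)

    for _ in range(steps):
        optimizer.zero_grad()
        gen_c = get_features(target, content_layers)
        gen_s = get_features(target, style_layers)
        c_loss = content_loss(gen_c[content_layers[0]], content_feats[content_layers[0]])
        s_loss = style_loss(gen_s, style_feats, style_layers)
        loss = total_loss(c_loss, s_loss)
        loss.backward()
        optimizer.step()
    return target.detach()
```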

4 Results and Discussion

Example 1: a content image combined with style image 1.

Figures: the content and style images used; the image obtained every 5 iterations; the final generated image.

Example 2: a content image combined with style image 2.

Figures: the image obtained every 5 iterations; the final generated image.

5 References