pystiche logo

Welcome to pystiche's documentation!

pystiche (pronounced /ˈpaɪˈstiʃ/) is a framework for Neural Style Transfer (NST) built upon PyTorch. The name of the project is a pun on pastiche, meaning:

A pastiche is a work of visual art […] that imitates the style or character of the work of one or more other artists. Unlike parody, pastiche celebrates, rather than mocks, the work it imitates.

pystiche has similar goals as Deep Learning (DL) frameworks such as PyTorch:

  1. Accessibility

    Starting off with NST can be quite overwhelming due to the sheer number of techniques one has to know and be able to deploy. pystiche aims to provide an easy-to-use interface that reduces the necessary prior knowledge about NST and DL to a minimum.

  2. Reproducibility

    Implementing NST from scratch is not only inconvenient but also error-prone. pystiche aims to provide reusable tools that let developers focus on their ideas rather than worrying about bugs in everything around it.

Getting started

Installation

The latest stable version can be installed with

pip install pystiche

The latest potentially unstable version can be installed with

pip install git+https://github.com/pmeier/pystiche@master

Installation of PyTorch

pystiche is built upon PyTorch and depends on torch and torchvision. By default, a pip install of pystiche tries to install the PyTorch distributions precompiled for the latest CUDA release. If you use another version or don’t have a CUDA-capable GPU, we encourage you to try light-the-torch for a convenient installation:

pip install light-the-torch
ltt install pystiche

Otherwise, please follow the official installation instructions of PyTorch for your setup before you install pystiche.

Note

While pystiche is designed to be fully functional without a GPU, most tasks require significantly more time to perform on a CPU.

Contributing

First and foremost: Thank you for your interest in pystiche's development! We appreciate all contributions, be it code or something else.

Guidelines

pystiche uses the GitHub workflow. Below is a small guide on how to make your first contribution.

Note

The following guide assumes that git, python, and pip are available on your system. If that is not the case, follow their official installation instructions.
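
For example, you can quickly verify that all three are available by querying their versions:

$ git --version
$ python --version
$ pip --version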

Note

pystiche officially supports Python 3.6, 3.7, and 3.8. To ensure backwards compatibility, development should happen on the minimum Python version, i.e. 3.6.

  1. Fork pystiche on GitHub

Navigate to pmeier/pystiche on GitHub and click the Fork button in the top right corner.

  2. Clone your fork to your local file system

Use git clone to get a local copy of pystiche's repository that you can work on:

$ PYSTICHE_ROOT="pystiche"
$ git clone "https://github.com/pmeier/pystiche.git" $PYSTICHE_ROOT
  3. Set up your development environment

$ cd $PYSTICHE_ROOT
$ virtualenv .venv --prompt="(pystiche) "
$ source .venv/bin/activate
$ pip install -r requirements-dev.txt
$ pre-commit install

Note

While pystiche's development requirements are fairly lightweight, it is still recommended to install them in a virtual environment rather than system-wide. If you do not have virtualenv installed, you can do so by running pip install --user virtualenv.

  4. Create a branch for local development

Use git checkout to create a local branch with a descriptive name:

$ PYSTICHE_BRANCH="my-awesome-feature-or-bug-fix"
$ git checkout -b $PYSTICHE_BRANCH

Now make your changes. Happy Coding!

  5. Use tox to run various checks

$ tox

Note

Running tox is equivalent to running

$ tox -e lint-style
$ tox -e lint-typing
$ tox -e tests-integration
$ tox -e tests-galleries
$ tox -e tests-docs

You can find details on what the individual commands do further below in this guide.

  6. Commit and push your changes

If all checks pass, you can commit your changes and push them to your fork:

$ git add .
$ git commit -m "Descriptive message of the changes made"
$ git push -u origin $PYSTICHE_BRANCH

Note

For larger changes, it is good practice to split them into multiple small commits rather than one large one. If you do that, make sure to run the test suite before every commit. Furthermore, use git push without any parameters for consecutive commits.

  7. Open a Pull request (PR)

  1. Navigate to pmeier/pystiche/pulls on GitHub and click on the green button “New pull request”.

  2. Click on “compare across forks” below the “Compare changes” headline.

  3. Select your fork for “head repository” and your branch for “compare” in the drop-down menus.

  4. Click the green button “Create pull request”.

Note

If the time between the branch being pushed and the PR being opened is not too long, GitHub will offer you a yellow box after step 1. If you click the button, you can skip steps 2. and 3.

Note

Steps 1. to 3. only have to be performed once. If you want to continue contributing, make sure to branch off from the current master branch. You can use git pull:

$ git checkout master
$ git pull origin
$ git checkout -b "my-second-awesome-feature-or-bug-fix"

If you forgot to do that, or if many commits have been made to the master branch since the creation of your branch, simply rebase your branch on top of it:

$ git checkout master
$ git pull origin
$ git checkout "my-second-awesome-feature-or-bug-fix"
$ git rebase master

If you are contributing bug fixes or documentation improvements, you can open a pull request (PR) without further discussion. If, on the other hand, you are planning to contribute new features, please open an issue and discuss the feature with us first.

Every PR is subjected to multiple automatic checks (continuous integration, CI) as well as a manual code review that it has to pass before it can be merged. The automatic checks are performed by tox. You can find details and instructions on how to run the checks locally below.

Code format and linting

pystiche uses isort to sort the imports, black to format the code, and flake8 to enforce PEP8 compliance. To format and check the code style, run

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -e lint-style

Note

Amongst others, isort, black, and flake8 are run by pre-commit before every commit.
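
If you want to run the hooks manually, for example over all files before your first commit, you can do so with:

cd $PYSTICHE_ROOT
source .venv/bin/activate
pre-commit run --all-files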

Furthermore, pystiche is PEP 561 compliant and its type annotations are checked with mypy. To check the static typing, run

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -e lint-typing

For convenience, you can run all lint checks with

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -f lint

Test suite

pystiche uses pytest to run the test suite. You can run it locally with

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox

Note

pystiche adds the following custom options with the corresponding @pytest.mark.* decorators:

  • --skip-large-download: @pytest.mark.large_download

  • --skip-slow: @pytest.mark.slow

  • --run-flaky: @pytest.mark.flaky

Options prefixed with --skip are run by default and skipped if the option is given. Options prefixed with --run are skipped by default and run if the option is given.

These options are passed through tox if given after a -- flag. For example, the CI invokes the test suite with

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -- --skip-large-download
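
The options listed above can also be combined in the same way, for example:

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -- --skip-large-download --skip-slow --run-flaky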

Documentation

To build the html documentation locally, run

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -e docs-html

To build the latex (PDF) documentation locally, run

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -e docs-latex

To build both, run

cd $PYSTICHE_ROOT
source .venv/bin/activate
tox -f docs

Note

Building the documentation triggers a sphinx-gallery build of the example galleries by default, which will take some time to complete. To get around this, pystiche offers two environment variables:

  • PYSTICHE_PLOT_GALLERY: If False, the code inside the galleries is not executed. See the official sphinx-gallery documentation for details. Defaults to True.

  • PYSTICHE_DOWNLOAD_GALLERY: If True, downloads pre-built galleries and uses them instead of rebuilding. For the master branch the galleries are at most six hours old. Defaults to False.

Both environment variables are evaluated with strtobool().
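
For example, to build the HTML documentation while reusing the pre-built galleries, you could set the variable for a single invocation (a sketch; the exact syntax depends on your shell):

cd $PYSTICHE_ROOT
source .venv/bin/activate
PYSTICHE_DOWNLOAD_GALLERY=True tox -e docs-html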

Gist

From a high-level viewpoint, Neural Style Transfer (NST) can be described with only three images and two symbols:

pystiche banner

Not only the quality of the results but also the underlying steps are comparable to the work of human artisans or craftsmen. Such a manual style transfer can be roughly divided into three steps:

  1. The content or motif of an image needs to be identified. That means one has to identify which parts of the image are essential and which details can be discarded.

  2. The style of an image, such as color, shapes, or brush strokes, needs to be identified. Usually that means one has to intensively study the works of the original artist.

  3. The identified content and style have to be merged together. This can be the most difficult step, since it usually requires a lot of skill to match the style of another artist.

In principle an NST performs the same steps, albeit fully automatically. This is nothing new in the field of computational style transfers. What makes NST stand out is its generality: NST only needs a single arbitrary content and style image as input and thus “makes – for the first time – a generalized style transfer practicable.” [SID17].

The following sections provide the gist of how these three steps are performed with pystiche as part of an NST. Afterwards, head over to the usage examples to see pystiche in action.

Perceptual loss

The identification of content and style are core elements of a Neural Style Transfer (NST). The agreement of the content and style of two images is measured with the content_loss and style_loss, respectively.

Operators

In pystiche these losses are implemented as Loss instances. pystiche differentiates between two types: a RegularizationLoss works without any context, while a ComparisonLoss compares two images. Furthermore, pystiche differentiates between two domains a Loss can work on: a PixelOperator operates directly on the input_image, while an EncodingOperator encodes it first.

In total pystiche supports four archetypes:

  • PixelRegularizationOperator

  • EncodingRegularizationOperator

  • PixelComparisonOperator

  • EncodingComparisonOperator

Multi-layer encoder

One of the main improvements of NST compared to traditional approaches is that the agreement is not measured in the pixel space or a handcrafted feature space, but rather in the learned feature space of a Convolutional Neural Network, called the encoder. Especially variants of the style_loss depend upon encodings, i.e. feature maps, from various layers of the encoder.

pystiche offers a MultiLayerEncoder that enables the extraction of all required encodings in a single forward pass. If the same loss should be applied to different layers of a MultiLayerEncoder, a MultiLayerEncodingLoss can be used.
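
As a rough sketch of how this fits together (the calls mirror the usage examples below; the chosen layers and the Gram-based style loss are just one possible configuration):

from pystiche import enc, loss

multi_layer_encoder = enc.vgg19_multi_layer_encoder()

# expose a single layer as encoder, e.g. for a content loss
content_encoder = multi_layer_encoder.extract_encoder("relu4_2")

# apply the same loss to several layers of the multi-layer encoder
style_loss = loss.MultiLayerEncodingLoss(
    multi_layer_encoder,
    ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1"),
    lambda encoder, layer_weight: loss.GramLoss(encoder, score_weight=layer_weight),
)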

Perceptual loss

The PerceptualLoss combines all losses into a single measure that acts as joint optimization criterion. How the optimization is performed will be detailed in the next section.
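
Continuing the sketch from above, the individual losses are combined into the joint criterion (again mirroring the usage examples below):

content_loss = loss.FeatureReconstructionLoss(content_encoder)
perceptual_loss = loss.PerceptualLoss(content_loss, style_loss)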

Optimization

The merging of the identified content and style with a Neural Style Transfer (NST) is posed as an optimization problem. The optimization is performed on the basis of a PerceptualLoss. A distinction is made between two different approaches.

Image optimization

In its basic form, an NST optimizes the pixels of the input_image directly. That means they are iteratively adapted to reduce the perceptual loss. This process is called image optimization and can be performed in pystiche with image_optimization().
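
In code, such an image optimization with the perceptual loss sketched above boils down to the following (taken from the usage examples below; content_image and style_image are assumed to be loaded already):

from pystiche import optim
from pystiche.misc import get_input_image

perceptual_loss.set_content_image(content_image)
perceptual_loss.set_style_image(style_image)

input_image = get_input_image("content", content_image=content_image)
output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)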

Model optimization

While the image optimization approach yields the highest quality results, the computation is quite expensive and usually takes multiple minutes to complete for a single image. Model optimization, on the other hand, trains a model called transformer to perform the stylization. The training is performed with the same perceptual loss as before, but now the weights of the transformer are used as optimization parameters. The training is even more time-consuming, but afterwards the stylization only requires a single forward pass of the input_image through the transformer. The quality, while still high, is lower than for image optimization approaches, since the transformer cannot finetune the output_image. In pystiche a model optimization can be performed with model_optimization().

Note

Due to the differences in execution time, image and model optimization approaches are often dubbed slow and fast, respectively.

pystiche usage examples

Note

Although a GPU is not a requirement, it is strongly advised to run these examples with one. If you don’t have access to a GPU, the execution time of the examples might increase by multiple orders of magnitude. The total running time provided at the end of each example is measured using a GPU.

Beginner

Neural Style Transfer without pystiche

This example showcases how a basic Neural Style Transfer (NST), i.e. image-based optimization, could be performed without pystiche.

Note

This is an example of how to implement an NST and not a tutorial on how NST works. As such, it will not explain why a specific choice was made or how a component works. If you have never worked with an NST before, we strongly suggest you read the Gist first.

Setup

We start this example by importing everything we need and setting the device we will be working on. torch and torchvision will be used for the actual NST. Furthermore, we use PIL.Image for the file input, and matplotlib.pyplot to show the images.

26 import itertools
27 import os.path
28 from collections import OrderedDict
29 from urllib.request import urlopen
30
31 import matplotlib.pyplot as plt
32 from PIL import Image
33 from tqdm.auto import tqdm
34
35 import torch
36 import torchvision
37 from torch import nn, optim
38 from torch.nn.functional import mse_loss
39 from torchvision import transforms
40 from torchvision.models import vgg19
41 from torchvision.transforms.functional import resize
42
43 print(f"I'm working with torch=={torch.__version__}")
44 print(f"I'm working with torchvision=={torchvision.__version__}")
45
46 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
47 print(f"I'm working with {device}")

The core component of different NSTs is the perceptual loss, which is used as optimization criterion. The perceptual loss is usually, and also for this example, calculated on feature maps, also called encodings. These encodings are generated from different layers of a Convolutional Neural Network (CNN), also called encoder.

A common implementation strategy for the perceptual loss is to weave transparent loss layers into the encoder. These loss layers are called transparent since from an outside view they simply pass the input through without alteration. Internally though, they calculate the loss with the encodings of the previous layer and store it within themselves. After the forward pass is completed, the stored losses are aggregated and propagated backwards to the image. While this is simple to implement, this practice has two downsides:

  1. The calculated score is part of the current state but has to be stored inside the layer. This is generally not recommended.

  2. While the encoder is a part of the perceptual loss, it itself does not generate it. One should be able to use the same encoder with a different perceptual loss without modification.

Thus, this example (and pystiche) follows a different approach and separates the encoder and the perceptual loss into individual entities.

Multi-layer Encoder

In a first step we define a MultiLayerEncoder that should have the following properties:

  1. Given an image and a set of layers, the MultiLayerEncoder should return the encodings of every given layer.

  2. Since the encodings have to be generated in every optimization step they should be calculated in a single forward pass to keep the processing costs low.

  3. To reduce the static memory requirement, the MultiLayerEncoder should be trimmable in order to remove unused layers.

We achieve the main functionality by subclassing torch.nn.Sequential and defining a custom forward method, i.e. changing the behavior when called. Besides the image, it also takes an iterable layer_cfgs containing multiple sequences of layers. In the method body we first find the deepest_layer that was requested. Subsequently, we calculate and store all encodings of the image up to that layer. Finally, we can return all requested encodings without processing the same layer twice.

 97 class MultiLayerEncoder(nn.Sequential):
 98     def forward(self, image, *layer_cfgs):
 99         storage = {}
100         deepest_layer = self._find_deepest_layer(*layer_cfgs)
101         for layer, module in self.named_children():
102             image = storage[layer] = module(image)
103             if layer == deepest_layer:
104                 break
105
106         return [[storage[layer] for layer in layers] for layers in layer_cfgs]
107
108     def children_names(self):
109         for name, module in self.named_children():
110             yield name
111
112     def _find_deepest_layer(self, *layer_cfgs):
113         # find all unique requested layers
114         req_layers = set(itertools.chain(*layer_cfgs))
115         try:
116             # find the deepest requested layer by indexing the layers within
117             # the multi layer encoder
118             children_names = list(self.children_names())
119             return sorted(req_layers, key=children_names.index)[-1]
120         except ValueError as error:
121             layer = str(error).split()[0]
122         raise ValueError(f"Layer {layer} is not part of the multi-layer encoder.")
123
124     def trim(self, *layer_cfgs):
125         deepest_layer = self._find_deepest_layer(*layer_cfgs)
126         children_names = list(self.children_names())
127         del self[children_names.index(deepest_layer) + 1 :]

The pretrained models the MultiLayerEncoder is based on are usually trained on preprocessed images. In PyTorch all models expect images that are normalized with a per-channel mean = (0.485, 0.456, 0.406) and standard deviation std = (0.229, 0.224, 0.225). To include this in a MultiLayerEncoder, we implement it as a torch.nn.Module.

139 class Normalize(nn.Module):
140     def __init__(self, mean, std):
141         super().__init__()
142         self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
143         self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))
144
145     def forward(self, image):
146         return (image - self.mean) / self.std
147
148
149 class TorchNormalize(Normalize):
150     def __init__(self):
151         super().__init__((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))

In a last step we need to specify the structure of the MultiLayerEncoder. In this example we use a VGGMultiLayerEncoder based on the VGG19 CNN introduced by Simonyan and Zisserman [SZ14].

We only include the feature extraction stage (vgg_net.features), i.e. the convolutional stage, since the classifier stage (vgg_net.classifier) only accepts feature maps of a single size.

For our convenience we rename the layers according to the scheme the authors used instead of keeping the consecutive indices of a default torch.nn.Sequential. The first layer, however, is the TorchNormalize defined above.

168 class VGGMultiLayerEncoder(MultiLayerEncoder):
169     def __init__(self, vgg_net):
170         modules = OrderedDict((("preprocessing", TorchNormalize()),))
171
172         block = depth = 1
173         for module in vgg_net.features.children():
174             if isinstance(module, nn.Conv2d):
175                 layer = f"conv{block}_{depth}"
176             elif isinstance(module, nn.BatchNorm2d):
177                 layer = f"bn{block}_{depth}"
178             elif isinstance(module, nn.ReLU):
179                 # without inplace=False the encodings of the previous layer would no
180                 # longer be accessible after the ReLU layer is executed
181                 module = nn.ReLU(inplace=False)
182                 layer = f"relu{block}_{depth}"
183                 # each ReLU layer increases the depth of the current block by one
184                 depth += 1
185             elif isinstance(module, nn.MaxPool2d):
186                 layer = f"pool{block}"
187                 # each max pooling layer marks the end of the current block
188                 block += 1
189                 depth = 1
190             else:
191                 msg = f"Type {type(module)} is not part of the VGG architecture."
192                 raise RuntimeError(msg)
193
194             modules[layer] = module
195
196         super().__init__(modules)
197
198
199 def vgg19_multi_layer_encoder():
200     return VGGMultiLayerEncoder(vgg19(pretrained=True))
201
202
203 multi_layer_encoder = vgg19_multi_layer_encoder().to(device)
204 print(multi_layer_encoder)
Perceptual Loss

In order to calculate the perceptual loss, i.e. the optimization criterion, we define a MultiLayerLoss to have a convenient interface. This will be subclassed later by the ContentLoss and StyleLoss.

If called with a sequence of input_encs, the MultiLayerLoss should calculate layerwise scores together with the corresponding target_encs. For that a MultiLayerLoss needs the ability to store the target_encs so that they can be reused for every call. The individual layer scores should be averaged by the number of encodings and finally weighted by a score_weight.

To achieve this we subclass torch.nn.Module. The target_encs are stored as buffers, since they are not trainable parameters. The actual functionality has to be defined in calculate_score by a subclass.

226 def mean(sized):
227     return sum(sized) / len(sized)
228
229
230 class MultiLayerLoss(nn.Module):
231     def __init__(self, score_weight=1e0):
232         super().__init__()
233         self.score_weight = score_weight
234         self._numel_target_encs = 0
235
236     def _target_enc_name(self, idx):
237         return f"_target_encs_{idx}"
238
239     def set_target_encs(self, target_encs):
240         self._numel_target_encs = len(target_encs)
241         for idx, enc in enumerate(target_encs):
242             self.register_buffer(self._target_enc_name(idx), enc.detach())
243
244     @property
245     def target_encs(self):
246         return tuple(
247             getattr(self, self._target_enc_name(idx))
248             for idx in range(self._numel_target_encs)
249         )
250
251     def forward(self, input_encs):
252         if len(input_encs) != self._numel_target_encs:
253             msg = (
254                 f"The number of given input encodings and stored target encodings "
255                 f"does not match: {len(input_encs)} != {self._numel_target_encs}"
256             )
257             raise RuntimeError(msg)
258
259         layer_losses = [
260             self.calculate_score(input, target)
261             for input, target in zip(input_encs, self.target_encs)
262         ]
263         return mean(layer_losses) * self.score_weight
264
265     def calculate_score(self, input, target):
266         raise NotImplementedError

In this example we use the feature_reconstruction_loss introduced by Mahendran and Vedaldi [MV15] as ContentLoss as well as the gram_loss introduced by Gatys, Ecker, and Bethge [GEB16] as StyleLoss.

275 def feature_reconstruction_loss(input, target):
276     return mse_loss(input, target)
277
278
279 class ContentLoss(MultiLayerLoss):
280     def calculate_score(self, input, target):
281         return feature_reconstruction_loss(input, target)
282
283
284 def channelwise_gram_matrix(x, normalize=True):
285     x = torch.flatten(x, 2)
286     G = torch.bmm(x, x.transpose(1, 2))
287     if normalize:
288         return G / x.size()[-1]
289     else:
290         return G
291
292
293 def gram_loss(input, target):
294     return mse_loss(channelwise_gram_matrix(input), channelwise_gram_matrix(target))
295
296
297 class StyleLoss(MultiLayerLoss):
298     def calculate_score(self, input, target):
299         return gram_loss(input, target)
Images

Before we can load the content and style image, we need to define some basic I/O utilities.

At import, a fake batch dimension is added to the images so that they can be passed through the MultiLayerEncoder without further modification. This dimension is removed again upon export. Furthermore, all images will be resized to size=500 pixels.

313 import_from_pil = transforms.Compose(
314     (
315         transforms.ToTensor(),
316         transforms.Lambda(lambda x: x.unsqueeze(0)),
317         transforms.Lambda(lambda x: x.to(device)),
318     )
319 )
320
321 export_to_pil = transforms.Compose(
322     (
323         transforms.Lambda(lambda x: x.cpu()),
324         transforms.Lambda(lambda x: x.squeeze(0)),
325         transforms.Lambda(lambda x: x.clamp(0.0, 1.0)),
326         transforms.ToPILImage(),
327     )
328 )
329
330
331 def download_image(url):
332     file = os.path.abspath(os.path.basename(url))
333     with open(file, "wb") as fh, urlopen(url) as response:
334         fh.write(response.read())
335
336     return file
337
338
339 def read_image(file, size=500):
340     image = Image.open(file)
341     image = resize(image, size)
342     return import_from_pil(image)
343
344
345 def show_image(image, title=None):
346     _, ax = plt.subplots()
347     ax.axis("off")
348     if title is not None:
349         ax.set_title(title)
350
351     image = export_to_pil(image)
352     ax.imshow(image)

With the I/O utilities set up, we now download, read, and show the images that will be used in the NST.

Note

The images used in this example are licensed under the permissive Pixabay License.

367 content_url = "https://download.pystiche.org/images/bird1.jpg"
368 content_file = download_image(content_url)
369 content_image = read_image(content_file)
370 show_image(content_image, title="Content image")
375 style_url = "https://download.pystiche.org/images/paint.jpg"
376 style_file = download_image(style_url)
377 style_image = read_image(style_file)
378 show_image(style_image, title="Style image")
Neural Style Transfer

At first we choose the content_layers and style_layers on which the encodings are compared. With them we trim the multi_layer_encoder to remove unused layers that would otherwise occupy memory.

Afterwards we calculate the target content and style encodings. The calculation is performed without a gradient since the gradient of the target encodings is not needed for the optimization.

393 content_layers = ("relu4_2",)
394 style_layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
395
396 multi_layer_encoder.trim(content_layers, style_layers)
397
398 with torch.no_grad():
399     target_content_encs = multi_layer_encoder(content_image, content_layers)[0]
400     target_style_encs = multi_layer_encoder(style_image, style_layers)[0]

Next up, we instantiate the ContentLoss and StyleLoss with a corresponding weight. Afterwards we store the previously calculated target encodings.

407 content_weight = 1e0
408 content_loss = ContentLoss(score_weight=content_weight)
409 content_loss.set_target_encs(target_content_encs)
410
411 style_weight = 1e3
412 style_loss = StyleLoss(score_weight=style_weight)
413 style_loss.set_target_encs(target_style_encs)

We start the NST from the content_image since this way it converges quickly.

419 input_image = content_image.clone()
420 show_image(input_image, "Input image")

Note

If you want to start from a white noise image instead, use

input_image = torch.rand_like(content_image)

In a last preliminary step we create the optimizer that will be performing the NST. Since we want to adapt the pixels of the input_image directly, we pass it as optimization parameters.

438 optimizer = optim.LBFGS([input_image.requires_grad_(True)], max_iter=1)

Finally, we run the NST. The loss calculation has to happen inside a closure since the LBFGS optimizer might need to reevaluate it multiple times per optimization step. This structure is also valid for all other optimizers.

447 num_steps = 500
448
449 with tqdm(desc="Image optimization", total=num_steps) as progress_bar:
450     for _ in range(num_steps):
451
452         def closure():
453             optimizer.zero_grad()
454
455             input_encs = multi_layer_encoder(input_image, content_layers, style_layers)
456             input_content_encs, input_style_encs = input_encs
457
458             content_score = content_loss(input_content_encs)
459             style_score = style_loss(input_style_encs)
460
461             perceptual_loss = content_score + style_score
462             perceptual_loss.backward()
463
464             progress_bar.set_postfix(
465                 loss=f"{float(perceptual_loss):.3e}", refresh=False
466             )
467             progress_bar.update()
468
469             return perceptual_loss
470
471         optimizer.step(closure)
472
473 output_image = input_image.detach()

After the NST we show the resulting image.

478 show_image(output_image, title="Output image")
Conclusion

As hopefully has become clear, even in its simplest form an NST requires quite a lot of utilities and boilerplate code. This makes it hard to maintain and keep bug-free, as it is easy to lose track of everything.

Judging by the lines of code one could (falsely) conclude that the actual NST is just an appendix. If you feel the same you can stop worrying now: in Neural Style Transfer with pystiche we showcase how to achieve the same result with pystiche.


Neural Style Transfer with pystiche

This example showcases how a basic Neural Style Transfer (NST), i.e. image optimization, could be performed with pystiche.

Note

This is an example of how to implement an NST and not a tutorial on how NST works. As such, it will not explain why a specific choice was made or how a component works. If you have never worked with an NST before, we strongly suggest you read the Gist first.

Setup

We start this example by importing everything we need and setting the device we will be working on.

23 import pystiche
24 from pystiche import demo, enc, loss, optim
25 from pystiche.image import show_image
26 from pystiche.misc import get_device, get_input_image
27
28 print(f"I'm working with pystiche=={pystiche.__version__}")
29
30 device = get_device()
31 print(f"I'm working with {device}")
Multi-layer Encoder

The content_loss and the style_loss operate on the encodings of an image rather than on the image itself. These encodings are generated by a pretrained encoder. Since we will be using encodings from multiple layers, we load a multi-layer encoder. In this example we use the vgg19_multi_layer_encoder() that is based on the VGG19 architecture introduced by Simonyan and Zisserman [SZ14].

44 multi_layer_encoder = enc.vgg19_multi_layer_encoder()
45 print(multi_layer_encoder)
Perceptual Loss

The core components of every NST are the content_loss and the style_loss. Combined they make up the perceptual loss, i.e. the optimization criterion.

In this example we use the FeatureReconstructionLoss introduced by Mahendran and Vedaldi [MV15] as content_loss. We first extract the content_encoder that generates encodings from the content_layer. Together with the content_weight we can construct the content_loss.

60 content_layer = "relu4_2"
61 content_encoder = multi_layer_encoder.extract_encoder(content_layer)
62 content_weight = 1e0
63 content_loss = loss.FeatureReconstructionLoss(
64     content_encoder, score_weight=content_weight
65 )
66 print(content_loss)

We use the GramLoss introduced by Gatys, Ecker, and Bethge [GEB16] as style_loss. Unlike before, we use multiple style_layers. The individual losses can be conveniently bundled in a MultiLayerEncodingLoss.

75 style_layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
76 style_weight = 1e3
77
78
79 def get_style_op(encoder, layer_weight):
80     return loss.GramLoss(encoder, score_weight=layer_weight)
81
82
83 style_loss = loss.MultiLayerEncodingLoss(
84     multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,
85 )
86 print(style_loss)

We combine the content_loss and style_loss into a joined PerceptualLoss, which will serve as optimization criterion.

93 perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
94 print(perceptual_loss)
Images

We now load and show the images that will be used in the NST. The images will be resized to size=500 pixels.

104 images = demo.images()
105 images.download()
106 size = 500

Note

images.download() downloads all demo images upfront. If you only want to download the images for this example, remove this line. They will be downloaded at runtime instead.

Note

If you want to work with other images you can load them with read_image():

from pystiche.image import read_image

my_image = read_image("my_image.jpg", size=size, device=device)
130 content_image = images["bird1"].read(size=size, device=device)
131 show_image(content_image, title="Content image")
136 style_image = images["paint"].read(size=size, device=device)
137 show_image(style_image, title="Style image")
Neural Style Transfer

After loading the images they need to be set as targets for the optimization criterion.

147 perceptual_loss.set_content_image(content_image)
148 perceptual_loss.set_style_image(style_image)

As a last preliminary step we create the input image. We start from the content_image since this way the NST converges quickly.

155 starting_point = "content"
156 input_image = get_input_image(starting_point, content_image=content_image)
157 show_image(input_image, title="Input image")

Note

If you want to start from a white noise image, use starting_point = "random" instead:

starting_point = "random"
input_image = get_input_image(starting_point, content_image=content_image)

Finally, we run the NST with image_optimization() for num_steps=500 steps.

In every step the perceptual_loss is calculated and propagated backward to the pixels of the input_image. If get_optimizer is not specified, as is the case here, the default_image_optimizer(), i.e. LBFGS, is used.

181 output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)

After the NST is complete we show the result.

187 show_image(output_image, title="Output image")
Conclusion

If you started with the basic NST example without pystiche, this example hopefully convinced you that pystiche is a helpful tool. But this was just the beginning: to unleash its full potential, head over to the more advanced examples.


Multi-layer Encoder

This example showcases the pystiche.enc.MultiLayerEncoder.

We start this example by importing everything we need.

12 import itertools
13 import time
14 from collections import OrderedDict
15 from math import floor, log10
16
17 import torch
18 from torch import nn
19 from torchvision import models
20
21 import pystiche
22 from pystiche import enc
23
24 print(f"I'm working with pystiche=={pystiche.__version__}")

In a second preliminary step we define some helper functions to ease the performance analysis later on.

31 SI_PREFIXES = {0: "", -3: "m", -6: "µ"}
32
33
34 def timeit(fn, times=10, cleanup=None):
35     total = 0.0
36     for _ in range(times):
37         start = time.time()
38         fn()
39         stop = time.time()
40         total += stop - start
41         if cleanup:
42             cleanup()
43     return total / times
44
45
46 def feng(num, unit, digits=3):
47     exp = int(floor(log10(num)))
48     exp -= exp % 3
49     sig = num * 10 ** -exp
50     prec = digits - len(str(int(sig)))
51     return f"{sig:.{prec}f} {SI_PREFIXES[exp]}{unit}"
52
53
54 def fsecs(seconds):
55     return feng(seconds, "s")
56
57
58 def ftimeit(fn, msg="The execution took {seconds}.", **kwargs):
59     return msg.format(seconds=fsecs(timeit(fn, **kwargs)))
60
61
62 def fdifftimeit(seq_fn, mle_fn, **kwargs):
63     time_seq = timeit(seq_fn, **kwargs)
64     time_mle = timeit(mle_fn, **kwargs)
65
66     abs_diff = time_mle - time_seq
67     rel_diff = abs_diff / time_seq
68
69     if abs_diff >= 0:
70         return (
71             f"Encoding the input with the enc.MultiLayerEncoder was "
72             f"{fsecs(abs_diff)} ({rel_diff:.0%}) slower."
73         )
74     else:
75         return "\n".join(
76             (
77                 "Due to the very rough timing method used here, ",
78                 "we detected a case where the encoding with the enc.MultiLayerEncoder ",
79                 "was actually faster than the boiler-plate nn.Sequential. ",
80                 "Since the enc.MultiLayerEncoder has some overhead, ",
81                 "this is a measuring error. ",
82                 "Still, this serves as indicator that the overhead is small enough, ",
83                 "to be well in the measuring tolerance.",
84             )
85         )

Next up, we define the device we will be testing on as well as the input dimensions.

Note

We encourage the user to play with these parameters and see how the results change. In order to do that, you can use the download buttons at the bottom of this page.

 96 device = torch.device("cpu")
 97
 98 batch_size = 32
 99 num_channels = 3
100 height = width = 512
101
102 input = torch.rand((batch_size, num_channels, height, width), device=device)

As a toy example to showcase the MultiLayerEncoder capabilities, we will use a CNN with three layers.

109 conv = nn.Conv2d(num_channels, num_channels, 3, padding=1)
110 relu = nn.ReLU(inplace=False)
111 pool = nn.MaxPool2d(2)
112
113 modules = [("conv", conv), ("relu", relu), ("pool", pool)]
114
115 seq = nn.Sequential(OrderedDict(modules)).to(device)
116 mle = enc.MultiLayerEncoder(modules).to(device)
117 print(mle)

Before we dive into the additional functionalities of the MultiLayerEncoder, we perform a smoke test and assert that it indeed does the same as a torch.nn.Sequential with the same layers.

125 assert torch.allclose(mle(input), seq(input))
126 print(fdifftimeit(lambda: seq(input), lambda: mle(input)))

As we saw, the MultiLayerEncoder produces the same output as a torch.nn.Sequential but is slower. In the following we will learn what other functionalities a MultiLayerEncoder has to offer that justify this overhead.

Intermediate feature maps

By calling the multi-layer encoder with a layer name in addition to the input, the intermediate layers of the MultiLayerEncoder can be accessed. This is helpful if one needs the feature maps from different layers of a model, as is often the case during an NST.

143 assert torch.allclose(mle(input, "conv"), conv(input))
144 assert torch.allclose(mle(input, "relu"), relu(conv(input)))
145 assert torch.allclose(mle(input, "pool"), pool(relu(conv(input))))

For convenience, one can extract a pystiche.enc.SingleLayerEncoder as an interface to the multi-layer encoder for a specific layer.

152 sle = mle.extract_encoder("conv")
153 assert torch.allclose(sle(input), conv(input))
Caching

If access to intermediate feature maps is necessary, as is usually the case in an NST, it is important to compute every layer only once.

A MultiLayerEncoder enables this functionality by caching already computed feature maps. Thus, after an encoding is computed and cached, retrieving it is a constant-time lookup.

In order to enable caching for a layer, it has to be registered first.

Note

The internal cache will be automatically cleared during the backward pass. Since we don't perform that here, we need to clear it manually by calling clear_cache().

Note

extract_encoder() automatically registers the layer for caching.

180 shallow_layers = ("conv", "relu")
181 for layer in shallow_layers:
182     mle.register_layer(layer)
183
184 mle(input)
185
186 for layer in shallow_layers:
187     print(
188         ftimeit(
189             lambda: mle(input, layer),
190             (
191                 f"After the forward pass was completed once for the input, "
192                 f"extracting the encoding of the intermediate layer '{layer}' "
193                 f"took {{seconds}}."
194             ),
195         )
196     )
197
198 mle.clear_cache()

Due to this caching, it doesn’t matter in which order the feature maps are requested:

  1. If a shallow layer is requested before a deeper one, the encoding is later resumed from the feature map of the shallow layer.

  2. If a deep layer is requested before a more shallow one, the feature map of the shallow one is cached while computing the deep layer.

210 def fn(layers):
211     for layer in layers:
212         mle(input, layer)
213
214
215 for permutation in itertools.permutations(("conv", "relu", "pool")):
216     order = f"""'{"' -> '".join(permutation)}'"""
217     print(
218         ftimeit(
219             lambda: fn(permutation),
220             f"The encoding of layers {order} took {{seconds}}.",
221             cleanup=mle.clear_cache,
222         )
223     )
Real-world example

Up to this point we used a toy example to demonstrate the capabilities of a MultiLayerEncoder. In addition to the boiler-plate MultiLayerEncoder, pystiche has builtin implementations of some well-known CNN architectures that are commonly used in NST papers.

Note

By default, vgg19_multi_layer_encoder() loads weights provided by torchvision. We disable this here since we load the randomly initialized weights of the torchvision model to enable a comparison.

Note

By default, vgg19_multi_layer_encoder() adds an internal_preprocessing so that the user can simply pass the image as is, without worrying about it. We disable this here to enable a comparison.

Note

By default, vgg19_multi_layer_encoder() disallows in-place operations since after they are carried out, the previous encoding is no longer accessible. In order to enable a fair performance comparison, we allow them here, since they are also used in vgg19().

Note

The fully connected stage of the original VGG19 architecture requires the input to be exactly 224 pixels wide and high [SZ14]. Since this requirement can usually not be met in an NST, the builtin multi-layer encoder only comprises the size-invariant convolutional stage. Thus, we only use vgg19().features to enable a comparison.

262 seq = models.vgg19()
263 mle = enc.vgg19_multi_layer_encoder(
264     pretrained=False, internal_preprocessing=False, allow_inplace=True
265 )
266 mle.load_state_dict(seq.state_dict())
267
268 input = torch.rand((4, 3, 256, 256), device=device)
269
270 assert torch.allclose(mle(input), seq.features(input))
271 print(fdifftimeit(lambda: seq.features(input), lambda: mle(input)))


Advanced

Guided image optimization

This example showcases how a guided, i.e. regionally constrained, NST can be performed in pystiche.

Usually, the style_loss discards spatial information since the style elements should be able to be synthesized regardless of their position in the style_image. Especially for images with clearly separated regions, style elements might leak into regions where they fit well with respect to the perceptual loss, but don't belong there for a human observer. This can be overcome with spatial constraints, also called guides ([GEB+17]).

We start this example by importing everything we need and setting the device we will be working on.

21 import pystiche
22 from pystiche import demo, enc, loss, optim
23 from pystiche.image import guides_to_segmentation, show_image
24 from pystiche.misc import get_device, get_input_image
25
26 print(f"I'm working with pystiche=={pystiche.__version__}")
27
28 device = get_device()
29 print(f"I'm working with {device}")

In a first step we load and show the images that will be used in the NST.

35 images = demo.images()
36 images.download()
37 size = 500
41 content_image = images["castle"].read(size=size, device=device)
42 show_image(content_image)
47 style_image = images["church"].read(size=size, device=device)
48 show_image(style_image)
Unguided image optimization

As a baseline we use a default NST with a FeatureReconstructionLoss as content_loss and GramLoss as style_loss.

59 multi_layer_encoder = enc.vgg19_multi_layer_encoder()
60
61 content_layer = "relu4_2"
62 content_encoder = multi_layer_encoder.extract_encoder(content_layer)
63 content_weight = 1e0
64 content_loss = loss.FeatureReconstructionLoss(
65     content_encoder, score_weight=content_weight
66 )
67
68 style_layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
69 style_weight = 1e4
70
71
72 def get_style_op(encoder, layer_weight):
73     return loss.GramLoss(encoder, score_weight=layer_weight)
74
75
76 style_loss = loss.MultiLayerEncodingLoss(
77     multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,
78 )
79
80
81 perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
82 print(perceptual_loss)

We set the target images for the optimization criterion.

88 perceptual_loss.set_content_image(content_image)
89 perceptual_loss.set_style_image(style_image)

We perform the unguided NST and show the result.

95 starting_point = "content"
96 input_image = get_input_image(starting_point, content_image=content_image)
97
98 output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)
103 show_image(output_image)

While the result is not completely unreasonable, the building has a strong blueish cast that looks unnatural. Since the optimization was unconstrained, the color of the sky was used for the building. In the remainder of this example we will solve this by dividing the images into multiple separate regions.

Guided image optimization

For both the content_image and style_image we load regional guides and show them.

Note

In pystiche a guide is a binary image in which the white pixels make up the region that is guided. Multiple guides can be combined into a segmentation for a better overview. In a segmentation the regions are separated by color. You can use guides_to_segmentation() and segmentation_to_guides() to convert one format to the other.
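
For example, converting between the two formats is a simple round trip (a sketch; content_guides refers to the dictionary of guides loaded below):

from pystiche.image import guides_to_segmentation, segmentation_to_guides

content_segmentation = guides_to_segmentation(content_guides)
content_guides = segmentation_to_guides(content_segmentation)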

Note

The guides used within this example were created manually. It is possible to generate them automatically [CZP+18], but this is outside the scope of pystiche.

134 content_guides = images["castle"].guides.read(size=size, device=device)
135 content_segmentation = guides_to_segmentation(content_guides)
136 show_image(content_segmentation, title="Content segmentation")
141 style_guides = images["church"].guides.read(size=size, device=device)
142 style_segmentation = guides_to_segmentation(style_guides)
143 show_image(style_segmentation, title="Style segmentation")

The content_image is separated in three regions: the "building", the "sky", and the "water".

Note

Since no water is present in the style image we reuse the "sky" for the "water" region.

154 regions = ("building", "sky", "water")
155
156 style_guides["water"] = style_guides["sky"]

Since the stylization should be performed for each region individually, we also need separate losses. Within each region we use the same setup as before. Similar to how a MultiLayerEncodingLoss bundles multiple losses acting on different layers, a MultiRegionLoss bundles multiple losses acting in different regions.

The guiding is only needed for the style_loss since the content_loss by definition honors the position of the content during the optimization. Thus, the previously defined content_loss is combined with the new regional style_loss.

171 def get_region_op(region, region_weight):
172     return loss.MultiLayerEncodingLoss(
173         multi_layer_encoder, style_layers, get_style_op, score_weight=region_weight,
174     )
175
176
177 style_loss = loss.MultiRegionLoss(regions, get_region_op, score_weight=style_weight)
178
179 perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
180 print(perceptual_loss)

The content_loss is unguided and thus the content image can be set as we did before. For the style_loss we use the same style_image for all regions and only vary the guides.

188 perceptual_loss.set_content_image(content_image)
189
190 for region in regions:
191     perceptual_loss.set_style_image(
192         style_image, guide=style_guides[region], region=region
193     )
194     perceptual_loss.set_content_guide(content_guides[region], region=region)

We rerun the optimization with the new constrained optimization criterion and show the result.

201 starting_point = "content"
202 input_image = get_input_image(starting_point, content_image=content_image)
203
204 output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)
209 show_image(output_image)

With regional constraints we successfully removed the blueish cast from the building, which leads to an overall higher quality. Unfortunately, reusing the sky region for the water did not work out too well: due to the vibrant color, the water looks unnatural.

Fortunately, this has an easy solution. Since we are already using separate losses for each region we are not bound to use only a single style_image: if required, we can use a different style_image for each region.

Guided image optimization with multiple styles

We load a second style image that has water in it.

229 second_style_image = images["cliff"].read(size=size, device=device)
230 show_image(second_style_image, "Second style image")
234 second_style_guides = images["cliff"].guides.read(size=size, device=device)
235 show_image(guides_to_segmentation(second_style_guides), "Second style segmentation")

We can reuse the previously defined criterion and only change the style_image and style_guides in the "water" region.

242 region = "water"
243 perceptual_loss.set_style_image(
244     second_style_image, guide=second_style_guides[region], region=region
245 )

Finally, we rerun the optimization again with the new constraints.

251 starting_point = "content"
252 input_image = get_input_image(starting_point, content_image=content_image)
253
254 output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=500)
259 show_image(output_image)

Compared to the two previous results, we now achieved the highest quality. Nevertheless, this approach has its downsides: since we are working with multiple images in multiple distinct regions, the memory requirement is higher compared to the other approaches. Furthermore, compared to the unguided NST, the guides have to be provided together with the content and style images.


Image optimization with pyramid

This example showcases how an image pyramid is integrated in an NST with pystiche.

With an image pyramid the optimization is not performed at a single resolution, but rather at multiple increasing resolutions. This procedure is often dubbed coarse-to-fine, since at the lower resolutions coarse structures are synthesized, whereas at the higher levels the details are carved out.

This technique has the potential to reduce the convergence time as well as to enhance the overall result [LW16][GEB+17].

We start this example by importing everything we need and setting the device we will be working on.

23 import time
24
25 import pystiche
26 from pystiche import demo, enc, loss, optim, pyramid
27 from pystiche.image import show_image
28 from pystiche.misc import get_device, get_input_image
29
30 print(f"I'm working with pystiche=={pystiche.__version__}")
31
32 device = get_device()
33 print(f"I'm working with {device}")

At first we define a PerceptualLoss that is used as optimization criterion.

40 multi_layer_encoder = enc.vgg19_multi_layer_encoder()
41
42
43 content_layer = "relu4_2"
44 content_encoder = multi_layer_encoder.extract_encoder(content_layer)
45 content_weight = 1e0
46 content_loss = loss.FeatureReconstructionLoss(
47     content_encoder, score_weight=content_weight
48 )
49
50
51 style_layers = ("relu3_1", "relu4_1")
52 style_weight = 2e0
53
54
55 def get_style_op(encoder, layer_weight):
56     return loss.MRFLoss(encoder, patch_size=3, stride=2, score_weight=layer_weight)
57
58
59 style_loss = loss.MultiLayerEncodingLoss(
60     multi_layer_encoder, style_layers, get_style_op, score_weight=style_weight,
61 )
62
63 perceptual_loss = loss.PerceptualLoss(content_loss, style_loss).to(device)
64 print(perceptual_loss)

Next up, we load and show the images that will be used in the NST.

70 images = demo.images()
71 images.download()
72 size = 500
77 content_image = images["bird2"].read(size=size, device=device)
78 show_image(content_image, title="Content image")
79 perceptual_loss.set_content_image(content_image)
84 style_image = images["mosaic"].read(size=size, device=device)
85 show_image(style_image, title="Style image")
86 perceptual_loss.set_style_image(style_image)
Image optimization without pyramid

As a baseline we use a standard image optimization without pyramid.

95 starting_point = "content"
96 input_image = get_input_image(starting_point, content_image=content_image)
97 show_image(input_image, title="Input image")

We time the NST performed by image_optimization() and show the result.

104 start_without_pyramid = time.time()
105 output_image = optim.image_optimization(input_image, perceptual_loss, num_steps=400)
106 stop_without_pyramid = time.time()
107
108 show_image(output_image, title="Output image without pyramid")
112 elapsed_time_without_pyramid = stop_without_pyramid - start_without_pyramid
113 print(
114     f"Without pyramid the optimization took {elapsed_time_without_pyramid:.0f} seconds."
115 )

As you can see, the small blurry branches on the left side of the image were picked up by the style transfer. They distort the mosaic pattern, which diminishes the quality of the result. In the next section we tackle this by focusing on coarse elements first and adding the details afterwards.

Image optimization with pyramid

As opposed to the prior examples, we now want to perform an NST on multiple resolutions. In pystiche this is handled by an ImagePyramid. The resolutions are selected by specifying the edge_sizes of the images on each level. The optimization is performed for num_steps on the different levels.

The resizing of all images, i.e. the input_image and the target images (content_image and style_image), is handled by the pyramid. For that we need to register the perceptual loss (criterion) as one of the resize_targets.

Note

By default the edge_sizes correspond to the shorter edge of the images. To change that you can pass edge="long". For fine-grained control you can also pass a sequence comprising "short" and "long" to select the edge for each level separately. Its length has to match the length of edge_sizes.

Note

For a fine-grained control over the number of steps on each level you can pass a sequence to select the num_steps for each level separately. Its length has to match the length of edge_sizes.

150 edge_sizes = (250, 500)
151 num_steps = 200
152 image_pyramid = pyramid.ImagePyramid(
153     edge_sizes, num_steps, resize_targets=(perceptual_loss,)
154 )
155 print(image_pyramid)
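
Combining the two notes above, a per-level configuration could hypothetically look like this (a sketch only; the values are placeholders and it is not used in this example):

image_pyramid = pyramid.ImagePyramid(
    edge_sizes=(250, 500),
    num_steps=(300, 100),
    edge=("short", "long"),
    resize_targets=(perceptual_loss,),
)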

With a pyramid the NST is performed by pyramid_image_optimization(). We time the execution and show the result afterwards.

Note

We regenerate the input_image since it was changed inplace during the first optimization.

168 input_image = get_input_image(starting_point, content_image=content_image)
169
170 start_with_pyramid = time.time()
171 output_image = optim.pyramid_image_optimization(
172     input_image, perceptual_loss, image_pyramid
173 )
174 stop_with_pyramid = time.time()
175
176 show_image(output_image, title="Output image with pyramid")
182 elapsed_time_with_pyramid = stop_with_pyramid - start_with_pyramid
183 relative_decrease = 1.0 - elapsed_time_with_pyramid / elapsed_time_without_pyramid
184 print(
185     f"With pyramid the optimization took {elapsed_time_with_pyramid:.0f} seconds. "
186     f"This is a {relative_decrease:.0%} decrease."
187 )

With the coarse-to-fine architecture of the image pyramid, the stylization of the blurry background branches is reduced, leaving the mosaic pattern mostly intact. On top of this quality improvement, the execution time is significantly lower while performing the same number of steps.


Model optimization

This example showcases how an NST based on model optimization can be performed in pystiche. It closely follows the official PyTorch example which in turn is based on [JAL16].

We start this example by importing everything we need and setting the device we will be working on.

16 import contextlib
17 import os
18 import time
19 from collections import OrderedDict
20 from os import path
21
22 import torch
23 from torch import hub, nn
24 from torch.nn.functional import interpolate
25 from torch.utils.data import DataLoader
26 from torchvision import transforms
27
28 import pystiche
29 from pystiche import demo, enc, loss, optim
30 from pystiche.data import ImageFolderDataset
31 from pystiche.image import show_image
32 from pystiche.misc import get_device
33
34 print(f"I'm working with pystiche=={pystiche.__version__}")
35
36 device = get_device()
37 print(f"I'm working with {device}")
Transformer

In contrast to image optimization, for model optimization we need to define a transformer that, after it is trained, performs the stylization. In general different architectures are possible ([JAL16][ULVL16]). For this example we use an encoder-decoder architecture.

Before we define the transformer, we create some helper modules to reduce the clutter.

In the decoder we need to upsample the image. While it is possible to achieve this with a ConvTranspose2d, it was found that traditional upsampling followed by a standard convolution produces fewer artifacts. Thus, we create a module that wraps torch.nn.functional.interpolate().

60 class Interpolate(nn.Module):
61     def __init__(self, scale_factor=1.0, mode="nearest"):
62         super().__init__()
63         self.scale_factor = scale_factor
64         self.mode = mode
65
66     def forward(self, input):
67         return interpolate(input, scale_factor=self.scale_factor, mode=self.mode,)
68
69     def extra_repr(self):
70         extras = []
71         if self.scale_factor:
72             extras.append(f"scale_factor={self.scale_factor}")
73         if self.mode != "nearest":
74             extras.append(f"mode={self.mode}")
75         return ", ".join(extras)

For the transformer architecture we will be using, we need to define a convolution module with some additional capabilities. In particular, it needs to be able to

  • optionally upsample the input,

  • pad the input in order for the convolution to be size-preserving,

  • optionally normalize the output, and

  • optionally pass the output through an activation function.

Note

Instead of BatchNorm2d we use InstanceNorm2d to normalize the output since it gives better results for NST [UVL16].

 93 class Conv(nn.Module):
 94     def __init__(
 95         self,
 96         in_channels,
 97         out_channels,
 98         kernel_size,
 99         stride=1,
100         upsample=False,
101         norm=True,
102         activation=True,
103     ):
104         super().__init__()
105         self.upsample = Interpolate(scale_factor=stride) if upsample else None
106         self.pad = nn.ReflectionPad2d(kernel_size // 2)
107         self.conv = nn.Conv2d(
108             in_channels, out_channels, kernel_size, stride=1 if upsample else stride
109         )
110         self.norm = nn.InstanceNorm2d(out_channels, affine=True) if norm else None
111         self.activation = nn.ReLU() if activation else None
112
113     def forward(self, input):
114         if self.upsample:
115             input = self.upsample(input)
116
117         output = self.conv(self.pad(input))
118
119         if self.norm:
120             output = self.norm(output)
121         if self.activation:
122             output = self.activation(output)
123
124         return output

It is common practice to append a few residual blocks to the encoder after the initial convolutions to enable it to learn more descriptive features.

132 class Residual(nn.Module):
133     def __init__(self, channels):
134         super().__init__()
135         self.conv1 = Conv(channels, channels, kernel_size=3)
136         self.conv2 = Conv(channels, channels, kernel_size=3, activation=False)
137
138     def forward(self, input):
139         output = self.conv2(self.conv1(input))
140         return output + input

It can be useful for the training to transform the input into another value range, for example from \([0, 1]\) to \([0, 255]\).

148 class FloatToUint8Range(nn.Module):
149     def forward(self, input):
150         return input * 255.0
151
152
153 class Uint8ToFloatRange(nn.Module):
154     def forward(self, input):
155         return input / 255.0

Finally, we can put all pieces together.

Note

You can access this transformer through pystiche.demo.transformer().

166 class Transformer(nn.Module):
167     def __init__(self):
168         super().__init__()
169         self.encoder = nn.Sequential(
170             Conv(3, 32, kernel_size=9),
171             Conv(32, 64, kernel_size=3, stride=2),
172             Conv(64, 128, kernel_size=3, stride=2),
173             Residual(128),
174             Residual(128),
175             Residual(128),
176             Residual(128),
177             Residual(128),
178         )
179         self.decoder = nn.Sequential(
180             Conv(128, 64, kernel_size=3, stride=2, upsample=True),
181             Conv(64, 32, kernel_size=3, stride=2, upsample=True),
182             Conv(32, 3, kernel_size=9, norm=False, activation=False),
183         )
184
185         self.preprocessor = FloatToUint8Range()
186         self.postprocessor = Uint8ToFloatRange()
187
188     def forward(self, input):
189         input = self.preprocessor(input)
190         output = self.decoder(self.encoder(input))
191         return self.postprocessor(output)
192
193
194 transformer = Transformer().to(device)
195 print(transformer)
Perceptual loss

Although model optimization is a different paradigm, the perceptual_loss is the same as for image optimization.

Note

In some implementations, such as the PyTorch example and [JAL16], one can observe that the gram_matrix(), used as style representation, is not only normalized by the height and width of the feature map, but also by the number of channels. If used together with a mse_loss(), the normalization is performed twice. While this is unintended, it affects the training. In order to keep the other hyperparameters on par with the PyTorch example, we also adopt this change here.

215 multi_layer_encoder = enc.vgg16_multi_layer_encoder()
216
217 content_layer = "relu2_2"
218 content_encoder = multi_layer_encoder.extract_encoder(content_layer)
219 content_weight = 1e5
220 content_loss = loss.FeatureReconstructionLoss(
221     content_encoder, score_weight=content_weight
222 )
223
224
225 class GramOperator(loss.GramLoss):
226     def enc_to_repr(self, enc: torch.Tensor) -> torch.Tensor:
227         repr = super().enc_to_repr(enc)
228         num_channels = repr.size()[1]
229         return repr / num_channels
230
231
232 style_layers = ("relu1_2", "relu2_2", "relu3_3", "relu4_3")
233 style_weight = 1e10
234 style_loss = loss.MultiLayerEncodingLoss(
235     multi_layer_encoder,
236     style_layers,
237     lambda encoder, layer_weight: GramOperator(encoder, score_weight=layer_weight),
238     layer_weights="sum",
239     score_weight=style_weight,
240 )
241
242 perceptual_loss = loss.PerceptualLoss(content_loss, style_loss)
243 perceptual_loss = perceptual_loss.to(device)
244 print(perceptual_loss)
Training

In a first step we load the style image that will be used to train the transformer.

254 images = demo.images()
255 size = 500
256
257 style_image = images["paint"].read(size=size, device=device)
258 show_image(style_image)

The training of the transformer is performed similarly to other models in PyTorch. In every optimization step a batch of content images is drawn from a dataset, which serves as input for the transformer as well as content_image for the perceptual_loss. While the style_image only has to be set once, the content_image has to be reset in every iteration step.

While this can be done with a boilerplate optimization loop, pystiche provides multi_epoch_model_optimization() that handles the above for you.

Note

If the perceptual_loss is a PerceptualLoss, as is the case here, the update of the content_image is performed automatically. If that is not the case or you need more complex update behavior, you need to specify a criterion_update_fn.

Note

If you do not specify an optimizer, the default_model_optimizer(), i.e. Adam, is used.

285 def train(*, transformer, root, batch_size, epochs, image_size):
286     if root is None:
287         raise RuntimeError("You forgot to define a root image directory.")
288
289     transform = nn.Sequential(
290         transforms.Resize(image_size), transforms.CenterCrop(image_size),
291     )
292     dataset = ImageFolderDataset(root, transform=transform)
293     image_loader = DataLoader(dataset, batch_size=batch_size)
294
295     perceptual_loss.set_style_image(style_image)
296
297     return optim.multi_epoch_model_optimization(
298         image_loader, transformer.train(), perceptual_loss, epochs=epochs,
299     )

Depending on the dataset and your setup the training can take a couple of hours. To avoid this, we provide transformer weights that were trained with the scheme above.

307 def download():
308     # Unfortunately, torch.hub.load_state_dict_from_url has no option to disable
309     # printing the downloading process. Since this would clutter the output, we
310     # suppress it completely.
311     @contextlib.contextmanager
312     def suppress_output():
313         with open(os.devnull, "w") as devnull:
314             with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(
315                 devnull
316             ):
317                 yield
318
319     with suppress_output():
320         return hub.load_state_dict_from_url(
321             "https://download.pystiche.org/models/example_transformer.pth"
322         )

Note

The weights of the provided transformer were trained with the 2014 training images of the COCO dataset. The training was performed for epochs=2 and batch_size=4. Each image was center-cropped to 256 x 256 pixels.

Note

If you want to perform the training yourself, set root to a location of a folder of images.

339 root = None
340 checkpoint = "example_transformer.pth"
341
342 if root is None:
343     state_dict = torch.load(checkpoint) if path.exists(checkpoint) else download()
344     transformer.load_state_dict(state_dict)
345 else:
346     transformer = train(
347         transformer=transformer, root=root, batch_size=4, epochs=2, image_size=256,
348     )
349     state_dict = OrderedDict(
350         [
351             (name, parameter.detach().cpu())
352             for name, parameter in transformer.state_dict().items()
353         ]
354     )
355     torch.save(state_dict, checkpoint)
Neural Style Transfer

In order to perform the NST, we load an image we want to stylize.

364 input_image = images["bird1"].read(size=size, device=device)
365 show_image(input_image)

After the transformer is trained we can now perform an NST with a single forward pass. To do this, the transformer is simply called with the input_image.

372 transformer.eval()
373
374 start = time.time()
375
376 with torch.no_grad():
377     output_image = transformer(input_image)
378
379 stop = time.time()
380
381 show_image(output_image, title="Output image")

Compared to NST via image optimization, the stylization is performed multiple orders of magnitude faster. Given capable hardware, NST via model optimization enables real-time stylization, for example of a video feed.

390 print(f"The stylization took {(stop - start) * 1e3:.0f} milliseconds.")


Command-line interface

For simple tasks pystiche provides a command-line interface (CLI). For example, a simple NST with the builtin demo images can be performed with:

$ pystiche bird1 paint

In version

$ pystiche --version
1.1.0.dev50+g71217c2

the CLI looks like this:

$ pystiche --help
usage: pystiche [-h] [-V] [-v] [-o OUTPUT_IMAGE] [-n NUM_STEPS] [-d DEVICE]
                [-s STARTING_POINT]
                [--multi-layer-encoder MULTI_LAYER_ENCODER]
                [--content-loss CONTENT_LOSS]
                [--content-layers CONTENT_LAYERS]
                [--content-size CONTENT_SIZE]
                [--content-weight CONTENT_WEIGHT] [--style-loss STYLE_LOSS]
                [--style-layers STYLE_LAYERS] [--style-size STYLE_SIZE]
                [--style-weight STYLE_WEIGHT]
                content_image style_image

Performs simple Neural Style Transfers (NSTs) with image optimization. For
more complex tasks, have a look at the Python API at
https://docs.pystiche.org.

positional arguments:
  content_image         Image containing the content for the style transfer.
                        Can be a path to an image file or the name of a
                        pystiche.demo image.
  style_image           Image containing the style for the style transfer. Can
                        be a path to an image file or the name of a
                        pystiche.demo image.

optional arguments:
  -h, --help            Show this message and exit.
  -V, --version         Show pystiche's version and exit.
  -v, --verbose         Print additional information to STDOUT.
  -o OUTPUT_IMAGE, --output-image OUTPUT_IMAGE
                        Path, the output image will be saved to. If omitted,
                        the output image will be saved to
                        'pystiche_{timestamp}.jpg' in the current directory.
  -n NUM_STEPS, --num-steps NUM_STEPS
                        Number of optimization steps. Defaults to '500'.
  -d DEVICE, --device DEVICE
                        Device, the optimization is performed on. If
                        available, defaults to 'cuda' and falls back to 'cpu'
                        otherwise.
  -s STARTING_POINT, --starting-point STARTING_POINT
                        Starting point of the optimization. Can be 'content'
                        (default) to start from the content image, 'random' to
                        start from a white noise image, a path to an image
                        file, or a name of a pystiche.demo image.
  --multi-layer-encoder MULTI_LAYER_ENCODER, --mle MULTI_LAYER_ENCODER
                        Can be any pretrained multi-layer encoder from
                        pystiche.enc, e.g. 'vgg19' (default) or 'alexnet'.

Content options:
  --content-loss CONTENT_LOSS, --cl CONTENT_LOSS
                        Can be any comparison loss from pystiche.loss, e.g.
                        'FeatureReconstruction' (default).
  --content-layers CONTENT_LAYERS, --cla CONTENT_LAYERS
                        Layers of the MULTI_LAYER_ENCODER used to encode the
                        content representation, e.g. 'relu4_2' (default).
                        Multiple layers can be given as a comma separated
                        list.
  --content-size CONTENT_SIZE, --cs CONTENT_SIZE
                        Size in pixels the CONTENT_IMAGE will be resized to
                        before the optimization, e.g. '500' (default).
  --content-weight CONTENT_WEIGHT, --cw CONTENT_WEIGHT
                        Optimization weight for the CONTENT_LOSS, e.g. '1e0'
                        (default). Higher values lead to more focus on the
                        content.

Style options:
  --style-loss STYLE_LOSS, --sl STYLE_LOSS
                        Can be any comparison loss from pystiche.loss, e.g.
                        'Gram' (default) or 'MRF'.
  --style-layers STYLE_LAYERS, --sla STYLE_LAYERS
                        Layers of the MULTI_LAYER_ENCODER used to encode the
                        style representation. Multiple layers can be given
                        as a comma separated list, e.g.
                        'relu1_1,relu2_1,relu3_1,relu4_1,relu5_1' (default).
  --style-size STYLE_SIZE, --ss STYLE_SIZE
                        Size in pixels the STYLE_IMAGE will be resized to
                        before the optimization, e.g. '500' (default).
  --style-weight STYLE_WEIGHT, --sw STYLE_WEIGHT
                        Optimization weight for the STYLE_LOSS, e.g. '1e3'
                        (default). Higher values lead to more focus on the
                        style.

Package reference

pystiche

pystiche.home()

Local directory to save downloaded images and guides. Defaults to ~/.cache/pystiche but can be overridden with the PYSTICHE_HOME environment variable.

Return type

str
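For illustration, a hypothetical session; the concrete path depends on your system and the PYSTICHE_HOME environment variable:

>>> import pystiche
>>> pystiche.home()
'/home/user/.cache/pystiche'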

Objects

class pystiche.ComplexObject

Object with a complex representation. See pystiche.misc.build_complex_obj_repr() for details.

_named_children()
Yields

Internal named children.

Note

If subclassed, this method should yield the named children of the superclass alongside yielding the new named children.

Return type

Iterator[Tuple[str, Any]]

_properties()
Return type

Dict[str, Any]

Returns

Internal properties.

Note

If subclassed, this method should integrate the new properties in the properties of the superclass.

extra_named_children()
Yields

Extra named children.

Return type

Iterator[Tuple[str, Any]]

extra_properties()
Return type

Dict[str, Any]

Returns

Extra properties.

named_children()
Yields

Internal and extra named children.

Return type

Iterator[Tuple[str, Any]]

properties()
Return type

Dict[str, Any]

Returns

Internal and extra properties.
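A minimal subclassing sketch following the notes above; the attribute name alpha is purely illustrative:

>>> class MyObject(pystiche.ComplexObject):
...     def __init__(self, alpha):
...         super().__init__()
...         self.alpha = alpha
...     def _properties(self):
...         dct = super()._properties()  # start from the properties of the superclass
...         dct["alpha"] = self.alpha  # integrate the new property
...         return dct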

class pystiche.LossDict(losses=())

Hierarchical dictionary of scalar torch.Tensor losses. Levels are separated by "." in the names.

Parameters

losses (Sequence[Tuple[str, Union[Tensor, LossDict]]]) – Optional named losses.

__mul__(other)

Multiplies all entries with a scalar.

Parameters

other (SupportsFloat) – Scalar multiplier.

Return type

LossDict

__setitem__(name, loss)

Add a named loss to the entries.

Parameters
Raises

TypeError – If loss is torch.Tensor but isn’t scalar.

Return type

None

aggregate(max_depth)

Aggregate all entries up to a given maximum depth.

Parameters

max_depth (int) – Maximum depth up to which the entries are aggregated. If 0, the sum of all entries is returned as a scalar torch.Tensor.

Return type

Union[Tensor, LossDict]

backward(*args, **kwargs)

Computes the gradient of all entries with respect to the graph leaves. See torch.Tensor.backward() for details.

Return type

None

item()
Return type

float

Returns

The sum of all entries as standard Python number.

total()
Return type

Tensor

Returns

Sum of all entries as scalar tensor.
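A minimal doctest-style sketch of how these methods interact, assuming torch and pystiche are imported:

>>> loss_dict = pystiche.LossDict(
...     [("content_loss", torch.tensor(1.0)), ("style_loss", torch.tensor(2.0))]
... )
>>> loss_dict.total()
tensor(3.)
>>> loss_dict.item()
3.0
>>> (loss_dict * 2.0).item()
6.0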

class pystiche.Module(named_children=None, indexed_children=None)

torch.nn.Module with the enhanced representation options of pystiche.ComplexObject.

Parameters

Note

named_children and indexed_children are mutually exclusive parameters.

torch_repr()
Return type

str

Returns

Native torch representation.

Math

pystiche.nonnegsqrt(x)

Safely calculates the square-root of a non-negative input

\[\operatorname{nonnegsqrt}(x) = \begin{cases} \sqrt{x} & \quad\text{if } x \ge 0 \\ 0 & \quad\text{otherwise} \end{cases}\]

Note

This operation is useful in situations where the input tensor is strictly non-negative from a theoretical standpoint, but might be negative due to numerical instabilities.

Parameters

x (Tensor) – Input tensor.

Return type

Tensor
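For illustration, a small doctest-style sketch of the behavior defined above:

>>> pystiche.nonnegsqrt(torch.tensor(4.0))
tensor(2.)
>>> pystiche.nonnegsqrt(torch.tensor(-1.0))
tensor(0.)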

pystiche.gram_matrix(x, normalize=False)

Calculates the channel-wise Gram matrix of a batched input tensor.

Given a tensor \(x\) of shape \(B \times C \times N_1 \times \dots \times N_D\), each element of the single-sample Gram matrix \(G_{b, c_1 c_2}\) with \(b \in 1, \dots, B\) and \(c_1, c_2 \in 1, \dots, C\) is calculated by

\[G_{b, c_1 c_2} = \left\langle \operatorname{vec}\left(x_{b, c_1}\right), \operatorname{vec}\left(x_{b, c_2}\right) \right\rangle\]

where \(\left\langle \cdot, \cdot \right\rangle\) denotes the dot product and \(\operatorname{vec}\left(\cdot\right)\) denotes the vectorization function.

Parameters
  • x (Tensor) – Input tensor of shape \(B \times C \times N_1 \times \dots \times N_D\).

  • normalize (bool) – If True, normalizes the Gram matrix \(G\) by \(\prod_{d=1}^{D} N_d\) to keep the value range similar for different sized inputs. Defaults to False.

Return type

Tensor

Returns

Channel-wise Gram matrix \(G\) of shape \(B \times C \times C\).
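For illustration, a small doctest-style sketch of the documented shapes:

>>> x = torch.rand(2, 3, 64, 64)
>>> pystiche.gram_matrix(x).size()
torch.Size([2, 3, 3])
>>> pystiche.gram_matrix(x, normalize=True).size()
torch.Size([2, 3, 3])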

pystiche.cosine_similarity(x1, x2, eps=1e-08, batched_input=None)

Calculates the cosine similarity between the samples of x1 and x2.

Parameters
  • x1 (Tensor) – First input of shape \(B \times S_1 \times N_1 \times \dots \times N_D\).

  • x2 (Tensor) – Second input of shape \(B \times S_2 \times N_1 \times \dots \times N_D\).

  • eps (float) – Small value to avoid zero division. Defaults to 1e-8.

  • batched_input (Optional[bool]) – If False, treat the first dimension of the inputs as sample dimension, i.e. \(S \times N_1 \times \dots \times N_D\). Defaults to True.

Return type

Tensor

Returns

Similarity matrix of shape \(B \times S_1 \times S_2\) in which every element represents the cosine similarity between the corresponding samples \(S\) of x1 and x2. If batched_input is False, the output shape is \(S_1 \times S_2\).
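For illustration, a small doctest-style sketch of the documented shapes:

>>> x1 = torch.rand(2, 4, 8, 8)
>>> x2 = torch.rand(2, 5, 8, 8)
>>> pystiche.cosine_similarity(x1, x2).size()
torch.Size([2, 4, 5])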

pystiche.data

class pystiche.data.LocalImage(file, collect_local_guides=True, guides=None, transform=None, note=None)
read(root=None, **read_image_kwargs)

Read the image from file with pystiche.image.read_image() and optionally apply transform.

Parameters
Return type

Tensor

class pystiche.data.LocalImageCollection(images)
read(root=None, **read_image_kwargs)

Read the images from file.

Parameters
Return type

Dict[str, Tensor]

Returns

Dictionary with the name image pairs.

class pystiche.data.DownloadableImage(url, title=None, author=None, date=None, license=None, md5=None, file=None, guides=None, prefix_guide_files=True, transform=None, note=None)
download(root=None, overwrite=False)

Download the image and, if applicable, the guides from their URL. If the correct MD5 checksum is known, it is verified first. If it checks out, the file is not re-downloaded.

Parameters
  • root (Optional[str]) – Optional root directory for the download if the file is a relative path. Defaults to pystiche.home().

  • overwrite (bool) – Overwrites files if they already exist or the MD5 checksum does not match. Defaults to False.

Return type

None

static generate_file(url, title, author)

Generate a filename from the supplied information based on the following scheme:

  • If title and author are None, the ending of url is used.

  • If one of title or author is not None, it is used as filename where spaces are replaced by underscores.

  • If title and author are not None, the filename is generated as above, separating both parts with double underscores.

Parameters
  • url (str) – URL to the image.

  • title (Optional[str]) – Optional title of the image.

  • author (Optional[str]) – Optional author of the image

Return type

str

read(root=None, download=None, overwrite=False, **read_image_kwargs)

Read the image from file with pystiche.image.read_image(). If available the transform is applied afterwards.

Parameters
  • root (Optional[str]) – Optional root directory if the file is a relative path. Defaults to pystiche.home().

  • download (Optional[bool]) – If True, downloads the image first. Defaults to False if the file already exists and the MD5 checksum is not known. Otherwise defaults to True.

  • overwrite (bool) – If downloaded, overwrites files if they already exist or the MD5 checksum does not match. Defaults to False.

  • **read_image_kwargs – Optional parameters passed to pystiche.image.read_image().

Return type

Tensor

class pystiche.data.DownloadableImageCollection(images)
download(root=None, overwrite=False)

Download all images and if applicable their guides from their URLs. See pystiche.data.DownloadableImage.download() for details.

Parameters
  • root (Optional[str]) – Optional root directory for the download if the file is a relative path. Defaults to pystiche.home().

  • overwrite (bool) – Overwrites files if they already exist or the MD5 checksum does not match. Defaults to False.

Return type

None

read(root=None, download=None, overwrite=False, **read_image_kwargs)

Read the images from file. See pystiche.data.DownloadableImage.read() for details.

Parameters
  • root (Optional[str]) – Optional root directory if the file is a relative path. Defaults to pystiche.home().

  • download (Optional[bool]) – If True, downloads the image first. Defaults to False if the file already exists and the MD5 checksum is not known. Otherwise defaults to True.

  • overwrite (bool) – If downloaded, overwrites files if they already exist or the MD5 checksum does not match. Defaults to False.

  • **read_image_kwargs – Optional parameters passed to pystiche.image.read_image().

Return type

Dict[str, Tensor]

Returns

Dictionary with the name image pairs.

pystiche.demo

pystiche.demo.images()

Collection of images used in the usage examples.

Return type

DownloadableImageCollection

The collection comprises the following images: bird1, bird2, castle, church, cliff, mosaic, paint.
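For illustration, reading one of the demo images could look like this; the image is downloaded on demand:

>>> images = pystiche.demo.images()
>>> bird = images["bird1"].read(size=500)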

pystiche.demo.transformer()

Basic transformer for model-based optimization.

The transformer is compatible with the official PyTorch example which in turn is based on [JAL16]

Return type

Module

pystiche.enc

class pystiche.enc.Encoder(named_children=None, indexed_children=None)

ABC for all encoders. Invokes Encoder.forward() if called.

abstract forward(input)

Encodes the given input.

Note

This method has to be overwritten in every subclass.

Return type

Tensor

abstract propagate_guide(guide)

Encodes the given guide.

Note

This method has to be overwritten in every subclass.

Return type

Tensor

class pystiche.enc.SequentialEncoder(modules)

Encoder that operates in a sequential manner. Invokes Encoder.forward() if called.

Parameters

modules (Sequence[Module]) – Sequential modules.
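For illustration, a minimal sketch wrapping two modules, assuming torch, nn, and pystiche are imported:

>>> modules = (nn.Conv2d(3, 3, 3), nn.ReLU())
>>> encoder = pystiche.enc.SequentialEncoder(modules)
>>> input = torch.rand(1, 3, 128, 128)
>>> output = encoder(input)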

class pystiche.enc.MultiLayerEncoder(modules)

Sequential encoder with convenient access to intermediate layers.

Parameters

modules (Sequence[Tuple[str, Module]]) – Named modules that serve as basis for the encoding.

registered_layers

Layers, on which the encodings will be cached during the forward() pass.

__call__(*args, **kwargs)

Invokes forward().

Return type

Any

__contains__(layer)

Is the layer part of the multi-layer encoder?

Parameters

layer (str) – Layer to be checked.

Return type

bool

clear_cache()

Clear the internal cache.

Return type

None

encode(input, layers)

Encode the input on layers.

Parameters
  • input (Tensor) – Input to be encoded.

  • layers (Sequence[str]) – Layers on which the input should be encoded.

Return type

Tuple[Tensor, …]
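A small sketch mirroring the example for forward() below:

>>> modules = [("conv", nn.Conv2d(3, 3, 3)), ("pool", nn.MaxPool2d(2))]
>>> mle = pystiche.enc.MultiLayerEncoder(modules)
>>> input = torch.rand(1, 3, 128, 128)
>>> conv_enc, pool_enc = mle.encode(input, ("conv", "pool"))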

extract_encoder(layer)

Extract a SingleLayerEncoder for the layer and register it.

Parameters

layer (str) – Layer.

Return type

SingleLayerEncoder
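For illustration, extracting a single-layer encoder from a pretrained multi-layer encoder; the layer name is one of the VGG layers used throughout the examples:

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> encoder = mle.extract_encoder("relu4_2")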

forward(input, layer=None, cache=None, to_cache=None)

Encode the input.

Parameters
  • input (Tensor) – Input to be encoded.

  • layer (Optional[str]) – Layer on which the input should be encoded. If omitted, defaults to the last layer in the multi-layer encoder.

  • cache (Optional[Dict[str, Tensor]]) – Encoding cache. If omitted, defaults to the internal cache.

  • to_cache (Optional[Collection[str]]) – Layers, of which the encodings should be cached. If omitted, defaults to registered_layers.

Examples

>>> modules = [("conv", nn.Conv2d(3, 3, 3)), ("pool", nn.MaxPool2d(2))]
>>> mle = pystiche.enc.MultiLayerEncoder(modules)
>>> input = torch.rand(1, 3, 128, 128)
>>> output = mle(input, "conv")
Return type

Tensor

propagate_guide(guide, layers, method='simple', allow_empty=False)

Propagate the guide on the given layers.

Parameters
  • guide (Tensor) – Guide.

  • layers (Sequence[str]) – Layers.

  • allow_empty (bool) – If True, allow the propagated guides to become empty. Defaults to False.

Return type

Tuple[Tensor, …]

Returns

Tuple of guides whose order corresponds to layers.

register_layer(layer)

Register a layer for caching the encodings in the forward() pass.

Parameters

layer (str) – Layer to be registered.

Return type

None

verify(layer)

Verifies that a layer is part of the multi-layer encoder.

Parameters

layer (str) – Layer to be checked.

Raises

ValueError – If layer is not part of the multi-layer encoder.

Return type

None

class pystiche.enc.SingleLayerEncoder(multi_layer_encoder, layer)

Encoder extracted from a MultiLayerEncoder that operates on a single layer. Invokes SingleLayerEncoder.forward() if called.

multi_layer_encoder

Corresponding multi-layer encoder.

layer

Encoding layer.

forward(input)

Encode the given input on SingleLayerEncoder.layer of SingleLayerEncoder.multi_layer_encoder.

Parameters

input_image – Input image.

Return type

Tensor

propagate_guide(guide)

Propagate the given guide on SingleLayerEncoder.layer of SingleLayerEncoder.multi_layer_encoder.

Parameters

guide (Tensor) – Guide.

Return type

Tensor

Models

class pystiche.enc.ModelMultiLayerEncoder(pretrained=True, framework='torch', internal_preprocessing=True, allow_inplace=False)

Multi-layer encoder based on a pre-defined model.

Parameters
  • pretrained (bool) – If True, loads builtin weights. Defaults to True.

  • framework (str) – Name of the framework that was used to train the builtin weights. Defaults to "torch".

  • internal_preprocessing (bool) – If True, adds a preprocessing layer for the selected framework as first layer. Defaults to True.

  • allow_inplace (bool) –

    If True, allows inplace operations to reduce the memory requirement during the forward pass. Defaults to False.

    Warning

After performing an inplace operation the encodings of the previous layer are no longer accessible. Only use this if you are sure that you do not need these encodings.

abstract collect_modules(inplace)

Collect modules of a base model with more descriptive names.

Parameters

inplace (bool) – If True, when possible, modules should use inplace operations.

Return type

Tuple[List[Tuple[str, Module]], Dict[str, str]]

Returns

List of name-module-pairs as well as a dictionary mapping the new, more descriptive names to the original ones.

load_state_dict(state_dict, strict=True, map_names=True, framework='unknown')

Loads parameters and buffers from the state_dict.

Parameters
  • state_dict (Dict[str, Tensor]) – State dictionary.

  • strict (bool) – Enforce matching keys in state_dict and the internal states.

  • map_names (bool) – If True, maps the names in state_dict of the underlying model to the more descriptive names generated by collect_modules(). Defaults to True.

  • framework (str) –

    Name of the framework that was used to train the weights in state_dict. Defaults to "unknown".

    Note

    This has no effect on the behavior, but makes the representation of the ModelMultiLayerEncoder more descriptive.

Return type

_IncompatibleKeys

Returns

Named tuple with missing_keys and unexpected_keys fields.

load_state_dict_from_url(framework, strict=True, map_names=True, check_hash=True, **kwargs)

Downloads and loads parameters and buffers trained with framework.

Parameters
  • framework (str) – Name of the framework that was used to train the weights of the state_dict.

  • strict (bool) – Enforce matching keys in state_dict and the internal states.

  • map_names (bool) – If True, maps the names in state_dict of the underlying model to the more descriptive names generated by collect_modules(). Defaults to True.

  • check_hash (bool) – If True, checks if the hash postfix of the URL matches the SHA256 hash of the downloaded state_dict. Defaults to True.

  • kwargs (Any) – Optional arguments for torch.hub.load_state_dict_from_url() .

Return type

None

abstract state_dict_url(framework)

Select URL of a downloadable state_dict.

Parameters

framework (str) – Name of the framework that was used to train the weights.

Raises

RuntimeError – If no state_dict is available.

Return type

str

VGG
class pystiche.enc.VGGMultiLayerEncoder(arch, **kwargs)

Multi-layer encoder based on VGG.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]

Parameters
  • arch (str) – VGG architecture. Has to match "vgg(11|13|16|19)(_bn)?".

  • pretrained – If True, loads builtin weights. Defaults to True.

  • framework – Name of the framework that was used to train the builtin weights. Defaults to "torch".

  • kwargs (Any) – Optional arguments of ModelMultiLayerEncoder .

Raises

RuntimeError – If pretrained is True and no weights are available for the combination of arch and framework.

pystiche.enc.vgg11_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 11.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG11 corresponds to configuration A in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg11_bn_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 11 with batch normalization.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG11 corresponds to configuration A in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg13_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 13.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG13 corresponds to configuration B in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg13_bn_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 13 with batch normalization.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG13 corresponds to configuration B in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg16_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 16.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG16 corresponds to configuration D in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg16_bn_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 16 with batch normalization.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG16 corresponds to configuration D in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg19_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 19.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG19 corresponds to configuration E in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

pystiche.enc.vgg19_bn_multi_layer_encoder(**kwargs)

Multi-layer encoder based on VGG 19 with batch normalization.

The VGG architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12]. VGG19 corresponds to configuration E in the paper.

Parameters

kwargs (Any) – Optional arguments of VGGMultiLayerEncoder .

Return type

VGGMultiLayerEncoder

AlexNet
class pystiche.enc.AlexNetMultiLayerEncoder(pretrained=True, framework='torch', internal_preprocessing=True, allow_inplace=False)

Multi-layer encoder based on AlexNet.

The AlexNet architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12].

Parameters
  • pretrained (bool) – If True, loads builtin weights. Defaults to True.

  • framework (str) – Name of the framework that was used to train the builtin weights. Defaults to "torch".

  • kwargs – Optional arguments of ModelMultiLayerEncoder .

Raises
RuntimeError – If pretrained is True and no weights are available for the framework.

pystiche.enc.alexnet_multi_layer_encoder(**kwargs)

Multi-layer encoder based on AlexNet.

The AlexNet architecture was introduced by Krizhevsky, Sutskever, and Hinton in [KSH12].

Parameters

kwargs (Any) – Optional arguments of AlexNetMultiLayerEncoder .

Return type

AlexNetMultiLayerEncoder

pystiche.image

Utilities

pystiche.image.calculate_aspect_ratio(image_size)
Return type

float

pystiche.image.image_to_edge_size(image_size, edge='short')
Return type

int

pystiche.image.edge_to_image_size(edge_size, aspect_ratio, edge='short')
Return type

Tuple[int, int]

pystiche.image.extract_batch_size(x)
Return type

int

pystiche.image.extract_num_channels(x)
Return type

int

pystiche.image.extract_image_size(x)
Return type

Tuple[int, int]

pystiche.image.extract_edge_size(x, edge='short')
Return type

int

pystiche.image.extract_aspect_ratio(x)
Return type

float

I/O

pystiche.image.read_image(file, device='cpu', make_batched=True, size=None, interpolation_mode='bilinear')

Read an image from file with PIL.Image and return it as Tensor .

Parameters
  • file (str) – Path to image file to be read.

  • device (Union[device, str]) – Device that the image is transferred to. Defaults to CPU.

  • make_batched (bool) – If True, a fake batch dimension is added to the image.

  • size (Optional[Union[int]]) – Optional size the image is resized to.

  • interpolation_mode (str) – Interpolation mode that is used to perform the optional resizing. Valid modes are "nearest", "bilinear", and "bicubic". Defaults to "bilinear".

Return type

Tensor

pystiche.image.write_image(image, file, mode=None, **save_kwargs)

Write a Tensor image to a file with PIL.Image .

Parameters
Return type

None
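A minimal I/O sketch; the file paths are purely illustrative:

>>> from pystiche.image import read_image, write_image
>>> image = read_image("input.jpg", size=500)
>>> write_image(image, "output.jpg")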

pystiche.image.show_image(image, title=None, mode=None, size=None, interpolation_mode='bilinear')

Show an image and optionally read it from file first.

Note

show_image uses matplotlib.pyplot.imshow() as primary means to show images. If that is not available the native PIL.Image.Image.show() is used as a fallback.

Parameters
  • image (Union[Tensor, str]) – Image to be shown. If str this is treated as a path to an image and is read by read_image() .

  • title (Optional[str]) – Optional title of the image.

  • mode (Optional[str]) –

    Optional image mode. See the Pillow documentation for details.

  • size (Optional[Union[int]]) – Optional size the image is resized to.

  • interpolation_mode (str) – Interpolation mode that is used to perform the optional resizing. Valid modes are "nearest", "bilinear", and "bicubic". Defaults to "bilinear".

Return type

None

Guides

pystiche.image.verify_guides(guides, verify_coverage=True, verify_overlap=True)

Verify if guides cover the whole canvas and if they do not overlap with each other.

Parameters
  • guides (Dict[str, Tensor]) – Guides to be verified.

  • verify_coverage (bool) – If True verifies that the guides cover the whole canvas. Defaults to True.

  • verify_overlap (bool) – If True verifies that the guides do not overlap with each other. Defaults to True.

Raises

RuntimeError – If at least one check is not successful.

Return type

None

pystiche.image.read_guides(dir, device='cpu', make_batched=True, size=None, interpolation_mode='nearest')

Read all guides from a directory using read_image() and return them as a dictionary. The filename without extension is used as the region key.

Parameters
  • dir (str) – Path to root directory of the guide files.

  • device (Union[device, str]) – Device that the guides are transferred to. Defaults to CPU.

  • make_batched (bool) – If True, a fake batch dimension is added to every guide.

  • size (Optional[Union[int]]) – Optional size the guides are resized to.

  • interpolation_mode (str) – Interpolation mode that is used to perform the optional resizing. Valid modes are "nearest", "bilinear", and "bicubic". Defaults to "nearest".

Return type

Dict[str, Tensor]
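A minimal sketch reading guides from a directory and verifying them; the directory is purely illustrative:

>>> from pystiche.image import read_guides, verify_guides
>>> guides = read_guides("guides/")
>>> verify_guides(guides)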

pystiche.image.write_guides(guides, dir, ext='.png', mode='L', **save_kwargs)

Write guides to directory using write_image(). The region key is used as filename.

Parameters
  • guides (Dict[str, Tensor]) – Guides to be written.

  • dir (str) – Path to root directory of the guide files.

  • ext (str) – Extension that is appended to the filename. Defaults to ".png".

  • mode (str) –

    Optional image mode. See the Pillow documentation for details. Defaults to "L".

  • **save_kwargs – Other parameters that are passed to PIL.Image.Image.save() .

Return type

None

pystiche.image.guides_to_segmentation(guides, color_map=None)

Combines multiple guides into one segmentation image.

Parameters
Return type

Tensor

pystiche.image.segmentation_to_guides(seg, region_map=None)

Splits a segmentation image into multiple guides.

Parameters
  • seg (Tensor) – Segmentation image to be split.

  • region_map (Optional[Dict[Tuple[int, int, int], str]]) – Optional mapping from RGB triplets to regions. If omitted, the RGB triplets are used as key in the output.

Return type

Dict[Union[Tuple[int, int, int], str], Tensor]

pystiche.loss

class pystiche.loss.Loss(*, encoder=None, input_guide=None, score_weight=1.0)

Bases: pystiche.core._modules.Module, abc.ABC

Abstract base class for all losses.

abstract forward(input_image)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type

Union[Tensor, LossDict]

set_input_guide(guide)
Return type

None

class pystiche.loss.RegularizationLoss(*, encoder=None, input_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.Loss

Abstract base class for all regularization losses.

abstract calculate_score(input_repr)
Return type

Tensor

abstract input_enc_to_repr(enc)
Return type

Tensor

class pystiche.loss.ComparisonLoss(*, encoder=None, input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.Loss

Abstract base class for all comparison losses.

abstract calculate_score(input_repr, target_repr, *, ctx)
Return type

Tensor

abstract input_enc_to_repr(enc, ctx)
Return type

Tensor

set_target_image(image, *, guide=None)
Return type

None

abstract target_enc_to_repr(enc)
Return type

Tuple[Tensor, Optional[Tensor]]

Container

class pystiche.loss.LossContainer(named_losses, *, input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Generic container for Loss’es.

If called with an image, it passes it to all immediate losses and returns a pystiche.LossDict scaled with score_weight.

Parameters
  • named_losses (Sequence[Tuple[str, Loss]]) – Named immediate losses that will be called if the LossContainer is called.

  • score_weight (float) – Score weight of the loss. Defaults to 1.0.

class pystiche.loss.MultiLayerEncodingLoss(mle, layers, encoding_loss_fn, *, layer_weights='mean', input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Convenience container for multiple Loss’es operating on different layers of the same pystiche.enc.MultiLayerEncoder.

Parameters
  • mle (MultiLayerEncoder) – Multi-layer encoder.

  • layers (Sequence[str]) – Layers of the mle that the children losses operate on.

  • encoding_loss_fn (Callable[[Encoder, float], Loss]) – Callable that returns a loss given a pystiche.enc.SingleLayerEncoder extracted from the mle and its corresponding layer weight.

  • layer_weights (Union[str, Sequence[float]]) – Weights passed to encoding_loss_fn. If "sum", each layer weight is set to 1.0. If "mean", each layer weight is set to 1.0 / len(layers). If a sequence of floats, its length has to match the length of layers. Defaults to "mean".

  • score_weight (float) – Score weight of the loss. Defaults to 1.0.

Examples

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
>>> loss = pystiche.loss.MultiLayerEncodingLoss(
...     mle,
...     layers,
...     lambda encoder, layer_weight: pystiche.loss.GramLoss(
...         encoder, score_weight=layer_weight
...     ),
... )
>>> input = torch.rand(2, 3, 256, 256)
>>> target = torch.rand(2, 3, 256, 256)
>>> loss.set_target_image(target)
>>> score = loss(input)
class pystiche.loss.MultiRegionLoss(regions, region_loss_fn, *, region_weights='sum', input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Convenience container for multiple Loss’es operating in different regions.

Parameters
  • regions (Sequence[str]) – Regions.

  • region_loss_fn (Callable[[str, float], Loss]) – Callable that returns a children loss given a region and its corresponding weight.

  • region_weights (Union[str, Sequence[float]]) – Weights passed to region_loss_fn. If "sum", each region weight is set to 1.0. If "mean", each region weight is set to 1.0 / len(regions). If a sequence of floats, its length has to match the length of regions. Defaults to "sum".

  • score_weight (float) – Score weight of the loss. Defaults to 1.0.

Examples

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> layers = ("relu1_1", "relu2_1", "relu3_1", "relu4_1", "relu5_1")
>>> def encoding_loss_fn(encoder, layer_weight):
...     return pystiche.loss.GramLoss(encoder, score_weight=layer_weight)
>>> regions = ("sky", "landscape")
>>> def region_loss_fn(region, region_weight):
...     return pystiche.loss.MultiLayerEncodingLoss(
...         mle,
...         layers,
...         encoding_loss_fn,
...         score_weight=region_weight,
...     )
>>> loss = pystiche.loss.MultiRegionLoss(regions, region_loss_fn)
>>> loss.set_regional_target_image("sky", torch.rand(2, 3, 256, 256))
>>> loss.set_regional_target_image("landscape", torch.rand(2, 3, 256, 256))
>>> input = torch.rand(2, 3, 256, 256)
>>> score = loss(input)
set_regional_input_guide(region, guide)

Invokes set_input_guide() on the operator of the given region.

Parameters
  • region (str) – Region.

  • guide (Tensor) – Input guide of shape \(1 \times 1 \times H \times W\).

Return type

None

set_regional_target_image(region, image, guide=None)

Invokes set_target_image() on the operator of the given region.

Parameters
  • region (str) – Region.

  • image (Tensor) – Target image of shape \(B \times C \times H \times W\).

  • guide (Optional[Tensor]) –

Return type

None

class pystiche.loss.PerceptualLoss(content_loss, style_loss, regularization=None, *, content_image=None, content_guide=None, style_image=None, style_guide=None)

Perceptual loss comprising content and style loss as well as optionally a regularization.

Parameters
regional_content_guide(region=None)

Regional content guide.

Parameters

region (Optional[str]) – Region to get the content guide from.

Return type

Optional[Tensor]

regional_style_image(region=None)

Regional style image.

Parameters

region (Optional[str]) – Region to get the style image from.

Return type

Optional[Tensor]

set_content_guide(guide, *, region=None)

Sets the content guide.

Parameters
  • guide (Tensor) – Content guide.

  • region (Optional[str]) – Optional region to set the guide for. If omitted, the guide will be applied to all regions.

Return type

None

set_style_image(image, *, guide=None, region=None)

Sets the style image and guide.

Parameters
  • image (Tensor) – Style image.

  • guide (Optional[Tensor]) – Style guide.

  • region (Optional[str]) – Optional region to set the image and guide for. If omitted, the image and guide will be applied to all regions.

Return type

None

Regularization

class pystiche.loss.TotalVariationLoss(*, exponent=2.0, input_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.RegularizationLoss

The total variation loss is a regularizer used to suppress checkerboard artifacts by penalizing the gradient of the image. It is calculated by

\[\overline{\sum\limits_{i,j}} \left(\left(x_{i,j+1} - x_{i,j}\right)^2 + \left(x_{i+1,j} - x_{i,j}\right)^2\right)^{\frac{\beta}{2}}\]

where \(x\) denotes the image and \(i,j\) index a specific pixel.

Note

In contrast to the paper, the implementation calculates the grand average \(\overline{\sum}\) rather than the grand sum \(\sum\) to account for differently sized images.

Parameters
  • exponent (float) – Parameter \(\beta\). A higher value leads to more smoothed results. Defaults to 2.0.

  • score_weight (float) – Score weight of the operator. Defaults to 1.0.

Examples

>>> loss = pystiche.loss.TotalVariationLoss()
>>> input = torch.rand(2, 3, 256, 256)
>>> score = loss(input)

See also

The total variation loss was introduced by Mahendran and Vedaldi in [MV15].

Comparison

class pystiche.loss.FeatureReconstructionLoss(encoder, *, input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.ComparisonLoss

The feature reconstruction loss is the de facto standard content loss. It measures the mean squared error (MSE) between the encodings of an input_image \(\hat{I}\) and a target_image \(I\):

\[\overline{\sum} \left(\Phi\left(\hat{I}\right) - \Phi\left(I\right)\right)^2\]

Here \(\Phi\left(\cdot\right)\) denotes the encoder.

Note

In contrast to the paper, the implementation calculates the grand average \(\overline{\sum}\) rather than the grand sum \(\sum\) to account for differently sized images.

Parameters
  • encoder (Encoder) – Encoder \(\Phi\).

  • score_weight (float) – Score weight of the operator. Defaults to 1.0.

Examples

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> encoder = mle.extract_encoder("relu4_2")
>>> loss = pystiche.loss.FeatureReconstructionLoss(encoder)
>>> input = torch.rand(2, 3, 256, 256)
>>> target = torch.rand(2, 3, 256, 256)
>>> loss.set_target_image(target)
>>> score = loss(input)

See also

The feature reconstruction loss was introduced by Mahendran and Vedaldi in [MV15], but its name was coined by Johnson, Alahi, and Fei-Fei in [JAL16].

class pystiche.loss.GramLoss(encoder, *, normalize=True, input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.ComparisonLoss

The gram loss is a style loss based on the correlation of feature map channels. It measures the mean squared error (MSE) between the channel-wise Gram matrices of the encodings of an input_image \(\hat{I}\) and a target_image \(I\):

\[\overline{\sum} \left(\operatorname{gram}\left(\Phi\left(\hat{I}\right)\right) - \operatorname{gram}\left(\Phi\left(I\right)\right)\right)^2\]

Here \(\Phi\left(\cdot\right)\) denotes the encoder and \(\operatorname{gram}\left(\cdot\right)\) denotes pystiche.gram_matrix().

Note

In contrast to the paper, the implementation calculates the grand average \(\overline{\sum}\) rather than the grand sum \(\sum\) to account for differently sized images.

Parameters
  • encoder (Encoder) – Encoder \(\Phi\left(\cdot\right)\).

  • normalize (bool) – If True, normalizes the Gram matrices to account for different sized images. See pystiche.gram_matrix() for details. Defaults to True.

  • score_weight (float) – Score weight of the operator. Defaults to 1.0.

Examples

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> encoder = mle.extract_encoder("relu4_2")
>>> loss = pystiche.loss.GramLoss(encoder)
>>> input = torch.rand(2, 3, 256, 256)
>>> target = torch.rand(2, 3, 256, 256)
>>> loss.set_target_image(target)
>>> score = loss(input)

See also

The Gram loss was introduced by Gatys, Ecker, and Bethge in [GEB16].

class pystiche.loss.MRFLoss(encoder, patch_size, *, stride=1, target_transforms=None, input_guide=None, target_image=None, target_guide=None, score_weight=1.0)

Bases: pystiche.loss._loss.ComparisonLoss

The MRF loss is a style loss based on Markov Random Fields (MRFs). It measures the mean squared error (MSE) between neural patches extracted from the encodings of an input_image \(\hat{I}\) and a target_image \(I\):

\[\overline{\sum} \left(p_{n}\left(\Phi\left(\hat{I}\right)\right) - p_{MCS\left(n\right)}\left(\Phi\left(I\right)\right)\right)^2\]

Since the number of patches might differ between the two images and the order of the patches does not correlate with the order of the enclosed style elements, for each input neural patch \(n\) a fitting target neural patch is selected based on the maximum cosine similarity \(MCS\left(n\right)\) computed with pystiche.cosine_similarity().

Note

In contrast to the paper, the implementation calculates the grand average \(\overline{\sum}\) rather than the grand sum \(\sum\) to account for differently sized images.

Parameters
  • encoder (Encoder) – Encoder \(\Phi\).

  • patch_size (Union[int, Sequence[int]]) – Spatial size of the neural patches.

  • stride (Union[int, Sequence[int]]) – Distance between two neural patches.

  • target_transforms (Optional[Iterable[Module]]) – Optional transformations to apply to the target image before the neural patches are extracted. Defaults to None.

  • score_weight (float) – Score weight of the operator. Defaults to 1.0.

Examples

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> encoder = mle.extract_encoder("relu4_2")
>>> patch_size = 3
>>> loss = pystiche.loss.MRFLoss(encoder, patch_size)
>>> input = torch.rand(2, 3, 256, 256)
>>> target = torch.rand(2, 3, 256, 256)
>>> loss.set_target_image(target)
>>> score = loss(input)

See also

The MRF loss was introduced by Li and Wand in [LW16].

static scale_and_rotate_transforms(num_scale_steps=1, scale_step_width=0.05, num_rotate_steps=1, rotate_step_width=10.0)

Generate a list of scaling and rotation transformations.

See also

The output of this method can be used as parameter target_transforms of MRFLoss to enrich the space of target neural patches:

target_transforms = pystiche.loss.MRFLoss.scale_and_rotate_transforms()
loss = pystiche.loss.MRFLoss(..., target_transforms=target_transforms)
Parameters
  • num_scale_steps (int) – Number of scale steps. Each scale is performed in both directions, i.e. enlarging and shrinking the motif. Defaults to 1.

  • scale_step_width (float) – Width of each scale step. Defaults to 5e-2.

  • num_rotate_steps (int) – Number of rotate steps. Each rotate is performed in both directions, i.e. clockwise and counterclockwise. Defaults to 1.

  • rotate_step_width (float) – Width of each rotation step in degrees. Defaults to 10.0.

Return type

List[ScaleAndRotate]

Returns

(num_scale_steps * 2 + 1) * (num_rotate_steps * 2 + 1) transformations in total comprising every combination given by the input parameters.

pystiche.loss.functional

pystiche.loss.functional.mrf_loss(input, target, eps=1e-08, reduction='mean', batched_input=None)

Calculates the MRF loss. See pystiche.loss.MRFLoss for details.

Parameters
  • input (Tensor) – Input of shape \(B \times S_1 \times N_1 \times \dots \times N_D\).

  • target (Tensor) – Target of shape \(B \times S_2 \times N_1 \times \dots \times N_D\).

  • eps (float) – Small value to avoid zero division. Defaults to 1e-8.

  • reduction (str) – Reduction method of the output passed to pystiche.misc.reduce(). Defaults to "mean".

  • batched_input (Optional[bool]) – If False, treat the first dimension of the inputs as sample dimension, i.e. \(S \times N_1 \times \dots \times N_D\). Defaults to True. See pystiche.cosine_similarity() for details.

Examples

>>> import pystiche.loss.functional as F
>>> input = torch.rand(1, 256, 64, 3, 3)
>>> target = torch.rand(1, 128, 64, 3, 3)
>>> score = F.mrf_loss(input, target, batched_input=True)
Return type

Tensor

pystiche.loss.functional.total_variation_loss(input, exponent=2.0, reduction='mean')

Calculates the total variation loss. See pystiche.loss.TotalVariationLoss for details.

Parameters
  • input (Tensor) – Input image

  • exponent (float) – Parameter \(\beta\). A higher value leads to more smoothed results. Defaults to 2.0.

  • reduction (str) – Reduction method of the output passed to pystiche.misc.reduce(). Defaults to "mean".

Examples

>>> import pystiche.loss.functional as F
>>> input = torch.rand(2, 3, 256, 256)
>>> score = F.total_variation_loss(input)
Return type

Tensor

pystiche.loss.functional.value_range_loss(input, min=0.0, max=1.0, reduction='mean')
Return type

Tensor
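
Only the call signature is documented above; assuming the loss penalizes values outside the interval [min, max], a usage sketch analogous to the other functional losses might look like this:

>>> import pystiche.loss.functional as F
>>> input = torch.rand(2, 3, 256, 256) * 2.0 - 0.5  # values partly outside [0, 1]
>>> score = F.value_range_loss(input, min=0.0, max=1.0)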

pystiche.meta

pystiche.meta.tensor_meta(x, **kwargs)
Return type

Dict[str, Any]

pystiche.meta.is_scalar_tensor(x)
Return type

bool

pystiche.meta.is_conv_module(x)
Return type

bool

pystiche.meta.conv_module_meta(x, **kwargs)
Return type

Dict[str, Any]

pystiche.meta.is_pool_module(x)
Return type

bool

pystiche.meta.pool_module_meta(x, **kwargs)
Return type

Dict[str, Any]
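
These helpers are listed without descriptions; judging from their names and return types, a plausible usage sketch is the following (the exact keys of the returned dictionaries are an assumption):

>>> from torch import nn
>>> import pystiche.meta
>>> x = torch.rand(1, 3, 32, 32)
>>> pystiche.meta.is_scalar_tensor(x)                 # False for a 4D tensor
>>> meta = pystiche.meta.tensor_meta(x)               # e.g. dtype and device of x (assumed)
>>> conv = nn.Conv2d(3, 3, kernel_size=3)
>>> pystiche.meta.is_conv_module(conv)                # True for nn.Conv2d
>>> conv_meta = pystiche.meta.conv_module_meta(conv)  # e.g. kernel_size, stride (assumed)
>>> pool = nn.MaxPool2d(kernel_size=2)
>>> pystiche.meta.is_pool_module(pool)                # True for nn.MaxPool2d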

pystiche.misc

pystiche.misc.get_input_image(starting_point='content', content_image=None, style_image=None)

Generates an input image for NST from the given starting_point.

Parameters
  • starting_point (Union[str, Tensor]) – If a Tensor, returns a copy of it. If "content" or "style", returns a copy of content_image or style_image, respectively. If "random", returns a white noise image with the dimensions of content_image or style_image, respectively. Defaults to "content".

  • content_image (Optional[Tensor]) – Content image. Only required if starting_point is "content" or "random".

  • style_image (Optional[Tensor]) – Style image. Only required if starting_point is "style" or "random".

Return type

Tensor
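
A short sketch of the three documented starting points:

>>> from pystiche import misc
>>> content_image = torch.rand(1, 3, 256, 256)
>>> style_image = torch.rand(1, 3, 512, 512)
>>> input_image = misc.get_input_image("content", content_image=content_image)
>>> input_image = misc.get_input_image("style", style_image=style_image)
>>> input_image = misc.get_input_image("random", content_image=content_image)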

pystiche.misc.get_device(device=None)

Selects a device to perform an NST on.

Parameters

device (Optional[str]) – If str, returns the corresponding device. If None, selects CUDA if available and CPU otherwise. Defaults to None.

Return type

device
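
For example:

>>> from pystiche import misc
>>> device = misc.get_device()       # CUDA if available, otherwise CPU
>>> device = misc.get_device("cpu")  # force a specific device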

pystiche.misc.reduce(x, reduction)

Reduces a Tensor as specified.

Parameters
  • x (Tensor) – Input tensor.

  • reduction (str) – Reduction method to be applied to x. If "none", no reduction will be applied. If "sum" or "mean", the sum() or mean() will be applied across all dimensions of x.

Return type

Tensor
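
The documented behavior directly yields, for example:

>>> from pystiche import misc
>>> x = torch.rand(2, 3, 4, 4)
>>> misc.reduce(x, "mean").shape     # reduced across all dimensions
torch.Size([])
>>> misc.reduce(x, "none").shape     # returned unchanged
torch.Size([2, 3, 4, 4])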

pystiche.misc.build_complex_obj_repr(name, properties=None, named_children=(), line_length=80, num_indent=2)
Return type

str

pystiche.optim

pystiche.optim.default_image_optimizer(input_image)
Parameters

input_image (Tensor) – Image to be optimized.

Return type

LBFGS

Returns

torch.optim.LBFGS optimizer with a learning rate of 1.0. The pixels of input_image are set as optimization parameters.

pystiche.optim.image_optimization(input_image, criterion, optimizer=None, num_steps=500, preprocessor=None, postprocessor=None, quiet=False)

Perform an image optimization with integrated logging.

Parameters
  • input_image (Tensor) – Image to be optimized.

  • criterion (Module) – Optimization criterion.

  • optimizer (Union[Optimizer, Callable[[Tensor], Optimizer], None]) – Optional optimizer or optimizer getter. If omitted, default_image_optimizer() is used. If a preprocessor is used, this has to be a getter.

  • num_steps (Union[int, Iterable[int]]) – Number of optimization steps. Defaults to 500.

  • preprocessor (Optional[Module]) – Optional preprocessor that is called with the input_image before the optimization.

  • postprocessor (Optional[Module]) – Optional postprocessor that is called with the input_image after the optimization.

  • quiet (bool) – If True, no information is printed to STDOUT during the optimization. Defaults to False.

Raises
  • RuntimeError – If preprocessor is used and optimizer is not passed as a getter.

Return type

Tensor
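
A minimal sketch of an image optimization; the criterion, number of steps, and image sizes are only placeholders:

>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> criterion = pystiche.loss.FeatureReconstructionLoss(
...     mle.extract_encoder("relu4_2")
... )
>>> criterion.set_target_image(torch.rand(1, 3, 256, 256))
>>> input_image = torch.rand(1, 3, 256, 256)
>>> output_image = pystiche.optim.image_optimization(
...     input_image, criterion, num_steps=10, quiet=True
... )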

pystiche.optim.pyramid_image_optimization(input_image, criterion, pyramid, get_optimizer=None, preprocessor=None, postprocessor=None, quiet=False)

Perform an image optimization for a pystiche.pyramid.ImagePyramid with integrated logging.

Parameters
  • input_image (Tensor) – Image to be optimized.

  • criterion (Module) – Optimization criterion.

  • pyramid (ImagePyramid) – Image pyramid.

  • get_optimizer (Optional[Callable[[Tensor], Optimizer]]) – Optional getter for the optimizer. If None, default_image_optimizer() is used. Defaults to None.

  • preprocessor (Optional[Module]) – Optional preprocessor that is called with the input_image before the optimization.

  • postprocessor (Optional[Module]) – Optional postprocessor that is called with the input_image after the optimization.

  • quiet (bool) – If True, no information is printed to STDOUT during the optimization. Defaults to False.

Return type

Tensor
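
Continuing the sketch from image_optimization() above, a pyramid optimization might look like this; the criterion is registered as a resize target so that its target image is resized along with the input:

>>> pyramid = pystiche.pyramid.OctaveImagePyramid(
...     max_edge_size=512, num_steps=10, resize_targets=(criterion,)
... )
>>> output_image = pystiche.optim.pyramid_image_optimization(
...     input_image, criterion, pyramid, quiet=True
... )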

pystiche.optim.default_model_optimizer(transformer)
Parameters

transformer (Module) – Transformer to be optimized.

Return type

Optimizer

Returns

torch.optim.Adam optimizer with a learning rate of 1e-3. The parameters of transformer are set as optimization parameters.

pystiche.optim.model_optimization(image_loader, transformer, criterion, criterion_update_fn=None, optimizer=None, quiet=False)

Perform a model optimization for a single epoch with integrated logging.

Parameters
  • image_loader (DataLoader) – Images used as input for the transformer. Drawing from this should yield either a batched image or a tuple or list with a batched image as its first item.

  • transformer (Module) – Transformer to be optimized.

  • criterion (Module) – Optimization criterion.

  • criterion_update_fn (Optional[Callable[[Tensor, Module], None]]) – Is called before each optimization step with the current images and the optimization criterion. If omitted and criterion is a PerceptualLoss or a GuidedPerceptualLoss this defaults to invoking set_content_image().

  • optimizer (Optional[Optimizer]) – Optional optimizer. If None, default_model_optimizer() is used.

  • quiet (bool) – If True, no information is printed to STDOUT during the optimization. Defaults to False.

Return type

Module
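
A minimal sketch of a single training epoch; the dataset, transformer, and criterion are only stand-ins to show the expected interfaces:

>>> from torch import nn
>>> from torch.utils.data import DataLoader, TensorDataset
>>> image_loader = DataLoader(TensorDataset(torch.rand(4, 3, 256, 256)), batch_size=2)
>>> transformer = nn.Conv2d(3, 3, kernel_size=1)  # stand-in for a real transformer network
>>> mle = pystiche.enc.vgg19_multi_layer_encoder()
>>> criterion = pystiche.loss.GramLoss(mle.extract_encoder("relu4_2"))
>>> criterion.set_target_image(torch.rand(2, 3, 256, 256))
>>> def criterion_update_fn(input_image, criterion):
...     pass  # nothing to update for this fixed style criterion
>>> transformer = pystiche.optim.model_optimization(
...     image_loader, transformer, criterion,
...     criterion_update_fn=criterion_update_fn, quiet=True,
... )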

pystiche.optim.multi_epoch_model_optimization(image_loader, transformer, criterion, criterion_update_fn=None, epochs=2, optimizer=None, lr_scheduler=None, quiet=False)

Perform a model optimization for multiple epochs with integrated logging.

Parameters
  • image_loader (DataLoader) – Images used as input for the transformer. Drawing from this should yield either a batched image or a tuple or list with a batched image as its first item.

  • transformer (Module) – Transformer to be optimized.

  • criterion (Module) – Optimization criterion.

  • criterion_update_fn (Optional[Callable[[Tensor, Module], None]]) – Is called before each optimization step with the current images and the optimization criterion. If omitted and criterion is a PerceptualLoss or a GuidedPerceptualLoss this defaults to invoking set_content_image().

  • epochs (int) – Number of epochs. Defaults to 2.

  • optimizer (Optional[Optimizer]) – Optional optimizer. If None, it is extracted from lr_scheduler or default_model_optimizer() is used.

  • lr_scheduler (Optional[_LRScheduler]) – Optional learning rate scheduler. step() is invoked after every epoch.

  • quiet (bool) – If True, no information is printed to STDOUT during the optimization. Defaults to False.

Return type

Module

pystiche.pyramid

Image pyramid

class pystiche.pyramid.ImagePyramid(edge_sizes, num_steps, edge='short', interpolation_mode='bilinear', resize_targets=())

Image pyramid for a coarse-to-fine optimization on different levels. If iterated on, yields PyramidLevel objects and handles the resizing of all set images and guides of resize_targets.

Parameters
  • edge_sizes (Sequence[int]) – Edge sizes for each level.

  • num_steps (Union[Sequence[int], int]) – Number of steps for each level. If a sequence of int, its length has to match the length of edge_sizes.

  • edge (Union[Sequence[str], str]) – Corresponding edge to the edge size for each level. Can be "short" or "long". If a sequence of str, its length has to match the length of edge_sizes. Defaults to "short".

  • interpolation_mode (str) –

    Interpolation mode used for the resizing of the images. Defaults to "bilinear".

    Note

    For the resizing of guides "nearest" is used regardless of the interpolation_mode.

  • resize_targets (Collection[Loss]) – Targets for resizing of set images and guides during iteration.
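
For instance, a two-level pyramid can be constructed and iterated like this:

>>> pyramid = pystiche.pyramid.ImagePyramid(edge_sizes=(256, 512), num_steps=(20, 10))
>>> for level in pyramid:
...     for step in level:
...         pass  # perform one optimization step at the current resolution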

class pystiche.pyramid.OctaveImagePyramid(max_edge_size, num_steps, num_levels=None, min_edge_size=64, **image_pyramid_kwargs)

Bases: pystiche.pyramid.pyramid.ImagePyramid

Image pyramid that comprises levels spaced by a factor of two.

Parameters
  • max_edge_size (int) – Maximum edge size.

  • num_steps (Union[int, Sequence[int]]) –

    Number of steps for each level.

    Note

    If num_steps is specified as a sequence of ints, you should also specify num_levels to match the lengths.

  • num_levels (Optional[int]) – Optional number of levels. If None, the number is determined by the number of steps of factor two between max_edge_size and min_edge_size.

  • min_edge_size (int) – Minimum edge size for the automatic calculation of num_levels.

  • image_pyramid_kwargs (Any) – Additional options. See ImagePyramid for details.

Pyramid level

class pystiche.pyramid.PyramidLevel(edge_size, num_steps, edge)

Level within a pystiche.pyramid.ImagePyramid. If iterated on, yields the step beginning at 1 and ending at num_steps.

Parameters
  • edge_size (int) – Edge size.

  • num_steps (int) – Number of steps.

  • edge (str) – Corresponding edge to the edge size. Can be "short" or "long".

resize_guide(guide, aspect_ratio=None, interpolation_mode='nearest')

Resize a guide to the edge_size on the corresponding edge of the PyramidLevel.

Parameters
  • guide (Tensor) – Guide to be resized.

  • aspect_ratio (Optional[float]) – Optional aspect ratio of the output. If None, the aspect ratio of guide is used. Defaults to None.

  • interpolation_mode (str) – Interpolation mode used to resize image. Defaults to "nearest".

Return type

Tensor

resize_image(image, aspect_ratio=None, interpolation_mode='bilinear')

Resize an image to the edge_size on the corresponding edge of the PyramidLevel.

Parameters
  • image (Tensor) – Image to be resized.

  • aspect_ratio (Optional[float]) – Optional aspect ratio of the output. If None, the aspect ratio of image is used. Defaults to None.

  • interpolation_mode (str) – Interpolation mode used to resize image. Defaults to "bilinear".

Warning

The resizing is performed without gradient calculation. Do not use this if the image needs a gradient.

Return type

Tensor
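
A short sketch of resizing within a level; the image and guide sizes here are arbitrary:

>>> level = pystiche.pyramid.PyramidLevel(256, 10, "short")
>>> image = torch.rand(1, 3, 300, 400)
>>> resized_image = level.resize_image(image)   # short edge becomes 256
>>> guide = torch.rand(1, 1, 300, 400).gt(0.5).float()
>>> resized_guide = level.resize_guide(guide)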

Literature Reference

CZP+18

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In The European Conference on Computer Vision (ECCV). 2018. URL: http://openaccess.thecvf.com/content_ECCV_2018/papers/Liang-Chieh_Chen_Encoder-Decoder_with_Atrous_ECCV_2018_paper.pdf, doi:10.1007/978-3-030-01234-2_49.

GEB+17

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, and Eli Shechtman. Controlling perceptual factors in neural style transfer. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. arXiv:1611.07865, doi:10.1109/CVPR.2017.397.

GEB16

Leon A. Gatys, Alexander. S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. arXiv:1508.06576, doi:10.1109/CVPR.2016.265.

JAL16

Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (ECCV). 2016. arXiv:1603.08155, doi:10.1007/978-3-319-46475-6_43.

KSH12

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS). 2012. URL: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

LW16

Chuan Li and Michael Wand. Combining markov random fields and convolutional neural networks for image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. arXiv:1601.04589, doi:10.1109/CVPR.2016.272.

MV15

Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015. arXiv:1412.0035, doi:10.1109/CVPR.2015.7299155.

SID17

Amir Semmo, Tobias Isenberg, and Jürgen Döllner. Neural style transfer: a paradigm shift for image-based artistic rendering? In Proceedings of the Symposium on Non-Photorealistic Animation and Rendering (NPAR). 2017. doi:10.1145/3092919.3092920.

SZ14

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. Computing Research Repository (CoRR), 2014. arXiv:1409.1556.

ULVL16

Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Viktor S. Lempitsky. Texture networks: feed-forward synthesis of textures and stylized images. In International Conference on Machine Learning (ICML). 2016. URL: http://proceedings.mlr.press/v48/ulyanov16.html, arXiv:1603.03417.

UVL16

Dmitry Ulyanov, Andrea Vedaldi, and Viktor S. Lempitsky. Instance normalization: the missing ingredient for fast stylization. Computing Research Repository (CoRR), 2016. arXiv:1607.08022.