hantverkskod

Super size this

As I wrapped up my denoiser project, I started thinking about how I could turn it into a super-resolution model. I could have just asked an LLM or looked it up on Wikipedia, but I’ve often found it useful to reinvent the wheel before looking at what other people have done. A bit of friction and resistance tends to help me internalize ideas better than having the solution immediately handed to me.

To make things a bit easier for myself (or so I thought), I decided to create a model that was hard-coded to double the resolution of an image. The idea I came up with was simple and seemed like it might work. What if I trained a network to adjust a bicubically upscaled image so that it matched a ground-truth version? The difference between the two images could be seen as a form of “noise” that should be removed.

This meant I could reuse almost all of my denoiser project (a relatively deep U-Net, see the previous post). The only thing I had to change was how the training data was generated. This was simple enough. I could take a high-resolution image, downscale it by 50% using a bicubic filter, upscale it back to its original resolution, and et voilà!

The last thing I did was look at better loss functions. For my denoiser I’d used L1, but for this project I wanted to see if there was something better. This is an area where my general knowledge is strong enough that there was nothing to be gained by reinventing the wheel, so I just asked Codex for a recommendation. It suggested Charbonnier loss, which is very close to L1 but includes a small tweak that makes it differentiable at zero, which is beneficial during training. The loss function is defined as follows:

Loss(x) = \sqrt{x^2 + \varepsilon^2}

ε is a tunable parameter. I set it to 1e-3 in this project. As you can see, the function asymptotically approaches L1 loss as the error increases.

Results

I trained the network on 102,400 random patches of 128 × 128 pixels drawn from a dataset of 216 photos. This took about 20 hours on my RTX 5070. Before we look at the model output, let’s first see what we get from high-quality Lanczos resampling. Here’s the Macaws image from my denoiser project again:

Unsurprisingly, the resampled image mostly looks like a blurry version of the low resolution source. Now let’s look at what we get with the U-Net model.

This is much better. We obviously lose some detail, but the U-Net does a pretty good job of reconstructing the shape of the image. I was expecting a better result than I got from the denoiser, but I was still pleasantly surprised by how good this turned out.

Really, this shouldn’t have come as a surprise since denoising is a harder problem. The error introduced by noise is random by definition, whereas the error between the bicubic seed and the ground truth has a lot more structure (especially when training on a constrained class of images), which makes it easier to learn.

Here is one more pair of images for comparison:

The U-Net once again does a much better job at reconstructing the details than the Lanczos resampling.

Final thoughts

Working on this project reminded me of when temporal antialiasing was introduced in realtime graphics. Initially, I didn’t like it. There was something slightly offensive to me about reusing pixels from the previous frame. I wanted fresh, brand-new pixels, damn it!

This is of course the wrong way to think about it. From an information theory perspective, you don’t need all that much new information to go from the previous frame to the current one. This is especially true if you’re generating a signal that mostly consists of low frequencies in the temporal domain. The same principle is what makes it possible to encode and transmit movies using a reasonable amount of data.

The reason I’m bringing this up is that super resolution algorithms like NVIDIA DLSS use the previous frame in addition to the current one. They typically also use motion vectors and various g-buffer inputs when available. If my network had access to that data as well, the results might have been even better. This might be an interesting next project for me.

Super size this

Results

Final thoughts

Share this:

Leave a comment Cancel reply