State-of-the-art deblurring techniques are computationally intensive and slow, preventing the interactivity that would let users explore the parameter space until they arrive at a satisfying output. We aim to build a GPU-enabled, cross-platform application that supports interactive deblurring by developing a UI that lets users:
  • Interactively change kernel parameters, seeing the output of their changes within a 4-second turnaround time
  • Interactively change layer weights to reduce ringing artifacts
  • Select a rectangular region in a deblurred image for the program to automatically calculate suggested layer weights that significantly reduce ringing artifacts
The software also runs on machines without GPUs, but in that case the computational load prevents a truly interactive experience.

Related Work

Building on previous work by Shan et al. (SIGGRAPH 2008), we adapt their non-blind deconvolution algorithm to take advantage of GPUs by porting the code to the NVIDIA CUDA architecture. The core of the algorithm solves an optimization problem involving L1 and L2 energy terms.

We aim to improve the usefulness of the deblurring algorithm by accelerating it on the GPU, enabling users to interactively tune the parameters until they arrive at a good image.

Our Approach

User Interface Design

We built the software using the cross-platform Qt library. We used simple popup windows with sliders and input fields to demonstrate the functionality of the system, but did not do a lot of usability work.

To increase interactivity, we forked threads when doing computationally intensive work so the UI remains responsive.
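The pattern can be sketched with the standard library (using std::async here rather than Qt's threading classes; heavy_compute is a hypothetical stand-in for the deconvolution step):

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for the expensive deconvolution step.
std::vector<float> heavy_compute(std::vector<float> pixels) {
    for (float& p : pixels) p = p * 0.5f + 0.25f;
    return pixels;
}

// Launch the work on a background thread; the caller (the UI event
// loop in the real application) keeps running until the future is ready.
std::future<std::vector<float>> start_background_deblur(std::vector<float> pixels) {
    return std::async(std::launch::async, heavy_compute, std::move(pixels));
}
```

In the Qt application the same idea is expressed with Qt's threading facilities, which additionally deliver the result back to the GUI thread via signals.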

We expected the UI to be easy to develop, but setting up a development environment, learning Qt and dealing with unexpected UI issues took longer than we anticipated.


Core Functionality

Users can set the parameters for a Gaussian kernel and quickly preview the results of deconvolving their image with the kernel they set.
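For illustration, building the kernel from the user-set parameters might look like this (a minimal sketch; gaussian_kernel and its parameter names are ours, not from the original code):

```cpp
#include <cmath>
#include <vector>

// Build a normalized (2r+1) x (2r+1) Gaussian blur kernel; sigma is the
// user-tunable width parameter exposed in the kernel popup.
std::vector<float> gaussian_kernel(int radius, float sigma) {
    const int size = 2 * radius + 1;
    std::vector<float> k(size * size);
    float sum = 0.0f;
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            float v = std::exp(-(x * x + y * y) / (2.0f * sigma * sigma));
            k[(y + radius) * size + (x + radius)] = v;
            sum += v;
        }
    for (float& v : k) v /= sum;  // normalize so overall brightness is preserved
    return k;
}
```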

The deconvolution process unfortunately introduces ringing artifacts into the deblurred image. A second popup lets users set layer weights to produce a deblurred image with reduced artifacts. First, they set the smoothness and prepare the image, which computes the deconvolution and the image layers. The user can then drag the weight slider for each layer and interactively see its effect on the image.

Automatic Weight Learning

After discovering that weight tuning was particularly difficult, we implemented a novel algorithm that automatically selects weights based on an image region the user defines as smooth. After preparing the image, users click on two points in the image to define a rectangular region that contains ringing artifacts but ought to be smooth. They can then click "Patch-based Automatic Weight Computing" to have the layer weights set automatically. The process is fast, so users can experiment with different patches to see which one yields the best result.

Porting to CUDA

The most significant speed-up came from offloading FFT processing to the GPU, which required only standard library calls (NVIDIA's cuFFT).

However, the non-blind deconvolution algorithm had to be ported as a whole, because the limited bandwidth between the GPU and main memory would otherwise have undone any performance gains. The key issues in GPU programming are working within the limited device memory and taking advantage of parallel execution. To reduce the memory footprint, we divide the input image into smaller patches and process them separately. When implementing the algorithm, we reduced data dependencies so that the patches could be processed in parallel.
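The tiling step can be sketched as follows (make_patches and the Patch struct are hypothetical helpers; in practice the tile size would be chosen to fit device memory, and patches would overlap enough to hide boundary effects):

```cpp
#include <algorithm>
#include <vector>

struct Patch { int x, y, w, h; };  // top-left corner and extent, in pixels

// Split a w x h image into tiles of at most tile x tile pixels so each
// fits in device memory; tiles at the right/bottom edges may be smaller.
std::vector<Patch> make_patches(int w, int h, int tile) {
    std::vector<Patch> out;
    for (int y = 0; y < h; y += tile)
        for (int x = 0; x < w; x += tile)
            out.push_back({x, y, std::min(tile, w - x), std::min(tile, h - y)});
    return out;
}
```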

The following code snippet illustrates how we distribute the computation across GPU cores.

// computes odata = idata1 * s1 + idata2 * s2 (element-wise sum of two arrays)
__global__ void eleAdd_k(Complex* odata, Complex* idata1, float s1,
                         Complex* idata2, float s2, int w, int h)
{
    const int x = IMUL(blockDim.x, blockIdx.x) + threadIdx.x;
    const int y = IMUL(blockDim.y, blockIdx.y) + threadIdx.y;
    const int idx = IMUL(y, w) + x;
    if (x < w && y < h) {
        odata[idx].x = idata1[idx].x * s1 + idata2[idx].x * s2;
        odata[idx].y = idata1[idx].y * s1 + idata2[idx].y * s2;
    }
}

// Host-side launch: the GPU runs one thread per pixel, arranged in a 2D grid of blocks.
eleAdd_k<<<grid, block>>>(odata, idata1, s1, idata2, s2, w, h);

Ringing Layer Extraction

We apply a novel ringing-layer extraction algorithm. First, we compute the deconvolution result at several scales. The finest-scale result is contaminated by ringing, while the results at coarser scales show less ringing. We therefore compute the difference between each coarser-scale result and the finest-scale result, and use bilateral filtering to remove genuine image structures from the difference. Once all the ringing layers are obtained, we apply signal orthogonalization to eliminate their linear dependence. These layers are then blended with various weights.

Automatic Weight Learning

After discovering that interactive weight tuning was difficult (small changes in either direction can cause significant changes in the output image), we implemented a novel algorithm that takes an image patch containing artifacts that ought to be smooth and solves a linear system of equations to obtain the layer weights that would transform it into a smooth patch.

Our goal is to subtract the ringing layers, each scaled by a weight, so that the final image is visually pleasing. After the user specifies a region that ought to be smooth, we need to compute a set of weighting parameters w_1, ..., w_n

such that the new blended image region

    L' = L - sum_i w_i * R_i

is as smooth as possible, where L is the deconvolved image restricted to the region and R_i is the i-th ringing layer there.

If L' is perfectly smooth, its gradient should be 0 everywhere, so for every pixel p in the region we have

    sum_i w_i * grad R_i(p) = grad L(p)

By solving this over-determined linear system in a least-squares sense, we obtain the appropriate weighting parameters.
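One direct way to carry this out is to form the normal equations and solve them by elimination. The sketch below is ours (solve_weights is a hypothetical helper): each gradR[i] stacks the gradient samples of ringing layer i over the user's patch, and gradL stacks the gradient of the deconvolved image there.

```cpp
#include <cmath>
#include <vector>

// Least-squares solve of  sum_i w_i * gradR[i][k] = gradL[k]  via the
// normal equations (A^T A) w = A^T b, using Gauss-Jordan elimination
// with partial pivoting on the small n x n system (n = number of layers).
std::vector<double> solve_weights(const std::vector<std::vector<double>>& gradR,
                                  const std::vector<double>& gradL) {
    const size_t n = gradR.size();
    // Augmented normal-equation matrix [A^T A | A^T b].
    std::vector<std::vector<double>> M(n, std::vector<double>(n + 1, 0.0));
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < gradL.size(); ++k)
                M[i][j] += gradR[i][k] * gradR[j][k];
        for (size_t k = 0; k < gradL.size(); ++k)
            M[i][n] += gradR[i][k] * gradL[k];
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (size_t c = 0; c < n; ++c) {
        size_t piv = c;
        for (size_t r = c + 1; r < n; ++r)
            if (std::fabs(M[r][c]) > std::fabs(M[piv][c])) piv = r;
        std::swap(M[c], M[piv]);
        for (size_t r = 0; r < n; ++r) {
            if (r == c) continue;
            double f = M[r][c] / M[c][c];
            for (size_t j = c; j <= n; ++j) M[r][j] -= f * M[c][j];
        }
    }
    std::vector<double> w(n);
    for (size_t i = 0; i < n; ++i) w[i] = M[i][n] / M[i][i];
    return w;
}
```

Because n is just the number of ringing layers, the system is tiny and the solve is effectively instantaneous, which is what makes patch-based experimentation interactive.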

What We Accomplished

Our primary contributions are the GPU acceleration of the non-blind deconvolution algorithm and a user interface for interactive deblurring. Although we cannot empirically quantify usability improvements because we did not run a user study, we subjectively found the user interface a much better way to deblur images than the original command-line program.

Multithreading the computationally intensive processes kept the UI responsive so users could perform other tasks while waiting, and GPU acceleration was the key factor that enabled interactive deblurring. Informal tests of non-blind deconvolution showed speed-ups of at least 20x on a machine with an NVIDIA GeForce 8800 (e.g. from 45 seconds on the CPU to 2 seconds on the GPU).

The following figures show some example results:

Original Blurred Image

Deblurred image without Ringing Reduction

Deblurred image with Ringing Reduction and Automatic Weight Learning

Future Work

We intend to release a public version of this software after we have implemented GPU-accelerated Blind Deconvolution and solved some interface issues. Enabling better user interactivity should result in more reliable kernel estimation, which ultimately results in better deblurred images.