Image Kernels Explained Visually (setosa.io)
188 points by apetresc on Jan 29, 2015 | 29 comments



I first learnt about kernels and convolution a few months ago during a computer vision module at university - it was really insightful. The exact methods used to perform Gaussian blurring, edge detection, etc. were something I hadn't given much thought to before.

A cool fact about the Gaussian filter is that it's separable - you can convolve in the X direction (using a 1 * n kernel), and then convolve the result again along the Y direction (using an m * 1 kernel) - the final result will be the same as convolving with a single m * n kernel, but the cost drops from m * n multiplications per pixel to just m + n.
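If you want to see that numerically, here's a minimal sketch in Python/SciPy (the kernel size, sigma, and image here are arbitrary choices):

    import numpy as np
    from scipy.ndimage import convolve, convolve1d

    # A small 1-D Gaussian kernel (size and sigma are arbitrary)
    x = np.arange(-2, 3)
    g = np.exp(-x**2 / 2.0)
    g /= g.sum()

    img = np.random.rand(256, 256)

    # Separable: two 1-D passes, m + n multiplications per pixel
    sep = convolve1d(convolve1d(img, g, axis=0), g, axis=1)

    # Direct: one 2-D pass with the outer-product kernel, m * n per pixel
    full = convolve(img, np.outer(g, g))

    assert np.allclose(sep, full)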

Not every filter is separable - it's only possible when the n * m filter can be expressed as the outer product of an n * 1 and a 1 * m matrix.
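You can check separability numerically: a kernel matrix is separable exactly when it has rank 1, which an SVD exposes. A rough sketch (not from the article):

    import numpy as np

    def separate(kernel, tol=1e-10):
        """Return (col, row) with np.outer(col, row) ~= kernel, or None."""
        u, s, vt = np.linalg.svd(kernel)
        if np.sum(s > tol) != 1:
            return None  # more than one significant singular value: not rank 1
        return u[:, 0] * np.sqrt(s[0]), vt[0] * np.sqrt(s[0])

    assert separate(np.ones((3, 3)) / 9.0) is not None  # box blur: separable
    assert separate(np.eye(3) / 3.0) is None            # diagonal motion blur: not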

Another cool fact is that you can perform convolution as a point-wise multiplication in Fourier space (see http://en.wikipedia.org/wiki/Convolution_theorem).
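A quick illustration of the theorem with NumPy/SciPy (a sketch; the sizes are arbitrary):

    import numpy as np
    from scipy.signal import convolve2d

    img = np.random.rand(64, 64)
    kernel = np.random.rand(5, 5)

    # Zero-pad both to the full output size, multiply point-wise in
    # Fourier space, then transform back
    shape = (img.shape[0] + kernel.shape[0] - 1,
             img.shape[1] + kernel.shape[1] - 1)
    fast = np.fft.irfft2(np.fft.rfft2(img, shape) * np.fft.rfft2(kernel, shape), shape)

    direct = convolve2d(img, kernel, mode='full')
    assert np.allclose(direct, fast)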


Two really cool facts about the Gaussian filter are: 1) it's the only separable isotropic (i.e. 'round') kernel, and 2) there's a Deriche approximation where the number of operations per pixel doesn't depend on the filter size: https://espace.library.uq.edu.au/view/UQ:10982/IIR.pdf


A few other useful resources to play with convolutional kernels:

* The linked demo really focuses only on layer 1. See Layers >=2 plotted here: http://arxiv.org/pdf/1311.2901v3.pdf

* DeepViz is a nice tool: https://github.com/bruckner/deepViz

* Another tool one can use to play with convolutional kernels is ShaderToy ( https://www.shadertoy.com/ ). Here's a Gaussian blur: https://www.shadertoy.com/view/XdfGDH

* If you like playing with kernels in shaders, see also Brad Larson's GPUImage: ( https://github.com/BradLarson/GPUImage ) -- the demo app has a bunch of standard kernels.

Just for fun, convolutional shader porn: https://www.shadertoy.com/view/4d2Xzc


I've always been a fan of the non-linear kernel Kuwahara filter

https://www.shadertoy.com/view/lls3WM

http://en.wikipedia.org/wiki/Kuwahara_filter


It's the same as the bilateral filter, isn't it?


I made a custom one that I like:

   -.5 | 1.5 | -.5
   1.5 | -3  | 1.5
   -.5 | 1.5 | -.5
It produces a cool digital edge blurring effect. What's that called?
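If anyone wants to try it outside the demo, here's a sketch in SciPy (note the entries sum to 1, so overall brightness is preserved):

    import numpy as np
    from scipy.ndimage import convolve

    k = np.array([[-0.5,  1.5, -0.5],
                  [ 1.5, -3.0,  1.5],
                  [-0.5,  1.5, -0.5]])

    img = np.random.rand(128, 128)   # stand-in for a grayscale image
    out = convolve(img, k, mode='reflect')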


There are a few examples here: http://docs.gimp.org/en/plug-in-convmatrix.html

Try the "edge detection" example with -3 instead of -4, and it looks very similar to yours.


For the custom kernel, it would be great if there were an option to keep the matrix normalized.
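Normalizing just means dividing by the sum of the entries, so the filter neither brightens nor darkens the image overall - something like:

    import numpy as np

    def normalize(kernel):
        """Scale entries to sum to 1; leave zero-sum kernels
        (e.g. edge detectors) untouched."""
        s = kernel.sum()
        return kernel if np.isclose(s, 0.0) else kernel / s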


and if interested, error-diffusion dithering kernels:

http://www.tannerhelland.com/4660/dithering-eleven-algorithm...


> They're also used in machine learning for 'feature extraction'... In this context the process is referred to more generally as "convolution"

It's referred to as "convolution" in the image processing community too.


Not surprising, because the kernel is convoluted on the pixels

http://en.wikipedia.org/wiki/Convolution

Now, as to what's happening: you're basically applying a FIR filter to each pixel, so that each one depends also on the frequency information of adjacent pixels (in two dimensions).

If someone wants to know more: http://en.wikipedia.org/wiki/Finite_impulse_response http://en.wikipedia.org/wiki/Z-transform


Minor nitpicks - "Convolved" with the pixels. And the FIR filter doesn't depend on the frequency information in the adjacent pixels, but rather on the intensity of the pixels. A short FIR filter must have large frequency support, so the filtering depends on the frequency information given by all pixels.


That's a much more profound explanation than the one given. You can actually come up with values yourself, and it reveals why cases like "edge detection" and "blur" work so nicely (edge detection approximates differentiation; blur acts as a low-pass filter; ...).
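For example, the standard edge kernels fall straight out of finite differences (a sketch; these particular matrices are common textbook choices):

    import numpy as np

    # Central difference f'(x) ~ (f(x+1) - f(x-1)) / 2 as a horizontal kernel
    dx = np.array([[-0.5, 0.0, 0.5]])

    # Second difference f''(x) ~ f(x-1) - 2 f(x) + f(x+1), applied along
    # both axes and summed, gives the Laplacian edge-detection kernel
    laplacian = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]])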


SAR Image Processor was my first exposure to convolution. http://www.general-cathexis.com/


For interest's sake, note that the blur kernel used here is an approximation of the Gaussian [1]. Also, the vImage documentation includes a brief discussion of where the values in these kernels came from [2].

[1] http://en.wikipedia.org/wiki/Gaussian_blur

[2] https://developer.apple.com/library/ios/documentation/Perfor...


The world needs more visual hands-on explanations :) Very nice work


Slightly off-topic, but I really like the effect with the side-by-side images. The HTML tags are <image-as-matrix></image-as-matrix> and <kernel-matrix></kernel-matrix>.

Does anyone know which lib (if any) he's using or how it's done? Read the source, couldn't figure it out.


Angular as the overall app framework, and D3.js for the canvas fun. Take a look at his script.js, it's pretty straightforward.


Can it be done without Angular.js?


AngularJS is just being used to make those custom HTML tags (called "directives" in Angular-speak). It likely has nothing to do with the actual functionality.


Yes


It's interesting to try a directionally-biased blur filter, which produces an out-of-focus effect in one dimension only. For instance, put 0.5 in the left and right cells, or the top and bottom cells, with 0 everywhere else.
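In kernel form (a sketch of the horizontal case; the entries sum to 1, so brightness is preserved):

    import numpy as np
    from scipy.ndimage import convolve

    # Horizontal-only blur: average the left and right neighbours
    h_blur = np.array([[0.0, 0.0, 0.0],
                       [0.5, 0.0, 0.5],
                       [0.0, 0.0, 0.0]])

    out = convolve(np.random.rand(128, 128), h_blur)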


This is my favorite Explained Visually. Don't forget to try the live video!


Thanks! I didn't even notice the live video option. How is that done I wonder? Is it all happening client side in Javascript?


Yep! It was tricky to make it performant, but I mostly just used Chrome's performance profiling tools to pinpoint optimizing-compiler bailouts. A big one was just casting to ints with |0 inside the kernel function.


When you calculate the next pixel, do you use the updated value of the previous pixel or do you use the original?


Aside from making conceptual sense, using the original value has the nice property that there's no dependence between the calculations of separate pixels, so the filtered image can be computed fully in parallel. GPUs are very good at this, which makes them useful not just for graphics rendering but also for other tasks that involve the same mathematical operation, such as deep learning (the first few stages of a convolutional deep neural net are just applications of specially tuned convolution kernels).


The original value. You're computing a new image as a function of the input, not modifying the input in place.
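A naive implementation makes this explicit - the output goes to a separate array, and every output pixel reads only the original input (a sketch):

    import numpy as np

    def convolve_naive(img, k):
        kf = k[::-1, ::-1]                     # flip for true convolution
        kh, kw = kf.shape
        ph, pw = kh // 2, kw // 2
        padded = np.pad(img, ((ph, ph), (pw, pw)), mode='edge')
        out = np.empty_like(img)               # separate output buffer
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                # reads the padded *input* only, so pixels are independent
                out[y, x] = np.sum(padded[y:y+kh, x:x+kw] * kf)
        return out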


Really awesome stuff! I think there's a very minor bug in which missing pixels are treated as black, which is what adds a black border around the output image in the second example.
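For comparison, most convolution libraries let you choose how out-of-bounds pixels are handled; e.g. in SciPy (a sketch):

    import numpy as np
    from scipy.ndimage import uniform_filter

    img = np.ones((8, 8))
    # mode='constant', cval=0 treats missing pixels as black: dark border
    dark = uniform_filter(img, size=3, mode='constant', cval=0.0)
    # mode='reflect' mirrors the edge pixels instead: no artifact
    clean = uniform_filter(img, size=3, mode='reflect')
    assert dark[0, 0] < clean[0, 0]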



