In theory, libpng should be able to use about one and a half cores per image decode. Unfortunately, one of the filter types in the PNG spec (filtering is the per-line prediction step applied before compression) introduces a hazard between consecutive lines of the image: the already-decoded neighbours to the left (x-1), above (y-1), and above-left of a pixel are used to predict its value. If that filter is used, it would take some very interesting math to decode multiple lines in parallel.
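
To make the hazard concrete, here is a rough sketch of undoing the Paeth filter on one line. The predictor follows the definition in the PNG spec, but the function names, the `bpp` (bytes per pixel) parameter, and the assumption that every line uses the Paeth filter are mine for illustration. This is not libpng's actual code or API.

```python
def paeth_predictor(a: int, b: int, c: int) -> int:
    """Pick whichever of left (a), above (b), upper-left (c) is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def unfilter_paeth_row(filtered: bytes, prev_row: bytes, bpp: int) -> bytes:
    """Reconstruct one line; needs the fully reconstructed previous line."""
    out = bytearray(len(filtered))
    for i, value in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0       # left: same line, already decoded
        b = prev_row[i]                           # above: previous line
        c = prev_row[i - bpp] if i >= bpp else 0  # upper-left: previous line
        out[i] = (value + paeth_predictor(a, b, c)) & 0xFF
    return bytes(out)


def unfilter_image(rows: list[bytes], bpp: int) -> list[bytes]:
    """Hypothetical driver: assumes every line was Paeth-filtered."""
    prev = bytes(len(rows[0])) if rows else b""   # the line above line 0 is all zeros
    decoded = []
    for row in rows:
        prev = unfilter_paeth_row(row, prev, bpp)  # serial dependency on prev
        decoded.append(prev)
    return decoded
```

Every byte of line y reads the bytes above and above-left of it in line y-1, so in this naive form line y cannot be reconstructed until the bytes above it exist. That serial chain through `prev` is the hazard.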