Image unshredding using a TSP solver (github.com/robinhouston)
205 points by robinhouston on Oct 9, 2016 | 54 comments


A bit tangentially, this is also a great display of just how good readily available approximate solvers have gotten for a wide range of combinatorial optimization problems (like TSP).

The LKH solver used here has rather impressive performance. From http://webhotel4.ruc.dk/~keld/research/LKH/: "LKH has produced optimal solutions for all solved problems we have been able to obtain," and the studies linked there show that it really does find optimal solutions "with an impressively high frequency."

While the point here is to use an off-the-shelf solver, it can be nice to have visibility into what's actually happening:

- This is a reasonable example for explaining some of the local search TSP heuristics. For instance, a "2-opt" move corresponds to picking a contiguous range of columns and flipping them (see the sketch after this list).

- The previous HN post used simulated annealing, which would not be an outright terrible approach to TSP itself -- were it not for the better Lin-Kernighan-based approaches (like LKH).

- If we want, we can tell LKH to start from Sangaline's "nearest-neighbor" approach ("INITIAL_TOUR_ALGORITHM = NEAREST-NEIGHBOR"). This does not make a difference here though.
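
As a rough illustration of the 2-opt point above, here's a toy sketch (not the repo's code; the function names and cost are made up):

    # Toy sketch of a 2-opt move on a column ordering.
    def two_opt_move(order, i, j):
        # Reverse the contiguous segment order[i:j+1], i.e. flip a range of columns.
        return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

    def path_cost(order, cost):
        # Total dissimilarity along the column ordering (an open TSP path).
        return sum(cost(a, b) for a, b in zip(order, order[1:]))

    cost = lambda a, b: abs(a - b)          # stand-in for the pixel-based dissimilarity
    order = [0, 3, 2, 1, 4, 5]
    candidate = two_opt_move(order, 1, 3)   # flips the segment 3,2,1 -> 1,2,3
    if path_cost(candidate, cost) < path_cost(order, cost):
        order = candidate                   # accept the improving move
    print(order)                            # [0, 1, 2, 3, 4, 5]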


Not so long ago, we’d say “That’s an NP-complete problem” with the implication that it was hopeless to expect an efficient solution. Now (to exaggerate a little) it’s more like a sign that there probably is an efficient solution using some powerful solver.

LKH is a lovely piece of work. I used it a couple of years ago to find a counterexample to an unimportant but fairly long-standing conjecture in combinatorics[0], and the list of scientific applications[1] is impressively diverse.

The best SMT solvers are also hugely impressive. Z3 has an excellent interactive tutorial and online solver.[2]

[0] https://arxiv.org/abs/1408.5108 [1] http://webhotel4.ruc.dk/~keld/research/LKH/ScientificApplica... [2] http://rise4fun.com/z3/tutorial
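
For anyone curious what driving Z3 programmatically looks like, here's a minimal sketch using the Python bindings (my own toy example, not from the tutorial):

    # Ask Z3 for integers satisfying a few constraints.
    from z3 import Ints, Solver, sat

    x, y = Ints('x y')
    s = Solver()
    s.add(x > 2, y < 10, x + 2 * y == 7)
    if s.check() == sat:
        print(s.model())    # one satisfying assignment, e.g. x = 7, y = 0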


Z3 and other constraint solvers are very impressive. The most interesting application I've seen is in binary exploitation and reverse-engineering through symbolic execution.

Angr is a very popular framework which allows for symbolic execution of binaries using Z3 as a backend. It's made a whole class of common CTF challenges almost trivial:

https://docs.angr.io/docs/examples.html
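
A typical CTF-style use looks roughly like this (a sketch from memory; the binary name and addresses are placeholders):

    # Find an stdin that drives ./challenge to the "win" address.
    import angr

    proj = angr.Project("./challenge", auto_load_libs=False)
    simgr = proj.factory.simulation_manager(proj.factory.entry_state())
    simgr.explore(find=0x400abc, avoid=0x400bcd)     # placeholder addresses
    if simgr.found:
        print(simgr.found[0].posix.dumps(0))         # input that reaches the target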


Absolutely!

Not to mention the often impressive performance of "general" MIP solvers. It's just a shame that the best ones are commercial (Gurobi, followed by Cplex). That said, Cbc is lovely in a wide range of cases, and it's open source.
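
If anyone wants to try Cbc without touching its C++ API, a toy 0/1 knapsack via the PuLP modelling library (which ships with a Cbc binary) looks something like this sketch:

    from pulp import LpProblem, LpVariable, LpMaximize, value

    prob = LpProblem("toy_knapsack", LpMaximize)
    x = LpVariable("x", cat="Binary")
    y = LpVariable("y", cat="Binary")
    z = LpVariable("z", cat="Binary")
    prob += 10 * x + 6 * y + 4 * z          # objective: total value
    prob += 5 * x + 4 * y + 3 * z <= 8      # constraint: weight limit
    prob.solve()                            # uses PuLP's bundled Cbc by default
    print(value(x), value(y), value(z))     # expect x = 1, y = 0, z = 1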


I always thought the "NP-complete/NP-hard" label was only meaningful in theoretical work. In practice, it doesn't offer much guidance on what or how a real-world problem should be solved.


I don't know. I find it useful in practice. Apart from anything else it means there's (probably) no point looking for an exact polynomial-time algorithm, which can save a lot of wasted effort. But it also gives useful clues about what sort of algorithmic approaches might be useful, and allows you to relate your problem to a huge amount of existing research.

It's really basic taxonomy: what sort of problem are we dealing with? If you don't know that, how could you even begin to design a reasonable algorithm for it?


Maybe you are heavily into huge algorithmic work. In practice, most algorithms are polynomial-time, and we are looking for sub-polynomial alternatives, e.g. O(log n). For exponential-time problems, we just look for an approximate alternative.


How is it determined that optimal solutions have been produced? Does an exact solver have to find the same solution, or is there a more analytic / less computationally-intensive method?


I'm really surprised that double-shuffling can actually be solved. https://github.com/robinhouston/image-unshredding/#double-sh... It looks very unintuitive.


Think about it like this: for a 100x100 image, there are 100!*100! ways to double-shuffle. But there are 10000! ways to shuffle the pixels. So a lot of structure remains in the double-shuffled image.


In case your point is not obvious to someone, 10000! is FAR larger than 100!*100! (by roughly 35,000 orders of magnitude), so the possibility space is vastly bigger if you shuffle the pixels individually than if you shuffle columns first, then rows.
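
A quick way to check the scale, using log-gamma (log10 of n! is lgamma(n+1)/ln 10):

    from math import lgamma, log

    log10_fact = lambda n: lgamma(n + 1) / log(10)
    print(log10_fact(10000))                         # ~35659, i.e. 10000! has about 35,660 digits
    print(2 * log10_fact(100))                       # ~316 for 100! * 100!
    print(log10_fact(10000) - 2 * log10_fact(100))   # ~35,343 orders of magnitude apart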


If you think about it, the row-comparing algorithm basically boils down to a sum of differences. Since summation is independent of the order of its terms, it can be done in arbitrary order, and thus the (orthogonal) shuffling makes no difference.


I wouldn't be surprised if a complete shuffling could be solved — i.e., every pixel is given a new random location. (Source: I do research involving stuff like this.)


How could that be possible? Wouldn't that literally throw away everything but the distribution of colors in the image?


An error function against neighboring pixels alone probably would not be enough to reconstruct the image (there are information theoretic limitations on how much noise can exist for a certain model before ground truth recovery becomes impossible); however, depending on the type of image, additional information could be gleaned from images that fall into the same category.

For instance, if I assign each pixel a random color and call that my "image", and then scramble all of the pixels, it is easy to see that recovery of the original image is mathematically impossible. However, if I use a photograph of a person standing in front of a car, information gleaned from similar types of photos could be sufficient to allow complete recovery given only the exact distribution of pixel colors. I don't know whether this is true, but I wouldn't be surprised if it is. (Actually, this seems like a fun hobby project. I may give it a shot.)


I can see making a heuristic assessment that says "based on the blues and greens there is probably a large expanse of sky and some trees in the original image" but getting from there to where the trees are and putting the leaves back on the trees in the right places, I can't see that being even mathematically possible. It's a million-piece jigsaw puzzle where each piece is square and monochromatic.

If enhancing a low-resolution image to regain lost information is basically impossible, getting back a picture from a color histogram would surely be much more impossible.


You could constrain it so the pixels are only shuffled within a set distance of their origin. Still sounds very hard to solve, but would leave a lot more information.


Pretty much, but if you have some good knowledge of what the distributions of neighboring pixels must look like (e.g. the l1 norm error used here wouldn't be enough, I believe) then I think you'd be surprised at the outcome.


Isn't the definition of insanity something about doing the same thing and expecting different results?


Funny, I thought more people would have gotten what I was saying here.


You're forgetting HN is serious discussion for serious people or something like that. Humour in a substantive post seems fine, but usually any humour on its own doesn't fare well here.


No.


The 2D shuffling result is a bit surprising at first glance, but less so when you think about it - row comparison doesn't care about the order of the pixels in each row as long as the order is the same in both, and the shuffling mechanism guarantees that.


The important bit is you have to unscramble in reverse order of the scrambling. Unscrambling the columns and then unscrambling the rows would lead to smooth gibberish.


Interestingly enough, that isn’t true. The operations of row-shuffling and column-shuffling commute, so you can do them in either order.

You can try it yourself, if you’ve cloned the repo. Here are the commands to reconstruct first by columns and then by rows:

  git checkout double-shuffling
  make images/double_shuffled/blue-hour-paris.png bin/compute_scores
  mkdir tmp
  
  bin/compute_scores --cols images/double_shuffled/blue-hour-paris.png > tmp/cols.tsp
  bin/lkh.sh tmp/cols.tsp tmp/cols.tour
  bin/reconstruct_image.py --cols tmp/cols.tour images/double_shuffled/blue-hour-paris.png > tmp/cols-unshuffled.png
  
  bin/compute_scores --rows tmp/cols-unshuffled.png > tmp/rows.tsp
  bin/lkh.sh tmp/rows.tsp tmp/rows.tour
  bin/reconstruct_image.py --rows tmp/rows.tour tmp/cols-unshuffled.png > tmp/reverse-unshuffled.png


No. Column scrambling doesn't affect row ordering and row scrambling doesn't affect column ordering. They are literally orthogonal. You can't tell, from a scrambled matrix, whether the rows or the columns were scrambled first.

Here's a more explicit breakdown:

    (0,0) (1,0) (2,0)
    (0,1) (1,1) (2,1)
    (0,2) (1,2) (2,2)
Column scrambling: swap 1&2:

    (0,0) (2,0) (1,0)
    (0,1) (2,1) (1,1)
    (0,2) (2,2) (1,2)
Now row scrambling: swap 0&1:

    (0,1) (2,1) (1,1)
    (0,0) (2,0) (1,0)
    (0,2) (2,2) (1,2)
Notice how the x coordinates still all match along the vertical, and y coordinates still all match along the horizontal.

Let's try it in the other order, row first:

    (0,1) (1,1) (2,1)
    (0,0) (1,0) (2,0)
    (0,2) (1,2) (2,2)
And columns:

    (0,1) (2,1) (1,1)
    (0,0) (2,0) (1,0)
    (0,2) (2,2) (1,2)
It's the same result irrespective of the order.
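
The same check in numpy, if anyone wants to convince themselves on bigger matrices:

    import numpy as np

    a = np.arange(25).reshape(5, 5)
    rows = np.random.permutation(5)
    cols = np.random.permutation(5)

    rows_then_cols = a[rows, :][:, cols]    # shuffle rows first, then columns
    cols_then_rows = a[:, cols][rows, :]    # shuffle columns first, then rows
    print(np.array_equal(rows_then_cols, cols_then_rows))   # always True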


Another way to think of it is that permuting rows corresponds to multiplying a matrix on the left by a permutation matrix, and permuting columns, on the right. Since matrix multiplication is associative, they commute!


Yet another way to think of it is that a matrix is an encoding of a linear (or affine) function, matrix multiplication is composition of such functions, and since composition is associative, they commute. :)


Thanks! My mental model was incorrect.


Shuffling lines was the scrambling algorithm used in Nagravision a while back... https://www.youtube.com/watch?v=gFAinn6sddQ There was software that could descramble the video using a similar technique.


Now I'd like to see this run on a more realistically shredded image. A real paper shredder creates strips that are more than one pixel thick, are not straight on (i.e. the pixels don't necessarily align with the cuts), and possibly have cutting defects such as ragged edges or nicks.


Seems like with a high enough resolution image, some of these effects may actually make the problem easier.


Perhaps, but you would have to look at more than just neighboring pixel color to take advantage.


The big difference is that this actually preserves a lot of structure, due to being oriented to columns (and/or rows): a real shredder produces chads without any of that, and would presumably be a much harder problem to solve.


Sounds like the DARPA Shredder Challenge from 2011: https://en.wikipedia.org/wiki/DARPA_Shredder_Challenge_2011


Reminds me of Nagravision decoders (if I recall correctly), one of the first pay-TV protection algorithms.

Edit: https://en.m.wikipedia.org/wiki/Nagravision


Could this be used to compress images? Perhaps instead of shuffling the columns and rows randomly, they could be ordered in ways that are better suited for compression.


That's what the Burrows–Wheeler transform does for text. A similar approach could work for images.


It's a good point. PNG uses DEFLATE internally for the compression; it would be interesting to see what would happen if you used bzip2 instead, which uses the Burrows-Wheeler transform.
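
A crude way to try it (a sketch; assumes Pillow is installed and some photo.png exists) is to pull out the raw pixel data and compare the two compressors directly. Note that PNG also applies per-row filters before DEFLATE, so this isn't a completely fair comparison:

    import bz2, zlib
    from PIL import Image

    raw = Image.open("photo.png").convert("RGB").tobytes()   # raw interleaved RGB bytes
    print("raw bytes:", len(raw))
    print("zlib     :", len(zlib.compress(raw, 9)))          # DEFLATE, as in PNG (minus filtering)
    print("bz2      :", len(bz2.compress(raw, 9)))           # Burrows-Wheeler based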


To me the whole experiment suggests otherwise:

it shows that naturally occurring images are organized by having similar rows and columns close to each other, which is - by default - good for most compression algorithms that prefer regularities/similarities to be as local as possible.

Edit: but it can be good for other cases, where there is no good-enough default order. For example, a "customer - product bought" matrix (a binary matrix with customer rows and product columns: customer X bought product Y = 1, otherwise = 0).

Here the order of rows/columns is not predefined, and by ordering similar customers and products next to each other, the compression of the matrix could be improved. Finding these similarities can also be a good starting point for a recommendation/collaborative-filtering engine.


IIRC the HDF5 data storage format compresses per-bit (e.g. all LSBs together) instead of per pixel when compressing image data, and tests on our data sets have shown this significantly improves on ZIP compression, for instance.
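
The regrouping itself is easy to play with in numpy (a sketch of the idea, not of how HDF5 implements it); whether it helps depends heavily on the data:

    import zlib
    import numpy as np

    data = (np.arange(4096) % 256).astype(np.uint8)        # stand-in for smooth sensor data
    bits = np.unpackbits(data.reshape(-1, 1), axis=1)      # shape (4096, 8): one column per bit
    planes = np.packbits(bits.T)                           # regroup: all MSBs first, ..., all LSBs last

    print(len(zlib.compress(data.tobytes(), 9)))           # byte-ordered
    print(len(zlib.compress(planes.tobytes(), 9)))         # bit-plane-ordered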


Roughly how long does this algorithm take to run?


On my laptop (real times for a single run):

1.6 seconds to make the TSPLIB2 file representing the dissimilarity graph. (time make tsp/instances/blue-hour-paris.tsp)

0.3 seconds to run the solver and produce the optimal tour. (time make tsp/tours/blue-hour-paris.tour)

0.4 seconds to reconstruct the image from the shuffled version using the tour. (time make images/reconstructed/blue-hour-paris.png)


I wonder how well this would work with some image pre-processing. Wouldn't something like supersampling improve the speed per-pixel? Of course, it would require more pixels to be sorted, so I doubt it's any more optimal.


This is super interesting; how does it compare to other sampling algorithms? While the problem is indeed reducible, it also has a lot more structure than TSP, especially if we note that natural images are forced to come from some nice, underlying manifold which is probably very well-behaved.

I'd be inclined to believe that with some assumptions on the problem itself (e.g. the images satisfy some smoothness assumptions and of course, assumptions on the objective, itself), we can always come up with a PTAS for this particular instance for any accuracy, ε>0.


I guess this could be used to reassemble images corrupted in other ways (other than shuffling).

I have a disk image of jpegs and other data I'd like recovered. It was Windows formatted a few different ways but any blocks that format didn't touch are still there.

There's other data there of course as well, so there would be a lot of blocks that aren't part of images and would need to be ignored when reassembling the images.


I'd guess not, actually. In this case, a lot of the information is there, just in a different order. Corrupting data is a different problem.

That said, I'm sure I could be surprised by what is possible.


I guess the difference with the simulated annealing approach would be that this approach (and the simple picking-the-most-similar-column approach) assumes you know the particular shredding method in advance, whereas simulated annealing does not have this requirement. For example, does this still work for shredding diagonally at some unknown angle?


How well does it work with non-natural images where rows/columns are not similar to each other? White noise, for example, should prove rather difficult.


Do you know of any image compression technique that would drop information that we expect to be redeemable in a reasonable time using a solver of this sort?


Since the shuffling adds information, it seems unlikely.


Don't you think that for any image there exists a shuffled version that compresses better?


I don't see any reason why there necessarily would be. Generally speaking, compression algorithms do better when adjacent pixels have similar colors, which is the unshuffled state.

In cases where there is a version that compresses better, unshuffling might not work. If you had a "pinstripe" image consisting of alternating black and white columns of pixels, obviously it could compress better after being "shuffled" with all the similarly colored columns together. But then this algorithm would be unable to get the original image back.


Looks like it's time to go back to burning.



