HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Reconstructing a screen of permuted pixels

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
reconstructingscreenpermutedpixels

Problem

Reconstructing a screen of permuted pixels

Summary

Given a video with the pixel locations randomly permuted (once, for the entire video), can we (efficiently) reconstruct the original picture?

Let:

  • $n=w\times h$ pixels.



  • $m=\text{number of frames}$



Details

  • You are given an LCD screen with $n = w \times h$ pixels, that has been altered so that the pixel locations are permuted, once, in some random manner.



  • Watching a video on this LCD screen would obviously result in quite a garbled picture.



  • The permutation is mapped by some unknown function, we shall call $\text{permute}\left(x_0,y_0\right) \implies (x,y)$



  • Each pixel on the screen has some permuted coordinate $(x,y)$ and some unknown original coordinate $(x_0,y_0)$ (such that $\text{permute}\left(x_0,y_0\right)=(x,y)$).



  • You (and your computer) are given to watch a particular video of an average film (movie) of average length, with $m$ frames, and having the same resolution as the LCD screen; however for simplicity, assume the video will be in grayscale (for example, each pixel is a single color in the range $[0,256)$).



  • You do not know which film is being played.



  • You (and your computer) can rewatch the garbled film as many times as you like.



  • As a frame of reference, you are also informed which pixel is the lowest location in the original non-permuted screen (i.e you are given the values $x$ and $y$, such that $(x_0,y_0) =(0,0)$, and $\text{permute}\left(0,0\right)=(x,y)$).



Goal

  • Algorithm to (efficiently) compute the mapping of the pixels to their original locations.



  • Can this be done in $o\left(n^2\right)$ space (i.e in better than $\mathcal O\left(n^2\right)$ space?)



  • Can this be done in $o\left(n^2f(m)\right)$ time?



  • I'm not sure how to ask this question correctly, but I am mostly wondering about reducing the $n^2$ factor.



  • Is this a known problem, or similar to any known problems?



PS. What about same question but for the case of binary images for each frame instead of

Solution

One approach is to exploit fact that if two pixels are adjacent, then their intensities will be highly correlated: the intensity of the first pixel (as a time series) will be highly correlated to the intensity of the second pixel (as a time series).

In other words, let $I(p,t)$ denote the intensity of pixel $p$ at time $t$. Then you can compute a correlation coefficient between $p,q$ as

$$C(p,q) = \sum_t I(p,t) I(q,t).$$

(Detail: you probably actually want to use the normalized correlation coefficient, as explained at bottom.)

For each pair of pixels $p,q$, compute $C(p,q)$, the correlation between these two pixels.

Next, for each pixel $p$, you could identify the four other pixels $p_1,\dots,p_4$ that have the highest correlation to $p$ (i.e., that maximize $C(p,p_i)$). A reasonable heuristic would be to assume that those are the four neighbors of $p$. Do this for every pixel.

Now you have a pretty good guess at which pairs of pixels are adjacent. At this point, you several options. One approach would be to pick a random pixel, then start trying to expand outward from there: find its four neighbors, then find their neighbors, and so on.

At each step you will need to decide how to orient the neighbors. Here a plausible heuristic might be: the distance between two pixels $p,q$ in the original image is likely to be related to the correlation $C(p,q)$: larger $C(p,q)$ is associated with pixels that are further apart. Thus, given a pixel $p$ and its four neighbors $p_1,p_2,p_3,p_4$, I expect you can tell which pairs of those neighbors are on opposite sides of $p$ vs which pairs are kitty-corner to each other (as opposite pixels are at distance 2 apart while kitty-corner pixels are at distance 2). You probably won't be able to tell which is up and which is right, so just make an arbitrary guess; later you can have a human tell you whether the picture appears to be reflected.

So now suppose that you've mapped some pixels to their original locations, and you want to add one more to the set. You can get at most three candidates for what it might be, then use the distance heuristic to form a guess about which of the three this is.

Continue this process until you've mapped out a guess at how to reconstruct a large patch of the original image. Now have a human look at the unpermuted video (unscrambled using your guess at the mapping for those pixels). If you're lucky, most of that will be correct. A human should be able to tell you whether this looks correct, and identify the region that appears to be correct.

So, you have a process that given a starting pixel, finds a region surrounding it that appears to have been correctly unscrambled. Repeat from multiple starting pixels until you have a large collection of such regions. Now fitting those regions together is basically just a jigsaw puzzle, so should be solvable.

I would expect that the longer the image is, the more effective this will be -- the fewer mistakes this approach will make, and the larger these regions will be and the easier the jigsaw reconstruction process will be.

But the only way to find out for sure is to test this. Ultimately, this is probably not a question that can be answered by proving a theorem, as the effectiveness of the technique depends on empirical facts about the properties of typical videos.

Footnote: In computer vision applications, it is usually more effective to use the normalized correlation coefficient (NCC). So, I recommend that you use NCC for this. The NCC is essentially

$$C(p,q) = \sum_t {(I(p,t) - \mu(p)) (I(q,t) - \mu(q) \over \sigma(p) \sigma(q)},$$

where

$$\begin{align*}
\mu(p) &= {1 \over m} \sum_t I(p,t)\\
\sigma(p) &= {1 \over m-1} \sum_t (I(p,t) - \mu(p))^2
\end{align*}$$

What's the running time of this approach? The most expensive part is to compute the correlation between every pair of pixels. Naively, this takes $O(n^2 m)$ time.

However, if computation time is at a premium, it might be possible to speed that up using locality-sensitive hashing or K-D trees to find pairs of pixels for which $C(p,q)$ is larger than some threshold.

If you're looking for research literature on this problem, I would look into video descrambling. Some video scrambling methods (e.g., for copy protection or cheap video "encryption") permute the pixels, basically along the lines you outlined. I think there might be work in the research literature on descrambling such videos; a literature search might turn up something useful.

Context

StackExchange Computer Science Q#55896, answer score: 3

Revisions (0)

No revisions yet.