Register images containing objects with varying distance to the cameras
Problem
I'm learning about using image registration to align two images, e.g., two photographs of the same scene taken from slightly different locations. What factors influence how well we can align the images? Does the distance to the objects in the image influence the effectiveness of the alignment, and if so, how? What about the amount of variation in the distances to the objects in the images: how does that influence the effectiveness of the alignment?
Background: I came across the notion of "global image registration" when reading the paper "Mutual information based registration of multimodal stereo videos for person tracking", by Stephen J. Krotosky and Mohan M. Trivedi, in Computer Vision and Image Understanding, 2007.
I believe "global image registration" means to use one transformation function for the registration. The authors state that global image registration is accurate when all objects of interest are in the same plane. I am trying to figure out exactly what they mean.
At first, I assumed that the authors meant objects across different frames (which would then need to be at a similar distance from the camera across the frames), where the previously calculated registration is supposed to work for each frame.
Then I thought that they might mean the registration of a single image pair, and that the objects of interest must be at a similar distance from the camera.
To check this, I registered two pairs of images. The first pair has a lot of depth and the second pair does not, i.e., the first pair contains objects at varying distances from the cameras. The second pair is registered much better than the first. This suggests that my second interpretation is correct: image registration based on a single transformation function cannot register an image pair well if there is a lot of variation in the distances to the objects in the image. Is this correct?
First image pair:
First image pair registered:
Second image pair:
Second image pair registered:
Solution
Yes, if there is a lot of variability in the distance to the objects in the scene, then global registration will work poorly.
Let me back up and give some background on how global image registration works. It tries to align the two images. In its simplest form, the only transformation that is allowed is translation: it tries to find a way to shift all the points in the left image by a constant offset, to make it match the right image as closely as possible. In other words, it looks for a translation $T$ such that $T(I_1)$ matches $I_2$ as closely as possible, where $I_1,I_2$ are the two images.
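To make the translation-only case concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: it searches integer shifts by brute force, scores a match with mean squared error (the paper you cite uses mutual information instead), and uses `np.roll`, which wraps pixels around the border. The function name `register_translation` and the parameter `max_shift` are made up for this example.

```python
import numpy as np

def register_translation(img1, img2, max_shift=5):
    """Brute-force search for the integer shift (dy, dx) that, applied to
    img1, best matches img2 under mean squared error."""
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around the border; a simplification for this sketch.
            shifted = np.roll(img1, shift=(dy, dx), axis=(0, 1))
            err = np.mean((shifted - img2) ** 2)
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift, best_err

# Synthetic test: img2 is img1 shifted by a single constant offset.
rng = np.random.default_rng(0)
img1 = rng.random((32, 32))
img2 = np.roll(img1, shift=(2, -3), axis=(0, 1))

shift, err = register_translation(img1, img2)
print(shift, err)
```

Because the second image here really is a pure shift of the first, one translation explains every pixel and the search recovers it with zero error. The rest of this answer is about what happens when that assumption fails.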
With this background, you can see why global image registration won't work well for scenes with objects at multiple distances, when the two pictures are taken from different viewpoints. Imagine taking one picture, then moving left one mile, then taking another picture. Objects that are far away won't shift their location in the image very much, but objects that are close will shift to the right a lot in the second image. Global image registration tries to find a constant translation that makes everything match up in the two images, but there is no single translation that will work well. To make the distant objects match up, you'd need a small translation; to make the nearby objects match up, you'd need a large translation; and there is no way to reconcile these differences.
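A rough way to quantify this, assuming an idealized pinhole camera that moves sideways without rotating: a point at depth $Z$ shifts in the image by about $f \cdot B / Z$, where $f$ is the focal length in pixels and $B$ is the distance the camera moved. The numbers below (focal length, baseline, depths) are invented purely for illustration.

```python
def image_shift(focal_px, baseline_m, depth_m):
    """Horizontal image shift (in pixels) of a point at the given depth when
    an idealized pinhole camera translates sideways by baseline_m."""
    return focal_px * baseline_m / depth_m

focal_px = 1000.0   # assumed focal length in pixels
baseline_m = 0.5    # assumed sideways camera movement in metres

near_shift = image_shift(focal_px, baseline_m, depth_m=2.0)
far_shift = image_shift(focal_px, baseline_m, depth_m=50.0)

# The single translation that minimizes squared error over these two
# points is their mean, which fits neither of them.
best_translation = (near_shift + far_shift) / 2
residual_near = abs(near_shift - best_translation)
residual_far = abs(far_shift - best_translation)

print(near_shift, far_shift, best_translation, residual_near, residual_far)
```

With these made-up values the near object shifts by 250 pixels and the far one by 10, so the best compromise translation of 130 pixels leaves both misaligned by 120 pixels. If both objects were at the same depth, the two shifts would be equal and the residual would vanish.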
This is the major problem you're seeing. There are other problems as well, due to the fact that the image is only a 2D representation of a 3D scene. For instance, if you take two photographs from two different positions, then not only will the location of the objects in the images shift, but you'll also be viewing them from two different angles. Global image registration can't undo that, because it can't see around corners. Look at the two images of the statue. Imagine that the statue had a little bit of bird dropping on the left side of the person's nose (our left; the person's right). Then the bird dropping might be visible in the second photograph, which is taken straight-on, but might not be visible in the first photograph, which is taken from an angle. Alignment can't fix this, as it can't magically make up for things that aren't visible in the photograph.
This answer talks about global image registration based purely on translation. In general, global image registration allows a broader set of transformations: a typical example might be something that allows translation, scaling (zoom in or out), and rotation; or translation, scaling, and skew; or an arbitrary linear/affine transformation; or an arbitrary homography / projective transformation. A homography can exactly align points that lie on a single plane in the scene, which is presumably why the authors say that global registration is accurate when all objects of interest are in the same plane. Unfortunately, none of these transformations really help with the problems outlined above when the scene has significant depth variation. No single global transformation of this kind will eliminate the problems I've sketched in this answer, so the conclusion is pretty general. In principle, non-rigid (locally varying) transformations might be more effective at handling variation in depth; I don't have any experience with that form of image registration.
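To see how the "same plane" condition interacts with the most general of these global transformations, here is a hedged sketch with synthetic data. It assumes idealized pinhole cameras in normalized image coordinates, a camera that moves sideways without rotating, and a plain (unnormalized) direct linear transform to fit the homography. All helper names (`project`, `fit_homography`, `apply_homography`) and all numbers are invented for this example.

```python
import numpy as np

def project(points_3d, cam_x, focal=1.0):
    """Pinhole projection for a camera at (cam_x, 0, 0) looking along +Z."""
    X = points_3d[:, 0] - cam_x
    Y = points_3d[:, 1]
    Z = points_3d[:, 2]
    return np.stack([focal * X / Z, focal * Y / Z], axis=1)

def fit_homography(src, dst):
    """Direct linear transform: find H such that dst ~ H @ src in
    homogeneous coordinates (H is only defined up to scale)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value

def apply_homography(H, pts):
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(20, 2))
# 20 points on a tilted plane Z = 5 + 0.5 * X, plus one point well in front of it.
plane_pts = np.column_stack([xy[:, 0], xy[:, 1], 5.0 + 0.5 * xy[:, 0]])
off_plane_pt = np.array([[0.0, 0.0, 2.0]])

cam1_x, cam2_x = 0.0, -0.5  # the second camera is moved to the left
H = fit_homography(project(plane_pts, cam1_x), project(plane_pts, cam2_x))

plane_err = np.abs(apply_homography(H, project(plane_pts, cam1_x))
                   - project(plane_pts, cam2_x)).max()
off_err = np.abs(apply_homography(H, project(off_plane_pt, cam1_x))
                 - project(off_plane_pt, cam2_x)).max()
print(plane_err, off_err)
```

In this setup the points on the plane are aligned up to numerical noise, while the single off-plane point is misplaced by roughly 0.15 in normalized image units. That is the same parallax effect as before: one global transformation can absorb one plane, but not a range of depths.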
Context
StackExchange Computer Science Q#47234, answer score: 3