To warp images, we need to find a 3x3 matrix H with 8 degrees of freedom. For each corresponding pair of points (x, y) and (x', y'), we can represent them in homogeneous coordinates and get the following:
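In homogeneous coordinates, with the scale fixed by setting the bottom-right entry of H to 1 (the standard formulation for 8 degrees of freedom):

$$
w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
$$

Dividing out the scale $w = h_7 x + h_8 y + 1$ yields two equations, linear in the eight unknowns, per correspondence:

$$
\begin{aligned}
h_1 x + h_2 y + h_3 - h_7 x x' - h_8 y x' &= x' \\
h_4 x + h_5 y + h_6 - h_7 x y' - h_8 y y' &= y'
\end{aligned}
$$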
To recover the homography, we need at least four corresponding points between the two images. Each correspondence provides two equations, so four correspondences give a system of 8 equations, which is sufficient to solve for the 8 unknowns in H. However, because such a system is sensitive to noise, using more than four correspondences is recommended. This creates an overdetermined system that can be solved with least squares.
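A minimal sketch of this setup in NumPy (the function name computeH is my choice; it stacks the two equations above per correspondence and solves the overdetermined system with least squares):

```python
import numpy as np

def computeH(pts1, pts2):
    """Estimate the 3x3 homography mapping pts1 -> pts2 (Nx2 arrays, N >= 4)
    by stacking two linear equations per correspondence and solving with
    least squares."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # fix the bottom-right entry to 1
```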
With the homography matrix H recovered, we can now warp an image to align it with a reference image. This is achieved using inverse warping, which ensures that every pixel in the output image receives a valid value by mapping its position back to the input image.
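A sketch of inverse warping along these lines, assuming a single-channel image and bilinear sampling via SciPy (the name warpImage matches the function mentioned later; the body is my reconstruction, not the exact implementation):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warpImage(img, H, out_shape):
    """Inverse-warp a grayscale image: for every output pixel, map its
    coordinates back through H^-1 and bilinearly sample the input."""
    H_inv = np.linalg.inv(H)
    rows, cols = np.indices(out_shape)                          # output pixel grid
    ones = np.ones_like(rows)
    pts = np.stack([cols.ravel(), rows.ravel(), ones.ravel()])  # homogeneous (x, y, 1)
    src = H_inv @ pts
    src = src / src[2]                                          # divide out the scale w
    # map_coordinates expects (row, col) order; pixels mapping outside get 0
    return map_coordinates(img, [src[1], src[0]], order=1, cval=0.0).reshape(out_shape)
```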
Image rectification is a process that uses a homography to adjust an image such that a selected object appears rectangular, correcting for perspective distortion. This is particularly useful for scenes containing paintings, posters, or tiles, where known geometric shapes can serve as reference objects.
To perform rectification, we use a single input image and define correspondences between points in the image and a desired rectangular shape. For example, if the photo contains a poster, we can manually select its four corners with a mouse-click interface and store their coordinates, then define the corresponding target points as the corners of a square.
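For instance, with hypothetical corner coordinates and the computeH/warpImage sketches from above:

```python
import numpy as np

# `photo`: the loaded input image (H x W grayscale array).
# Four clicked corners of the poster, as (x, y), in a consistent order:
corners = np.array([[120, 85], [430, 60], [455, 510], [100, 540]], dtype=float)
# Target: an axis-aligned square of side 400
square = np.array([[0, 0], [400, 0], [400, 400], [0, 400]], dtype=float)

H = computeH(corners, square)
rectified = warpImage(photo, H, (400, 400))
```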
To create the panorama from multiple images, I followed the procedure below.
The size of the output mosaic was determined by transforming the corners of the input images using the computed homography. This made it possible to predict the bounding box that would contain both images. A translation matrix was then applied so that both images fit into the shared coordinate space of the output canvas.
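A sketch of that bookkeeping (image 1 is the one being warped by H; the function name and exact layout are my assumptions):

```python
import numpy as np

def mosaic_canvas(H, shape1, shape2):
    """Warp image 1's corners by H, combine with image 2's corners, and
    return the translation into positive coordinates plus the canvas size."""
    h1, w1 = shape1[:2]
    h2, w2 = shape2[:2]
    corners1 = np.array([[0, 0, 1], [w1, 0, 1], [w1, h1, 1], [0, h1, 1]], dtype=float).T
    warped = H @ corners1
    warped = warped / warped[2]
    xs = np.concatenate([warped[0], [0, w2]])   # include image 2's extent
    ys = np.concatenate([warped[1], [0, h2]])
    tx, ty = -min(xs.min(), 0), -min(ys.min(), 0)
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)  # translation matrix
    out_shape = (int(np.ceil(ys.max() + ty)), int(np.ceil(xs.max() + tx)))
    return T, out_shape
```

Image 1 is then warped with T @ H and image 2 with T alone, so both land on the shared canvas.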
This approach ensures that each pixel in the output canvas receives a value by mapping back to the original images, avoiding holes and gaps. The images were warped using the warpImage() function from the previous part.
To hide the seam, I applied blending using the Laplacian stack I implemented in Project 2. I got the best results with a 3-level Laplacian stack, sketched below.
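A condensed sketch of that blend for single-channel canvases (SciPy Gaussian blurs stand in for the Project 2 stack; the seam mask and sigma are my assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend(img1, img2, mask, levels=3, sigma=2.0):
    """Blend two aligned images with a Laplacian stack: each detail band is
    combined under a progressively blurred copy of the seam mask."""
    out = np.zeros_like(img1, dtype=float)
    g1, g2, gm = img1.astype(float), img2.astype(float), mask.astype(float)
    for lvl in range(levels):
        if lvl < levels - 1:
            b1, b2 = gaussian_filter(g1, sigma), gaussian_filter(g2, sigma)
            out += gm * (g1 - b1) + (1 - gm) * (g2 - b2)  # band-pass detail
            g1, g2 = b1, b2
            gm = gaussian_filter(gm, sigma)   # soften the mask at coarser scales
        else:
            out += gm * g1 + (1 - gm) * g2    # final low-pass residual
    return out
```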
My approach is not ideal, as there are some artifacts at the bottom of the image, but in general I am satisfied with the result. Here are some more examples.
In Part A I manually selected corresponding points. In this part I implemented automatic detection of such points by replicating the paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. The main steps implemented are the following.
I used the Harris detector. The Harris feature detector is an algorithm used in computer vision to identify corners, or interest points, in an image. It computes a corner response function based on local variations of image intensity, identifying points where there is significant change in multiple directions, which makes them well suited for tracking or matching between images. I limited the number of points to 10,000.
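Roughly what the detection step looks like with scikit-image (the min_distance value is an assumption; the 10,000 cap matches the text):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def harris_points(gray, num_points=10_000):
    """Compute the Harris corner-response map and keep the strongest
    local maxima, capped at num_points."""
    response = corner_harris(gray)  # corner strength per pixel
    coords = peak_local_max(response, min_distance=2, num_peaks=num_points)
    return coords, response[coords[:, 0], coords[:, 1]]  # (row, col) and strengths
```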
Using Adaptive Non-Maximal Suppression (ANMS) allows for reducing the number of detected corners while ensuring that the points are evenly distributed across the image. This approach is particularly useful in maintaining spatial coverage. After experimentation, I decided to retain 1000 points as a final selection, striking a balance between computational efficiency and sufficient feature density. Below is the formula applied, as derived from the original paper, to achieve this selection and spacing of feature points.
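For each point $\mathbf{x}_i$ with Harris strength $f(\mathbf{x}_i)$, the suppression radius is

$$
r_i = \min_j \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert
\quad \text{s.t.} \quad
f(\mathbf{x}_i) < c_{\text{robust}} \, f(\mathbf{x}_j),
$$

with $c_{\text{robust}} = 0.9$ as in the paper. Points are then ranked by $r_i$, and the 1000 with the largest radii are kept.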
To effectively match feature points across images, it's essential to capture more information than a single pixel. Therefore, feature descriptors are used to encapsulate the local image characteristics around each keypoint, ensuring consistency across different views of the same scene. In this approach, a 40x40 pixel patch is extracted around each detected feature point and then downsampled to an 8x8 grayscale patch. These patches are normalized and demeaned to enhance robustness against lighting variations.
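A sketch of the descriptor extraction (downsampling by striding; in practice a blur before subsampling, or sampling from a coarser pyramid level as in the paper, avoids aliasing):

```python
import numpy as np

def extract_descriptors(gray, coords, patch=40, out=8):
    """For each keypoint, cut a 40x40 window, downsample it to 8x8 by taking
    every 5th pixel, then demean and normalize the result."""
    half, step = patch // 2, patch // out
    descs, kept = [], []
    for r, c in coords:
        if half <= r < gray.shape[0] - half and half <= c < gray.shape[1] - half:
            win = gray[r - half:r + half, c - half:c + half]
            d = win[::step, ::step].astype(float)   # 8x8 subsample
            d = (d - d.mean()) / (d.std() + 1e-8)   # robustness to bias/gain changes
            descs.append(d.ravel())
            kept.append((r, c))
    return np.array(descs), np.array(kept)
```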
After extracting feature descriptors, I matched keypoints between image pairs using the Sum of Squared Differences (SSD) as the similarity measure. For each descriptor in the first image, SSD was calculated against all descriptors in the second image to find the closest matches. To ensure reliability, Lowe's ratio test was applied: the ratio of the smallest SSD to the second smallest SSD was computed, and only matches with a ratio below a set threshold (0.65) were retained. This approach filters out ambiguous matches, retaining only those that are distinctly better than the alternatives.
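A sketch of the matching step with the 0.65 ratio threshold from the text:

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.65):
    """Match descriptors by SSD with Lowe's ratio test: keep a pair only if
    its best match is distinctly better than the second best."""
    # Pairwise SSD between every descriptor in d1 and every one in d2
    ssd = ((d1[:, None, :] - d2[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(ssd):
        nn1, nn2 = np.argsort(row)[:2]   # indices of the two nearest neighbors
        if row[nn1] / row[nn2] < ratio:
            matches.append((i, nn1))
    return matches
```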
After establishing feature correspondences between images, some matches may be incorrect or outliers, which can adversely affect the accuracy of the homography matrix. To address this, I employed Random Sample Consensus (RANSAC), a robust algorithm that mitigates the influence of these outliers. The 4-point RANSAC algorithm operates as follows:
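1. Randomly sample four correspondences.
2. Compute an exact homography H from those four pairs.
3. Map every point from the first image through H and count the inliers, i.e. the pairs that land within a small pixel threshold of their match.
4. Repeat for a fixed number of iterations, keeping the largest inlier set found.
5. Recompute H by least squares over all inliers of the best iteration.

A sketch of those steps (the iteration count and threshold are my assumptions; computeH is the least-squares solver from above):

```python
import numpy as np

def ransac_homography(pts1, pts2, iters=1000, eps=2.0):
    """4-point RANSAC: repeatedly fit H to random minimal samples, keep the
    hypothesis with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(pts1), dtype=bool)
    P1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous source points
    for _ in range(iters):
        idx = rng.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])           # exact fit to the sample
        proj = H @ P1.T
        proj = (proj[:2] / proj[2]).T                # projected points in image 2
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers
```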
After obtaining the homography matrix, I apply the warping from Part A to produce the final image.
The project was very interesting, especially Part B. I think I should have spent more time on blending the images when stitching the panoramas, because on some pictures the seams at the bottom are visible. Also, adding some kind of color normalization would really make a difference; then the final image would look more like the panoramas a phone generates.