Digitized Glass Plate Alignment

Project Overview

The goal of this project is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. In order to do this, you will need to extract the three color channel images, place them on top of each other, and align them so that they form a single RGB color image.

Aligning small images

The first thing I tried was to simply place the R, G, and B channels on top of each other.

Then, I tried to align the G and R channels to the B channel. I used an exhaustive search over a displacement window of [-15, 15] pixels in both the x and y directions. For each candidate (x, y) displacement, I shifted the G and R channels by that amount and scored the alignment against the B channel using the SSD metric, which was faster than NCC (normalized cross-correlation) and produced the same results. The displacement with the best score was kept.
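The exhaustive search described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name align_exhaustive, the (dy, dx) return order, and the use of np.roll (which wraps pixels around the border) are all assumptions made for the example.

```python
import numpy as np

def align_exhaustive(channel, reference, window=15):
    """Return the (dy, dx) shift of `channel` that best matches `reference`.

    Hypothetical helper: tries every shift in [-window, window] on both axes
    and keeps the one with the lowest sum of squared differences.
    """
    channel = channel.astype(np.float64)      # avoid integer overflow
    reference = reference.astype(np.float64)
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps around the border (a simplification)
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Calling align_exhaustive(g, b) and align_exhaustive(r, b) would give the shifts to apply to the G and R channels before stacking them with B. Because the borders of the plates are noisy, scoring only the interior of each channel is a common refinement, though the text does not say whether that was done here.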

Understanding SSD (Sum of Squared Differences) Metric

The SSD metric is a way to measure how similar two images or parts of images are. Here’s how it works:

1. You compare the corresponding pixels from two images.
2. For each pair of pixels, you subtract one pixel's value from the other to get the difference.
3. You square that difference to ensure it’s positive, regardless of the direction of the difference.
4. You repeat this process for every pixel in the images, then sum up all these squared differences. The result is the Sum of Squared Differences (SSD).

A lower SSD value means the images are more similar, while a higher SSD value means they are less similar. In image alignment, SSD helps us determine the best alignment between two images.
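As a small worked example of the metric, here is a sketch in Python with NumPy. The function name ssd and the two tiny arrays are made up for illustration; the conversion to float is there so that 8-bit images do not overflow when subtracted.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.sum((a - b) ** 2))

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 0], [3, 7]])

# Differences are 0, 2, 0, -3; their squares are 0, 4, 0, 9.
print(ssd(a, b))  # 13.0
print(ssd(a, a))  # 0.0 (identical images)
```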

Aligning Large Images

The same approach would not work for larger images, as the exhaustive search becomes too slow due to the increased number of pixels. To handle large images efficiently, I implemented a faster method using an image pyramid.

An image pyramid represents the image at multiple scales, usually by downscaling the image by a factor of two at each level. The alignment process begins with the smallest (coarsest) version of the image, where an exhaustive search over a smaller displacement window can be done quickly. Once the best alignment is found at this coarse level, the result is refined by applying the same method at progressively higher resolutions (finer levels).

This approach significantly reduces the computational cost of alignment for large images. By starting at a lower resolution, the search space is much smaller, and the displacement estimates are updated as the pyramid is traversed back to the original resolution. This way, the algorithm still achieves high accuracy without the need for an exhaustive search at full resolution.

For the large glass plate scans, this pyramid-based method allowed me to efficiently align the G and R channels to the B channel while maintaining the image's high resolution.

My pyramid has 4 layers, each half the size of the previous one. For the smallest (coarsest) layer I used a search window of [-15, 15] pixels, and for every layer above it I decreased the window bound by 4. With these parameters, computing the best displacement took less than a minute per image.
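A coarse-to-fine search with these parameters might look like the sketch below. It is an illustration under several assumptions rather than the code used for the results: the names downscale2, search, and align_pyramid are hypothetical, downscaling is done by averaging 2x2 blocks, shifts wrap around via np.roll, and the window schedule of 15, 11, 7, 3 from coarsest to finest is one reading of the description above.

```python
import numpy as np

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, reference, window):
    """Exhaustive SSD search over shifts in [-window, window] on both axes."""
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, levels=4, coarse_window=15, step=4):
    """Estimate the (dy, dx) shift coarse-to-fine over an image pyramid."""
    # Index 0 is full resolution; the last entry is the coarsest level.
    pyr_c = [channel.astype(np.float64)]
    pyr_r = [reference.astype(np.float64)]
    for _ in range(levels - 1):
        pyr_c.append(downscale2(pyr_c[-1]))
        pyr_r.append(downscale2(pyr_r[-1]))

    dy = dx = 0
    window = coarse_window
    for level in range(levels - 1, -1, -1):
        # Apply the estimate so far, then search only for the residual shift.
        pre = np.roll(pyr_c[level], shift=(dy, dx), axis=(0, 1))
        ddy, ddx = search(pre, pyr_r[level], window)
        dy, dx = dy + ddy, dx + ddx
        if level > 0:
            # One pixel at this level is two pixels at the next finer level.
            dy, dx = dy * 2, dx * 2
            window = max(window - step, 1)
    return dy, dx
```

With 4 levels and a factor of two per level, a [-15, 15] window at the coarsest level covers roughly [-120, 120] pixels at full resolution, which is why the finer levels only need small corrective windows.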

Improvements to the base algorithm

The pyramid approach works well for all images except emir.tif. I decided to add the Canny edge detection algorithm on top of the pyramid approach, using the skimage library. I applied Canny edge detection each time a new pyramid layer was created, so that the alignment compares edges rather than raw pixel intensities. This greatly improved the final result for emir.tif.

What is Edge Detection?

Edge detection is a technique used to find the boundaries or edges in an image. In simple terms, edges are where there’s a significant change in brightness or color. These edges often correspond to important features in the image, such as the outline of an object or a sharp change in texture.

The Canny edge detection algorithm works by highlighting these edges, making it easier to identify key features for tasks like alignment or object recognition. By applying edge detection, we can focus on the most important parts of an image, ignoring unnecessary details like noise or small color variations.
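For reference, producing an edge map with skimage can be sketched like this. The wrapper name edge_map and the sigma value are assumptions for the example; the text does not state which Canny parameters were used.

```python
import numpy as np
from skimage.feature import canny

def edge_map(channel, sigma=1.0):
    """Return a float edge map: 1.0 on detected edges, 0.0 elsewhere."""
    # canny expects a 2D grayscale image and returns a boolean array.
    edges = canny(channel.astype(np.float64), sigma=sigma)
    return edges.astype(np.float64)
```

One way to use it is to call edge_map on each channel at every pyramid level and score the alignment on the edge maps instead of the raw pixels. Edge maps depend on where intensity changes rather than on absolute brightness, so the comparison is presumably less sensitive to channels that record different brightness for the same object.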

[Figure: Misaligned Emir]

[Figure: Emir's edges]

Histogram Equalization

To enhance the colors of the image, I decided to use histogram equalization. Histogram equalization is a technique that improves the contrast of an image by redistributing its pixel intensity values. It stretches the intensity range so that the most frequent pixel values are spread across the entire available range. As a result, details that were previously hidden in darker or lighter areas of the image become more visible. The process works by first calculating the image's histogram, then normalizing its cumulative distribution function (CDF), and finally mapping the original intensities to new values based on this distribution.
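The three steps (histogram, normalized CDF, intensity mapping) can be sketched with NumPy as follows. This is only an illustration: the function name equalize, the 256 bins, and the assumption that the input is a single float channel with values in [0, 1] are mine, and the text does not say whether equalization was applied per color channel or to a brightness channel.

```python
import numpy as np

def equalize(channel, bins=256):
    """Histogram-equalize one float channel with values in [0, 1]."""
    # Step 1: histogram of the pixel intensities.
    hist, bin_edges = np.histogram(channel.ravel(), bins=bins, range=(0.0, 1.0))
    # Step 2: cumulative distribution function, normalized to end at 1.
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]
    # Step 3: map each original intensity to its CDF value.
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    return np.interp(channel.ravel(), centers, cdf).reshape(channel.shape)
```

An image whose values are bunched into a narrow band, for example between 0.4 and 0.6, comes out spread over nearly the whole [0, 1] range while keeping the relative order of its intensities.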

Gray World White Balancing

In addition to enhancing colors with histogram equalization, I applied the Gray World Assumption for white balancing. White balancing is a technique used to adjust the colors in an image so that they look more natural, especially under different lighting conditions.

The Gray World Assumption works by assuming that the average color of an image should be a neutral gray. Based on this assumption, the algorithm calculates the average intensity of the red, green, and blue channels and adjusts them so that the overall color balance shifts toward gray. This helps remove color casts, making the image look more balanced and natural in terms of lighting.

My first few attempts at this balancing failed because I didn't crop the image, so a lot of irrelevant colors from the edges were taken into account. After I applied a 10% crop to each side, the balancing looks normal.
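A minimal sketch of gray world balancing with the border excluded from the averages is shown below. The function name gray_world, the choice to scale each channel toward the mean of the three channel averages, and the decision to crop only when computing the statistics (the full image is still returned) are assumptions for the example.

```python
import numpy as np

def gray_world(rgb, crop=0.10):
    """White-balance an H x W x 3 float image with values in [0, 1]."""
    h, w = rgb.shape[:2]
    top, left = int(h * crop), int(w * crop)
    # Ignore the borders when measuring the average color.
    interior = rgb[top:h - top, left:w - left]
    channel_means = interior.reshape(-1, 3).mean(axis=0)
    # Scale each channel so that its average matches the overall gray level.
    gray = channel_means.mean()
    gains = gray / channel_means
    return np.clip(rgb * gains, 0.0, 1.0)
```

For instance, an image whose interior is uniformly (0.2, 0.4, 0.6) is mapped to (0.4, 0.4, 0.4) in the interior, regardless of what colors appear in the outer 10% on each side.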

[Figure: Sculpture before applying white balancing]

Final remarks and results

I also tried to implement an auto-cropping feature based on the variance of pixels, but in the end I decided to leave the edges in the final result so the photos keep their old feel. After all, they are more than 100 years old! I also experimented with different cropping multipliers and decided to set the multiplier to zero, as I didn't see any improvements from cropping.

g:(38, 16) r:(88, 22)