
CS 180 Project 1: Images of Russia

By Vishal Bansal

Context

Sergei Mikhailovich Prokudin-Gorskii believed color photography would be the future. To pursue this dream, he won the Russian Tsar's special permission to travel across the vast Russian Empire and take color photographs of unique people, places, and things. He recorded three exposures of every scene onto a glass plate, one each through a red, a green, and a blue filter, and planned to stack the three images on top of each other to create a color image. The goal of this project is to produce that color image by aligning the three glass plate exposures and stacking them as the red, green, and blue channels.

Single Scale Alignments

For single scale alignment on the smaller images, which are roughly 400 x 400 pixels, I implemented a program that searched over a window of pixel displacements (by default, up to 15 pixels in each direction) and scored each row and column shift using a similarity metric. It then picked the row and column shift that scored best under that metric and used it to create the final color image. I used blue as the reference and aligned both the green and red images to the blue image.
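The final compositing step can be sketched as follows. This is a minimal illustration, not my exact code: it assumes the scanned plate stacks the blue, green, and red exposures from top to bottom (an assumption about the input layout), applies shifts with the circular `np.roll`, and uses the hypothetical helper names `split_plate` and `compose_rgb`.

```python
import numpy as np

def split_plate(plate):
    """Split a tall grayscale scan into three equal-height channels.

    Assumption: exposures are stacked blue, green, red from top to bottom.
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def compose_rgb(b, g, r, g_shift, r_shift):
    """Shift g and r by (row, col) offsets onto b and stack as an RGB image."""
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    r_aligned = np.roll(r, r_shift, axis=(0, 1))
    # Channel order along the last axis is red, green, blue.
    return np.dstack([r_aligned, g_aligned, b])
```

For example, `compose_rgb(b, g, r, (5, 2), (12, 3))` would apply the cathedral shifts listed below, with blue left untouched as the reference.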

I implemented and applied three different metrics to score the row and column shifts: Euclidean distance (the square root of the sum of squared differences, or SSD), normalized cross-correlation (NCC, the normalized dot product of the two flattened images), and scikit-image's structural_similarity function. Structural similarity performed best at identifying the row and column shifts that gave the highest quality alignment: when I experimented with different search ranges and stacking orders, neither NCC nor the SSD-based distance found the correct shifts consistently. Thus, I chose to use structural similarity.
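The three metrics can be sketched roughly as below. The function names are hypothetical, the images are assumed to be float arrays of equal shape, and the `data_range=1.0` argument assumes pixel values scaled to [0, 1]. Note that Euclidean distance is a distance (lower is better), while NCC and structural similarity are similarities (higher is better).

```python
import numpy as np

def euclidean_distance(a, b):
    """Square root of the sum of squared differences; lower is better."""
    return np.sqrt(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized dot product of the flattened images; higher is better.

    Assumes neither image is all zeros (the norm would be zero).
    """
    a_flat = a.ravel()
    b_flat = b.ravel()
    return np.dot(a_flat / np.linalg.norm(a_flat),
                  b_flat / np.linalg.norm(b_flat))

def ssim(a, b):
    """Structural similarity from scikit-image; higher is better.

    Assumption: float images scaled to [0, 1], hence data_range=1.0.
    The import is local so the other two metrics work without scikit-image.
    """
    from skimage.metrics import structural_similarity
    return structural_similarity(a, b, data_range=1.0)
```

To use the distance in a search that maximizes a score, it can simply be negated.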

My alignment function looped over every shift in [-search_range, search_range] for both rows and columns, so it checked (2 * search_range + 1)^2 candidate shifts, or 961 for the default range of 15. It is also important to note that all three metric functions mentioned above run in O(n) time, with n being the number of pixels in the image. The three .jpg files are small, so the vanilla single scale alignment approach ran efficiently on them.
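The search loop can be sketched as follows. This is an illustration under assumptions rather than my exact function: `align_single_scale` and `score_fn` are hypothetical names, shifts are applied with the circular `np.roll`, and `score_fn` is assumed to return a value where higher means more similar. Because both endpoints of the range are included, the loop visits (2 * search_range + 1)^2 shifts.

```python
import numpy as np

def align_single_scale(target, reference, score_fn, search_range=15):
    """Return the (row, col) shift of `target` that maximizes `score_fn`."""
    best_score = -np.inf
    best_shift = (0, 0)
    shifts = range(-search_range, search_range + 1)  # both endpoints included
    for dr in shifts:
        for dc in shifts:
            shifted = np.roll(target, (dr, dc), axis=(0, 1))
            score = score_fn(shifted, reference)
            if score > best_score:
                best_score = score
                best_shift = (dr, dc)
    return best_shift
```

Calling it once with the green channel and once with the red channel, each against the blue reference, yields the two (row, col) shifts reported for each image.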

Image 1

cathedral

g: (5, 2), r: (12, 3)

Image 2

monastery

g: (-3, 2), r: (3, 2)

Image 3

tobolsk

g: (3, 3), r: (6, 3)

Pyramid Alignments

As mentioned previously, the metrics that score the similarity between the reference image and a shifted target image run in O(n) time, with n being the number of pixels in the image. For the smaller, approximately 400 x 400 .jpg images, this ran efficiently without any modification. However, for the larger .tif images, which are approximately 3000 x 3000 pixels, the vanilla approach would not complete within the given runtime limit of 2 minutes per image.
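A back-of-the-envelope estimate shows why. The numbers below use the approximate image sizes quoted above and count one pixel comparison per pixel per candidate shift, which is a simplification (structural similarity does more work per pixel). The last line uses the largest row shift reported in the results, 136, to show how much wider a full-resolution search would need to be.

```python
search_range = 15
candidate_shifts = (2 * search_range + 1) ** 2     # 961 shifts per channel

small_pixels = 400 * 400                           # 160,000
large_pixels = 3000 * 3000                         # 9,000,000

small_cost = candidate_shifts * small_pixels       # 153,760,000 comparisons
large_cost = candidate_shifts * large_pixels       # 8,649,000,000 comparisons
ratio = large_cost / small_cost                    # 56.25x more work

# A full-resolution search wide enough to reach a 136-row shift:
wide_shifts = (2 * 136 + 1) ** 2                   # 74,529 shifts per channel
```

So the same 15-pixel search costs about 56 times more on a large image, and that search would still be too narrow to reach the larger displacements.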

To solve this issue, I implemented an image pyramid alignment function built on top of my single scale alignment function. The pyramid successively scaled the input image down by a factor of 0.5 at each level until it reached the bottom level, with the number of levels set in advance by the user. At the bottom level, once the 3000 x 3000 pixel .tif image had been reduced to roughly 400 x 400 pixels, the single scale alignment function defined previously was run on this scaled down image, which still preserved the coarse structure of the original. Then, as a final step, the shifts determined at the bottom level = 0 were propagated up to the topmost level by recursively multiplying them by 2. Each multiplication by 2 accounted for one scale down by a factor of 0.5.
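The recursion described above can be sketched as follows. Several details are assumptions made for a self-contained example: downscaling is done by averaging 2 x 2 blocks, the scoring is a negative sum of squared differences standing in for structural similarity, shifts use the circular `np.roll`, and all function names are hypothetical.

```python
import numpy as np

def downscale_by_half(image):
    """Halve each dimension by averaging 2x2 blocks (odd edges are trimmed)."""
    rows = image.shape[0] // 2 * 2
    cols = image.shape[1] // 2 * 2
    trimmed = image[:rows, :cols]
    return trimmed.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def best_shift(target, reference, search_range):
    """Exhaustive single scale search scored by negative SSD (a stand-in)."""
    best, best_score = (0, 0), -np.inf
    for dr in range(-search_range, search_range + 1):
        for dc in range(-search_range, search_range + 1):
            shifted = np.roll(target, (dr, dc), axis=(0, 1))
            score = -np.sum((shifted - reference) ** 2)
            if score > best_score:
                best, best_score = (dr, dc), score
    return best

def align_pyramid(target, reference, levels=4, search_range=15):
    """Search at the coarsest of `levels` scales, then rescale the shift."""
    if levels == 1:
        # Bottom of the pyramid: run the plain single scale search.
        return best_shift(target, reference, search_range)
    dr, dc = align_pyramid(downscale_by_half(target),
                           downscale_by_half(reference),
                           levels - 1, search_range)
    # Undo one halving: a shift of 1 pixel here is 2 pixels one level up.
    return (2 * dr, 2 * dc)
```

With 4 levels, this sketch halves the image three times and therefore returns shifts that are multiples of 8. A common variation, not described above, re-runs a small search around the doubled shift at each finer level to recover single-pixel precision.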

This image pyramid let me align the large images within the time limit. Through experimentation, I found that 4 levels were sufficient to find an alignment. When I attempted to use 5 levels, the image at the bottom level = 0 lost too much detail.

Image 4

church

g: (24, 0), r: (56, -8)

Image 5

emir

g: (48, 24), r: (104, 40)

Image 6

harvesters

g: (56, 16), r: (120, 16)

Image 7

icon

g: (40, 16), r: (88, 24)

Image 8

lady

g: (56, 8), r: (120, 8)

Image 9

train

g: (40, 8), r: (88, 32)

Image 10

onion_church

g: (48, 24), r: (104, 40)

Image 11

sculpture

g: (32, -8), r: (136, -24)

Image 12

three_generations

g: (48, 16), r: (112, 8)

Extended Pyramid Alignments

The image pyramid alignment approach mentioned previously with a default search range of 15 and no data preprocessing successfully aligned all .tif files except two. The two .tif files which did not align successfully were melons.tif and self_portrait.tif.

To improve the alignment of these two stubborn images, I added a preprocessing step before calculating the structural_similarity metric between the shifted image and the reference: cropping 10% off the borders, which contained colored lines or black rows. In addition, I increased the search range from [-15, 15] to [-30, 30] in both the row and column dimensions. The combination of these two changes enabled me to find alignments for melons.tif and self_portrait.tif, showcased below!
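The cropping step can be sketched like this. It assumes that 10% is removed from each of the four sides (the exact split is an assumption) and that the crop is applied only when scoring, so the full images are still shifted and stacked afterwards. The names `crop_borders` and `score_interior` are hypothetical.

```python
import numpy as np

def crop_borders(image, fraction=0.10):
    """Drop `fraction` of the rows and columns from each side."""
    rows, cols = image.shape[:2]
    dr = int(rows * fraction)
    dc = int(cols * fraction)
    return image[dr:rows - dr, dc:cols - dc]

def score_interior(shifted, reference, score_fn, fraction=0.10):
    """Score two images on their interiors only, ignoring noisy borders."""
    return score_fn(crop_borders(shifted, fraction),
                    crop_borders(reference, fraction))
```

Scoring only the interior keeps the dark plate edges and stray colored lines from dominating the metric.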

Image 13

melons

g: (48, 24), r: (104, 40)

Image 14

self_portrait

g: (32, -8), r: (136, -24)

Other Pictures from the Prokudin-Gorskii collection

I ran the extended image pyramid alignment algorithm discussed in the previous section on two more images from the collection: "Napoleon, waiting for peace" and "Na ostrovie Kapri", a picture of a boy on a bridge.

Image 15

napoleon

g: (48, -16), r: (104, -8)

Image 16

boy on bridge

g: (64, 8), r: (136, 0)

Conclusion

In conclusion, I used an image pyramid with 4 levels, a search range of [-30, 30] pixels, and scikit-image's structural_similarity function to successfully align Prokudin-Gorskii's images.