Colorizing the Prokudin-Gorskii Photo Collection - CS 180

Francesco Crivelli

<aside> 💡

Context

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) captured color photos of the Russian Empire using glass plates with red, green, and blue filters. Although color printing technology wasn't available at the time, his vision was to combine these images into full-color photos. The Library of Congress later digitized these glass plate negatives.

Overview

The goal of this project is to automatically align and merge these digitized images to produce high-quality color photos with minimal artifacts.

My Implementation

For small images, the green and red channels are aligned to the blue channel by exhaustive search, with each candidate shift scored by the L2 norm.

For large images, an image pyramid algorithm is employed, reducing image size at each level to speed up the alignment process. Additionally, automated border cropping helps remove unwanted edges for a cleaner final image.

</aside>

emir.tif

Methodology

This project involved reconstructing color images from Prokudin-Gorskii's digitized glass plate negatives by aligning the red, green, and blue channels. The alignment was achieved through a multi-step approach combining exhaustive search, image pyramids, border cropping, and edge detection.

lady_output.jpg

Exhaustive Search: I began by implementing an exhaustive search within a 15x15 pixel window to align the green and red channels to the blue channel, using the L2 norm (SSD) to evaluate alignment accuracy.
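The search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes NumPy arrays, circular shifting with np.roll, and that the 15x15 window means candidate shifts from -7 to 7 along each axis. The names ssd, exhaustive_align, and radius are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences (squared L2 norm) between two equal-sized images."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def exhaustive_align(channel, reference, radius=7):
    """Try every (dy, dx) shift in [-radius, radius] and keep the one with the lowest SSD.

    radius=7 yields a 15x15 grid of candidate shifts (an assumed reading of the window size).
    """
    best_shift, best_score = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: pixels pushed off one side wrap to the other.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

Calling exhaustive_align(green, blue) would return the (dy, dx) shift that moves the green channel onto the blue one under these assumptions.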

Screenshot 2024-09-09 at 10.24.10 PM.png

cathedral_output3.jpg

Image Pyramid for Efficiency: To improve performance, I introduced an image pyramid technique, which recursively downsizes the images to align them at different scales. I made the pyramid levels adaptive such that the coarsest level's resolution was set to approximately 300 pixels. This allowed me to align images at lower resolutions, progressively refining the alignment at higher resolutions. The optimal shift found at each level was scaled and applied to the next finer level, thus significantly speeding up the alignment process without sacrificing accuracy.
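A coarse-to-fine search along these lines might look like the sketch below. It assumes NumPy, 2x2 block averaging for downsampling, and circular shifts with np.roll. The name pyramid_align echoes the function mentioned in this write-up, but the body and the parameters min_size, coarse_radius, and fine_radius are illustrative assumptions rather than the original implementation.

```python
import numpy as np

def best_shift(channel, reference, center=(0, 0), radius=2):
    """Search shifts within `radius` of `center`, scoring each candidate by SSD."""
    best, best_score = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            diff = np.roll(channel, (dy, dx), axis=(0, 1)) - reference
            score = float(np.sum(diff * diff))
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an odd trailing row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(channel, reference, min_size=300, coarse_radius=15, fine_radius=2):
    """Recursive coarse-to-fine alignment; returns (dy, dx) at the input resolution."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    # Base case: the image is small enough for a wide exhaustive search.
    if min(channel.shape) <= min_size:
        return best_shift(channel, reference, radius=coarse_radius)
    # Solve the half-resolution problem first.
    coarse = pyramid_align(downsample(channel), downsample(reference),
                           min_size, coarse_radius, fine_radius)
    # One coarse pixel spans two fine pixels, so double the estimate, then refine locally.
    center = (2 * coarse[0], 2 * coarse[1])
    return best_shift(channel, reference, center=center, radius=fine_radius)
```

The recursion stops once the shorter side is at most min_size pixels, which mirrors the roughly 300-pixel coarsest level described above.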

Pyramid image processing.svg.png

Dynamic Border Cropping: Recognizing the impact of noisy borders on alignment, I implemented an automated border cropping strategy. The border size was dynamically set as 15% of the image size at each pyramid level to account for variations in resolution. This step helped to remove edge artifacts and focus the alignment on the central part of the images, where the content is most important.
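As a rough sketch of this step, the helper below trims a fixed fraction from every side before scoring. Because the margin is computed from the current image shape, it scales with each pyramid level. The names crop_border and ssd_interior are hypothetical, and the sketch assumes the 15% is applied per side.

```python
import numpy as np

def crop_border(img, fraction=0.15):
    """Return the interior of `img`, dropping `fraction` of each dimension from every side."""
    h, w = img.shape[:2]
    dy, dx = int(h * fraction), int(w * fraction)
    return img[dy:h - dy, dx:w - dx]

def ssd_interior(a, b, fraction=0.15):
    """SSD computed on the cropped interiors only, so noisy plate borders do not affect the score."""
    d = crop_border(a, fraction).astype(np.float64) - crop_border(b, fraction).astype(np.float64)
    return float(np.sum(d * d))
```

With these defaults a 100x200 image is scored on its central 70x140 region.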

Normalization and Window Refinement: For each pyramid level, the window size was refined dynamically, starting with a larger window at coarser resolutions and reducing it at finer resolutions. This allowed for more precise shifts at higher resolutions and ensured the process remained computationally efficient while maintaining alignment accuracy.
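To make the efficiency argument concrete, the sketch below counts candidate shifts for one possible window schedule: a wide search at the coarsest level and a narrow one at every finer level. The schedule itself (radius 15, then radius 2) and the function names are assumptions chosen for illustration; the write-up does not state the exact values used.

```python
def window_schedule(num_levels, coarse_radius=15, fine_radius=2):
    """Search radius per pyramid level, ordered from coarsest to finest."""
    return [coarse_radius] + [fine_radius] * (num_levels - 1)

def candidates(radius):
    """Number of (dy, dx) shifts tested for a given search radius."""
    return (2 * radius + 1) ** 2

def reachable_shift(schedule):
    """Largest full-resolution shift (per axis) the schedule can recover.

    Moving to the next finer level doubles the current estimate,
    and the local search can then add up to `radius` more pixels.
    """
    reach = 0
    for radius in schedule:
        reach = 2 * reach + radius
    return reach

schedule = window_schedule(5)
total_candidates = sum(candidates(r) for r in schedule)
```

Under these assumptions a five-level schedule tests 1,061 shifts in total and can recover displacements of up to 270 pixels per axis, whereas a single-scale search wide enough for the 176-pixel shift reported for self_portrait.tif would need 124,609 candidates at full resolution.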

<aside> 💡

Edge Detection: In more challenging images, such as the "Emir," I implemented edge detection using a combination of grayscale conversion, Gaussian filtering, and Sobel filters. By focusing on the edges, I improved alignment by ensuring that structural elements of the images (e.g., lines and shapes) were matched accurately, leading to better results, particularly in cases where intensity variations between channels caused issues.

In more detail, this is how the Emir image looks before and after the naive alignment that uses only the image pyramid:

before alignment

after alignment with image pyramid only

The brightness of this image varies irregularly between channels, which makes alignment on raw pixel intensities harder than for the other images.

That is why I apply edge detection: I apply the Sobel filter to each grayscale channel, then a Gaussian filter to make the result more regular, which gives the result on the right-hand side. These filtered channels are what I pass into the pyramid_align function to get the image properly aligned, producing the result below:
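The preprocessing just described (Sobel, then Gaussian smoothing) could be sketched as below. This version uses only NumPy so that it is self-contained; libraries such as scipy.ndimage or skimage.filters provide equivalent filters. The function names, the sigma value, and the edge-replicating border handling are assumptions, not details taken from the project.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, with edge-replicated borders."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: a 1-D kernel applied along rows, then along columns."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    p = np.pad(img.astype(np.float64), radius, mode="edge")
    p = np.apply_along_axis(np.convolve, 1, p, kernel, mode="valid")
    p = np.apply_along_axis(np.convolve, 0, p, kernel, mode="valid")
    return p

def edge_map(channel, sigma=1.0):
    """Sobel edges followed by Gaussian smoothing; this is what gets aligned instead of raw pixels."""
    return gaussian_blur(sobel_magnitude(channel), sigma)
```

Because the Sobel response depends on intensity differences rather than absolute levels, adding a constant brightness offset to a channel leaves its edge map unchanged, which is one reason edge-based matching copes better with channels of unequal brightness.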

emir_output.jpg

</aside>

church_output.jpg

church_output_old.jpg

Images with Sobel and Gaussian filtering applied (each image represents one channel).

Screenshot 2024-09-09 at 10.37.38 PM.png

Screenshot 2024-09-09 at 10.37.33 PM.png

Screenshot 2024-09-09 at 10.37.25 PM.png

Provided Examples:

Emir.tif
R: (104, 40), G: (48, 24)

church.ti
R: (58, -4), G: (24, 4)

train.tif
R: (86, 32), G: (43, 3)

tobolsk.jpg
R: (6, 3), G: (3, 3)

three_generations.tif
R: (110, 8), G: (54, 12)

self_portrait.tif
R: (176, 36), G: (77, 29)

sculpture.tif
R: (136, -28), G: (35, -12)

onion_church.tif
R: (104, 36), G: (54, 27)

melons.tif
R: (176, 12), G: (79, 9)

monastery.jpg
R: (3, 2), G: (-3, 2)

lady.tif
R: (112, 8), G: (54, 7)

icon.tif
R: (88, 22), G: (41, 17)

harvesters.tif
R: (120, 12), G: (63, 17)

cathedral.jpg
R: (12, 3), G: (5, 2)
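For readers who want to reuse the offsets listed above, the sketch below shows one way they might be applied. It assumes each pair is a (vertical, horizontal) displacement in pixels that moves the channel onto the blue channel, and that shifting is circular via np.roll; the write-up does not state the ordering or the shifting method, so treat both as assumptions. The name compose_rgb is hypothetical.

```python
import numpy as np

def compose_rgb(r, g, b, r_shift, g_shift):
    """Shift the red and green channels by their offsets and stack them with blue as H x W x 3."""
    r_aligned = np.roll(r, r_shift, axis=(0, 1))  # assumed order: (rows, columns)
    g_aligned = np.roll(g, g_shift, axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

Under these assumptions, the cathedral.jpg entry would correspond to compose_rgb(r, g, b, (12, 3), (5, 2)).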