I shot 3 sets of two photographs such that the transform between the two images in each set was projective, swiveling the camera to reveal elements of the scene that were not fully present in the first photo. While shooting the 3 sets of photos that I would create mosaics from, I fixed the center of projection (COP) and rotated my camera between captures. This meant the photos in each set were captured from the same viewpoint, just rotated, so that when stitched together they would form a panorama!
One important detail to note in the sets of images below is the difference in exposure, a result of the imperfections of the iPhone 15 camera that I used to shoot these images. That is why later on, when combining the images, I "blend" them instead of simply overlaying one upon the other, to account for the differences in exposure and color. I shot 2 sets of real-world images and one set of Minecraft images, because I wanted to explore how the geometry and colors would differ from a real-world environment, and I thought it would be cool to experiment with a 3D computer video game.
The next step was to take each pair of photos and recover the homography matrix that maps the points and geometry of the left image to the right image. Then I used this homography to warp the left image into the geometry of the right image, creating two images (warped A and B) that could then be combined into a mosaic through blending and overlaying functions.
Using the correspondence tool, I defined correspondences (key points on key objects that I could use to map the orientation of one image to another). These included edges and vertices of prominent objects in the frame that I mentally used as reference points in the warping and mosaic overlaying/blending process. Using a function computeH(im1_points, im2_points), I recovered the parameters of the transformation between each pair of images. I set up a linear system of equations Ah = b, with two equations for each pair of corresponding points I had selected. Since I selected more than four correspondences, the system was overdetermined, so I solved it using least squares (np.linalg.lstsq) to recover the eight homography parameters, which I then reshaped into a 3x3 matrix with the 9th element set to 1.0.
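Here is a minimal sketch of what this least-squares setup might look like (the project's actual code may differ; points are assumed to be (N, 2) arrays of (x, y) coordinates):

```python
import numpy as np

def computeH(im1_points, im2_points):
    """Recover the 3x3 homography H mapping im1_points -> im2_points.

    Each pair (x, y) -> (x', y') contributes two rows to the
    overdetermined system A h = b, solved with least squares.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(im1_points, im2_points):
        # Rows follow from x' = (h1 x + h2 y + h3) / (h7 x + h8 y + 1)
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append the fixed 9th element (scale) and reshape into 3x3
    return np.append(h, 1.0).reshape(3, 3)
```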
Now that I had calculated the parameters of the homography matrix, I used it to warp one image in each set toward the reference image. Since the homography for each set mapped correspondences from left to right, I set the right image as the reference and warped the left image into the right image's geometry. I wrote a function warpImage(image, H) which warps the input image according to the homography matrix H.
I first warped all 4 corners of the original image from "a" space into "b" space using the homography matrix. This produced negative coordinates, so instead of directly using the corners, I created a bounding box from the warped corner values with width max_x - min_x and height max_y - min_y. I then looped through all the pixels in the bounding box, applied the (min_x, min_y) offset I had calculated, and used the inverse homography matrix H^-1 to map those "b" points back into "a" space. If an inverted point fell within the bounds of "a" space, I sampled the color of that pixel in "a" and assigned it to the corresponding pixel in the warped image, thus creating the warped image. Pixels without values were left as 0.0, which you can see as the black space.
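A sketch of this inverse-warping procedure could look like the following (nearest-neighbor sampling is assumed for brevity; the offset is also returned since it is needed later for canvas alignment):

```python
import numpy as np

def warpImage(image, H):
    """Inverse-warp `image` ("a" space) into the reference ("b") space.

    Forward-maps the four corners to size the bounding box, then maps
    every pixel in that box back through H^-1 and samples the source.
    Returns the warped image and the (min_x, min_y) alignment offset.
    """
    h, w = image.shape[:2]
    # Forward-warp the corners to find the bounding box in "b" space
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]]).T
    warped = H @ corners
    warped /= warped[2]
    min_x, max_x = int(np.floor(warped[0].min())), int(np.ceil(warped[0].max()))
    min_y, max_y = int(np.floor(warped[1].min())), int(np.ceil(warped[1].max()))

    out = np.zeros((max_y - min_y, max_x - min_x, image.shape[2]))
    H_inv = np.linalg.inv(H)
    # Map every bounding-box pixel (offset by min_x, min_y) back to "a" space
    ys, xs = np.mgrid[min_y:max_y, min_x:max_x]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ pts
    src /= src[2]
    sx, sy = np.round(src[0]).astype(int), np.round(src[1]).astype(int)
    # Only sample source pixels that land inside image "a"
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out.reshape(-1, image.shape[2])[valid] = image[sy[valid], sx[valid]]
    return out, (min_x, min_y)
```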
To test my warping and homography calculation functions, I used a picture of a museum (I believe the Met) taken at a slanted angle, showcasing paintings on a wall at the slight slant of the phone. I then warped the four corners of a painting (selected as correspondences), including the frame since I thought the frame looked cool, into a default rectangular shape like [0, 0], [500, 0], [500, 200], [0, 200]. Using these correspondences and the image, I performed the same warping process as before with my computeH() and warpImage() functions to rectify the painting into a flat shape. The result is displayed below!
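As a rough illustration of the rectification, reusing the two functions sketched above (the painting corner coordinates here are made up, and museum_image is assumed to be an already-loaded RGB array):

```python
import numpy as np

# Hypothetical clicked corners of the painting in the photo (x, y),
# ordered top-left, top-right, bottom-right, bottom-left.
painting_corners = np.array([[143, 212], [468, 188], [482, 431], [139, 455]])
# Target "flat" rectangle to rectify into.
rect_corners = np.array([[0, 0], [500, 0], [500, 200], [0, 200]])

H = computeH(painting_corners, rect_corners)  # same least-squares fit
rectified, _ = warpImage(museum_image, H)     # same inverse warp
```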
The first step in smoothly overlaying/blending the warped A and B images that we created in Part 3 is to create distance transforms, which are basically maps indicating, for each pixel, how far away the nearest 0-valued pixel is. I used the cv2.distanceTransform function (equivalent to bwdist in MATLAB) along with a binary mask of 1s and 0s over each image placed onto the canvas to create the distance transforms displayed below. To explain what I mean by "image on canvas": I created a canvas for the mosaic with width (image_a.width + image_b.width) and height (image_a.height + image_b.height), as an np.zeros() array. Then I overlaid each of the warped images on the canvas (the warped A image was translated back and aligned using the min_x, min_y offsets I had calculated previously). I then computed distance transforms of the aligned, warped photos on the canvas, resulting in the maps below.
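A minimal sketch of the canvas placement and distance transform step (function names here are mine, and offsets are assumed to already be shifted into non-negative canvas coordinates):

```python
import numpy as np
import cv2

def place_on_canvas(img, offset, canvas_shape):
    """Translate a warped image onto the shared mosaic canvas."""
    canvas = np.zeros(canvas_shape)
    y, x = offset
    canvas[y:y + img.shape[0], x:x + img.shape[1]] = img
    return canvas

def distance_transform(canvas_img):
    """Per-pixel L2 distance to the nearest empty (all-zero) pixel."""
    mask = (canvas_img.sum(axis=2) > 0).astype(np.uint8)
    return cv2.distanceTransform(mask, cv2.DIST_L2, 5)
```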
I used low and high frequency bands and filters to perform the blending, fixing the issues of differing exposure and sharp boundaries between the two overlaid images and resulting in a smooth, cohesive panorama. The first step was to create high and low frequency versions of both images in the mosaic. I used a technique similar to Project 2: I convolved the image with a 2D Gaussian kernel to get the low frequency version, which is blurred and contains the colors and coarse structure of the image. To get the high frequency image, I subtracted the low frequency image from the original, leaving only the higher frequencies. I did this for all 3 sets of images.
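A sketch of this frequency split, assuming float images (the kernel size and sigma below are illustrative, not the project's actual values):

```python
import cv2

def split_frequencies(img, ksize=31, sigma=5):
    """Separate an image into low and high frequency bands.

    The low band is a per-channel Gaussian blur; the high band
    is the residual after subtracting the blur from the original.
    """
    low = cv2.GaussianBlur(img, (ksize, ksize), sigma)
    high = img - low
    return low, high
```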
I then blended the low and high frequency pairs for all the sets of images, as displayed below, producing a low frequency blend and a high frequency blend. As you can see, the finer details and edge information are contained in the high frequency blend, while the low frequency blend contains the coarse structure and color gradients. I merged the high frequency images by comparing, at each pixel, which distance transform had the greater value (a higher distance to a 0 pixel intuitively means the pixel is closer to the middle of its image rather than the edges). I blended the low frequency bands by normalizing the distance transforms into weights, applying them to each pixel of the low-pass warped A and B, and adding the weighted results together into a low-pass blend. The final step was to add the low frequency and high frequency blends, which produced the final mosaics below! I attempted to use Laplacian stacks to blend, but ultimately found this method resulted in a smoother blended image with better performance.
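Putting the two rules together, a two-band blend along these lines might look like this (all inputs are assumed to be aligned on the same canvas; the epsilon guards against division by zero where both distance transforms are 0):

```python
import numpy as np

def two_band_blend(low_a, high_a, low_b, high_b, dist_a, dist_b):
    """Blend two aligned canvas images using their distance transforms.

    Low frequencies: weighted average with normalized distance weights.
    High frequencies: winner-take-all, keeping the pixel whose distance
    transform is larger (i.e. farther from its image's edge).
    """
    eps = 1e-8
    w_a = dist_a / (dist_a + dist_b + eps)
    w_b = 1.0 - w_a
    low_blend = low_a * w_a[..., None] + low_b * w_b[..., None]
    high_blend = np.where((dist_a > dist_b)[..., None], high_a, high_b)
    # Final mosaic is the sum of the two blended bands
    return low_blend + high_blend
```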
The intermediate high and low frequency blends for each set, alongside the low and high frequency warped A and B images, are contained in the media folder of my GitHub repository. I chose not to include them here for the other two sets, as the website already had quite a few images, and course staff announced they'd prefer the website not be difficult to scroll through on Ed. But if you'd like to take a look, they are there!
I used the Harris corner detection code provided in the spec, alongside parameters of sigma, min_distance (which determines the minimum distance between detected Harris points), and a threshold on the strength of the detected Harris corner responses. The density and number of points can be controlled through these parameters. As you can see, the Harris points are not well distributed: they are closely packed together, which ANMS will solve. Also, these points are not one-to-one correspondences, because Harris corner detection was performed separately on each image; feature matching will solve this problem. But the combination of the threshold, sigma, and min_distance parameters with the Harris detection algorithm leads to points along prominent edges and contours in the image.
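A comparable detection step could be written with skimage (the spec's provided code may differ; parameter values here are illustrative):

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(gray, sigma=1, min_distance=10, threshold_rel=0.01):
    """Detect Harris corners on a grayscale image.

    sigma controls the Gaussian used in the corner response,
    min_distance enforces spacing between detections, and
    threshold_rel drops weak responses. Returns (row, col)
    coordinates and the response map for later strength lookups.
    """
    h = corner_harris(gray, sigma=sigma)
    coords = peak_local_max(h, min_distance=min_distance,
                            threshold_rel=threshold_rel)
    return coords, h
```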
Adaptive Non-Maximal Suppression (ANMS) is a method I applied to the Harris points from the previous step to spatially distribute them, as nearby points encode redundant information. For each point, I calculated the distance to the nearest point with a stronger corner response, and set that distance as the point's radius. Through this approach, the radius also encodes Harris strength: the strongest point has an infinite radius, since no point has a stronger response. I then kept the 100 points with the largest radii, effectively suppressing the redundant Harris points with ANMS.
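A direct sketch of this radius computation (an O(n²) loop is fine at this point count; coords and the response map follow the shapes from the detection step above):

```python
import numpy as np

def anms(coords, h_response, n_points=100):
    """Adaptive Non-Maximal Suppression.

    For each corner, the suppression radius is the distance to the
    nearest corner with a strictly stronger response; the globally
    strongest corner gets an infinite radius. Keep the n_points
    corners with the largest radii for a spatially even spread.
    """
    strengths = h_response[coords[:, 0], coords[:, 1]]
    radii = np.full(len(coords), np.inf)
    for i in range(len(coords)):
        stronger = strengths > strengths[i]
        if stronger.any():
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
    keep = np.argsort(radii)[::-1][:n_points]
    return coords[keep]
```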
The next step was to match the points between the two images, since the processes so far had run on each image independently, in order to automatically extract correspondences. For each point, I took a 40 x 40 pixel window around it and sampled it every 5 pixels (at offsets from -17.5 to +17.5), creating an 8x8 patch. This resulted in a 64-element feature vector for each feature point that could then be used for matching. Then, using the dist2 function, I created a 100x100 matrix containing the distances between every pair of feature vectors from the two images. I picked the points which satisfied the ratio test (if sorted_similarity[0] < 0.8 * sorted_similarity[1]), comparing the nearest neighbor with the second nearest neighbor, resulting in a set of feature-matched correspondences between the images. This set can then be further improved by RANSAC, which computes homographies on randomly sampled subsets of points.
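A sketch of the descriptor extraction and ratio-test matching (the bias/gain normalization is a common MOPS-style addition assumed here, and the pairwise squared distances stand in for the provided dist2 function):

```python
import numpy as np

def extract_descriptors(gray, coords):
    """64-dim axis-aligned descriptors from 40x40 windows.

    Sample an 8x8 grid spaced 5 px apart (offsets -17.5..+17.5,
    rounded to the nearest pixel) around each corner; corners too
    close to the border are skipped in this sketch.
    """
    offsets = np.arange(-17.5, 18, 5)  # 8 sample offsets, 5 px apart
    descs, kept = [], []
    for r, c in coords:
        rs = np.round(r + offsets).astype(int)
        cs = np.round(c + offsets).astype(int)
        if (rs.min() < 0 or cs.min() < 0 or
                rs.max() >= gray.shape[0] or cs.max() >= gray.shape[1]):
            continue
        patch = gray[np.ix_(rs, cs)].ravel()
        # Bias/gain normalize so matching is robust to exposure shifts
        descs.append((patch - patch.mean()) / (patch.std() + 1e-8))
        kept.append((r, c))
    return np.array(descs), np.array(kept)

def match_features(desc_a, desc_b, ratio=0.8):
    """Lowe-style ratio test on pairwise squared distances."""
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        nn = np.argsort(row)
        # Accept only if the best match clearly beats the runner-up
        if row[nn[0]] < ratio * row[nn[1]]:
            matches.append((i, nn[0]))
    return matches
```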
RANSAC is a method where I randomly selected 4 points from the feature-matched set of correspondences between the A and B images. Using this 4-point subset of the larger feature-matched set, I computed the homography matrix mapping the A points into B space, as in Project 4A, essentially producing a homography that could warp image A into image B's space to form a mosaic through blending. After calculating the random 4-point set's homography matrix, I looped through all of the points in the feature-matched set and projected the A points into B space using that homography. If a projected A point was within a distance threshold of 3 pixels from the B point it was feature-matched to, I added it to the inlier set. I ran this random algorithm for a total of 500 iterations and kept the result with the most points within the distance threshold. In the subsequent warping step, the final homography matrix is computed using this inlier set of points between A and B!
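A sketch of this loop, reusing the computeH function from earlier (points are assumed to be (N, 2) arrays of already-matched (x, y) correspondences):

```python
import numpy as np

def ransac_homography(pts_a, pts_b, n_iters=500, threshold=3):
    """4-point RANSAC for a robust homography.

    Repeatedly fit H to a random 4-correspondence sample, project all
    A points into B space, and count inliers within `threshold` px of
    their matched B points; keep the largest inlier set.
    """
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(pts_a), 4, replace=False)
        H = computeH(pts_a[idx], pts_b[idx])
        # Project every A point into B space with this candidate H
        proj = H @ np.column_stack([pts_a, np.ones(len(pts_a))]).T
        proj = (proj[:2] / proj[2]).T
        inliers = np.linalg.norm(proj - pts_b, axis=1) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full inlier set for the final homography
    return computeH(pts_a[best_inliers], pts_b[best_inliers]), best_inliers
```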
These are the comparisons between the automatically stitched panoramas, built using Harris corner detection, ANMS point suppression based on point radii, feature extraction and matching, and the RANSAC homography method; the combination of these methods allowed us to discover correspondences between the two images automatically. In contrast, the mosaics with manual correspondences followed the same process of warping the images and computing the homographies, but the points were selected by hand using the online tool instead of detected automatically. The two are similar in orientation and framing, but there are slight differences in the shape and color of the warped images. The Minecraft picture in particular has a lot of sharp edges and contours, which I believe may have contributed to the autostitched and manual mosaics being so similar.
I had fun experimenting with different geometries to blend images into mosaics! The most important thing I learnt during Part A was how to use low and high frequency passes of images, built with 2D Gaussians, to blend images smoothly as an alternative to Laplacian stacks. The most important thing I learnt during Part B was how to use RANSAC to randomly sample points and extract a homography from automatically defined correspondences!