For part 1.1, I used the finite difference operators dX = [[-1, 1]] and dY = [[-1], [1]] to find the partial derivatives of the cameraman image in x and y by convolving each operator with the image. After computing the partial derivatives in each direction, I used them to compute the gradient magnitude, since the gradient is a vector whose components are the partial derivatives along each axis.
To turn this into an edge image, I binarized the gradient magnitude by picking an appropriate threshold of 0.15: all values greater than or equal to the threshold were set to 1, and all values below it were set to 0, creating an edge image! At this threshold there was still some noise present in the grass, but I could not raise it further without losing the edges of the buildings in the background.
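In code, this pipeline looks roughly like the following minimal sketch, assuming a float grayscale image in [0, 1] and using scipy.signal.convolve2d (the file name is a placeholder):

```python
import numpy as np
import skimage.io as skio
from scipy.signal import convolve2d

# Load the image as a float grayscale array in [0, 1].
# ("cameraman.png" is a placeholder file name.)
im = skio.imread("cameraman.png", as_gray=True).astype(float)

# Finite difference operators.
dX = np.array([[-1.0, 1.0]])
dY = np.array([[-1.0], [1.0]])

# Partial derivatives via convolution.
im_dx = convolve2d(im, dX, mode="same", boundary="symm")
im_dy = convolve2d(im, dY, mode="same", boundary="symm")

# Gradient magnitude at each pixel, then binarize with the threshold above.
grad_mag = np.sqrt(im_dx**2 + im_dy**2)
edges = (grad_mag >= 0.15).astype(float)
```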
For part 1.2, I noted that the results with just the difference operators were noisy. To reduce the noise, I applied Gaussian smoothing: I convolved a 2D Gaussian filter with the image to create a blurred version, then performed the same operations as before. Compared to part 1.1, the edges are more pronounced and clear, rather than noisy and faded.
In addition, I experimented with derivative-of-Gaussian (DoG) filters, which are created by convolving dX and dY with the 2D Gaussian filter; these are then applied to the original image as before, but in a single convolution per axis. As you can see below, both the smooth-then-differentiate approach and the DoG filters produced the same edge image, as expected from the associativity of convolution.
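A sketch of the DoG construction, reusing the grayscale image `im` from the sketch above (the ksize and sigma values here are illustrative, not the report's exact parameters):

```python
import cv2
import numpy as np
from scipy.signal import convolve2d

# 2D Gaussian as the outer product of a 1D kernel with its transpose.
g1d = cv2.getGaussianKernel(ksize=9, sigma=1.5)
g2d = g1d @ g1d.T

dX = np.array([[-1.0, 1.0]])
dY = np.array([[-1.0], [1.0]])

# Derivative-of-Gaussian filters: fold the difference operators into the Gaussian.
dog_x = convolve2d(g2d, dX)
dog_y = convolve2d(g2d, dY)

# One convolution per axis now yields the smoothed partial derivatives:
# im * (g2d * dX) == (im * g2d) * dX, where * denotes convolution.
im_dx = convolve2d(im, dog_x, mode="same", boundary="symm")
im_dy = convolve2d(im, dog_y, mode="same", boundary="symm")
```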
To briefly describe the gradient magnitude computation: the gradient at each pixel is the vector [dX, dY] of partial derivatives, so its magnitude is computed with the usual vector norm, sqrt(dX^2 + dY^2).
In part 2.1, I derived the unsharp masking operation by noting that convolving an image with a 2D Gaussian filter blurs it, which isolates the low frequencies of the image. To make a sharpened image, we can add more of the high frequencies (the image minus the blurred image) back to the original.
So, to sharpen each of the images below, I blurred it with a 2D Gaussian filter, subtracted the blurred image from the original to get the high frequencies, and then added the high frequencies back to the image. I created the 2D Gaussian kernel by creating a 1D Gaussian using cv2.getGaussianKernel and taking the outer product of the 1D Gaussian with its transpose.
The equation to blur an image was blurred = image ∗ (2D Gaussian filter), where ∗ denotes convolution. The equation to sharpen an image using this blurred image was sharpened = image + alpha * (image - blurred). Alpha is a tunable parameter that controls how sharp the result is: when alpha is large, the high frequencies of the image are added back more strongly, and when alpha is small, the image is not as sharp, since the high frequencies are added more weakly. The 2D Gaussian filter was created with the parameters sigma = 1.0 and size = 8.
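A minimal sketch of this unsharp masking step, assuming a float image in [0, 1] (the helper name unsharp_mask is mine; the defaults match the parameters reported above):

```python
import cv2
import numpy as np

def unsharp_mask(image, ksize=8, sigma=1.0, alpha=1.0):
    """Sharpen an image by adding back alpha * (image - blurred)."""
    g1d = cv2.getGaussianKernel(ksize, sigma)
    g2d = g1d @ g1d.T
    blurred = cv2.filter2D(image, -1, g2d)           # low frequencies
    high = image - blurred                           # high frequencies
    return np.clip(image + alpha * high, 0.0, 1.0)   # keep values in range
```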
For evaluation, I chose the sharp Taj Mahal image. I blurred it with a 2D Gaussian filter as described above, then resharpened it with the equation above using alpha = 1. Comparing the original sharp image with the resharpened one, I noticed that finer details present in the original, such as the intricate carvings and the scaffolding on the left tower, were lost when the image was blurred. So when the image was resharpened, these lost details were not recovered, even though the other major parts of the image were sharpened correctly. The resharpened image is also noisier and less clear than the original.
Hybrid images are static images whose interpretation changes with viewing distance, based on how we perceive low and high frequencies. At a close distance, the high-frequency elements dominate the viewer's perception, but from far away, the low-frequency elements are more visible. And by blending two images – the high frequencies of one image and the low frequencies of the other – we can create a hybrid image of the two.
To create the hybrid image, I first blurred the low-frequency image with a 2D Gaussian kernel, created by taking the outer product of a 1D Gaussian kernel (parameterized by size and sigma) with its transpose. This blurred image represents the low frequencies. To recover the high frequencies of the second image, I did a similar operation: I convolved the second image with the same 2D Gaussian to extract its low frequencies, then subtracted those low frequencies, scaled by an alpha parameter (0.4 in this case), from the image itself. Finally, to create the hybrid image, I added the low and high frequencies together and clipped the values to 255 to stay inside the valid RGB range.
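A sketch of this construction, assuming float images in the 0–255 range that are already aligned to the same size (gaussian_2d and hybrid are hypothetical helper names):

```python
import cv2
import numpy as np

def gaussian_2d(ksize, sigma):
    """2D Gaussian kernel via the outer product of a 1D kernel with its transpose."""
    g = cv2.getGaussianKernel(ksize, sigma)
    return g @ g.T

def hybrid(im_low, im_high, ksize, sigma, alpha=0.4):
    g2d = gaussian_2d(ksize, sigma)
    low = cv2.filter2D(im_low, -1, g2d)                       # low frequencies of one image
    high = im_high - alpha * cv2.filter2D(im_high, -1, g2d)   # high frequencies of the other
    return np.clip(low + high, 0, 255)                        # stay in the valid RGB range
```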
I had to tweak the strength of the 2D Gaussian kernel for each pair of images, since the kernel needed to be strong enough to blur the low-frequency image (the one seen at a distance) by a reasonable amount. I experimented with different values of sigma and size to get the best result. Below, I've displayed the sample pairs of images provided, in addition to pairs of images I chose to blend that showcase changes over time, expression, etc. For my favorite pair, I also display the log magnitude of the Fourier transform of the two input images, their low- and high-frequency versions, and the resulting hybrid image.
Below, I display the different pairs of images I created hybrids of, in addition to the sample images used for debugging. My favorite result was blending Shah Rukh Khan, a popular Indian actor, with a picture of a cat I found online; for this result, I display the FFT log magnitudes below. I also display an example of a failure case, where my hybrid image did not show the distance effect in which the high frequencies dominate at close distances.
An example of a failure is my attempt to create a change-of-expression hybrid of The Rock by hybridizing an image of a happy Rock with one of a serious Rock. Due to the differing sizes and angles of his face in the two images, the hybrid was not properly aligned and did not blend convincingly.
I built functions to create Gaussian and Laplacian Stacks for an image. The Gaussian Stack is created by convolving the original image with 2D Gaussian filters of increasing strength, producing a stack of increasingly blurred images. The Laplacian Stack is created by taking the differences between consecutive levels of the Gaussian Stack: gaussian_stack[i] - gaussian_stack[i+1]. The last image I appended to the Laplacian Stack was the last element of the Gaussian Stack, to provide a base for reconstruction.
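A sketch of both stacks, reusing the outer-product Gaussian construction from earlier; the sigma = 20 and size = 4 * l + 1 schedule is the one described further below, and the function names are mine:

```python
import cv2
import numpy as np

def gaussian_stack(im, levels):
    """Increasingly blurred copies of the original image; no downsampling."""
    stack = [im.astype(float)]
    for l in range(1, levels):
        g1d = cv2.getGaussianKernel(4 * l + 1, 20)  # kernel size grows with level
        g2d = g1d @ g1d.T
        stack.append(cv2.filter2D(stack[0], -1, g2d))
    return stack

def laplacian_stack(g_stack):
    """Differences of consecutive Gaussian levels, plus the final blur as a base."""
    l_stack = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    l_stack.append(g_stack[-1])  # base level, so summing the stack reconstructs the image
    return l_stack
```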
A difference from the image pyramid alignment algorithm in the previous Project 1 is that there the image was downsampled at each stage; this is no longer the case with the Gaussian and Laplacian Stacks, which keep the full resolution at every level. In addition, to display the Laplacian Stack visually, I normalized each level using the formula (img - img.min()) / (img.max() - img.min()). Both the Gaussian and Laplacian Stacks for the apple and orange are visualized below.
Gaussian Stack for Apple
Gaussian Stack for Orange
Laplacian Stack for Apple
Laplacian Stack for Orange
To blend the images, I created the Gaussian and Laplacian Stacks for the orange and the apple, with the Laplacian Stack computed from the differences between consecutive elements of the Gaussian Stack. Then, to use the Laplacian Stacks for image blending, I had to do one additional step: I created a left mask, the same size as the apple image, with the left half set to 1 and the right half set to 0. I then created a Gaussian Stack for the mask with the same number of levels as the images' Gaussian Stacks.
To do the actual blending, I looped through the stacks, doing an elementwise multiplication between the left mask's Gaussian Stack at level i and the apple Laplacian Stack at level i, and likewise for the orange Laplacian Stack at level i with (1 - mask), which turns the left mask into a right mask. Finally, I reconstructed the image by summing these products over all levels, using a blended base of left_mask[-1] * apple_lp[-1] + right_mask[-1] * orange_lp[-1]. The oraple is displayed below.
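A sketch of this blending loop, reusing the stack helpers from the earlier sketch (mask is a float image of the same shape as the inputs, 1 on the left half and 0 on the right; pixel values are assumed to be in the 0–255 range, per the clipping described earlier):

```python
import numpy as np

def blend(im1, im2, mask, levels):
    """Multiresolution blending: combine Laplacian levels under a blurred mask."""
    lp1 = laplacian_stack(gaussian_stack(im1, levels))
    lp2 = laplacian_stack(gaussian_stack(im2, levels))
    gm = gaussian_stack(mask, levels)  # same number of levels as the image stacks

    out = np.zeros_like(lp1[0])
    for m, a, b in zip(gm, lp1, lp2):  # the base level pairs with the blurriest mask
        out += m * a + (1 - m) * b     # mask picks im1, (1 - mask) picks im2
    return np.clip(out, 0, 255)
```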
At each level of the Gaussian Stack, I increased the blur by altering the parameters of the 2D Gaussian filter convolved with the image: sigma was set to 20, and the size was set to 4 * l + 1, so at each level l the filter grew, producing a stronger blurring effect. The seam between the apple and orange was blended smoothly because the original mask also went through a Gaussian Stack, meaning at each level the mask was Gaussian-blurred, smoothing out the transition between the two images.
I also performed blending on other pairs of images, and the results are displayed below. In addition to the classic vertical seam mask, I created an irregular mask in the shape of an ellipse to blend together Hermione and Ron.
I had fun playing around with filters and frequencies to manipulate images in unique ways during this project! The most important thing I learnt from this project was how to interact with low and high frequencies using convolutions and subtraction/addition operations to transform images in various creative ways.