Poisson Image Editing

Introduction
Matching the Gradient with Boundary Conditions
Discrete Solution and Implementation with a Sparse Matrix
Results and Discussion
Texture Flattening
Try It Out Yourself!
Russian Translation
Chinese Translation
Source Code

Introduction

Poisson Image Editing is a technique for "seamlessly blending" two images together fully automatically. It was originally conceived in this paper by Patrick Perez, Michel Gangnet, and Andrew Blake from Microsoft Research UK. To motivate the problem, look at an example:

What if we wanted to take this image of me and some college friends at the beach...

...and this image of a Great White shark

...and make it way more badass by cutting out the shark and putting it in the water in front of us:

The problem is, it's really obvious that picture is fake. The water in both images is a slightly different shade of blue/green and the boundary where I cut out the shark is clearly visible because of this. We could try something simple in Photoshop/Gimp like making the image of the shark brighter, but the boundary will almost surely will still be visible no matter how hard we try to tweak it, or if we do get it right it will probably take a lot of trial and error. Is there a way to somehow blend them seamlessly together automatically in one shot with some more clever technique to get an image like this?

Well, since I'm writing this, the answer must be yes (it's Poisson Image Editing). My goal in this tutorial will be to explain this algorithm in a simple, accessible way, and I will end with a user-friendly program that will allow you to try it out yourself with your own images! I used this program to wreck my friends in a mock Facebook album, and I hope you will find similarly hilarious uses for it. So without further ado, let's do some math...

Please Note: The images used in the rest of the writeup were either taken by me and my friends, or were taken from Wikimedia commons. Click here to see a list of sources for the Wikimedia ones.

Matching the Gradient with Boundary Conditions

This whole algorithm actually boils down to a very simple idea. To get at this, let's first try to define more carefully what we'd like to accomplish when we blend two images together. Call the image we're changing Image A and the image we're cutting out and pasting Image B:

It would be nice if we could allow the colors in Image B to change when we pasted it on, but to somehow keep all of the "details" of Image B intact. Some details we'd like to preserve would be all of the edges, corners, transitions, etc in image B. But if you read my tutorial on Edge, Corner, and Blob Detection, you'll already know techniques for extracting these details from images, and the first step in every technique is to calculate the image gradient. The image gradient is just a mathematical way of describing how the image pixels change relative to the pixels around them (it is essentially the difference of a pixel and its neighbors). And a relative descriptor is just what we're looking for, since the disagreement between image A and image B that makes them jut out against each other is the absolute magnitude of their colors.
Hence, the goal of Poisson Image Editing stated with a bit more rigor is: allow for the tweaking of absolute information of image B (colors), but preserve the relative information (image gradient) of image B as much as possible after it's pasted. Below is the image gradient of the shark we're trying to paste, so you can get an idea of what that relative information might look like:

Image	Gradient Magnitude

Given alone, the image gradient is under-constrained. As an analogy, consider I told you I wanted you to trace a path on a piece of paper, and I told you "first trace two inches, then turn 30 degrees clockwise, then trace 3 inches, etc", but I didn't tell you a starting position or any specific points on the path. Then the curve you trace out could start anywhere and be at any rotational orientation and still satisfy the conditions I told you. The relative information contained in the image gradient is the same way. We need to fix RGB values of some specific pixels to get a solution to our problem. This gives us an awesome opportunity to make image B more like image A. What we do is fix the pixels on the boundary of image B to be equal to the pixels on image A where B resides, and then solve for the rest of the pixels on the interior of image B that preserve the original gradient of image B. That's it! In the next section I will give more specific equations for doing this and give even more intuition for how this works, but hopefully now you at least see what needs to be done.

Discrete Solution and Implementation with a Sparse Matrix

Let me now present the system of equations that solves the problem. Let's call the image which we're pasting on A (as before), the image which we're pasting B (as before), and the new image to be pasted H (where H should be some improved version of B that blends in with A better). First the easy part: the boundary constraints. We said before the pixels on H's boundary should be exactly the same as the pixels of A on that boundary, so that we can match those pixels on the outside of the selection and blend them inwards. In math speak

\[ H_{(x, y)} = A_{(x,y)} \forall (x, y) \in \partial B \]

So we already know the exact solution to all pixels on the boundary of H (which is also the boundary of B). Now we need to solve for the pixels in the interior of H. We want the gradient of the pixels on the interior of H to equal the gradient of the pixels on the interior of B. To turn this gradient matching into a system of linear equations, we're going to sum up local information about a gradient using what's known as a "finite difference scheme of the laplace operator." It is simply the sum of the differences between that pixel and all of its neighbors; or, in other words, it's a sum of the "flow" of the function out of each pixel. Here's how it is defined for the image we are pasting in

\[ \Delta B_{(x, y)} = 4B(x, y) - B(x-1, y) - B(x+1, y) - B(x, y-1) - B(x, y+1) \]

If one of the neighbors happens to be a boundary pixel, then its value is fixed. If one of the neighbors happens to be out of bounds of the selection, then it should be excluded. This is summarized for all cases for every point in H by the following difference equation (lifted from the paper and rearranged in a way that I think makes more sense):

\[ |N|H(x, y) - \sum_{(dx, dy) + (x, y) \in \Omega} H(x+dx, y+dy) - \sum_{((dx, dy) + (x, y) \in \partial \Omega } A(x+dx, y+dy) = \sum_{(dx, dy) + (x, y) \in (\Omega \cup \partial \Omega)} (B(x+dx, y+dy) - B(x, y)) \]

where (x, y) is the location of the point of interest on the 2D grid, "N" is the number of valid neighbors the pixel actually has within the selection region including the boundary (less than or equal to 4), "Omega" is the selection area of B and H excluding the boundary, "partial Omega" is the boundary of the selection area, and (dx, dy) are the possible neighbor locations that range over {(-1, 0), (1, 0), (0, -1), (0, 1)} (this accounts for the 4 possible neighbors of each point). This looks like a mess but it really isn't, so let me break down this equation in English:

The left hand side of the equation is computing the spatial gradient of the unknown point H(x, y) by taking summing the difference between H(x, y) and all of its N neighbors. Each difference that goes into the gradient has the form H(x, y) - other(x', y'), where (x', y') is the position of a neighbor. The first sum on the left hand side represents the difference of H(x, y) with other H(x', y') points that are on the interior of the selection (Omega). The second term represents the difference of H(x, y) with border points, which are fixed at the value of the image that we're pasting onto, A, which is why we have to treat them separately (they do not vary and we do not solve for them).
The right hand side of the equation is simply the gradient of image B at (x, y), which we would like to match with the gradient of our new image H at (x, y). Hence the equality
Note that for color images, these equations are set up and solved for the Red, Green, and Blue channels independently

Hopefully this makes some sense. If it doesn't, just remember the basic idea: we're trying to solve for a new image H that matches the background of A better, and the border of this image H matches up with A exactly, while the difference between adjacent points of H matches up with the difference of the corresponding adjacent points in B.

What we do next is write out this equation for every point in H, and notice that we have a system of linear equations in k variables, where k is the number of pixels we need to solve for in H. To solve the system of equations, the most straightforward thing to do would be to put them in matrix form and invert the matrix. However, k is actually quite large. For instance, in a 200x200 selection, k = 40,000. We do not want to sit around and wait to invert a 40,000 x 40,000 matrix. We should also note that the matrix is extremely sparse because every point has at maximum 4 neighbors (and also positive definite for you really observant math heads reading this). So each row has at most 5 nonzero elements and the rest are zero. This lends itself well to an iterative matrix solving technique. For simplicity, I decided to use something called the Jacobi Method to solve the sparse linear system. The Jacobi Method is a special case of Gradient Descent (if you've ever heard of that). Basically here's how it works:

Set up your matrix equations in the form \[ A x = b \] where A is the matrix of equations that I defined above (the equations that related gradients to gradients or set the boundary pixels equal to some constant), x is what we'd like to solve for (the pixels of the H image in this case), and b is what the equations should equal. It would be smart to compress the matrix A if you have a sparse matrix (my program only stores the nonzero entries in its data structure)
Initialize x to all zeros (this is an all black image)
Compute the product Ax
Compute the difference (b - Ax), which measures the error between what the current guess of x (our H image) is and what we need it to be.
Add the difference (b - Ax) back to x. This is the "gradient descent" part where we try to get our guess of the solution (x) to move in the right direction
Repeat steps 3-5 until the error between x and (b-Ax) is small enough

This process is guaranteed to converge on the correct solution for x if A is positive definite (which it is), and it does so exponentially. The code I wrote to implement the Jacobi method can be found in "MatrixSolver.java" in the source code archive. What ends up happening with this iterative solution is that the new image starts off with the correct border and a black interior, and then slowly fills in color from the outside in. Here's an example progression from 1%, to 33%, to 66%, to 100% convergence:

There are better techniques to use than the Jacobi Method in this case, but it's very simple, easy to explain, and easy to implement, so I just went with it. Performance suffers a bit for large selections (you may have to wait a minute or 2 for selections much bigger than 150px x 150px), but it's quite fast for small selections (most of the examples in this writeup completed in a matter of seconds).

NOTE: The actual paper on Poisson Image editing is much more rigorous than I am. It starts out thinking of image A and image B as continuous functions in 2D space, and formulating/solving a partial differential equation that describes the implantation of the gradient of image B onto image A in a least squared sense. It turns out you get something called the "Poisson Equation" when you do it this way in continuous space, which is why the technique is called "Poisson Image Editing". But you definitely don't need to understand partial differential equations to understand how this technique works on images with discrete pixels, so I omitted it in my discussion. Interested math-inclined readers should definitely check this section of the paper out, though, because the Poisson Equation pops up everywhere and it's neat to see how it fits in here.

Results and Discussion

Now comes the really fun part: testing. Below I'm going to show a bunch of test cases running this algorithm, and examine where the algorithm succeeds and where it fails (and try to explain why given what it's doing). Alright let's do it!

Example 1

Image A	Image B

Before Blending	After Blending

This is a fairly successful case. I was pretty tan and in indoor lighting when my picture was taken, while the Harry Potter cast was a little pasty and outside in overcast lighting. If you look carefully you can see that the boundary matches up exactly (as it should) and that the interpolation inwards smoothly preserves all of the original detail (gradient) of my face. Looking at specific parts of the boundary, the transition from the bottom of my face to Daniel Radcliffe's neck is seamless (you can't even really see where it happened), and my hair is now the same color as his. And the rest of my face is the same color too. So there we go...I can pretty much check hanging out with Emma Watson off the bucket list.

Example 2

Image A	Image B

Before Blending	After Blending

This is an example very similar in spirit to the last one. I pasted President Obama's face onto some goon Johannes I know, and they blended seamlessly together. Notice how I was careful to make the boundary match make sense; I chose skin pixels on image B to line up with skin pixels on image A so that the correspondence would make sense.

Example 3

Image A	Image B

Before Blending	After Blending

Here's what I would consider to be the first failure case. There are a few problems here. First of all, there isn't much contrast between the lion and his background. So when we put him in an environment where the grass is much brighter green and we would expect a contrast, the algorithm tries to preserve the original contrast between the lion and his background so he ends up light green. Also, the grass is much different in the two images, so even though it changes to the correct color there is a visible seam. In more technical computational photography terms, even though colors match up at the boundary, there is a "texture discontinuity" there. Poisson Image Editing only solves the color discontinuity problem so this is a problem we're stuck with.

Example 4

Image A	Image B

Before Blending	After Blending

Hey everyone, it's Carlton the friendly fat zombie from Doom 3! Seriously I love this example, but there is a small problem with it. Notice how I didn't match up Carlton's skin perfectly with the zombie's skin on the left. Instead, that region of the boundary is going to match to a dark background in image A. When the smooth gradient matching takes place it has to start out with those boundary conditions. So what ends up happening is the black background smoothly seeps into the image of Carlton's face from the left, so his face is way too dark on that side. Be careful that you have really good boundary matches when you use my program, otherwise you will have colors (like black in the case) bleeding in that shouldn't be there.

Example 5

Image A	Image B

Before Blending	After Blending

This has a similar problem to example 4, in that part of Carlton's face on the right got matched with Michael Jackson's hair, so some darkness got blended in from the right.

Example 6

Before Blending	After Blending

Not much to say here

Example 7

Image A	Image B

Before Blending	After Blending

This examples shows what truly bizarre things you can accomplish with Poisson Image Editing.

Example 8

Image A	Image B

After Blending

The last batch of apples I got at Trader Joe's had two apples with an uncanny resemblance to my parents...

Okay that's enough examples for now! Just one more quick note on an application of this technique, and then you can try it out yourself

Texture Flattening

One simple of extension of Poisson Image Editing that I tried (and failed at) is "texture flattening". The idea is to only keep gradient information where there's a sharp edge, and to zero out the gradient everywhere else. This way, only the sharpest edges will be transferred over to the new image. Well I tried this out by implementing a Canny Edge Detector to find the edges but didn't have the most success. Try it out though still and see if you can get it to do anything interesting!

Try It Out Yourself!

I made a program in Java that implements Poisson Image Editing as I described above, and I used that program to generate every one of my examples. Why Java you may ask? Well it's certainly not my language of choice but it is a very simple language, and it's still the best cross-platform solution with good data structure support that I know of. This is important for people who don't know how to compile things from source but who would like to try out my program. Since I would like this program to reach as wide an audience as possible and to see what a diverse array of people use it for, it needed to be easy to set up and run on many different types of computers "out of the box".

Filename	Purpose
Poisson.java	The main applet interface that allows the user to select drag and paste parts of the images onto each other

PoissonStandalone.java	Same as Poisson.java but designed to be a standalone application with a "main"
ImageSelector.java	A GUI interface for selecting, rotating, resizing, and flipping images
MatrixSolver.java	The meat of the algorithm that implements a Jacobi Matrix iterative solver for solving the discrete Poisson Image Editing problem
CannyEdgeImage.java	An implementation of canny edge extraction used in texture flattening
Coord.java	A really simple helper class for storing where selections were made

Feel free to e-mail me with any questions or to ask them in the comments section below