www.pudn.com > Criminisi-original-gray.zip > index.html, change:2012-05-18,size:10667b

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<title>Object Removal by Exemplar-based Inpainting</title>

<body>
<h1>Object Removal by Exemplar-based Inpainting</br>
<span style="font-size:0.6em">A
<a href="http://www.cc.gatech.edu/classes/AY2005/cs4495_fall/">CS7495</a>
Final Project by Sooraj Bhat</span></h1>

<img src="bungee0.png"/>
<img src="bungee1.png"/>
<img src="bungeeA.png"/>

<p><b>Deliverables:</b>
<a href="slide.ppt">presentation slide (PPT)</a>
<a href="slidesmall.png">(PNG)</a>
<a href="slide.html">(HTML)</a>,
<a href="inpainting.zip">everything but the movies: source code, samples & results (ZIP)</a>
</p>

<h2>The paper I implemented:</h2>

<a href="http://research.microsoft.com/~antcrim/">Antonio	Criminisi</a>,
<a href="http://www.irisa.fr/vista/Equipe/People/Patrick.Perez.english.html">Patrick Perez</a>,
<a href="http://research.microsoft.com/users/toyama/">Kentaro Toyama</a>.
<a href="http://www.research.microsoft.com/%7Eantcrim/papers/Criminisi_cvpr2003.pdf">
<b>Object Removal by Exemplar-based Inpainting (PDF)</b></a>
Jun. 2003  Madison, WI <i>Proc. IEEE Computer Vision and Pattern Recognition</i>

<h3>Abstract (from the paper)</h3>
<em>
<p>
A new algorithm is proposed for removing large objects from
digital images. The challenge is to fill in the hole that is
left behind in a visually plausible way.
</p>
<p>
In the past, this problem has been addressed by two classes of
algorithms: (i) "texture synthesis" algorithms for generating
large image regions from sample textures, and (ii) "inpainting"
techniques for filling in <em>small</em> image gaps. The former
work well for "textures" -- repeating two-dimensional patterns
with some stochasticity; the latter focus on linear "structures"
which can be thought of as one-dimensional patterns, such as lines
and object contours.
</p>
<p>
This paper presents a novel and efficient algorithm that
combines the advantages of these two approaches. We first note
that exemplar-based texture synthesis contains the essential
process required to replicate both texture and structure; the
success of structure propagation, however, is highly dependent on
the order in which the filling proceeds.  We propose a best-first
algorithm in which the confidence in the synthesized pixel values
is propagated in a manner similar to the propagation of
information in inpainting. The actual colour values are computed
using exemplar-based synthesis. Computational efficiency is
achieved by a block-based sampling process.
</p>
<p>
A number of examples on real and synthetic images demonstrate
the effectiveness of our algorithm in removing large occluding
objects as well as thin scratches. Robustness with respect to the
shape of the manually selected target region is also
demonstrated. Our results compare favorably to those obtained by
existing techniques.
</p>
</em>

<h2>What exactly was implemented:</h2>

<p>I implemented the entire algorithm from the paper.  A
pseudo-code description of the algorithm (from the paper) follows.
Let <i>R</i> represent the region to be filled, <i>I</i> the
entire image, <i>S=I-R</i> the source region from which candidate
exemplars are chosen, <i>P(p)</i> the priority of a pixel,
<i>C(p)</i> the confidence term for pixel <i>p</i>, <i>D(p)</i>
the data term for pixel <i>p</i>, and <i>t</i> the iteration:</p>

<ul>
<li>Extract the manually selected initial front <i>dR<sup>0</sup></i>.</li>
<li>Repeat until done:
<ul>
<li>Identify the fill front <i>dR<sup>t</sup></i>.  If <i>dR<sup>t</sup></i> = {}, exit.</li>
<li>Compute priorities <i>P(p) = C(p)*D(p)</i>, \forall <i>p</i> \in <i>dR<sup>t</sup></i>.</li>
<li>Find the patch <i>Hp</i> with maximum priority,
<i>i.e.</i> <i>Hp</i> | <i>p</i> = argmax<sub>{<i>p</i> \in <i>dR<sup>t</sup></i>}</sub> <i>P(p)</i>.</li>
<li>Find the exemplar <i>Hq</i> \in <i>S</i> that minimizes
the sum squared error (SSE).</li>
<li>Copy image data from <i>Hq</i> to <i>Hp</i>.</li>
<li>Update <i>C(p)</i> \forall <i>p</i> | <i>p</i> \in (<i>Hp</i> \intersect <i>R</i>).</li>
</ul>
</li>
</ul>

<p>The main contribution of this paper is the
priority/patch-ordering mechanism that allows an exemplar-based
approach to respect the structural features of the input image.
The priority is composed of a <i>confidence term, C(p),</i> and a
<i>data term, D(p),</i> both defined over pixels:</p>

<ul>
<li><i>C(p) = ( Sum<sub>{q \in Hp \intersect ~R}</sub>C(q) ) / ( area of patch )</i></li>
<li><i>D(p) = abs( Isophote(p) \dot Normal(p) ) / \alpha  </i></li>
</ul>

<p>Intuitively, the confidence term measures how sure a pixel is
of its own value; this is computed from the confidence of
surrounding pixels that have already been filled (or weren't in
the fill region to begin with).  Confidence tends to decay as the
center of the fill region is approached.  Because of this, if the
priority only consisted of the confidence term, the patches would
be selected in an "onion-peel" manner, which is typical of current
exemplar-based approaches.  Confidence ignores structural
information in the image, however.  This is why the data term is
necessary; it is a combination of how strongly an <i>isophote</i>
at a pixel collides with the contour at that same pixel.  An
isophote is basically the gradient at a pixel rotated by 90
degrees---it captures the "strength of flow" of an edge.  If only
the data term is used in the priority, however, edges end up
propagating where they shouldn't.  It is the harmony between the
two factors that creates good results.  In this vein, both
quantities are normalized (to lie between 0 and 1) by appropriate
factors.  Figure 3 in the paper diagrams the quantities.</p>

<img src="isophote.png"/>

<h3>Miscellany</h3>

<p>I did not use the CIE Lab colorspace as they did in the paper
because the choice of colorspace is orthogonal to the algorithm.
The CIE Lab colorspace <i>does</i> have better perceptual
uniformity than the RGB colorspace, as the authors point out,
which is why they use it in their implementation.</p>

<p>I developed the algorithm in Matlab and coded a critical portion
in C (<i>i.e.</i> a Matlab MEX-file).  The code is small and easy to
use: it is contained in one M-file and one C file, totaling 188
lines of code and 104 lines containing semicolons ("logical" lines
of code). For usage notes, see <tt>README.txt</tt>.</p>

<h2>Results:</h2>

<p>In short, the results turned out quite well.  The only major
discrepancy was in the execution speed between my implementation and
the authors'.  All experiments were run on a 2GHz Pentium 4 laptop
with 256MB of RAM.  All results presented in this report are mine
alone.  In the images for the confidence and data terms, blue = 0,
red = 1.  Values in between are scaled to the appropriate color.</p>

<h3>A synthetic image</h3>

<p>I created a 213x284 synthetic image for testing and sanity
checking.  My implementation took 25 seconds to inpaint this image.
It correctly painted in the missing parts of the synthetic
presence of the red dots in the data term; they lie along the hard
edge between the black and gray regions.  Because of the presence of
such a strong edge, the pixels along the edge get a high priority;
therefore, those patches get filled first.  Here is a movie showing
the region-filling for the synthetic image (.avi files): <a
href="bw.avi">compressed, 52KB</a>, <a href="bw2.avi">uncompressed,
~14MB</a>.</p>

<img src="bw0.png" alt="Original image" title="Original image"/>
<img src="bw2.png" alt="Fill region" title="Fill region"/>
<img src="bwA.png" alt="Inpainted image" title="Inpainted image"/>
<img src="bwP2.png"  alt="Confidence and data terms, respectively"
title="Confidence and data terms, respectively" width="675" height="350"/>

<h3>A natural image</h3>

<p>For a natural image to work on, I used the photo of a bungee
jumper from the paper.  This allows me to compare my implementation
with the authors'.  My implementation takes 114 seconds to inpaint
this 206x308 image; theirs, 18 seconds.  The previous
state-of-the-art (Harrison's resynthesizer), takes 10 minutes, as
they report.  Their experiments, however, were run on on a "2.5GHz
Pentium IV with 1GB of RAM", so it's not quite a fair comparison.
Furthermore, 57% of of my execution time is spent in one routine,
<tt>bestexemplar</tt> (even <i>after</i> coding its critical portion
in C!), which is time-intesive because it needs to do a SSE
calculation <i>for each patch (sliding window) in the source
region</i>.  It is possible they have a smarter implementation of
that subroutine, but it isn't mentioned in the paper.  Another 20%
is wasted in <tt>getpatch</tt> which merely does some indexing
calculations.  Also, the authors don't mention what language they
used, but I suspect it was either C or C++.</p>

<p>Here is a movie showing the region-filling for the natural image
(.avi files): <a href="bungee.avi">compressed, ~1MB</a>, <a
href="bungee2.avi">uncompressed, ~60MB</a>.</p>

<td><img src="bungee0.png" alt="Original image" title="Original image"/></td>
<td><img src="bungee1.png" alt="Fill region" title="Fill region"/></td>
<td><img src="bungeeA.png" alt="Inpainted image" title="Inpainted image"/></td>
</tr></table>
<img src="bungeeP2.png" alt="Confidence and data terms, respectively"
title="Confidence and data terms, respectively" width="650" height="375"/>

<p>Again, the data term reveals something about how the linear structure
is preserved.  At two points in the jumpers's torso, the data term
becomes larger, along the two edges created by the roof of the
building in the background (it's hard to see; look for the red dots).</p>

<hr>
<!-- Created: Mon Nov 29 15:51:57 Eastern Standard Time 2004 -->
<!-- hhmts start -->