Interactive Segmentation



Tel-Aviv University

The Raymond and Beverly Sackler

Faculty of Exact Sciences

School of Computer Science

Image Segmentation using 1D Matching:

a Combined Segmentation and Editing Tool

By

Alon Shmueli

Prepared under the supervision of

Prof. Daniel Cohen-Or

Dr. Hayit Greenspan

Submitted in partial fulfillment of the requirements

for the degree of M.Sc. at Tel-Aviv University

May 2007

Acknowledgments

I want to thank Eyal Zadicario for accompanying me the whole way, for giving me the initial idea on the way of interaction, and for many more good ideas along the way.

I also want to thank my dear wife, Nitza Shmueli, for giving me all the support I needed throughout the way.

Contents

Abstract

1. Introduction

1.1. The Segmentation Problem

1.2. Image Segmentation Categories

1.3. Contour-Based Interactive Segmentation

1.4. Medical Image Segmentation

2. Active Contours and Level Set Methods

2.1. Active Contour Models ("Snakes")

2.2. Level Set

2.3. The Narrow Band Approach

2.4. The Fast Marching Approach

2.5. Geodesic Active Contours

2.6. Active Regions and Level Sets

3. Live Wire Methods

3.1. Live Wire (Intelligent Scissors)

3.2. Live Wire and Live Lane

3.3. Phase-Based Live-Wire

4. Graph Cut Methods

4.1. Graph Cut

4.2. Grab Cut

4.3. Lazy Snapping

5. Interactive Segmentation by 1D Ray Matching

5.1. Proposed User Interface

5.2. Image Segmentation using 1D Rays – Previous Work

5.3. Algorithm Outline

5.4. 1D Ray Matching

5.5. Regularization Term

5.6. Multi Scaling

5.7. Interaction and Editing Details

5.8. Results

6. Conclusions

7. References

List of Figures

Figure 1: Snake detecting Subjective Contours Illusion

Figure 2: Embedding the curve in a higher dimension

Figure 3: The scalar function $\phi$ in the image plane

Figure 4: Updating the function $\phi$ only in a narrow band

Figure 5: Constructing the Fast Marching surface T

Figure 6: A piecewise constant "Cartoon" image

Figure 7: Live Wire behavior

Figure 8: The effect of "On the Fly Training"

Figure 9: Global Phase vs. Local Phase

Figure 10: Detecting edges using Local Phase in a one-dimensional signal

Figure 11: Brush-stroke interaction in Graph Cut segmentation

Figure 12: The segmentation problem as a Graph Cut problem

Figure 13: Defining the Graph Cut edge weights

Figure 14: Interaction using GrabCut

Figure 15: Border segmentation in Lazy Snapping

Figure 16: Ray Segmentation user interaction

Figure 17: Neighbors regularization term

Figure 18: Editing the polygon

Figure 19: The Multi-Scale benefit

Figure 20: Segmenting thin structures

Figure 21: Segmentation of an object with a blurred edge

Figure 22: Comparison to other methods

Figure 23: Segmentation with margin

Figure 24: Segmenting a multi-part object

Figure 25: Overcoming disturbing edges

Figure 26: Overcoming occlusions

Abstract

Interactive image segmentation has become more and more popular among researchers in recent years. Interactive segmentation, as opposed to a fully automatic one, supplies the user with means to incorporate his knowledge into the segmentation process. However, in most of the existing techniques, the suggested user interaction is not good enough, since the user cannot intuitively force his knowledge into the tool or edit results easily. Therefore, in ambiguous situations, the user has to revert to tedious manual drawing.

I present here a novel method developed as a combined segmentation and editing tool. It incorporates a simple user interface and a fast and reliable segmentation based on 1D segment matching. The user is required to click just a few "control points" on the desired object border and let the algorithm complete the rest. The user can then edit the result by adding, removing and moving control points, where each interaction is followed by an automatic, real-time segmentation by the algorithm.

Before introducing my work I present an extensive survey of "Contour-Based Interactive Image Segmentation" techniques. It includes a detailed review of Active Contours methods ("Snakes", Level Sets, Fast Marching methods, Geodesic Active Contours etc.), Live Wire methods (Live Wire, Phase-Based Live Wire etc.) and Graph Cut methods (Graph Cut, GrabCut and Lazy Snapping).

In the second part of the work I describe my "Ray Segmentation" method: The user initializes the process with a set of "control points" on the object's border. The system then connects them into an initial polygon and iterates on it until convergence. In each iteration, each contour point tries to move to the best next location along the polygon's normal. The best location is found by comparing a 1D neighborhood in the normal direction between the candidate point and two neighboring control points along the polygon. The comparison is implemented as a simple correlation between two 1D profiles (also called "segments" or "rays").

For better convergence a regularization term is introduced. This term is designed to prevent a point from moving too far from the previous iteration's local contour. The few adjacent contour points are projected onto the current point's normal and their average is calculated (to produce a 1D shift along the normal). Any new candidate location for the current point is penalized for deviating from this neighbors' recommended location (shift). The recommended location is a weighted average, where neighbors with a higher "confidence level" contribute more.

The algorithm operates as a real-time tool, responding immediately to user actions on any average modern computer. Convergence is reached in a low number of iterations, even when initialized from a remote initial solution, thanks to an adaptive, multi-scale approach. The highest scale level in use takes into account the size of the object to segment, allowing the algorithm to reach the right contour for objects of any size, even if initialized far from it.

The algorithm is benchmarked on both medical and realistic (photographic) images and shows its added value versus other image segmentation tools. I compare results with the Fast Marching (Level Sets) method as well as the Live Wire method, showing the extra versatility and the ability to segment objects in highly complex situations.

1. Introduction

1.1 The Segmentation Problem

Image segmentation is the problem of extracting (segmenting, cutting out) foreground objects from the background in an image. It is one of the most fundamental problems in computer vision and has interested many researchers throughout the years. As the use of computers increases, reliable image segmentation is needed in more and more applications in the industrial, medical and personal fields. Fully automatic segmentation is still an open problem due to the wide variety of possible object combinations, so it seems that the use of human "hints" is inevitable. Interactive image segmentation has therefore become more and more popular among researchers in recent years. The goal of interactive segmentation is to extract object(s) from the background (or just split the image into contiguous classes) in an accurate way, using user knowledge in a way that requires minimal interaction and minimal response time. This thesis starts by describing general ways to categorize segmentation techniques, continues with an extensive survey of current contour-based interactive image segmentation techniques, and ends by introducing a new combined editing and segmentation tool.

1.2 Image Segmentation Categories

Extensive work has been done in the field of image segmentation. When traversing the huge number of existing techniques, it is helpful to categorize them into groups. When coming to do so, one immediately understands that this is a multi-dimensional task; that is, there are multiple ways to categorize a segmentation method. I present here a list of possible categorization parameters. This does not include validation (e.g. accuracy, efficiency), only the a-priori "type of segmentation". For a more extensive discussion of medical image segmentation techniques see [38, 41]. For an extensive survey on man-machine user interaction in segmentation see [17].

1. Automatic vs. Interactive – Automatic segmentation takes no user input, while interactive (or semi-automatic) segmentation uses user "hints" for initialization or guidance. Examples of fully automatic methods are the watershed algorithm [51], Marching Cubes [26], Normalized Cuts [43], Split and Merge (see chapter 10 in [20]) etc. Interaction methods vary from "initialize and let go" (Level Sets [1,27,39,44-47], Region Growing methods (chapter 10 in [20])), through specifying "constraints" (Snakes [23], Graph Cuts [4,5,42]), to "full guidance" throughout the process (Live Wire [15-16,30-35,37-38,50]).

2. Region vs. Contour Detection – The segmentation algorithm can either try to "label" each pixel as foreground or background (Graph Cut methods, Region Growing methods like Adobe Magic Wand [2], Thresholding methods etc.), or look for the separating contour that defines the objects (Snakes, Level Sets, Live Wire). The most advanced methods today combine both approaches (Geodesic Active Regions [40], Lazy Snapping [25]) and therefore generally give the best results.

3. Usage of Knowledge – Some methods are knowledge-based, i.e. they try to match the object to one of a pre-defined set of images (using classification [14]), or try to deform a known template to the object at hand (Deformable Templates [58]). Other methods make prior assumptions and build a model that the object must comply with, like having a smooth curvature (Snakes & Level Sets) or smooth local statistics (Markov Random Fields). Finally, there are methods that assume nothing about the object (Live Wire, Thresholding, use of Morphological Operators).

4. Use of Training (related to knowledge-based methods) – Some knowledge-based methods require a pre-training stage before starting the segmentation (classification (supervised learning) methods, e.g. k-nearest-neighbors). Other methods only assume a model and "teach" themselves on the fly, building a knowledge base from one image to the next (clustering (unsupervised learning) methods, e.g. K-Means). See [14] for a more extensive discussion of these issues.

In another, very popular, type of "training", information propagates throughout the segmentation of a single image from segmented parts to non-segmented parts. Segmented pixels build a knowledge base on the object or borderline, which serves when examining non-segmented pixels (Live Wire, Markov Random Fields methods, some Neural Networks).

5. Use of Hard Constraints – Some algorithms guarantee that a certain piece of knowledge will be reflected in the final segmentation. This is called Hard Constraints or Seed Points (Graph Cut, Region Growing). Others impose some general model, which is sometimes called Soft Constraints.

6. Intensity / Color / Texture – The basic information to segment on can be either the intensity (mostly in grey-level images) or the color (in color images). It is also common to attach a "feature vector" to each pixel or its small neighborhood and to segment based on those. The latter approach usually proves better when segmenting a highly textured image.

It is very common for a new segmentation algorithm to emerge as a gray-level segmentation and later develop to handle color and textured images. An example from the Graph Cut world is the evolution from the original gray-scale Graph Cut [5] to color methods like GrabCut [42] and Lazy Snapping [25] and to texture-based Graph Cuts like [29]. Another example from the Level Sets world is the evolution from Level Sets [27] and Geodesic Active Contours (GAC) [9] to texture-based variants like [22].

7. Medical / Realistic Images – These are the two most common types of images that segmentation algorithms benchmark on. The two types are relatively different, and therefore many of the algorithms tune themselves to only one of the types. See further discussion of the characteristics of medical images in section 1.4.

8. Segmentation with Transparency – Some systems, especially the ones dealing with realistic images, extend segmentation to "image cutout", that is, the ability to cut the segmented object out and put it onto another background in a realistic way. This involves giving some of the pixels (especially the ones close to the border) a mixed labeling of "foreground" and "background" rather than a binary one. This issue is usually called "Border Matting". For examples see Lazy Snapping [25] and GrabCut [42].

9. Local / Global Solution – Many segmentation methods prove that the final obtained segmentation is a minimum of a pre-defined functional. The difference is whether this minimum is a local one (Snakes & Level Sets) or a global one (Graph Cuts), the latter being less dependent on initialization.

10. Dimensionality – Some methods are naturally extendable from the 2D case to higher dimensions, or were even originally formulated in N-D (Graph Cut, Level Sets). For others the native world is 2D, and extensions to 3D usually involve applying the same method to each slice.

1.3 Contour-Based Interactive Segmentation

Out of all the possible combinations above I have chosen to focus in this thesis on Contour-Based Interactive Image Segmentation techniques. The outline of the rest of the work is therefore as follows:

Chapter 2 will cover methods called Active Contour Models, that is, Snakes and Level Sets. These methods start with a contour drawn by the user and iteratively deform it to reach the final contour that best describes the object(s). The methods iteratively minimize a pre-defined energy functional which is usually a balance of two forces: external forces that attract the contour to its place, and internal forces that keep the contour smooth. Minimizing the functional is achieved using variational calculus methods, in which the equivalent partial differential equation (PDE) is found (usually using the Euler-Lagrange equation) and then numerically solved (using finite differences).

Chapter 3 will cover the Live Wire methods. These methods let the user control ("steer") something like an evolving "wire" that gradually wraps the object throughout the segmentation process. The "wire" snaps to the desired border-line and eventually generates the desired closed contour. Live Wire methods are based on Dynamic Programming to find the shortest path (in a weighted graph induced by the image) of the "wire" from the current mouse point to the border.

Chapter 4 will describe the Graph Cut based algorithms. In these methods the image is converted into a graph, in which pixels are nodes and connections are edges. The segmentation problem is reduced to finding the minimal cut in the graph, which will split the foreground pixels from the background ones. This is achieved by a min-cut/max-flow algorithm that finds the globally optimal solution. In fact, Graph Cut is a region-based segmentation, but I will introduce some variants which incorporate contour knowledge too.

Chapter 5 describes my combined segmentation and editing algorithm, described briefly in the Abstract. It first details previous work on using 1D profiles to infer 2D segmentation. It then describes the new proposed user interaction scheme and the algorithm in detail. Finally, it shows results of applying the algorithm to medical and realistic images and compares them with other methods.

1.4 Medical Image Segmentation

The use of digital images in medicine has become very popular. All diagnostic imaging devices are computerized and can manipulate digital data. This includes Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Ultrasound imaging and more. All those devices can manipulate images, perform measurements and export data digitally. A natural need for image segmentation comes from the various analyses and measurements that can be done, for example, to detect pathologies.

A more extensive use of image segmentation comes from medical therapy devices. Procedures such as minimally invasive surgeries (e.g. guided surgeries) and non-invasive treatments (e.g. X-ray radiation, Focused Ultrasound etc.) are becoming more and more popular and require detailed computerized feedback to compensate for the surgeon's lack of direct visual contact. In many such systems the user has to segment some object(s) in order to prepare the treatment plan as well as to avoid damaging other sensitive structures. A good interactive segmentation tool can save a lot of time, replacing manual drawing by the surgeon.

Medical images tend to suffer from much more noise than realistic images due to the nature of the acquisition devices. This of course poses great challenges to any image segmentation technique. In order to reduce noise, many devices increase the (already existing) partial voluming, that is, they average the acquisition over a thick slice. This (among other reasons) leads to blurred edges between objects, which makes decisions very hard for automatic tools.

The different acquisition modalities, the different image manipulations and the variability of organs all contribute to a large variety of medical images. It can safely be said that there is no single image segmentation method that suits all possible images. Furthermore, the expert user may decide to draw part of an object, a cluster of objects or an object with a margin, and may require that the segmentation border pass where there are no edges at all. This can pose great problems for any segmentation method. It is clear that a combined editing-segmentation tool is needed, where the expert can give the computer some hints/marks as to where he/she wants the contour to pass and let the computer complete the non-marked parts. This is the exact motivation for the interactive segmentation method presented here.

2. Active Contours and Level Set Methods

This chapter will describe the optimization methods generally called Active Contour Models. The "active contours" start from an initialized contour and actively deform themselves toward the desired border, reducing the defined energy in each iteration until convergence. Convergence is achieved when reaching a balance between the "external" forces that attract the contour to its place and the "internal" forces which keep it smooth, usually by maintaining some function of its curvature.

The first model was suggested by Kass et al [23] and was called "Snakes" (section 2.1). A later evolution was the Level Set model [27], which embedded the 2D contour in a surface in 3D (section 2.2). Further optimizations of the Level Set algorithm arrived in the form of the Narrow Band [1] and Fast Marching [44] approaches (sections 2.3 and 2.4). The next change in the theory appeared as Geodesic Active Contours [9] (section 2.5), which showed a relationship between Snakes and Level Sets and proposed a better convergence scheme. The last model I describe here is Geodesic Active Regions [40] (section 2.6), which incorporates region information alongside the boundary information.

2.1 Active Contour Models ("Snakes")

Active Contour Models were first proposed by Kass et al [23] in 1987 as an interactive segmentation method for 2D images. Instead of following previous bottom-up approaches (e.g. Canny 1986 [7], where edges are first detected and then linked to form a contour), Kass et al suggest a top-down approach.

The model treats the desired contour as a time evolving curve and the segmentation process as an optimization over time of an adequate energy functional. The curve location C is parameterized by a spatial parameter p and the iteration time t as follows:

$C(p, t):\; [0,1] \times [0,\infty) \rightarrow \mathbb{R}^2$    (2.1)

The curve is considered to be acted on by two competing types of forces, reaching equilibrium when the energy is minimized:

1. "External Forces", attracting the contour to its desired location using image features and other knowledge.

2. "Internal forces", keeping the contour regular and "well behaved", assuming some form of tension in the curve.

Therefore, the energy functional to minimize is generally written as:

$E(C) = E_{int}(C) + E_{ext}(C)$    (2.2)

Equation (2.2) will be used throughout this chapter, where each method defines its own external and internal force terms. The optimal solution is achieved when the energy minimization process reaches equilibrium, that is, when the energy functional reaches a local minimum. Note that a global minimum is not guaranteed by this method, and this is the reason why user interaction is needed.

Kass et al chose to split the external forces to:

a. "Image forces", attracting the contour ("Snake") to significant image features.

b. "External Constraint", added by user to make the Snake attract (or retract) to (from) significant high-knowledge features.

Therefore, the Snake's energy functional is written as:

$E_{snake}(C) = E_{int}(C) + E_{image}(C) + E_{con}(C)$    (2.3)

The next sections will describe each of the above terms in detail, as presented in Kass et al.'s work.

a. Image Forces

The image-based term should reflect the attraction of the curve to significant image information. Kass et al. chose to handle attraction to the following three features:

1. Image intensity (bright or dark, as required).

2. Image edges.

3. Image "features" or "terminations", defined by sharp curves in the detected image gradient.

Hence the energy functional is written as:

$E_{image} = \int \big( w_{line}\,E_{line} + w_{edge}\,E_{edge} + w_{term}\,E_{term} \big)\, ds$

$E_{line} = I(x,y), \qquad E_{edge} = -|\nabla I(x,y)|^2, \qquad E_{term} = \partial\theta / \partial n_{\perp}$    (2.4)

where $E_{term}$ measures the curvature imposed by the detected image gradient, $\theta$ is the gradient angle, $n_{\perp}$ is the direction perpendicular to the gradient, and $s$ is the arc-length parameter.

b. Internal Forces

The curve is thought of as having internal tension forces that keep it as regular as desired. Kass et al. wrote the energy functional as follows:

$E_{int} = \frac{1}{2} \int \big( \alpha(s)\,|C_s(s)|^2 + \beta(s)\,|C_{ss}(s)|^2 \big)\, ds$    (2.5)

The first-order term makes the snake behave like a membrane (elasticity term, prevents it from "breaking"). The second-order term makes it behave like a thin plate (rigidity term, making it resist bending). In the original model $\alpha(s)$ and $\beta(s)$ can be specified per contour point. In practice $\alpha$ is usually kept constant and $\beta$ is either a positive constant or zero [9,54,56,57]. The two parameters define the ability of each point to break or bend. Assuming $\alpha(s) > 0$ for any $s$, the contour is always continuous and cannot get disconnected, even if the edges in the underlying image are not connected. Therefore "Subjective Contours" (illusions) can also be "detected", as opposed to many bottom-up methods. See Figure 1 for an example.

Figure 1: Snake detecting Subjective Contours Illusion: Right: the Subjective Contours illusion. Left: the final Snake "detects" the subjective contours. (Taken from [23]).

The above internal forces also give the snake the desired "hysteresis" regarding dynamic user interaction (as will be described next).

c. External Constraints

Since a global minimum is not guaranteed, this technique may depend heavily on the initial user drawing. The user is therefore supplied with means to "push" (and "pull") the contour toward the right solution and away from a wrong one. Once the contour is close to a desirable feature, the image term will pull the snake the rest of the way.

Any form of constraint can be used for this purpose. Kass et al suggest a user interface which enables the user to place "springs" and "volcanoes" in the image. "Springs" pull the contour towards a desired local minimum with a force of $-k(x_1 - x_2)$ (where $k$ is the spring constant), while "volcanoes" repel the contour with a $1/r^2$ force. Many other types of image forces have been developed over the years. For a comprehensive list of external forces see [57].

Note that external constraints need not be limited to the user interface. Any high-level reasoning procedure can output external constraints to be incorporated into the energy functional.

d. "Balloon" force.

One problem with the original Snakes model is that the contour must be drawn near (or pushed to) an image feature. In the absence of a close feature the snake will reach equilibrium based on its internal forces alone and may even collapse onto itself. To overcome this problem, Cohen et al [11] suggested adding to the external forces a "balloon force" term $k_1 \vec{n}(s)$. It represents a constant force exerted in the normal direction, causing the snake to naturally inflate or deflate. The sign of $k_1$ distinguishes inflation from deflation and pre-defines whether the user should draw the contour surrounding the object or inside it. During optimization, internal and image forces will gradually restrain the inflation force, but the latter can help the snake disregard insignificant image features (e.g. weak edges) on its way.

e. Implementation

The energy functional is minimized using variational calculus. The corresponding Euler equations are set up and the derivatives are approximated using finite differences. The resulting (pentadiagonal banded) linear system can then be solved in O(n) time using sparse matrix methods.
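To make this concrete, here is a minimal sketch of one such iteration (Python with NumPy; the parameter values are illustrative, and a dense matrix inverse is used for clarity where a real implementation would exploit the pentadiagonal structure):

    import numpy as np

    def snake_step(x, y, fx, fy, alpha=0.1, beta=0.5, gamma=1.0):
        """One semi-implicit snake iteration for a closed contour.

        x, y   : contour coordinates, shape (n,)
        fx, fy : external force (minus the external-energy gradient) at the points
        """
        n = len(x)
        # Pentadiagonal internal-energy matrix from alpha (tension), beta (rigidity).
        A = np.zeros((n, n))
        for i in range(n):
            A[i, i] = 2 * alpha + 6 * beta
            A[i, (i - 1) % n] = A[i, (i + 1) % n] = -alpha - 4 * beta
            A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
        # Implicit in the internal forces, explicit in the external ones:
        # (A + gamma*I) x_new = gamma*x + fx
        M = np.linalg.inv(A + gamma * np.eye(n))
        return M @ (gamma * x + fx), M @ (gamma * y + fy)

Since (A + γI) is fixed, it can be factored once and reused in every iteration, which is what makes each step O(n) in practice.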

Amini et al [3] showed an alternative solution to the functional using dynamic programming rather than variational calculus. This approach enables introducing "hard constraints" that must be satisfied completely, as opposed to the energy minimization formulation, which reaches equilibrium when the different forces "compromise" and balance each other.

f. Following Work

Much work has been done on improving the basic model of Kass et al. Issues that were addressed include: speeding up the convergence rate (scale-space solutions), stabilizing the solution with respect to initial conditions (local minima problems), dealing with multiple objects and topological changes, suggesting different image force and external force terms, and introducing hard constraints [3, 54]. Some of the above problems are naturally solved by the Level Set formulation.

2 Level Set

The following three problems are the major drawbacks of the Snakes algorithm:

• Inability to handle topological changes (splits and merges).

• The need to initialize or push the snake close to an edge.

• Inability to deal with protrusions and sharp/thin structures (due to its internal forces).

Level Set is a PDE-based method that naturally deals with these problems. The Level Set method was first introduced by Osher and Sethian [39] in fluid dynamics. Applying it to image segmentation was simultaneously suggested by Caselles et al [8] and Malladi and Sethian [27]. To distinguish it from Snakes it is sometimes called an implicit (vs. explicit for Snakes) or geometric (vs. parametric) Active Contour Model. See [57] for a discussion of the relationship between the two. See also [52] for a good summary of Level Set's underlying mathematics.

a. Embedding the curve in a higher dimension

The basic idea is to embed the planar curve in a 2D scalar function $\phi(x,y)$ (a surface in $\mathbb{R}^3$) such that the original curve is retrieved by intersecting the surface with the xy plane. Since $\phi = 0$ on the curve, the extracted contour is called the zero level set (see Figure 2a). Over the rest of the plane $\phi$ is defined as the (signed) distance between the plane and the surface, where the sign is defined to be positive for points outside the curve and negative for points inside. Now, instead of evolving the curve in the 2D plane (as snakes do), we move the surface function $\phi$ with respect to the xy plane and retrieve the new contour as the zero level set of the function. This approach enables topological changes to occur naturally in the 2D plane while the surface in 3D space exhibits no such change (see Figure 2b).

Figure 2: Embedding the curve in a higher dimension: (a) The red contour is given by the intersection of the xy plane with the scalar function (surface). When the surface moves downward (in the –z direction) the contour evolves as the surface's front. (b) Topology changes can occur naturally. In this case the two red contours will eventually merge. (Taken from Sethian's home page).

Figure 3: The scalar function $\phi$ embedded in the image plane. (Taken from a presentation by ByungMoon Kim).

Getting back to the image world, $\phi$ can also be seen as a distance map that assigns each pixel $(x,y)$ the value of the $\phi$ function. Pixels that lie on the contour get the value 0, while the rest get their (signed) distance from the curve, where the sign is defined as above (see Figure 3).

Since the evolving front C(t) is the zero level set of the scalar function $\phi(x,y,t)$ for every time t, we get for the evolving contour:

$\phi(C(t), t) = 0$    (2.6)

Differentiating with respect to t using the chain rule, we get

$\phi_t + \nabla\phi \cdot C_t = 0$    (2.7)

Let F be the speed at which the contour propagates in the direction normal to the curve. Hence

$C_t = F\,\vec{n}$    (2.8)

Where

$\vec{n} = \dfrac{\nabla\phi}{|\nabla\phi|}$    (2.9)

Therefore equation (2.7) becomes:

$\phi_t + F\,|\nabla\phi| = 0$    (2.10)

And we get a PDE on $\phi$, with the initial condition $\phi(x,y,0)$ given by the initially drawn curve. This equation can be solved using finite-difference approximations for the spatial and temporal derivatives [27].

b. Choosing the function F

In the Active Contour Model we had to select the different energy terms in order to specify the convergence and smoothness of the system. The equivalent in the Level Set model is to choose the propagation speed terms defining F. The requirements are similar: we want to incorporate a regularization term ("internal"), a term that will encourage stopping at edges ("external") and an inflation term ("external balloon", like the one introduced in [11] for snakes).

To account for regularization, Malladi and Sethian [27] suggested a curvature-dependent speed $F(\kappa)$, for example $F(\kappa) = -\epsilon\kappa$ (concave points move faster in the normal direction).

To get an inflation force we add a constant term $F_0$:

$F = F_0 - \epsilon\kappa$    (2.11)

And so we are left with the edge-stop term. Malladi and Sethian multiply the above speed by the function:

$g(x,y) = \dfrac{1}{1 + |\nabla (G_\sigma * I)(x,y)|}$    (2.12)

This function behaves as an inverse of a gradient detector. It therefore approaches zero in the vicinity of an edge, bringing the velocity to a stop. The smoothing by the Gaussian filter $G_\sigma$ helps skip weak edges. Putting it all together in (2.10) we get:

$\phi_t + g(I)\,(F_0 - \epsilon\kappa)\,|\nabla\phi| = 0$    (2.13)

Incorporating the classical equation for the curvature $\kappa$:

$\kappa = \mathrm{div}\left( \dfrac{\nabla\phi}{|\nabla\phi|} \right) = \dfrac{\phi_{xx}\phi_y^2 - 2\phi_x\phi_y\phi_{xy} + \phi_{yy}\phi_x^2}{(\phi_x^2 + \phi_y^2)^{3/2}}$    (2.14)

and allowing the sign of $F_0$ to be positive or negative (see section 2.1(d)), we get the final Level Set flow as a PDE:

$\phi_t = g(I)\,(\epsilon\kappa + F_0)\,|\nabla\phi|$    (2.15)
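As a small illustration, the edge-stop map g of (2.12) can be computed as follows (Python with NumPy/SciPy; the smoothing width σ is an illustrative choice):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def edge_stop_map(image, sigma=2.0):
        """g(x,y) = 1 / (1 + |grad(G_sigma * I)|): ~1 in flat areas, ~0 near edges."""
        smoothed = gaussian_filter(image.astype(float), sigma)
        gy, gx = np.gradient(smoothed)
        return 1.0 / (1.0 + np.hypot(gx, gy))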

c. Implementation

In order to solve the Level Set problem, Equation (2.15) must first be discretized. Expanding the above equation we get:

$\phi_t = g(I)\,\epsilon\kappa\,|\nabla\phi| + g(I)\,F_0\,|\nabla\phi|$    (2.16)

Following the approach taken by [27, 57], the first term on the right-hand side is discretized using the second-order accurate central difference scheme given in [53], and the second term is discretized using the first-order upwind difference scheme given by [45].

The numerical solution of (2.16) gives the local spatial derivatives at each contour point, that is, "where to jump to next". We can now describe the general Level Set algorithm (a code sketch follows the steps):

1. Initialize the function $\phi$ using a distance transform from the initial curve.

2. Estimate the spatial derivatives (using finite differences) => calculate $\phi_t$ in (2.16).

3. Evolve $\phi$ locally, based on the former $\phi$ values and $\phi_t$ (using finite differences).

4. Go to 2.

5. When stopped (no advancement): find the final curve as the zero level set of $\phi$.
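A minimal sketch of this loop follows (Python/NumPy). For brevity it uses simple central differences everywhere instead of the proper upwind scheme of [45], so it is illustrative only:

    import numpy as np

    def evolve_level_set(phi, g, F0=1.0, eps=0.1, dt=0.1, n_iters=500):
        """Iterate phi_t = g * (eps*kappa + F0) * |grad phi| (Eq. 2.16)."""
        for _ in range(n_iters):
            gy, gx = np.gradient(phi)
            norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
            # Curvature: divergence of the normalized gradient (Eq. 2.14).
            dny_dy, _ = np.gradient(gy / norm)
            _, dnx_dx = np.gradient(gx / norm)
            kappa = dnx_dx + dny_dy
            phi = phi + dt * g * (eps * kappa + F0) * norm
        return phi  # the final curve is the zero level set of phi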

2.3 The Narrow Band Approach

One of the main problems of the Level Set method is the computational labor involved in lifting a 2D problem into the 3D world. Instead of evolving a 1D curve on an image, we evolve $\phi$ over the whole 2D plane, just to get its zero level set. Adalsteinsson & Sethian [1] suggested reducing computation by updating $\phi$ only in a narrow band around the zero level set (see Figure 4), using a distance transform from the curve. This requires extracting the zero-level-set curve and updating the band in each iteration. One refinement was to update the band (and the function $\phi$) only when the curve approaches the band limit. Another optimization is gained by limiting the image grid to the bounding box of the outer limit, so if the contour is shrinking we get a smaller and smaller image to handle. Applying all the above reduces computation from O(n3) to O(dn2) per time step, where d is the narrow band width.

Figure 4: Updating the function $\phi$ only in a narrow band (from the blue to the red lines) around the contour (black line). The overall grid is changed to be the bounding box of the outer limit (grey pixels), so if the contour is shrinking we deal with a smaller grid. (Taken from a presentation by N. Paragios).
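A sketch of how such a band can be recomputed from the current sign of $\phi$ (Python/SciPy; the band width is an illustrative parameter):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def narrow_band_mask(phi, width=5.0):
        """Boolean mask of pixels within `width` pixels of the zero level set."""
        inside = phi < 0
        # Each pixel's distance to the contour, from whichever side it is on.
        dist = distance_transform_edt(inside) + distance_transform_edt(~inside)
        return dist <= width

Only pixels inside this mask need their $\phi$ values (and derivatives) updated in each iteration.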

2.4 The Fast Marching Approach

In the specific case where the speed function F is strictly positive (or strictly negative), which means the scalar function $\phi$ is monotonically increasing (decreasing), a further optimization can be made. Suppose we could run the whole flow in advance and, for each point (x,y), record in a function T(x,y) the exact time the front reached it. T has a very interesting feature: not only does slicing it with the (x,y) plane (t=0) give the initial curve, but slicing at every height t gives the curve at time t. Looking at the projections of T on the (x,y) plane we get all the level sets as the future curve positions. We have turned the problem into a stationary one, where the scalar function doesn't change in time and doesn't move. If we knew T in advance we could get the curve for every time t.

By definition of T

$T(C(t)) = t$    (2.17)

Differentiating with respect to t we get

$\nabla T \cdot C_t = 1$    (2.18)

Using (2.8) and (2.9) we get

$|\nabla T|\; F = 1$    (2.19)

which is known as the Eikonal equation. This equation says that the gradient of the arrival-time function T is inversely proportional to the speed function F. Imagine the surface T raised over the (x,y) plane: if its gradient is steep (high $|\nabla T|$), then when moving up in T and slicing with a parallel plane, the new curve will be relatively close to the former one (low F).

a. Generating the surface T

To generate the surface T a greedy algorithm is used. It highly resembles Dijkstra's method for finding shortest paths on a graph [12]. Like it, the algorithm evolves the curve one pixel at a time. The surface is built bottom-up, one element at a time, always starting from the lowest element that can be added on top of the existing constructed front (see Figure 5).

Figure 5: Constructing the Fast Marching surface T. (a): The location of the lowest next element is marked by the arrow. (b): The element added. (Taken from Sethian's home page).

Another way of looking at it is in the image (xy) plane. The curve is evolved one pixel at a time to the "cheapest" next pixel location touching the existing front. As in Dijkstra's method, we calculate the cost of moving to all the non-visited neighbors of the curve. The neighbor with the lowest cost is advanced to, and the former curve pixel is marked as "visited". The "cost" here really means "the smallest next time step" and is calculated according to the discretized form of (2.19). It is clear that while building the surface T we actually evolve the zero level set C; therefore stopping requires no further level-set extraction.

Actually, Dijkstra's method as-is cannot be applied here, since it walks on the grid only through grid connections (no diagonals allowed). The solution is given by a similar greedy algorithm built into the numerical scheme for solving the Eikonal equation (2.19). It is called the ordered upwind method and was first introduced by Tsitsiklis [49] and later developed by Sethian et al. [46,49].
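A compact sketch of the resulting algorithm (Python, using the standard heapq module; a first-order update on a 4-neighborhood, so it is illustrative rather than the full scheme of [46,49]):

    import heapq
    import numpy as np

    def fast_marching(F, seeds):
        """Arrival times T satisfying |grad T| * F = 1 (Eq. 2.19) from seed pixels."""
        h, w = F.shape
        T = np.full((h, w), np.inf)
        done = np.zeros((h, w), bool)
        for r, c in seeds:
            T[r, c] = 0.0
        heap = [(0.0, r, c) for (r, c) in seeds]
        while heap:
            t, r, c = heapq.heappop(heap)
            if done[r, c]:
                continue
            done[r, c] = True  # smallest tentative time is final, as in Dijkstra
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not done[rr, cc]:
                    # Upwind update from the smallest known neighbor in each axis.
                    tx = min(T[rr, cc - 1] if cc > 0 else np.inf,
                             T[rr, cc + 1] if cc < w - 1 else np.inf)
                    ty = min(T[rr - 1, cc] if rr > 0 else np.inf,
                             T[rr + 1, cc] if rr < h - 1 else np.inf)
                    a, b = min(tx, ty), max(tx, ty)
                    inv_f = 1.0 / F[rr, cc]
                    if b - a < inv_f:  # both axes contribute: quadratic update
                        t_new = 0.5 * (a + b + np.sqrt(2 * inv_f ** 2 - (a - b) ** 2))
                    else:              # one-sided update
                        t_new = a + inv_f
                    if t_new < T[rr, cc]:
                        T[rr, cc] = t_new
                        heapq.heappush(heap, (t_new, rr, cc))
        return T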

2.5 Geodesic Active Contours

Geodesic Active Contours (GAC here) is a model that theoretically unifies the explicit Active Contour Models (Snakes) with the implicit ones (Level Sets). It was simultaneously introduced by Kichenassamy et al [24] and Caselles et al [9] (I will follow the formulation introduced by the latter). The model starts from the Snakes formulation and shows that it is equivalent to finding a geodesic curve in a Riemannian space with a metric based on the image content. It then reaches a PDE flow very much like Equation (2.13), only with an extra term that accounts for faster convergence to image gradients.

a. Geodesic Curves in a Riemannian space

In Riemannian geometry, assume a Riemannian manifold is defined by a metric g(). A geodesic curve is defined as the shortest path between two points on that manifold, with respect to the metric g(). For the Euclidean space (with the Euclidean metric) it is obviously the straight line. For the geometry defined on a sphere's surface the geodesic curve will be an arc of a great circle, etc. In fact, a geodesic curve can be considered a generalization of a straight line with respect to a specific Riemannian metric.

b. The GAC Functional

Geodesic Active Contours formulation starts from the basic snake functional (2.2) with the following modifications:

• Internal forces: Ignore the second-order (rigidity) term in (2.5) (to be justified later) and prefix the first-order term with a fixed $\alpha$ (common practice).

• External forces: Take only the gradient term in (2.4), ignoring other image features or external constraints; but instead of taking $-|\nabla I|^2$, a generalization is made to $g(|\nabla I|)^2$, where g() is a strictly decreasing function. In practice, Caselles et al choose the same g() as in (2.12).

The energy functional (2.2) then becomes

$E(C) = \alpha \int_0^1 |C'(p)|^2\, dp \; + \; \lambda \int_0^1 g\big(|\nabla I(C(p))|\big)^2\, dp$    (2.20)

Using some principles from dynamical systems and some "basic math", it is shown in [9] that minimizing this functional is equivalent to minimizing

$L_R(C) = \int_0^{L(C)} g\big(|\nabla I(C(s))|\big)\, ds$    (2.21)

Now, comparing this result to the classical length calculation of a curve in Euclidean space:

$L(C) = \int_0^1 |C'(p)|\, dp = \int_0^{L(C)} ds$    (2.22)

we can give Equation (2.21) the following interpretation: minimizing the energy functional (2.20) is equivalent to looking for a "minimal length" curve (a geodesic curve), where length is measured with a special "weight" $g(|\nabla I|)$. Minimization is therefore done not only on the length of the curve (which gives the curve a smooth shape, and justifies the removal of the rigidity term) but also on its preference to pass through significant image gradients, where g() is small.

The appendix in [9] shows that, using the Euler-Lagrange formulation, the functional (2.21) is minimized by satisfying the flow:

$C_t = g(I)\,\kappa\,\vec{n} - (\nabla g \cdot \vec{n})\,\vec{n}$    (2.23)

where $\kappa$ is the curvature defined in (2.14).

c. The GAC Flow

We now return to the Level Set paradigm. It is shown in [9] that a curve that satisfies the flow (2.23) is actually a level set of a 2D scalar function that satisfies

$\phi_t = g(I)\,\kappa\,|\nabla\phi| + \nabla g \cdot \nabla\phi$    (2.24)

Equation (2.24) shows some similarity to the Level Set flow (2.15), with the following differences:

• Equation (2.24) is missing the $F_0$ "balloon force". Caselles et al show that the GAC model usually converges to the right solution also without this term, only more slowly. To speed up convergence they do suggest introducing it, by adding it to $\kappa$ as in (2.15) (see below).

• Equation (2.24) adds the term $\nabla g \cdot \nabla\phi$. This term is actually the main contribution of GAC over Level Sets. It attracts the curve toward the boundaries of the object better than the g() term in (2.15). Moreover, in cases of high variation in gradient values, the Level Set approach may fail to stop, since g() reduces the energy to zero only for high gradient values. Low values, or gaps in the edge, can therefore pose a problem in stopping the curve. The newly introduced $\nabla g \cdot \nabla\phi$ term allows tracking boundaries with high variation in their gradients, including small gaps.

Adding $F_0$ to Equation (2.24) we get the final Geodesic Active Contour flow, which is solved with numerical approaches similar to the Level Set:

$\phi_t = g(I)\,(\kappa + F_0)\,|\nabla\phi| + \nabla g \cdot \nabla\phi$    (2.25)

2.6 Active Regions and Level Sets

So far we have only dealt with curves that are attracted to image boundaries, neglecting any homogeneity of the segmented regions and background. The first to suggest using Level Sets for region-based segmentation were Chan and Vese [10], based on the Mumford-Shah [36] framework. Later work by Paragios and Deriche [40] incorporates both boundary-based (GAC) and region-based Level Sets.

a. The Mumford-Shah Framework

Mumford and Shah describe in [36] the image segmentation problem as follows. The image domain A is split into sub-regions Ai, and on each a function Ji is defined as an approximation of the original image function I. The segmentation problem is to look for the best partitioning Ai such that the total set of functions J best approximates I.

Best approximation can be achieved by minimizing the following functional:

$E(J, C) = \mu \int_A (J - I)^2\, dA \; + \; \nu\,|C| \; + \; \lambda \int_{A \setminus C} |\nabla J|^2\, dA$    (2.26)

where C is the set of all borders separating the regions Ai, |C| is its total length, and $\mu, \nu, \lambda$ are the component weights.

• The first term accounts for J being the best approximation of I over the whole image area A.

• The second term accounts for the boundary C being as short as possible.

• The third term makes sure J (and hence I) doesn't vary much on each Ai (not including the borders)

Let us now consider a special case with the following limitations:

1. The binary case - there is only one object to split from the background.

2. J is piecewise constant, that is, $J_i = c_i$ (a constant) on each $A_i$.

It is obvious that:

• The third term in (2.26) vanishes.

• The best approximation is achieved when each $c_i$ is the mean of I over $A_i$.

We therefore get the Mumford-Shah piecewise constant functional

$E(C) = \int_{inside(C)} (I - c_1)^2\, dA \; + \int_{outside(C)} (I - c_2)^2\, dA \; + \; \nu\,|C|$    (2.27)

where $c_1$ and $c_2$ are the object and background means respectively. Note that minimizing the first and second terms actually means minimizing the variance in each of the piecewise constant areas.

The approximation J of the image can be considered a "Cartoon" of the image, drawn with segmented areas of constant color. See Figure 6 for an example:

Figure 6: (a): The original image. (b): A "Cartoon" image obtained by a piecewise constant approximation of the original image. (Taken from a presentation by Z. Kato).

b. Active Contours without Edges

Chan and Vese [10] added an extra regularization term $\nu \cdot \mathrm{Area}(inside(C))$ to the functional to encourage small areas. They also split the data-term weight into $\lambda_1$ and $\lambda_2$. We therefore get:

$E(c_1, c_2, C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}(inside(C)) + \lambda_1 \int_{inside(C)} |I - c_1|^2\, dA + \lambda_2 \int_{outside(C)} |I - c_2|^2\, dA$    (2.28)

Their main contribution with this algorithm, however, is the incorporation of the solution into the Level Set framework, involving only region (rather than contour) information. Optimization is done using only the principles in (2.28), completely disregarding edges; hence the name "Active Contours without Edges".

The general idea is to switch from a functional built from a contour integral as well as area integrals (2.28) to a single area-based integral over the whole image domain. This involves replacing the unknown contour C with an unknown surface function $\phi$ that matches the Level Set function definition: positive outside the contour, negative inside and 0 on the contour itself. Doing so enables solving the area-functional optimization using Level Set methods.

To enable the above, the article introduces the Heaviside (step) function $H(\phi)$ and the Dirac delta (unit pulse) function $\delta(\phi)$. Each of the terms in (2.28) is changed into an area integral over the whole image domain A. For example, the equality below holds since $H(\phi)$ is defined to be 1 on the area outside of the object.

$\int_{outside(C)} (I - c_2)^2\, dA = \int_A (I - c_2)^2\, H(\phi)\, dA$    (2.29)

A similar trick is done by multiplying the interior term of (2.28) by $(1 - H(\phi))$ and the contour term by $\delta(\phi)\,|\nabla\phi|$. Putting it all together, we get a new functional, using $\phi$ instead of C and integrating all terms over the whole image. Using the Euler-Lagrange equation, it is shown in [10] that the new functional is minimized by the flow:

$\phi_t = \delta(\phi)\left[\mu\,\kappa - \nu - \lambda_1 (I - c_1)^2 + \lambda_2 (I - c_2)^2\right]$    (2.30)

where $\kappa$ is the curvature given by (2.14). Obtaining the final curve is done as in the Level Set framework: iterating the above flow using finite differences until convergence is reached and then extracting the zero level set.
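A sketch of a single iteration (Python/NumPy; it uses a smoothed delta function and the simple central-difference curvature from before, and takes $\phi > 0$ inside the object, so the parameter choices are illustrative):

    import numpy as np

    def chan_vese_step(phi, I, mu=0.2, nu=0.0, lam1=1.0, lam2=1.0, dt=0.5, eps=1.0):
        """One step of the flow (2.30); phi > 0 inside the object here."""
        inside = phi > 0
        c1 = I[inside].mean() if inside.any() else 0.0      # object mean
        c2 = I[~inside].mean() if (~inside).any() else 0.0  # background mean
        gy, gx = np.gradient(phi)
        norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
        dny_dy, _ = np.gradient(gy / norm)
        _, dnx_dx = np.gradient(gx / norm)
        kappa = dnx_dx + dny_dy
        delta = eps / (np.pi * (eps ** 2 + phi ** 2))  # smoothed Dirac delta
        return phi + dt * delta * (mu * kappa - nu
                                   - lam1 * (I - c1) ** 2 + lam2 * (I - c2) ** 2)

Note that c1 and c2 are re-estimated in every iteration as the current region means, which is exactly the "best approximation" condition above.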

c. Geodesic Active Regions

Let us lay out some of the problems with the methods presented so far:

• We have used only edge-based or region-based methods to obtain the segmenting curve, with no combination of the two.

• Chan and Vese take the first step toward looking at interior object characteristics, but assume they are closely approximated by a piecewise constant function. There is no support for statistical information or texture-based analysis in the regions.

• Segmentation is done into two classes – object and background. Even the Level Set method, which supports topology changes, segments multiple regions as a single class using a single Level Set function.

One way to solve the first problem is to use combined functionals. For example, in [21] a 3D segmentation technique is suggested using a functional that is a linear combination of an edge term, a minimal variance (Mumford-Shah) term and a GAC term. Segmentation is achieved as a sequence of binary partitions, each one initiated by the user.

Paragios and Deriche [40] suggest a unified method to account for all the above problems. Their functional includes a contour term and a region term. The contour term includes minimal-length regularization and is based on Geodesic Active Contours. The region term is based on the Mumford-Shah space-partitioning framework and is aimed at separating intensity properties into classes.

The model assumes a priori information on the statistics of the boundary and the regions, generalizing the assumption of a piecewise constant function. The information is given in the form of probability functions for the objects' (curve) location and for the intensity properties of the object and the background. These probabilities are used in constructing the energy functional. The probabilistic considerations enable extending the method to texture-based segmentation.

The formulation is extended to many object classes, using multiple scalar functions $\phi_i$ with individual propagations. This imposes two possible problems: one of mutual exclusion, that is, making sure no pixel is labeled as belonging to two regions, and the other of full coverage, meaning no pixel is left unlabeled. This is achieved using the multiphase motion scheme of Zhao et al [59], by introducing new artificial forces: one that penalizes a pixel for being labeled by multiple regions and another that penalizes it for being left unlabeled.

3. Live Wire Methods

So far we have dealt with interactive models for image segmentation based only on energy minimization. As seen, these methods have their limitations:

• They are model-based to some extent. Even though a template is not matched, the model makes an assumption about overall low curvature. Moreover, many of the algorithms will fail to converge to the right edge unless placed near it, or if the initial contour isn't fully inside or fully outside the object to segment (e.g. Snakes).

• User interaction is limited to certain points in time. Interaction is in "batch mode": the user can initialize the curve, put some constraints (mainly in [23]) and then "wait and watch". If the result is unsatisfactory he has to modify the curve and let it run again. The user cannot dynamically control the evolution of the curve; for example, if the contour approaches a strong edge but not the desired one, there is no way to enter that knowledge on the fly.

Live Wire assumes no prior knowledge about the object to segment, and it gives dynamic user interaction. The user controls and affects the segmentation all throughout the process. It is, to some extent, a combination of the benefits of manual segmentation and computerized segmentation. In manual segmentation we have the benefit of human recognition to reliably detect the object, while in computer-based segmentation we have the benefit of localization & accuracy. Live Wire leaves the control to the user and the accuracy to the computer.

Technically speaking, the methods discussed so far all use variational calculus to solve energy minimization problems. The following methods deploy Dynamic Programming (DP) as the solution tool. It must be noted that a DP solution to the Snakes problem was also suggested by Amini et al. in [3].

The Live Wire algorithm was developed independently by two groups – Mortensen & Barrett [30-34] and Falcao & Udupa [15-16] (who initially worked together on this [35, 50]). Later versions were developed using different boundary features such as Local Phase [37, 38]. In section 3.1 I will present the general Live Wire algorithm as formulated by Mortensen & Barrett [30-34]. I will then (section 3.2) present the changes in the work by Udupa et al., and will close (section 3.3) with the Local Phase version of the algorithm by O'Donnell et al. [37, 38].

3.1 Live Wire (Intelligent Scissors)

In this section I will present the general Live Wire algorithm, giving examples of feature extraction from Mortensen and Barrett [30, 33]. Live Wire (also known as Intelligent Scissors) is a Dynamic Programming method for extracting a desired object's contour. The image is treated as a 2D directed weighted graph in which the nodes are image pixels and the arcs are 8-connectivity neighbor links. Finding the boundary is implemented as finding the shortest path in the graph, where a low-cost path corresponds to passing near edges and other boundary-related features. For finding the optimal path, Dijkstra's algorithm [12] is used, which proves the path globally optimal.

To start the interaction, the user clicks the mouse to create a seed (starting) point. Further user interaction is done by moving the mouse, while a "Live Wire" stretches from the seed point to the current mouse position and "snaps" onto the edges, until the drawing is complete (see Figure 7). The user can, by moving the mouse, change the "snapped" contour. The user can fix a satisfactory contour by clicking a new seed point, which will then be the new starting point for the path search. Marking an accurate seed point can be a difficult task; therefore the system snaps to the strongest gradient point in the vicinity of the "click".

If the contour is left unchanged for a certain amount of time, "boundary cooling" occurs and an automatic seed point is added. The model also incorporates "on the fly training", meaning new contour points learn from previous ones (see below). This enables following the correct edge, rather than the strongest one.

Figure 7: Live Wire behavior: The user-cursor path is shown in white. The corresponding detected boundary is shown in orange. The current Live Wire path at an intermediate point is drawn from "Free Point" through "Boundary Point" (green segment) to "Start Point" (orange segment). (Taken from [30]).

a. Feature Extraction

As mentioned, we represent the image as a directed weighted graph in which the pixels are nodes and 8-connectivity connections are arcs. An arc weight is low if the connected pixels lie on a strong edge, and vice versa. For edge detection Mortensen and Barrett take the Laplacian zero-crossing, the image gradient and the gradient direction. To be more specific, the cost of a link between pixels p and q is given as follows ([30]):

$l(p,q) = \omega_Z\, f_Z(q) + \omega_G\, f_G(q) + \omega_D\, f_D(p,q)$    (3.1)

where:

• $f_Z(q)$ is 0 or 1, depending on whether q lies on a zero-crossing of the Laplacian (0 on a zero-crossing, so such pixels are favored).

• $f_G(q) = 1 - G(q)/\max(G)$, where the gradient magnitude $G$ is maximized over several kernels of different sizes.

• $f_D(p,q)$ measures the agreement between the link direction and the local edge directions: let $L(p,q)$ be the link vector between p and q, and let $D(p)$ be the 90-degree rotation of the gradient vector at pixel p; the cost is built from the angles between $D(p)$, $L(p,q)$ and $D(q)$. Note that this measure favors situations (gives low cost) where the pq link is generally along the image edge (the 90-degree rotation of the gradient vector) of both pixels. Situations where the gradients are both high but not in the direction of the link are not favored.

• To account for distance differences between direct and diagonal neighbors, direct neighbors' weights are multiplied by $1/\sqrt{2}$. A small sketch of the per-pixel parts of this cost follows below.
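Below is that sketch (Python with NumPy/SciPy; the weights follow the spirit of [30] but are illustrative, and the direction term $f_D$ is omitted since it depends on the specific link):

    import numpy as np
    from scipy.ndimage import gaussian_laplace, sobel

    def local_costs(I, w_z=0.43, w_g=0.14):
        """Static (per-pixel) part of the link cost (3.1)."""
        I = I.astype(float)
        # f_Z: 0 on Laplacian zero-crossings, 1 elsewhere.
        log = gaussian_laplace(I, sigma=1.0)
        crossing = (np.sign(log) != np.sign(np.roll(log, 1, axis=0))) | \
                   (np.sign(log) != np.sign(np.roll(log, 1, axis=1)))
        f_z = np.where(crossing, 0.0, 1.0)
        # f_G: strong gradients -> low cost.
        G = np.hypot(sobel(I, axis=1), sobel(I, axis=0))
        f_g = 1.0 - G / (G.max() + 1e-8)
        return w_z * f_z + w_g * f_g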

b. Dynamic Programming problem formulation

As the user first clicks the starting seed point, the DP algorithm starts expanding the search paths. Actually, the algorithm expands a shortest-path tree [12] (and not a single path), so when done, the minimal path from every image point to the seed point is available. Now, when the user moves the mouse, the already-computed path is retrieved and drawn.

During the process, pixels are marked as either processed (expanded), not-seen (not reached yet) or active (the "front" of the expansion).

The shortest-path tree is expanded as follows (a code sketch follows the list):

1. Extract from the active list the pixel p with the minimal cumulative cost to the seed point. Mark it as expanded.

2. For each non-processed neighbor q of p: add q to the active list with total cost $T(q) = T(p) + l(p,q)$, keeping the cheaper cost if q is already active.
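A sketch of this expansion (Python, using the standard heapq module for the active list; cost_fn is any link cost, e.g. built from Equation (3.1)):

    import heapq
    import numpy as np

    def live_wire_tree(cost_fn, shape, seed):
        """Shortest-path tree from `seed` over an 8-connected pixel grid.

        Tracing `parent` from any pixel back to the seed gives the live-wire
        path drawn when the mouse hovers over that pixel.
        """
        h, w = shape
        total = np.full((h, w), np.inf)
        total[seed] = 0.0
        parent = {}
        processed = np.zeros((h, w), bool)
        heap = [(0.0, seed)]
        while heap:
            t, (r, c) = heapq.heappop(heap)
            if processed[r, c]:
                continue
            processed[r, c] = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < h and 0 <= cc < w \
                            and not processed[rr, cc]:
                        t_new = t + cost_fn((r, c), (rr, cc))
                        if t_new < total[rr, cc]:
                            total[rr, cc] = t_new
                            parent[(rr, cc)] = (r, c)
                            heapq.heappush(heap, (t_new, (rr, cc)))
        return total, parent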

c. Path Cooling and On The Fly Training

Path cooling refers to the system's ability to add "seed points" automatically along the way, thereby fixing the path from the new seed point to the starting point. This relieves the user from manually adding more and more seeds. A point on the path is considered "cool" enough, and turned into a seed point, based on the time it has stayed unchanged and the number of repeated drawings it has appeared in.

On-the-fly training refers to the system's ability to adapt, favoring new pixels with characteristics similar to the previous edge points, rather than taking the next strongest edge point. This enables the Live Wire to follow a weak edge next to a strong one. Training is achieved by looking at the last pixels of the already-detected boundary and preferring next pixels with similar edge strength. Figure 8 shows the Live Wire path with and without on-the-fly training.

Figure 8: (a) Without "On the Fly Training": the Live Wire leaves the woman's face for the man's jacket (a stronger close-by edge). (b): Training helps the Live Wire stay on the woman's face. (Taken from [30]).

3.2 Live Wire and Live Lane

As mentioned, in parallel with Mortensen and Barrett's work, Live Wire was also developed by Falcao and Udupa [15-16]. Below are some of their variants in feature selection and user interaction:

• Three new features are introduced (later also used by Barrett et al [33]); all of them are meaningful only after training. These are the pixel's intensity (normalized to 0-1) and the "inside" and "outside" pixel intensities. The last two measures refer to the pixel intensities n pixels away from the current pixel in the direction (and minus-direction) of the gradient vector at the point (n is an algorithm constant).

• Further graph understanding is used in order to prune the expansion of the search tree in irrelevant directions.

• They also extended the algorithm to 3D (and 4D) using slice-by-slice segmentation. The previous slice's segmentation is taken into account in the costs of the current slice's segmentation.

• Falcao and Udupa introduce a variant of Live Wire called Live Lane, in which the user does not "click" anymore, other than for the initial seed. Instead, the system restricts the search to a limited "lane" around the object boundary. The lane's width adapts according to the image gradient: it gets wider near stronger edges, assuming the user will move faster along strong edges and slower near weaker ones.

3.3 Phase-Based Live-Wire

Phase-based Live Wire was introduced by O'Donnell et al [37, 38]. It consists of a generally similar algorithm for path extraction, only with a new edge detector based on Local Phase.

a. Local Phase

The global phase (or just phase) of a signal is defined as the complex argument (phase) of the Fourier transform of the signal. The phase of a pure cosine signal is 0, since its Fourier transform is real and hence the complex argument (phase) is a constant 0. Similarly, the phase of a pure sine signal is a constant $\pi/2$. The phase can therefore be considered as a "shift from pure cosine", which is the zero-phase signal.

Now remember that a pure cosine is an even function (symmetric about the origin) while a pure sine is an odd function – so phase can be considered as the closeness to symmetry (or asymmetry) as the phase approaches 0 (or $\pi/2$) respectively.

Local Phase is a measure similar to phase, only with respect to a small window. The signal is looked at through a small window around each point, noting its symmetry (cosine, 0 phase) or asymmetry (sine, $\pi/2$ phase) within that window. See for example Figure 9, comparing the global and local phase at each point along a sinusoidal signal.

Figure 9: Phase vs. Local Phase. Top: a pure cosine signal. Bottom: the local phase at each point is equivalent to the Fourier phase that would be calculated if that point were the origin. The local window was placed where the signal locally resembles a pure sine and therefore, at the bottom, has a local phase of $\pi/2$.

b. Local Phase and Edge Detection

Local phase can serve as an edge detector based on the following observation: an edge in the image is where the signal is most asymmetric, and therefore the local phase approaches $\pi/2$ (or $-\pi/2$). See Figure 10 for details.

Figure 10: Edges in a one-dimensional signal. An up-going edge corresponds to a local phase of $\pi/2$, while a down-going edge corresponds to a local phase of $-\pi/2$.
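In 1D the local phase can be approximated from the analytic signal; a small sketch (Python/SciPy) under that simplification (the references above use quadrature filters instead):

    import numpy as np
    from scipy.signal import hilbert

    def local_phase_1d(signal):
        """Local phase of a 1D signal via the analytic signal.

        Edges appear where the phase approaches +pi/2 (up-going)
        or -pi/2 (down-going).
        """
        analytic = hilbert(signal - signal.mean())  # remove the DC component first
        return np.angle(analytic)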

Extending edge detection using local phase to 2D requires handling directionality. This is obtained by 2D wavelet filters like Gabor filters or, in [37, 38], quadrature filters whose radial frequency response is Gaussian on a logarithmic scale. See [38] for more details.

c. Live Wire Implementation

Using local phase as graph weights is not enough since the local phase measure is good at localizing edges but is independent of edge strength. Therefore, another "certainty" measure is introduced using the magnitude of the wavelet responses. A weighted sum of these two measures (phase and magnitude) gives the graph weights.

The optimal path implementation is done much the same as in [30, 33] adding graph understanding to compute only as much as needed (as in [15, 16]). Although training is implemented, it is shown that in some cases this algorithm performs well enough without training.

4. Graph Cut Methods

So far we have dealt mainly with contour detection, neglecting the objects' internal region (with some exceptions like Geodesic Active Contours). We have also demanded, in most of the above algorithms, that user interaction be either very intensive (Live Wire) or start close to the boundary, or else it will fall into a local minimum (Active Contours). Graph Cut segmentation suggests a new way of interaction in which the user only "marks" region pixels to obtain the segmentation. The Graph Cut method is both a region and boundary detection scheme, although interaction is only region-based. In Lazy Snapping [25] (see later), boundary editing is also incorporated.

Graph Cut methods treat the image segmentation problem as a min-cut problem on graphs. Originally suggested by Boykov and Jolly [5], the method is now widely used in many variants. In section 4.1 I present Boykov and Jolly's original work. In section 4.2 I present an almost automatic variant called GrabCut [42] and in section 4.3 I present Lazy Snapping [25] – a fast tool that uses Graph Cut both as a region-based and as a boundary-based image segmentation. For a comprehensive comparison between Graph Cut methods and the former methods presented here see [4].

4.1 Graph Cut

As said, Graph Cut segmentation was originally introduced by Boykov and Jolly [5]. It is shown here as a 2D image segmentation technique but all arguments hold for N-D as well (in fact, it was originally introduced as an N-D method). The number of objects is also arbitrary. For simplicity of explanation I describe the process with one object (which I'll refer to as "foreground") and a background, but all arguments hold for multiple objects. The user marks "foreground" and "background" pixels using free-form, wide brush strokes (see Figure 11). The system uses the marked pixels as foreground and background seeds and imposes "hard constraints" on them, meaning the seed pixels must be labeled exactly as the user marked in the final segmentation. Apart from that, soft constraints reflect the way segmentation is to be obtained on the rest of the image.

[pic]

Figure 11: Foreground and background marks are made using wide brush strokes. The system infers foreground and background seed pixels (hard constraints) and segments accordingly. (Taken from [25])

a. The segmentation problem

A general segmentation is described as a labeling vector X in the form of:

X = (x_1, x_2, …, x_|P|),   x_p ∈ {frg, bkg}        (4.1)

where the optimal segmentation is the vector X that minimizes the energy:

E(X) = λ · Σ_{p∈P} E1(x_p) + Σ_{(p,q)∈N} E2(x_p, x_q)        (4.2)

In the above equations, P is the set of all image pixels, N is the set of neighboring pixel pairs (8-connectivity is chosen), E1 is called the region term, E2 the boundary term and λ is a constant that controls the relative significance of the two terms. For a given labeling, E1(x_p) is low if p indeed fits being a foreground / background pixel as segmented. For two adjacent pixels with different labels, E2(x_p, x_q) is low if p and q are very different, which implies they indeed fit to represent a border. Note that different border detection schemes could be used here (e.g. being near an edge etc.). Demanding Equation (4.2) for all unmarked pixels is referred to as applying soft constraints on those pixels.

Recall we have demanded a hard constraint on the pixels the user has marked, which means that in the resulting segmentation, the following labeling should hold:

x_p = frg  for all p ∈ F,   x_p = bkg  for all p ∈ B        (4.3)

where F and B are the user-marked foreground and background pixels respectively.

b. Segmentation as a Graph Cut

The image is now treated as an undirected weighted graph G = (V, E), where for each pixel p there exists a node in V, and for each 8-connected pixel pair there exists an edge in E. Two special terminal nodes called source (S) and sink (T) are added to represent the "foreground" and "background" entities respectively. Edges are added from each pixel node to the two terminal nodes. We will call edges between neighboring pixels n-links and edges from a pixel node to a terminal node t-links. See Figure 12 for a graphic explanation.

[pic]

Figure 12: Foreground/background pixel nodes are marked in white/brown (resp.). The source and sink terminal nodes are drawn outside the image. n-links are drawn in yellow and t-links are drawn in white/brown. Links with high weight are drawn thick. The minimal cut best separates the source, with all the foreground pixels, from the sink, with all the background pixels. (Taken from a GrabCut presentation by Rother et al)

A cut in a graph is a subset of the edges that, if removed, splits the graph into two. The cost of the cut is measured by the sum of the weights of the cut edges. The minimal cut is the cut with the minimal sum of weights out of all possible cuts. The problem of finding a minimal cut is closely related (actually equivalent) to the max-flow problem in weighted graphs and can be computed in low-order polynomial time to obtain a globally optimal solution [6, 18, 19].

Back to the segmentation problem - each possible segmentation can be described as a cut that separates the source (foreground) from the sink (background). The minimal cut will generate the optimal segmentation with respect to the properties built into the edge weights. We only have to select the correct weights that will minimize Eq. (4.2).

c. Defining the Edge Weights

Edge weights should reflect the soft as well as the hard constraints. The soft constraints should reflect the probabilities that a specific unmarked pixel p belongs to either foreground or background. These probabilities should be based on the pixels the user has already marked. To define those probabilities we simply use the foreground and background marked pixels to generate the normalized intensity histograms Pr(I|frg) and Pr(I|bkg) respectively. The hard constraints should be reflected in very high / very low edge weights that will force the marked pixels to be on the desired side of the cut.

[pic]

Figure 13: Defining the edge weights. (a) The user marks only one seed pixel for foreground and one for background (marked O and B respectively). (b) A weighted graph is generated. Weights for t-links to the background terminal node T and for n-links are shown (t-links to the foreground terminal S are equivalent). (c) Minimal cut. (d) Imposed segmentation. (Taken from [5])

We can now define the edge weights for the t-links of both the hard and soft constraints (see Figure 13 for reference):

weight({p,S}) = −ln Pr(I_p|bkg) for p ∈ U;   K for p ∈ F;   0 for p ∈ B
weight({p,T}) = −ln Pr(I_p|frg) for p ∈ U;   0 for p ∈ F;   K for p ∈ B        (4.4)

where K is a large enough constant for such edges not to be included in the minimal cut (say, more than the sum of all soft-constraint weights). Note that U = P \ (F ∪ B) is the set of uncertain (unmarked) pixels. To understand table (4.4) take, for example, a pixel p with a high probability of being foreground, i.e. Pr(I_p|frg) is relatively big. It follows that −ln Pr(I_p|frg) will be relatively small and so will be the weight on the {p,T} link. In the final segmentation, this edge has a good probability of belonging to the cut, therefore separating p from T and leaving it on the S (foreground) side of the cut, as expected.

We are left with specifying the costs of the n-links. In order to correctly separate pixels with opposite foreground / background belonging, we must give such neighboring pairs a low-cost weight on their link (so their connecting edge will be included in the cut). The following cost function gives a low cost to pixel pairs with different intensities:

w(p,q) = exp(−(I_p − I_q)² / 2σ²) · 1/dist(p,q)        (4.5)

Note that normalization is done by pixel distance since adjacent pixels can relate in a diagonal direction. The use of σ gives the ability to disregard noise. Of course, any other measures for belonging to a region or being on two sides of the contour can be used (e.g. prior knowledge by training, edge detection for the boundary, etc.). To complete the weighting we need only to balance the n-link and t-link weights according to the constant λ from Equation (4.2).

It is clear that the weights above directly reflect the desired reduction of E1 and E2 in (4.2). Finding the minimal cut with respect to those weights will give us the best segmentation with respect to the overall energy. This is of course intuitive from the minimal-cut definition but is also proven mathematically in [5]. The authors also show how the weights can be adjusted if the user adds further hard constraints and how the min-cut can then be recomputed without repeating the whole process.
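To make the construction concrete, here is a minimal sketch in Python that builds the graph of Figure 12 for a gray-level image and extracts the minimal cut with networkx. The histogram smoothing constants, the 4-connectivity (the text uses 8-connectivity) and all parameter values are illustrative assumptions, not Boykov and Jolly's implementation:

import numpy as np
import networkx as nx

def graph_cut_segment(img, fg_seeds, bg_seeds, lam=1.0, sigma=0.1, K=1e6):
    """Boykov-Jolly style segmentation sketch. img: 2D array in [0,1];
    fg_seeds / bg_seeds: lists of (row, col) seed pixels."""
    bins = np.linspace(0, 1, 17)
    h_fg, _ = np.histogram([img[p] for p in fg_seeds], bins=bins)
    h_bg, _ = np.histogram([img[p] for p in bg_seeds], bins=bins)

    def prob(hist, v):
        # Crude normalized histogram lookup (Pr(I|frg) / Pr(I|bkg))
        return (hist[min(int(v * 16), 15)] + 1e-3) / (hist.sum() + 16e-3)

    h, w = img.shape
    G, fg, bg = nx.Graph(), set(fg_seeds), set(bg_seeds)
    for r in range(h):
        for c in range(w):
            p = (r, c)
            if p in fg:                      # hard constraints, table (4.4)
                ws, wt = K, 0.0
            elif p in bg:
                ws, wt = 0.0, K
            else:                            # soft constraints (t-links)
                ws = -lam * np.log(prob(h_bg, img[p]))
                wt = -lam * np.log(prob(h_fg, img[p]))
            G.add_edge("S", p, capacity=ws)
            G.add_edge(p, "T", capacity=wt)
            for q in ((r + 1, c), (r, c + 1)):   # n-links, Eq. (4.5)
                if q[0] < h and q[1] < w:
                    G.add_edge(p, q, capacity=float(
                        np.exp(-(img[p] - img[q]) ** 2 / (2 * sigma ** 2))))
    _, (src_side, _) = nx.minimum_cut(G, "S", "T")
    mask = np.zeros((h, w), bool)
    for p in src_side - {"S"}:
        mask[p] = True
    return mask

img = np.zeros((20, 20)); img[5:15, 5:15] = 0.9
print(graph_cut_segment(img, [(10, 10)], [(0, 0)]).sum())  # ~100 pixels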

4.2 Grab Cut

The GrabCut algorithm by Rother et al [42] takes the above Graph Cut segmentation even further. It applies the min-cut algorithm iteratively, inferring unknown knowledge from one iteration to the next. This allows it to reduce user interaction to specifying only the background (and not the foreground) pixels. User interaction is therefore minimized to placing a rectangle or a lasso around the object (at a reasonable distance) instead of marking brush strokes (see Figure 14). Moreover, GrabCut generalizes the segmentation to color images by replacing the gray-level histograms (based on foreground and background seeds) with Gaussian Mixture Models (GMMs) in RGB. The algorithm is extended to be an "image cutout" (rather than a segmentation) algorithm by using image matting for opacity values (α) on the objects' borders. This will not be described here as it is beyond the scope of this work.

[pic]

[pic] [pic]

Figure 14: Interaction using GrabCut. Top: the user places a rectangle around the object to segment. Bottom: The user manually draws a closed, free-form shape around the object. (Taken from [42])

a. Color Data and GMMs

To handle color images well, GrabCut describes the regional distributions as Gaussian Mixture Models instead of simple histograms. A GMM is a weighted combination of K (k = 1..K) Gaussian distributions where each Gaussian has its own mean μ_k and covariance matrix Σ_k, and is multiplied by a weight π_k. Overall, for both foreground and background, we get 2K Gaussians.

As for the boundary term, GrabCut uses the same equation as Graph Cut, only changing the intensity difference to a 3D Euclidean distance in RGB space.

b. The algorithm

The algorithm starts by initializing the background seeds with the internal border of the user-placed rectangle or lasso. Foreground pixels are provisionally initialized to the rest of the pixels (the rectangle's interior).

The algorithm somewhat resembles the k-means and EM algorithms. It iteratively matches each pixel with the best-suiting Gaussian (out of the 2K foreground and background Gaussians), creating a group of pixels associated with each Gaussian. It then recomputes each Gaussian's parameters: μ_k (the mean), Σ_k (the covariance matrix) and π_k (simply the ratio of the pixels that belong to it out of all pixels). The last step is to run the min-cut with those distributions as described above. Below is the full algorithm (a code sketch of steps a-b follows the list):

1. Initialize the labeling: background seeds from the rectangle/lasso border, provisional foreground inside.

Initialize the GMM parameters π_k, μ_k, Σ_k (evenly/randomly).

2. Repeat (until the energy is constant):

a. For each pixel, assign the best Gaussian => 2K clusters

b. For each cluster, calculate π_k, μ_k, Σ_k => 2 GMMs

c. Find the min-cut => the energy U decreases
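The following is a minimal sketch of steps (a)-(b) using a hard-assignment (k-means style) update; step (c) is the min-cut of section 4.1 and is omitted here. The initialization, the regularization constants and all names are my own illustrative choices, not GrabCut's actual code:

import numpy as np

def fit_gmm(data, K=5, iters=5):
    """Steps (a)-(b): assign each pixel its closest component, then
    re-estimate pi_k, mu_k, Sigma_k. data: (N, 3) float RGB samples."""
    rng = np.random.default_rng(0)
    means = data[rng.choice(len(data), K, replace=False)].copy()
    for _ in range(iters):
        d2 = ((data[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)                 # (a) best component per pixel
        for k in range(K):                    # (b) re-estimate parameters
            pts = data[labels == k]
            if len(pts):
                means[k] = pts.mean(0)
    comps = []
    for k in range(K):
        pts = data[labels == k]
        pi = len(pts) / len(data)             # component weight pi_k
        cov = (np.cov(pts.T) if len(pts) > 3 else np.eye(3)) + 1e-6 * np.eye(3)
        comps.append((pi, means[k], cov))
    return comps

def region_cost(pixel, comps):
    """-log GMM likelihood; serves as the t-link (region) weight."""
    lik = sum(pi * np.exp(-0.5 * (pixel - mu) @ np.linalg.inv(cov) @ (pixel - mu))
              / np.sqrt((2 * np.pi) ** 3 * np.linalg.det(cov))
              for pi, mu, cov in comps)
    return -np.log(lik + 1e-12)

The full loop would alternate fit_gmm on the current foreground and background pixel sets with a min-cut whose t-link weights come from region_cost of the opposite model, in the spirit of table (4.4).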

c. Incomplete Labeling

By placing the rectangle (or lasso) around the object, the user specifies the background pixels. Foreground pixels are not specified. This means that only background hard constraints are used. Still, due to the iterative minimization of the energy, in most situations this is sufficient to come up with GMMs, and a labeling, for both foreground and background. The algorithm uses provisional labeling for the foreground pixels, meaning they can subsequently be retracted. Only the background pixels are taken to be firm, never to be retracted.

d. User Editing

Further user editing is also supported. The user can add brush strokes as in Graph Cut for both foreground and background seeds. As in [5], in the event of repeated user editing, the previous min-cut run is continued after some weight modifications.

4.3 Lazy Snapping

So far we have shown segmentation methods in which interaction was through border drawing (Snakes and Level Sets), border tracking (Live Wire) or region marking (Graph Cut, GrabCut). It is clear that a method that combines both the ease of region marking and the accuracy of border editing can lead to better results. Lazy Snapping [25] does just that – it enables the user to start by specifying foreground and background marks, and to edit the result by border editing. Furthermore, it attacks the performance problem of extensive region-based calculation by working first at a coarser scale. This fits well with the dual interaction scheme – region marking produces coarse-scale segmentation while border editing fixes errors at an accurate (pixel) scale.

Like GrabCut, Lazy Snapping is designed as a full image-cutout tool providing comfortable user interaction and real-time response (after every user mark/edit). As such it also deals with border matting and opacities (which, again, will not be handled here).

a. Weights Modification

The Lazy Snapping algorithm uses a similar graph construction and weighting as Graph Cut, with a few modifications. It, too, works on color images, but instead of histograms for estimating the color similarities (region term) it uses the following scheme: the foreground and background seed pixels are clustered using the K-means algorithm (see [14]), resulting in clusters centered at K_n^F and K_m^B (foreground and background respectively). The number of clusters was chosen to be 64. Then, two measures of distance to the nearest cluster center are defined:

d_p^F = min_n ‖C(p) − K_n^F‖,   d_p^B = min_m ‖C(p) − K_m^B‖        (4.6)

where C(p) is the color of pixel p and distances are measured in RGB space. Using this, the last line in table (4.4) is changed and the table becomes:

weight({p,S}) = d_p^B / (d_p^F + d_p^B) for p ∈ U;   K for p ∈ F;   0 for p ∈ B
weight({p,T}) = d_p^F / (d_p^F + d_p^B) for p ∈ U;   0 for p ∈ F;   K for p ∈ B        (4.7)

The border term (4.5) is changed to the following function:

E2(x_p, x_q) = |x_p − x_q| · 1 / (1 + ‖C(p) − C(q)‖²)        (4.8)
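A small sketch of this region term (Eqs. (4.6)-(4.7)), assuming k-means clustering from scipy; the cluster count handling and all names are illustrative:

import numpy as np
from scipy.cluster.vq import kmeans2

def region_term(colors, fg_seed_colors, bg_seed_colors, K=64):
    """Cluster the seed colors with k-means, then score each uncertain
    pixel by its distance to the nearest foreground / background center.
    colors: (N, 3) RGB rows of the uncertain pixels."""
    kf, _ = kmeans2(fg_seed_colors.astype(float),
                    min(K, len(fg_seed_colors)), minit='points')
    kb, _ = kmeans2(bg_seed_colors.astype(float),
                    min(K, len(bg_seed_colors)), minit='points')
    dF = np.linalg.norm(colors[:, None] - kf[None], axis=2).min(1)  # d_p^F
    dB = np.linalg.norm(colors[:, None] - kb[None], axis=2).min(1)  # d_p^B
    e1_frg = dF / (dF + dB + 1e-12)   # cost of labeling p foreground
    e1_bkg = dB / (dF + dB + 1e-12)   # cost of labeling p background
    return e1_frg, e1_bkg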

b. Graph Cut on Image Components

The original Graph Cut algorithm creates a graph with a node per pixel. To allow fast response to user interaction, Lazy Snapping constructs a graph where each node is a small image component. For this, the image is first over-segmented using the watershed algorithm [51]. The graph is then constructed with nodes representing segmented components and n-links representing neighboring components. The segments' mean colors are used instead of pixel intensities.

Region weights are obtained as in (4.7). For the border weights, Equation (4.8) is used, except that it is weighted by the shared boundary length between the regions.

Watershed segmentation and graph construction can be done once (e.g. when loading the image) allowing for very fast segmentation, following every user action.

c. Border Editing

The resulting segmentation can be subject to errors, either due to the watershed algorithm or because of using Graph Cut on a coarse version of the image. A correction step must be supplied to fix these errors at a fine scale.

To do so, the segmentation result should first be converted into a polygonal contour. First, the polygon is initialized with the point of highest curvature. Then, at each iteration, the boundary point furthest away from the current polygon is added. The process ends when the distance of the furthest remaining boundary point from the polygon is small enough.
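A sketch of this polygon-fitting loop is given below; approximating "highest curvature" by the sharpest turning angle, and the tolerance value, are my own assumptions:

import numpy as np

def seg_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def fit_polygon(boundary, tol=2.0):
    """Iteratively fit a polygon to an ordered (N, 2) boundary: seed with
    the sharpest turn, then keep adding the boundary point furthest from
    the current polygon until all points lie within tol pixels."""
    prev = np.roll(boundary, 1, axis=0) - boundary
    nxt = np.roll(boundary, -1, axis=0) - boundary
    cosang = (prev * nxt).sum(1) / (np.linalg.norm(prev, axis=1)
                                    * np.linalg.norm(nxt, axis=1) + 1e-12)
    verts = [int(np.argmax(cosang))]      # sharpest turn ~ highest curvature
    while True:
        vs = sorted(verts)
        if len(vs) == 1:
            d = np.linalg.norm(boundary - boundary[vs[0]], axis=1)
        else:
            segs = list(zip(vs, vs[1:] + vs[:1]))   # closed polygon edges
            d = np.array([min(seg_dist(p, boundary[a], boundary[b])
                              for a, b in segs) for p in boundary])
        far = int(d.argmax())
        if d[far] <= tol:
            return boundary[sorted(verts)]
        verts.append(far)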

The user can edit the polygon in two ways: he can either drag a vertex or he can mark the part of the boundary to be replaced using a thick brush. In both cases, a wide band around the newly proposed boundary segment is constructed and Graph Cut segmentation is applied within this band. This time, segmentation is done at the pixel level (see Figure 15), using the inner (outer) part of the band as the foreground (background) hard-constraint pixels. The rest of the pixels in the band form the uncertainty area. Once the user releases the mouse, segmentation is applied and the contour part is immediately updated. The width of the band is taken to be 7 pixels by default.

The region energy term in the border graph-cut does not change from (4.7), but the boundary energy is extended with an additional term to handle low-contrast edges. This term is inversely proportional to the distance between the polygon and the midpoint between p and q. It is scaled to be of no significance when the color difference between p and q is large, but when it is low (a low-contrast edge, no gradient information) it keeps the polygon smooth, favoring small changes in position.

[pic]

Figure 15: Defining the boundary around the contour in which re-segmentation is performed after editing (taken from [25])

5. Interactive Segmentation by 1D Ray Matching

I present here a novel method developed as a combined segmentation and editing tool. It incorporates a new user interface that better combines user knowledge and semi-automatic computer segmentation. It is aimed primarily (but not solely) at medical images, where objects' borders are not well defined and user knowledge is valuable. In short, the method is called "Ray Segmentation".

The outline of this chapter is as follows: Section 5.1 presents the motivation for the proposed user interface. Section 5.2 discusses previous work regarding image segmentation using 1D rays. Section 5.3 gives the general outline of the algorithm. Section 5.4 describes the use of 1D Ray Matching for localizing the segmenting polygon. Section 5.5 introduces a regularization term. Section 5.6 explains the use of multi scaling for adapting to various object sizes while maintaining good performance. Section 5.7 elaborates on the various interaction and editing modes and Section 5.8 shows and discusses results and comparisons to other methods.

5.1 Proposed User Interface

All interactive image segmentations try to find the "golden path" between fully automatic image segmentation and fully manual user drawing. They all try to balance between the user's ability to specify initial hints and the computer's ability to automatically complete them.

We've seen so far two different paradigms in user interface: in Active Contours and Graph Cut methods the user supplies an "initial drawing" and lets the algorithm complete the rest, whereas in Live Wire the user has tight control over the process but no ability to edit results. I present here the pros and cons of each technique and suggest a new interaction method. For a detailed discussion on user interaction techniques in medical image segmentation see [17].

Active Contours methods let the user draw an initial contour that has to be either fully enclosed by the object or to fully enclose the object. Some implementations reduce interaction to a click inside the object (as in "region growing") that is translated into the drawing of a small circle. The "in or out" decision is built into the specific implementation since all implementations have a "balloon term" propagation speed that is pre-configured to be either positive or negative. Drawing the initial contour can be regarded as giving "loose hints" to the algorithm. If the user wants to supply specific contour points he has no way of doing so.

In the Graph Cut method the user also gives "loose hints", this time by drawing thick brush marks inside and outside the object. Lazy Snapping also suggests user editing and even placement of control points, but only for "fine tuning" the borders and not as primary segmentation guidance. Moreover, the initial drawing is not very intuitive: this method doesn't suit a user who wants something similar to the familiar "manual interaction", that is, to roughly sketch the object and let the system do the rest.

Live Wire methods are designed to give the user more control. The user steers the segmentation throughout the process creating the desired borderline. He can also add control points along the border. By doing so he has a better way of introducing user knowledge into the process. However, this form of interaction is non-intuitive since the contour is built sequentially without giving the user a good opportunity to correct errors (apart from backtracking all the way to the last good location).

As described in section 1.4 regarding medical image segmentation, it seems there is no single segmentation method that fits all medical images. This stems from the variability of medical image modalities and object types, the presence of extensive noise and the lack of sharp edges. The other factor is the expert's occasional need to specify regions that do not correlate with edges in the image (e.g. when wanting to specify margins around an object, wanting to draw part of an object, etc.). It is clear that a combined editing-segmentation tool can do much better where any pure-segmentation tool fails, if it is designed to flexibly adjust itself between "semi-automatic" and "almost-manual" segmentation. This is the exact motivation for the interactive segmentation method presented here.

I describe here a new user interaction scheme which is both more intuitive to the user and enables him to naturally specify more information. Interaction is very simple: the user roughly sketches the border by clicking just a few required control points through which the final border must pass. These points can be drawn either as a polygon or just as a set of unordered points. The algorithm is then applied to produce the full segmentation contour (see Figure 16). When done, the user can add, delete or move control points and let the system re-segment the border, continuing from the last position and applying the changes. All actions (completing the initial drawing or modifying it) are automatically followed by a segmentation process.

It is clear why this form of interaction can succeed where other segmentation schemes fail. If the object to segment is clear and simple, just a few (3-4) control points will do and the method can be regarded as "semi-automatic". In border areas of complicated structure and in "featureless" border areas (which the expert wants to specify), adding more control points will fix the result to exactly what the user wants. All in all, the number of control points to place is usually very small (fewer than 10) and is very far from real manual editing.

[pic] [pic]

(a) (b)

[pic] [pic]

(c) (d)

[pic] [pic]

(e) (f)

Figure 16: Proposed user interaction: the user draws a small set of border points and the system completes the rest. (a): The user clicks 3 points. (b): The system automatically completes the border polygon (the user can alternatively postpone the segmentation until he has enough points). (c): The user adds a fourth point and the system automatically updates the segmentation accordingly. (d): The final segmentation border without the control points. (e): The user can specify the control points' connectivity by drawing an initial polygon. (f): The system automatically calculates the segmentation border once the polygon is completed.

5.2 Image Segmentation using 1D Rays – Previous Work

The use of 1D ray profiles in 2D image segmentation is not very common. In [13] D'Souza et al segment airway tubes using the Full Width Half Maximum ray propagation method. They extract 1D radial rays from the airway centroid (to be later refined by segmentation). For each ray profile they then find the peak and the two minima before and after the peak. The airway's inner (and outer) walls are defined as the average location between the peak and the inner (and outer) minimum respectively. This method may face instabilities in the presence of noise due to its dependency on absolute minima and maxima.

Wink et al [55] also use radial ray propagation to segment blood vessels. The vessel border is defined as the first peak in the directed gradient along the ray that passes a certain threshold. This method may face problems when the vessel edge is not sharp.

To tackle this problem, Tek et al [48] use mean shift filtering: they, too, extract radial rays from a vessel centroid, but then apply mean shift filtering on the 1D rays. The filter enhances the edge dramatically, creating a "2-mode" 1D signal. In that signal, the transition between inside (bright) and outside (dark) is a dramatic step function that can easily be detected.

All these methods make two common assumptions: the presence of an edge at the border and some "roundness" of the shape (allowing for radial rays). All of them fail if the required border passes along some distinguishable feature which is not a clear edge (for example, wanting the crest line and not the max-gradient line) or if the object is far from resembling a round shape (say, when detecting a long rectangle). I suggest here a new method that deals with those two problems.

As for the first problem, instead of edge detection I look for matching of 1D signals along the normals (perpendicular to the polygon). In each iteration, for each point on the polygon, I examine several new candidates along its normal. For each candidate I extract a 1D segment around it (in the normal direction), which is then matched against the segments that surround the neighboring control point(s). The candidate that produces the best match is the point's new location. Signal matching looks for the "best match to the pattern around the control point" rather than the best edge.

As for the second problem – the algorithm starts from a convex polygon defined by the control points, which is split to include many vertices. The vertices advance iteratively, each time searching for the best new location along the normal direction (and not in the radial direction). These iterations can evolve the shape to match any arbitrary object shape. A regularization term is added to keep the polygon smooth but not too tight. Iterations end when the two forces (the matching term and the smoothness term) balance each other and the polygon doesn't evolve anymore.

5.3 Algorithm Outline

For practical reasons I have chosen a maximization scheme rather than a minimization one (the two are clearly equivalent). The algorithm tries to maximize the following functional:

E(Γ^t) = Σ_{p_i∈Γ^t} [ M(p_i; C, I) + μ · N(p_i; Γ^t) ]        (5.1)

where Γ^t is the polygonal border curve at time t (Γ^t refers to an ordered set of points), C is the ordered set of control points (which is a subset of the polygon points at any time) and I is the image to segment. Maximization is done on M and N, which are grades given to the points by ray matching (the external force) and neighbor balancing (the internal force), and μ is the force-balancing factor. The two grades will be explained below.

Now let's look at a specific point in iteration k and mark it as p_i^k (from here on I use the discrete iteration number k instead of the continuous time t). In order to maximize the sum in Eq. (5.1) we need to maximize for each point the combined "grade" given by:

Grade(p_i^k) = M(p_i^k) + μ · N(p_i^k)        (5.2)

Maximization is done by searching for the best candidate that maximizes (5.2) along the previous point's normal, within a certain search range.

Now let's describe the general outline of the algorithm. The user clicks just a few control points. An initial polygon is formed out of these points based on their convex hull or some other way (see section "Interaction and Editing Details" later). It is then split to produce a vertex every Δ pixels (for a fixed small Δ), maintaining the original control points. In each iteration, for each vertex, the best new location along the normal is searched based on Eq. (5.2) to form the new border polygon Γ^k. Control points are never moved (they are skipped). The algorithm stops when there is no change in the polygon points. In the next sections I describe in detail the formulation of the two terms in (5.2).
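The skeleton below sketches this outline. The grade callable stands in for the combined grade of Eq. (5.2), which sections 5.4-5.5 define; the subdivision and normal computations are simplified and all names are illustrative:

import numpy as np

def subdivide(pts, spacing=5.0):
    """Insert vertices so consecutive polygon points are ~spacing apart;
    the original (control) points are kept as-is."""
    out = []
    for a, b in zip(pts, np.roll(pts, -1, axis=0)):
        n = max(1, int(np.linalg.norm(b - a) // spacing))
        for t in np.arange(n) / n:
            out.append(a + t * (b - a))
    return np.array(out)

def normals(poly):
    """Unit normals from central differences along the closed polygon."""
    tang = np.roll(poly, -1, axis=0) - np.roll(poly, 1, axis=0)
    nrm = np.stack([tang[:, 1], -tang[:, 0]], axis=1)
    return nrm / (np.linalg.norm(nrm, axis=1, keepdims=True) + 1e-12)

def ray_segment(grade, ctl_pts, spacing=5.0, D=10, iters=10):
    """Skeleton of the section 5.3 loop; grade(p) -> float is Eq. (5.2)."""
    poly = subdivide(np.asarray(ctl_pts, float), spacing)
    ctl = {tuple(np.round(c, 3)) for c in np.asarray(ctl_pts, float)}
    for _ in range(iters):
        nrm = normals(poly)
        new = poly.copy()
        for i, p in enumerate(poly):
            if tuple(np.round(p, 3)) in ctl:
                continue                  # control points never move
            shifts = np.arange(-D // 2, D // 2 + 1)
            cands = p + shifts[:, None] * nrm[i]
            new[i] = cands[np.argmax([grade(c) for c in cands])]
        if np.allclose(new, poly):
            break                         # polygon stopped evolving
        poly = new
    return poly

A toy grade function (e.g. the image gradient magnitude at the candidate point) is enough to run the skeleton end to end; the real grade is built in the next two sections.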

5.4 1D Ray Matching

a. Defining the Match-Grade

We want to define the match-grade M. We first extract 1D ray segments (called here "rays") of length L from each control point by sampling the image I along the point's normal n̂ (defined below), L/2 times in the +n̂ direction and L/2 times in the -n̂ direction.

We want all vertices between two adjacent control points to move, in iteration k, to the location that best matches the two control points' rays. For that we define a search range D in which each point is allowed to move relative to its location in the previous iteration. We "slide" our point along the normal (D/2 in each direction), creating a candidate new point for each shift d:

p_i^{k,d} = p_i^{k−1} + d · n̂_i^{k−1},   d ∈ [−D/2, D/2]        (5.3)

where n̂_i^{k−1} is the normal based on the previous polygon.

Let us now assume we have a correlation tool Corr(a, b) that correlates two rays of the same length and gives higher results for better correlation. Using it we can define the match-grade of our candidate point against each of the two neighboring control points:

M_prev(d) = Corr(ray(p_i^{k,d}), ray(c_prev)),   M_next(d) = Corr(ray(p_i^{k,d}), ray(c_next))        (5.4)

where c_prev and c_next are the control points before and after p_i along the polygon. We now write the combined match-grade for a d-shift candidate point as:

M(p_i^{k,d}) = w · M_prev(d) + (1 − w) · M_next(d)        (5.5)

where w is the relative weight of each control point, based on the distance along the polygon:

w = dist(p_i, c_next) / ( dist(p_i, c_prev) + dist(p_i, c_next) )        (5.6)

and dist() is the curve-length between two points along the polygon.

b. Defining the Correlation Method

We still have to define the Corr tool for correlating two rays of the same length L. For that purpose I've simply chosen the sum of absolute differences, normalized to give a result in the range 0-1, where 1 means a better match:

Corr(a, b) = 1 − (1 / (L · MaxVal)) · Σ_{j=1..L} |a_j − b_j|        (5.7)

where a, b are the two 1D rays (of length L) to be compared. The normalizing factor MaxVal is the maximum value over the 3 rays involved in the correlation: the ray of the candidate point and the two rays of the neighboring control points.
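A sketch of the ray extraction and of Eqs. (5.4)-(5.7), assuming bilinear sampling with scipy's map_coordinates; names and defaults are illustrative:

import numpy as np
from scipy.ndimage import map_coordinates

def extract_ray(image, point, normal, L=16, scale=1.0):
    """Sample an L-point 1D ray through point (row, col) along the unit
    normal: L/2 samples on each side, scale pixels between samples."""
    offs = (np.arange(L) - L / 2) * scale
    coords = point[None, :] + offs[:, None] * normal[None, :]
    return map_coordinates(image, [coords[:, 0], coords[:, 1]],
                           order=1, mode='nearest')

def corr(a, b, max_val):
    """Normalized sum of absolute differences, Eq. (5.7):
    1 is a perfect match, 0 the worst possible one."""
    return 1.0 - np.abs(a - b).sum() / (len(a) * max_val + 1e-12)

def match_grade(cand_ray, ray_prev, ray_next, w):
    """Combined match-grade of Eqs. (5.4)-(5.5); w comes from Eq. (5.6)."""
    m = max(cand_ray.max(), ray_prev.max(), ray_next.max())  # MaxVal
    return w * corr(cand_ray, ray_prev, m) + (1 - w) * corr(cand_ray, ray_next, m)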

c. Definition of Normals

So far we have not exactly defined the points' normals, which should serve us both as the direction to look for the next best point and the direction to extract the 1D rays.

As for non-control points, things are pretty simple: each point's normal is an average of a few neighboring edge normals from the previous iteration. I use 2 edges before and 2 after for the average. This defines both the direction to advance and the direction to sample the ray.

Control points do not need to advance but need to supply stable and meaningful sampled rays. Taking the same normals as for non-control points would result in rays that change between iterations, which is an undesired behavior. Taking the normals from the initial (control-point based) polygon may not be good enough since they may point in a meaningless direction (in terms of image information). I therefore sample a few directions around the initial-polygon normal and take the ray that gives the maximal difference between its minimum and maximum values. I chose to sample only 3 angle values: -45°, 0°, 45°, where 0° is the initial-polygon normal direction.

5.5 Regularization Term

In order to keep the evolving polygon smooth we need a regularization term. This is equivalent to the internal forces used in Active Contours. The force defined here is designed to keep a point from shifting too far along the normal. The point is "advised" by its neighbors from the previous iteration on where it should move in this iteration (what shift it should have along the normal). Any difference between the point's suggested shift (by matching) and the neighbors' "recommended shift" is penalized as a low (bad) "neighbors' grade".

a. Neighbors "recommended shift"

Assume the point p_i^{k−1} from the previous iteration has moved in iteration k to a new location p_i^k, which is the best of all candidates p_i^{k,d}. Let's define Nb(i) as the set of indices neighboring index i on the polygon (this property is independent of the iteration and of the points' locations; I chose 2 neighbors from each side, so |Nb(i)| = 4). Now take a neighboring point p_j^k (j ∈ Nb(i)). Its recommendation for the next location of p_i is defined as its projection onto the point's normal direction (see Figure 17a).

[pic](a)

[pic]

(b)

Figure 17: (a) Each of the point's neighbors is projected onto the point's normal and the average of all projections is taken. (b): Introducing a new point candidate: subtracting the average shift from the candidate's shift we get the penalty.

As seen in Figure 17, each of the neighbors gives its recommendation for the shift along the normal. The average shift (average location) is taken as the neighbors' recommended shift. The closer a candidate is (along the normal) to the neighbors' recommendation, the higher its neighbors' grade will be. Note that when choosing the number of neighbors to be 1 from each side, this regularization method is equivalent to "locally minimizing" the polygon's length.

b. Neighbors' Confidence

In order to speed up convergence we want to give "confident neighbors" the ability to have more effect on the point's next location. We take the "confidence level" to be the neighbor's final grade in the previous iteration (from Eq. (5.2)). The average location is therefore a weighted average where the weights are the grades from the previous iteration. The equations below sum up the definition of the neighbors' grade.

The weighted average shift is defined as:

d̄_i = Σ_{j∈Nb(i)} Grade(p_j^{k−1}) · d_j  /  Σ_{j∈Nb(i)} Grade(p_j^{k−1})        (5.8)

where d_j is the shift recommended by neighbor j, and for a given shift d (proposed by a new candidate) the neighbors' grade is given by:

N(d) = 1 − |d − d̄_i| / D        (5.9)

where D is the search-range as defined above.

Using the notations introduced in (5.5) and (5.9), we can write the maximization of the functional (5.2) as a maximum over all possible d-shifts:

max over d ∈ [−D/2, D/2] of:  M(p_i^{k,d}) + μ · N(d)        (5.10)

and so the new point p_i^k is the candidate point p_i^{k,d} with the d-shift that maximizes (5.10).
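The following sketch puts Eqs. (5.8)-(5.9) together; array layouts and names are illustrative assumptions:

import numpy as np

def neighbors_grade(i, cand_shift, poly_prev, new_locs, grades_prev,
                    normal, D, nbrs=2):
    """Project each neighbor's new location onto point i's normal,
    average the projected shifts weighted by the neighbors' previous
    grades (Eq. 5.8), and penalize a candidate's deviation (Eq. 5.9)."""
    idx = [(i + o) % len(poly_prev) for o in range(-nbrs, nbrs + 1) if o]
    # The shift each neighbor recommends: its displacement from p_i^{k-1},
    # projected onto the (unit) normal of point i.
    recs = [(new_locs[j] - poly_prev[i]) @ normal for j in idx]
    wts = [grades_prev[j] for j in idx]
    avg = np.dot(wts, recs) / (np.sum(wts) + 1e-12)   # Eq. (5.8)
    return 1.0 - abs(cand_shift - avg) / D            # Eq. (5.9)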

5.6 Multi Scaling

a. Scale-Dependent Computations

When the polygon evolves between iterations, the next contour is searched in a range of D/2 pixels in each direction. We want the algorithm to be able to detect contours far from the initial polygon without "paying" in performance. The solution is given in the form of a "multi-scale" approach, that is, all computations are modified to be scale-dependent (rather than creating scaled versions of the image). Introducing a scale factor s^k modifies the algorithm as follows (k represents iteration dependency, not a power argument):

1. Advancing within the search range, we jump in s^k-pixel steps. This means that instead of searching a D-pixel range we search an s^k·D pixel range, but the number of search points stays D.

2. Sampling the L-pixel rays, we use s^k-pixel jumps between samples. This makes the rays cover an s^k·L distance in the image with only L samples.

3. Distances take s^k into account. This includes the shift d and the average shift d̄ in equations (5.8) and (5.9).

b. Defining the scale levels

We still need to define the different scale levels to be used between iterations. I chose the following scheme:

• Use a pre-defined set of scales (I chose the decreasing sequence 4, 2, 1.5, 1). Make sure the series ends at scale 1 (no scaling).

• If the iteration number exceeds the number of scales, the algorithm continues with the minimal scale (1).

• The algorithm does not stop before it has processed the minimal scale (the minimal number of iterations is therefore the number of scales). This makes sure we get the maximal possible accuracy.

• The algorithm must use a scale that is proportional to the bounding box of the control point set. This is illustrated next.

c. Shape size adaptation

We want the algorithm to adapt its scaled search range s·D to the size of the shape to segment. We therefore want the maximal scale to be proportional to the shape size. Let's define b as the size of the bounding box of the control points. To prevent different scales in the x and y directions we've chosen b = min(b_x, b_y). The choice of the minimum enables handling narrow shapes, in which any higher scale would make sampling in the "narrow direction" completely irrelevant.

I've argued above that the algorithm must use a shape-adapted scale s_shape (we'll later see where it stands in the scale sequence). I've chosen to define s_shape as the scale that enables examining candidates as far as half the shape's bounding box (taking into account the regularization force, we can expect closing gaps of ~1/3 of the shape's bounding box). From this restriction we get:

s_shape · D/2 = b/2        (5.11)

from which it follows that:

s_shape = b / D        (5.12)

To combine the shape-adapted scale s_shape with the pre-defined scale sequence I use the following rules:

• If the maximal pre-defined scale is smaller than s_shape, I add s_shape as the first scale.

• Otherwise, I look for the scale that is directly higher than s_shape and remove all scales above (before) it. In this case s_shape is not added to the list (the scale directly above it is used instead).

These rules guarantee that exactly one scale is proportional to the bounding-box value b (it will be s_shape or more) and all other scales are smaller. The sketch below illustrates these rules.
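A small sketch of this scale-schedule construction under the rules above (the tie-breaking at equal scales is my own choice):

def scale_schedule(ctl_pts, D=10, base=(4.0, 2.0, 1.5, 1.0)):
    """Compute the shape-adapted scale s_shape = b / D (Eq. 5.12) from the
    control points' bounding box and merge it into the predefined
    decreasing scale sequence."""
    xs = [p[0] for p in ctl_pts]
    ys = [p[1] for p in ctl_pts]
    b = min(max(xs) - min(xs), max(ys) - min(ys))   # b = min(b_x, b_y)
    s_shape = b / D                                  # Eq. (5.12)
    if s_shape >= base[0]:
        return (s_shape,) + base        # rule 1: prepend the shape scale
    above = [s for s in base if s > s_shape]
    below = [s for s in base if s <= s_shape]
    return (min(above),) + tuple(below) # rule 2: trim to the next-higher scale

# A 100-pixel-wide shape with D = 10 gives s_shape = 10.0:
print(scale_schedule([(0, 0), (100, 0), (100, 100), (0, 100)]))
# -> (10.0, 4.0, 2.0, 1.5, 1.0)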

5.7 Interaction and Editing Details

In this section I detail the different user interface mechanisms and how they are implemented. I split them into "Initial Interaction" and "Editing". In general, the user draws an initial polygon or just clicks some control points. This is immediately followed by an automatic segmentation process that produces a segmentation polygon. The user can then add, move and remove control points, where each action is again followed by an automatic segmentation that refines the resulting polygon. I detail below the different initial and editing modes the user can choose from:

a. Initial interaction

The user has two interaction modes in which he can specify the initial control points: drawing an initial polygon and adding non-connected points:

1. Drawing a polygon: In this interaction mode the user draws a closed polygon by clicking a few points on the desired contour to segment (see Figure 16 e-f). Each user click adds a vertex which will serve as a control point. When the user closes (finalizes) the polygon, it is immediately split to produce a vertex every Δ pixels (see above), and the segmentation process is initiated automatically. This form of interaction allows the user to define not only the control points but also their order.

2. Adding Control Points: In this mode the user simply adds control points on the object border without forming a polygon and without paying attention to order (see Figure 16 a-b). Segmentation initiates automatically when the third point is added. The user has the ability to disable auto-segmentation, add more than three control points and initiate the segmentation process manually. In this case the points' order is undetermined. To overcome this I take only the points' convex hull and throw away the concave vertices. The points' order is then determined by the convex hull (see the sketch below). Further user clicks are treated as editing (see next).
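A minimal sketch of the convex-hull ordering used in this mode, assuming scipy's ConvexHull:

import numpy as np
from scipy.spatial import ConvexHull

def order_clicks(clicks):
    """Order unordered control-point clicks by their convex hull; interior
    (concave) clicks are dropped. clicks: sequence of (x, y), len >= 3."""
    pts = np.asarray(clicks, dtype=float)
    hull = ConvexHull(pts)
    return pts[hull.vertices]      # vertices come back in CCW order

print(order_clicks([(0, 0), (4, 0), (2, 1), (4, 4), (0, 4)]))
# (2, 1) lies inside the hull of the other four points and is removed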

b. Editing

Once a segmentation polygon exists (regardless of the initialization interaction used), the user has the ability to add, delete and move control points. Instead of modifying the set of control points and re-initializing the process, only the polygon part affected by the edit is modified and the algorithm continues from that point. The only thing not kept from the previous run is the points' grades (confidence levels), which enables the algorithm to gain new confidence levels based on the modified situation.

Below are the modifications done on the polygon as a result of each editing action:

1. Move a control point: The user can select and drag a control point. This deforms the segmentation polygon as follows (see Figure 18):

[pic]

Figure 18: The user has dragged the "original ctl point" upward. The part of the polygon between the two neighboring ctl points is replaced by two straight lines. The lines are split into many vertices and the algorithm continues from this point.

First, the two neighboring control points are located and the polygon part between those points is deleted. Then two straight lines are stretched between these points and the new ctl point location. The lines are split to produce many vertices (as done with the initial polygon) and the algorithm starts iterating from this polygon.

2. Add and Drag a control point: The second form of editing allows the user to "catch and drag" the polygon in a place where there is no previous ctl point. The user "clicks down" on a location along the polygon and drags it to the new desired location. On the "mouse-down" action the closest polygon vertex is turned into a ctl point, which is then moved to the new location (on "mouse-up"). We then get a situation similar to Figure 18, with the only difference being a new ctl point rather than a moved one. The algorithm continues from here in a similar way to moving a ctl point. If the closest polygon vertex is a ctl point itself, no new ctl point is added and the action is equivalent to dragging an existing ctl point.

3. Add a control point: If the user clicks a new point far from the polygon without dragging, we get a new form of interaction using the same implementation. Instead of clicking next to the polygon and dragging to a new location, the user can simply click (down and up) on the new desired border location (see Figure 16c). The same location is used both for finding the closest polygon point and for setting its new location.

4. Remove a control point: The user can select a ctl point and remove it. By doing so the polygon section between the two neighboring ctl points (including the removed point) is replaced by a straight line. The line is split as above and the algorithm starts iterating from this point.

Summing it all up, we get the recommended form of interaction: the user starts clicking control points from a "blank" situation (interaction a2). After the third click auto-segmentation is performed and the segmentation polygon appears (alternatively, he can draw an initial polygon (a1)). Any further click refines the polygon (b3) until the user is satisfied and stops adding points. The "click and drag" interaction (b2) is used only in situations that are ambiguous as to what part of the polygon should move to the new location. Once segmentation is completed, the user can refine localization by slightly moving control points (b1) at a higher zoom.

5.8 Results

a. Complexity and Performance

The algorithm was tested with great success on a variety of medical and realistic images. Simple objects were segmented with no more than 4-5 control points (especially in medical images) while detailed images required more (up to 20). The algorithm almost always stopped after fewer than 10 iterations (I've limited the number of iterations to 10 for the rare cases where it doesn't).

It is clear that the algorithm complexity is O(K·n·D·L), where K is the number of iterations, n is the number of vertices, D is the search length and L is the ray profile length. Configuring D and L to their fixed values and taking K to be no more than 10, we get an algorithm linear in the number of polygon vertices, that is, linear in the polygon length (recall the constant subdivision by Δ). Benchmarking it on a Pentium III 650 MHz, a 100x100 square object (a fairly large object in a 256x256 image, with an arc length of 400 pixels), running 10 iterations, took 0.25 sec. A 200x200 square object (a fairly large object in a 512x512 image) running 10 iterations took 0.5 sec (as expected when doubling the arc length). On a Pentium D (dual core) 3.4 GHz the above benchmarks took 0.05 sec and 0.1 sec respectively, which is clearly real-time performance.

b. Reaching Far from initial solution

Recall the reason for introducing Multi-Scaling: I wanted to enable the contour to evolve far from its initial position without losing performance. Below (Figure 19) is an example with the initial and final polygons shown together. A gap of ~1/3 of the object's bounding box is closed (as expected from the "Multi Scaling" section above). Note the small number of control points used here (4).

[pic]

Figure 19: Segmenting with just 4 control points. The multi-scale feature enables closing gaps of ~1/3 of the object's bounding box.

c. Segmenting thin structures

Figure 20 below shows several thin structures in medical images (brain, lung and muscle layer) that are segmented correctly with just a few control points. Although nothing was done to prevent the polygon from self-intersecting, self-intersections rarely occur due to the regularization forces. If they do occur they can either be removed automatically (this behavior is configuration-dependent) or the user can eliminate them using the "Drag and Drop" interaction.

[pic] [pic]

(a) (b)

[pic] [pic]

(c) (d)

Figure 20: Segmenting thin structures in: (a-b) CT image of the lungs. (c) MR image of the Brain.

(d) Muscle tissue layer in an MR image of the abdomen.

d. No-Edge contours

As described above, one of the strengths of this method is the user's ability to force the contour to pass through locations with no edge. Such a location can hold a different kind of feature (a crest – maximal intensity instead of maximal gradient), a blurred edge, a specified distance from a known edge, or a completely "flat" (featureless) area. Below are some examples.

Figure 21 shows the segmentation of an object with a blurred edge, as can often happen in medical images. Using more points in the blurred area, the user can specify exactly where the contour should pass.

[pic]

Figure 21: Segmentation of an object with blurred edge. The top edge is blurred and therefore requires more control points. The bottom edge is clearer and is segmented using only 2 control points.

Figure 23 a-b shows the ability to segment an object with positive or negative margins. Remember that the algorithm looks for matching 1D profiles ("rays") and sets the polygon point to be at the same relative position on the profile as the control point is on its own profile. Segmentation with margins can be used, for example, in cases of malignant tumors where the doctor wants to mark the tumor for extraction (or ablation). The doctor can define positive margins to verify no cancerous tissue is left in the body.

Note also the control point that was put on the "flat" featureless area. This shows the user's ability to force the contour to pass where there is no edge at all.

e. Comparison to other methods

I show here a comparison between 1D Ray Segmentation, the Level Set method (Fast Marching variant) and the Live Wire method. The image in Figures 22 and 19 was taken from a Fast Marching demo applet on James Sethian's home page (author of the Level Set and Fast Marching articles [1, 27, 39, 44-47]). The image shows a CT cross section of the abdomen, including the lungs (in black), the liver (left organ), the heart (right organ), and more. Below (Figure 22) are results based on the Fast Marching applet, Live Wire and Ray Segmentation.

[pic] [pic]

(a) (b)

[pic] [pic]

(c) (d)

Figure 22: Comparison to other methods, segmenting the liver. (a): The Level Set method was diverted by undesired edges and the user had no ability to direct it. (b): The Ray Segmentation method segments the object correctly using only 4 control points. (c): The Live Wire method does poorly with a small number of control points but succeeds (d) using more points.

It is evident that in this case the Level Set approach did poorly since it was distracted by undesired edges which the user had no ability to ignore. The Live Wire method did well but with more control points than the Ray Segmentation method (trying to use the same set of points, we get poor results). The reason is its pixel-based path-finding approach, which leaves no room for multi-scaling.

Another comparison is given in Figure 23, trying to force a contour away from edges. In this case we want to segment an object adding positive margins. The Level Set method has no interaction mode for this purpose and is therefore not shown. Trying to do this with the Live Wire method gives poor results since the contour is always attracted back to the edge.

[pic] [pic]

(a) (b)

[pic]

(c)

Figure 23: (a-b): Segmentation of the liver with positive and negative margins. The contour follows the object with a fixed margin. The rightmost control point in each image is placed in a completely "flat" area, forcing the contour to pass in the desired location away from any edge. (c): A failed attempt to segment an object with a margin using the Live Wire method. The Fast Marching method is not shown since it has no user interface for this purpose at all.

A third comparison is done on the same image, trying to segment the heart (Figure 24). Due to the different blood flow in the different chambers, the heart is not colored in a single color or texture. Using Level Set methods would require a few separate segmentations and a unification step. Live Wire methods can cope with this situation but, again, with no margin ability and no editing ability once the process is done. Using Ray Segmentation the user can segment the whole heart with just a few clicks (same as Live Wire) and fine-tune the result later (as opposed to Live Wire).

[pic] [pic]

(a) (b)

[pic] [pic]

(c) (d)

Figure 24: Segmentation of the heart: (a-b) Two failed segmentations using the Fast Marching method, each segmenting only a part of the heart with constant characteristics. Segmentation (b) shows the method's sensitivity to weak edges. (c) Correct segmentation by the Live Wire method. Editing can be done only during the flow and not after it. (d) Correct segmentation using the Ray Segmentation method with a number of control points similar to Live Wire. Further editing can be done.

f. Realistic images

I have applied the algorithm to realistic (photographic) images as well, with good results. Realistic images, as opposed to medical images, usually have sharper edges separating objects and can therefore be segmented well using edge-based methods. However, realistic images sometimes have too many edges, which can confuse many segmentation algorithms. Occlusions can also damage the segmentation result. All these problems can easily be solved by introducing the user's knowledge in the form of hard constraints. The Ray Segmentation method does exactly that, enabling the user to use control points to fix errors of the automatic segmentation.

Figure 25 shows the classic Lenna picture, segmenting her hat. Two typical problems are faced here, marked with red arrows. One is the presence of a highly textured area – the hat is occluded by fur, which has to be skipped smoothly. This succeeds by putting a single control point in a relatively "flat" location. If no such location exists, putting enough control points along the way will force the contour to "cross" the occlusion of the fur and reach the other part of the hat smoothly. The second problem occurs where the hair meets the hat at the bottom-right part of the hat. The problem is one of multiple close edges, generated by the changing color of the hair, by the forehead and by the hat. The user wants, of course, the specific edge separating the hair from the hat. All he has to do is correctly place the two control points that define this edge.

[pic]

Figure 25: Segmenting Lenna's hat. Note specifically how correct placement of control points solved ambiguities and noise around the hat's fur and the meeting point between the hat and the hair (both marked with red arrows)

The last image (Figure 26) shows how multiple occlusions can be dealt with. The initial segmentation in Figure 26(a) captures the cameraman correctly, except in places where he is occluded by his photographic gear. Figure 26(b) shows how placing an extra control point in the middle of each occluding part of the tripod can force the contour to ignore distractions by irrelevant edges.

[pic] [pic]

Figure 26: Segmentation with occlusions. (a): The cameraman is segmented with a relatively small number of control points but with problems around the tripod legs. (b): Putting a control point on each occluding element forces the contour to pass along the desired path.

6. Conclusions

We have presented a new interactive segmentation technique which combines editing and interaction capabilities in one tool. This technique combines user knowledge and computer speed, introducing an intuitive form of interaction with a simple 1D correlation tool. We have shown the algorithm's great performance on medical images as well as on realistic ones. Finally, we have compared results with other interactive segmentation techniques. We have shown that our technique performs better in various ambiguous situations since it enables the user to resolve ambiguities by specifying exactly where he wants the final contour to pass.

7. References

[1] D. Adalsteinsson, and J.A. Sethian, A Fast Level Set Method for Propagating Interfaces , Journal of Comp. Physics, 118, pp.269-277, 1995.

[2] Adobe Photoshop User Guide. Adobe Systems Incorp. 2002

[3] A. A. Amini, T. E. Weymouth, and R. C. Jain. Using dynamic programming for solving variational problems in vision. IEEE T. Patt. Anal. Mach. Intell., 12(9):855–867, 1990.

[4] Y. Boykov and G. Funka-Lea, Graph Cuts and Efficient N-D Image Segmentation, In International Journal of Computer Vision (IJCV), vol. 70, no. 2, pp. 109-131, 2006

[5] Y. Boykov and M.P. Jolly, Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D images. In International Conference on Computer Vision, (ICCV), vol. I, pp. 105-112, 2001.

[6] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. In 3rd. Intnl. Workshop on Energy Minimization. Methods in Computer Vision and Pattern Recognition (EMMCVPR). Springer-Verlag, September 2001

[7] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679-698, 1986.

[8] V. Caselles, F. Catte, T. Coll, and F. Dibos. A geometric model for active contours. Numerische Mathematik, 66:1–31, 1993.

[9] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. In Fifth International Conference on Computer Vision, Boston, MA, 1995.

[10] T. Chan and L. Vese, Active contours without edges, IEEE Trans. Image Processing, 2001.

[11] L.D. Cohen, "On Active Contour Models and balloons", CVGIP: Image Understanding, vol 53, p211, 1991

[12] E. W. Dijkstra, A Note on Two Problems in connection with Graphs, Numerische Mathematik, vol. 1, pp. 269-270, 1959.

[13] N. D. D’Souza, J. M. Reinhardt, and E. A. Hoffman. ASAP: Interactive quantification of 2d airway geometry. In Medical Imaging 1996: Physiology and function from Multidimensional Images, Proc. SPIE 2709, pages 180–196, 1996.

[14] R.O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (2nd Edition). Wiley Press. 2000

[15] A.X. Falcao, J.K. Udupa, S. Samarasekera, B.E. Hirsch, User-steered image boundary segmentation, in SPIE Proceedings, vol. 2710, pp. 278-288, 1996.

[16] A.X. Falcao, J.K. Udupa, S. Samarasekera, S. Sharma, B.E. Hirsch, R. de A. Lotufo, User-Steered Image Segmentation Paradigms: Live-Wire and Live-Lane, Graphical Models and Image Processing, vol. 60, no. 4, pp. 233-260, 1998.

[17] J. L. Foo. A survey of user interaction and automation in medical image segmentation methods. Technical Report ISU-HCI-2006-2, Iowa State University - Human Computer Interaction, 2006.

[18] L. Ford and D. Fulkerson. Flows in Networks, Princeton University Press, 1962.

[19] A. Goldberg and R. Tarjan. A new approach to the maximum flow problem. Journal of the Association for Computing Machinery, 35(4):921–940, October 1988.

[20] R. M. Haralick and L. G. Shapiro. Computer and Robot Vision. Addison-Wesley Publishing Company, 1992.

[21] M. Holtzman-Gazit, D. Goldsher, and R. Kimmel. Hierarchical segmentation of thin structures in volumetric medical images. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Montreal, 2003.

[22] A. Jain and F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition., vol. 24, pp. 1167–1186, 1991.

[23] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: active contour models. Int. Journal of Computer Vision, 1(4):321-331, January 1988.

[24] S. Kichenassamy, A. Kumar, P. Olver, A. Tannenbaum and A. Yezzi: Conformal curvature flows: From phase transitions to active vision, Archive for Rational Mechanics and Analysis, 1996

[25] Y. Li, J. Sun, C.K. Tang and H.Y. Shum. Lazy Snapping. SIGGRAPH 2004, Vol. 23, pp. 303-308

[26] W. E. Lorensen and H. E. Cline. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. Computer Graphics 21(3):163-169, 1987.

[27] R. Malladi, J.A. Sethian, and B. Vemuri, Shape modeling with front propagation: A level set approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 17(2), pp. 158-175, 1995.

[28] D. Marr and E. Hildreth, A Theory of Edge Detection, in Proceedings of the Royal Society of London--Series B: Biological Sciences, vol. 207, no. 1167, pp. 187-217, Feb. 1980.

[29] B. Micusik and A. Hanbury, Steerable Semi-Automatic Segmentation of Textured Images, Proceedings of the SCIA, 2005

[30] E.N. Mortensen and W.A. Barrett, Intelligent Scissors for Image Composition. In: Proc. Of ACM Siggraph 1995. (1995) 191-198

[31] E.N. Mortensen and W.A. Barrett, Fast, Accurate and Reproducible Live-Wire Boundary Extraction. In: Proc. of Visualization in Biomedical Computing. (1996) 183{192

[32] E.N. Mortensen and W.A. Barrett, Interactive Live-Wire Boundary Extraction, Medical Image Analysis 1 (1997) 331{341

[33] E.N. Mortensen and W.A. Barrett, Interactive Segmentation with Intelligent Scissors", Graphical Models and Image Processing, vol. 60, no. 5, pp. 349{384, September 1998.

[34] E.N. Mortensen and W.A. Barrett, Toboggan-Based Intelligent Scissors with a Four Parameter Edge Model", in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'99), pp. II:452{458, 1999.

[35] E. Mortensen, B. Morse, W. Barrett, J.K. Udupa, Adaptive boundary detection using live-wire two-dimensional dynamic programming", in IEEE Proceedings of Computers in Cardiology, pp. 635{638, October 1992.

[36] D. Mumford and J. Shah, “Optimal approximation by piecewise smooth functions and associated variational problems,” Commun. Pure Appl. Math, vol. 42, pp. 577–685, 1989.

[37] L. O'Donnell, C.F. Westin, W.E.L. Grimson, J. Ruiz-Alzola, M.E. Shenton, R. Kikinis, Phase-Based User-Steered Image Segmentation. MICCAI, 1022-1030, 2001.

[38] L. O'Donnell. Semi-Automatic Medical Image Segmentation. Masters Thesis, MIT AI Lab, October 2001.

[39] S. Osher, and J.A. Sethian, Fronts Propagating with Curvature Dependent speed: Algorithms Based on Hamilton-Jacobi Formulation, Journal of Computational Physics, Vol. 79, pp. 12-49, 1988

[40] N. Paragios and R. Deriche: Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation, IJCV 2002

[41] D.L. Pham, C. Xu, and J.L. Prince, A Survey of Current Methods in Medical Image Segmentation. Annual Review of Biomedical Engineering, Jan 1998,

[42] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut": interactive foreground extraction using iterated graph cuts, ACM Transactions on Graphics (TOG), v.23 n.3, August 2004

[43] J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Conference on Computer Vision and PatternRecognition, pages 731–737, 1997.

[44] J.A. Sethian., A Fast Marching Level Set Method for Monotonically Advancing Fronts, Proc. Nat. Acad. Sci., 93, 4, pp.1591--1595, 1996.

[45] J.A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, Cambridge University Press, 2003.

[46] J.A. Sethian and A. Vladimirsky, Ordered Upwind Methods for Static Hamilton-Jacobi Equations, Proceedings of the National Academy of Sciences, 98, 11069-11074, 2001.

[47] J.A Sethian, and A. Vladimirsky, Ordered Upwind Methods for Static Hamilton-Jacobi Equations: Theory and Algorithms, SIAM J. Numer. Anal., 41, 1, pp. 325-363, 2003.

[48] H. Tek, D. Comaniciu, and J. Williams. Vessel detection by mean shift based ray propagation. In Workshop on Mathematical Models in Biomedical Image Analysis, pp. 228–234. 2001

[49] J.N Tsitsiklis, Efficient Algorithms for Globally Optimal Trajectories, IEEE Transactions on Automatic Control, Volume 40, pp. 1528-1538, 1995.

[50] J. K. Udupa, S. Samarasekera, and W. A. Barrett, Boundary Detection via Dynamic Programming, in Proceedings of the SPIE: Visualization in Biomedical Computing 92, Vol. 1808, pp. 33-39, Chapel Hill, NC, Oct. 1992.

[51] L. Vincent, and P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-13, 6 (June), 583–598. 1991

[52] S.K Weeratunga and C. Kamath, An investigation of implicit active contours for scientific image segmentation," Video Communications and Image Processing, SPIE Electronic Imaging, San Jose, Jan 2004

[53] J. Weickert and G. KÄuhne, Fast methods for implicit active contour models," technical report, preprint no. 61, UniversitÄat des Saarlandes, 2002.

[54] D.J. Williams, and M. Shah, A fast algorithm for active contours and curvature estimation, CVGIP: Image Understanding, v.55 n.1, p.14-26, Jan. 1992

[55] O. Wink, W. Niessen, and M. A. Viergever. Fast delineation and visualization of vessels in 3-D angiographic images. IEEE Trans. on Medical Imaging, 19:337–345, 2000.

[56] C. Xu, D. L. Pham, and J. L. Prince, “Image Segmentation Using Deformable Models,” Handbook of Medical Imaging: Vol. 2. Medical Image Processing and Analysis, SPIE Press, 2000.

[57] C. Xu, A. Yezzi, and J. L. Prince. On the relationship between parametric and geometric active contours. In Proc. of Asilomar Conf. on Signals, Systems. & Computers, 2000.

[58] A. Yuille and P. Hallinan. Deformable templates. In A. Blake and A. Yuille, editors, Active Vision, pages 20–38. MIT Press, 1992.

[59] H.K. Zhao, T. Chan, B. Merriman, and S. Osher, A variational level set approach to multiphase motion . Journal of Computational Physics, Volume 127 Issue 1, 1996

Abstract

Interactive image segmentation is a problem that has grown increasingly popular among researchers in recent years. Interactive segmentation, as opposed to automatic segmentation, provides the user with means to incorporate his knowledge into the segmentation process. Nevertheless, in most existing methods the proposed interaction is not good enough, since the knowledge is incorporated in a way that does not "force" the process to follow the user's intent. In addition, the user usually has no easy way of editing the segmentation result. Consequently, in complex situations the user is forced to abandon the computerized tools and switch to fully manual drawing.

In this work I present a novel method, developed as a combined segmentation and editing tool. The method couples a simple user interface with fast and reliable segmentation based on matching one-dimensional profiles. The user is asked to mark a small number of "control points" on the boundary of the desired object and let the algorithm complete the rest. The user can edit the result by adding, deleting, and moving control points, and after each interaction the algorithm performs an automatic segmentation in real time.

Before presenting my algorithm I begin with a survey of contour-based interactive image segmentation methods. The survey includes a detailed description of the Active Contours, Live Wire, and Graph Cut methods and of methods derived from them.

In the second part of the work I describe my "Ray Segmentation" method: first, the user initializes the process with several control points on the object boundary. The system connects them into an initial polygon and iterates until convergence. In each iteration, every point on the polygon attempts to move to the next best location along the polygon normal. The best location is found by comparing a one-dimensional neighborhood, taken in the direction of the normal, between the examined point and its two neighboring control points along the polygon. The comparison is implemented as a simple correlation between two one-dimensional profiles (also called "segments" or "rays").
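The matching step itself is specified in Sections 5.3–5.4; as a rough illustration only, the minimal Python sketch below samples a 1D profile along the normal and scores candidate shifts with a normalized cross-correlation. It assumes a grayscale NumPy image and nearest-neighbor sampling, and the names sample_ray, ncc, and best_shift are illustrative, not identifiers from the thesis.

    import numpy as np

    def sample_ray(image, point, normal, half_len):
        # Sample a profile of 2*half_len+1 intensities along `normal`,
        # centered at `point` (x, y), with nearest-neighbor interpolation.
        offsets = np.arange(-half_len, half_len + 1)
        coords = point[None, :] + offsets[:, None] * normal[None, :]
        xs = np.clip(np.rint(coords[:, 0]).astype(int), 0, image.shape[1] - 1)
        ys = np.clip(np.rint(coords[:, 1]).astype(int), 0, image.shape[0] - 1)
        return image[ys, xs].astype(float)

    def ncc(a, b):
        # Normalized cross-correlation of two equal-length profiles.
        a, b = a - a.mean(), b - b.mean()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def best_shift(image, point, normal, ref_profile, half_len, search):
        # Evaluate every candidate shift along the normal and keep the one
        # whose local profile best matches the neighbor's reference profile.
        scored = [(ncc(sample_ray(image, point + s * normal, normal, half_len),
                       ref_profile), s)
                  for s in range(-search, search + 1)]
        return max(scored)  # (best correlation score, best shift)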

To improve the convergence process an additional term is introduced. This term is designed to prevent a point from straying too far from the polygon recovered in the previous iteration. To this end, several neighboring polygon points are projected onto the normal of the current point and their average is computed (as a one-dimensional shift along the normal). Every candidate location examined receives a "penalty" for moving away from this neighbors' recommendation. The averaging of the neighbor projections is weighted, so that neighbors with a higher "confidence level" contribute more to the average.
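Section 5.5 gives the actual regularization term; the sketch below is only a plausible rendering of the idea, with hypothetical names (recommended_shift, regularized_score) and an assumed absolute-distance penalty.

    import numpy as np

    def recommended_shift(point, normal, neighbor_points, confidences):
        # Project each neighboring polygon point onto the normal line at
        # `point`; the confidence-weighted mean of these 1D projections is
        # the shift the neighbors "recommend".
        shifts = np.array([np.dot(q - point, normal) for q in neighbor_points])
        w = np.asarray(confidences, dtype=float)
        return float(np.dot(w, shifts) / (w.sum() + 1e-9))

    def regularized_score(match_score, candidate_shift, rec_shift, penalty_weight):
        # Penalize candidate positions in proportion to how far they stray
        # from the neighbors' recommendation.
        return match_score - penalty_weight * abs(candidate_shift - rec_shift)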

The algorithm functions as a real-time tool and responds immediately to user actions on any average modern computer. Convergence is achieved within a small number of iterations, even when initialized from a distant initial polygon, thanks to an adaptive multi-scaling approach. The first working scale takes the size of the candidate object into account, allowing the algorithm to reach the correct result for objects of any size, even from a distant initialization.
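One plausible way to build such a size-dependent coarse-to-fine schedule is sketched below; the /8 divisor and the halving factor are arbitrary illustrative choices, not the thesis' parameters (Section 5.6 describes the actual multi-scale scheme).

    def scale_schedule(object_extent_px, finest=1.0, factor=2.0):
        # Start at a coarse scale proportional to the candidate object's
        # size and refine geometrically down to the finest scale.
        scale = max(finest, object_extent_px / 8.0)
        schedule = []
        while scale > finest:
            schedule.append(scale)
            scale /= factor
        schedule.append(finest)
        return schedule

    # e.g. scale_schedule(200) -> [25.0, 12.5, 6.25, 3.125, 1.5625, 1.0]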

The algorithm was run on both medical and realistic (photographic) images, demonstrating its relative advantage over other segmentation methods. We compare results with the Fast Marching and Live Wire methods and show the versatility of the tool and its ability to segment objects in highly complex situations.

Tel-Aviv University

The Raymond and Beverly Sackler Faculty of Exact Sciences

School of Computer Sciences

Image Segmentation by Matching 1D Profiles:

a Combined Segmentation and Editing Tool

By

Alon Shmueli

Prepared under the supervision of

Prof. Daniel Cohen-Or

Dr. Hayit Greenspan

Submitted in partial fulfillment of the requirements for the degree of M.Sc. in Tel-Aviv University

Sivan 5767 (May 2007)
