REAL-TIME TRACKING OF OBJECTS IN UAV VIDEO IMAGERY

Shuqun Zhang

Assistant Professor

Department of Electrical and Computer Engineering

State University of New York at Binghamton

Binghamton, NY 13902-6000

Final Report for

Visiting Faculty Research Program

Air Force Research Laboratory, Rome Site, New York

Sponsored by:

Information Directorate

Air Force Research Laboratory

26 Electronics Parkway

Rome, NY 14413

August 2003

Abstract

This research project addresses the tracking of objects in a video stream obtained from a moving airborne platform for annotation purposes. It requires developing a real-time tracking system that can track any UAV video object indicated by a user's mouse click. A general tracking framework based on spatio-temporal segmentation is proposed. The proposed algorithm compensates for the image flow induced by the camera motion, and then detects and tracks object regions. Moving objects are detected using a temporal change detection algorithm. Change detection usually fails to detect stationary objects. To overcome this problem, a simple method based on an image shrinking operation is used to make static objects "move" so that they can be tracked by the same algorithm as moving objects. Another problem with change detection is that many noise variations are detected in addition to the object region, and if the object is small it can be very hard to separate the moving target from the noise. An effective method of extracting object regions is proposed, based on the assumption that the target is close to the mouse click point. To extract additional object regions, an edge detection-based segmentation is used. The final step of tracking is performed by segmenting the subsequent frames of the video sequence and establishing a correspondence of moving objects between frames. The main features of the proposed algorithm are its low computational complexity and its generality, which make it suitable for objects of various types and sizes. We demonstrate results on several real video sequences.

1. Introduction

Exploitation of Unmanned Aerial Vehicle (UAV) video data is increasingly critical for future battlefield surveillance systems. The large volume of UAV video will clearly overwhelm human operators who must visually inspect the data, and operator performance will inevitably degrade due to boredom and fatigue. Automated video analysis and processing aims to extract important information from video and reduce the workload of human operators. This includes the development of video annotation tools capable of adding text, graphics or audio to objects of interest such as vehicles, buildings, etc. Video annotation requires an effective real-time tracking algorithm to follow objects within the video.

A large amount of research on moving object detection and tracking in video has been performed over the years, and a variety of techniques have been developed, including recognition-based, region-based, feature-based, and contour-based methods. Although many methods have been proposed, most can only be used for specific applications under restrictive assumptions such as a stationary camera in a constrained and uncluttered environment. Object detection and tracking remains an open research problem, and a fast and robust object tracking method is still a great challenge today. Most challenges arise from the image variability of an object over time, which comes from three main sources: variation in object pose or object deformations, variation in illumination, and partial or full occlusion of the target [1]. For a stationary camera, the object detection problem is relatively easy since objects can be obtained using background subtraction or temporal differencing. Therefore, a major topic for object tracking with a stationary camera is the development of a good background model [2]. For a moving camera, the problem is much harder due to the apparent motion of the background, which makes it difficult to maintain a good background model and creates other problems that are not easy to solve. For UAV videos captured by a fast-moving camera, object tracking is even harder. Most existing tracking algorithms cannot be applied directly to UAV objects for the following reasons. (1) They are computationally expensive and unsuitable for real-time applications. UAV video analysis requires tracking multiple objects in real time, and there is typically a trade-off between achieving real-time performance and constructing a robust tracking algorithm. (2) They usually require strong assumptions such as a stationary (or slowly moving) camera, good image quality, a large number of pixels on target, slowly moving objects, and/or a simple background, all of which are invalid for UAV videos. A UAV, usually equipped with an optical and an infrared camera, may move very fast (more than fifteen pixels per frame in some test videos). The objects may also move fast. UAV video objects tend to have low resolution since they are far away from the camera, and the background is usually cluttered. (3) They are geared toward large objects with clear features and boundaries, whereas UAV objects may consist of only a few pixels (so-called point targets). In our test videos, the object size ranges from 5x5 to 90x90 pixels. Detection of small moving objects can be a complicated task since they resemble noise [3]. (4) They track moving objects only and cannot be used to track stationary objects simultaneously. There is little literature discussing the tracking of both moving and stationary objects at the same time, because stationary object tracking is not of interest in most applications. However, for UAV video annotation, stationary man-made objects are also targets of interest. It is therefore desirable to have an algorithm that can handle both stationary and moving objects.

This research project attempts to develop a new real-time object tracking algorithm for annotating UAV video. The general problem statement is as follows: in a playing video, a human operator clicks on, or as close as possible to, an object of interest, and an algorithm tracks the object through successive frames in real time. For small and fast-moving objects it is very difficult to click within the region of the object, so the algorithm needs to account for this. Due to the variety of UAV objects, the tracking algorithm should be sufficiently general; that is, it should be able to track objects of various types and sizes (stationary, moving, small, large, low-resolution, or high-resolution). Algorithm generality and real-time implementation are the main difficulties of this summer research project. Our goal is to obtain a robust algorithm able to cope with real-time constraints. Several videos from AFRL and the Internet are used to test the developed algorithm. This paper is organized as follows. First, we briefly review some related work on object detection and tracking in the next section. The proposed approach is then described in Section 3. Finally, we discuss some implementation and test results, and conclude the paper.

2. Related Work

Tracking techniques proposed for moving objects generally consist of two steps: detection of the objects followed by tracking of these objects. The two steps are closely related to each other. In this section, we briefly summarize current published research on moving object detection and tracking.

2.1 Motion Detection

Motion detection is used to segment moving objects in a video sequence. It is in some sense more important than the tracking step, and a very large proportion of research effort on object detection and tracking has focused on this problem. However, most methods have been designed for scenes acquired by a stationary camera. These methods segment each image into a set of regions representing the moving objects by using a background subtraction algorithm. A recent survey of motion detection can be found in Ref. 4, where different motion detection methods are classified into motion-based and spatio-temporal-based, as shown in Fig. 1. The motion-based techniques usually need to estimate optical flow, a time-consuming operation. Furthermore, optical flow estimation is noise-sensitive and is not accurate at object boundaries or in homogeneous areas. Motion-based methods therefore suffer from inaccuracy and high computational complexity. They are not suitable for our project since it is difficult to obtain accurate optical flow for UAV objects, especially when they are small. In contrast, the spatio-temporal techniques that combine both temporal and spatial segmentation are more appropriate for our project. In the temporal segmentation, change detection, based on the difference between two video frames, is the most popular method due to its simplicity and efficiency. The problem with differencing is that many noise variations are also detected, and if the object is small it can be very difficult to separate the moving target from the noise. Spatial segmentation is used to refine the object area.

2.2 Object Tracking

Motion detection provides useful information for the object tracking step. Different tracking strategies are adopted based on the information (motion, color, texture, etc.) provided. Below are some frequently used tracking methods.

A. Recognition (or model)-based tracking

If an object template or model is available and does not change dramatically, it is straightforward to track the object by matching the template or model in the image and extracting its position. The template matching can be done by techniques such as image subtraction, cross-correlation or model matching. Image subtraction determines the object position by minimizing a distance function between the template and various positions in the image. Although it is simple and fast, it performs well only in very restricted environments where the template and the image have very similar imaging conditions. In the correlation-based matching method, the peak of the normalized cross-correlation between the template and the scene locates the object. Although the correlation method can accommodate some distortions such as rotation, scaling, illumination changes and noise, it is expensive to compute. Model matching requires matching object models with detailed geometry to the scene, which is also computationally expensive. Thus, the performance of a recognition-based tracking system is limited by the efficiency of the recognition method, as well as by the types of objects it can recognize. Recognition-based tracking is not suitable for small UAV objects since they do not have enough spatial detail from which to extract features or a model. Examples of work on recognition-based tracking can be found in Refs. 1, 5-8.
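To make the correlation-based matching concrete, the following sketch computes a normalized cross-correlation surface with plain NumPy. It is an illustration only: the function name is ours, and brute-force loops are used for clarity; a practical tracker would use an FFT-based implementation for speed.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Brute-force normalized cross-correlation of a template over an image.

    Returns a score surface whose peak gives the most likely object position.
    Illustrative only; an FFT-based implementation would be used in practice.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t ** 2).sum()) + 1e-12
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            scores[r, c] = (w * t).sum() / (np.sqrt((w ** 2).sum()) * t_norm + 1e-12)
    return scores

# The tracked position is the peak of the score surface:
# row, col = np.unravel_index(np.argmax(scores), scores.shape)
```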

B. Region-based tracking

Region-based approaches [9-12] track connected regions obtained from the motion detection. Ref. 9 describes a typical region-based tracking method which segments the subsequent frames of the video sequence and establishes a correspondence of moving objects between frames. It consists of the steps of motion projection, marker extraction, segmentation and region merging. Motion projection establishes the correspondence of moving objects between frames by mapping moving objects in one frame into the next frame according to motion information. However, the motion projection may not be accurate, so a post-segmentation is generally needed in order to obtain an accurate and complete object region. For this purpose, a watershed transformation followed by region merging can be used; the markers for the watershed transform are extracted from reliable parts of each projected object. This method is computationally efficient and is able to track fast-moving objects. It is therefore appropriate for UAV video object tracking and is used in this project.

C. Feature-based tracking

Feature-based methods [13-17] track image features of an object instead of the entire object. Most feature-based tracking methods are based on the Shi-Tomasi-Kanade tracker [13]. Features frequently used for tracking are corners and lines. Such features are detected automatically in each frame and tracked through the sequence. Since the number of detected features may be large and many of them may be useless, there must be an automatic scheme to select good features and reject bad ones. On the other hand, the number of features must be large enough to achieve good tracking performance. The performance of feature-based tracking therefore depends on the feature detection and selection schemes used. The advantage of tracking features is that it may achieve stable tracks even in the case of partial occlusion of the object. However, the problem of grouping the features to determine which of them belong to the same object is a major drawback of feature-based approaches. Since it is difficult to obtain enough good features for small objects, feature-based tracking was not chosen for this project.

D. Contour-based tracking

Contour-based tracking methods [18-22] track only the contour of the object, and they can track both rigid and non-rigid objects. Snake or deformable active-contour-based tracking is particularly well adapted to tracking non-rigid objects since it can easily incorporate the dynamics derived from time-varying images. In such a tracking method, an approximation of the object boundary is provided by a user (or automatically, using motion detection) in the first frame. The initial contour is attracted toward the expected object boundary. The final snake in the first frame is then motion-projected to the second frame and used as the initial snake, and the deformable template matching process is applied to the snake until it converges. This process repeats for subsequent frames. Since snakes are usually parameterized and the solution space is constrained to a predefined shape, an accurate initialization step is very important. Snake-based segmentation is computationally expensive since it operates iteratively, and its performance depends on initialization and parameter tuning. Active-contour-based tracking is therefore not suitable for UAV objects because of these drawbacks and the amount of human intervention required.

3. Proposed Method

From the review in Section 2, we can see that many existing tracking techniques are inapplicable to UAV objects, mainly because they are unable to handle small objects efficiently or are too computationally complex. Spatio-temporal motion detection with region-based tracking appears to be the best fit for UAV video objects. The block diagram of the proposed algorithm is depicted in Fig. 2. It includes three steps: compensation of the image flow induced by the moving camera, detection of moving regions in each frame, and tracking of moving regions over time. The motion compensation involves estimating the motion parameters between two consecutive video frames. The moving objects are detected using a spatio-temporal change detection algorithm. The final step of tracking is performed by segmenting the subsequent frames of the video sequence and establishing a correspondence of moving objects between frames. The background processing block in Fig. 2 uses a mean filter to smooth the image to be subtracted and to "move" stationary objects. Stationary objects need to be "moved" because the change detection algorithm can detect only moving objects; the "movement" operation allows stationary objects to be treated as moving objects in the subsequent motion detection step. The principle of using a simple mean filter to "move" stationary objects is discussed in detail in Section 3.D.

A. Camera Motion Estimation and Compensation

The motion induced by a moving camera must be canceled before motion detection can be applied. Given two successive frames I0 and I1 of an image sequence, the geometric transform aligning I0 and I1 can be estimated using a motion model. The most commonly used models are affine or quadratic approximations of the motion. Here, we use a 3-parameter model (translation plus rotation) for the camera motion estimation [23, 24]:

    x' = x cos θ - y sin θ + t_x,
    y' = x sin θ + y cos θ + t_y,                                        (1)

where t_x, t_y and θ are the horizontal translation, vertical translation and rotation angle, respectively. The following approximations can be made when the rotation angle is small:

    sin θ ≈ θ,    cos θ ≈ 1 - θ²/2.                                      (2)

Eq. (1) is then simplified to

    x' = x + t_x - yθ - xθ²/2,
    y' = y + t_y + xθ - yθ²/2.                                           (3)

Assuming brightness constancy between the two frames, I1(x, y) = I0(x', y'), a Taylor series expansion of Eq. (3) gives

    I1(x, y) ≈ I0(x, y) + (t_x - yθ - xθ²/2) I_x + (t_y + xθ - yθ²/2) I_y,   (4)

where I_x = ∂I0/∂x and I_y = ∂I0/∂y. We estimate the motion parameters by minimizing the following error function:

    E(t_x, t_y, θ) = Σ [ I0(x, y) + (t_x - yθ - xθ²/2) I_x + (t_y + xθ - yθ²/2) I_y - I1(x, y) ]²,   (5)

where the summation is over all image pixels. Taking the derivatives of E(t_x, t_y, θ) with respect to t_x, t_y and θ, setting them to zero, and neglecting the second-order terms in θ yields the following three linear equations:

    t_x Σ I_x²    + t_y Σ I_x I_y + θ Σ R I_x = Σ I_t I_x,
    t_x Σ I_x I_y + t_y Σ I_y²    + θ Σ R I_y = Σ I_t I_y,               (6)
    t_x Σ R I_x   + t_y Σ R I_y   + θ Σ R²    = Σ I_t R,

where R = x I_y - y I_x and I_t = I1 - I0 is the frame difference. The three motion parameters can then be computed by solving Eq. (6) as

    (t_x, t_y, θ)^T = A⁻¹ b,                                             (7)

where A is the 3x3 coefficient matrix on the left-hand side of Eq. (6) and b is the vector formed by its right-hand sides.

The above motion estimation technique can achieve subpixel accuracy for small shifts and rotations. Since we expect larger values, an iterative estimation method is used: I1 is repeatedly warped toward I0 using the motion parameters estimated from Eq. (7) until the motion parameters are sufficiently small or the number of iterations exceeds a threshold. The warped version of I1 gets closer to I0 at every iteration. After the three motion parameters have been estimated, I1 is registered to I0 using bilinear interpolation.

The above least-squares motion estimation is generally accurate since the three parameters are well over-determined by the data. To reduce the computation time, the motion estimation can also be performed on a small set of feature points only. In general, it is better to select pixels/features only from the background, since using pixels/features from moving object regions may affect the accuracy. For UAV videos, since objects tend to be small and the background is dominant, the camera motion parameters estimated using Eq. (7) are sufficiently accurate. The accuracy and performance of the above estimation method can be further enhanced using a multi-resolution approach.
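A minimal NumPy sketch of one least-squares solve of Eqs. (6)-(7) is given below. The function name and the use of np.gradient for the spatial derivatives are our own choices, second-order terms in θ are neglected, and the estimated parameters describe the mapping from I0 to I1 (so the inverse is applied when warping I1 back toward I0); the iterative warp-and-re-estimate loop described above is left out for brevity.

```python
import numpy as np

def estimate_translation_rotation(i0, i1):
    """One linear least-squares solve for (tx, ty, theta) as in Eqs. (6)-(7).

    i0, i1: two successive grayscale frames of identical size.
    Gradients are taken on i0; second-order theta terms are neglected.
    """
    i0 = i0.astype(float)
    i1 = i1.astype(float)
    iy, ix = np.gradient(i0)                 # row (y) and column (x) derivatives of I0
    it = i1 - i0                             # temporal difference I_t
    h, w = i0.shape
    y, x = np.mgrid[0:h, 0:w]
    r = x * iy - y * ix                      # derivative with respect to the rotation angle

    a = np.array([[(ix * ix).sum(), (ix * iy).sum(), (ix * r).sum()],
                  [(ix * iy).sum(), (iy * iy).sum(), (iy * r).sum()],
                  [(ix * r).sum(),  (iy * r).sum(),  (r * r).sum()]])
    b = np.array([(ix * it).sum(), (iy * it).sum(), (r * it).sum()])
    tx, ty, theta = np.linalg.solve(a, b)
    return tx, ty, theta
```

In the iterative scheme, I1 would be warped with the estimated parameters (bilinear interpolation), the residual parameters re-estimated on the warped frame, and the process repeated until the updates become negligible.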

B. Motion Detection

Since moving objects generate changes in the image intensity, motion detection can be performed efficiently by temporal change detection, which is based on the difference between two video frames. For videos captured with a moving camera, moving object regions are detected by subtracting the previous frame, transformed with the estimated motion parameters, from the current frame. A threshold is then used to determine whether a pixel belongs to a moving object or not. Let In-1(x, y) and In(x, y) be two aligned successive image frames. The image difference is computed as

    Dn(x, y) = In(x, y) - In-1(x, y),                                    (8)

and a changed region can be extracted by thresholding the difference image:

    Mn(x, y) = 1 if |Dn(x, y)| > T, and 0 otherwise,                     (9)

where T is a threshold that can be determined empirically or automatically. Here, we calculate T automatically based on the standard deviation of the difference image. If there are no moving objects, then Dn(x, y) should contain only camera white noise with zero mean and variance σ². According to the theory of hypothesis testing in statistics, a sample from a normal distribution N(0, σ²) falls in the range [-3σ, 3σ] with probability 0.9973. If it falls outside this range, it is not considered noise and is thus attributed to object motion. Therefore, 3σ can be used as the threshold to detect change caused by moving objects. For some videos, 3σ is too large and many object pixels will not be detected. In order to obtain more object pixels, a threshold smaller than 3σ and greater than 2.5σ is generally preferred. Of course, a lower threshold will result in more noise in the change detection result. Instead of using a single global threshold, an adaptive threshold computed from the statistical properties of a small neighborhood can be used to achieve improved performance at added computational cost [25].
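A sketch of the change detection step in NumPy is shown below; the function name and the default multiplier of 2.7 are our illustration of the rule described above.

```python
import numpy as np

def change_detection(prev_aligned, curr, k=2.7):
    """Temporal differencing with an automatic threshold, Eqs. (8)-(9).

    The threshold is k times the standard deviation of the difference image;
    a multiplier between 2.5 and 3 is generally preferred.
    """
    diff = curr.astype(float) - prev_aligned.astype(float)   # D_n in Eq. (8)
    threshold = k * diff.std()                               # T based on sigma of D_n
    return np.abs(diff) > threshold                          # binary mask of Eq. (9)
```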

The result of the motion detection analysis is a binary mask that indicates the presence or absence of motion at each pixel of the image. Figure 3 shows an example of a change detection result. It can be seen that the binary image contains not only moving object pixels but also noise. The noise may include motion noise and camera noise: the motion noise is due to inaccurate image registration or insufficient modeling, and the camera noise is assumed to be random. Change detection generally does a poor job of extracting all relevant object pixels, and its performance depends on image quality, background, and object size. A spatial segmentation is generally needed in order to obtain a complete object region.

To separate the object pixels from noise, a morphological closing operation is first performed on the change detection result to remove camera noise and to connect object pixels into regions. The resulting image is then size-filtered: only connected regions larger than a threshold are kept. The next step is to locate the moving object of interest. From the general statement of the project, we know that the object of interest is indicated by a mouse click. Since there is no guarantee that the user can click on the object every time, we cannot take this for granted, but it is reasonable to assume that the mouse click point is very close to the object. Therefore, we locate the object by finding the region(s) closest to the mouse click point. This has proven to work very well in our experiments. For small objects, the moving object is almost always correctly extracted and there is no need for further spatial refinement. However, for large objects, the region found may be only a small portion of the object. Figure 4 shows the segmentation of a large object; it can be seen from Fig. 4(e) that the region closest to the mouse click point is just a small piece of the object. Therefore, a spatial segmentation is needed in order to extract the remaining object regions. There are several ways to perform this further spatial segmentation, including edge detection, the watershed transform and clustering. Here, we use edge detection by applying the Sobel operator to the image. Edge detection for image segmentation has the advantages of simplicity and applicability to different object sizes compared to other segmentation techniques. Since the detected edges may be incomplete and broken, a morphological dilation is performed to connect edge segments and thicken the edges. Because edge detection finds both the moving object edges and background object edges, the object edge must be separated from the non-object edges. Since both the object edge and the region detected in the change detection step belong to the same object, they tend to have a non-empty overlap, so the object edge is identified as the edge having an intersection with the region closest to the mouse click point. The resulting object edge map may contain holes, and a region filling operation is used to fill them. This completes the segmentation of large objects. Although the segmentation result may be far from perfect, it is normally adequate for annotation purposes since it gives the approximate object centroid and bounding box.
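The following sketch illustrates the two stages just described with scipy.ndimage: cleaning the change-detection mask and keeping the connected region closest to the click point, then (for large objects) refining the region with Sobel edges, dilation, overlap selection and hole filling. The function names, the default minimum region size and the edge-magnitude threshold are our own illustrative choices.

```python
import numpy as np
from scipy import ndimage

def region_near_click(mask, click_rc, min_size=6):
    """Morphological closing, size filtering, then the region closest to the click."""
    closed = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    labels, n = ndimage.label(closed)
    best, best_d2 = 0, np.inf
    for lab in range(1, n + 1):
        ys, xs = np.nonzero(labels == lab)
        if ys.size < min_size:                       # size filter
            continue
        d2 = np.min((ys - click_rc[0]) ** 2 + (xs - click_rc[1]) ** 2)
        if d2 < best_d2:
            best, best_d2 = lab, d2
    return labels == best if best else np.zeros_like(mask, dtype=bool)

def refine_large_object(gray, seed_region):
    """Sobel edges, dilation, keep edges overlapping the seed region, fill holes."""
    g = gray.astype(float)
    mag = np.hypot(ndimage.sobel(g, axis=0), ndimage.sobel(g, axis=1))
    edges = ndimage.binary_dilation(mag > 2 * mag.mean())   # assumed edge threshold
    labels, _ = ndimage.label(edges)
    keep = np.unique(labels[seed_region & edges])
    obj = np.isin(labels, keep[keep > 0])
    return ndimage.binary_fill_holes(obj)
```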

We do not perform the same edge processing operations on small objects, for two reasons. First, it saves computation time if there is no need to perform edge-based image segmentation. Second, the edge processing could occasionally damage the segmentation result of a small object. A threshold is therefore set to separate small and large objects: any object whose area is smaller than the threshold is considered small. This threshold does not need careful tuning, since most of the time the edge processing operation does not affect the segmentation results of small objects, and the occasional damage can be eliminated by using past segmentation results.

C. Tracking Object Regions

Object tracking is performed after the object of interest has been segmented. The objective of tracking is to establish a correspondence of moving objects between frames and to segment the subsequent frames of the video sequence. It includes the steps of motion prediction and segmentation correction. The motion prediction step predicts the next location of the targets. Since motion prediction does not by itself form an accurate object correspondence between two frames, segmentation correction attempts to remove incorrectly identified moving regions.

One of the most popular techniques for motion prediction is Kalman filtering. Kalman filters require accurate models of motion and noise, they do not allow radical motion changes, and their performance depends on accurate modeling and parameter tuning. Thus, Kalman filtering is not perfectly suited to the tracking of UAV video objects. Another motion prediction method is curve fitting and extrapolation. Here, for simplicity, we use the motion information from the previous frame by assuming that the velocity of the object is constant. That is, the new location of the object is calculated by simply adding the motion vector computed from the segmented object regions to the current location. The new location is used as the initial seed for searching for the object region. Then the motion detection process described in Section 3.B is applied to segment the objects of interest. The segmentation result may be inaccurate due to inaccurate prediction or background change; for example, a small object detected in the previous frame may be detected as a large object in the current frame because the object region is merged with non-object regions. We can alleviate this problem by examining the object's previous segmentation results.
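The constant-velocity prediction amounts to a single line of arithmetic; a minimal sketch (hypothetical helper name, row/column coordinates) is shown below.

```python
def predict_next_location(prev_centroid, curr_centroid):
    """Constant-velocity model: next = current + (current - previous)."""
    return (2 * curr_centroid[0] - prev_centroid[0],
            2 * curr_centroid[1] - prev_centroid[1])

# Example: an object at (50, 60) in frame n-1 and (53, 64) in frame n
# is expected near (56, 68) in frame n+1.
```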

D. Tracking Stationary Objects

The approach described above can effectively track moving objects. If an object stops moving, however, the change detection method can no longer obtain reliable segmentation results. For tracking purposes, one might assume that the object remains at its last known position until significant motion is again detected, but this approach is too easily distracted by noise or erroneous motion estimates unless more sophisticated motion models are used. For objects (such as buildings) that never move, the change detection approach cannot even detect their existence. Although we could try to determine whether an object is stationary or moving by comparing its speed with the camera speed, the velocity estimate may not be accurate and it is difficult to separate stationary objects from small moving objects.

Since the tracking algorithm developed in this project will mainly be used for annotation purposes, tracking of both moving and non-moving objects is necessary. Most vision systems never consider tracking both moving and non-moving objects simultaneously, since static object tracking is generally not of interest for most applications; consequently, most tracking algorithms fail to track stationary objects. One solution is to have the human operator identify whether an object is moving, and to use different algorithms and mouse buttons for the two kinds of objects, but this is inconvenient for the user.

A better solution is to make stationary objects "move" so that the technique for tracking moving objects can also be applied to stationary objects. Here we use simple image processing operations to simulate the behavior of a moving object. Recall the principle of change detection using image subtraction: for a moving object, the subtraction results in a broken object boundary due to the relative object translation between the two video frames, while for a stationary object the subtraction produces a zero image. If the change detection also outputs a boundary (instead of all zeros) for a stationary object, we can consider the stationary object to be "moving." A simple way to obtain an object boundary is to shrink (or enlarge) the object and then subtract the shrunk (or enlarged) object from the original. Therefore, within the change detection framework, the "movement" of a stationary object can be simulated using an image shrinking (or enlarging) operation. In this project, we perform the shrinking operation using a simple 3x3 mean filter. The object and background have different intensities, and the averaging operation of the mean filter brings the intensities of the object boundary pixels closer to the background intensity while the intensities of the remaining object pixels stay almost unchanged; this corresponds to shrinking the object. Note that for moving objects, the shrinking operation does not change the output of the change detection: we still obtain an object boundary, although it tends to become thicker. The mean filter also smooths the image to be subtracted and eliminates isolated noise. This smoothing is important for low-resolution images.
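The effect of the mean filter on a stationary object can be seen with a small synthetic example (illustrative only; scipy.ndimage's uniform_filter plays the role of the 3x3 mean filter).

```python
import numpy as np
from scipy import ndimage

# A stationary bright square in two identical frames.
frame = np.zeros((40, 40))
frame[15:25, 15:25] = 100.0

plain_diff = np.abs(frame - frame)                    # stationary object: all zeros
smoothed = ndimage.uniform_filter(frame, size=3)      # 3x3 mean filter "shrinks" the object
shrunk_diff = np.abs(frame - smoothed)                # nonzero only around the boundary

print(plain_diff.max())                    # 0.0 -> plain differencing sees nothing
print(np.count_nonzero(shrunk_diff > 1))   # a thin ring of boundary pixels is detected
```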

E. Detailed Steps

A detailed data flow of the proposed algorithm is given in Fig. 5. In a playing video, two successive frames, frame n and frame n+1, are extracted. These two images are downsampled, and the motion estimation is performed on the downsampled images using the algorithm described in Section 3.A. Frame n+1 is then aligned with frame n using the estimated motion parameters. When the user clicks on an object, a 101x101 region centered at the mouse click point is cropped from each of the two frames. Here we assume that the size of the largest expected object will not exceed the dimensions of this region. All image processing operations are performed on these two subimages only, which saves a great deal of computation time. In the following description, frame n and frame n+1 refer to the two subimages instead of the two whole images. Frame n is first filtered by a 3x3 mean filter. The purpose of this filtering is to remove isolated random noise and, more importantly, to make stationary objects "move". By thresholding the difference between the filtered frame n and frame n+1, a binary image containing both object pixels (most of them belonging to the object boundary) and noise is generated. Here, the threshold used is 2.7σ. A morphological closing is applied to the binary image to connect object pixels and remove isolated noise. A size filter with a threshold of 6 pixels is used to further remove unwanted non-object regions. We then locate the object of interest by finding the region that is closest to the mouse click point. If the area of the region found is smaller than a threshold, the object is considered small and we proceed to compute its centroid and bounding box. Otherwise, the object is considered large and the region just found is only a portion of it, so an edge detection-based segmentation is applied to frame n+1 to extract the edge of the object. The threshold used to separate small and large objects is 80 pixels here; in our experiments it can actually take any value between 60 and 100.

Since edges of non-object regions also appear in the edge map, a scheme is needed to separate the edge of the object from the other edges. This is done simply by selecting the edge that overlaps the region closest to the mouse click point. An image filling operation is then performed on the object edge to fill holes. Next, a segmentation correction step is used to correct the object boundary. The correction is based on comparing the current object segmentation with the previous segmentation result: if there is a significant change between the two, the previous segmentation result is used instead. This completes the segmentation step. The centroid and bounding box of the segmented regions are then calculated. The object displacement is computed and used to predict the new location of the object in the next frame. When the video moves forward one frame, we crop two regions centered at the new object location and repeat the same object detection and tracking process until the object disappears from the scene.
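Putting the pieces together, one iteration of the data flow of Fig. 5 might look as follows. This sketch reuses the helper functions sketched in the earlier subsections (change_detection, region_near_click, refine_large_object), assumes the two 101x101 crops have already been registered, and uses a simplified placeholder for the segmentation-correction rule; it is not the report's exact implementation.

```python
import numpy as np
from scipy import ndimage

def track_step(crop_n, crop_np1, click_rc, prev_area=None, small_max_area=80):
    """One pass of detection and segmentation on a pair of registered crops."""
    smoothed = ndimage.uniform_filter(crop_n.astype(float), size=3)   # background processing
    mask = change_detection(smoothed, crop_np1, k=2.7)                # temporal differencing
    region = region_near_click(mask, click_rc, min_size=6)            # object identification
    if region.sum() > small_max_area:                                 # large object?
        region = refine_large_object(crop_np1, region)                # edge-based refinement
    if not region.any():
        return None                                                   # fall back to previous result
    if prev_area is not None and abs(int(region.sum()) - prev_area) > 2 * prev_area:
        return None                                                   # simplified segmentation correction
    ys, xs = np.nonzero(region)
    centroid = (ys.mean(), xs.mean())
    bbox = (ys.min(), xs.min(), ys.max(), xs.max())
    return centroid, bbox, int(region.sum())
```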

4. Results

We have implemented the tracking algorithm described in Section 3 in Matlab. We have tested the system using both optical and IR videos with various types and sizes of objects. The resolution of the video imagery ranges from 192x254 to 704x480, the object size ranges from 5x5 to 90x90 pixels or even bigger, and the frame rate of the videos is 30 or 15 frames per second. The test results show that the system can track most moving and stationary objects regardless of their sizes, as long as the mouse click point is sufficiently close to the object. However, there are several cases in which the tracking can fail. First, when objects are very close to each other: several object regions may merge into a single region, or the algorithm cannot determine which region belongs to which object. Second, when the object size is larger than 100x100, since 100x100 is the assumed dimension of the largest object. Third, when the object is occluded; we have no special treatment for occluded objects. The performance of the proposed tracker is therefore uncertain in these three situations.

5. Conclusions and future work

With the increasing use of video in battlefield surveillance systems, automated video processing will rely heavily on effective and efficient object detection and tracking. It is almost impossible to have a generalized, robust, accurate and real-time tracking technique today. For some specific applications, however, we may be able to make reliable assumptions and simplify the object detection and tracking problem. In the framework of a video annotation application, we developed an object tracking system to track any object of interest indicated by a user's mouse click. During the project period, we were able to demonstrate the tracking of UAV objects of various types and sizes in real time. The proposed algorithm is based on the combination of temporal differencing and spatial segmentation. It is more general and computationally less expensive than other tracking methods. The obtained results show that the algorithm achieves satisfactory tracking performance in most cases. Additional research is needed to improve the performance of the algorithm and to advance the state of the art. Below are some thoughts on future work.

The edge-based segmentation method used in this project can obtain only a rough object boundary. To obtain better segmentation performance, it should be combined with other object segmentation methods such as the watershed transform. In order to further improve the segmentation and tracking performance, more information must be utilized to separate object regions from non-object regions. Currently, only spatial information and two-frame temporal analysis are used, in order to keep the algorithm general and simple. The history of motion and segmentation results, as well as object color, could be very useful in improving the segmentation and tracking performance, and could also be used to handle partial occlusion. The motion history can be used to project motion, which establishes the correspondence of moving objects between frames; the temporal consistency of objects can help to identify targets; and past segmentation results can be used to correct the current segmentation. Color information has been very useful in image segmentation and could also be helpful for our object detection and tracking.

In this paper, the motion of regions and objects is described using a simple constant velocity model. Better prediction performance is expected by using a more complex motion model.

Motion estimation is the most expensive operation in this algorithm. Since some motion information is already included in video streams, it may be possible to utilize it for rough global motion compensation, although its accuracy is a concern.

6. Acknowledgements

I would like to thank my mentor Todd Howlett for his support, valuable guidance, suggestions and comments throughout the development of this project. I would also like to thank Dr. Mark Robertson for his valuable discussion and comments.

References

 

1. G. D. Hager and P. N. Belhumeur, “Efficient region tracking with parametric models of geometry and illumination,” IEEE Trans. Pattern Analysis and Machine Intelligence 20, 1025-1039, 1998.

2. K. Toyama, J. Krumm, B. Brumitt, B. Meyers, “Wallflower: Principles and Practice of Background Maintenance,” International Conference on Computer Vision, Vol. 1, September 20 - 25, 1999, Corfu, Greece.

3. M. Ye, L. G. Shapiro and R. M. Haralick, “Aerial point target detection and tracking – a motion-based Bayesian approach,” ISL Technical Report, University of Washington.

4. D. Zhang and G. Lu, “Segmentation of moving objects in image sequence: a review,” Circuits, Systems and Signal Processing, 20, 143-183, 2001.

5. T. Meier and K. N. Ngan, “Automatic segmentation of moving objects for video object plane generation”, IEEE Trans. Circuits Syst. Video Technol. 8, 529-538, 1998.

6. H. Tao, H. S. Sawhney, and R. Kumar, “Object tracking with Bayesian estimation of dynamic layer representations,” IEEE Trans. Pattern Analysis and Machine Intelligence 24, 75-89, 2002.

7. G.L. Foresti, "Object recognition and tracking for remote video surveillance", IEEE Trans. Circuits Syst. Video Technol. 9, 1045-1062, 1999.

8. I. Cohen, G. G. Medioni, “Detecting and tracking moving objects for video surveillance,” IEEE Proc. Computer Vision and Pattern Recognition, 2319-2325, 1999, Fort Collins CO.

9. D. Wang, “Unsupervised video segmentation based on watersheds and temporal tracking,” IEEE Trans. Circuits Syst. Video Technol. 8, 539-546, 1998.

10. R. Mech, M. Wollborn, “A noise robust method for 2D shape estimation of moving objects in video sequences considering a moving camera,” Signal Processing 66, 203-217, 1998.

11. K. Y. Wong, M. E. Spetsakis, “Motion Segmentation and Tracking,” Proceedings of 15th International Conference on Vision Interface, 80-87, S2.1, 2002, Calgary, Canada.

12. A. Cavallaro and T. Ebrahimi, “Video object extraction based on adaptive background and statistical change detection,” Proc. of SPIE Electronic Imaging 2001 - Visual Communications and Image Processing, 465-475, 2001, San Jose C A.

13. J. Shi, C. Tomasi, ”Good features to track”, IEEE Proc. of Computer Vision and Pattern Recognition, 593-600, 1994.

14. C. Tomasi, T. Kanade, ”Detection and tracking of point features”, Technical report CMU-CS-91-132, Carnegie Mellon University, Pittsburgh, PA, April 1991.

15. T. Tommasini, A. Fusiello, E. Trucco, V. Roberto, “Making Good Features Track Better”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 178-183, 1998.

16. E. Trucco, Y. R. Petillot, I. Tena Ruiz, K. Plakas, and D. M. Lane, “Feature tracking in video and sonar subsea sequences with applications,” Computer Vision and Image Understanding 79, 92-122, 2000.

17. A. Fusiello, E. Tommasini, and V. Roberto, “Improving feature tracking with robust statistics,” Pattern analysis & applications 2, 312-320, 1999.

18. S. Araki, T. Matsuoka, N. Yokoya, H. Takemura, ”Real-Time Tracking of Multiple Moving Object Contours in a Moving Camera Image Sequence”, IEICE Trans. Inf. & Syst., E83-D, 1583-1591, 2000.

19. A. Cavallaro and T. Ebrahimi, “Multiple video object tracking in complex scenes,” Proc. Of the ACM multimedia Conference, 523-532, 2002, Juan Les Pins, France.

20. Y. Fu, A. Tanju Erdem, “Tracing visible boundary of objects using occlusion adaptive motion snake,” IEEE Trans. Image Proc. 9, 2051-2060, 2000.

21. N. Peterfreund, “Robust tracking of position and velocity with Kalman snakes,” IEEE Trans. Pattern Analysis and Machine Intelligence 21, 564-569, 1999.

22. Y. Zhong, A.K. Jain, and M.-P. Dubuission-Jolly, “Object tracking using deformable templates,” IEEE Trans. Pattern Analysis and Machine Intelligence 22, 544–549, 2000.

23. M. Irani and S. Peleg, “Improving resolution by image registration,” CVGIP: Graph. Models and Image Process. 53, 231-239, 1991.

24. R. C. Hardie, K. J. Barnard, J. G. Bognar, E. E. Armstrong and E. A. Watson, “High-resolution image reconstruction from a sequence of rotated and translated frames and its application to an infrared imaging system,” Opt. Eng. 37, 247-260, 1998.

25. T. Aach and A. Kaup, “Statistical model-based change detection in moving video,” Signal Processing 31, 165-180, 1993.

 

 

Fig. 1. Classification of motion segmentation (Courtesy of Zhang and Lu [4]).

Fig. 2. Block diagram of the proposed tracking algorithm.

Fig. 3. Result of change detection: (a) and (b) two image frames; (c) image difference; and (d) binary image obtained by thresholding (c) using 2.7σ.

Fig. 4. Initial segmentation of a video frame containing a large object: (a) input frame; (b) difference image; (c) thresholded difference image; (d) binary image after morphological closing and size filtering; (e) the region closest to the mouse click point.

Fig. 5. Data flow of the proposed algorithm.

