Monocular Vehicle Detection and Tracking

Yufei Wang Department of Electrical and Computer Engineering

University of California San Diego La Jolla, California 92037

E-mail: yuw176@eng.ucsd.edu

Abstract--This project implements a vehicle detection and tracking system. The framework uses a Haar cascade classifier for vehicle detection and a car-light feature for validation. To smooth and further refine the detections, Kalman tracking is applied to every car hypothesis, and a three-stage tracking scheme is employed. Experiments show that the car-light validation greatly reduces false alarms while preserving the detection rate, and that tracking improves the detection result further. Although the Haar cascade classifier produces a large number of false alarms given the limited training set, the detection system still yields satisfactory results after the refinement and tracking steps.

I. INTRODUCTION

Worldwide, millions of people are killed or injured in motor vehicle collisions, and the financial costs to both society and individuals are significant. Therefore, a pre-crash vehicle system is of great interest to researchers as well as vehicle manufacturers.

Building a pre-crash vehicle system is very challenging. One of its main challenges is that it requires accurate detection of on-road vehicles. Detection is a real-world problem: the variety of vehicle appearances and the changing environments put great obstacles in the way of tackling it. In the past decades, many techniques have been used to detect on-road vehicles, including radar, lidar, and computer vision. Thanks to the development of cameras and computational devices, the computer vision approach can be implemented in real time and has captured more and more attention. Many approaches have been developed for monocular vision-based vehicle detection. For feature extraction, popular features such as HOG, SIFT, SURF, and Haar-like features have been tried, and a variety of classifiers have been employed, such as SVM, AdaBoost, hidden Markov model classification, etc. ( [5]). For tracking, particle filtering and Kalman filtering are among the popular methods ( [5] [4]).

The main novelty of this project is the usage of a car-light feature for target validation. The system involves three steps: car targets are generated with a Haar cascade classifier; target validation is performed with the car-light feature; and tracking smooths and refines the detection results further. The detection and tracking results are evaluated on real-world video.

The rest of this report is organized as follows. Section II presents a brief overview of related work in vehicle detection and object detection. Section III gives the framework of the vehicle detection and tracking system. Section IV gives the experimental results and analysis. Section V discusses future work.

II. RELATED RESEARCH

Many features and classification methods have been tried on the vehicle-detection problem. Among them, Haar-like features and boosted cascades are preferred by many researchers.

In [1] Haar-like features are used for object detection and the results are compelling. Their advantage is that they are easy and fast to compute, and the rectangular features are representative for vehicle detection. The AdaBoost classifier was first proposed in [2]: by combining weak classifiers with certain weights, AdaBoost can achieve good results. [6] used Haar-like features with AdaBoost for face detection. However, the problem of computational time remains: there are too many Haar-like features in a small patch (with different sizes and positions). Although the integral image greatly reduces the cost of computing Haar-like features, it is still too time-consuming to compute every Haar-like feature for all patches. Therefore, the boosted cascade is proposed. It consists of several stages; an object candidate enters the next stage only when it passes the current one, and if it fails any stage, it is classified as background. This largely reduces the computing time. The combination of Haar-like features and a boosted cascade produces very compelling results in face detection. Rear-facing cars have many rectangular patterns which are easily represented by Haar-like features, and the boosted cascade is an efficient classifier, so the detection system can run in real time. That is why I choose Haar-like features and the boosted cascade. Also, many have used Haar cascades for vehicle detection with satisfactory results ( [4]).
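The stage-wise early rejection described above can be sketched in a few lines of Python; the stages and decision stumps below are toy illustrations, not trained values from any real detector:

```python
# Minimal sketch of boosted-cascade classification with early rejection.
# Stage thresholds and weak classifiers are hypothetical illustrations.

def make_stage(weak_classifiers, threshold):
    """A stage sums weak-classifier votes and compares against a threshold."""
    def stage(candidate):
        score = sum(clf(candidate) for clf in weak_classifiers)
        return score >= threshold
    return stage

def cascade_classify(candidate, stages):
    """A candidate is positive only if it passes every stage;
    most negatives are rejected in the first few stages."""
    for stage in stages:
        if not stage(candidate):
            return False  # early rejection: stop immediately
    return True

# Toy example: each "weak classifier" is a decision stump on one feature.
stages = [
    make_stage([lambda x: x[0] > 0.2], threshold=1),
    make_stage([lambda x: x[1] > 0.5, lambda x: x[2] > 0.1], threshold=2),
]

print(cascade_classify((0.9, 0.8, 0.4), stages))  # passes both stages -> True
print(cascade_classify((0.1, 0.8, 0.4), stages))  # rejected at stage 1 -> False
```

Because most image patches fail an early stage, the expensive later stages run on only a small fraction of candidates, which is what makes real-time detection feasible.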

Vehicle detection normally consists of two steps: first, all regions that can be viewed as vehicle candidates are identified; second, the candidates are verified and tracked. In the first stage, many researchers use the shadow underneath a vehicle as a clue indicating its presence. [3] suggests a gradient-based method: due to shadows, wheels, and bumpers in the bottom rear view of a vehicle, there will be a negative horizontal gradient.

III. METHOD

The framework is shown in Fig. 1. In this section, the Haar cascade is introduced briefly first. Then the validation using the car-light feature and the tracking steps are detailed.

A. Haar Cascade Classification

The combination of Haar-like features and a boosted cascade classifier is used for object detection by many. Haar-like features can be divided into three classes: two-rectangle features which detect edges; three-rectangle features which detect

Fig. 3: Car light feature. Top row: detected cars. Middle row: Cr component of top row. Bottom row: Thresholding of the middle row.

Fig. 1: Framework of System Algorithm

Fig. 2: Car candidates detected by Haar cascade

lines; and four-rectangle features which detect diagonal edges. A boosted cascade is a cascade of weak classifiers (mostly decision stumps). Each stage consists of several weak classifiers, and every candidate is put through the stages in order. Candidates that pass all the stages are classified as positive, while those rejected by any stage are classified as negative. The advantage of the cascade is that the majority of candidates are negative and usually cannot pass the first few stages, so the computing time is greatly reduced by early-stage rejection.

B. Car-light Refinement

After the sub-images in a frame are classified as vehicle or non-vehicle, the vehicle candidates are generated. As shown in Fig. 2, there are many false alarms among the candidates. This is due to the limitations of the training set, which will be elaborated in the experiment section.

To reduce false alarms as much as possible, the car-light feature is introduced.

On-road tracking of vehicles mainly deals with rear-view vehicles that appear in front of the camera. Regardless of the shape, texture, or color of a vehicle, its rear view shares a common feature: red lights in the middle. As shown in Fig. 3, despite variations in vehicle color and lighting conditions, the rear view of a vehicle has red car lights on its left and right sides.

To extract the car-light feature described above, the image is first transformed to the YCbCr color space. The Y component corresponds to luminance, and the Cb and Cr components correspond to the blue and red chroma components. In this color space, the influence of illumination differences is reduced. The Cr component is of interest here, and the Cr component of the area is shown in the second row of Fig. 3. It can be observed that illumination changes can still impact the Cr values of the car lights. Otsu's method is used to obtain an adaptive threshold for the Cr sub-image. With Otsu's method, the impact of illumination is largely reduced, and the two lights are extracted, as illustrated in the bottom row of Fig. 3. The area to be segmented only contains the middle of the original vehicle area. Finally, the car-light feature is extracted. Currently, a heuristic but effective threshold is applied to the black-and-white car-light image:

    CarLight = 1,  if E_bw(r)/E_bw(m) > T  or  E_bw(l)/E_bw(m) > T;
               0,  otherwise.                                        (1)

where r, l, and m denote the three sub-images (right, left, and middle) of the bottom row of Fig. 3, E_bw(area) stands for the mean value of the black-and-white thresholded area, and T is a predefined threshold. CarLight = 1 verifies the car candidate, whereas CarLight = 0 rejects the candidate.

However, when the vehicle itself is red, as shown in the first column of Fig. 3, the above method is invalid. Therefore, a different thresholding is employed: thresholding is applied to the Cr image, and if the mean Cr value of the candidate area is larger than a predefined threshold, the area is validated as a detected car area:

    RedCarLight = 1,  if E_Cr(area) > T_Cr;
                  0,  otherwise.                                     (2)

The car-light feature has the advantage of using color information to eliminate false alarms, which is a good complement to the cascade classifier (which uses only gray-level information).
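Putting the pieces of this subsection together, a sketch of the validation step in Python/NumPy might look as follows. The BT.601 Cr coefficients, the left/middle/right split, and the ratio reading of Eq. (1) are my reconstruction of the text; the thresholds default to the values used in the experiments (T = 20, TCr = 200):

```python
import numpy as np

def cr_channel(rgb):
    """Cr (red-difference chroma) of an 8-bit RGB image, BT.601 coefficients."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    return np.clip(128 + 0.5 * r - 0.418688 * g - 0.081312 * b, 0, 255)

def otsu_threshold(img):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    total = img.size
    sum_all = float(np.dot(np.arange(256), hist))
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / (total - w0)
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def car_light(bw, T=20):
    """Eq. (1): compare left/right thirds of the binary mask to the middle third."""
    w = bw.shape[1]
    left, mid, right = bw[:, :w // 3], bw[:, w // 3:2 * w // 3], bw[:, 2 * w // 3:]
    e_l, e_m, e_r = (float(np.mean(a)) for a in (left, mid, right))
    eps = 1e-6  # guard against an all-dark middle region
    return 1 if (e_r / (e_m + eps) > T or e_l / (e_m + eps) > T) else 0

def red_car_light(cr_area, T_cr=200):
    """Eq. (2): fallback for red vehicles using the mean Cr value."""
    return 1 if float(np.mean(cr_area)) > T_cr else 0

# Toy candidate: gray body with two bright-red rear lights.
patch = np.full((12, 30, 3), 90, dtype=np.uint8)
patch[4:8, 2:8] = (255, 0, 0)    # left light
patch[4:8, 22:28] = (255, 0, 0)  # right light
cr = cr_channel(patch)
bw = (cr > otsu_threshold(cr.astype(np.uint8))).astype(float)
print(car_light(bw))  # 1: both light regions much brighter than the middle
```

In practice cv2.cvtColor and cv2.threshold with the THRESH_OTSU flag would do the color conversion and thresholding; the pure-NumPy version above only makes the computation explicit.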

C. Tracking

After detection, tracking is employed to smooth and refine the detection result further.

1) Kalman tracking: Kalman tracking is employed to smooth the detection result.

I describe the state with 6 dimensions: X = [sx, sy, width, height, vx, vy]^T, where (sx, sy) denotes the coordinates of the center of the area, (width, height) denotes the size of the area, and (vx, vy) denotes the velocity of its center.

For the measurement Z, a 4D vector is used: Z = [zx, zy, zwidth, zheight]^T, where (zx, zy) represents the position of the observed vehicle, and (zwidth, zheight) denotes the observed size of the vehicle.
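The report does not spell out the motion model, but a velocity-augmented state like this one usually implies a constant-velocity Kalman filter. The sketch below is a minimal NumPy version under that assumption; the matrices F and H follow from the state and measurement definitions, while the noise covariances Q and R are illustrative placeholders:

```python
import numpy as np

# Constant-velocity Kalman model for one vehicle hypothesis (assumed model).
# State X = [sx, sy, width, height, vx, vy]^T; measurement Z = [zx, zy, zw, zh]^T.
dt = 1.0
F = np.eye(6)
F[0, 4] = dt  # sx += vx * dt
F[1, 5] = dt  # sy += vy * dt
H = np.zeros((4, 6))
H[0, 0] = H[1, 1] = H[2, 2] = H[3, 3] = 1.0  # observe position and size only

Q = np.eye(6) * 0.01   # process noise (assumed)
R = np.eye(4) * 1.0    # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    return x + K @ y, (np.eye(6) - K @ H) @ P

x = np.array([100.0, 50.0, 40.0, 40.0, 2.0, 0.0])  # moving right, 2 px/frame
P = np.eye(6)
x, P = predict(x, P)
print(x[:2])  # [102.  50.]  -- the center advanced by the velocity
x, P = update(x, P, np.array([103.0, 50.0, 40.0, 40.0]))
```

With dt = 1, prediction moves the box center forward by one frame's velocity; the update then pulls the state toward the new observation in proportion to the Kalman gain.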

2) Three-stage hypothesis tracking: On top of Kalman tracking, I assume there are three stages of a target hypothesis: hypothesis generation, hypothesis tracking, and hypothesis removal.

- Hypothesis generation: when a newly detected area appears and lasts for more than n1 frames, a vehicle hypothesis is generated.

- Hypothesis tracking: Kalman tracking of every vehicle hypothesis, which consists of two stages: prediction, where the state of each hypothesis is predicted, and update, where the state of each hypothesis is updated based on the prediction and the observation.

- Hypothesis removal: when a hypothesis is not detected for more than n2 frames, the hypothesis is removed.

n1 and n2 are predefined parameters. The three-stage tracking refines the detection result further, for it eliminates false alarms that do not survive long enough, and it keeps track of vehicles that are briefly missed in the detection step.
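The lifecycle above can be sketched as a small state machine; the class and counter names are my own, but the n1/n2 semantics follow the three stages just described:

```python
# Sketch of the three-stage hypothesis lifecycle.
# A track is confirmed after surviving more than n1 consecutive detections
# and removed after going undetected for more than n2 consecutive frames.

class Hypothesis:
    def __init__(self, n1=2, n2=1):
        self.n1, self.n2 = n1, n2
        self.hits = 0        # consecutive frames with a matching detection
        self.misses = 0      # consecutive frames without one
        self.confirmed = False
        self.removed = False

    def step(self, detected):
        if detected:
            self.hits += 1
            self.misses = 0
            if self.hits > self.n1:
                self.confirmed = True   # hypothesis generation
        else:
            self.misses += 1
            if self.misses > self.n2:
                self.removed = True     # hypothesis removal

h = Hypothesis(n1=2, n2=1)
for seen in [True, True, True]:   # detected 3 frames in a row
    h.step(seen)
print(h.confirmed)  # True: survived more than n1 = 2 frames
h.step(False)                     # one missed frame: still kept (<= n2)
print(h.removed)    # False
h.step(False)                     # second miss exceeds n2 = 1: removed
print(h.removed)    # True
```

Between the confirmation and removal events, each live hypothesis would run the Kalman prediction/update of the previous subsection once per frame.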

IV. RESULTS AND ANALYSIS

A. Experiment Dataset

The dataset for training and testing is the LISA-Q Front FOV dataset, which consists of three video sequences of 1600, 300, and 300 frames respectively. The three video clips have different lighting and traffic conditions. The first sequence is used as training data, and the other two are used as test data, because the number of vehicles in the first video is relatively large.

B. Experiment Parameters

For cascade training, 2000 positive images are randomly chosen from the 1600-frame training dataset. 1300 negative images are chosen from 325 non-vehicle images, whose scenes fall into highway, coast, mountain, open country, building, and street. The training images are resized to 40x40 patches. The number of stages is 20. The maximum false alarm rate for each stage is 0.5, which is relatively loose, and the minimum detection rate for each stage is 0.995. The weak classifiers are decision stumps.

For car-light refinement, T = 20 and TCr = 200 are chosen.

For tracking, n1 = 2 and n2 = 1 are chosen.

Fig. 5: Result of car-light refinement: left two columns: validated candidates; right two columns: rejected candidates

Fig. 6: Result of car-light refinement: left: before validation; right: after validation

C. Result

The system is tested on the two videos, 300 frames each.

The result is shown in Fig. 4. I use the performance metrics put forward by [4]: TPR = detected vehicles / total vehicles; FDR = false positives / (total vehicles + false positives); FP/Frame = false positives / frames; TP/Frame = true positives / frames; FP/Object = false positives / true vehicles.
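The metric definitions above can be collected into a small helper; the function and argument names are my own, and the example counts are round numbers, not the paper's data:

```python
# Evaluation metrics from [4], written out as a helper over frame-level counts.

def metrics(true_positives, false_positives, total_vehicles, frames):
    return {
        "TPR": true_positives / total_vehicles,
        "FDR": false_positives / (total_vehicles + false_positives),
        "FP/Frame": false_positives / frames,
        "TP/Frame": true_positives / frames,
        "FP/Object": false_positives / total_vehicles,
    }

# Example with round numbers (not the paper's data):
m = metrics(true_positives=90, false_positives=10, total_vehicles=100, frames=50)
print(m["TPR"])       # 0.9
print(m["FP/Frame"])  # 0.2
```

Note that FDR normalizes false positives by all reported detections plus misses, while FP/Object normalizes by ground-truth vehicles, so the two can differ noticeably when false positives are common.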

The result is compared with the result using ALVeRT system in [4], as is shown in Table I and Table II.

D. Result Analysis

1) Car-light refinement: Car-light refinement can reduce

most of the false alarms generated by the previous stage. Some examples are shown in Fig. 5: the left two candidates are validated, while the right two false alarms are successfully eliminated. The effectiveness is illustrated more clearly by Fig. 6: before validation there are many false alarms, while after validation only the true vehicle is kept. This is also illustrated in Table I and Table II, where my method has a lower false alarm rate than ALVeRT in [4].

2) Tracking: The advantage of the tracking method is that it smooths and refines the detection result. As shown in Fig. 7, without tracking, the red car in the middle is missed in frame i, whereas with tracking it can still be followed, because it was detected and tracked in the previous frame.

3) Failure: There are several reasons for the failed cases:

- shadows or severe illumination changes
- cars that are not strictly rear-facing
- parts of cars that are sometimes mistakenly detected as whole vehicles

Fig. 4: Experiment Result. Each row shows one experiment result. Each column corresponds to the intermediate result of a certain stage. Columns: 1. input frame; 2. detection using the cascade classifier; 3. refinement with the car-light feature; 4. after tracking.

Tracking System             TPR      FDR     FP/Frame   TP/Frame   FP/Object
Detection-tracking system   98.67%   0.33%   0.003      0.99       0.003
ALVeRT                      91.7%    25.5%   0.39       1.14       0.31

TABLE I: Data Set 2: March 9, 2009, 9 A.M., Urban, Cloudy

Tracking System             TPR      FDR     FP/Frame   TP/Frame   FP/Object
Detection-tracking system   93.33%   4.05%   0.127      2.8        0.042
ALVeRT                      99.8%    8.5%    0.28       3.17       0.09

TABLE II: Data Set 3: April 21, 2009, 12:30 P.M., Highway, Sunny

Fig. 7: Result of tracking: left: without tracking; right: with tracking

V. CONCLUSION AND FUTURE WORK

The project builds a vehicle detection and tracking system based on monocular vision. A Haar cascade is used for target generation, a "car-light feature" is found useful for target validation, and three-stage tracking combined with Kalman tracking is used. The main future work I would like to pursue is further refinement of the car-light feature. The feature runs quite well on the two test videos, but when applied to more datasets with more diverse lighting conditions and more vehicles, some problems can be foreseen, because the current car-light features are heuristic and do not use machine learning; they could be generalized into a more robust feature.

REFERENCES

[1] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.

[2] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.

[3] A. Khammari, F. Nashashibi, Y. Abramson, and C. Laurgeau. Vehicle detection combining gradient analysis and AdaBoost classification. In Intelligent Transportation Systems, 2005. Proceedings, pages 66-71, 2005.

[4] S. Sivaraman and M. M. Trivedi. A general active-learning framework for on-road vehicle recognition and tracking. IEEE Transactions on Intelligent Transportation Systems, 11(2):267-276, June 2010.

[5] S. Sivaraman and M. M. Trivedi. Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Transactions on Intelligent Transportation Systems, 14(4):1773-1795, 2013.

[6] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 511-518, 2001.
