Structure From Motion in Helicopter Flight

Tim Lee

Varun Ganapathi

Eric Berger

Abstract

A method is presented for enabling an autonomous helicopter to use camera imagery as its primary sensor for both mapping of and localization within its environment. Structure from motion with a wide-angle lens over a set of SIFT features extracted from the video stream is used to map an area in three dimensions and localize the aircraft within that map. Several methods are also examined for resolving the scale of structures within the environment which are well-suited to use onboard the helicopter.

Introduction

The problem of extracting spatial structure from a video stream would be fairly straightforward in a world of perfect feature detection, correspondence, and scale estimation. Unfortunately, all of these can be very difficult in real-world situations, and resolving these problems for a specific application often requires substantial effort. Developing a robust set of tools and methods tailored to video streams taken from a wide-angle camera on board a model helicopter would enable easy integration of visual mapping and navigation into existing and new helicopter projects. This would be useful in situations such as flight in areas without reliable GPS, flight in areas where obstacles are unknown and vision is the best way to detect and avoid them, mapping applications, or autonomous landing-site identification.

The problem to be addressed can be divided into three areas: feature detection and correspondence, localization and map construction, and scale estimation. The effectiveness of feature detection and correspondence algorithms is highly dependent on the type of scene in which they are performed. Although many feature detectors exist, it is far from obvious which will offer the best combination of real-time computation, frame-to-frame consistency, invariance to angle changes and lens distortion, and amenability to easy and error-resistant correspondence matching during flight. We show that SIFT features are a very appropriate choice for this type of localization in an unstructured environment [6].

Similarly, there are many assumptions to be made in constructing a spatial map from noisy video data in order to reject reconstructions that are clearly erroneous. Determining which of those assumptions are likely to hold for footage taken from a helicopter will make the resulting mapping and localization process more robust and efficient than a general-purpose structure from motion algorithm, since the types of errors that are most worrisome for the purposes of keeping a helicopter aloft are significantly different from the types of errors that are a problem for scanning a model in a laboratory. Additionally, the use of a wide-angle lens makes the solution of the structure from motion problem under perspective necessary, which is significantly harder than in cases where an orthographic projection is a good approximation [5]. Bundle adjustment is one solution to the problem of projective structure from motion; however, because of the size of the minimization problem involved it can be difficult to compute quickly. In order to speed the minimization, information from a simulator of the helicopter dynamics will be used to obtain accurate initial guesses for helicopter motion and reduce the number of iterations needed for convergence.
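As a rough illustration of the kind of minimization involved, the sketch below sets up a reprojection-error objective over camera poses and 3D points and hands the simulator-predicted poses to a generic nonlinear least-squares solver. The parameterization, the pinhole model, and the use of SciPy are illustrative assumptions, not a description of our implementation.

    # Hedged sketch of a bundle-adjustment step: minimize reprojection error
    # over camera poses and 3D points, starting from poses predicted by the
    # helicopter dynamics simulator so that fewer iterations are needed.
    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs, K):
        poses = params[:6 * n_cams].reshape(n_cams, 6)   # rotation vector + translation
        points = params[6 * n_cams:].reshape(n_pts, 3)   # 3D feature positions
        residuals = []
        for c, p, uv in zip(cam_idx, pt_idx, obs):
            R = Rotation.from_rotvec(poses[c, :3]).as_matrix()
            X_cam = R @ points[p] + poses[c, 3:]
            proj = K @ X_cam
            residuals.append(proj[:2] / proj[2] - uv)     # pixel reprojection error
        return np.concatenate(residuals)

    def bundle_adjust(init_poses, init_points, cam_idx, pt_idx, obs, K):
        # init_poses comes from the dynamics simulator's motion prediction.
        x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
        result = least_squares(
            reprojection_residuals, x0, method="trf",
            args=(len(init_poses), len(init_points), cam_idx, pt_idx, obs, K))
        return result.x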

Finally, the scale estimation problem is one that does not have a solution within the framework of traditional structure from motion, although solving it is absolutely essential to the use of a vision-based navigation system for a helicopter. Identifying clues from the image stream itself, as well as from other sources, that will work in unstructured outdoor environments is key to ensuring the applicability of this method to real-world problems in autonomous aviation. In particular, the integration with the dynamics simulator will enable us to prevent our estimate of the scale of movement from drifting over time.

We plan to combine existing feature detectors and structure from motion algorithms with heuristics for scale estimation, with a focus on obtaining data suitable for use in a helicopter control system. We believe that we will be able to create a package which can robustly estimate the structure of the environment and the position of the camera within it for the types of images and movement patterns typically seen in helicopter flight. Because there are significant constraints on the types of motion and types of scene that the system will need to process, we expect to be able to leverage those constraints to obtain speed and accuracy better than those of more general-purpose structure from motion algorithms. In addition, because of the wide-angle lens system that we are using, we hope to be able to track individual features and feature patterns over a relatively long distance, in order to ensure that different portions of our constructed map will be consistent with one another in terms of position, orientation, and scale.

Background

Although onboard vision is potentially one of the most powerful sensors for airborne vehicles, it is still not an easy task to integrate a vision processing system into an existing helicopter platform. The majority of cameras mounted on small aircraft have been used for applications where no processing is necessary, specifically for obtaining pictures and videos and for allowing remote human control of the craft. Vision has been successfully used in conjunction with GPS and inertial measurement units for landing assistance [1, 2]. All successful landing demonstrations, however, have relied on the use of known landing pads, which reduces the vision-processing component of the problem to a simple fitting of known geometry to an image from a calibrated camera. CMU has developed an onboard “visual odometer” system, using custom hardware and stereo cameras, which was used in conjunction with GPS and IMU measurements to track the helicopter’s movement through an environment [3].

Our Approach

Our major areas of focus in this work are feature extraction and structure from motion under noisy perspective projection. In order to decouple these two problems from one another and allow us to optimize each of them independently, we are also looking into the use of semi-structured environments with easily identifiable markers at unknown locations to serve as a proxy for ideal feature detection.

Our overall goal for the algorithm is to maintain a map of visual features, along with estimates of their positions in 3D space, which we can update with each new frame and which we can also use to update our position estimates. To do this, it is important for us to have features which can be observed in the same location over the course of significant translations and rotations of the helicopter. We would also like to use features which can be distinguished from one another, to make the correspondence problem more robust to large motions in image space between successive update steps.
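One possible layout for such a map is sketched below; it is purely illustrative, not our final design. Each landmark keeps the descriptor used to re-identify it, its current 3D position estimate, and a count of how many frames have observed it.

    # Illustrative (assumed) data structure for the feature map.
    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class Landmark:
        descriptor: np.ndarray    # e.g. a 128-dimensional SIFT descriptor
        position: np.ndarray      # current 3D estimate in the map frame
        observations: int = 1     # number of frames that have seen this feature

    @dataclass
    class FeatureMap:
        landmarks: dict = field(default_factory=dict)   # landmark id -> Landmark
        next_id: int = 0

        def add(self, descriptor, position):
            self.landmarks[self.next_id] = Landmark(descriptor, position)
            self.next_id += 1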

One type of feature that matches these criteria very well is intentionally placed visual markers. Even without prior knowledge of their placement, the existence of sharp unmistakable corners allows us to rely more on the accuracy of our feature detection algorithm across a wide variety of viewing situations, and substantially reduces our reliance on specifics of the environment in which we are operating. Because a marker provides several different corner and edge features, it also allows us to solve the correspondence problem without relying heavily on temporal stability in image space, which is crucial for robustness of the algorithm to visual obstructions, temporary video-stream loss, or rapid movements of the helicopter. Ideally, however, our system would not rely on any type of pre-existing markers, and could work from observations of its environment.

In order to obtain many of the same advantages as a marker-based system, we propose using the scale-invariant feature transform (SIFT) to detect regions of the captured video which will be easy to locate in subsequent frames. One of the most important characteristics of SIFT features is their encoding of the brightness pattern in their local area, which allows them to be distinguished from one another. This property makes it possible to construct a map large enough that not all features are in view simultaneously, because a feature can be recognized when it is seen again from a combination of its expected position and expected appearance. SIFT features also have the advantage of being relatively robust to changes in lighting or camera orientation, so that we can rely on a given feature being available for a sufficiently long time to localize it accurately and use that data to help determine the helicopter’s position and orientation.
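A minimal sketch of this kind of detection and matching, using OpenCV's SIFT implementation and Lowe's ratio test, is shown below; the specific calls and the ratio threshold are illustrative choices, not fixed parts of our pipeline.

    # Sketch: SIFT detection and descriptor matching between two frames.
    # Requires an OpenCV build with SIFT available; the 0.7 ratio is illustrative.
    import cv2

    def match_sift(frame_a, frame_b):
        sift = cv2.SIFT_create()
        kp_a, desc_a = sift.detectAndCompute(frame_a, None)
        kp_b, desc_b = sift.detectAndCompute(frame_b, None)

        # For each descriptor in frame A, find the two nearest descriptors in
        # frame B, and keep only matches that pass the ratio test.
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        raw = matcher.knnMatch(desc_a, desc_b, k=2)
        good = [p[0] for p in raw
                if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]

        pts_a = [kp_a[m.queryIdx].pt for m in good]
        pts_b = [kp_b[m.trainIdx].pt for m in good]
        return pts_a, pts_b

Because each match carries a distinctive descriptor, the same routine can also be run against descriptors stored in the map, which is what lets features be re-acquired after leaving the field of view.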

After feature detection and correspondence, which can be achieved either through the use of markers or through the SIFT feature detector, we still need to construct a 3D map of the environment and embed the observed features within it. In order to do this, we propose a system with three phases. First, we use the best features shared by a sequence of successive frames to attempt to extract structure from motion, using either bundle adjustment or the multiframe perspective structure from motion technique proposed by Oliensis [4]. Second, we use some simple heuristics to compare our computed structure with that predicted by our pre-existing map data. Using that comparison to check for evidence of incorrect correspondences, and possibly also making other assumptions about the orientation and motion of the helicopter and the relationship of features to one another, we attempt to detect any faulty correspondences and remove them, so that we can find a more accurate estimate of the spatial location of the observed features. Finally, for points which appear to be proper correspondences, we use the predicted locations to update the positions of the features in our world map so that over time it becomes more accurate and self-consistent. We also update the position of the helicopter at this point, using a combination of predicted motion from our simulator of helicopter behavior and observed change in pose from the structure from motion algorithm. This allows us to resolve the scale problem and to keep relatively accurate estimates in the face of faulty image data with no additional on-board sensors on the helicopter.
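To make the geometry of a single update concrete, the sketch below recovers relative camera motion from one pair of frames, uses RANSAC on the essential matrix as a crude faulty-correspondence filter, and triangulates the surviving points. Our actual system would work over many frames with bundle adjustment, so this two-view OpenCV version is only an illustration, and the calibrated camera matrix K (for the undistorted wide-angle images) is assumed to be known.

    # Sketch of a two-view update: relative pose from matched points, RANSAC
    # outlier rejection, and triangulation of inliers (scale remains unknown).
    import numpy as np
    import cv2

    def two_view_update(pts_a, pts_b, K):
        pts_a = np.asarray(pts_a, dtype=np.float64)
        pts_b = np.asarray(pts_b, dtype=np.float64)

        # The RANSAC essential-matrix fit doubles as a bad-correspondence filter.
        E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
        mask = inliers.ravel().astype(bool)

        # Relative rotation R and unit-norm translation t between the frames.
        _, R, t, _ = cv2.recoverPose(E, pts_a[mask], pts_b[mask], K)

        # Triangulate inlier correspondences into 3D, up to the unknown scale.
        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P1 = K @ np.hstack([R, t])
        X_h = cv2.triangulatePoints(P0, P1, pts_a[mask].T, pts_b[mask].T)
        X = (X_h[:3] / X_h[3]).T
        return R, t, X, mask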

It is during these final steps of filtering for bad matches and updating the world-model that our knowledge of the target application will really come into play. We will attempt to capture the knowledge of what situations are likely for the helicopter to encounter and which are very unlikely in order to keep the output from the method as reasonable as possible even in the event of errors in image acquisition, feature detection, or correspondence matching.

Finally, we have to address the issue of scale estimation. There are several possible solutions to this problem, and which is best depends on what resources are available. The simplest case is one where the camera system is mounted on a fully instrumented helicopter and helicopter pose information is already fully known. In this case, the structure from motion problem becomes well-posed, and the problem of scale estimation disappears because the camera trajectory is completely known. The next category of solutions is one where limited sensor data is available, perhaps only an accelerometer, GPS unit, altimeter, or laser range-finder. Using a Kalman filter to compare the scale of the motions seen by the camera with those seen by the other sensors, we will be able to prevent drift in scale estimation. Because the maps constructed during flight will be relatively static over time and include scaled distances between features, this drift would be quite slow, and would be limited mainly to scale estimation in unseen areas, not in the same area over time. For our system, estimation will most likely come from comparisons of the scale of motions predicted by the helicopter simulator with the image changes observed in flight. Finally, estimation could come entirely from visual clues, although this would require more knowledge of the environment. In the case of visual markers of known size, it is easy to use the size of the markers to calibrate the scale for the rest of the scene, but in the case of completely unstructured environments, knowledge of the scale of some environmental feature is necessary for proper scale estimation.
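As an illustration of the limited-sensor case, the sketch below maintains a one-dimensional Kalman filter over the unknown scale factor, where each measurement is the ratio of a metric displacement reported by an auxiliary sensor (for example an altimeter or GPS) to the corresponding unscaled visual displacement. The noise parameters are placeholder assumptions.

    # Illustrative 1D Kalman filter over the map scale factor s.
    class ScaleFilter:
        def __init__(self, s0=1.0, var0=1.0, process_var=1e-4, meas_var=0.05):
            self.s = s0            # current scale estimate
            self.var = var0        # variance of the estimate
            self.q = process_var   # how fast the scale is allowed to drift
            self.r = meas_var      # noise of a single ratio measurement

        def update(self, sensor_dist, visual_dist):
            if visual_dist <= 1e-6:
                return self.s                # no visual motion, nothing to learn
            z = sensor_dist / visual_dist    # one noisy observation of the scale
            self.var += self.q               # predict: scale may drift slowly
            k = self.var / (self.var + self.r)
            self.s += k * (z - self.s)       # correct toward the measurement
            self.var *= (1.0 - k)
            return self.s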

Current Results

We have currently put together a small monochrome camera with a wide-angle lens and an analog transmitter which can beam the video back to the ground. We are using off-board processing with a real-time analog video link from the helicopter, which minimizes the amount of equipment that actually needs to be in the air for the system to be operational and keeps the flight setup both cheap and light. We have tested the setup by hand, and hope to mount it on a helicopter for gathering data by the end of this week.

Using OpenCV, markers placed in the environment can be readily identified, and the next step is developing SIFT feature detection and tracking. The marker detection uses standard Canny edge detection to filter the image. The correspondence problem is solved through prior knowledge of the marker configuration (four white boxes arranged in an "L" pattern). Contours are extracted from the edges, and polygons are fitted to these contours. Probable marker locations are extracted from the set of polygons (Fig. 1).

Fig. 1: Marker recognition process
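A minimal sketch of this detection chain in OpenCV, with illustrative thresholds, might look as follows.

    # Sketch of the marker-detection chain: Canny edges, contour extraction,
    # polygon fitting, then keeping convex quadrilaterals as box candidates.
    import cv2

    def find_marker_candidates(gray):
        edges = cv2.Canny(gray, 50, 150)
        contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = []
        for c in contours:
            approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
            # Keep convex quadrilaterals above a minimum area as candidates
            # for the white boxes of the "L"-shaped marker.
            if len(approx) == 4 and cv2.isContourConvex(approx) \
                    and cv2.contourArea(approx) > 100:
                boxes.append(approx.reshape(4, 2))
        return boxes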

Next Steps

There are several things which we need to do at this point in order to make our system fully operational. As far as the physical system is concerned, we need to mount it on the helicopter and gather some data to learn how well suited to the problem our approach actually is. We are waiting until after the helicopter team is done with the NASA demo, and hope to fly within a day or two after that.

We have marker identification code which can reliably find points and establish correspondences, but we still need to develop the SIFT feature detection code to replace the marker identification and allow us to operate in unstructured environments.

We are working on our structure from motion algorithms to try to get everything running on data gathered by hand before the helicopter flight so that we will know what types of data will be good for us to obtain from the flights.

Summary and Conclusion

We are developing a package which will make it easy for anybody who wants to add visual navigation to a helicopter to do so by pairing appropriate feature detectors with a mapping program which is designed with the needs of airborne mapping taken into consideration. We still have a lot of work ahead of us, but the project is looking promising and we have every reason to believe that we will be successful by the end of the class.

Bibliography

[1] C. Sharp, O. Shakernia, and S. Sastry, "A vision system for landing an unmanned aerial vehicle," in Proceedings of the IEEE International Conference on Robotics and Automation, 2001, pp. 1720–1728.

[2] S. Saripalli, J. F. Montgomery, and G. S. Sukhatme, "Vision-based autonomous landing of an unmanned aerial vehicle," in IEEE International Conference on Robotics and Automation, Washington, D.C., May 2002, pp. 2799–2804.

[3] O. Amidi, T. Kanade, and R. Miller, "Vision-based autonomous helicopter research at CMU," in Proceedings of Heli Japan 98, Gifu, Japan, 1998.

[4] J. Oliensis, "A Multi-frame Structure from Motion Algorithm under Perspective Projection," NECI Technical Report, April 1997.

[5] J. Weng, T. S. Huang, and N. Ahuja, "Motion and structure from two perspective views: algorithms, error analysis and error estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5):451–476, 1989.

[6] S. Se, D. Lowe, and J. Little, "Global Localization using Distinctive Visual Features," in Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, EPFL, Lausanne, Switzerland, October 2002.
