Section 2



Section 2.1.2 Realtime 3D reconstruction of Objects and background (JYB)

1 page - Other Relevant Research

3D scanning techniques



There are mainly two types of scanning techniques: laser triangulation scanners and time-of-flight laser scanners. In the first case, a laser stripe is projected onto the object to model and swept across it; the three-dimensional shape of the object is then inferred by geometric triangulation. This technique is very successful at retrieving accurate three-dimensional shapes of small objects. When dealing with large scenes (up to 100 meters across), a time-of-flight scanning technique may be used instead: the depth at a given pixel is estimated from the travel time of a laser pulse from the active device to the scene and back. Marc Levoy at Stanford is working with Brian Curless on a project to build 3D models of Michelangelo’s statues in Italy. This project includes work on merging multiple scans together in order to retrieve a complete model.

The main strength of these scanning methods is accuracy: some of them can resolve details on the order of 0.25 mm over a one-meter range.

The main weaknesses: they require a lot of expensive hardware, and they do not work in real time.
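Both depth principles reduce to a one-line computation once the geometry is known. The sketch below is our own illustration (function and parameter names are not taken from any particular scanner): triangulation via the law of sines, and time of flight via the speed of light.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def triangulation_depth(baseline_m, laser_angle_rad, pixel_angle_rad):
    """Range to a laser-lit point, seen by a camera a known baseline away.

    The laser source, the camera, and the lit point form a triangle
    whose base is the baseline; the law of sines gives the distance
    from the camera to the point.
    """
    apex = math.pi - laser_angle_rad - pixel_angle_rad  # angle at the point
    return baseline_m * math.sin(laser_angle_rad) / math.sin(apex)

def time_of_flight_depth(round_trip_s):
    """Depth from the round-trip travel time of a laser pulse."""
    return C * round_trip_s / 2.0
```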

At Caltech, we have been developing a scanning system that uses an LCD projector as the active device. The principle remains the same as for laser triangulation scanners. The larger effort is now dedicated to developing powerful tools for integrating multiple 3D scans together in order to build complete object models. The main actors in that work are Peter Schroder (in computer science), Pietro Perona, and Jean-Yves Bouguet.

At Caltech, Jean-Yves Bouguet and Pietro Perona developed a method for acquiring three-dimensional data using very little hardware. The principle of this new technique is based on shadows: the user casts a shadow on the scene using a stick, and the 3D shape is then estimated by simple triangulation. The major strengths of the method are its simplicity and low cost. Indeed, the shadow motion does not have to be uniform throughout scanning, so there is no need for complicated calibrated hardware. In addition, the 3D surface shape is directly inferred from the way the edge of the shadow deforms over the scene. We call this type of 3D acquisition system “weak structured lighting”.

The main weakness: it is not real time.
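The core of the shadow-based triangulation is intersecting a pixel’s viewing ray with the plane defined by the shadow edge (the plane through the light source and the edge of the stick’s shadow on the desk). A minimal sketch, with the camera assumed at the origin and names of our own choosing:

```python
def intersect_ray_plane(ray_dir, plane_point, plane_normal):
    """Triangulate a surface point in shadow scanning.

    A pixel crossed by the shadow edge is reconstructed by intersecting
    its viewing ray (through the camera center, here the origin) with
    the shadow plane, given by one point on it and its normal.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    t = dot(plane_normal, plane_point) / dot(plane_normal, ray_dir)
    return tuple(t * d for d in ray_dir)  # 3D point on the object surface
```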

Multiresolution modeling

A large effort in computer science has been put into developing tools for modeling objects at multiple resolutions. Intuitively, if an object is located far away from the imaging system (camera), then a coarse representation of it (few faces) is usually sufficient for good rendering. However, as the same object moves closer to the camera, it becomes crucial to use a finer model in order to avoid visible artifacts in the images. It might be useful to integrate those components into the final system. There is a large research activity in that field at Microsoft Research (Hugues Hoppe) and at universities (Peter Schroder at Caltech).
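A distance-driven level-of-detail switch can be sketched in a few lines. This is illustrative only; real systems (e.g. progressive meshes) use finer-grained, view-dependent criteria, and the names and constants below are our own.

```python
import math

def choose_lod(distance_m, base_distance_m=1.0, num_levels=4):
    """Pick a mesh resolution level from viewing distance.

    Level 0 is the finest mesh, used up to base_distance_m; each
    coarser level covers twice the distance of the previous one, so
    the on-screen triangle density stays roughly constant.
    """
    ratio = max(distance_m / base_distance_m, 1.0)
    return min(int(math.log2(ratio)), num_levels - 1)
```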

Real time teleconferencing project at UNC

The University of North Carolina at Chapel Hill is developing a real-time teleconferencing system in which the face of the user is 3D scanned in real time. Their system uses an LCD projector that projects successive stripe patterns (at different resolutions) onto the user at very high speed (approx. 200 Hz). They envision turning this research into a real-time teleconferencing system in which the 3D model would be sent over the net. They are still working on the implementation details; so far, the resolution they can achieve is rather poor.

Weaknesses: very hardware-demanding and invasive; moreover, this approach does not seem to take advantage of tracking clues in the scene (at every time instant, the scene is completely rescanned).
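If the successive stripe patterns form a Gray code, each pixel accumulates one bit per projected pattern, and decoding yields the index of the stripe to triangulate against the projector. A hypothetical decoder (the UNC system’s actual encoding is not specified here; this is the standard Gray-to-binary conversion):

```python
def decode_gray_stripes(bits):
    """Recover a stripe index from a Gray-coded pattern sequence.

    bits: per-pixel lit/unlit observations, most significant bit first.
    Each decoded bit is the XOR of the observed bit with the previously
    decoded bit.
    """
    value = 0
    for b in bits:
        value = (value << 1) | (b ^ (value & 1))
    return value
```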

Image Based Rendering work

A number of groups in computer science work on alternatives to constructing a full 3D model of the scene in order to generate images.

For example, Marc Levoy at Stanford has developed a technique called “light field rendering” (closely related to the “Lumigraph”) that renders complex 3D scenes by storing the light field defined by the entire scene.

The major strength of such an approach is speed of rendering.

Weakness: In order to store the light field, a large image database is necessary.

Structure from Motion (JYB)

If a still camera observes a moving textured object in space, then there are geometrical methods for computing its 3D shape as well as its motion.

This class of techniques, called structure from motion, consists of tracking feature points across the images of the sequence, and inferring from the observed image flow the underlying motion of the rigid scene as well as its geometry (shape).

At Caltech, there has been a large effort in developing such schemes, especially useful for visual navigation. The main actors involved in that project are: Stefano Soatto, Xiaolin Feng, Jean-Yves Bouguet and Pietro Perona.

Strength: does not require any prior knowledge about either the structure of the scene or its motion.

Weakness: to apply it to a small moving object, it is necessary to first segment out the object of interest. In addition, the object must be reasonably textured so that feature points can be tracked on its surface.
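One classical instantiation of structure from motion is the Tomasi-Kanade factorization: under an affine camera, the matrix of tracked feature coordinates has rank 3 and splits into motion and shape via the SVD. A simplified sketch of that generic textbook method (not necessarily the scheme used in the Caltech project):

```python
import numpy as np

def factor_structure_from_motion(W):
    """Affine factorization of a 2F x P track matrix (sketch).

    W stacks the image coordinates of P feature points tracked over
    F frames. After centering each row, W has rank 3 for a rigid scene
    under an affine camera; the SVD then yields a 2F x 3 motion matrix
    and a 3 x P shape matrix, recovered up to an affine ambiguity.
    """
    Wc = W - W.mean(axis=1, keepdims=True)      # center the tracks
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    root = np.sqrt(s[:3])
    motion = U[:, :3] * root                    # 2F x 3
    shape = root[:, None] * Vt[:3, :]           # 3 x P
    return motion, shape
```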

---

2 pages -

Description of modules available for sensing

(INTRO) (common with section 2.1.2)

There are several known methods of extracting various types of information from image streams useful for the purpose of tracking. These can be incorporated and further developed to be part of a system capable of tracking and segmenting the upper-body and hands, as well as objects brought into the scene. Below is a brief description of the methods we wish to experiment with, evaluate, and possibly integrate into our system.

3D stereo triangulation (JYB)

This method, well known in the computational vision community, is commonly used to extract 3D models of objects. It consists of detecting and matching point features on the object across two or more camera images, and computing the locations of the corresponding points in space by geometric triangulation.

Main strength: non-invasive technique (does not require an active device).

Weaknesses: works poorly with insufficiently textured objects. Stereo triangulation is also mostly effective at short range: if the object is far away from the viewer, this technique generally fails to extract accurate depth maps.

Existing product: Takeo Kanade at CMU has built a real-time stereo camera system that uses 5 cameras. This system recovers a reasonable depth map of the environment.

One possible application of such a system is building a layered map of the environment (desk + background), making it possible to handle occlusions between the users (moving in the scene) and the scene itself. For background recovery, extracting an accurate 3D model may not be necessary, given that the cameras will be stationary most of the time.
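For a rectified stereo pair, the triangulation collapses to the disparity formula Z = f·B/d; since depth error grows roughly as Z²/(fB), accuracy degrades quickly at long range, which is the limitation noted above. A minimal sketch (names are ours):

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a matched point in a rectified stereo pair.

    disparity d = x_left - x_right (pixels); depth Z = f * B / d,
    with f the focal length in pixels and B the baseline in meters.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("zero or negative disparity: point at infinity or bad match")
    return focal_px * baseline_m / disparity
```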

Structured Lighting, weak structured lighting (JYB)

In order to retrieve accurate and dense 3D models of objects, it is very often necessary to use active lighting techniques for scanning. We have developed at Caltech a complete active lighting system for object scanning. We are currently working on the building blocks for merging multiple scans together in order to build complete models.

We have also developed a method for scanning objects using shadows (weak structured lighting). This could constitute an alternative to more expensive techniques for acquiring the three-dimensional map of the background.

Video mosaicing and image-based panoramic reconstruction (JYB)

Quite a few products are available on the market for building panoramic images of surrounding scenes. This module may be useful for acquiring the background image.

Structure from motion (JYB)

-------------

2 pages - chart of possible methods for real time 3D reconstruction of objects and background

Objects:

Most 3D scanning systems do not operate in real time. However, it is conceivable to acquire an initial 3D model of the objects of interest, and to update their 3D poses and positions using 2D visual tracking on the images.

There exist a number of visual clues useful for 2D object tracking: feature points attached to the object, the outline (occluding contour), shading, etc.

We believe that using stereo vision would significantly help the tracking of objects in 3D (from sparse sets of 3D points).

Background:

Since the camera(s) will remain stationary with respect to the background scene, a first approach consists of acquiring an initial image of the scene without the users in the field of view, and using that image as the background image at run time.

The background image would then have to be sent only once (at the beginning of the transmission).
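The run-time segmentation this implies can start as simple per-pixel background differencing. An illustrative sketch on grayscale images stored as lists of rows (threshold and names are our own):

```python
def segment_foreground(frame, background, threshold=30):
    """Binary mask of pixels that differ from the stored background.

    Returns 1 where the current frame deviates from the background
    image acquired at startup (i.e. the user), 0 elsewhere.
    """
    return [[1 if abs(f - b) > threshold else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
```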

The main drawback of such an approach is that the users cannot fully interact with the three dimensional environment. For example, while moving in the room, the user may, at times, be partially occluded by walls, or pieces of furniture present in the scene. In such cases, the rendering engine at the receiver should account for those occlusions in order to generate realistic images of the scene (the human visual system is highly sensitive to occlusion clues).

We can see two approaches to dealing with such a situation. The first consists of considering potential occluding elements in the scene as objects in their own right, and treating them separately (see the object description above). That constitutes the first iteration we envision for the system.

The second approach would consist of using some three-dimensional representation of the scene background. This representation could be very crude (a small set of planes in space defining the walls, floor, tables, ...) or more elaborate (a complete three-dimensional model). In either case, it would still be necessary to texture-map an original high-resolution color image onto the 3D models in order to generate realistic-looking views.

-----------

Roadmap of development.

stages of increasing difficulty

1) Use one object, segment it from the scene (using color clues, for example), and directly transmit the image (assuming that the background is known and static).

2) Use a single scanned (rigid) object, and update its location and pose in real time as the user moves it in front of the camera. A first version of that system would use a single camera, and possibly an object with a color significantly different from the background (for easy segmentation). The model would be sent only once, and then only the 3D location and pose.

3) Use multiple objects (still pre-scanned). The difficulties are then segmenting the objects from the visual scene and recognizing them among a set of known objects. Different colors may still be used to simplify segmentation. Problems such as occlusion or disappearance will have to be dealt with. One way is to take advantage of stereo vision in order to identify the different objects (using depth clues).

4) Track multiple objects that have not been scanned. The main difficulty is building the 3D models on the fly. We believe that stereo vision is absolutely required in that case (especially if there is no prior knowledge about the object’s size and shape).

----------------

1/4 page bios.

Jean-Yves Bouguet is currently a PhD student at the California Institute of Technology, expecting to graduate in December 1998. His thesis work deals with the problem of building three-dimensional models of real scenes using vision clues. He has previously worked in collaboration with Pietro Perona (professor at Caltech) on passive clues for 3D modeling, such as image flow. This technique may be applied to visual navigation, in the case where a vehicle moves with an unknown motion in a rigid environment of unknown structure. The principle is to track feature points attached to the scenery across the images, and then infer their 3D positions. More recently, he has been investigating active lighting approaches to extract dense and accurate 3D models. The most recent scanning method he invented in collaboration with Pietro Perona is based on shadows; this system does not require any specific hardware except a stick used to cast a shadow on the scene to scan.

This technique is called “weak structured lighting”. He has also been working in collaboration with the Jet Propulsion Laboratory (JPL) on a project related to comet fly-by and landing using visual information.
