A wide variety of techniques exist for Web-based interactive visualization. The simplest and most familiar are basic client-side rendering systems, in which high-level 3D object descriptions are transmitted to the client for rendering. This class includes, most notably, VRML [3] and systems that extend it [8]. For complex geometry and large datasets, the transmission cost of these systems is high, as is the rendering cost on the client if high-quality images are desired.
A number of techniques have been proposed to reduce transmission costs and speed up software rendering. Geometry compression methods include quantized representations [6] and decimation of polygon meshes [11]; however, even these lossy schemes are unlikely to achieve compression ratios high enough to permit reasonably fast transmission of the datasets we are considering. Techniques for faster rendering in software abound; these include adaptive rendering [7], incremental rendering [10] [14], and, for fast volume rendering, the Shear-Warp technique [13] [10] and others [22].
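As a rough illustration of what a quantized geometry representation involves, the sketch below uniformly quantizes vertex coordinates to a fixed number of bits per axis within the mesh bounding box. It is a generic uniform quantizer written for illustration, not the specific scheme of [6]; the function names and the choice of 8 bits are assumptions.

```python
def quantize(vertices, bits=8):
    """Map each coordinate to an integer code in [0, 2**bits - 1].

    Assumption: uniform quantization inside the axis-aligned bounding box.
    """
    levels = (1 << bits) - 1
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    # Step size per axis; fall back to 1.0 when the mesh is flat on an axis.
    scale = [(hi[i] - lo[i]) / levels if hi[i] > lo[i] else 1.0
             for i in range(3)]
    codes = [tuple(round((v[i] - lo[i]) / scale[i]) for i in range(3))
             for v in vertices]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate coordinates from the integer codes."""
    return [tuple(lo[i] + c[i] * scale[i] for i in range(3)) for c in codes]


# Toy example with three hypothetical vertices.
vertices = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (0.3, 1.1, 2.7)]
codes, lo, scale = quantize(vertices, bits=8)
approx = dequantize(codes, lo, scale)
```

Each coordinate is stored in 8 bits rather than a 32-bit float, and the reconstruction error per axis is at most half a quantization step. The saving is a small constant factor, which is consistent with the observation above that such schemes alone are unlikely to suffice for very large datasets.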
An attractive strategy for Web-based interactive 3D visualization is the use of a powerful server as a render engine that generates and transmits still images of the 3D scene specified by the client. This technique is used in [20] and [1]. A related approach described by Levoy [16] performs high- and low-quality renderings on the server and only a low-quality rendering on the client. A compressed difference image is sent to the client to reconstruct the high-quality rendering. However, we wish to avoid any 3D rendering on the client, as well as the need to transmit large 3D datasets initially. The drawback of all such server-based systems is that, unless the number of clients and the types of interaction are limited, the server can be flooded with requests from multiple clients, or even from a single client engaged in extensive interaction. In addition, since multiple individual frames must be transmitted in the course of a single interactive session, the transmission costs can become large.
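The difference-image idea attributed to Levoy [16] above can be sketched in a few lines. This is a minimal illustration under stated assumptions: images are small grayscale arrays held as nested lists, the difference is a signed per-pixel subtraction, and the compression step applied to the difference image is omitted. The function names are hypothetical and are not taken from [16].

```python
def difference_image(high, low):
    """Server side: signed per-pixel difference between the two renderings."""
    return [[h - l for h, l in zip(hrow, lrow)]
            for hrow, lrow in zip(high, low)]


def reconstruct(low, diff):
    """Client side: add the received difference to the local low-quality image."""
    return [[l + d for l, d in zip(lrow, drow)]
            for lrow, drow in zip(low, diff)]


# Toy 2x3 grayscale renderings (hypothetical pixel values).
high = [[10, 200, 35], [90, 91, 255]]   # high-quality rendering (server only)
low = [[12, 190, 35], [88, 95, 250]]    # low-quality rendering (server and client)

diff = difference_image(high, low)      # what the server would transmit
restored = reconstruct(low, diff)       # what the client would display
```

Because the low-quality rendering already approximates the final image, the difference values are small and should compress well. With a lossy compressor, the reconstruction would be approximate rather than exact as it is here. Note that the client must still perform the low-quality 3D rendering, which is the cost the text above seeks to avoid.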
Image-based techniques transmit a set of 2D images that are
interpolated to reconstruct specified 3D views [4]. A
critical property of these techniques is that the cost of
interactively viewing the scene is independent of scene complexity,
since the 2D images are pre-computed (in the case of synthetic images)
or digitized from photographs. Image-based rendering forms the basis
for Apple Computer's QuickTime VR format [5], which allows
viewing of a panoramic environment. Light field rendering
[17] allows more extensive viewing from arbitrary camera
positions by treating the 2D images as slices of the 4D light field,
which is resampled to construct new views. McMillan and Bishop
[19] propose plenoptic modeling as a framework for
understanding image-based systems, and describe a rendering system for
arbitrary viewpoints based on this concept. Gortler et al. [9]
perform image warping from a Layered Depth Image (LDI), whose
representation is very similar to our MLI representation. However, both
the LDI system and the other image-based techniques support only
navigation within a fixed environment; we wish to support the other
types of interaction described in the introduction. If extensive
navigation is not required, then multilayer images can be used to
provide high-quality interactive 3D visualization at very low
transmission cost, and without the need for client-side graphics
hardware or costly rendering in software.