A wide variety of techniques exist for Web-based interactive visualization; the simplest and most familiar are basic client-side rendering systems, in which high-level 3D object descriptions are transmitted to the client for rendering. This class most notably includes VRML and systems that extend it. For complex geometry and large datasets, the transmission cost of these systems is high, as is the client cost if high-quality images are desired.
A number of techniques have been proposed to reduce transmission costs and speed up software rendering. Geometry compression methods include quantized representations and decimation of polygon meshes; however, even these lossy schemes are unlikely to achieve compression ratios high enough to permit reasonably fast transmission of the datasets we are considering. Techniques for faster rendering in software abound; these include adaptive rendering, incremental rendering, and, for fast volume rendering, the Shear-Warp technique and other methods.
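To make the quantized-representation idea concrete, the following is a minimal sketch of uniform vertex quantization, one simple instance of lossy geometry compression. The function names and the fixed-grid scheme are illustrative assumptions, not the method of any cited system.

```python
# Illustrative sketch of lossy geometry compression by uniform quantization:
# map each float coordinate onto a 2^bits integer grid over the bounding box.

def quantize_vertices(vertices, bits=10):
    """Quantize a list of (x, y, z) float tuples to integer grid coordinates.

    Returns (quantized vertices, bounding-box min, bounding-box max)."""
    lo = tuple(min(v[i] for v in vertices) for i in range(3))
    hi = tuple(max(v[i] for v in vertices) for i in range(3))
    levels = (1 << bits) - 1
    quantized = [
        tuple(
            round((c - l) / (h - l) * levels) if h > l else 0
            for c, l, h in zip(v, lo, hi)
        )
        for v in vertices
    ]
    return quantized, lo, hi

def dequantize_vertices(quantized, lo, hi, bits=10):
    """Reconstruct approximate float coordinates from grid coordinates."""
    levels = (1 << bits) - 1
    return [
        tuple(l + q / levels * (h - l) for q, l, h in zip(qv, lo, hi))
        for qv in quantized
    ]
```

Each coordinate then needs only `bits` integer bits rather than a 32-bit float, and the reconstruction error is bounded by half a grid cell per axis; this is precisely why such schemes are lossy, and why their compression ratios remain modest for very large datasets.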
An attractive strategy for Web-based interactive 3D visualization is to use a powerful server as a render engine that generates and transmits still images of the 3D scene specified by the client; several systems take this approach. A related approach described by Levoy performs high- and low-quality renderings on the server and only a low-quality rendering on the client; a compressed difference image is sent to the client to reconstruct the high-quality rendering. However, we wish to avoid any 3D rendering on the client, as well as the initial transmission of large 3D datasets. The drawback of all such server-based systems is that unless the number of clients and the types of interaction are limited, the server can be flooded with requests from multiple clients, or even from a single client supporting extensive interaction. In addition, since many individual frames must be transmitted in the course of a single interactive session, the transmission costs can become large.
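The difference-image idea attributed to Levoy above can be sketched as follows. This is an assumed, simplified model: images are flat lists of 0-255 gray values, and the compression step applied to the residual is omitted.

```python
# Illustrative sketch of difference-image reconstruction: the server renders
# both a high- and a low-quality image and transmits their (highly
# compressible) difference; the client renders only the low-quality image
# and adds the residual back to recover the high-quality result.

def difference_image(high, low):
    """Server side: signed residual between the two renderings.

    In practice this residual, being mostly near zero, is what gets
    compressed and transmitted."""
    return [h - l for h, l in zip(high, low)]

def reconstruct(low, diff):
    """Client side: cheap local rendering plus transmitted residual,
    clamped back to the displayable 0-255 range."""
    return [max(0, min(255, l + d)) for l, d in zip(low, diff)]
```

Because the client must still produce the low-quality rendering itself, this scheme retains some client-side 3D work, which is exactly what the approach described in this paper avoids.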
Image-based techniques transmit a set of 2D images which are interpolated to reconstruct specified 3D views. A critical property of these techniques is that the cost of interactively viewing the scene is independent of scene complexity, since the 2D images are pre-computed (in the case of synthetic images) or digitized from photographs. Image-based rendering forms the basis for Apple Computer's QuickTime VR format, which allows viewing of a panoramic environment. Light field rendering allows more extensive viewing from arbitrary camera positions by treating the 2D images as slices of the 4D light field, which is resampled to construct new views. McMillan and Bishop propose plenoptic modeling as a framework for understanding image-based systems, and describe a rendering system for arbitrary viewpoints based on this concept. Gortler et al. perform image warping from a Layered Depth Image (LDI), whose representation is very similar to our MLI representation. However, both the LDI system and the other image-based techniques support only navigation within a fixed environment; we wish to support the other types of interaction described in the introduction. If extensive navigation is not required, then multilayer images can be used to provide high-quality interactive 3D visualization at very low transmission cost, and without the need for client-side graphics hardware or costly rendering in software.
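To illustrate viewing from a layered representation of this kind, the following sketch shifts each layer horizontally in proportion to its parallax (closer layers move more) and composites the layers front to back. A 1-D scanline stands in for a full image, and the layer format, shift rule, and function names are assumptions for illustration, not the paper's exact MLI or LDI scheme.

```python
# Illustrative sketch of novel-view synthesis from a multilayer image:
# each layer stores a depth and a row of pixels (None = transparent),
# and a new view is formed by parallax-shifting each layer and
# compositing near-to-far, keeping the first (nearest) hit per pixel.

def render_view(layers, dx):
    """layers: list of (depth, pixels) sorted near to far.
    dx: horizontal viewpoint offset. Returns the composited scanline."""
    width = len(layers[0][1])
    out = [None] * width
    for depth, pixels in layers:        # near to far
        shift = round(dx / depth)       # parallax proportional to 1/depth
        for x in range(width):
            src = x - shift
            if out[x] is None and 0 <= src < width and pixels[src] is not None:
                out[x] = pixels[src]
    return out
```

Note that the per-frame cost depends only on the image resolution and the number of layers, never on the complexity of the original 3D scene, which is the key property of image-based techniques discussed above.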