**Figure 1:** Generating an MLI: *P*1, *P*2, and *P*3 are pixels of the
image; *A*, *B*, and *C* are structure numbers between 0 and 127; *A*'
= *A* + 128 (high bit set), and *I*(*P*1, *A*) is the intensity of
structure *A* at the point where the ray passing through *P*1
intersects it. The data stream representing the image in the figure
is:

To generate the representation, we first choose a width *w* and height
*h* for the target image, and select a viewing direction and lighting
for the models. We color each of the structures in the
model white and render it separately, recording the image and the
Z-buffer for each structure's rendering. The information contained in
these 2*n* images is sufficient for producing a target image with
arbitrary color and transparency values for each structure, but this
representation is as large as 2*n* uncompressed images,
and contains a large percentage of extraneous information. Fortunately
it is relatively easy to extract only the necessary data and arrange
it in an order that leads to very simple and fast decoding.

Let *I_i* (for 1 ≤ *i* ≤ *n*) be an array of pixel intensity
values from the rendering of structure *i*, and let *Z_i* be an
array of Z-buffer values from the rendering of
structure *i*. For simplicity, we assume that the depths in the
Z-buffer are normalized so that 0 represents the front clipping
plane, some *b* > 0 represents the back clipping plane, and any value
greater than *b* represents a background pixel.

For each pixel (*x*, *y*), we sort the depth values *Z_1*(*x*, *y*), …, *Z_n*(*x*, *y*) from front to
back, giving a permutation *σ* (which depends on the pixel) such that
*Z_σ(1)*(*x*, *y*) ≤ *Z_σ(2)*(*x*, *y*) ≤ … ≤ *Z_σ(n)*(*x*, *y*).
We then define the following *n* arrays of size *w* × *h*:

*S_i*(*x*, *y*) = *σ*(*i*) if *Z_σ(i)*(*x*, *y*) ≤ *b*, and *S_i*(*x*, *y*) = 0 otherwise.

These arrays can be interpreted as follows: each pixel in *S_1*
is labeled with the structure number that appears in
that pixel if all structures are fully opaque, or zero for
background. For *i* > 1, each pixel in *S_i* is labeled
with the structure number that appears in that pixel if the structures
that occupy the same pixel in *S_1*, …, *S_{i−1}* are all turned off.
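As a concrete sketch, the per-pixel sort and the construction of the *S_i* arrays might look like the following (the names `order_arrays` and `z_bufs` are illustrative, not from the paper; depths greater than *b* are treated as background):

```python
def order_arrays(z_bufs, b):
    """z_bufs[i][y][x] is the Z-buffer of structure i+1; values > b are background.
    Returns S, where S[i][y][x] is the (i+1)-th closest structure at (x, y), or 0."""
    n = len(z_bufs)
    h, w = len(z_bufs[0]), len(z_bufs[0][0])
    S = [[[0] * w for _ in range(h)] for _ in range(n)]
    for y in range(h):
        for x in range(w):
            # Sort the structure numbers 1..n front to back by depth at this pixel.
            perm = sorted(range(1, n + 1), key=lambda i: z_bufs[i - 1][y][x])
            for i, s in enumerate(perm):
                if z_bufs[s - 1][y][x] <= b:   # visible layer: record its number
                    S[i][y][x] = s             # otherwise leave 0 (background)
    return S
```

For example, with two structures whose Z-buffers over a 2 × 1 image are `[[0.2, 2.0]]` and `[[0.5, 2.0]]` and *b* = 1.0, the result is *S_1* = `[[1, 0]]` and *S_2* = `[[2, 0]]`.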

Finally, let *d*(*x*, *y*) be the number of overlapping structures at
pixel (*x*, *y*), so *d*(*x*, *y*) equals the smallest *i* for which
*S_i*(*x*, *y*) = 0, minus one (or *n* if no such *i* exists). This allows us to define, for
each pixel, the sequence

*S_1*(*x*, *y*), *I_{S_1(x, y)}*(*x*, *y*), …, *S_d*(*x*, *y*), *I_{S_d(x, y)}*(*x*, *y*),

where *d* abbreviates *d*(*x*, *y*).

This sequence tells us how to render pixel (*x*, *y*) given
an assignment of colors and transparencies to each of the *n*
structures. Each successive pair of elements represents a
(*structure number*, *intensity value*) tuple ordered from front to
back, so to compute the pixel, we need only step along the sequence,
blending the appropriate intensity of each structure's color in
proportions dictated by its transparency.
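The blending step can be sketched with standard front-to-back alpha compositing (the names `shade_pixel`, `color`, and `alpha` are illustrative, and the paper does not specify an exact blending equation; `pairs` is the pixel's front-to-back sequence of (structure number, intensity) tuples):

```python
def shade_pixel(pairs, color, alpha):
    """color[s] is the user-chosen RGB color of structure s (components in [0, 1]);
    alpha[s] is its opacity. Intensities are 8-bit values in [0, 255]."""
    out = [0.0, 0.0, 0.0]
    transmit = 1.0                      # fraction of light not yet absorbed
    for s, intensity in pairs:          # step along the sequence, front to back
        w = transmit * alpha[s]
        for c in range(3):
            out[c] += w * color[s][c] * (intensity / 255.0)
        transmit *= 1.0 - alpha[s]
        if transmit <= 0.0:             # fully opaque: deeper layers are hidden
            break
    return out
```

For a pixel covered by structure 1 (red, opacity 0.5) in front of structure 2 (blue, fully opaque), both at full intensity, this yields `[0.5, 0.0, 0.5]`.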

In order to store the variable-length sequences sequentially
without having to store the value *d*(*x*, *y*) for each pixel, we tag the
first element of each sequence and insert placeholders for empty
sequences as follows: let *T*(*x*, *y*) equal zero with the high bit set if
*d*(*x*, *y*) = 0, and let *T*(*x*, *y*) equal the sequence above with the high bit of
the first element set, otherwise. This use of the high bit explains
the requirement that structure numbers lie between 0 and 127. Our final representation *R*
consists of the following sequence:

*T*(1, 1), *T*(2, 1), …, *T*(*w*, 1), *T*(1, 2), …, *T*(*w*, *h*).

Figure 1 illustrates the derivation of an MLI representation from 3D models.
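A sketch of the encoding step under the assumptions above (scan-line pixel order, and `0x80` as the high bit of a byte; the names `encode_mli`, `S`, and `I` are illustrative):

```python
def encode_mli(S, I, w, h):
    """S[i][y][x]: (i+1)-th front-to-back structure number at (x, y), 0 = none;
    I[s-1][y][x]: 8-bit intensity of structure s. Returns the byte stream R."""
    R = bytearray()
    for y in range(h):
        for x in range(w):
            if S[0][y][x] == 0:          # background pixel: tagged placeholder
                R.append(0x80)
                continue
            first = True
            for layer in S:
                s = layer[y][x]
                if s == 0:               # d(x, y) layers exhausted
                    break
                R.append(s | 0x80 if first else s)   # tag first element only
                R.append(I[s - 1][y][x])
                first = False
    return R
```

Encoding the 2 × 1 example with *S_1* = `[[1, 0]]`, *S_2* = `[[2, 0]]` and intensities *I_1* = `[[200, 0]]`, *I_2* = `[[100, 0]]` yields the bytes `[129, 200, 2, 100, 128]`.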

The data stream *R* is a compact representation of the original
intensity and depth images; the length of *R* in bytes is

|*R*| = Σ_{(*x*, *y*)} max(1, 2 *d*(*x*, *y*)).

Equivalently, letting *g* equal the number of background pixels
(i.e. pixels (*x*, *y*) for which *d*(*x*, *y*) = 0), and *a* equal the
mean of *d*(*x*, *y*) over all non-background pixels, we have
|*R*| = *g* + 2*a*(*wh* − *g*). In other words, the length of the
representation is no more than 2*a* times greater than the length of
an uncompressed image, and in our experience, *a* tends
to be relatively small compared to *n* (it was never greater than
*n*/8 in our sample application). This is a significant improvement
over the naive strategy of sending all 2*n* of the *I_i* and *Z_i*
arrays, and we will see in the next section that the
ordering of data in *R* is conducive to rapid image construction, since
the front-to-back ordering is precomputed. Section 2.3
describes a simple compression scheme that further reduces the size
required for the representation of MLIs.
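The two forms of the length formula can be checked numerically; the overlap counts below are made-up illustrative data, not from the paper:

```python
# d(x, y) for a 3 x 2 image in scan order; each background pixel (d = 0)
# contributes 1 placeholder byte, each other pixel contributes 2*d bytes.
d = [0, 0, 2, 1, 3, 0]
wh = len(d)
g = sum(1 for v in d if v == 0)                # number of background pixels
length = sum(2 * v if v > 0 else 1 for v in d) # |R| summed per pixel

a = sum(v for v in d if v > 0) / (wh - g)      # mean d over non-background pixels
assert length == g + 2 * a * (wh - g)          # matches the closed form
assert length <= 2 * a * wh                    # the stated bound (since a >= 1)
```

Here `length` is 15 bytes against `2 * a * wh` = 24, consistent with the bound.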
