Interlacing and Progressive Display
We'll wrap up our look at the basic elements of Portable Network Graphics
images with a quick consideration of progressive rendering and interlacing.
Most computer users these days are familiar with the World Wide Web and the
method by which modern browsers present pages. As a rule, the textual
part of a web page is displayed first, since it is transmitted as part of
the page; then images are displayed, with each one rendered as it comes
across the network. Ordinary images are simply painted from the top down,
a few lines at a time; this is the most basic form of progressive display.
Some images, however, are in a format that allows them to be rendered as an
overall, low-resolution image first, followed by one or more passes that
refine it until the complete, full-resolution image is displayed. For
GIF and PNG images this is known as interlacing. GIF's approach has
four passes and is based on complete rows of the image, making it a
one-dimensional method. First every eighth row is displayed; then every
eighth row is displayed again, only this time offset by four rows from the
initial pass. The third pass consists of every fourth row, and the final
pass includes every other row (half of the image).
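The row ordering just described can be sketched in a few lines of Python. This is only an illustration: the function name `gif_interlace_rows` is made up for this example, rows are numbered from zero, and the starting rows of the third and fourth passes (2 and 1) come from the GIF specification rather than from the description above.

```python
def gif_interlace_rows(height):
    """Return, for each of GIF's four passes, the 0-based rows it transmits."""
    # (first row, row step) for each pass
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    return [list(range(start, height, step)) for start, step in passes]


for number, rows in enumerate(gif_interlace_rows(16), start=1):
    print(f"pass {number}: rows {rows}")
```

For a 16-row image, the first two passes carry two rows each, the third carries four, and the last carries the remaining eight.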
PNG's interlacing method, on the other hand, is a two-dimensional scheme
with seven passes, known as the Adam7 method (after its inventor, Adam
Costello). If one imagines the image being broken up into 8 × 8-pixel
tiles, then the first pass consists of the upper left pixel in each tile--that
is, every eighth pixel, both vertically and horizontally. The second
pass also consists of every eighth pixel, but offset four pixels to the right.
The third pass consists of two pixels per tile, offset by four rows from
the first two pixels (see Figure 8-4a). The fourth pass contains four
pixels in each tile, offset two columns to the right of each of the first
four pixels, and the fifth pass contains eight pixels, offset two rows
downward (see Figure 8-4b).
The sixth pass fills in the remaining pixels on the odd rows (if the
image is numbered starting with row one), and the seventh pass
contains all of the pixels for the even rows. Note that, although I've
described the method in terms of 8 × 8 tiles, pixels for any given
pass are stored as complete rows, not as tiled groups. For example,
the fifth pass consists of every other pixel in the entire third row
of the image, followed by every other pixel in the seventh row, and so
on.
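As a rough illustration of the seven passes, the following Python sketch labels each pixel of a tile with the pass that delivers it and lists the pixels of a single pass in stored order. The names `ADAM7`, `adam7_pass_map`, and `adam7_pass_pixels` are invented for this example, and coordinates are zero-based, so the "third row" of the text is row 2 here.

```python
# (x start, y start, x step, y step) for each of the seven passes
ADAM7 = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]


def adam7_pass_map(width=8, height=8):
    """Return a grid whose entries give the pass (1-7) that carries each pixel."""
    grid = [[0] * width for _ in range(height)]
    for number, (x0, y0, dx, dy) in enumerate(ADAM7, start=1):
        for y in range(y0, height, dy):
            for x in range(x0, width, dx):
                grid[y][x] = number
    return grid


def adam7_pass_pixels(number, width, height):
    """(x, y) coordinates of one pass, in the row-by-row order they are stored."""
    x0, y0, dx, dy = ADAM7[number - 1]
    return [(x, y) for y in range(y0, height, dy) for x in range(x0, width, dx)]


for row in adam7_pass_map():
    print(" ".join(str(p) for p in row))
```

Calling `adam7_pass_pixels(5, 8, 8)` yields every other pixel of row 2 followed by every other pixel of row 6, which is the complete-rows ordering noted above.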
The primary benefit of PNG's two-dimensional interlacing over GIF's
one-dimensional scheme is that one can view a crude approximation of the
entire image roughly eight times as fast.[66]
That is, PNG's first pass consists of one sixty-fourth of the image
pixels, whereas GIF's first pass consists of one-eighth of the
data. Suppose one were to save a palette image as both an interlaced
GIF and an interlaced PNG. Assuming the compression ratio and download
speeds were identical for the two files, the PNG image would have
completed its fourth pass as the GIF image completed its first. But
most browsers that support progressive display do so by replicating
pixels to fill in the areas that haven't arrived yet. For the PNG
image, that means each pixel at this stage represents a 2 × 4
block, whereas each GIF pixel represents a 1 × 8 strip. In other
words, GIF pixels have an 8-to-1 aspect ratio, whereas PNG pixels are
2-to-1. At the end of the next pass for each format (GIF's second
pass, PNG's fifth; one-quarter of the image in both cases), the PNG
pixels are square 2 × 2 blocks, while the GIF pixels are still
stretched, now as 1 × 4 strips. In practical terms, features in the
PNG image--particularly embedded text--are much more recognizable
than in the GIF image. In fact, readability testing suggests that text
of any given size is legible roughly twice as fast with PNG's
interlacing method.
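The comparison above can be tabulated with a short Python sketch. It assumes, as the text does, that the data is spread evenly over the pixels, and it assumes simple pixel replication. The block sizes for stages not mentioned in the text are inferred from the pass geometry, and the names `PNG_PASSES`, `GIF_PASSES`, and `progress` are hypothetical.

```python
from fractions import Fraction

# For each pass: (share of the image's pixels it delivers,
#                 (width, height) of the block each known pixel fills
#                 once the pass is complete)
PNG_PASSES = [
    (Fraction(1, 64), (8, 8)),
    (Fraction(1, 64), (4, 8)),
    (Fraction(1, 32), (4, 4)),
    (Fraction(1, 16), (2, 4)),
    (Fraction(1, 8), (2, 2)),
    (Fraction(1, 4), (1, 2)),
    (Fraction(1, 2), (1, 1)),
]
GIF_PASSES = [
    (Fraction(1, 8), (1, 8)),
    (Fraction(1, 8), (1, 4)),
    (Fraction(1, 4), (1, 2)),
    (Fraction(1, 2), (1, 1)),
]


def progress(passes):
    """Return (cumulative fraction of pixels, block size) after each pass."""
    total = Fraction(0)
    table = []
    for share, block in passes:
        total += share
        table.append((total, block))
    return table


for name, passes in (("PNG", PNG_PASSES), ("GIF", GIF_PASSES)):
    for number, (done, (w, h)) in enumerate(progress(passes), start=1):
        print(f"{name} pass {number}: {done} of the pixels, {w} x {h} blocks")
```

The fourth PNG entry and the first GIF entry both show one-eighth of the pixels, with 2 × 4 blocks and 1 × 8 strips respectively.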
JPEG also supports a form of progressive display, but it is not
interlacing in the usual sense of reordering the pixels spatially.
Rather, it involves reordering the frequency components that make up a
JPEG image, first displaying the low-frequency ones and working up to
the highest frequency band; this is known as spectral selection.
In addition, progressive JPEG can transmit the most significant bits
of each frequency component earlier than the less significant ones, a
feature known as successive approximation that is very nearly
the same as turning up the JPEG quality setting with each scan. The
two approaches can be used separately, but in practice they are almost
always used in combination. Because JPEG operates on 8 × 8 blocks
of pixels, progressive JPEG bears a strong resemblance to interlaced
PNG during the early stages of display, though it tends to have a
softer, fuzzier look due to the initial lack of high-frequency
components (which is often deliberately enhanced by smoothing in the
decoder). This is visible in Figures C-4a and C-4b in the color
insert, which represent the second pass of a progressive JPEG image
(26% of the compressed data), both unsmoothed and smoothed. Note in
particular the blockiness in the shadowed interior of the box and the
"colored outside the lines" appearance around the child's arms and
hands; the first effect is completely eliminated in the smoothed
version, and the second is greatly reduced. JPEG's first pass is
actually more accurate than PNG's, however, since the low-frequency
band for each 8 × 8 pixel block represents an average for all 64
pixels, whereas each 8 × 8 block in PNG's first pass is represented
by a single pixel, usually in the upper left corner of the displayed
block. By its fifth pass, which represents only 40% of the compressed
data, the progressive JPEG version of this image (Figure C-4c) is
noticeably sharper and more accurate than all but the final pass of
the PNG version. Keep in mind also that, since the PNG is lossless and
therefore 11 times as large as the JPEG, 40% of the compressed JPEG
data is equivalent to only about 3.5% of the PNG data, which corresponds to
the beginning of PNG's third pass. This only emphasizes the point made
previously: for non-transparent, photographic images on the Web, use
JPEG.
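The size arithmetic in this comparison is easy to check. The sketch below makes the simplifying assumption that the PNG's compressed data is spread evenly over its pixels, so that a fraction of the file corresponds to the same fraction of the image. The names `ADAM7_SHARES` and `pass_in_progress` are invented for this example.

```python
from fractions import Fraction
from itertools import accumulate

# Share of the image's pixels carried by each of the seven Adam7 passes
ADAM7_SHARES = [
    Fraction(1, 64), Fraction(1, 64), Fraction(1, 32), Fraction(1, 16),
    Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
]


def pass_in_progress(fraction_received):
    """Return the pass (1-7) being transmitted once this fraction has arrived."""
    for number, done in enumerate(accumulate(ADAM7_SHARES), start=1):
        if fraction_received < done:
            return number
    return 7


# 40% of a JPEG that is one-eleventh the size of the PNG
png_equivalent = Fraction(40, 100) / 11
print(f"{float(png_equivalent):.1%} of the PNG data, "
      f"pass {pass_in_progress(png_equivalent)} in progress")
```

Since the second pass ends at one thirty-second (about 3.1%) of the pixels, a figure in the neighborhood of 3.5% falls just inside the third pass.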
Note that smoothing could be applied to the early passes of interlaced PNGs
and GIFs, as well; tests suggest that this looks better for photographic
images but maybe not as good for simple graphics. (On the other hand,
recall that smoothing did seem to enhance the readability of early interlace
passes in Figure 1-4.)
As for representing blocks by the pixel in the upper left corner, it would be
possible to replicate each pixel so that the original would lie roughly at the
center of its clones, as long as some care were taken near the edges of the
image. This would prevent the apparent shift in some features as later passes
are displayed. But neither smoothing nor centered pixel replication is
currently supported by the PNG reference library, libpng, as of version 1.0.3.
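To make the idea of centered replication concrete, here is one possible sketch along a single axis. It is not taken from libpng or any other library. The helper names `replication_span` and `row_spans` are hypothetical, and the edge handling shown (clipping at the left edge, stretching the last sample to the right edge) is just one way to take the necessary care.

```python
def replication_span(origin, size, limit, centered=False):
    """Half-open range [start, end) of coordinates painted with the pixel at
    `origin`, where `size` is the spacing between known pixels at this stage."""
    start = origin - size // 2 if centered else origin
    end = start + size
    if origin + size >= limit:
        # Last known pixel along this axis: stretch its clones to the edge
        end = limit
    return max(start, 0), min(end, limit)


def row_spans(width, step, centered=False):
    """Spans covered by the known pixels of one row, spaced `step` apart."""
    return [replication_span(x, step, width, centered)
            for x in range(0, width, step)]


print(row_spans(20, 8))                 # upper-left replication
print(row_spans(20, 8, centered=True))  # centered replication
```

With upper-left replication, each known pixel sits at the left end of its span. With centered replication, the interior pixels sit in the middle, and only the spans at the two edges are lopsided.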
It is worth noting that TIFF can also support a kind of interlacing, although,
like everything about TIFF, it is much more arbitrary than either GIF's or
PNG's method. Baseline TIFF includes the concept of strips, each of
which may include one or more rows of image data, though the number of rows
per strip is constant. A list of offsets to each strip is embedded within
the image, so in principle one could make each strip a row and do GIF-style
line interlacing with any ordering one chose. But since TIFF's structure is
fundamentally random access in nature, this approach would only work if one
imposed certain restrictions on the locations of its internal directory, list
of strip offsets, and actual strip data--that is, one would need to define
a particular subformat of TIFF.
In addition, libtiff supports a TIFF extension called tiles, in
which the image data is organized into rectangular regions instead of
strips. Since the tile size can be arbitrary, one could define it to
be 1 × 1 and then duplicate PNG's Adam7 interlacing scheme
manually--or even extend it to 9, 11, or more passes. However,
since every tile must have a corresponding offset in the TIFF image
directory, doing something like this would at least double or triple
the image size. Also, TIFF's compression methods apply only to
individual strips or tiles, so there would be no real possibility of
compression aside from reusing tiles in more than one location (that
is, by having multiple tile offsets point at the same data). And, as
with the strip approach, this would require restrictions on the
internal layout of the file. Nevertheless, the capability does exist,
at least theoretically.