Computer Vision with OpenCV

Image Enhancement in the Spatial Domain

The principal objective of enhancement is to process an image so that the result is more suitable than the original image for a specific application. Enhancement techniques are therefore very much problem oriented. Image enhancement approaches fall into two broad categories: spatial domain methods and frequency domain methods. The term spatial domain refers to the image plane itself, and approaches in this category are based on direct manipulation of the pixels in an image. Frequency domain processing techniques are based on modifying the Fourier transform of an image. There is no general theory of image enhancement. When an image is processed for visual interpretation, the viewer is the ultimate judge of how well a particular method works.

Spatial domain methods are procedures that operate directly on the pixels composing an image. Spatial domain processes will be denoted by the expression:

g(x,y) = T[f(x,y)] --> (1)

where g(x,y) is the output image, f(x,y) is the input image, and T is an operator on f, defined over some neighborhood of (x,y). In addition, T can operate on a set of images.

The principal approach in defining a neighborhood about a point (x,y) is to use a square or rectangular subimage area centered at (x,y), as shown in figure 3.1.

Figure 3.1 – A 3 × 3 neighborhood about a point (x,y) in an image.

The center of the subimage is moved from pixel to pixel starting, say, at the top left corner. The operator T is applied at each location (x,y) to yield the output, g, at that location. The process uses only the pixels in the area of the image spanned by the neighborhood.
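
The sliding-neighborhood process can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the image is a nested list of gray levels, the names `apply_neighborhood` and `mean` are hypothetical, border pixels are simply copied unchanged, and the integer mean is just one possible choice for the operator T.

```python
def apply_neighborhood(f, T, size=3):
    """Apply operator T to the size x size neighborhood of every interior pixel.

    f is a 2D list of gray levels; T maps a flat list of neighborhood values
    to one output value. Border pixels, where the neighborhood would fall
    outside the image, are copied unchanged (an assumption of this sketch).
    """
    half = size // 2
    rows, cols = len(f), len(f[0])
    g = [row[:] for row in f]  # output image, starts as a copy of the input
    for x in range(half, rows - half):
        for y in range(half, cols - half):
            # Collect the pixels spanned by the neighborhood centered at (x, y).
            window = [f[i][j]
                      for i in range(x - half, x + half + 1)
                      for j in range(y - half, y + half + 1)]
            g[x][y] = T(window)
    return g


def mean(values):
    """Example operator T: integer average of the neighborhood."""
    return sum(values) // len(values)


image = [
    [10, 10, 10, 10],
    [10, 90, 10, 10],
    [10, 10, 10, 10],
]
smoothed = apply_neighborhood(image, mean)
```

Because the output is written to a separate copy, every neighborhood is read from the original input f, which matches the definition g(x,y) = T[f(x,y)].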

The simplest form of T is when the neighborhood is of size 1 × 1 (a single pixel). In this case, g depends only on the value of f at (x,y), and T becomes a gray-level transformation function of the form:

s = T(r) --> (2)

where r = f(x,y) and s = g(x,y) denote the gray levels of f and g at the point (x,y).

Figure 3.2 – Gray level transformation functions for contrast enhancement

For example, if T(r) has the form shown in figure 3.2 (A), the effect of this transformation would be to produce an image of higher contrast than the original by darkening the levels below m and brightening the levels above m. This technique is known as contrast stretching. In the limiting case shown in figure 3.2 (B), T(r) produces a two-level (binary) image. This technique is known as thresholding.

Larger neighborhoods provide greater flexibility, and techniques of this kind are implemented using so-called masks (also called filters, kernels, templates, or windows). Basically, a mask is a small 2D array in which the values of the mask coefficients determine the nature of the process, such as image sharpening. Enhancement techniques based on this type of approach are often referred to as mask processing or filtering.

Some Basic Gray Level Transformations

We continue the discussion based on equation (2) from the previous section, s = T(r). Since we are dealing with digital quantities, the values of the transformation function are typically stored in a 1D array, and the mapping from r to s is implemented via table lookups. For an 8-bit environment, a lookup table containing the values of T will have 256 entries. As an introduction to gray level transformations, consider figure 3.3, which shows the three basic types of functions used frequently for image enhancement: linear (negative and identity transformations), logarithmic (log and inverse log transformations), and power law (nth power and nth root transformations).

Figure 3.3 – Some basic gray level transformation functions
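
The table lookup idea can be illustrated with a short sketch. The helper names `build_lut` and `apply_lut` are hypothetical, the image is assumed to be a nested list of 8-bit gray levels, and the threshold value m = 128 is chosen only for the example.

```python
def build_lut(T, levels=256):
    """Precompute s = T(r) for every gray level r in [0, levels - 1]."""
    return [T(r) for r in range(levels)]


def apply_lut(f, lut):
    """Map every pixel r of the 2D list f to s = lut[r]."""
    return [[lut[r] for r in row] for row in f]


# Identity transformation: the output gray level equals the input.
identity_lut = build_lut(lambda r: r)

# Thresholding at an assumed level m = 128: a two-level (binary) output.
threshold_lut = build_lut(lambda r: 255 if r >= 128 else 0)

image = [[0, 64], [128, 255]]
unchanged = apply_lut(image, identity_lut)
binary = apply_lut(image, threshold_lut)
```

The transformation function is evaluated only 256 times, however large the image is; each pixel then costs a single list index.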

Image Negatives

The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative transformation shown in figure 3.3, which is given by the expression:

s = L - 1 - r --> (3)

Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited to enhancing white or gray detail embedded in dark regions of an image, especially when the black areas are dominant in size.

Figure 3.4 – Original and Negative mammogram
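
Equation (3) translates almost directly into code. The sketch below assumes an 8-bit image (L = 256) stored as a nested list; the function name `negative` is chosen for illustration.

```python
def negative(f, L=256):
    """Return the negative s = L - 1 - r of a 2D list of gray levels."""
    return [[L - 1 - r for r in row] for row in f]


image = [[0, 50], [200, 255]]
inverted = negative(image)
restored = negative(inverted)  # applying the negative twice restores the input
```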

Log Transformations

The general form of the log transformation is shown in figure 3.3, and it is expressed by the equation:

s = c log(1 + r) --> (4)

where c is a constant, and it is assumed that r ≥ 0.

The shape of the log curve in figure 3.3 shows that this transformation maps a narrow range of low gray-level values in the input image into a wider range of output levels. The opposite is true of higher input levels. We use this type of transformation to expand the values of dark pixels in an image while compressing the higher-level values. The inverse log transformation does the opposite. The log function has the important characteristic that it compresses the dynamic range of images with large variations in pixel values.

The classic illustration of an application in which pixel values have a large dynamic range is the Fourier spectrum. As an illustration of the log transformation, figure 3.5 (left) shows a Fourier spectrum with values in the range 0 to 1.5 × 10⁶. When these values are scaled linearly for display in an 8-bit system, the brightest pixels dominate the display at the expense of the lower values of the spectrum (figure 3.5 (left)). If equation (4) is applied first with c = 1, the range of values shrinks to roughly 0 to 6.2 (taking the logarithm to base 10), and scaling this result linearly for display reveals far more detail (figure 3.5 (right)).

Figure 3.5 – Results of applying log transform (right) to the Fourier spectrum (left) when c=1
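
The effect on a wide dynamic range can be checked numerically. The following is a sketch under stated assumptions: the logarithm is taken to base 10, the four sample values stand in for a real spectrum, and the helper `scale_to_8bit` is a hypothetical linear rescaling to [0, 255] for display.

```python
import math


def log_transform(f, c=1.0):
    """Apply s = c * log10(1 + r) to every value of a 2D list (r >= 0)."""
    return [[c * math.log10(1 + r) for r in row] for row in f]


def scale_to_8bit(f):
    """Linearly rescale the values of a 2D list to integers in [0, 255]."""
    lo = min(min(row) for row in f)
    hi = max(max(row) for row in f)
    if hi == lo:
        return [[0 for _ in row] for row in f]  # flat image: nothing to scale
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in f]


# Made-up sample values spanning the range quoted for the spectrum.
spectrum = [[0.0, 100.0], [10_000.0, 1.5e6]]

linear_display = scale_to_8bit(spectrum)                 # small values collapse
log_display = scale_to_8bit(log_transform(spectrum))     # small values spread out
```

With linear scaling, the value 100 lands on display level 0 and is indistinguishable from the background, while after the log transformation it occupies a clearly visible mid-gray level.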

Power Law Transformation

The power law transformations have the basic form:

s = c r^γ --> (5)

where c and γ are positive constants. Sometimes this equation is written as s = c (r + ε)^γ to account for an offset, that is, a measurable output when the input is zero.

As in the case of the log transformation, power law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher input levels. Unlike the log function, however, a whole family of possible transformation curves is obtained simply by varying γ, as shown in figure 3.6. The curves generated with values of γ > 1 have exactly the opposite effect to those generated with values of γ < 1.

Figure 3.6 – Plots of the equation s = c r^γ for various values of γ.

A variety of devices used for image capture, printing, and display respond according to a power law. By convention, the exponent in the power law equation is referred to as gamma. The process used to correct this power-law response is called gamma correction. Gamma correction is important if displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or, more likely, too dark. Reproducing colors accurately also requires some knowledge of gamma correction, because varying the value of gamma changes not only the brightness but also the ratio of red to green to blue. Figure 3.7 shows the effect of increasing gamma.

Figure 3.7 – (A) Aerial image and (B)-(D) results of applying the power law transformation in equation (5) with c=1 and γ=3.0, 4.0, 5.0 respectively
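
A minimal sketch of equation (5) follows. One assumption should be stated clearly: the gray levels are first normalized to [0, 1] so that c = 1 keeps the output within range, and the result is then scaled back to [0, L-1]. The function name `power_law` and the sample image are illustrative only.

```python
def power_law(f, gamma, c=1.0, L=256):
    """Apply s = c * r**gamma to a 2D list, with r normalized to [0, 1]."""
    out = []
    for row in f:
        new_row = []
        for r in row:
            s = c * (r / (L - 1)) ** gamma       # equation (5) on normalized r
            s = min(max(s, 0.0), 1.0)            # clip in case c pushes s out of range
            new_row.append(round(s * (L - 1)))   # back to the range [0, L - 1]
        out.append(new_row)
    return out


image = [[0, 64], [128, 255]]
brighter = power_law(image, gamma=0.5)  # gamma < 1 expands the dark values
darker = power_law(image, gamma=2.0)    # gamma > 1 compresses the dark values
```

In both cases the end points 0 and L-1 stay fixed; only the distribution of the intermediate gray levels changes, which matches the family of curves in figure 3.6.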

Piecewise Linear Transformation Functions

A complementary approach to the methods discussed in the previous three sections is to use piecewise linear functions. The principal advantage of piecewise linear functions over the types of functions discussed so far is that their form can be arbitrarily complex. The disadvantage is that their specification requires considerably more user input.

Contrast Stretching

Low contrast images can result from poor illumination, lack of dynamic range in the imaging sensor, or even a wrong setting of the lens aperture during image acquisition. The idea behind contrast stretching is to increase the dynamic range of the gray levels in the image being processed. Figure 3.8 shows a typical contrast stretching transformation; the locations of the points (r₁, s₁) and (r₂, s₂) control the shape of the transformation function.

Figure 3.8 – Contrast stretching (A) form of transformation function (B) a low contrast image (C) result of contrast stretching (D) result of thresholding

If r₁ = s₁ and r₂ = s₂, the transformation is a linear function that produces no changes in gray levels. If r₁ = r₂, s₁ = 0 and s₂ = L-1, the transformation becomes a thresholding function that creates a binary image. Intermediate values of (r₁, s₁) and (r₂, s₂) produce various degrees of spread in the gray levels of the output image, thus affecting its contrast. In general, r₁ ≤ r₂ and s₁ ≤ s₂ is assumed so that the function is single valued and monotonically increasing.
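
The three cases above can be reproduced with one small function. This is a sketch rather than a definitive implementation: it assumes the transformation passes through (0, 0), (r₁, s₁), (r₂, s₂) and (L-1, L-1), that 0 ≤ r₁ ≤ r₂ ≤ L-1 and s₁ ≤ s₂, and the name `contrast_stretch` and the control points are chosen for illustration.

```python
def contrast_stretch(r, r1, s1, r2, s2, L=256):
    """Piecewise linear map through (0, 0), (r1, s1), (r2, s2), (L-1, L-1).

    Assumes 0 <= r1 <= r2 <= L - 1 and s1 <= s2.
    """
    if r < r1:
        s = s1 * r / r1                                   # (0, 0) to (r1, s1)
    elif r < r2:
        s = s1 + (s2 - s1) * (r - r1) / (r2 - r1)         # (r1, s1) to (r2, s2)
    elif r2 < L - 1:
        s = s2 + (L - 1 - s2) * (r - r2) / (L - 1 - r2)   # (r2, s2) to (L-1, L-1)
    else:
        s = s2                                            # r2 is already the top level
    return round(s)


levels = range(256)
identity = [contrast_stretch(r, 64, 64, 192, 192) for r in levels]   # r1=s1, r2=s2
threshold = [contrast_stretch(r, 128, 0, 128, 255) for r in levels]  # r1=r2, s1=0, s2=L-1
stretched = [contrast_stretch(r, 96, 32, 160, 224) for r in levels]  # intermediate case
```

Each list is a 256-entry lookup table, so the chosen transformation can be applied to an 8-bit image with one table lookup per pixel.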

Gray Level Slicing

Highlighting a specific range of gray levels in an image is often desired. One approach to gray level slicing is to display a high value for all gray levels in the range of interest and a low value for all other gray levels. This produces a binary image. Another approach is to brighten the desired range of gray levels while preserving the background and the gray level tonalities of the rest of the image.
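
Both approaches differ only in what happens to pixels outside the range of interest, as the following sketch shows. The function names, the range [100, 150], and the output values 255 and 0 are assumptions made for the example.

```python
def slice_binary(f, lo, hi, high=255, low=0):
    """First approach: high value inside [lo, hi], low value everywhere else."""
    return [[high if lo <= r <= hi else low for r in row] for row in f]


def slice_preserve(f, lo, hi, high=255):
    """Second approach: brighten [lo, hi] and leave all other levels unchanged."""
    return [[high if lo <= r <= hi else r for r in row] for row in f]


image = [[20, 100], [150, 240]]
binary = slice_binary(image, 100, 150)
highlighted = slice_preserve(image, 100, 150)
```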

Bit Plane Slicing

Instead of highlighting gray level ranges, highlighting the contribution made to the total image appearance by specific bits might be desired. Suppose that each pixel in an image is represented by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from bit plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0 contains all the lowest-order bits in the bytes comprising the pixels in the image, and plane 7 contains all the highest-order bits (figure 3.9).

Figure 3.9 – Bit plane representation of 8 bit image

Higher-order bits contain the majority of the visually significant data. The other bit planes contribute to more subtle details in the image. Separating a digital image into its bit planes is useful for analyzing the relative importance played by each bit of the image, a process that aids in determining the adequacy of the number of bits used to quantize each pixel. Also, this type of decomposition is useful for image compression.
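
Bit planes can be extracted with a shift and a mask. The sketch below assumes an 8-bit image stored as a nested list; the names `bit_plane` and `reconstruct` are hypothetical helpers, and rebuilding the image from only the four highest planes is shown as one way to judge how much those bits contribute.

```python
def bit_plane(f, k):
    """Return bit plane k (0 = least significant) of a 2D list as 0/1 values."""
    return [[(r >> k) & 1 for r in row] for row in f]


def reconstruct(planes):
    """Rebuild an image from a {bit index: plane} dict; missing planes count as zero."""
    first = next(iter(planes.values()))
    rows, cols = len(first), len(first[0])
    return [[sum(plane[x][y] << k for k, plane in planes.items())
             for y in range(cols)]
            for x in range(rows)]


image = [[0, 1], [130, 255]]
planes = {k: bit_plane(image, k) for k in range(8)}

full = reconstruct(planes)                                   # all eight planes
top_four = reconstruct({k: planes[k] for k in range(4, 8)})  # planes 4 to 7 only
```

Keeping only planes 4 to 7 changes each pixel by at most 15 gray levels, which is consistent with the observation that the higher-order bits carry most of the visually significant data.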

Histogram Processing

Wednesday, April 17, 2013
