Wednesday, April 17, 2013

Image Enhancement in the Spatial Domain





The principal objective of enhancement is to process an image so that the result is more suitable than the original image for a specific application. Enhancement techniques are very much problem oriented. Image enhancement falls into two broad categories: spatial domain methods and frequency domain methods. The term spatial domain refers to the image plane itself, and approaches in this category are based on direct manipulation of the pixels in an image. Frequency domain processing techniques are based on modifying the Fourier transform of an image. There is no general theory of image enhancement. When an image is processed for visual interpretation, the viewer is the ultimate judge of how well a particular method works.

Spatial domain methods are procedures that operate directly on the pixels composing an image. Spatial domain processes will be denoted by the expression:

g(x,y) = T [ f(x,y) ] --> (1)  

where g(x,y) is the output image, f(x,y) is the input image and T is an operator on f, defined over some neighborhood of (x,y). In addition, T can operate on a set of images.

The principal approach in defining a neighborhood about a point (x,y) is to use a square or rectangular sub image area centered at (x,y), as shown in figure 3.1.


Figure 3.1 – A 3*3 neighborhood about a point (x,y) in an image.

The center of the sub image is moved from pixel to pixel starting, say, at the top left corner. The operator T is applied at each location (x,y) to yield the output, g, at that location. The process utilizes only the pixels in the area of the image spanned by the neighborhood.

The simplest form of T is when the neighborhood is of size 1 * 1. In this case, g depends only on the value of f at (x,y), and T becomes a gray level transformation function of the form:

s = T(r) --> (2) where r = f(x,y) and s = g(x,y)



Figure 3.2 – Gray level transformation functions for contrast enhancement

For example, if T(r) has the form shown in figure 3.2 (A), the effect of this transformation is to produce an image of higher contrast than the original by darkening the levels below m and brightening the levels above m. This technique is known as contrast stretching. In the limiting case shown in figure 3.2 (B), T(r) produces a two level (binary) image. This technique is known as thresholding. Larger neighborhoods allow considerably more flexibility, and techniques of this kind are implemented using so called masks (also referred to as filters, kernels, templates or windows). Basically, a mask is a small 2D array in which the values of the mask coefficients determine the nature of the process, such as image sharpening. Enhancement techniques based on this type of approach are often referred to as mask processing or filtering.
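The mask idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `apply_mask`, the 3 * 3 averaging coefficients and the decision to copy border pixels unchanged are choices made for this example, not something the text prescribes.

```python
def apply_mask(image, mask):
    """Apply a 3*3 mask at every interior pixel; border pixels are copied as-is."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            total = 0.0
            # Sum of mask coefficient times the pixel under it.
            for i in range(-1, 2):
                for j in range(-1, 2):
                    total += mask[i + 1][j + 1] * image[x + i][y + j]
            out[x][y] = total
    return out


# Averaging (smoothing) mask: all nine coefficients equal 1/9 (assumed example).
mean_mask = [[1 / 9] * 3 for _ in range(3)]

image = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],
    [10, 10, 10, 10],
]
smoothed = apply_mask(image, mean_mask)
print(smoothed[1])
```

Changing only the coefficients in `mean_mask` changes the nature of the process, which is the point made above about mask coefficients.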



Some Basic Gray Level Transformations       

We continue the discussion based on equation (2) from the previous section (s = T(r)). Since we are dealing with digital quantities, the values of the transformation function are typically stored in a 1D array and the mapping from r to s is implemented via table lookups. For an 8 bit environment, a lookup table containing the values of T has 256 entries. As an introduction to gray level transformations, consider figure 3.3, which shows the three basic types of functions used frequently for image enhancement: linear (negative and identity transformations), logarithmic (log and inverse log transformations) and power law (nth power and nth root transformations).
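A minimal sketch of the table lookup idea, assuming an 8 bit image stored as a list of lists. The helper names `build_lut` and `apply_lut` are hypothetical; the two tables built here are the identity and negative transformations named above.

```python
L = 256  # number of gray levels in an 8 bit environment


def build_lut(T, levels=L):
    """Tabulate s = T(r) once for every possible input level r."""
    return [T(r) for r in range(levels)]


def apply_lut(image, lut):
    """Map every pixel r to s = lut[r]; no per-pixel function evaluation."""
    return [[lut[r] for r in row] for row in image]


identity_lut = build_lut(lambda r: r)
negative_lut = build_lut(lambda r: (L - 1) - r)

image = [[0, 64], [128, 255]]
negative = apply_lut(image, negative_lut)
print(negative)  # [[255, 191], [127, 0]]
```

The table is computed once and reused for every pixel, which is why the lookup approach pays off for transformations that are expensive to evaluate.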


Figure 3.3 – Some basic gray level transformation functions

Image Negatives

The negative of an image with gray levels in the range [0, L-1] is obtained by using the negative transformation shown in figure 3.3, which is given by the expression

s = L-1-r --> (3)

Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suitable for enhancing white or gray detail embedded in dark regions of an image, especially when the black areas are dominant in size.


Figure 3.4 – Original and Negative mammogram

Log Transformations

The general form of the log transformation is shown in figure 3.3 and is expressed by the equation below:

s = c log (1+r) --> (4) where c is a constant, and it is assumed that r>=0.

The shape of the log curve in figure 3.3 shows that this transformation maps a narrow range of low gray level values in the input image into a wider range of output levels. The opposite is true of higher values of input levels. We use this type of transformation to expand the values of dark pixels in an image while compressing the higher level values. The opposite is true of the inverse log transformation. The log function has the important characteristic that it compresses the dynamic range of images with large variations in pixel values.

The classic illustration of an application in which pixel values have a large dynamic range is the Fourier spectrum. As an illustration of the log transformation, figure 3.5 (left) shows a Fourier spectrum with values in the range 0 to 1.5 * 10^6. When these values are scaled linearly for display in an 8 bit system, the brightest pixels dominate the display, at the expense of the lower values of the spectrum. Applying the log transformation before display gives the result in figure 3.5 (right).



Figure 3.5 – Results of applying log transform (right) to the Fourier spectrum (left) when c=1
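To make the compression of dynamic range concrete, the sketch below compares linear scaling with the log transformation of equation (4) for a few values in the range 0 to 1.5 * 10^6. Choosing c so that the largest value maps to 255 is an assumption made for this example (the figure uses c=1), and the function names are hypothetical.

```python
import math

R_MAX = 1.5e6  # largest spectrum value quoted in the text


def log_transform(r, c=1.0):
    """Equation (4): s = c * log(1 + r), for r >= 0."""
    return c * math.log(1 + r)


# Assumed scaling: pick c so that R_MAX maps to the top of an 8 bit display.
C_DISPLAY = 255 / math.log(1 + R_MAX)


def linear_display(r):
    """Scale r linearly into [0, 255]."""
    return round(255 * r / R_MAX)


def log_display(r):
    """Apply the log transformation, then round to a display level."""
    return round(log_transform(r, C_DISPLAY))


for r in (0, 1_000, 100_000, 1_500_000):
    print(r, linear_display(r), log_display(r))
```

With linear scaling a value of 1000 lands on display level 0, while the log transformation places it well inside the visible range.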


Power Law Transformation

The power law transformations have the basic form:
s = c r^γ --> (5) where c and γ are positive constants. Sometimes this equation is written as s = c (r + ε)^γ to account for an offset.
As in the case of the log transformation, power law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher values of input levels. Unlike the log function, however, a whole family of possible transformation curves can be obtained simply by varying γ, as shown in figure 3.6. The curves generated with values of γ>1 have exactly the opposite effect to those generated with values of γ<1.

Figure 3.6 – Plots of the equation s = c r^γ for various values of γ.
A variety of devices used for image capture, printing and display respond according to a power law. By convention, the exponent in the power law equation is referred to as gamma. The process used to correct this power law response is called gamma correction. Gamma correction is important when displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or, what is more likely, too dark. Reproducing colors accurately also requires some knowledge of gamma correction, because varying the value of gamma changes not only the brightness but also the ratio of red to green to blue. Figure 3.7 shows the effect of increasing gamma.



Figure 3.7 – (A) Aerial image and (B)-(D) results of applying the power law transformation with c=1 and γ=3.0, 4.0, 5.0 respectively
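The power law transformation of equation (5) can be sketched as follows. Normalizing the gray level to [0, 1] before raising it to the power γ is an assumption of this example, made so that c=1 keeps the output in the same range as the input; the function name `power_law` is hypothetical.

```python
def power_law(r, gamma, c=1.0, levels=256):
    """Apply s = c * r**gamma to a gray level r in [0, levels - 1].

    The level is normalized to [0, 1] first (an assumption of this sketch),
    transformed, and then scaled back to the original range.
    """
    normalized = r / (levels - 1)
    s = c * normalized ** gamma
    return round(s * (levels - 1))


mid_gray = 128
brighter = power_law(mid_gray, gamma=0.4)  # fractional gamma expands dark values
darker = power_law(mid_gray, gamma=3.0)    # gamma > 1 has the opposite effect
print(brighter, mid_gray, darker)
```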

Piecewise Linear Transformation Functions

A complementary approach to the methods discussed in the previous three sections is to use piecewise linear functions. The principal advantage of piecewise linear functions over the types of functions discussed thus far is that their form can be arbitrarily complex. The disadvantage is that their specification requires considerably more user input.
Contrast Stretching
Low contrast images can result from poor illumination, lack of dynamic range in the imaging sensor, or even a wrong setting of the lens aperture during image acquisition. The idea behind contrast stretching is to increase the dynamic range of the gray levels in the image being processed. Figure 3.8 shows a typical contrast stretching transformation; the locations of the points (r1,s1) and (r2,s2) control the shape of the transformation function.

Figure 3.8 – Contrast stretching (A) form of transformation function (B) a low contrast image (C) result of contrast stretching (D) result of thresholding 

If r1=s1 and r2=s2, the transformation is a linear function that produces no changes in gray levels. If r1 = r2, s1=0 and s2 = L-1, the transformation becomes a thresholding function that creates a binary image. Intermediate values of (r1, s1) and (r2,s2) produce various degrees of spread in the gray levels of the output image, thus affecting its contrast. 
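The three cases above can be checked with a small sketch of the transformation. It assumes the function passes through (0,0), (r1,s1), (r2,s2) and (L-1,L-1) with 0 <= r1 <= r2 <= L-1 and s1 <= s2; the name `contrast_stretch` and the handling of the r1 = r2 case (levels at or above the threshold map to s2) are choices made for this example.

```python
def contrast_stretch(r, r1, s1, r2, s2, levels=256):
    """Piecewise linear map through (0,0), (r1,s1), (r2,s2), (L-1,L-1)."""
    top = levels - 1
    if r < r1:
        # First segment: from (0, 0) to (r1, s1).
        return s1 * r / r1
    if r <= r2:
        if r1 == r2:
            # Degenerate middle segment: the function jumps to s2 (thresholding).
            return s2
        # Middle segment: from (r1, s1) to (r2, s2).
        return s1 + (s2 - s1) * (r - r1) / (r2 - r1)
    # Last segment: from (r2, s2) to (L-1, L-1).
    return s2 + (top - s2) * (r - r2) / (top - r2)


# Identity: r1 = s1 and r2 = s2 leave the gray levels unchanged.
print(contrast_stretch(50, 70, 70, 180, 180))
# Thresholding: r1 = r2, s1 = 0, s2 = L-1 gives a binary output.
print(contrast_stretch(100, 128, 0, 128, 255), contrast_stretch(200, 128, 0, 128, 255))
# An intermediate setting spreads the middle gray levels.
print(contrast_stretch(64, 64, 32, 192, 224), contrast_stretch(192, 64, 32, 192, 224))
```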
Gray Level Slicing
Highlighting a specific range of gray levels in an image is often desired. One approach to gray level slicing is to display a high value for all gray levels in the range of interest and a low value for all other gray levels. This produces a binary image. Another approach is to brighten the desired range of gray levels while preserving the background and gray level tonalities of the image.
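Both approaches fit in a few lines. In this sketch the range of interest is [low, high], the "high value" is taken to be L-1 and the "low value" 0; these choices and the function names are assumptions made for illustration.

```python
def slice_binary(r, low, high, levels=256):
    """First approach: high value inside [low, high], low value elsewhere."""
    return levels - 1 if low <= r <= high else 0


def slice_preserve(r, low, high, levels=256):
    """Second approach: brighten [low, high], leave all other levels unchanged."""
    return levels - 1 if low <= r <= high else r


row = [20, 100, 150, 220]
binary = [slice_binary(r, 90, 160) for r in row]
preserved = [slice_preserve(r, 90, 160) for r in row]
print(binary)     # [0, 255, 255, 0]
print(preserved)  # [20, 255, 255, 220]
```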
Bit Plane Slicing  
Instead of highlighting gray level ranges, highlighting the contribution made to the total image appearance by specific bits might be desired. Suppose that each pixel in an image is represented by 8 bits. Imagine that the image is composed of eight 1-bit planes, ranging from bit plane 0 for the least significant bit to bit plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0 contains all the lowest order bits in the bytes comprising the pixels in the image and plane 7 contains all the high order bits (figure 3.9).

Figure 3.9 – Bit plane representation of 8 bit image

Higher-order bits contain the majority of the visually significant data. The other bit planes contribute to more subtle details in the image. Separating a digital image into its bit planes is useful for analyzing the relative importance played by each bit of the image, a process that aids in determining the adequacy of the number of bits used to quantize each pixel. Also, this type of decomposition is useful for image compression.    
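A short sketch of the decomposition, assuming 8 bit pixels stored as Python integers in a list of lists. The helper names `bit_plane` and `from_planes` are hypothetical.

```python
def bit_plane(image, k):
    """Return bit plane k (0 = least significant) as an image of 0s and 1s."""
    return [[(pixel >> k) & 1 for pixel in row] for row in image]


def from_planes(planes):
    """Rebuild the image from its planes, given in order plane 0, 1, 2, ..."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[k][x][y] << k for k in range(len(planes))) for y in range(cols)]
        for x in range(rows)
    ]


image = [[200, 13], [255, 0]]
planes = [bit_plane(image, k) for k in range(8)]
print(planes[7])  # most significant bits: [[1, 0], [1, 0]]
print(planes[0])  # least significant bits: [[0, 1], [1, 0]]
```

Summing all eight planes with their bit weights recovers the original image exactly, so dropping low order planes is one simple way to see how much each bit contributes.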

Histogram Processing
    




Friday, April 12, 2013

Image Acquisition




Human Visual System

The eye is nearly a sphere, with an average diameter of approximately 20 mm. Three membranes enclose the eye: the cornea and sclera outer cover, the choroid, and the retina. The lens is flexible and is attached to the ciliary body. Figure 2.1 depicts a simplified horizontal cross section of the human eye.




Figure 2.1 – Cross section of Human Eye

When the eye is properly focused, light from objects outside the eye is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors over the surface of the retina. There are two classes of receptors, cones and rods. The cones are located primarily in the central portion of the retina, called the fovea, and are highly sensitive to color. Muscles controlling the eye rotate the eyeball until the image of an object of interest falls on the fovea. Cone vision is called photopic or bright light vision. Rods serve to give a general overall picture of the field of view. They are not involved in color vision and are sensitive to low levels of illumination (dim light vision). For interpretation purposes, we can consider the fovea as a square sensor array of size 1.5 mm * 1.5 mm. Figure 2.2 depicts how cones and rods are organized in the retina.
    




Figure 2.2 – Organization of the Rods and Cones and Optic Nerve

The major difference between the lens of the eye and an ordinary optical lens is that the former is flexible. The distance between the center of the lens and the retina (called the focal length) varies from approximately 17 mm to about 14 mm as the refractive power of the lens increases from its minimum to its maximum. When the eye focuses on a far object (more than about 3 m away), the lens exhibits its lowest refractive power; when it focuses on a nearby object, the lens is most strongly refractive. This information makes it easy to calculate the size of the retinal image, as shown in the following example.



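For an object 15 m high viewed from a distance of 100 m, with H denoting the height of the retinal image, similar triangles give:
15 m / 100 m = H / 17 mm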
H = 2.55 mm
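The same proportion can be written as a one-line function. The function name and the default focal length of 17 mm (the far-focus value quoted above) are assumptions of this sketch.

```python
def retinal_image_height_mm(object_height_m, distance_m, focal_length_mm=17.0):
    """Similar triangles: object_height / distance = H / focal_length."""
    return object_height_m / distance_m * focal_length_mm


H = retinal_image_height_mm(15, 100)
print(round(H, 2))  # 2.55
```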

This retinal image is reflected primarily in the area of the fovea. Perception then takes place by the relative excitation of light receptors, which transform radiant energy into electrical impulses that are ultimately decoded by the brain. 


Light and the EM Spectrum

As shown in figure 2.3, the range of colors we perceive in visible light represents a very small portion of the EM spectrum.




Figure 2.3 – Visible Band extracted from EM Spectrum

The EM spectrum can be expressed in terms of wavelength (λ), frequency (ν) or energy (E). Wavelength and frequency are related by the expression:

λ = c / ν --> (1) where c is the speed of light (2.998 * 10^8 m/s)

The energy of the various components of the EM spectrum is given by the expression;

E = h ν --> (2) where h is Planck’s constant.

Frequency is measured in Hertz (Hz), with 1Hz being equal to one cycle of a sinusoidal wave per second. The commonly used unit of energy is the electron-volt.

Electromagnetic waves can be visualized as propagating sinusoidal waves with wavelength λ, or they can be thought of as a stream of massless particles, each traveling in a wavelike pattern and moving at the speed of light. Each massless particle contains a certain amount (or bundle) of energy, and each bundle of energy is called a photon. According to equation (2), energy is proportional to frequency. Therefore, the higher frequency bands of the EM spectrum carry more energy per photon and the lower frequency bands carry less.
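Equations (1) and (2) can be combined to compute the energy per photon for a given wavelength. In the sketch below the numerical values of Planck's constant and the joule to electron-volt conversion are standard constants that the text does not state, and the two example wavelengths (0.43 and 0.79 micrometers) are assumed sample inputs.

```python
PLANCK_H = 6.62607015e-34        # Planck's constant in joule seconds
LIGHT_C = 2.998e8                # speed of light in m/s, as quoted in equation (1)
JOULES_PER_EV = 1.602176634e-19  # one electron-volt expressed in joules


def frequency_hz(wavelength_m):
    """Equation (1) rearranged: nu = c / lambda."""
    return LIGHT_C / wavelength_m


def photon_energy_ev(wavelength_m):
    """Equation (2): E = h * nu, converted from joules to electron-volts."""
    return PLANCK_H * frequency_hz(wavelength_m) / JOULES_PER_EV


short_wave = photon_energy_ev(0.43e-6)  # shorter wavelength, higher frequency
long_wave = photon_energy_ev(0.79e-6)   # longer wavelength, lower frequency
print(round(short_wave, 2), round(long_wave, 2))
```

The shorter wavelength has the higher frequency and therefore the larger energy per photon, as stated above.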

Light is the particular type of EM radiation that can be seen and sensed by the human eye. The visible band of the EM spectrum spans the range from approximately 0.43 micrometers (violet) to about 0.79 micrometers (red). The color spectrum is divided into six broad regions: violet, blue, green, yellow, orange and red. No color ends abruptly; rather, each range blends smoothly into the next, as shown in figure 2.3. The colors that humans perceive in an object are determined by the nature of the light reflected from the object.

Light that is void of color is called achromatic or monochromatic light. The only attribute of such light is its intensity, or amount. The term gray level is generally used to describe monochromatic intensity because it ranges from black, through grays, to white. Chromatic light spans the EM energy spectrum from approximately 0.43 to 0.79 micrometers, as noted previously. Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance and brightness. Radiance is the total amount of energy that flows from the light source, and it is usually measured in watts (W). Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source. Brightness is a subjective descriptor of light perception that is practically impossible to measure.


Image Sensing and Acquisition



Depending on the nature of the source, illumination energy is reflected from, or transmitted through, objects. For example, light is reflected from a planar surface, whereas X-rays are transmitted through the human body to form an image. In some applications, the reflected or transmitted energy is focused onto a photo converter, which converts the energy into visible light. Electron microscopy and some applications of gamma imaging use this approach.



Image Acquisition using Single Sensor



Figure 2.4 – Single image sensor

Figure 2.4 depicts the main components of the single image sensor. Perhaps the most familiar sensor of this type is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to light. The use of a filter in front of a sensor improves selectivity. For example, a green filter in front of a light sensor favors light in the green band of the color spectrum. As a consequence, the sensor output will be stronger for green light than for other components in the visible spectrum. In order to generate a 2D image using a single sensor, there has to be relative displacements in both the x and y directions between the sensor and the area to be imaged.   

Image Acquisition using Sensor Strips

A geometry that is used much more frequently than the single sensor is an in-line arrangement of sensors in the form of a sensor strip, shown schematically in figure 2.5.


Figure 2.5 – Line sensor

The strip provides imaging elements in one direction. Motion perpendicular to the strip provides imaging in the other direction, as shown in figure 2.6. This is the type of arrangement used in most flatbed scanners.


      
Figure 2.6 – Image acquisition using sensor strip

Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross sectional images of 3D objects, as depicted in figure 2.7. CAT, MRI and PET imaging use this concept to produce images. The images are not obtained directly from the sensors by motion alone; the sensor output requires extensive processing. A 3D digital volume consisting of stacked images is generated as the object is moved in a direction perpendicular to the sensor ring.


Figure 2.7– Image acquisition using circular sensor strip

Image Acquisition using Sensor Arrays

Numerous electromagnetic and some ultrasonic sensing devices are frequently arranged in an array format. This is also the arrangement found in digital cameras. The response of each sensor is proportional to the integral of the light energy projected onto the surface of the sensor, a property that is used in astronomical and other applications requiring low noise images. The key advantage of a 2D sensor array is that a complete image can be obtained by focusing the energy pattern onto the surface of the array (figure 2.8). Motion is not necessary in this method.



Figure 2.8 – Image acquisition using sensor arrays


Simple Image Formation Model

We denote images by two dimensional functions of the form f(x,y).
The value or amplitude of f at spatial coordinates (x,y) is a positive scalar quantity whose physical meaning is determined by the source of the image.
When an image is generated from a physical process, its values are proportional to the energy radiated by a physical source (for example, electromagnetic waves). As a consequence, f(x,y) must be nonzero and finite; that is,

0<f(x,y) < ∞ --> (1)

The function f(x,y) may be characterized by two components;
  • Illumination – The amount of source illumination incident on the scene being viewed (i(x,y)).
  • Reflectance – The amount of illumination reflected by the objects in the scene (r(x,y)).

The two functions combine as a product to form f(x,y).

f(x,y) = i(x,y) r(x,y) --> (2)

where;

0 < i(x,y) < ∞ -->  (3) and

0 < r(x,y) < 1 --> (4) ( 0 – total absorption, 1 – total reflectance )

The nature of i(x,y) is determined by the illumination source, and r(x,y) is determined by the characteristics of the imaged objects.

We call the intensity of a monochrome image at any coordinates (x0,y0) the gray level (l) of the image at that point. That is

l = f(x0,y0) --> (5)

From equations (2) through (4), it is evident that l lies in the range

Lmin < l < Lmax--> (6)

In theory, the only requirement of Lmin is that it be positive, and on Lmax that it be finite.
In practice Lmin = imin  rmin  and Lmax = imax rmax
The interval [Lmin, Lmax ] is called the gray scale. Common practice is to shift this interval numerically to the interval [0, L-1], where l=0 is considered black and l = L-1 is considered white on the gray scale. All the intermediate values are shades of gray varying from black to white. 
  
Image Sampling and Quantization



The output of most sensors is a continuous voltage waveform whose amplitude and spatial behaviors are related to the physical phenomenon being sensed. To create a digital image, we need to convert the continuous sensed data into digital form. This involves two processes; sampling and quantization.


Basic concepts in Sampling and Quantization

To convert an image to digital form, we have to sample the function in both coordinates and in amplitude. Digitizing the coordinate values is called sampling. Digitizing the amplitude values is called quantization.

The one dimensional function shown in figure 2.9(b) is a plot of the amplitude (gray level) values of the continuous image along the line segment AB in figure 2.9(a). The random variations are due to image noise. To sample this function, we take equally spaced samples along line AB, as shown in figure 2.9(c). The location of each sample is given by a vertical tick mark in the bottom part of the figure. The samples are shown as small white squares superimposed on the function. The set of these discrete locations gives the sampled function. However, the values of the samples still span (vertically) a continuous range of gray level values. In order to form a digital function, the gray level values must also be converted (quantized) into discrete quantities. The right side of figure 2.9(c) shows the gray level scale divided into eight discrete levels, ranging from black to white. The vertical tick marks indicate the specific value assigned to each of the eight gray levels. The continuous gray levels are quantized simply by assigning one of the eight discrete gray levels to each sample. The assignment is made depending on the vertical proximity of a sample to a vertical tick mark. The digital samples resulting from both sampling and quantization are shown in figure 2.9(d). Starting at the top of the image and carrying out this procedure line by line produces a two dimensional digital image.



 
Figure 2.9 – Generating a digital image (a) continuous image (b) A scan line from A to B in the continuous image used to illustrate the sampling and quantization (c) sampling and quantization (d) digital scan line
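The two steps can be sketched for a single scan line. The continuous function `scan_line` below is a hypothetical stand-in for the gray level profile along AB, with amplitudes assumed to lie in [0, 1]; the helper names and the choice of nine samples are also assumptions of this example.

```python
import math


def sample(f, start, stop, n):
    """Sampling: take n equally spaced samples of f on [start, stop]."""
    step = (stop - start) / (n - 1)
    return [f(start + i * step) for i in range(n)]


def quantize(value, levels=8, lo=0.0, hi=1.0):
    """Quantization: assign value to the nearest of `levels` equally spaced levels."""
    index = round((value - lo) / (hi - lo) * (levels - 1))
    return min(max(index, 0), levels - 1)


def scan_line(t):
    """Hypothetical continuous gray level profile along the line AB."""
    return 0.45 + 0.4 * math.sin(2 * math.pi * t)


samples = sample(scan_line, 0.0, 1.0, 9)   # still continuous in amplitude
digital = [quantize(v) for v in samples]   # eight discrete gray levels, 0 to 7
print(digital)
```

After sampling, the values are still real numbers; only the quantization step turns them into one of the eight discrete gray levels.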

In practice, the method of sampling is determined by the sensor arrangement used to generate the image.

Single sensing element with mechanical motion – The output of the sensor is quantized in the manner described above. Sampling is accomplished by selecting the number of individual mechanical increments at which we activate the sensor to collect data. 

Sensor strip – The number of sensors in the strip establishes the sampling limitations in one image direction. Mechanical motion in the other direction can be controlled more accurately. Quantization of the sensor outputs completes the process of generating a digital image.

Sensing array – In this method there is no motion during image acquisition, and the number of sensors in the array establishes the limits of sampling in both directions. Quantization of the sensor outputs is as before. Figure 2.10 illustrates this concept. The quality of the digital image is determined to a large degree by the number of samples and discrete gray levels used in sampling and quantization.




Figure 2.10 – Continuous image projected onto a sensor array (left) and result of image sampling and quantization (right)


Representing Digital Images

The result of sampling and quantization is a matrix of real numbers. Assume that an image f(x,y) is sampled so that the resulting digital image has M rows and N columns. The values of the coordinates (x,y) now become discrete quantities. The values of the coordinates at the origin are (x,y) = (0,0). It is important to keep in mind that the notation (0,1) is used to signify the second sample along the first row, and so on. It does not mean that these are the actual values of the physical coordinates when the image was sampled. Figure 2.11 shows the coordinate convention used in this lesson.




Figure 2.11 – Coordinate convention used to represent the image

The notation introduced in the preceding paragraph allows us to write the complete M * N digital image in the following compact matrix form.

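f(x,y) = [ f(0,0)      f(0,1)      ...   f(0,N-1)
           f(1,0)      f(1,1)      ...   f(1,N-1)
           ...
           f(M-1,0)    f(M-1,1)    ...   f(M-1,N-1) ]   --> (7)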

The right side of the equation is by definition a digital image. Each element of this matrix array is called an image element, picture element, pixel or pel. In some discussion, it is advantageous to use a more traditional matrix notation to denote a digital image and its elements;

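A = [ a(0,0)      a(0,1)      ...   a(0,N-1)
      a(1,0)      a(1,1)      ...   a(1,N-1)
      ...
      a(M-1,0)    a(M-1,1)    ...   a(M-1,N-1) ]   --> (8)
where a(i,j) = f(x=i, y=j) = f(i,j). Clearly f(x,y) and A are identical matrices.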

We can express sampling and quantization in more formal mathematical terms. Let Z and R denote the set of integers and the set of real numbers, respectively. The sampling process may be viewed as partitioning the xy plane into a grid, with the coordinates of the center of each grid cell being a pair of elements from the Cartesian product Z^2, which is the set of all ordered pairs of elements (zi, zj), with zi and zj being integers from Z. Hence f(x,y) is a digital image if (x,y) are integers from Z^2 and f is a function that assigns a gray level value (that is, a real number from R) to each distinct pair of coordinates (x,y). If the gray levels are also integers, Z replaces R and the digital image then becomes a 2D function whose coordinates and amplitude values are integers.

This digitization process requires decisions about the values of M and N and about the number, L, of discrete gray levels allowed for each pixel. There are no requirements on M and N other than that they be positive integers. However, due to processing, storage and sampling hardware considerations, the number of gray levels is typically an integer power of 2, as expressed below:

L = 2^k --> (9)

We assume that the discrete levels are equally spaced and that they are integers in the interval [0, L-1]. Sometimes the range of values spanned by the gray scale is called the dynamic range of an image, and we refer to images whose gray levels span a significant portion of the gray scale as having a high dynamic range. When an appreciable number of pixels exhibit this property, the image will have high contrast. Conversely, an image with low dynamic range tends to have a dull, washed out gray look.

The number, b, of bits required to store a digitized image is

b = M * N * k --> (10)

b = N^2 * k --> (11) (when M=N)

Example: If N = 32 and L = 256, what is the value of b?

If L = 256 then k = 8 because L = 2^k
Then b = 32 * 32 * 8 = 8192 bits
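The worked example can be reproduced with a short sketch of equations (9) and (10). The function names are hypothetical, and rejecting values of L that are not powers of 2 is a choice made for this example.

```python
def bits_per_pixel(L):
    """Solve L = 2**k for k; L must be an integer power of 2."""
    k = L.bit_length() - 1
    if L < 1 or 2 ** k != L:
        raise ValueError("L must be an integer power of 2")
    return k


def storage_bits(M, N, k):
    """Equation (10): b = M * N * k."""
    return M * N * k


k = bits_per_pixel(256)
b = storage_bits(32, 32, k)
print(k, b)  # 8 8192
```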

When an image can have 2^k gray levels, it is common practice to refer to the image as a “k bit image”. For example, an image with 256 possible gray level values is called an 8 bit image.


Spatial and Gray Level Resolution

Sampling is the principal factor determining the spatial resolution of an image. Basically, spatial resolution is the smallest discernible detail in an image. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance; for example, 100 line pairs per millimeter.

When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L level digital image of size M * N as having a spatial resolution of M * N pixels and a gray level resolution of L levels.

Figure 2.12 shows an image of size 1024 * 1024 pixels whose gray levels are represented by 8 bits. The other images shown in figure 2.12 are the result of sub sampling the 1024 * 1024 image. The sub sampling was accomplished by deleting the appropriate number of rows and columns from the original image. For example, the 512 * 512 image was obtained by deleting every other row and column from the 1024 * 1024 image. The number of allowed gray levels was kept at 256.

 


Figure 2.12 – A 1024 * 1024, 8 bit image sub sampled down to size 32 * 32 pixels.

Images in figure 2.12 show dimensional proportions between various sampling densities, but their size differences make it difficult to see the effects resulting from a reduction in the number of samples. The simple way to compare these effects is to bring all the sub sampled images up to size 1024 * 1024 by row and column pixel replication. The results are shown in figure 2.13.


 
Figure 2.13 – Sub sampled images resampled up to 1024 * 1024 by pixel replication
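Sub sampling by row and column deletion, followed by enlargement through pixel replication, can be sketched on a tiny image. The function names and the 4 * 4 test image are assumptions of this example.

```python
def subsample(image, factor=2):
    """Keep every `factor`-th row and column (row and column deletion)."""
    return [row[::factor] for row in image[::factor]]


def replicate(image, factor=2):
    """Enlarge by repeating each pixel `factor` times in both directions."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = subsample(image)      # [[1, 3], [9, 11]]
restored = replicate(small)   # back to 4 * 4, but the deleted detail is gone
print(small)
print(restored)
```

The restored image has the original size but only a quarter of the original samples, which is what makes the loss of detail visible when the images are compared at equal size.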

Next, we keep the number of samples constant and reduce the number of gray levels from 256 to 2 in integer powers of 2. Figure 2.14 shows the results for a 452 * 374 CAT projection image, starting from the image displayed with k = 8 (256 gray levels).



Figure 2.14 – Typical effect of varying the number of gray levels in a digital image

The fourth image of figure 2.14 has an almost imperceptible set of very fine ridge like structures in areas of smooth gray levels (particularly in the skull). This effect, caused by the use of an insufficient number of gray levels in smooth areas of a digital image, is called false contouring, so called because the ridges resemble topographic contours in a map. False contouring is generally quite visible in images displayed using 16 or fewer uniformly spaced gray levels.
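Reducing the number of gray levels can be sketched as dropping low order bits and rescaling for display. This is one possible way to requantize, assumed here for illustration; the function name `reduce_gray_levels` is hypothetical and k is assumed to satisfy 1 <= k <= 8.

```python
def reduce_gray_levels(r, k, source_bits=8):
    """Requantize an 8 bit level r to 2**k uniformly spaced levels.

    The low order bits are dropped, then the bin index is scaled back to
    [0, 255] so the result can be shown on the same gray scale.
    """
    levels = 2 ** k
    index = r >> (source_bits - k)  # which of the 2**k bins r falls in
    return index * (2 ** source_bits - 1) // (levels - 1)


ramp = list(range(0, 256, 32))  # a smooth ramp of gray levels
coarse = [reduce_gray_levels(r, 2) for r in ramp]
print(ramp)
print(coarse)  # neighboring ramp values collapse onto the same level
```

On a smooth ramp the output changes in visible steps, which is the mechanism behind the ridge like false contours described above.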

Aliasing


Functions whose area under the curve is finite can be represented in terms of sines and cosines of various frequencies. The sine cosine component with the highest frequency determines the “highest frequency content” of the function. Suppose that this highest frequency is finite and that the function is of unlimited duration (these functions are called band limited functions).

Shannon's sampling theorem then tells us that, if the function is sampled at a rate equal to or greater than twice its highest frequency, it is possible to recover the original function completely from its samples. If the function is under sampled, a phenomenon called aliasing corrupts the sampled image. The corruption takes the form of additional frequency components being introduced into the sampled function. These are called aliased frequencies. Note that the sampling rate in images is the number of samples taken (in both spatial directions) per unit distance. The principal approach for reducing the aliasing effect on an image is to reduce its high frequency components by blurring the image prior to sampling. However, aliasing is always present in a sampled image. The effect of aliased frequencies can be seen under the right conditions in the form of so called Moiré patterns.
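A one dimensional sketch shows what an aliased frequency looks like. The numbers are assumed for illustration: a cosine of 7 cycles per unit distance is sampled at only 10 samples per unit distance, which is below twice its frequency, and its samples turn out to match those of a cosine of 10 - 7 = 3 cycles per unit distance.

```python
import math


def sample_cosine(frequency, sampling_rate, count):
    """Return `count` samples of cos(2*pi*frequency*t) taken at t = n / sampling_rate."""
    return [math.cos(2 * math.pi * frequency * n / sampling_rate) for n in range(count)]


fs = 10                           # samples per unit distance (assumed)
high = sample_cosine(7, fs, 20)   # under sampled: 7 > fs / 2
low = sample_cosine(3, fs, 20)    # the alias at fs - 7 = 3

indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(high, low))
print(indistinguishable)  # True
```

Once the samples are taken, nothing distinguishes the two signals, so the high frequency content shows up as a lower frequency that was never in the original.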


Zooming and Shrinking Digital Images

Zooming may be viewed as over sampling and shrinking may be viewed as under sampling. Zooming requires two steps; the creation of new pixel locations, and the assignment of gray levels to those new locations. Suppose that we have an image of size 500 * 500 pixels and we want to enlarge it 1.5 times to 750 * 750 pixels. Conceptually one of the easiest ways to visualize zooming is laying an imaginary 750 * 750 grid over the original image. In order to perform the gray level assignment, for any point in the overlay, we look for the closest pixel in the original image and assign its gray level to the new pixel in the grid. When we are done with all points in the overlay grid, we simply expand it to the original specified size to obtain the zoomed image. This method of gray level assignment is called nearest neighbor interpolation.

Pixel replication is a special case of nearest neighbor interpolation. Pixel replication is applicable when we want to increase the size of an image an integer number of times. For instance, to double the size of an image we can duplicate each column (horizontal direction enlargement) or each row (vertical direction enlargement). Although nearest neighbor interpolation is fast, it has the undesirable feature that it produces a checkerboard effect that is particularly objectionable at high factors of magnification.
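Nearest neighbor interpolation for an arbitrary output size can be sketched as below. Mapping the center of each output pixel back onto the source grid is one common convention and is an assumption of this example, as is the function name `zoom_nearest`.

```python
def zoom_nearest(image, new_rows, new_cols):
    """Resize an image by nearest neighbor interpolation."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(new_rows):
        # Source row closest to the center of output row i (integer arithmetic).
        src_i = ((2 * i + 1) * rows) // (2 * new_rows)
        new_row = []
        for j in range(new_cols):
            src_j = ((2 * j + 1) * cols) // (2 * new_cols)
            new_row.append(image[src_i][src_j])
        out.append(new_row)
    return out


image = [[1, 2], [3, 4]]
doubled = zoom_nearest(image, 4, 4)   # integer factor: same as pixel replication
enlarged = zoom_nearest(image, 3, 3)  # factor of 1.5
print(doubled)
print(enlarged)
```

For an integer factor the result is identical to pixel replication, which is why replication is described above as a special case.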

A more sophisticated way of accomplishing gray level assignment is bilinear interpolation, which uses the four nearest neighbors of a point. Let (x',y') denote the coordinates of a point in the zoomed image, and let v(x',y') denote the gray level assigned to it. For bilinear interpolation, the assigned gray level is given by:

v(x',y') = ax' + by' + cx'y' + d --> (12)

where the four coefficients a, b, c and d are determined from the four equations in four unknowns that can be written using the four nearest neighbors of the point (x',y').
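Equation (12) can be sketched by solving for the four coefficients in closed form. To keep the algebra short, this example measures the point relative to its top left neighbor (offsets dx, dy in [0, 1]) rather than in absolute coordinates; that local coordinate system and the function name `bilinear` are assumptions of the sketch. The image must be at least 2 * 2 and the point must lie inside it.

```python
import math


def bilinear(image, x, y):
    """Gray level at real-valued (x, y) = (row, column) from its four neighbors."""
    rows, cols = len(image), len(image[0])
    # Top left neighbor; clamped so that the 2 * 2 cell stays inside the image.
    x0 = min(int(math.floor(x)), rows - 2)
    y0 = min(int(math.floor(y)), cols - 2)
    dx, dy = x - x0, y - y0
    f00, f01 = image[x0][y0], image[x0][y0 + 1]
    f10, f11 = image[x0 + 1][y0], image[x0 + 1][y0 + 1]
    # Solution of the four equations v = a*dx + b*dy + c*dx*dy + d at the corners.
    a = f10 - f00
    b = f01 - f00
    c = f00 - f01 - f10 + f11
    d = f00
    return a * dx + b * dy + c * dx * dy + d


image = [[10, 20], [30, 40]]
print(bilinear(image, 0.5, 0.5))  # 25.0, the average of the four neighbors
print(bilinear(image, 1, 1))      # 40, the interpolant reproduces the corner values
```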

Image shrinking is done in similar manner as zooming and the equivalent process of pixel replication is row column deletion. For example, to shrink an image by one half, we delete every other row and column.

It is possible to use more neighbors for interpolation. Using more neighbors implies fitting the points with a more complex surface, which generally gives smoother results (important in 3D graphics and medical image processing).



Some Basic Relationships between Pixels


Neighbors of Pixels

A pixel p at coordinates (x,y) has four horizontal and vertical neighbors, with coordinates

(x+1, y), (x-1, y), (x, y+1), (x, y-1)

These four neighbors are denoted as N4(p)

Four diagonal neighbors of p have coordinates

(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)

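and are denoted by ND(p).

The eight neighbors of p, denoted by N8(p), are the union of N4(p) and ND(p). Some of these neighbors lie outside the image if p is on the border of the image.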
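The three neighborhoods translate directly into code. In this sketch the function names n4, nd and n8 are chosen to mirror the notation above, and a pixel is represented as an (x, y) tuple; no check is made for neighbors that fall outside the image.

```python
def n4(p):
    """Horizontal and vertical neighbors of p = (x, y)."""
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def nd(p):
    """Diagonal neighbors of p = (x, y)."""
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def n8(p):
    """All eight neighbors: the union of N4(p) and ND(p)."""
    return n4(p) | nd(p)


neighbors = n8((1, 1))
```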

Adjacency, Connectivity, Regions and Boundaries 

Two pixels are said to be connected if they are neighbors and if their gray levels satisfy a specified criterion of similarity. Let V be the set of gray level values used to define adjacency. In a binary image V = {1}, and in a gray scale image V typically has more elements. For example, for pixels with a range of possible gray level values from 0 to 255, the set V could be any subset of these 256 values. There are three types of adjacency.

4-adjacency – two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
8-adjacency – two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
m-adjacency (mixed adjacency) – two pixels p and q with values from V are m-adjacent if (i) q is in N4(p), or (ii) q is in ND(p) and the intersection of N4(p) and N4(q) has no pixels whose values are from V.
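The three definitions can be checked on a small binary image. The sketch below is an illustration only: the function name adjacent, its kind argument and the sample 3 * 3 image are assumptions made for the example, and pixels are (row, column) tuples.

```python
def n4(p):
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def nd(p):
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def adjacent(image, p, q, V, kind="m"):
    """Test 4-, 8- or m-adjacency of pixels p and q for the value set V."""
    def in_v(pt):
        x, y = pt
        inside = 0 <= x < len(image) and 0 <= y < len(image[0])
        return inside and image[x][y] in V

    if not (in_v(p) and in_v(q)):
        return False
    if kind == "4":
        return q in n4(p)
    if kind == "8":
        return q in n4(p) | nd(p)
    if kind == "m":
        if q in n4(p):
            return True
        # Diagonal neighbors count only if no shared 4-neighbor is in V.
        return q in nd(p) and not any(in_v(r) for r in n4(p) & n4(q))
    raise ValueError("kind must be '4', '8' or 'm'")


image = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 1]]
V = {1}

# (1,1) and (0,2) are diagonal, but they share the 4-neighbor (0,1), which is in V.
ambiguous = adjacent(image, (1, 1), (0, 2), V, kind="m")
# (1,1) and (2,2) are diagonal and share no 4-neighbor with a value in V.
diagonal = adjacent(image, (1, 1), (2, 2), V, kind="m")
```

The point of m-adjacency is visible here: the diagonal link from (1,1) to (0,2) is dropped because the same two pixels are already joined through (0,1), which removes the double path that 8-adjacency would allow.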

Figure 2.15 shows an example of m-adjacency. The left portion of the figure shows the pixel arrangement, the middle portion shows the pixels that are 8-adjacent, and the right portion shows m-adjacency.


Figure 2.15 – m-adjacency


Let S represent a subset of pixels in an image. Two pixels p and q are said to be connected in S if there exists a path between them consisting entirely of pixels in S. For any pixel p in S, the set of pixels that are connected to it in S is called a connected component of S. If S has only one connected component, then S is called a connected set.
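Connected components can be found with a simple breadth-first search. The sketch below assumes that S is the set of pixels whose values are in V, and that paths are built from either 4- or 8-adjacent steps; the function name connected_components and the offset lists are hypothetical names for this example.

```python
from collections import deque

N4_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8_OFFSETS = N4_OFFSETS + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def connected_components(image, V, offsets):
    """Return a list of sets of (row, col) pixels, one set per component."""
    rows, cols = len(image), len(image[0])
    seen = set()
    components = []
    for x in range(rows):
        for y in range(cols):
            if image[x][y] not in V or (x, y) in seen:
                continue
            # Grow a new component outward from this unvisited pixel.
            component = {(x, y)}
            seen.add((x, y))
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for dx, dy in offsets:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < rows and 0 <= ny < cols
                            and image[nx][ny] in V
                            and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        component.add((nx, ny))
                        queue.append((nx, ny))
            components.append(component)
    return components


image = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 1]]

with_4 = connected_components(image, {1}, N4_OFFSETS)
with_8 = connected_components(image, {1}, N8_OFFSETS)
```

The same image gives two components under 4-adjacency and a single one under 8-adjacency, so under 8-adjacency the pixels with value 1 form a connected set.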

Let R be a subset of pixels in an image. We call R a region of the image if R is a connected set. The boundary (also called the border or contour) of a region R is the set of pixels in the region that have one or more neighbors that are not in R. Normally, when we refer to a region we are referring to a subset of an image, and any pixels in the boundary of the region that happen to coincide with the border of the image are included implicitly as part of the region boundary. The boundary of a finite region forms a closed path and is thus a global concept. Edges, by contrast, are formed from pixels with derivative values that exceed a preset threshold. Thus, the idea of an edge is a local concept that is based on a measure of gray level discontinuity at a point. It is possible to link edge points into edge segments, and sometimes these segments are linked in such a way that they correspond to boundaries, but this is not always the case. It is helpful to think of edges as intensity discontinuities and of boundaries as closed paths.
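The definition of a boundary can be written almost word for word as a set comprehension. In this sketch the region is a set of (row, col) tuples, the function name boundary is hypothetical, and "neighbors" is assumed to mean the 4-neighbors; the text does not fix which neighborhood to use.

```python
def boundary(region):
    """Pixels of the region with at least one 4-neighbor outside the region."""
    def neighbors(p):
        x, y = p
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    # Neighbors beyond the image border are never in the region, so region
    # pixels on the image border are automatically part of the boundary.
    return {p for p in region if any(n not in region for n in neighbors(p))}


# A 3 * 3 block of pixels occupying rows 1..3 and columns 1..3.
region = {(x, y) for x in range(1, 4) for y in range(1, 4)}
edge_pixels = boundary(region)
```

Only the centre pixel (2,2) has all of its neighbors inside the region, so the boundary consists of the other eight pixels.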

Distance Measures

For pixels p, q and z with coordinates (x,y), (s,t) and (v,w), respectively, D is a distance function or metric if

D(p,q) >= 0 ( D(p,q) = 0 iff p=q )
D(p,q) = D (q,p) and
D(p,z) <= D(p,q) + D(q,z)

The Euclidean distance between p and q is defined as

De(p, q) = [ (x - s)^2 + (y - t)^2 ]^(1/2) --> (13)

The D4 distance (also called city block distance) between p and q is defined as

D4 (p, q) = | x-s| + | y-t| --> (14)

The D8 distance between p and q is defined as

D8 (p, q) = max(| x-s| , | y-t|) --> (15)
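Equations (13) to (15) are easy to compare side by side. In this sketch the function names are chosen for the example, and pixels are (x, y) tuples.

```python
import math


def d_euclidean(p, q):
    """Equation (13): straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def d4(p, q):
    """Equation (14): city block distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d8(p, q):
    """Equation (15): the larger of the two coordinate differences."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0, 0), (3, 4)
distances = (d_euclidean(p, q), d4(p, q), d8(p, q))
```

For these two pixels the three measures give 5, 7 and 4 respectively, which shows that D8 is never larger than the Euclidean distance and D4 is never smaller.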


Linear and Nonlinear Operations

Let H be an operator whose input and output are images. H is said to be a linear operator if, for any two images f and g and any two scalars a and b,

H ( af +bg )  = aH(f) + bH(g) --> (16)

An operator whose function is to compute the sum of K images is a linear operator. An operator that computes the absolute value of the difference of two images is not. An operator that fails the test of equation (16) is nonlinear. 
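Equation (16) can be tested numerically on the two operators just mentioned. The sketch below treats the input of each operator as a stack (a list) of images and wraps the output in a one-image stack so that the same helper can form af + bg on both sides; the names combine, sum_op, absdiff_op and satisfies_eq16 are hypothetical. A single passing check does not prove linearity, but a single failing check is enough to show that an operator is nonlinear.

```python
def combine(a, f, b, g):
    """Pixelwise a*f + b*g for two equally sized stacks of 2-D images."""
    return [[[a * pf + b * pg for pf, pg in zip(rf, rg)]
             for rf, rg in zip(im_f, im_g)]
            for im_f, im_g in zip(f, g)]


def sum_op(stack):
    """Sum of the K images in the stack, returned as a one-image stack."""
    return [[[sum(px) for px in zip(*rows)] for rows in zip(*stack)]]


def absdiff_op(stack):
    """Absolute difference of the two images in the stack."""
    f, g = stack
    return [[[abs(pf - pg) for pf, pg in zip(rf, rg)]
             for rf, rg in zip(f, g)]]


def satisfies_eq16(H, f, g, a, b):
    return H(combine(a, f, b, g)) == combine(a, H(f), b, H(g))


zeros = [[0, 0], [0, 0]]
f = [[[1, 2], [3, 4]], zeros]
g = [[[5, 5], [5, 5]], zeros]

sum_ok = satisfies_eq16(sum_op, f, g, 1, -1)
absdiff_ok = satisfies_eq16(absdiff_op, f, g, 1, -1)
```

With a = 1 and b = -1 the sum operator satisfies equation (16) for this input, while the absolute difference does not: the left side is never negative, but the right side is.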




Other Important Mathematical Tools for Image Processing

Array versus Matrix Operation

For example, consider the following 2 * 2 images:


 


Arithmetic Operations

Arithmetic operations between images are array operations, which means that they are carried out between corresponding pixel pairs. The four arithmetic operations are denoted as;

s(x,y) = f(x,y) + g(x,y)

d(x,y) = f(x,y) - g(x,y)

p(x,y) = f(x,y) * g(x,y)

v(x,y) = f(x,y) / g(x,y)

It is understood that the operations are performed between corresponding pixel pairs in f and g for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1 where, as usual, M and N are the row and column sizes of the images. Clearly, s, d, p and v are also images of size M*N.
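The four array operations can be sketched with one helper that applies an operator to corresponding pixel pairs. For contrast, the example also computes the ordinary matrix product of the same two 2 * 2 images, on the assumption that this is the array-versus-matrix distinction meant above. The names elementwise and matrix_product and the sample values are made up for the example, and g is assumed to contain no zeros so that the division is defined.

```python
import operator


def elementwise(op, f, g):
    """Apply op to corresponding pixel pairs of two equally sized images."""
    return [[op(pf, pg) for pf, pg in zip(rf, rg)] for rf, rg in zip(f, g)]


def matrix_product(f, g):
    """Ordinary matrix product, shown only for comparison."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*g)]
            for row in f]


f = [[1, 2],
     [3, 4]]
g = [[5, 6],
     [7, 8]]

s = elementwise(operator.add, f, g)      # s(x,y) = f(x,y) + g(x,y)
d = elementwise(operator.sub, f, g)      # d(x,y) = f(x,y) - g(x,y)
p = elementwise(operator.mul, f, g)      # p(x,y) = f(x,y) * g(x,y)
v = elementwise(operator.truediv, f, g)  # v(x,y) = f(x,y) / g(x,y)

m = matrix_product(f, g)
```

The array product p is [[5, 12], [21, 32]], whereas the matrix product m is [[19, 22], [43, 50]], so the two operations must not be confused.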

Let g(x,y) denote a corrupted image formed by the addition of noise, η(x,y), to a noiseless image f(x,y); that is,

g(x,y) = f(x,y) + η(x,y)

where the assumption is that at every pair of coordinates (x,y) the noise is uncorrelated and has zero average value.

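The objective of the following procedure is to reduce the noise content by averaging a set of noisy images, {gi(x,y)}. This technique is used frequently for image enhancement. If the noise satisfies the constraints just stated, it can be shown that if an image g'(x,y) is formed by averaging K different noisy images,

g'(x,y) = (1/K) * [ g1(x,y) + g2(x,y) + ... + gK(x,y) ]

then the expected value of g'(x,y) is f(x,y), and the variance of g'(x,y) at each point is 1/K times the variance of the noise. As K increases, the variability of the pixel values at each location decreases and g'(x,y) approaches f(x,y).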
An important application of image averaging is in the field of astronomy, where imaging under very low light levels frequently causes sensor noise to render single images virtually useless for analysis. 
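The effect of averaging can be checked with a small simulation. Everything specific in this sketch is an assumption made for the example: the constant test image, Gaussian noise with a standard deviation of 20, K = 16, the fixed random seed, and the helper names. With uncorrelated zero-mean noise, the noise standard deviation of the average should be close to the single-image value divided by the square root of K.

```python
import random
import statistics


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise independently to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]


def average_images(images):
    """Pixelwise mean of K equally sized images."""
    k = len(images)
    return [[sum(px) / k for px in zip(*rows)] for rows in zip(*images)]


def noise_std(image, reference):
    """Standard deviation of the difference between image and reference."""
    errors = [p - r for row, ref_row in zip(image, reference)
              for p, r in zip(row, ref_row)]
    return statistics.pstdev(errors)


rng = random.Random(0)
clean = [[100.0] * 32 for _ in range(32)]
noisy = [add_noise(clean, 20.0, rng) for _ in range(16)]
averaged = average_images(noisy)

single_std = noise_std(noisy[0], clean)
averaged_std = noise_std(averaged, clean)
```

In this setup single_std should come out near 20 and averaged_std near 5, that is, roughly a factor of four lower for K = 16.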

A frequent application of image subtraction is in the enhancement of differences between images. Figure 2.16 shows an example of image subtraction.




Figure  2.16 - (a) Infrared image of the land area (b) Image obtained by setting to 0 the least significant bit of every pixel in (a). (c) Difference of the two images, scaled to the range [0,255] for clarity.

Mask mode radiography, used in medical imaging, is another example of the application of image subtraction. An image difference of this form can be written as;

g(x,y) = f(x,y) - h(x,y)

The net effect of subtracting the mask (h(x,y)) from each sample live image (f(x,y)) is that the areas that are different between f(x,y) and h(x,y) appear in the output image, g(x,y), as enhanced detail. 
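The experiment of Figure 2.16 can be imitated on a tiny image: set the least significant bit of every pixel to 0, subtract, and scale the difference to [0,255]. The function names and the sample gray levels below are assumptions made for the example, and pixels are taken to be 8-bit integers.

```python
def clear_lsb(image):
    """Set the least significant bit of every 8-bit pixel to 0."""
    return [[p & 0xFE for p in row] for row in image]


def subtract(f, h):
    """g(x,y) = f(x,y) - h(x,y), pixel by pixel."""
    return [[pf - ph for pf, ph in zip(rf, rh)] for rf, rh in zip(f, h)]


def scale_to_255(image):
    """Stretch the gray levels linearly so they span the range [0, 255]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round(255 * (p - lo) / (hi - lo)) for p in row] for row in image]


f = [[12, 13],
     [200, 255]]
h = clear_lsb(f)
difference = subtract(f, h)
display = scale_to_255(difference)
```

The raw difference contains only the values 0 and 1 and would look almost black on screen; after scaling, the pixels that changed stand out at full brightness.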

An important application of image multiplication (and division) is shading correction. Suppose that an imaging sensor produces images that can be modeled as the product of a "perfect image", denoted by f(x,y), times a shading function, h(x,y); that is, g(x,y) = f(x,y) h(x,y). If h(x,y) is known, we can obtain f(x,y) by multiplying the sensed image by the inverse of h(x,y) (i.e. dividing g by h). If h(x,y) is not known, but access to the imaging system is possible, we can obtain an approximation to the shading function by imaging a target of constant intensity. When the sensor is not available, we can often estimate the shading pattern directly from the image.
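The shading model g(x,y) = f(x,y) h(x,y) can be simulated to see the correction at work. In this sketch the sample images, the constant target intensity of 200 and the helper names are all assumptions made for the example, and the shading function is assumed to be nonzero everywhere.

```python
def multiply(f, h):
    return [[pf * ph for pf, ph in zip(rf, rh)] for rf, rh in zip(f, h)]


def divide(g, h):
    return [[pg / ph for pg, ph in zip(rg, rh)] for rg, rh in zip(g, h)]


perfect = [[100.0, 100.0],
           [50.0, 200.0]]
shading = [[1.0, 0.5],
           [0.75, 0.25]]

# What the sensor actually records: g = f * h.
sensed = multiply(perfect, shading)

# Estimate h by imaging a target of constant intensity with the same sensor.
target_level = 200.0
flat_target = [[target_level, target_level], [target_level, target_level]]
sensed_target = multiply(flat_target, shading)
shading_estimate = [[p / target_level for p in row] for row in sensed_target]

# Divide the sensed image by the estimated shading function.
recovered = divide(sensed, shading_estimate)
```

In this noise-free simulation the estimate equals the true shading function, so the recovered image matches the perfect image; with a real sensor the estimate would only be approximate.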

Another common use of image multiplication is in masking, also called region of interest (ROI) operations. As shown in Figure 2.17, the process consists of multiplying the given image by a mask image that has 1s in the ROI and 0s elsewhere.



Figure 2.17 - (a) Digital dental X-ray image. (b) ROI mask for isolating teeth with fillings (white corresponds to 1 and black corresponds to 0). (c) Product of (a) and (b).
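ROI masking is a single pixelwise multiplication. The function name apply_mask and the sample values below are made up for the example.

```python
def apply_mask(image, mask):
    """Multiply each pixel by the matching mask value (1 in the ROI, else 0)."""
    return [[p * m for p, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]

roi = apply_mask(image, mask)
```

Pixels inside the ROI keep their gray levels and everything else becomes 0, as in panel (c) of Figure 2.17.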

Set and Logical Operators


References

Gonzalez, R.C., Woods, R.E., 1992. Digital Image Processing, 3rd ed. Addison-Wesley Pub (Sd).