A Simple Model for Sharpness in Digital Cameras – Aliasing

Having shown that our simple two dimensional MTF model is able to predict the performance of a perfect lens combined with a square monochrome pixel, we now turn to the effect of the sampling interval on spatial resolution, according to the guiding formula:

(1)   \begin{equation*} MTF_{Sys2D} = \left|(\widehat{ PSF_{lens} }\cdot \widehat{PIX_{ap} })\ast\ast\: \delta\widehat{\delta_{pitch}}\right|_{pu} \end{equation*}

The hats in this case mean the Fourier Transform of the relevant component, normalized to 1 at the origin (_{pu}); that is, the individual MTFs of the perfect lens PSF, the perfect square pixel and the delta grid.

Sampling in the Spatial and Frequency Domains

Sampling is expressed mathematically as a Dirac delta function at the center of each pixel (the red dots below).

Figure 1. Left, 1a: A highly zoomed (3200%) image of the lens PSF, an Airy pattern, projected onto the imaging plane where the sensor sits. Pixels shown outlined in yellow. A red dot marks the sampling coordinates. Right, 1b: The sampled image zoomed at 16000%, 5x as much, because each pixel’s width is 5 linear units on the side.

Recall from part II that the original image is square, 1024 linear units on the side (think of the units as microns if it helps).  It is sampled by square pixels 5 linear units on the side, so the resulting sampled image is made up of 204×204 pixels (1024/5, rounded down).  In such a case sampling would be represented in the frequency domain by a 204×204 square grid of delta functions spaced one cycle per pixel pitch apart.  This is also known as a lattice or a two dimensional comb function.  The two dimensional Dirac delta comb and its Fourier transform are expressed as follows in the spatial (left) and frequency (right) domains:

(2)   \begin{equation*} comb(ax)comb(by) \Leftrightarrow \frac{1}{|ab|}comb(\frac{f_x}{a})comb(\frac{f_y}{b}) \end{equation*}

In typical monochrome photographic sensors a and b represent pixel pitch and are the same: 5 linear units in this example.  Also from the previous article, the combined MTF of the perfect lens and square pixel aperture produces this (shifted) two dimensional response:

Figure 2. The combined two dimensional MTF of Lens and Pixel area (2a).  A horizontal slice of the two dimensional MTF (2b)

Convolving the rectangular delta lattice, with deltas one cycle/pixel apart, in two dimensions with the tent-looking combined lens+pixel area MTF of Figure 2a produces 204×204 such tent-like MTFs, each centered on a delta function in the grid:

Figure 3.  The two dimensional Lens and Pixel Aperture MTF of Figure 2, convolved with the sampling Dirac delta function lattice of equation (2) in the frequency domain.
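The replication shown in Figure 3 can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: it uses a 1000-unit line instead of the article's 1024 so that the 5-unit pitch divides it evenly, and the Gaussian test signal is hypothetical. It multiplies the signal by a delta comb in the spatial domain and confirms that the resulting spectrum equals the sum of copies of the original spectrum shifted by multiples of one cycle per pixel.

```python
import numpy as np

n = 1000   # fine-grid length; assumed (not 1024) so that the pitch divides it evenly
pitch = 5  # sampling pitch in fine-grid units, as in the article

x = np.arange(n)
# Hypothetical smooth test signal standing in for the image on the sensor.
signal = np.exp(-0.5 * ((x - n / 2) / 4.0) ** 2)

# Sampling: multiply by a Dirac comb with one delta every `pitch` units.
comb = np.zeros(n)
comb[::pitch] = 1.0
sampled = signal * comb

spectrum = np.fft.fft(signal)
sampled_spectrum = np.fft.fft(sampled)
comb_spectrum = np.abs(np.fft.fft(comb))

# In the frequency domain the comb is again a comb, with deltas
# n / pitch bins apart, which corresponds to 1 cycle/pixel.
replica_spacing = n // pitch

# Convolution with that comb: the original spectrum repeated at every delta.
replicas = sum(np.roll(spectrum, k * replica_spacing) for k in range(pitch)) / pitch

print(np.flatnonzero(comb_spectrum > 1e-6))      # bins holding the frequency-domain deltas
print(np.allclose(sampled_spectrum, replicas))   # sampled spectrum equals the sum of replicas
```

The same reasoning extends to two dimensions by applying the comb along both axes.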

Modeling Aliasing

It is clear that if the reference 2D MTF has some energy above 0.5 c/p (the Nyquist-Shannon frequency) it will start interfering with its neighbours once convolved with the delta sampling grid: the center of each solid is only 1 c/p away.  This interference is called aliasing and it becomes more obvious when viewing Figure 3 in profile, projected against the X-Z plane:

Figure 4.  Horizontal profile of the 2D MTFs in Figure 3.

We are normally used to seeing this information in the 0-1 c/p range only, as shown in Figure 2b above.  Note how energy at spatial frequencies above 0.5 c/p intermingles with that of the neighbouring solids.  The result is that it is able to sneak back below Nyquist under an ‘alias’, masquerading as lower frequencies.  Ignoring phase in our simple model, and because the convolved MTFs look like mirror images of each other, frequencies higher than Nyquist can be thought of as ‘folding’ back around 0.5 c/p.  Once this happens it is impossible to tell the real low frequencies from the folded aliased ones, which are then free to produce undesirable aliasing effects like stair stepping and moiré in the final photograph.

Aliasing is the reason why we see that uptick near 0.5 c/p in the system MTF horizontal radial slice shown in Figure 2b.  If we ignore phase, it can be modeled by folding aliased frequencies above Nyquist back towards the origin and adding them to the unaliased model, as shown in Figure 5.

Figure 5. Ignoring phase, Aliasing can be modeled by folding frequencies above Nyquist back towards the origin and adding them to the unaliased model there. ‘Measured’ is a horizontal radial slice off the actual 2D Discrete Fourier Transform of the sampled image, as also seen in Figure 2.
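The folding model can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce Figure 5: the function names are hypothetical, the 0.55 µm wavelength is an assumption (the f/16 aperture and 5 µm pitch come from the example), and only the nearest replica, centered at 1 c/p, is folded back.

```python
import numpy as np

def diffraction_mtf(f, cutoff):
    """MTF of a perfect lens with circular aperture; zero beyond the cutoff."""
    s = np.clip(np.abs(f) / cutoff, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s**2))

def system_mtf(f_cpp, cutoff_cpp, width_px=1.0):
    """Lens times square pixel aperture along one axis, frequency in cycles/pixel."""
    # np.sinc(x) is sin(pi x) / (pi x)
    return diffraction_mtf(f_cpp, cutoff_cpp) * np.abs(np.sinc(f_cpp * width_px))

wavelength_um = 0.55  # assumed, not stated in this article
f_number = 16
pitch_um = 5.0
cutoff_cpp = pitch_um / (wavelength_um * f_number)  # extinction in c/p, about 0.57

f = np.linspace(0.0, 0.5, 51)                # baseband, up to Nyquist
unaliased = system_mtf(f, cutoff_cpp)
folded = system_mtf(1.0 - f, cutoff_cpp)     # energy mirrored around 0.5 c/p
aliased_model = unaliased + folded           # phase ignored, as in the text

print(aliased_model[-1] / unaliased[-1])     # ratio at Nyquist
```

At Nyquist the folded and unaliased terms coincide, so the modeled value there is twice the unaliased one, which is the uptick visible near 0.5 c/p.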

Anti Aliasing

The negative effect of aliasing can be controlled by filtering the original signal before sampling to limit energy captured above the Nyquist frequency.  This is the job of the Anti Aliasing filter that we will discuss in a future article.  There is always a trade-off because no filter is perfect, so reducing the impact of frequencies above 0.5 c/p necessarily also means lowering some good frequencies below that, and perceived ‘sharpness’ with them.

Another way to reduce aliasing, all other things being equal, is to increase the sampling rate.  One of the properties of spatial-frequency duality is that narrower features in one domain become wider ones in the other (the a and b factors of equation 2 are in the numerator in one and in the denominator in the other).  Sampling at a smaller pitch spreads apart the 2D Dirac deltas in the frequency domain grid, therefore pushing the convolved MTF solids further away from each other and reducing the chance of aliasing.

Figure 4  also shows clearly why in order to be able to recover a signal perfectly it is necessary for contiguous MTF solids not to overlap, which means filtering away frequencies above the maximum desirable spatial frequency and sampling at least at twice that rate.  This is known as the Whittaker-Shannon sampling theorem.
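For the perfect lens of this series the maximum spatial frequency is set by diffraction extinction, 1/(\lambda N), so the no-overlap condition translates directly into a maximum pitch. The sketch below is a worked example under assumptions: the helper names are hypothetical and the 0.55 µm wavelength is assumed rather than taken from the article.

```python
def cutoff_cycles_per_pixel(pitch_um, wavelength_um, f_number):
    """Diffraction extinction frequency 1/(lambda N), expressed in cycles/pixel."""
    return pitch_um / (wavelength_um * f_number)

def max_alias_free_pitch_um(wavelength_um, f_number):
    """Largest pitch that keeps extinction at or below Nyquist (0.5 c/p)."""
    return wavelength_um * f_number / 2.0

wavelength_um = 0.55  # assumed
f_number = 16

print(cutoff_cycles_per_pixel(5.0, wavelength_um, f_number))  # above 0.5: solids overlap
print(max_alias_free_pitch_um(wavelength_um, f_number))       # pitch at which overlap stops
```

Under these assumptions the 5 micron pixels of the example are slightly too large to avoid aliasing at f/16, consistent with the overlap seen in Figure 4.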

Directional Slices of 2D MTFs

We know that the two dimensional system MTF of the sampled image is not rotationally symmetric because in typical photographic sensors the sampling grid has a rectangular (or square) layout.

Neighbours above and below or to the left and right are normally one cycle per pixel apart.  Those diagonally across are however further away, \sqrt{2} c/p apart.  This suggests that in a typical photographic sensor there is less chance of aliasing when the detail being evaluated is in the 45 degree direction with respect to its origin, as can be gleaned from this central cutout of the 2D MTF in Figure 3:

Figure 6.  Detail of the two dimensional system MTF of the monochrome photographic sensor  in Figure 3.

In our f/16 example it can be seen that there is aliasing overlap in the horizontal (x-axis, c/p) or vertical (y-axis, c/p) directions, say going from (0,0) to (1,0) but that there is no overlap diagonally, say from (0,0) to (1,1).  This is quite obvious when plotting the two respective directional MTF radial slices on the same graph:

Figure 7. Linear MTF in the horizontal and 45 degree direction of the perfect monochrome sensor.

Therefore in this simple monochrome example there is aliasing in the vertical and horizontal directions, but not diagonally.
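This directional behaviour can be reproduced by summing the eight nearest replicas of the two dimensional lens and pixel MTF along each radial slice. The following is a sketch only, with hypothetical function names and an assumed wavelength of 0.55 µm; each slice runs from the origin to the point halfway towards the neighbour in that direction.

```python
import numpy as np

def baseband_mtf_2d(fx, fy, cutoff_cpp, width_px=1.0):
    """Perfect lens times square pixel aperture; frequencies in cycles/pixel."""
    s = np.clip(np.hypot(fx, fy) / cutoff_cpp, 0.0, 1.0)
    lens = (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s**2))
    pixel = np.abs(np.sinc(fx * width_px) * np.sinc(fy * width_px))
    return lens * pixel

def aliased_energy_along_slice(direction, length, cutoff_cpp, n_points=101):
    """Sum of the eight nearest replicas along a radial slice from the origin."""
    ux, uy = np.asarray(direction, float) / np.hypot(*direction)
    r = np.linspace(0.0, length, n_points)
    fx, fy = r * ux, r * uy
    total = np.zeros_like(r)
    for m in (-1, 0, 1):
        for n in (-1, 0, 1):
            if (m, n) != (0, 0):  # skip the baseband solid itself
                total += baseband_mtf_2d(fx - m, fy - n, cutoff_cpp)
    return r, total

cutoff_cpp = 5.0 / (0.55 * 16)  # 5 um pitch, f/16, assumed 0.55 um light

_, horizontal = aliased_energy_along_slice((1, 0), 0.5, cutoff_cpp)
_, diagonal = aliased_energy_along_slice((1, 1), np.sqrt(2) / 2, cutoff_cpp)

print(horizontal.max())  # nonzero: neighbours leak into the horizontal slice
print(diagonal.max())    # zero: no replica reaches the 45 degree slice
```

With these values extinction sits between 0.5 c/p and \sqrt{2}/2 c/p, which is why only the horizontal and vertical slices pick up aliased energy.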

The Simplified Perfect Model

This concludes numerical verification of our simple 2D MTF model. Ignoring phase, it works as advertised in its current form, based on a perfect lens and a monochrome sensor with square pixels:

(3)   \begin{equation*} \begin{aligned} MTF_{Sys2D} &= \Big(\frac{2}{\pi}\left[\arccos(s)-s\sqrt{1-s^2}\right]\\ &\times \left|\frac{\sin(\pi f_{x} w)}{\pi f_{x} w}\right|\left|\frac{\sin(\pi f_{y} w)}{\pi f_{y} w}\right|\Big)\\ &\ast\ast\: comb(\frac{f_x}{pitch})comb(\frac{f_y}{pitch}) \end{aligned} \end{equation*}

with s the linear spatial frequency f normalized for extinction: s = f \lambda N, that is f divided by the diffraction extinction frequency \frac{1}{\lambda N}, where \lambda is the wavelength of light and N the f-number of the lens, and with the lens term equal to zero for s > 1; f_{x} and f_{y} the horizontal and vertical spatial frequency components; w the effective linear size of a perfect square pixel on the side, also known as linear fill factor; pitch the linear spacing of the centers of pixels as laid out on a rectangular grid in the sensor; and ** two dimensional convolution.

Note that effective square pixel width (w) and spacing (sampling pitch) are separate variables.  We often take them to have the same value in classic photographic digital sensors – therefore implicitly assuming a perfect microlens producing an effective 100% pixel fill factor.  But they can be made to have different values to simulate other scenarios.  For instance one could have w = 4um and pitch = 8um to simulate an 8um pixel with 25% fill factor; or alternatively to simulate using every other pixel in a sensor with perfect contiguous 4um pixels.  Or one could have round pixels of diameter 2w at pitch spacing (of course in that case the sinc functions used for a pixel in the shape of a square above would have to be replaced by the equivalent for a pixel in the shape of a circle, a besinc function).
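To see what separating w from pitch does in practice, the pixel aperture term of equation (3) can be evaluated at the Nyquist frequency for the 8um pitch example. The helper names below are hypothetical, and the sketch covers only the square pixel term, not the lens or the sampling comb.

```python
import numpy as np

def pixel_aperture_mtf(fx, fy, w):
    """MTF of a perfect square pixel of width w; np.sinc(x) is sin(pi x)/(pi x)."""
    return np.abs(np.sinc(fx * w) * np.sinc(fy * w))

def nyquist_frequency(pitch):
    """Nyquist frequency in cycles per unit length for a given sampling pitch."""
    return 0.5 / pitch

pitch_um = 8.0
f_nyq = nyquist_frequency(pitch_um)  # cycles/um

full_fill = pixel_aperture_mtf(f_nyq, 0.0, w=8.0)     # 100% fill factor
quarter_fill = pixel_aperture_mtf(f_nyq, 0.0, w=4.0)  # 25% fill factor

print(full_fill, quarter_fill)
```

The smaller aperture attenuates less at Nyquist, so at the same pitch it passes more energy that can alias.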

We can use this simplified model to start answering questions about the effects of diffraction and pixel size on the spatial resolution performance of our photographic equipment.  Next we will add a couple of additional components to the model and test it against real captures.


*The units of spatial frequency f are described in detail in this article.