A Simple Model for Sharpness in Digital Cameras – I

The next few posts will describe a linear spatial resolution model that can help a photographer better understand the main variables involved in evaluating the ‘sharpness’ of photographic equipment and related captures. I will show numerically that the combined spectral frequency response (MTF) of a perfect AAless monochrome digital camera and lens in two dimensions can be described as the normalized multiplication of the Fourier Transform (FT) of the lens Point Spread Function by the FT of the (square) pixel footprint, convolved with the FT of a rectangular grid of Dirac delta functions centered at each  pixel, as better described in the article

    \[ MTF_{2D} = \left|(\widehat{ PSF_{lens} }\cdot \widehat{PIX_{ap} })\ast\ast\: \delta\widehat{\delta_{pitch}}\right|_{pu} \]

With a few simplifying assumptions we will see that the effect of the lens and sensor on the spatial resolution of the continuous image on the sensing plane can be broken down into these simple components.  The overall ‘sharpness’ of the captured digital image can then be estimated by combining the ‘sharpness’ of each of them.

The stage will be set in this first installment with a little background.  I will assume that the reader is somewhat familiar with Fourier Transforms and go straight to the point, so if anything is not clear or correct feel free to ask for clarification in the comments below.

A perfect Lens: an Airy Disc

As an example I will use an Airy disc as the impulse response (or PSF) of the lens, that is the continuous diffraction pattern that a distant star would project on the sensing plane after having gone through a perfectly focused lens with a circular relative aperture N of, say,  f/16 and no aberrations whatsoever:

Figure 1. The Airy Diffraction Point Spread Function for light of 530nm average wavelength and an f/16 relative aperture, zoomed to show it in detail.

Figure 1 represents the f/16 Airy disc in the spatial domain for green light at an average wavelength (\lambda) of 530nm.  Figure 1a to the left is a 3D rendition; and 1b to the right the same image shown in two dimensions as it would appear on the sensing plane of a camera.  Yes, the rings are there even though they are hard to see with a linear scale.  Ignore pixelation and pretend that the PSF is continuous at this stage.  It is computed according to the theoretical Airy irradiance distribution of a point source:

(1)   \begin{equation*} E(\rho) = \left|\frac{2J_1(\pi\rho)}{(\pi\rho)}\right|^2 \end{equation*}

with \rho the linear radius r from the origin normalized for wavelength and relative aperture: \rho = \frac{r}{\lambda N}, with r = \sqrt{x^2+y^2}.

A Perfect Pixel: a Square Aperture and a Dirac Delta

The pixels in our example are photodiodes arranged in a square grid on the sensing plane.  They are perfectly square, with perfect microlenses and no Anti Aliasing or other filters.  They are contiguous with no gaps, behaving as if they have an effective 100% fill factor .  They are five linear units on a side and monochrome (for now).  Their centers are therefore 5 units apart, which means that sampling (pixel) pitch is also 5.  Absolute units are irrelevant to this discussion but you can think of a linear unit as one micron if it helps.

The continuous 2D Airy PSF image intensity above is projected onto the sensing plane where this sensor and its pixels reside.  The left image below represents the ‘continuous’ 2D Airy Disc characterized earlier, zoomed in.  Pretend that the left image is not pixelated and that its relative brightness indicates the number of photons that impinge on that portion of the sensing plane during Exposure.  The image to the right represents the relative size of actual pixels of the sensor underlying the continuous image. The relative intensity captured by each pixel is the sum (the integral) of all of the photons from the continuous image that fall within its area (a pixel covers an area 5×5 the visible pixelation of Figure 2a to the left).

Figure 2. The left image (2a) represents ‘continuous’ lens PSF intensity corresponding to the number of photons arriving during exposure on the sensing plane. The right image (2b) represents the number of photons collected by the corresponding pixel on the underlying sensor. Zoomed 3200% and 16000% respectively (the latter 5x as much because one of its pixels corresponds to 5 of the former on the side).

The operation of photon integration within a pixel is a convolution of the continuous image with the pixel’s square area (also referred to as a pixel’s footprint or aperture).   It is performed by every pixel at its location on the sensor as it collects arriving photons during Exposure, thereby sampling by its aperture the continuous image at pixel pitch spacing for the extent of the sensor.

Sampling in 2D: Convolutions and Multiplications

So a digital camera sensor with perfect, contiguous square pixels performs two functions in the spatial domain in this perfect example:

  1. samples the image at pitch spacing at coordinates corresponding to the center of every pixel; and
  2. performs a convolution of image intensity over each pixel’s effective area, its aperture.

In this example the sampling is performed at the very center of each pixel by a point with zero area, indicated above by a red dot; mathematically this is expressed by a Dirac delta function.  The set of all delta functions at the center of each pixel as arranged on the sensor forms a rectangular grid of ‘red dots’ (a 2D comb or  lattice).

Including our perfect lens we can therefore express the continuous image sampling process mathematically in the spatial domain as follows:

(2)   \begin{equation*} I_{sampled} = I_{cont} \ast\ast [ PSF_{lens} \ast\ast PIX_{ap} \cdot \delta\delta_{grid}] \end{equation*}

The sampled image I_{sampled} is the result of convolution of the continuous image I_{cont} with the lens PSF and pixel aperture multiplied by the  Dirac delta grid – all in two dimensions.  The terms in the square brackets above represent the 2D impulse response of the imaging system as a whole (the system’s PSF), that is the  impact that the digital camera and lens have on the otherwise perfect continuous image that would be present on the sensing plane.

System Response in the Frequency Domain

Because of the Linear Shift Invariance assumption below we can Fourier Transform each of the factors individually and combine them per equation (2) to obtain this simplified system’s response in the frequency domain.  Recall that convolutions in the spatial domain become multiplications in the frequency domain and vice versa.   And that the magnitude of the normalized Fourier transform of the system’s impulse response is its MTF, so for this example:

(3)   \begin{equation*} MTF_{Sys2D} = \left|(\widehat{ PSF_{lens} }\cdot \widehat{PIX_{ap} })\ast\ast\: \delta\widehat{\delta_{pitch}}\right|_{pu} \end{equation*}

The hat symbols indicate the Fourier Transform of the specified function, {pu} normalization to one at the origin.  The three functions in our example are easily modeled in the frequency domain as shown below:

A) The Fourier Transform of the PSF of a perfect lens (the Airy pattern in equation 1) is

(4)   \begin{equation*} MTF_{\text diff} = \frac{2}{\pi}[\arccos(s)-s\sqrt{1-s^2}] \end{equation*}

with s the linear spatial frequency f[1] normalized for extinction: s = \frac{f}{\lambda N}.  The MTF is zero when s is greater than 1. f is the same spatial frequency as in B) and C).  It has no suffix because the resulting MTF is rotationally symmetric so the response is independent of direction.

B) The pixel aperture area is a square so its Fourier Transform is a separable product of linear sinc functions in the horizontal (f_{x}) and vertical (f_{y}) spatial frequency directions, with w the width of a pixel which, being square, is the same in both directions:

(5)   \begin{equation*} MTF_{PIX} = \left|\frac{sin(\pi f_{x} w)}{\pi f_{x} w}\right|\left|\frac{sin(\pi f_{y} w)}{\pi f_{y} w}\right| \end{equation*}

C) And the separable Dirac delta function grid in the spatial domain gets Fourier transformed into a Dirac delta grid at cycles per pixel-pitch spacing in the frequency domain.  Spacing in digital imaging is typically the same in the horizontal and vertical directions, therefore so is pitch:

(6)   \begin{equation*} MTF_{\delta\delta} = \frac{1}{|pitch^2|}comb(\frac{f_x}{pitch})comb(\frac{f_y}{pitch}) \end{equation*}

All in two dimensions.  So with equations (4), (5) and (6) we can model the performance of the perfect imaging system in the frequency domain.  In the next post we will take a closer look at these functions, then simplify things somewhat by extracting and evaluating directional one-dimensional slices of the two dimensional results.  But first let’s get a couple of key underlying assumptions out of the way.

MTF and Linear Shift Invariance

In this article and the ones that follow ‘sharpness’ will refer to the objective quantitative spatial resolution information stored in the raw file of a digital camera, not the subjective qualitative version after demosaicing and rendering (post) processing.  It will be evaluated through the familiar Modulation Transfer Function (MTF) of the imaging system expressed in physical units of line pairs or cycles per meaningful parameter (e.g. lp/mm, lp/ph, cycles per pixel, etc.).  See here for a description of the various units and their typical uses.

The first assumption is that the imaging system is Linear and Shift Invariant (LSI).  This assumption is needed in order to use the linearity and superposition properties of transfer functions which allow to break down a complex system into smaller pieces easier to deal with individually.

Linear means twice the intensity, twice the response: images captured in the raw data of commercially available digital cameras can be considered to fall into this category.  Shift invariant means that the response of the system to a given stimulus should be exactly the same no matter where we look on the sensor.  This second part is definitely not true in typical digital cameras because of their edgy square pixels and rectangular sampling grid.  However, we can typically set up the system to appear to be shift invariant locally within a certain range of operation – allowing the measurement of its MTF at various key spots around the field of view.

Notes and References

1. For a description of the units of linear spatial resolution see this article.
2. Thanks go to Alan Robinson for taking the time to explain this subject to me a couple of years ago.
3. Thanks also go to Frans van den Bergh for MTF Mapper and his excellent blog.