A Simple Model for Sharpness in Digital Cameras – I

The next few posts will describe a linear spatial resolution model that can help a photographer better understand the main variables involved in evaluating the ‘sharpness’ of photographic equipment and related captures.  I will show numerically that the combined spatial frequency response (MTF) of a perfect AA-less (no anti-aliasing filter) monochrome digital camera and lens in two dimensions can be described as the magnitude of the normalized product of the Fourier Transform (FT) of the lens Point Spread Function and the FT of the pixel footprint (aperture), convolved with the FT of a rectangular grid of Dirac delta functions centered at each pixel:

    \[ MTF_{2D} = \left|\widehat{PSF_{lens}}\cdot \widehat{PIX_{ap}}\right|_{pu}\ast\ast\; \widehat{\delta\delta_{pitch}} \]

With a few simplifying assumptions we will see that the effect of the lens and sensor on the spatial resolution of the continuous image on the sensing plane can be broken down into these simple components.  The overall ‘sharpness’ of the captured digital image can then be estimated by combining the ‘sharpness’ of each of them.

The stage will be set in this first installment with a little background and perfect components.  Later additional detail will be provided to take into account pixel aperture and Anti-Aliasing filters.  Then we will look at simple aberrations.  Next we will learn how to measure MTF curves for our equipment, and look at numerical methods to model PSFs and MTFs from the wavefront at the aperture.

In this article I will assume that the reader is somewhat familiar with Fourier Transforms and go straight to the point, so if anything is unclear or incorrect feel free to ask for clarification in the comments below.

Why Modulation Transfer Function?

The vast majority of evaluations of an imaging system’s perceived ‘sharpness’ are single-figure metrics condensed from a measure of its Modulation Transfer Function (MTF).  In photography such metrics have names like MTF20, MTF50, SQF, SQRI, Acutance, etc.

The MTF curve upon which those metrics are based represents the portion of energy that the system is able to pass at increasing spatial frequencies.  The size of repeating detail that can be resolved in an image is inversely proportional to its spatial frequency.  So an MTF curve tells us how well different detail sizes can be captured in the raw data and subsequently reproduced by our equipment.  It represents the system’s Spatial Frequency Response, aka its Energy Spectrum.

Figure 1. Actual MTF curve measured from a slanted edge in the center of the image as captured in the raw data by Erik Kahfer. The spatial frequencies on the horizontal axis are in cycles per pixel pitch (see this article for how to convert them to other units like lp/mm). The monotonically decreasing System curve (solid line) is the element-by-element product of the lens and pixel MTF curves (dashed).

How good are a given camera and lens at capturing image detail of varying size?  Where the MTF curve has a value close to one it means that the camera and lens can pass almost 100% of detail of that size to the raw data (e.g. some text will be crisply legible), while zero means that all such detail will be lost (e.g. it will look like gray mush).

As we will see, the MTF curve resulting from raw data captured from in-spec photographic kit with good technique near its sharpest apertures decreases monotonically from a value of one (100%, MTF100) at zero frequency, say a uniform area with no detail, to a value of zero (0%, MTF0) at higher spatial frequencies, say at diffraction extinction.  Sometimes the spatial frequency at which MTF is degraded by 50% (MTF50) may be of interest.
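As a small illustration of how such a single-figure metric might be pulled from a measured curve, here is a minimal sketch in Python (using numpy) that locates MTF50 by linear interpolation between the two bracketing samples; the input arrays here are stand-in data, not an actual measurement.

    import numpy as np

    def mtf50(freq, mtf):
        """Return the spatial frequency at which the MTF curve first drops to 0.5,
        found by linear interpolation between the two bracketing samples."""
        below = np.flatnonzero(mtf <= 0.5)
        if below.size == 0:
            return None                      # the curve never reaches 50% in range
        i = below[0]
        if i == 0:
            return freq[0]
        f0, f1, m0, m1 = freq[i - 1], freq[i], mtf[i - 1], mtf[i]
        return f0 + (0.5 - m0) * (f1 - f0) / (m1 - m0)

    # Stand-in monotonically decreasing curve, frequencies in cycles per pixel pitch
    freq = np.linspace(0, 0.5, 51)
    mtf = np.exp(-5 * freq)                  # hypothetical data for illustration only
    print(mtf50(freq, mtf))                  # about 0.139 cycles/pixel for this toy curve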

Beyond the frequency of the first null the MTF curve may continue bouncing around if the null was caused by an Anti-Aliasing filter, Effective Pixel Aperture or aberrations; or the curve will show no energy (MTF0) if the null was caused by diffraction extinction, the highest spatial frequency physically recordable by the system, a function of relative aperture and the wavelength of light (1/\lambda N).  If some of these terms are unfamiliar don’t worry, they will become clearer as you read along.

A Perfect Lens: an Airy Disc

As an example of possible detail in an image I will use an Airy pattern, the impulse response (or PSF) of the lens: the continuous diffraction pattern that a distant star would project on the sensing plane after having gone through a perfectly focused lens with a circular relative aperture N of, say, f/16 and no aberrations whatsoever.  The imaging system cannot physically ‘see’ detail smaller than this:

Figure 2. The Airy Diffraction Point Spread Function for light of 530nm average wavelength and an f/16 relative aperture, zoomed to show it in detail.

Figure 2 represents the f/16 Airy pattern in the spatial domain for green light at an average wavelength (\lambda) of 530nm.  Figure 2a to the left is a 3D rendition; 2b to the right is the same image shown in two dimensions as it would appear on the sensing plane I(x,y) of a camera.  The brightness of the rings should display approximately correctly assuming your monitor has a 2.2 gamma.  Ignore pixelation and pretend that the PSF is continuous at this stage.  It is computed according to the theoretical intensity distribution on the imaging plane of a point source that has gone through an unaberrated circular aperture:

(1)   \begin{equation*} PSF_{lens} = E(\rho) = \left|\frac{2J_1(\pi\rho)}{(\pi\rho)}\right|^2 \end{equation*}

with J_1 a first order Bessel function of the first kind, \rho the linear radius r from the optical axis on the imaging plane, normalized for wavelength and relative aperture: \rho = \frac{r}{\lambda N}, with r = \sqrt{x^2+y^2}.  This version of the Airy pattern is normalized so that peak intensity is equal to 1.
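To make this concrete, here is a minimal numerical sketch of equation (1), assuming Python with numpy and scipy, the 530nm / f/16 values of the running example expressed in microns, and explicit handling of the removable singularity on the optical axis:

    import numpy as np
    from scipy.special import j1

    def airy_psf(x, y, wavelength=0.53, N=16):
        """Airy intensity of equation (1), peak-normalized to 1.
        x, y and wavelength are in the same linear units (microns here)."""
        r = np.sqrt(x**2 + y**2)
        rho = r / (wavelength * N)                       # radius normalized by lambda*N
        with np.errstate(divide='ignore', invalid='ignore'):
            e = (2 * j1(np.pi * rho) / (np.pi * rho))**2
        return np.where(rho == 0, 1.0, e)                # the limit on the optical axis is 1

    # Evaluate the pattern on a fine grid, 0.1 micron steps over +/- 30 microns
    ax = np.arange(-30, 30.05, 0.1)
    X, Y = np.meshgrid(ax, ax)
    psf = airy_psf(X, Y)                                 # a finely sampled version of Figure 2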

A Perfect Pixel: a Square Aperture and a Dirac Delta

The ‘pixels’ in our example are photodiodes arranged in a square grid on the sensing plane.  They are perfectly square, with perfect microlenses and no Anti Aliasing or other filters.  They are Back Side Illuminated and contiguous, with no gaps and a 100% fill factor.  They are five linear units on a side and monochrome (for now).  Their centers are therefore 5 units apart horizontally and vertically, which means that sampling (pixel) pitch is also 5.  Absolute units are irrelevant to this discussion but you can think of a linear unit as one micron if it helps.

The continuous 2D Airy PSF image intensity above is projected onto the sensing plane where this sensor and its pixels reside.  The left image below represents the ‘continuous’ 2D Airy pattern characterized earlier, zoomed in to show just the disc.  Pretend that the left image is not pixelated and that its relative brightness indicates the number of photons that impinge on that portion of the sensing plane during Exposure.  The image to the right represents the relative size of actual pixels of the sensor underlying the continuous image.  The relative intensity captured by each pixel is the sum (the integral) of all of the photons from the continuous image that fall within its area while it is Exposed (each pixel covers an area of 5×5 of the visible pixelation elements of Figure 3a to the left).

Figure 3. The left image (3a) represents ‘continuous’ lens PSF intensity corresponding to the number of photons arriving during exposure on the sensing plane. The right image (3b) represents the number of photons collected by the corresponding pixel on the underlying sensor. Zoomed 3200% and 16000% respectively (the latter 5x as much because one of its pixels corresponds to 5 of the former on the side).

The operation of photon integration within a pixel is a convolution of the continuous image with the pixel’s square area (also referred to as a pixel’s footprint or aperture).  The resulting space and time averaging acts as a low-pass filter.  It is performed by every pixel at its location on the sensor as it collects arriving photons during Exposure, thereby sampling the continuous image through its area at pixel pitch spacing over the extent of the sensor.
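If, as in Figure 3, the ‘continuous’ image is represented on a grid five times finer than the pixel pitch, this per-pixel integration can be mimicked by summing the fine samples that fall within each 5×5 footprint.  A minimal Python sketch under that assumption:

    import numpy as np

    def integrate_pixels(img, factor=5):
        """Sum a finely sampled 'continuous' image over contiguous square pixel
        footprints, `factor` fine samples on a side (100% fill factor)."""
        ny, nx = img.shape
        ny, nx = ny - ny % factor, nx - nx % factor          # trim to whole pixels
        blocks = img[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
        return blocks.sum(axis=(1, 3))                       # photon count per pixel

    # e.g. a finely sampled continuous intensity, pixels 5 fine units on a side
    fine = np.random.rand(100, 100)       # stand-in for the continuous image
    coarse = integrate_pixels(fine)       # 20 x 20 array of 'captured' pixel values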

Sampling in 2D: Convolutions and Multiplications

So a digital camera sensor with perfect, contiguous square pixels performs two functions during Exposure in the spatial domain in this perfect example:

  1. samples the image at even spacing at coordinates corresponding to the center of every pixel, typically laid out in a rectangular grid, with horizontal/vertical spacing of the pixels referred to as pixel pitch (p); and
  2. performs a convolution of image intensity over each pixel’s area, defined in this context as a square pixel aperture of width w.

In this example the sampling is performed at the very center of each pixel by a point with zero area, indicated above by a red dot.  The dots are one pixel pitch apart horizontally and vertically.  Mathematically, the zero area dot is a Dirac delta function.  The set of all delta functions at the center of each pixel as arranged on the sensor forms a rectangular grid of ‘red dots’ (a 2D comb, or deltas on a lattice).

The area of the pixel in this example, pixel aperture, is represented by a square of width w.  In this case w is equal to pixel pitch p because we have said that the square pixel is perfect.  If the pixel were not perfect, say as a result of inefficient microlenses, then its linear fill factor would be less than 100% and consequently its aperture area w^2 would be less than p^2.

Including our perfect lens we can therefore express the continuous image sampling process mathematically in the spatial domain as follows:

(2)   \begin{equation*} I_{sampled} = [I_{geom} \ast\ast PSF_{lens} \ast\ast PIX_{ap}] \cdot \delta\delta_{grid} \end{equation*}

The sampled image I_{sampled} is the result of the 2D convolution (**) of the geometrical image on the sensing plane I_{geom} with the lens PSF and pixel aperture, multiplied by the Dirac delta grid – all in two dimensions.  The terms in the square brackets above represent the continuous image predicted by geometrical optics.
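Under the same fine-grid assumption as before, equation (2) can be sketched with FFT-based convolutions in Python; normalizing the pixel aperture to unit sum is just a convenience here, and the read-out step stands in for multiplication by the Dirac delta grid:

    import numpy as np
    from scipy.signal import fftconvolve

    def sample_image(i_geom, psf_lens, pitch=5, width=5):
        """Discrete sketch of equation (2): convolve the geometrical image with
        the lens PSF and the square pixel aperture, then read out one value per
        pixel pitch.  All inputs share the same fine grid, several samples per
        pixel pitch."""
        pix_ap = np.ones((width, width)) / width**2         # square pixel footprint
        continuous = fftconvolve(i_geom, psf_lens, mode='same')
        continuous = fftconvolve(continuous, pix_ap, mode='same')
        return continuous[::pitch, ::pitch]                  # sample at pixel centers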

System Response in the Frequency Domain

Because of the Linear Shift Invariance assumption described later we can Fourier Transform each of the factors individually and combine them per equation (2) to obtain this simplified system’s response in the frequency domain.  Recall that convolutions in the spatial domain become multiplications in the frequency domain and vice versa.   And that the magnitude of the normalized Fourier transform of the system’s impulse response is its MTF, so for this example:

(3)   \begin{equation*} MTF_{Sys2D} = \left|\widehat{PSF_{lens}}\cdot \widehat{PIX_{ap}}\right|_{pu} \ast\ast \widehat{\delta\delta_{pitch}} \end{equation*}

The hat symbols indicate the Fourier Transform of the specified function, and the subscript {pu} indicates normalization to one at the origin.  The three functions in our example are easily modeled in the frequency domain as shown below:

A) The Fourier Transform of the PSF of a perfect lens of circular aperture (the Airy pattern in equation 1) is

(4)   \begin{equation*} MTF_{\text{diff}} = \frac{2}{\pi}\left[\arccos(s)-s\sqrt{1-s^2}\right] \end{equation*}

with s the linear spatial frequency f_r[1] normalized for Diffraction Extinction: s =f_r\cdot\lambda N.  The MTF is taken to be zero beyond extinction, that is when s is greater than 1. f_r is measured radially from the center of the image with f_r=\sqrt{f_x^2+f_y^2}.  It is rotationally symmetric.
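Equation (4) translates directly into a short function; the defaults below simply mirror the 0.53 micron, f/16 running example, with the radial frequency expressed in cycles per linear unit:

    import numpy as np

    def mtf_diffraction(f_r, wavelength=0.53, N=16):
        """Diffraction MTF of equation (4) for a circular aperture.
        f_r is the radial spatial frequency in cycles per linear unit."""
        s = np.clip(f_r * wavelength * N, 0.0, 1.0)     # zero beyond extinction (s > 1)
        return (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s**2))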

B) The pixel aperture area is ideally a square so its Fourier Transform is a separable product of linear sinc functions in the horizontal (f_{x}) and vertical (f_{y}) spatial frequency directions, with w the width of a pixel which, being square, is the same in both directions:

(5)   \begin{equation*} MTF_{PIX} = \left|\frac{\sin(\pi f_{x} w)}{\pi f_{x} w}\right|\left|\frac{\sin(\pi f_{y} w)}{\pi f_{y} w}\right| \end{equation*}
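Equation (5) is just as easy to evaluate numerically, since numpy's sinc is already defined as sin(πx)/(πx); the default width below assumes the 5-unit pixel of the example:

    import numpy as np

    def mtf_pixel(f_x, f_y, w=5):
        """Pixel-aperture MTF of equation (5): product of |sinc| in x and y,
        with w the pixel width in the same linear units as 1/f."""
        return np.abs(np.sinc(f_x * w)) * np.abs(np.sinc(f_y * w))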

C) And the separable Dirac delta function grid in the spatial domain is Fourier transformed into another Dirac delta grid in the frequency domain, with spacing of one cycle per pixel pitch.  Spacing in digital imaging is typically the same in the horizontal and vertical directions, therefore so is pixel pitch (p):

(6)   \begin{equation*} MTF_{\delta\delta} = \mathrm{comb}(p f_x)\,\mathrm{comb}(p f_y) \end{equation*}

All in two dimensions.  So with equations (4), (5) and (6) we can model the performance of the perfect imaging system in the frequency domain.
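As a final sketch, the product of equations (4) and (5) along one frequency axis reproduces the shape of the monotonically decreasing baseband System curve of Figure 1, before the replication by the Dirac delta grid of equation (6) is taken into account.  It reuses the two Python functions above and assumes the 5 micron pitch, 100% fill factor pixel of the example:

    import numpy as np
    import matplotlib.pyplot as plt

    p = 5                                      # pixel pitch in microns (w = p here)
    f_cpp = np.linspace(0, 1, 201)             # frequencies in cycles per pixel pitch
    f_lin = f_cpp / p                          # the same frequencies in cycles/micron

    lens = mtf_diffraction(f_lin)              # equation (4)
    pixel = mtf_pixel(f_lin, 0.0)              # equation (5), slice along f_y = 0
    system = lens * pixel                      # baseband term of equation (3)

    plt.plot(f_cpp, system, label='System (lens x pixel)')
    plt.plot(f_cpp, lens, '--', label='Lens (diffraction)')
    plt.plot(f_cpp, pixel, '--', label='Pixel aperture')
    plt.xlabel('cycles per pixel pitch'); plt.ylabel('MTF'); plt.legend(); plt.show()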

In the next post we will take a closer look at these functions, then simplify things somewhat by extracting and evaluating directional one-dimensional slices of the two dimensional results.  But first let’s get a couple of key underlying assumptions out of the way.

MTF and Linear Shift Invariance

In this article and the ones that follow ‘sharpness’ will refer to the objective quantitative spatial resolution information stored in the raw file of a digital camera, not the subjective qualitative version after demosaicing and rendering (post) processing.  It will be evaluated through the familiar Modulation Transfer Function (MTF) of the imaging system expressed in physical units of line pairs or cycles per meaningful parameter (e.g. lp/mm, lp/ph, cycles per pixel pitch, etc.).  See here for a description of the various units and their typical uses.

The first assumption is that the imaging system is Linear and Shift Invariant (LSI).  This assumption is needed in order to use the linearity and superposition properties of transfer functions, which allow us to break a complex system down into smaller pieces that are easier to deal with individually.

Linear means twice the intensity, twice the response: images captured in the raw data of commercially available digital cameras can be considered to fall into this category.  Shift invariant means that the response of the system to a given stimulus should be exactly the same no matter where we look on the sensor.  This second part is definitely not true in typical digital cameras because of their edgy square pixels and rectangular sampling grid.  However, we can typically set up the system to appear to be shift invariant locally within a certain range of operation – allowing the measurement of its MTF at various key spots around the field of view.

Notes and References

1. For a description of the units of linear spatial resolution see this article.
2. Thanks go to Alan Robinson for taking the time to explain this subject to me a couple of years ago.
3. Thanks also go to Frans van den Bergh for MTF Mapper and his excellent blog.
