Now we know from the introductory article that the spatial frequency response of a typical perfect digital camera and lens can be modeled simply as the product of the Modulation Transfer Function of the lens and of the pixel area, convolved with a Dirac delta grid at cycles-per-pixel spacing:

$$\widehat{MTF}_{sys}(f) = \left[\widehat{MTF}_{lens}(f)\cdot\widehat{MTF}_{pixel}(f)\right]*\mathrm{III}(f)\tag{1}$$

We can take a closer look at each of those components (the hat here indicating normalization). I used Matlab to generate the examples below, but you can easily do the same in a spreadsheet. Here is the code if you wish to follow along.
The lens in our simple example is assumed to be diffraction limited with no aberrations, therefore producing just an Airy pattern on the sensing plane. At f/16 the pattern would look as shown in Figure 1 of the last post, a tiny white dot at the center of a 1024×1024 linear unit black image. It can be modeled according to the following continuous equation in the frequency domain:

$$MTF_{diff}(s) = \frac{2}{\pi}\left(\arccos(s) - s\sqrt{1-s^2}\right)\tag{2}$$

with $s = f/f_c$ the linear spatial frequency normalized for extinction, $f_c = 1/(\lambda N)$. The MTF is zero when $s$ is greater than 1. This is what it looks like as a solid and in profile.
The graphs show MTF on the Z axis and spatial resolution in cycles per pixel (c/p) on the X and Y axes. As mentioned earlier, a pixel here is considered to be a square 5 linear units on a side (more on this later).
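If you would rather follow along outside Matlab, the diffraction MTF of equation (2) can be sketched in a few lines of Python/NumPy (the function name and the sample frequencies below are mine, not from the article's script):

```python
import numpy as np

def mtf_diffraction(s):
    """Diffraction-limited MTF of a circular aperture, equation (2).

    s: spatial frequency normalized to the extinction frequency,
       so the response falls to zero at s = 1.
    """
    s = np.clip(np.asarray(s, dtype=float), 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s ** 2))

# Unity at DC, zero at extinction, ~0.39 halfway in between
print(mtf_diffraction([0.0, 0.5, 1.0]))
```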
If we Fourier Transform the data of the f/16 Airy disc image we get exactly the same result, indicating that the formula is correct and that Matlab’s FFT routine works accurately. Below is the horizontal central radial slice, parallel to the X-axis of the two dimensional diffraction MTF above, shown as the ‘Modeled’ solid curve; and the same slice off the solid obtained after the Airy image is Discrete Fourier Transformed, shown as ‘Measured’ dots. It is similar to the graph in Figure 1b but showing positive frequencies only, zoomed into the area of detail that we are normally used to seeing with MTF curves, 0-1 c/p. It represents the one dimensional MTF in that one direction only:
They are identical. Because the Airy disc is rotationally symmetric, the one dimensional MTF profile will be the same no matter what radial slice out of the 2D MTF we look at. This graph shows numerically that the theoretical lens MTF matches that obtained by taking the Discrete Fourier Transform of the image of its Point Spread Function on the sensing plane.
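The same check can be reproduced outside Matlab too. The sketch below, assuming SciPy is available for the Bessel function $J_1$ and using the 0.11 cycles/sample extinction frequency quoted later in the post, builds the Airy intensity pattern, takes its 2D DFT, and compares the central horizontal slice to the closed-form diffraction MTF (all variable names are mine):

```python
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

N = 1024        # image side in linear units (samples)
fc = 0.11       # extinction frequency in cycles/sample (f/16 example)

# Airy intensity PSF centered in the image; first zero at 1.22/fc units
y, x = np.mgrid[0:N, 0:N] - N // 2
r = np.hypot(x, y)
v = np.pi * fc * np.maximum(r, 1e-12)        # avoid 0/0 at the center
psf = (2.0 * j1(v) / v) ** 2
psf[N // 2, N // 2] = 1.0                    # limit of [2*J1(v)/v]^2 at v = 0

# 'Measured' MTF: magnitude of the 2D DFT, normalized to 1 at DC
mtf2d = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf))))
mtf2d /= mtf2d[N // 2, N // 2]

# Central horizontal slice vs the closed-form diffraction MTF
f = np.arange(N // 2) / N                    # cycles/sample
s = np.minimum(f / fc, 1.0)
model = (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s ** 2))
measured = mtf2d[N // 2, N // 2:]
print(np.max(np.abs(measured - model)))      # small residual
```

The residual is dominated by truncating the Airy tails at the image border and should stay well below a percent.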
Pixel Aperture MTF
We can perform the same exercise with the Pixel Aperture (aka footprint or area) component. As discussed earlier a square can be modeled in the frequency domain as

$$MTF_{pixel}(f_x, f_y) = \left|\mathrm{sinc}(f_x w)\right|\cdot\left|\mathrm{sinc}(f_y w)\right|\tag{3}$$

with $f_x$ and $f_y$ the horizontal and vertical spatial frequencies, better specified later, and $w$ the relative width of a pixel. Equation (3) is shown below as a solid on the left; and on the right its X-Z axis profile, as before.
A horizontal radial slice from the origin, with no vertical frequency component, can be modeled in one dimension as follows:

$$MTF(f) = \left|\mathrm{sinc}(f w)\right| = \left|\frac{\sin(\pi f w)}{\pi f w}\right|\tag{4}$$

with $f$ the linear spatial frequency in the horizontal direction. It would of course look the same in the vertical direction, with no horizontal component.
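As a quick Python/NumPy sketch of equations (3) and (4) — note that np.sinc is the normalized sinc, sin(πx)/(πx), and the function name below is mine:

```python
import numpy as np

def mtf_pixel(fx, fy=0.0, width=5.0):
    """Separable square pixel-aperture MTF: |sinc(fx*w)| * |sinc(fy*w)|.

    Frequencies in cycles per linear unit; width in linear units.
    """
    return np.abs(np.sinc(fx * width)) * np.abs(np.sinc(fy * width))

# Unity at DC; first null along an axis at f = 1/width cycles per unit
print(mtf_pixel(0.0), mtf_pixel(1.0 / 5.0))
```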
We want to compare this theoretical model to the actual Discrete Fourier Transform of the 5×5 linear unit square footprint of the pixel. To do so we generate an image of the same size as the Airy image (1024×1024 linear units), fill it with zeros everywhere except at the center, where a 5×5 square area of ones is inserted, and take the two dimensional DFT of it. The plot below shows a horizontal radial slice from the result (‘Measured’) and from the modeled solid (‘Modeled’):
Hmm, not perfect. It looks like the fit is pretty good up to about 1 c/p but after that the model and measured curves part ways. That’s because, contrary to the Fourier Transform shown as the ‘Modeled’ blue line above, which works on unbounded continuous curves resulting in only a baseband signal, the DFT works on a finite set of discrete input and output values, which produce an infinite set of replicas of the baseband signal in the frequency domain. If the signal is not band limited to start with, energy from these replicas seeps back into baseband in the form of aliasing.
A rect function is definitely not band limited, so the aliased energy from the replicas adds up quickly. Its DFT is therefore not the sinc function shown in equation (4) of the continuous model, but an aliased sinc (asinc), periodic sinc, Dirichlet or diric function, which is separable and – when the rect is symmetric – can be written as follows for each axis:

$$\mathrm{asinc}(f) = \left|\frac{\sin(\pi f w)}{w\,\sin(\pi f)}\right|\tag{5}$$

with $f$ in cycles per sample and $w$ the width of the square aperture as before. This can be thought of as the ratio of two sincs, equation (4) divided by $\mathrm{sinc}(f)$:

$$\mathrm{asinc}(f) = \left|\frac{\mathrm{sinc}(f w)}{\mathrm{sinc}(f)}\right|$$
So in this case the output of the DFT differs from the continuous transform by the additional $\mathrm{sinc}(f)$ factor in the denominator (its effect shown as the yellow dashed curve above). To make the output of the DFT match the continuous transform model we therefore need to multiply it by $\mathrm{sinc}(f)$, which brings us back to having ‘measured’ results accurately match theory for a square aperture, as shown below just in the horizontal direction:
The reason this behavior of the DFT was not readily apparent in the earlier Airy plots is that diffraction extinction happens around 0.11 cycles/sample (0.55 c/p), well below Nyquist at 0.5 cycles/sample, so that signal is band limited. That is typically not the case in photographs captured with good technique, where there is effectively one sample per pixel.
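The rect-versus-asinc behavior is easy to reproduce numerically. Here is a NumPy sketch (variable names mine) that takes the DFT of a centered 5-wide rect and compares it to both the continuous sinc model and the aliased sinc:

```python
import numpy as np

N, w = 1024, 5                     # image samples and rect width

# 'Measured': DFT of a centered w-wide rect, normalized to 1 at DC
rect = np.zeros(N)
rect[N // 2 - w // 2 : N // 2 + w // 2 + 1] = 1.0
measured = np.abs(np.fft.fft(rect)) / w

f = np.arange(N) / N               # spatial frequency in cycles per sample
sinc_model = np.abs(np.sinc(w * f))          # continuous-transform model
asinc = np.abs(np.sinc(w * f) / np.sinc(f))  # aliased sinc (diric)

print(np.max(np.abs(measured - asinc)))      # tiny: the DFT follows the asinc
print(np.max(np.abs(measured - sinc_model))) # large near and beyond Nyquist
```

The first residual sits at machine-precision level, while the second grows large as Nyquist is approached and passed, which is the divergence between the curves discussed above.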
Contrary to the case with diffraction, the response of a square is not rotationally symmetric, as is obvious from Figure 3a above. The square pixel's edges affect image detail differently depending on how the detail is laid out on the sensor. For instance, if we had taken a 45 degree radial slice this would have been the result:
The blue line of the model is once again invisible under the measured dots. This 45 degree diagonal radial slice from the origin is affected equally by both horizontal and vertical spatial frequencies. It can therefore be modeled linearly in one dimension as follows:

$$MTF_{45°}(f) = \mathrm{sinc}^2(f w)\tag{6}$$

This is the MTF of a triangle, because that is the intensity profile that arises when a square at 45 degrees to an axis is projected onto it. The linear spatial frequency $f$ here is in the same units as earlier, but care must be taken when plotting these values in c/p that pixels are $\sqrt{2}$ pitches long in this direction.
Intermediate slices will give linear curves somewhere in between these two extremes per equation (3).
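That the diagonal profile of a square behaves as a triangle with a sinc² spectrum can be verified with a quick numerical Fourier transform; the sketch below (grid and test frequencies chosen arbitrarily by me) compares the two:

```python
import numpy as np

w = 5.0                                  # pixel pitch in linear units

# Unit-peak triangle of half-base w: the profile of a square pixel
# projected along its 45 degree diagonal (up to scale)
x = np.linspace(-w, w, 200001)
tri = np.maximum(1.0 - np.abs(x) / w, 0.0)

results = []
for fi in (0.05, 0.1, 0.15):             # cycles per linear unit
    # Numerical Fourier transform of the triangle, normalized to 1 at DC
    num = np.sum(tri * np.cos(2 * np.pi * fi * x)) / np.sum(tri)
    model = float(np.sinc(w * fi)) ** 2  # the sinc^2 triangle MTF
    results.append((num, model))
    print(fi, num, model)
```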
Modeling Lens and Pixel Aperture Together
The model predicts the ‘measurements’ obtained from each component in isolation. We can check how well it works on the components together by applying equation (1). We can generate a modeled 2D MTF by multiplying together equations (2) and (3) and compare the result to the 2D Discrete Fourier Transform of the image of the Airy disc multiplied point by point by the 2D DFT of the image of the 5×5 linear unit pixel aperture square.
Because diffraction at f/16 is dominant in this example the result appears to be rotationally symmetric, but it is not exactly so as can be gleaned by looking at the minimal differences below extinction (about 0.55 c/p) in Figures 5 and 6. At larger relative apertures the differences would be more marked.
We can check theory against practice as before by plotting a radial slice of the Lens × Pixel Aperture solid and its X-Z profile, shown in Figure 7. Since we now know that the model works well, we can also work in one dimension by multiplying diffraction from equation (2) by a single sinc(f*pitch) function for the horizontal slice ‘Model’; and, since it is a square aperture, by sinc(f*pitch)^2 for the diagonal slice, as shown below. The curves are all on top of each other.
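In Python/NumPy the one dimensional shortcut looks like this (a sketch with my own names, using the f/16 extinction frequency of 0.11 cycles per linear unit mentioned earlier):

```python
import numpy as np

pitch = 5.0     # pixel pitch in linear units
fc = 0.11       # diffraction extinction in cycles per linear unit (f/16)

def mtf_system(f, diagonal=False):
    """Product model: diffraction, equation (2), times a pixel-aperture slice."""
    s = np.clip(np.abs(f) / fc, 0.0, 1.0)
    diff = (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s ** 2))
    # |sinc| along an axis, sinc^2 along the 45 degree diagonal
    pix = np.sinc(f * pitch) ** 2 if diagonal else np.abs(np.sinc(f * pitch))
    return diff * pix

f = np.linspace(0.0, 0.5 / pitch, 6)     # 0 to 0.5 c/p
print(mtf_system(f))                     # horizontal slice
print(mtf_system(f, diagonal=True))      # 45 degree slice
```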
Sampling and System MTF
All right, so now that we have a good model for the MTF of individual components and their interaction all that is left to try is to put it all together and actually ‘sample’ the original Airy image. The Airy pattern is at the center of a 1024×1024 linear unit sized image – if it helps pretend that a unit is something like a micrometer but ignore absolute dimensions for now.
Let’s assume for the sake of the exercise that the image of the Airy pattern is continuous, and that the pixelation seen in Figure 9a below to the left, zoomed to 3200%, is simply a conveniently dense set of samples to feed to the DFT algorithm. The brightness of the image represents the relative number of photons incident on the sensing plane in that small area. The sensor sits below it and has perfect square pixels with no filters and 100% fill factor. Each pixel’s area corresponds to 5×5 linear units. Therefore the resulting ‘sampled’ image is 204×204 pixels.
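The ‘sampling’ step itself is just an average of the dense image over each 5×5 pixel footprint, keeping one value per pixel. A NumPy sketch, with a random image standing in for the Airy rendering (names mine):

```python
import numpy as np

p = 5                                   # pixel pitch in linear units
rng = np.random.default_rng(0)
img = rng.random((1024, 1024))          # stand-in for the dense Airy image

# Average the incident intensity over each 5x5 pixel footprint: this
# applies the pixel aperture and samples one value per pixel in one step
n = (img.shape[0] // p) * p             # trim 1024 down to 1020
sampled = img[:n, :n].reshape(n // p, p, n // p, p).mean(axis=(1, 3))
print(sampled.shape)                    # (204, 204)
```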
The extent of the 2D MTF of the sampled image will also be one fifth the size, so it shrinks from 2.5 to 0.5 c/p. Below is the sampled system 2D MTF rendered as a solid to the left; and to the right the customary horizontal slice. It looks like the model worked perfectly once again, accurately predicting the response of the image as sampled by the square-pixeled sensor. But wait, what is that little lip at the bottom near 0.5 c/p?
Oh, right. Aliasing. Next.
*Spatial frequency is explained in this article.