Introduction to Texture MTF

Texture MTF is a method to measure the sharpness of a digital camera and lens by capturing the image of a target of known characteristics.  It purports to better evaluate the perception of fine details in low contrast areas of the image – what is referred to as ‘texture’ – in the presence of noise reduction, sharpening or other non-linear processing performed by the camera before writing data to file.

Figure 1. Image of Dead Leaves low contrast target. Such targets are designed to have controlled scale and direction invariant features with a power law Power Spectrum.

The Modulation Transfer Function (MTF) of an imaging system represents its spatial frequency response, from which many metrics related to perceived sharpness are derived: MTF50, SQF, SQRI, CMT Acutance etc.  In these pages we have in the past used the slanted edge method to good effect to obtain accurate estimates of a system’s MTF curves.[1]

In this article we will explore proposed methods to determine Texture MTF and/or estimate the Optical Transfer Function of the imaging system under test from a reference power-law Power Spectrum target.  All three rely on variations of the ratio of captured to reference image in the frequency domain: straight Fourier Transforms; Power Spectral Density; and Cross Power Density.  In so doing we will develop some intuitions about their strengths and weaknesses.

In summary, my initial feeling is that the Texture MTF approach appears more appropriate for non-interchangeable-lens cameras like those in smartphones, which produce highly processed images nonlinear with incoming light intensity, where fooling computational photography algorithms into not working may be a priority.  However, the prescribed approaches blunt the sensitivity and accuracy of the estimated MTF curves.

Since technical setup challenges are effectively similar to those  in the more sensitive and accurate slanted edge method, I believe that the latter remains the preferred tool for photographers interested in the spatial frequency performance of enthusiast Interchangeable Lens Camera hardware.

Dead Leaves Power Spectrum

So-called Dead Leaves, or Spilled Coins, images such as those shown in Figures 1 and 2 are a target of choice for Texture MTF measurements because their content can behave like pink noise with a power-law Power Spectrum.  They are also space, scale and rotation invariant to a good degree, a key to easily estimating the unknown Power Spectrum of the target in practice, as we will discover.

The Power Spectrum (PS) of an image I – also referred to as Spectral Power Distribution or Density (SPD) when plotted as a function of spatial frequency f – is the magnitude of its Fourier Transform (\mathcal{F}) squared.[2]  It can also be computed by multiplying the FT by its complex conjugate, indicated below by an asterisk:

(1)   \begin{equation*} PS(f) = SPD(f) = |\mathcal{F}(I)|^2 = \mathcal{F}(I)\mathcal{F}(I)^* \end{equation*}
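As a quick numerical check of Equation (1), here is a minimal Python/NumPy sketch (the article's own code is in Matlab; the variable names are mine) confirming that the two formulations of the Power Spectrum agree on a random stand-in image:

```python
import numpy as np

# Check that |F(I)|^2 equals F(I) * F(I)^* on a random stand-in "image"
rng = np.random.default_rng(0)
I = rng.random((64, 64))

F = np.fft.fft2(I)
ps_modulus = np.abs(F) ** 2           # |F(I)|^2
ps_conjugate = (F * np.conj(F)).real  # F(I) F(I)^*; imaginary part is ~0

assert np.allclose(ps_modulus, ps_conjugate)
```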

Power law just means that the relative Spectral Power Density is inversely proportional to spatial frequency f raised to a certain power k.  In this context

(2)   \begin{equation*} PS (f) = SPD(f) = \frac{c}{f^{k}} \end{equation*}

The Power Law characteristic is relevant because many natural images exhibit it with k a low number, so measurements off the target can be representative of the response expected with typical scenes.  Here is an ideal Dead Leaves image generated by a Matlab function[3] with c = 0.323 and k=3

Figure 2. Full contrast Dead Leaves image generated by the compute_dead_leaves_image.m  Matlab function by Gabriel Peyré linked in the notes.

Such targets can be produced with various contrast ratios, the higher the contrast the lower the noise in the frequency domain.  The one above is full contrast.  The reason for producing reduced contrast ratio targets is to make it easier for users not to clip or block tones during capturing and rendering.

Incidentally, power-law occurrence in nature was discovered a century ago and has been studied in the intervening years under the guise of 1/f ‘noise’, aka the Spectral Power Density of observed fluctuations.  Confusingly it is sometimes referred to as Noise Power Density, including in the fluctuations what is normally referred to as both signal and noise in photographic science.[4]

Spectrum Radial Mean

The Discrete Fourier Transform (DFT) of a two-dimensional image is two-dimensional and of the same size, therefore so is its Spectrum.  However, taking advantage of the Dead Leaves target’s approximate rotational invariance, when used for texture MTF computations it is customary to compute a radial mean of the Spectrum: the magnitude of the FT of such an image is supposed to be invariant to rotation, and averaging reduces noise in the result.

Energy at each given radial spatial frequency f equidistant from the origin is averaged over the relative concentric ring in the 2D DFT.  In fact we could average just half of each ring since with a real input the DFT should exhibit Hermitian symmetry.  The result is a one dimensional Spectrum averaged over all directions as a function of f in units of cycles per pixel.  For the target image in Figure 2 a Power Spectrum so computed produces the following blue curve on log-log axes:

Figure 3. Blue curve: radial mean of Power Spectral Density of Dead Leaves image in Figure 2. Orange curve: power law function with c = 0.323 and k=3. Horizontal axis in cycles per pixel.

It can indeed be loosely approximated by a straight line in the middle of the frequency range, though it tends to deviate from the ideal at either end.  As we will see below things can be improved somewhat by undersampling the target image at the time of capture.  A straight line in log-log format indicates a power law, the orange curve above representing \frac{0.323}{f^3}.
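For reference, the radial mean used to produce such curves can be sketched in Python/NumPy (the article's own code is in Matlab; function and variable names here are mine, assuming a square image and uniform ring bins up to the Nyquist frequency):

```python
import numpy as np

def radial_mean_spectrum(img, nbins=None):
    """Radial mean of the 2D power spectrum; frequencies in cycles/pixel."""
    n = img.shape[0]                                  # assumes a square image
    ps = np.abs(np.fft.fft2(img)) ** 2 / img.size     # 2D power spectrum
    fx = np.fft.fftfreq(n)                            # DFT frequencies, c/p
    f2d = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)
    nbins = nbins or n // 2
    edges = np.linspace(0.0, 0.5, nbins + 1)          # rings up to Nyquist
    idx = np.digitize(f2d.ravel(), edges)
    radial = np.array([ps.ravel()[idx == i].mean() for i in range(1, nbins + 1)])
    f = 0.5 * (edges[:-1] + edges[1:])                # ring-center frequencies
    return f, radial
```

Applied to white noise the curve comes out roughly flat away from the origin, as expected for a flat spectrum; plotting `radial` against `f` on log-log axes for a Dead Leaves image would produce a curve like the blue one in Figure 3.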

However, while the target may be approximately rotationally invariant, the response of the imaging system typically is not, therefore taking a radial mean over all directions necessarily reduces the sensitivity of the result.  Gone are meridional and sagittal nuances of a, say, decentered lens – as are directional dependencies of a squarish pixel aperture for instance.  The radial mean of the spectrum will average out these subtleties and present them as one.  On the other hand the resultant curve will be less noisy as a byproduct of averaging many equi-frequency energy readings together.

Alternatively, averaging can be limited to contiguous angular slices of the rings to safeguard directional information, though clearly the finer the preserved directional detail, the noisier the result.

The plot above belongs to the family of spectrograms or periodograms.  Only frequencies up to the monochrome Nyquist frequency (0.5 cycles/pixel pitch) are shown in such plots because those beyond it are their mirror image, a consequence of the fact that the captured image is real so its Fourier Transform is subject to Hermitian symmetry.  Therefore the power law function can only be applied up to 0.5 c/p, a limitation of the texture MTF approach.

1) Texture OTF from Dead Leaves

The idea is to take a capture of the Dead Leaves target with the camera and lens under consideration and compare it to the power law reference.  The target is usually set up to be relatively small in the captured field of view, say 500×500 pixels on the sensor, in order to oversample it and to obtain the radially averaged performance of the system in that location only.  Then move the target and repeat the procedure to map the field of view as desired.

The captured image I(x,y) is a consequence of known ideal target image O(x,y), which is corrupted by blur PSF(x,y) and noise \mathcal{N}(x,y) introduced during the imaging process.  Therefore any measured deviations from the reference power-law Power Spectrum profile must be due to blurring and noise introduced by the capturing process.  We can model it with the following two-dimensional variables:

(3)   \begin{equation*} I(x,y) = O (x,y)** PSF(x,y) + \mathcal{N}(x,y) \end{equation*}

with ** indicating two-dimensional convolution.  In the frequency domain convolutions become element-by-element multiplications so taking the Fourier Transform of both sides of the equation and denoting with a hat symbol the Discrete Fourier Transform,

(4)   \begin{equation*} \hat{I}(u,v) = \hat{O} (u,v) \hat{PSF}(u,v) + \hat{\mathcal{N}}(u,v) \end{equation*}
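Equation (4) rests on the convolution theorem.  As a sanity check, a brute-force circular 2D convolution can be compared against the element-wise product of the DFTs (Python/NumPy sketch; names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
O = rng.random((n, n))
psf = rng.random((n, n))
psf /= psf.sum()                     # normalize the PSF to unit volume

# Direct circular 2D convolution, O ** PSF
direct = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        for i in range(n):
            for j in range(n):
                direct[x, y] += O[i, j] * psf[(x - i) % n, (y - j) % n]

# Same result via element-wise multiplication of the DFTs
via_dft = np.fft.ifft2(np.fft.fft2(O) * np.fft.fft2(psf)).real

assert np.allclose(direct, via_dft)
```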

with (u,v) representing horizontal and vertical spatial frequencies respectively, which I will drop for readability henceforth together with image indexes (x,y).  Keep however in mind that every term in these equations is a two-dimensional array the same size as captured image I.  The Fourier Transform of the Point Spread Function (PSF) is known as the Optical Transfer Function (OTF).  Dividing both sides by \hat{O}:

(5)   \begin{equation*} \frac{\hat{I}}{\hat{O} } = OTF + \frac{\hat{\mathcal{N}}}{\hat{O} } \end{equation*}

where the 2D divisions are element-wise.  In other words, with your photographic equipment capture the Dead Leaves target in Figure 2 (O) in the raw data (I), then divide the relative 2D Fourier Transforms frequency-by-frequency: the result is a noisy version of the OTF.  MTF is the absolute value of the OTF[5] so

(6)   \begin{equation*} MTF_{texture1} = |\frac{\hat{I}}{\hat{O}}| = |OTF + \frac{\hat{\mathcal{N}}}{\hat{O} }| \end{equation*}
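A minimal sketch of Method 1 per Equation (6), with a noise-free Gaussian-blur sanity check (Python/NumPy; `texture_mtf_direct` is a hypothetical name, not from the article's code):

```python
import numpy as np

def texture_mtf_direct(captured, reference):
    """Method 1 (Eq. 6): modulus of the element-wise ratio of the 2D DFTs."""
    return np.abs(np.fft.fft2(captured) / np.fft.fft2(reference))

# Noise-free sanity check: blur a random "target" with a known Gaussian OTF
rng = np.random.default_rng(0)
n, sigma = 64, 1.0                              # PSF standard deviation in pixels
O = rng.random((n, n))

fx = np.fft.fftfreq(n)                          # frequencies in cycles/pixel
f2 = fx[None, :] ** 2 + fx[:, None] ** 2
H = np.exp(-2 * np.pi ** 2 * sigma ** 2 * f2)   # Gaussian OTF (real, positive)
I = np.fft.ifft2(np.fft.fft2(O) * H).real       # "captured" image, no noise

mtf = texture_mtf_direct(I, O)
assert np.allclose(mtf, H, atol=1e-8)           # recovered MTF matches the OTF
```

With noise added to I the recovered curve fluctuates around H, increasingly so at high frequencies, as discussed next.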

The \frac{\hat{\mathcal{N}}}{\hat{O}} noise term in Equation (5), a sort of noise-to-signal ratio in the frequency domain, should be relatively small with typical target capturing setups because the Signal-to-Noise Ratio is specified to be relatively large in such cases.  On the other hand \hat{O}‘s power law SPD in the denominator, with its increasingly lower energy at increasing spatial frequencies, will ensure that the estimated OTF becomes increasingly noisy at higher frequencies.  Incidentally, with typical testing setups \mathcal{N} can be considered to be mostly a mixture of random noises with different standard deviations: shot and PRNU noise, which depend on the intensity of the image, and read noise, which depends on the sensor and electronics.

Other than the fact that this method is insensitive to directional changes because of its radial averaging, its main practical limitation is the requirement for perfect alignment of the ideal target image O and captured image I, something not easy to do, especially in light of less than perfect optics or setup.[9]   Also, the intensity profile of the physical target needs to match that of the ideal reference file.  Anyone who has tried to match printed output to a linear reference recognizes the work that goes into achieving such a feat.

However with a perfect setup it can work quite well, as you can see at the bottom of the article.

2) Texture MTF from Dead Leaves, Cao et al.

Folks don’t always have on hand a perfect digital representation of their physical power law target O.   Cao et al. in their 2010 paper “Dead leaves model for measuring texture quality on a digital camera”[6] suggest that it is not needed in order to obtain Texture MTF – as long as the power law Power Spectrum of the relative physical target |\hat{O}|^2 can be estimated.

For instance it could be replaced by the power law function itself, populating a two-dimensional array the same size as \hat{O}, with the power at each position calculated based on the corresponding radial spatial frequency f = \sqrt{u^2+v^2}:

(7)   \begin{equation*} \hat{T}(u,v) = \frac{1}{f}, \quad f = \sqrt{u^2+v^2} \end{equation*}

Two-dimensional \hat{T}^k can also be reduced to a radial mean.  For example the resulting radial power-law Power Spectrum for 0.323 \hat{T}^3 is the orange line in Figure 3.
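A sketch of how the surrogate array of Equation (7), raised to the power k and scaled by c, might be populated (Python/NumPy; hypothetical names; the DC term is left at zero since the power law is undefined at f = 0):

```python
import numpy as np

def power_law_surrogate(n, c=0.323, k=3.0):
    """2D surrogate power spectrum c / f^k on the n x n DFT frequency grid."""
    fx = np.fft.fftfreq(n)                          # frequencies in cycles/pixel
    f = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)
    ps = np.zeros((n, n))
    nz = f > 0
    ps[nz] = c / f[nz] ** k     # DC (f = 0) left at zero: undefined for a power law
    return ps
```

For example, at f = 0.25 c/p the surrogate evaluates to 0.323 / 0.25^3.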

Alternatively, if we did have the specifications of the reference  target or a pristine digital image of it,  we could obtain \hat{T}^k by calculating the relative Spectral Power Density from those directly – or from a meaningful crop of the uncorrupted image assuming it is truly space, scale and rotation invariant.  No need for alignment, just get a meaningful sample of the relative Power Spectrum and scale it accordingly in the frequency domain.

However \hat{T}^k is obtained, we can apply it by taking the absolute value and squaring both sides of Equation (5), converting it to Power Spectra

(8)   \begin{equation*} \frac{|\hat{I}|^2}{|\hat{O}|^2} = |OTF + \frac{\hat{\mathcal{N}}}{\hat{O}}|^2 \end{equation*}

then replace the Power Spectrum of the reference image |\hat{O}|^2 with its surrogate \hat{T}^k to obtain a noisier approximation to |OTF|^2.   Noisier because |\hat{O}|^2 fluctuates around \hat{T}^k, as shown in Figure 3 for example.  Since the modulus of OTF is MTF, taking the square root of both sides we get

(9)   \begin{equation*} MTF_{texture2} \approx \sqrt{\frac{|\hat{I}|^2}{\hat{T}^k}} \approx |OTF + \frac{\hat{\mathcal{N}}}{\hat{ot^k_1}}| \end{equation*}

The advantage is that there is no need for the original data or for alignment of the uncorrupted target image O, just calculate the PS of captured image I and divide it by known array \hat{T}^k.
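Method 2 per Equation (9) can then be sketched as below (Python/NumPy; hypothetical names; in practice the constant c must be fitted to the capture, as discussed in the example section further down):

```python
import numpy as np

def texture_mtf_cao(captured, c=0.323, k=3.0):
    """Method 2 (Eq. 9): sqrt of the captured Power Spectrum over c/f^k."""
    n = captured.shape[0]                          # assumes a square image
    fx = np.fft.fftfreq(n)
    f = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)
    ps_captured = np.abs(np.fft.fft2(captured)) ** 2
    mtf = np.zeros((n, n))
    nz = f > 0                                     # skip the undefined DC term
    mtf[nz] = np.sqrt(ps_captured[nz] / (c / f[nz] ** k))
    return mtf                                     # 2D; radially average for a curve
```

If the captured image's Power Spectrum happened to be exactly c/f^k, the result would be 1 at every frequency except DC; any blur or noise shows up as deviations from that.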

The \hat{ot^k_1} symbol in the denominator of capture noise \hat{\mathcal{N}} stands to indicate that it is modified by both \hat{O} and \hat{T}^k.  The better the surrogate matches the reference image Power Spectrum, the less the additional noise introduced by \hat{T}^k.

The other relevant difference with the noise term of the previous method is that this one is the result of a ratio of absolute values only (\hat{T}^k is purely real by definition): these cannot dip below zero, creating a positive bias in the mean signal as it approaches zero energy.   This bias will become especially noticeable in the higher frequencies as a result of the previously discussed frequency dependence of the target.

There were some interim papers that proposed subtracting from such Texture MTF curves the Spectral Power Density of noise measured from a uniform patch of mean intensity in the captured target image.   Though this subtraction may help somewhat, the discussion above suggests that the correction would not be accurate because, ignoring read noise, the Power Spectrum of noise here clearly varies with O and |\hat{O}|^2-\hat{T}^k (as well as with  I).

Other than that, if the target followed the specified power law closely we could expect decent Texture MTF measurements.  However we could tell that that was not the case with the target in Figure 2 because the blue power spectrum curve in Figure 3 did not overlay the orange power law line perfectly.  In fact Cao et al. suggest that their approach works best in the center of the shown spatial frequency range only, even before noise from the imaging system is introduced.

The issue can be mitigated by ensuring that the captured resolution is substantially lower than the target’s.  When a target image pixel is made small with respect to the camera’s pixel the acceptable frequency range improves but the inherent standard deviation tends to get worse.  Some periodicities can be reduced with careful processing but there are diminishing returns, see the image at bottom.  So while with this method there will be no need to perfectly overlay capture and reference images, the best we can hope for with Texture MTF measured off of a Dead Leaves target similar to that in Figure 2 is incorrect readings near the origin and an inherently noisy MTF.

3) Texture OTF from Dead Leaves, Kirk et al.

In their 2014 paper “Description of texture loss using the dead leaves target: Current issues and a new intrinsic approach“,[7] Kirk et al. suggest using the cross correlation between ideal power law target O and captured image I, what is referred to in the frequency domain as a Cross Power Spectral Density (CPSD) or Cross Spectral Density (CSD).  This quantity is normally computed in two dimensions as the Fourier Transform of the latter (\hat{I}) times the complex conjugate of the Fourier Transform of the former (\hat{O}^*) or vice versa

(10)   \begin{equation*} CSD = \hat{I} \hat{O}^* \end{equation*}

Recalling that the Power Spectrum of an image can be computed by multiplying its Fourier Transform with its complex conjugate, it is obvious that if we divide the CSD so derived by the Power Spectrum of the original target image, the \hat{O}^* in the numerator and denominator will cancel out, producing by different means the noisy OTF described in the first method above:

(11)   \begin{equation*} \frac{\hat{I} \hat{O}^*}{\hat{O} \hat{O}^*} = \frac{\hat{I}}{\hat{O}} = OTF + \frac{\hat{\mathcal{N}}}{\hat{O}} \end{equation*}

However this formulation allows us to use the guesstimated power law Power Spectrum \hat{T}^k in the denominator of the first fraction above per Cao et al., obtaining

(12)   \begin{equation*} \frac{\hat{I} \hat{O}^*}{\hat{T}^k} \approx OTF + \frac{\hat{\mathcal{N}}}{\hat{ot_2^k}} \end{equation*}

Kirk et al. suggest showing Texture MTF as the real part of this result, which with some additional math and ignoring noise may be plausible because with real positive inputs the full power contribution to the CPSD is twice the real part of either of the cross correlations in the frequency domain, \hat{I}\hat{O}^* or \hat{I}^*\hat{O}.[8]  Others have recommended taking the absolute value of the noisy OTF instead, per the definition of MTF, which I also prefer for consistency:

(13)   \begin{equation*} MTF_{texture3} \approx |\frac{\hat{I} \hat{O}^*}{\hat{T}^k} | \approx |OTF + \frac{\hat{\mathcal{N}}}{\hat{ot_2^k}}| \end{equation*}
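And a sketch of Method 3 per Equation (13) (Python/NumPy; hypothetical names), dividing the Cross Spectral Density by the surrogate power-law Power Spectrum:

```python
import numpy as np

def texture_mtf_kirk(captured, reference, c=0.323, k=3.0):
    """Method 3 (Eq. 13): |Cross Spectral Density / surrogate c/f^k|."""
    n = captured.shape[0]                          # assumes a square image
    fx = np.fft.fftfreq(n)
    f = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)
    csd = np.fft.fft2(captured) * np.conj(np.fft.fft2(reference))
    mtf = np.zeros((n, n))
    nz = f > 0                                     # skip the undefined DC term
    mtf[nz] = np.abs(csd[nz] / (c / f[nz] ** k))
    return mtf
```

Note that unlike Method 2 the complex phase of both capture and reference enters the numerator before the absolute value is taken, which is what tames the noise bias at low energies.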

The supposed benefit compared to the second method is that here the first fraction above carries all real and imaginary components before the absolute value is taken, so the derived spectrum will be better behaved at higher frequencies as it approaches zero energy.  The disadvantage is that we again need to have the exact target image O matched in space and intensity,[9] as with the direct first method.  If that’s the case I am not sure why this would be preferable given that it would be noisier: the power-law Power Spectrum of the pristine target is unlikely to be exactly the same as \hat{T}^k, introducing an additional component of noise in the \frac{\hat{\mathcal{N}}}{\hat{ot_2^k}} fraction, as in the previous method.

An Example, Warts and All

This is how Texture MTF looks from a 512×512 pixel perfect synthetic Dead Leaves image generated with the routine used for Figure 2 with k = 3, to which a Gaussian blur of 1 pixel standard deviation and noise typical of a smartphone camera were applied; the details of the simulation are in the Matlab code linked at the bottom of the notes.

The objective of this example is to highlight the strengths and weaknesses of the three methods as described, so without any improvements like windowing or other types of filtering.  The real part of the MTFs was plotted in this case but results based on the absolute value are virtually identical.

Figure 4. Texture MTF from the three methods described in the article. The 512×512 pristine Dead Leaves image was blurred by a Gaussian PSF with one pixel standard deviation, then noise was added to simulate a smartphone camera.

The direct FT ratio described in Method 1 (blue curve) provides the cleanest result, albeit with the somewhat impractical requirement of having to perfectly align reference and captured images.  A paper linked in the notes quantifies the effects of misregistration.[9]  As expected, noise increases with frequency.

The square root of the Power Spectra ratio in Method 2 (red curve) shows the expected bias at higher frequencies and an incorrect estimate at lower ones.  The MTF curve was scaled by factor c so that the two Power Spectra best overlay each other in the most linear region of Figure 3, between 0.03 and 0.3 c/p.  The error near the origin and big excursions between 0.15 and 0.25 c/p are due to deviations of the actual Power Spectrum of the target from the ideal \frac{0.323}{f^3}.  Beyond that noise bias takes over.  Some of the bias could be removed by subtracting the Power Spectrum of noise from a uniform patch of mean intensity in the target, as suggested by Cao et al., with the limitations mentioned earlier.  To correct MTF at lower frequencies they propose starting the plot from around 0.02 c/p or borrowing that part of the curve from a slanted edge reading.

The CPD ratio in Method 3 (yellow curve) is a compromise between the previous two: it gets around the noise bias as described in the relevant section but suffers from the departure of the actual from the ideal Power Spectrum of the target, which, glancing at Figure 3, also explains the higher than expected values at higher frequencies.  As mentioned it requires perfect alignment of the reference and captured images, making it apparently less useful in practice.

A target whose Power Spectrum tracks an ideal power-law function more closely, independently of area, scale and rotation, would consequently perform better in methods 2 and 3.  Such targets can be bought online.

As Good as It Gets

You may be wondering why not use the known Spectral Power Density of the pristine target shown in Figure 3 for \hat{T}^k: surely that will get rid of most of the noise introduced by having been forced to use an approximate surrogate.  And indeed IF the specifications of uncorrupted target O are available, its known Spectral Power Density can stand in for \hat{T}^k in Equations (9) and (13), effectively minimizing an important source of noise.   The curves improve substantially:

Figure 5. Same as Figure 4 but using the known Spectral Power Density of the target in the denominator of Equations (9) and (13).

In this case the direct FT of Method 1 (blue dots) and CPD of Method 3 (yellow curve) are identical, because as explained earlier the two methods are two sides of the same coin.  The issue of registration remains for both, though in this Figure it is swept under the carpet because here the digital images can be (and are) perfectly aligned in space and intensity.

The square root of the Power Spectra ratio in Method 2 looks better here than in Figure 4 because the Spectral Power Density of a uniform area within one of the larger, dark gray ‘leaves’ of the corrupted image was subtracted from it before dividing by |\hat{O}|^2.  We only need the Power Spectrum profile to do that so registration is not an issue – but the resultant curve still shows some bias, making it less accurate than the other two methods, if not more practical to obtain.

I believe this is as good as one can hope for out of the Texture MTF methods.  Reality in the field is evidently somewhere in between the two extreme cases shown.

In addition you may have noticed that the chosen MTF curves show virtually no energy above the monochrome Nyquist frequency (0.5 c/p).  That’s because as mentioned Texture MTF does not deal well with aliasing, introducing an additional variable that will be left for another time.   Though it may have its place in smartphone camera Image Quality measurements, I think I will stick to raw files and the tried and true Slanted Edge Method when evaluating the ‘sharpness’ of enthusiast Interchangeable Lens Camera hardware, because it can produce accurate results with similar effort.


Notes and References

1. Open source MTF Mapper by Frans van den Bergh provides a relatively easy way for photographers at home to estimate the spatial frequency response of a camera and lens over the entire field of view.
2. For a definition of Fourier and Power Spectrum see for instance Gonzalez and Woods, third edition, p. 245.
3. The compute_dead_leaves_image.m Matlab function by Gabriel Peyré can be found here.
4. A good introduction to 1/f noise can be found here.
5. The absolute value of a complex function is also referred to as its modulus or its magnitude and it is computed as the quadrature sum of its real and imaginary parts.
6. Frédéric Cao, Frédéric Guichard and Hervé Hornung, “Dead leaves model for measuring texture quality on a digital camera“, Proc. SPIE 7537, 75370E (2010);  doi:10.1117/12.838902
7. Leonie Kirk, Philip Herzer, Uwe Artmann, Dietmar Kunz, Description of texture loss using the dead leaves target: Current issues and a new intrinsic approach, 2014
8. The two cross correlations are complex conjugates of each other, so the full contribution to the cross power can be obtained from twice the real part of either, see the relative wikipedia page.
9. The effect of misaligning the reference and the captured image for method 3 is quantified in “The Effects of misregistration on the dead leaves cross correlation texture blur analysis”, Robert C Sumner, Ranga Burada and Noa Kram, 2017.
10. The Matlab code for the simulation of Figure 4 can be downloaded by clicking here.  Figure 5 is a trivial extension of it.


5 thoughts on “Introduction to Texture MTF”

  1. These are great notes, they take me back to Image Science, published in 1974! Things have advanced but the fundamentals are the same.

  2. Sir,

    I did not grasp the essence of your method since I changed from studying the more esoteric maths of physics, chose Boolean functions over understanding Cauchy surfaces and was satisfied with Seidel when reading optics. So in order to grasp its utility I would appreciate seeing MTF graphs such as the Zeiss K-8 ones besides the results of your method applied to the same lenses.

    p-

    1. Hello Paul, thanks for your comment

      I have added a couple of examples. You can also take a look at the papers by Cao et al. and Kirk et al. in the links above for plots of their results. However, do not expect performance comparable to what could be obtained from an optical bench because as mentioned Texture MTF is a blunt and noisy tool.

      To obtain something in between, more than acceptable for photographic hardware evaluations that can be performed at home, I suggest the slanted edge method using open source MTF Mapper. You can find links to both in the article.

  3. Very nice, thanks!

    This kind of propagation of the texture power spectrum through the imaging system is also used in medical imaging to assess whether the radiologist will be able to separate the signal from the textured noise at the end. Breast radiographs especially have a “lumpy background” with a known power-law NPS.
