Combining Bayer CFA Modulation Transfer Functions – I

In this and the following article I will discuss my thoughts on how MTF50 results, obtained via the slanted edge method from the raw data of the four Bayer CFA color channels of a neutral target captured with a typical camera, can be combined into a meaningful composite MTF50 for the imaging system as a whole.  The scope of the discussion is neutral slanted edge measurements of Bayer CFA raw data for linear spatial resolution (‘sharpness’) evaluations of photographic hardware.  Corrections, suggestions and challenges are welcome.

Part I: A Little Background

Contrast and the Modulation Transfer Function

Modulation Transfer Function curves, approximated in practice by Spatial Frequency Response curves, form the basis of many metrics used to estimate the ‘sharpness’ IQ of our photographic equipment.  They measure one thing only: loss of CONTRAST at various spatial frequencies.

They can be conveniently calculated in the frequency domain as the magnitude of the Fourier transform of the imaging system’s impulse response (the Point Spread Function, which contains all frequencies), normalized to one at the origin.  However, it is useful to understand their meaning and effect in the spatial domain.
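As a minimal numerical sketch of that definition, the snippet below uses a synthetic Gaussian blur as a stand-in for a measured Point Spread Function; the array size and blur width are arbitrary assumptions for illustration only:

    import numpy as np

    # Toy PSF: a Gaussian standing in for a measured impulse response
    n = 64
    x = np.arange(n) - n // 2
    xx, yy = np.meshgrid(x, x)
    psf = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))

    # MTF = magnitude of the Fourier transform of the PSF,
    # normalized to one at the origin (zero frequency, index [0, 0])
    otf = np.fft.fft2(psf)
    mtf = np.abs(otf) / np.abs(otf[0, 0])

    freq = np.fft.fftfreq(n)   # spatial frequencies in cycles/pixel
    # mtf[0, :n//2] is the horizontal MTF from DC up to Nyquist (0.5 c/p)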

MTF in the Spatial Domain

Contrast in this context means the relative difference between peak and valley intensity as line pairs get closer together, apparent in the less and more blurred Siemens stars in the figure below.  The top one has more contrast between its lightest and darkest areas, and in both the contrast diminishes closer to the center of the image.  Luminance from the scene is assumed to be constant and proportional to the values written in the camera’s raw file.

Figure 1.  In-focus and out-of-focus MTF curves of a non-aberrated system and the corresponding images. Copyright 2013 Tom.vettenburg, license, unmodified

In the spatial domain this is called Michelson Contrast, calculated by taking the difference between the linear intensities (I) at the peak and valley of a line pair at a given spacing and dividing it by their sum:

    \[ MTF(f) = \frac{I_{max} - I_{min}}{I_{max} + I_{min}} \]

f is the spatial frequency at which the reading is taken, equal to the inverse of the spacing of the line pair.  For instance, looking 1/3 of the way down from the top of the Siemens stars in the center, a pair of wedges is about 10um apart.  One hundred such pairs (alternatively cycles) would fit in a mm, so we could say that the linear spatial frequency there is 100 lp/mm[1].
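For readers who prefer other units (see note 1), the conversion is simple arithmetic; in this sketch the pixel pitch and sensor height are illustrative assumptions, not values taken from the figure:

    spacing_mm = 0.010                 # 10um between the wedges of a line pair
    lp_per_mm = 1 / spacing_mm         # 100 lp/mm

    pitch_mm = 0.0042                  # assumed 4.2um pixel pitch
    sensor_height_mm = 24.0            # assumed full-frame sensor height

    cycles_per_pixel = lp_per_mm * pitch_mm        # 0.42 cycles/pixel
    lp_per_ph = lp_per_mm * sensor_height_mm       # 2400 lp/ph (picture height)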

Reading the linear pixel values in the raw data corresponding to the peak and valley intensities at that location yields the following Michelson Contrast for the top star at a linear spatial frequency of 100 lp/mm:

    \[ MTF(100) = \frac{11719 - 1709}{11719 + 1709} = 0.745 \]

Similarly for the bottom star

    \[ MTF(100) = \frac{9033-4395}{9033 + 4395} = 0.345 \]
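The arithmetic is easy to check with a small helper that reproduces both readings above:

    def michelson(i_max, i_min):
        # Michelson contrast from peak and valley linear intensities
        return (i_max - i_min) / (i_max + i_min)

    print(michelson(11719, 1709))   # top star    -> 0.745
    print(michelson(9033, 4395))    # bottom star -> 0.345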

Note that MTF measures relative intensity; the absolute values get lost.  Mean intensity could be 100, 1,000 or 100,000, but the MTF as measured above would not change because it is a ratio: numerator and denominator would be multiplied by the same factor, which cancels out.  The highest possible natural MTF occurs at zero frequency: it is one there, after having been normalized to the intensity before interaction with the system under evaluation.

Also note that the values in the MTF calculation above do not have a ‘color’.  Contrast, hence MTF, does not care whether the subject under evaluation is red, yellow or blue; it only cares about the normalized peak-to-valley linear value swings it sees in data off the sensing plane.  MTF treats every subject as if it were gray.

Of course, the MTF curve of a neutral subject illuminated by red light will be quite different from that of the same setup illuminated by blue light, because light of different wavelengths may behave differently within an imaging system, as shown in the previous article.  But MTF measures them all as if they were gray subjects with different spatial properties.

We See ‘Sharpness’ Mostly in Black and White Too

For purposes of sharpness and spatial resolution, to a good extent so, apparently, does the human visual system (HVS).  It is well known that achromatic (luminance) spatial acuity in humans is better than chromatic spatial acuity.  Despite the fact that the part of the retina responsible for acuity, the fovea, also works off three types of photoreceptors in photopic vision – rho, gamma and beta cones, each with a different spectral sensitivity function – there is good evidence that the human visual system is able to recover spatial information from every receptor with minimal loss.  There appears to be no spatial resolution penalty for trichromacy[2][4].

In other words, the HVS effectively recreates a full resolution grayscale image from its individually sparse color-sensitive receptors.  In fact, stimuli from the three types of cone are collected by neurons that encode them into three new signals before transmission to the visual cortex via nerve fibers bundled in the optic nerve: an ‘achromatic’ channel and two ‘color difference’ channels that carry the color information[3].  That’s what the brain ‘sees’.

Raw CFA luminance = Output RGB luminance

In this article capitalized Luminance refers to the photometric quantity in cd/m^2, while luminance with a lowercase ‘l’ means the spatial map of linear image intensity Y, known as luma Y’ when gamma encoded.

The HVS approach suggests the separation of luminance and chrominance information as an effective model for digital images, a fact exploited by YCbCr-class color space encodings and related chroma subsampling.  Alleysson et al. demonstrate that in the frequency domain the baseband luminance channel of the raw Bayer CFA image is the same as that of the linear trichromatic RGB image in a colorimetric output color space:  “Thus, trichromatic color images and CFA images share the same definition of luminance; only chrominance is subsampled in the CFA image”[4].

Luminance in the Frequency Domain

By looking at the problem in the frequency domain, they suggest that a Bayer CFA raw signal can be interpreted as a full resolution baseband grayscale/luminance component plus two chrominance components modulated at monochrome Nyquist spatial frequencies[5].

Dubois[6] shows that the Fourier transform of the raw CFA baseband luminance image is obtained as the following weighted sum of the Fourier transforms of the three individual raw color planes, \hat r, \hat g, \hat b:

    \[ \hat L = 0.25\hat r + 0.5\hat g + 0.25\hat b \tag{1} \]
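By linearity of the Fourier transform, equation (1) is equivalent to transforming the grayscale image 0.25r + 0.5g + 0.25b directly.  A small sketch with synthetic full resolution color planes (random data standing in for a white balanced capture) confirms the equivalence:

    import numpy as np

    rng = np.random.default_rng(0)
    r, g, b = (rng.random((64, 64)) for _ in range(3))   # synthetic color planes

    # Weighted sum of the planes' Fourier transforms, per equation (1)...
    L_hat = 0.25 * np.fft.fft2(r) + 0.5 * np.fft.fft2(g) + 0.25 * np.fft.fft2(b)

    # ...equals the Fourier transform of the baseband luminance image
    assert np.allclose(L_hat, np.fft.fft2(0.25 * r + 0.5 * g + 0.25 * b))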

Resolution is Best With an Achromatic Subject

Therefore the presence of the two chrominance components corrupts baseband luminance information, and much effort has been spent trying to figure out how to minimize this interference.

When light from the scene is achromatic (hueless, neutral), however, the two chromatic components are equal to zero, leaving luminance undisturbed.  The captured color is neutral in Bayer CFA raw data when the mean values in the four raw color channels that make up the three color planes are the same.  In a hueless (neutral, gray, achromatic) linear RGB image, luminance is proportional to the intensity of the individual color channels adjusted for white balance[7].

Critical White Balance

One of the key properties of the HVS is Chromatic Adaptation, otherwise known as Color Constancy: the ability to adjust to changes in illumination in order to preserve the appearance of object colors.  A white diffuse reflector looks white whether it is illuminated by a candle indoors or by the sun on a glacier.  The HVS is, in effect, constantly applying auto white balance to what we see.

In photography we achieve a similar objective by ensuring that all raw channels are normalized to the same average intensity on a neutral ‘gray card’.  In that case white should be represented in the raw data by the same maximum value in each of the three color planes, say [1,1,1] for r, g and b when the data is scaled from zero to one.  Many linear transformations later, white should still be [1,1,1] in the output colorimetric color space, and all neutral levels of gray should ideally result in equal values of R, G and B in, say, sRGB.
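As a sketch of what that normalization might look like, assuming an RGGB mosaic and a crop of the raw data that falls entirely on the gray card (both assumptions for illustration, not a prescription):

    import numpy as np

    def wb_gains_rggb(gray_crop):
        # gray_crop: 2-D crop of the raw mosaic lying on the gray card,
        # aligned so that gray_crop[0, 0] is an R photosite (RGGB assumed)
        sites = {'R': (0, 0), 'Gr': (0, 1), 'Gb': (1, 0), 'B': (1, 1)}
        means = {name: gray_crop[dy::2, dx::2].mean()
                 for name, (dy, dx) in sites.items()}
        target = means['Gr']              # normalize to the green level
        return {name: target / m for name, m in means.items()}

Applying these gains to the corresponding photosites makes a neutral subject produce the same mean value in all four raw channels, so that white lands at [1,1,1] once the data is scaled to the zero-to-one range.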

In Summary

This suggests that, in order to obtain the best spatial resolution information from a test target with an imaging system based on a Bayer CFA sensor, a photographer should capture a neutral subject under a uniform illuminant – a subject from which the system’s impulse response can be derived – and properly white balance the resulting raw data.

Take-aways for part II:

  • MTF is the magnitude of the Fourier transform of the imaging system’s Point Spread Function normalized to one at the origin.
  • Absolute intensity is irrelevant to MTF
  • MTF treats every color plane or set of data as ‘gray’ (and apparently so does the HVS as far as spatial resolution is concerned)
  • In the frequency domain a Bayer CFA image can be interpreted as a full resolution baseband luminance image plus two chromatic components at the Nyquist frequencies suggested by pixel pitch
  • The raw CFA image and the corresponding colorimetric RGB color image share the same baseband linear luminance channel
  • The least corrupted luminance information is obtained when the chromatic components are zero, which occurs when the subject is hueless/neutral and the raw file is white balanced (see this article for an example)
  • In such a situation the Fourier transform of the full resolution baseband luminance image is obtained as the weighted sum of the Fourier transforms of the individual raw CFA color planes: \hat L = 0.25\hat r + 0.5\hat g + 0.25\hat b

 

Notes and References

1. Other spatial frequency units such as lp/ph or cycles/pixel may be preferred depending on the application.  Here is how to convert between them.
2. The Cost of Trichromacy for Spatial Vision. David R. Williams, Nobutoshi Sekiguchi, William Haake, David Brainard, Orin Packer.
3. Section 1.6, Visual Signal Transmission. From: Measuring Color. Third Edition. R.W.G. Hunt. Fountain Press, 1998.
4. Frequency selection demosaicking: A review and a look ahead. D. Alleysson and B. Chaix de Lavarène, Proc. SPIE 6822, 68221M, 2008, Section 2.
5. Linear demosaicing inspired by the human visual system. David Alleysson, Sabine Susstrunk, Jeanny Herault. IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers, 2005, 14 (4), pp.439-449.
6. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (12), p. 847.
7. Digital Image Processing, An Algorithmic Introduction Using Java. Wilhelm Burger, Mark J. Burge. Springer Science & Business Media, 2008, pp. 256-257.

 
