This is a vast and complex subject for which I do not have formal training. In this and the previous article I present my thoughts on how MTF50 results, obtained via the slanted edge method from the raw data of the four Bayer CFA channels of a uniformly illuminated neutral target captured with a typical digital camera, can be combined into a meaningful composite MTF50 for the imaging system as a whole[1]. Corrections, suggestions and challenges are welcome.

Part II: adding MTF50s of neutral slanted edge color planes

Slanted Edges Make Things Simple

The capture of a neutral slanted edge in the raw data with good technique can provide an Edge Spread Function; its derivative is the Line Spread Function; and the normalized modulus of the LSF's Fourier transform is a good approximation to the Modulation Transfer Function of the imaging system. The resulting curve describes the linear spatial resolution response of the imaging system near the center of the edge, in the direction perpendicular to it (shown as the ‘edge normal’ below). See here for a more in-depth description of the method.
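As a concrete illustration, here is a minimal numpy sketch of that chain – ESF, derivative, window, modulus of the Fourier transform, normalization – applied to a synthetic Gaussian-blurred edge. All names and parameters are mine, and real tools such as MTF Mapper do considerably more work on edge finding, binning and denoising:

```python
import math
import numpy as np

def mtf_from_esf(esf, oversample=4):
    """ESF -> LSF -> MTF, per the slanted edge method.
    esf: 1-D super-sampled Edge Spread Function;
    oversample: super-resolution factor relative to the pixel pitch."""
    lsf = np.diff(esf)                    # derivative: Line Spread Function
    lsf = lsf * np.hanning(len(lsf))      # taper to limit spectral leakage
    mtf = np.abs(np.fft.rfft(lsf))        # modulus of the Fourier transform
    mtf /= mtf[0]                         # normalize so MTF(0) = 1
    freq = np.fft.rfftfreq(len(lsf), d=1.0 / oversample)  # cycles/pixel
    return freq, mtf

# Synthetic edge blurred by a Gaussian PSF (sigma = 1.2 px), 4x oversampled
x = np.arange(-32, 32, 0.25)
esf = np.array([0.5 * (1 + math.erf(v / (math.sqrt(2) * 1.2))) for v in x])
freq, mtf = mtf_from_esf(esf, oversample=4)
```

Because MTF is normalized at the end, the absolute scale of the ESF (and of the LSF) drops out, exactly as noted below.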

Frans van den Bergh, author of the excellent open-source MTF Mapper, shows it this way:

Edge Spread Function
Figure 1. The Slanted Edge Method to Obtain a Modulation Transfer Function, Courtesy of Frans van den Bergh

The sub-pixel definition of the edge intensity profile is due to the  slight inclination of the edge with respect to the sensing pixel array.  Angles are typically in the range of 4 to 6 degrees and edges are often 200 pixels or longer, resulting in 100x super-resolution oversampling of the edge.  For all intents and purposes the method provides a good approximation to the one-dimensional continuous edge profile on the sensing plane before digitization by the sensor.

Note that, because of super-resolution, if the angle and edge length are appropriately chosen pixel spacing is not critical to obtaining good edge definition.  Within limits, an accurate Edge Spread Function could just as easily be obtained if only every other raw pixel were present in the mosaic.  Nor is noise an issue in typical testing conditions: with a hundred or more pixels projected onto each small interval along the edge normal, averaging their intensities smooths out most noise in the ESF.  And recall that what matters is the relative intensity of the edge, because MTF does not care about absolute intensity: the curve is normalized to 1, so LSFs can be scaled to any value without affecting the resulting MTF curve.
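The projection-and-binning idea can be sketched as follows: every pixel is assigned a signed distance along the edge normal and averaged into narrow bins, which is where the super-resolution comes from. This is a toy model with a hard-edged synthetic target; the angle, bin width and centered edge are my assumptions:

```python
import numpy as np

def supersampled_esf(img, angle_deg=5.0, bin_width=0.25):
    """Build a super-sampled ESF by projecting every pixel onto the
    edge normal and averaging intensities within narrow distance bins.
    Assumes a near-vertical edge through the image centre;
    bin_width is in pixel units (0.25 -> 4x oversampling)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    # signed distance of each pixel centre from the slanted edge
    d = (xs - w / 2) * np.cos(theta) - (ys - h / 2) * np.sin(theta)
    bins = np.round(d / bin_width).astype(int)
    bins -= bins.min()
    sums = np.bincount(bins.ravel(), weights=img.ravel())
    counts = np.bincount(bins.ravel())
    return sums / np.maximum(counts, 1)    # mean intensity per bin

# Synthetic 5-degree slanted step edge: dark left, bright right
h, w = 100, 60
ys, xs = np.mgrid[0:h, 0:w]
theta = np.deg2rad(5.0)
img = ((xs - w / 2) * np.cos(theta) - (ys - h / 2) * np.sin(theta) > 0).astype(float)
esf = supersampled_esf(img)
```

Note that dropping every other pixel before binning would thin the per-bin averages but, within limits, leave the recovered edge profile intact – the point made above.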

Figure 2. Fully populated Bayer CFA on the left, Monochrome to the right.  Image courtesy of Cburnett, under licence

Therefore, within limits, it makes no difference to the ESF whether it is the result of a fully populated raw mosaic (such as that found in a monochrome sensor, gray right above), or a sparsely populated one (such as those found in the single raw color planes of a Bayer CFA sensor below).

Figure 3. Sparsely populated color planes under Bayer CFA filters.      Image courtesy of Cburnett, under licence, replaced bottom text with ‘Color Planes’

Nor does it care whether the average intensity of the red raw channel is half that of the green one.  As far as the method is concerned, the color planes can be treated as full and separate images, one per channel.  Within limits, the resulting ESFs will be just as good as those produced by a three-way beam splitter projecting the scene onto three fully populated monochrome sensors behind sheet color filters with characteristics equivalent to the filters in the CFA.
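In practice, separating the planes just means slicing the mosaic by CFA position. A minimal sketch for an RGGB layout (the pattern and variable names are assumptions, not a general-purpose CFA reader):

```python
import numpy as np

def bayer_planes(mosaic, pattern="RGGB"):
    """Split an undemosaiced Bayer mosaic into its four sparse channel
    planes (r, g1, g2, b for an RGGB layout).  Each plane is a
    quarter-resolution image that the slanted edge method can treat
    as a stand-alone grayscale capture."""
    assert pattern == "RGGB"              # this sketch handles RGGB only
    r  = mosaic[0::2, 0::2]
    g1 = mosaic[0::2, 1::2]
    g2 = mosaic[1::2, 0::2]
    b  = mosaic[1::2, 1::2]
    return r, g1, g2, b

# Tiny worked example on a 4x4 mosaic of running values
m = np.arange(16).reshape(4, 4)
r, g1, g2, b = bayer_planes(m)
```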

3 Continuous Grayscale ESFs Yield 3 MTFs

Each raw color plane stands on its own, producing a ‘grayscale’ image whose linear spatial characteristics depend (mainly) on the spectral properties of the filter it sits under and of the illuminant, as discussed earlier.  The resulting ESFs will have been normalized to the same mean intensity by white balance.

CFA Sensor Frequency Domain Model
Figure 4. As far as the slanted edge method is concerned the three color planes might as well have been produced by full resolution sensors sitting behind full sheet color filters

The three raw color plane LSFs will therefore be good approximations of the continuous unidirectional profiles of the two-dimensional Point Spread Functions needed to compute the Modulation Transfer Function of each raw color plane individually in that one direction.  These MTFs are obtained by taking the Fourier transform of the individual LSFs, resulting in \hat r,\hat g,\hat b[2].

Adding Them up to Obtain a System MTF

We know from the preceding post that the grayscale, luminance signal’s Fourier transform for a Bayer CFA array is related to the Fourier transform of the single raw color planes by the following relationship[3]

(1)   \begin{equation*} \hat L = 0.25\hat r + 0.5\hat g + 0.25\hat b \end{equation*}

In a linear system what is valid in two dimensions is valid in one.   We also know from there that when the subject is hueless the baseband  luminance information in the frequency domain is uncorrupted by color and at its best, therefore accurate as-is in the raw data once white balanced.

Therefore, with a few simplifying assumptions that mainly involve ignoring phase, we arrive at the answer to the question posed a couple of posts ago.  As far as the measurement of linear spatial resolution from a white balanced raw capture of a neutral slanted edge under a uniform illuminant by a Bayer CFA sensor is concerned, MTF50s obtained from the three separate raw color planes in isolation can be added in the following proportions to produce an accurate composite grayscale system MTF50:

(2)   \begin{equation*} MTF_L = 0.25 MTF_r + 0.5 MTF_g + 0.25 MTF_b \end{equation*}

where ‘L’ stands for luminance, also referred to as luma Y.  With some limitations this is true for the full MTF curves, though the underlying assumptions are more likely to hold at MTF50.
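As a numerical sanity check, here is a small sketch that combines three toy MTF curves per formula (2) and compares MTF50s. The Gaussian curves are synthetic and chosen purely for illustration, and `mtf50` is a simple linear-interpolation helper of my own:

```python
import numpy as np

def mtf50(freq, mtf):
    """Frequency (cycles/pixel) at which an MTF curve first drops to 0.5,
    by linear interpolation between the bracketing samples."""
    i = int(np.argmax(mtf < 0.5))         # first sample below 0.5
    f0, f1, m0, m1 = freq[i - 1], freq[i], mtf[i - 1], mtf[i]
    return f0 + (0.5 - m0) * (f1 - f0) / (m1 - m0)

# Toy Gaussian MTFs for the three colour planes (illustrative only)
freq = np.linspace(0, 0.5, 501)
mtf_r = np.exp(-(freq / 0.28) ** 2)
mtf_g = np.exp(-(freq / 0.32) ** 2)
mtf_b = np.exp(-(freq / 0.24) ** 2)

# Formula (2): luminance MTF as the 1:2:1 channel combination
mtf_L = 0.25 * mtf_r + 0.5 * mtf_g + 0.25 * mtf_b
```

With these particular curves, the MTF50 of the combined curve and the 1:2:1 combination of the individual MTF50s agree to within a few thousandths of a cycle per pixel, consistent with the point that the relationship holds best near MTF50.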

This result is not surprising if looked at from an Information Science perspective: it indeed represents a correct proportion of the spatial information collected in the r,g,g,b raw data.  Superposition in the spatial domain would also suggest this result.

That’s it, if you are satisfied you can stop reading here.  The rest of the article is for the doubting Thomases amongst us, where I attempt to show that color science does not need to be involved at all  if one is working off white balanced raw image data of a neutral test target.

What About HVS Color Sensitivity?

In this article capitalized Luminance refers to the photometric quantity in cd/m^2, while luminance with a lowercase ‘l’ means the spatial map of linear image intensity Y (also known as luma Y’ when gamma encoded).

In a fully linear imaging system Luminance from the scene in cd/m^2 is supposed to be proportional to the Luminance in cd/m^2 striking the eyes of the viewer of the photograph.  In the days of black and white photography and TV that was self-evident.  When color came of age and the captured images started to be produced in standard (colorimetric) output RGB color spaces, different formulas were developed to attempt to recover the original luminance (or luma) channel  when unavailable:

    \begin{align*} Y_{NTSC} &= 0.2990R + 0.5870G + 0.1140B \\ Y_{HDTV} &= 0.2125R + 0.7154G + 0.0720B \\ Y_{UHDTV} &= 0.2627R + 0.6780G + 0.0593B \\ Y_{D850\,raw \to XYZ_{D50}} &= 0.2463R + 0.9022G - 0.1485B \end{align*}

The various coefficient choices seek to estimate, from data in a standard output RGB color space, the original full resolution grayscale intensity proportional to Luminance from the scene.  They reflect the facts that we perceive red and green as substantially brighter than blue; that phosphors in analog TVs were close to certain primaries while the primaries of modern LCD panels may be close to others; and that the response of older output devices was nonlinear and required gamma correction while today’s may not.  But they are all compromises, a distant second best to having access to the actual original grayscale information proportional to Luminance.
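For reference, the first three coefficient sets above can be applied like so. This is a trivial sketch using the coefficients exactly as quoted above; the keys in `COEFFS` are my shorthand:

```python
import numpy as np

# Luma coefficient sets from the formulas above
COEFFS = {
    "NTSC":  (0.2990, 0.5870, 0.1140),
    "HDTV":  (0.2125, 0.7154, 0.0720),
    "UHDTV": (0.2627, 0.6780, 0.0593),
}

def luma(rgb, standard="HDTV"):
    """Weighted sum of linear R, G, B: one of the approximations to
    scene-proportional Y discussed above."""
    return float(np.asarray(rgb) @ np.asarray(COEFFS[standard]))
```

Note that each coefficient set sums to 1, so a neutral pixel (R=G=B) maps to the same value under every standard; colored pixels are where the estimates diverge.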

Color Need Not Apply

The fact is, for our purposes[1] we do have the original full resolution luminance information: it is the white balanced, grayscale, baseband image in the otherwise pristinely linear raw data, fully proportional to scene Luminance.

Once the hueless, neutral edge under uniform illumination is captured and the relative raw data is white  balanced, each of the four raw channels (r,g1,g2,b) is effectively gray and can be a proxy for luminance: twice the luminance, twice the mean values; half the luminance, half the mean values; and so on all across the edge profile.

And because the system is supposed to be linear, it should remain so all the way to the output color space.  If a pixel – any pixel, independently of its color heritage – has an intensity value of 1/10th of full scale in the white balanced raw data, it should ideally have an intensity of 1/10th of full scale in sRGB before gamma is applied.  Ideally zero maps to zero, 100% maps to 100% and all other neutral values in between fall into place linearly.  If a hueless subject is white balanced in the raw data with r=g1=g2=b it should be neutral in sRGB with R=G=B, and vice versa.  Recall that the linear colorimetric RGB color image and the raw CFA image are supposed to share the same luminance channel Y[4].
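A sketch of the white balancing step for an RGGB mosaic: scale each of the four channels so that a known-neutral patch comes out with equal channel means. The patch alignment requirement and the per-channel gain scheme are my simplifications:

```python
import numpy as np

def white_balance_mosaic(mosaic, neutral):
    """Scale the four CFA channels (RGGB phases) so a known-neutral
    patch ends up with equal channel means; afterwards the mosaic is
    a direct grayscale proxy for luminance on hueless subjects.
    `neutral` is a (row_slice, col_slice) starting on even indices."""
    wb = mosaic.astype(float).copy()
    patch = wb[neutral]
    target = patch.mean()
    # one gain per CFA phase, computed from the neutral patch
    gains = {(dy, dx): target / patch[dy::2, dx::2].mean()
             for dy in (0, 1) for dx in (0, 1)}
    for (dy, dx), g in gains.items():
        wb[dy::2, dx::2] *= g
    return wb
```

On a synthetic neutral scene where the raw channels read, say, r=2, g=4, b=1, the balanced mosaic comes out uniform – r=g1=g2=b, as described above.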

Of course pixels sitting under the r, g or b Bayer CFA filters will have different characteristics because they are the result of physically different wavelengths and quantities of photons. These differences only mean different noise and spatial properties in the three planes.  We do not care about different noise because within these  limits the slanted edge method is relatively insensitive to it; and the different spatial properties are what we wish to measure.

WB Raw = Most Accurate Grayscale Image for This

So the white balanced, undemosaiced raw data mosaic represents the most accurate grayscale image for our purposes[1]: it is the one Y that we want, the very channel all the formulas above try to reengineer back-to-front from output RGB data:

(3)   \begin{equation*} Y_{grayscale} = \text{white-balanced raw data intensity} \end{equation*}

with the raw data under the four CFA color channels as laid out on the sensor.  In this hueless context we already have luminance Y in the raw data for free, just as Dubois and Alleysson explained[3][4].  Here is one example of full resolution D610 rggb raw data displayed as-is, after white balancing and nothing else:

Figure 5. The full resolution raw data was unpacked by DCRAW -D -4, white balanced off the forehead and saved losslessly to 8 bit PNG with sRGB’s gamma (10MB download).

It’s part of the B&W back cover of a magazine (I trust it’s fair use, if not I’ll remove it), illuminated with a halogen light and white balanced off the forehead.  There was also some light coming in from outdoors.

It was captured slightly out of focus on purpose because moiré off the fabric and the printing process was otherwise drawing lines all over the page.  The objective is to show the untouched grayscale Y channel in the raw data.  There is some slight pixelation locally where the white balance begins to be less representative of neutral; for instance the magazine paper and the white paper it is resting on have different white points.

Of course the grayscale image breaks down in the upside-down ColorChecker color patches, which are not hueless, hence outside the perimeter of this article[1].  To drive the point home, here is another example of the grayscale image in the untouched raw data, this time a Sony a7II ISO200 capture of a studio test scene, just the rggb white balanced intensities as they were on the sensor:

Figure 6. The full resolution raw data was unpacked by DCRAW -D -4, white balanced on the third neutral ColorChecker square from the right  and saved losslessly to 8 bit PNG with sRGB’s gamma (13MB download)

In the neutral areas of the scene the full resolution rggb CFA raw data is representative of luminance Y as-is.  In the color areas it is not but that does not concern us here.


Therefore, in my opinion, the most accurate way to obtain system MTF performance from MTF readings of the three individual r, g, b raw color planes – from raw captures of neutral (hueless, achromatic) slanted edges under a uniform illuminant by Bayer CFA digital cameras, with the objective of judging the linear spatial resolution (‘sharpness’) of photographic equipment – is one part each of r and b plus two parts of g, as shown in formula (2):

    \begin{equation*} MTF50_Y = 0.25\, MTF50_r + 0.5\, MTF50_g + 0.25\, MTF50_b \end{equation*}

All other coefficient choices seem to be poorer approximations when working directly off the raw data in this context[1], especially when one realizes that the rendered output RGB color image and the raw CFA image are supposed to share the same luminance channel Y[4].

Corrections, suggestions and challenges are welcome.


Notes and References

1. In this article ‘the context’ or ‘the purpose’ means raw captures of neutral (hueless, achromatic) slanted edges under a uniform illuminant by Bayer CFA digital cameras, for the purpose of measuring by the slanted edge method the linear spatial resolution (‘sharpness’) of photographic equipment.
2. Taking the Fourier Transform of the Line Spread Function is equivalent to applying the Fourier Slice Theorem to the Radon Transform of the system’s two dimensional Point Spread Function. It results in a radial slice of the two dimensional Fourier Transform of the two dimensional PSF in the direction of the edge normal.
3. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE Signal Processing Letters, 2005, 12 (12), p. 847.
4. Frequency selection demosaicking: A review and a look ahead. D. Alleysson and B. Chaix de Lavarène, Proc. SPIE 6822, 68221M, 2008, Section 2, Spatio-Chromatic Model of CFA Image.


One thought on “COMBINING BAYER CFA MTF Curves – II”

  1. Jim Kasson Wrote:
    These are not different approximations. These are different because the color space primaries are different, and thus conversions to 1931 CIE XYZ yield different Y. If the standard observer is right, they are right. Of course, the standard observer does not take adaptation and spatial effects into account. That’s where the guesstimation occurs.

    Makes sense, thanks Jim. I’ve corrected the text to reflect that.
