Bayer CFA Effect on Sharpness

In this article we shall find that the effect of a Bayer CFA on the spatial frequencies and hence the ‘sharpness’ information captured by a sensor compared to those from the corresponding monochrome version can go from (almost) nothing to halving the potentially unaliased range – based on the chrominance content of the image and the direction in which the spatial frequencies are being stressed.

A Little Sampling Theory

We know from Goodman[1] and previous articles that the sampled image (I_{s} ) captured in the raw data by a typical current digital camera can be represented mathematically as  the continuous image on the sensing plane (I_{c} ) multiplied by a rectangular lattice of Dirac delta functions positioned at the center of each pixel at coordinates (x,y):

(1)   \begin{equation*} I_{s}(x,y) = I_{c}(x,y) \cdot comb(\frac{x}{p}) \cdot comb(\frac{y}{p}) \end{equation*}

with the comb functions representing the two dimensional grid of delta functions, sampling pitch p apart horizontally and vertically.

To keep things simple the sensing plane is considered here to be the imager’s silicon itself, which sits below microlenses and other filters, so the continuous image I_{c} is assumed to incorporate their effects as well as the pixel aperture’s.

Because spatial domain multiplications become convolutions in the frequency domain, the Fourier Spectrum (|F_{s}|) of sampled image I_{s} is just the magnitude of the Fourier Transform (F_{c}) of the continuous image I_{c} convolved with the Fourier Transform of the comb functions, which turn out to also be comb functions albeit of inverse spacing, as shown below

(2)   \begin{equation*} |F_{s}(f_{x},f_{y})| = |F_{c}(f_{x},f_{y}) \ast\ast [p^2 comb(pf_{x}) comb(pf_{y})]| \end{equation*}

with \ast\ast indicating two dimensional convolution and p sampling pitch (pixel pitch here).  We saw what the Spectrum looked like with a perfect lens and pixel aperture in the article on Aliasing:

Figure 1. Convolution of spectrum of continuous perfect lens and pixel aperture  (|Fs|) with 2D Dirac delta combs.
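The replicas predicted by equation (2) are easy to see numerically. The following 1D sketch in Python/NumPy (my own illustration – the grid size and test frequency are arbitrary choices, not from the article) samples a cosine at pitch 1, then again at pitch 2 by zeroing alternate samples, and shows the extra spectral peaks modulated out by the coarser comb:

```python
import numpy as np

N = 256
n = np.arange(N)
f0 = 80 / N                          # test frequency in cycles per (unit) pitch
x = np.cos(2 * np.pi * f0 * n)       # signal sampled at pitch 1

# Sampling at pitch 2 = multiplying by a comb with deltas on even samples.
# Since that comb is (1 + (-1)^n)/2, the spectrum gains a copy shifted by
# 0.5 cycles/pitch, per equation (2).
y = x * (n % 2 == 0)

X = np.abs(np.fft.fft(x))
Y = np.abs(np.fft.fft(y))

peaks_x = sorted(int(i) for i in np.argsort(X)[-2:])
peaks_y = sorted(int(i) for i in np.argsort(Y)[-4:])
print(peaks_x)   # [80, 176]: the original ±f0 pair
print(peaks_y)   # [48, 80, 176, 208]: ±f0 plus replicas at ±(0.5 - f0)
```

Halving the sampling rate produces copies of the spectrum shifted by 0.5 cycles per original pitch – the 1D analog of the tiled ‘tents’ in Figure 1.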

Monochrome Spectrum

This time I am going to show the magnitude of the Discrete Fourier Transform of a typical test target captured by a monochrome digital camera in 2D as an image, with brightness representing the energy at the corresponding spatial frequency:

Figure 2. Left: Siemens star target as captured by a Leica Monochrome Typ 216 in the raw data.  Right: Spectrum of the Discrete Fourier Transform of the image on the left.

On the left is the grayscale rendition of a Siemens star target captured by the fine folks at DPReview.com with a Leica Monochrome Typ 216 at base ISO.  On the right is the linear magnitude of the DFT of the same raw capture as performed by Matlab/Octave, otherwise known as its two dimensional Fourier Spectrum, shown as an image.

It is difficult to see what’s going on above right because of the very large energy excursion involved, so normally the logarithm of the Spectrum is shown instead – keeping in mind that this approach tends to overemphasize low energy effects.   Below the Spectrum in Figure 2 is displayed as the natural logarithm of (1+Spectrum):

Figure 3. ln(1+Spectrum) of image in Figure 2.  Origin is top left.  The yellow lines represent Nyquist frequencies.
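For reference, the display transform used in Figure 3 can be sketched in a couple of Python/NumPy lines (a random array stands in for the raw capture; the article’s own processing was done in Matlab/Octave):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))            # stand-in for the raw monochrome capture

spectrum = np.abs(np.fft.fft2(img))   # linear 2D Fourier Spectrum
log_view = np.log1p(spectrum)         # ln(1 + Spectrum), as displayed in Figure 3

# The DC bin (total energy) dwarfs everything else in the linear view,
# which is why the log display is needed in the first place.
dc = tuple(int(i) for i in np.unravel_index(np.argmax(spectrum), spectrum.shape))
print(dc)                             # (0, 0): origin top left, as in Figure 3
```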

The horizontal and vertical units are cycles per pixel pitch with zero cycles/pitch (c/p) top left and one c/p at the other three corners: (0,0), (1,0), (1,1), (0,1) clockwise for linear spatial frequencies (f_{x},f_{y}) in the x and y directions respectively.  The monochrome Nyquist frequencies beyond which aliasing can occur are then half way down the top, bottom left and right edges of Figure 3, corresponding to the yellow lines at 0.5 c/p horizontally and vertically.

The DFT routine only shows one period of the baseband Spectrum but, mentally, tile Figure 3 vertically and horizontally (almost) forever, because the function is actually infinitely periodic per equation (2).  The tiling is caused by the orthogonal Dirac delta combs, which effectively modulate baseband out at cycles-per-sampling-pitch spacing, as seen in the repeating ‘tent’ pattern in Figure 1.  With that picture in mind it is sometimes useful to look at the baseband Spectrum with the spatial frequency origin (0,0) shifted to the center of the spectrum image, as if we were looking down onto a single tent in Figure 1:

Figure 4.  Baseband spectrum in Figure 3, with the origin in the center.  0.5 c/p Nyquist frequencies run along the edges of the image.

Now the origin of the (f_{x},f_{y}) spatial frequencies is at the center of the Spectrum image, with the four corners representing (-0.5,-0.5), (0.5,-0.5), (0.5,0.5), (-0.5,0.5) c/p clockwise starting top left.  The monochrome Nyquist frequencies therefore run along the yellow lines at the four edges.  It is clear that as long as all the energy of the spatial image fits within the Nyquist frequency boundaries there will be no aliasing.  On the other hand it is obvious by looking at Figure 3 that this is not the case here, because some of the Spectrum clearly extends into neighboring quadrants, therefore exceeding Nyquist.  Ignore the ‘reflection’ of rays at the edges of the Spectrum in Figure 4: it is an artifact of having performed the DFT without any padding (tsk, tsk).
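Recentering the origin as in Figure 4 is a standard operation; in NumPy it is `np.fft.fftshift` (a sketch with an assumed 64×64 random image in place of the raw capture):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((64, 64))
spectrum = np.abs(np.fft.fft2(img))   # origin (0,0) top left, Figure 3 layout

centered = np.fft.fftshift(spectrum)  # origin moved to the center, Figure 4 layout

# After the shift the DC bin sits at (N/2, N/2) and the ±0.5 c/p Nyquist
# frequencies run along the edges of the array.
dc = tuple(int(i) for i in np.unravel_index(np.argmax(centered), centered.shape))
print(dc)                             # (32, 32)
```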

The Impact of a Bayer CFA

So that is the story for a sampled image I_s produced by a monochrome digital sensor.  What impact will the introduction of a Bayer Color Filter Array have on the Spectrum of a current digital camera all else equal?   Linearity and superposition apply so one way to look at the CFA image is to pretend that we are actually sub-sampling four separate continuous images, one per color plane, at a spacing of every other monochrome pixel according to the Bayer layout.  We can then estimate the spectrum of each sub-sampled image separately – and add the individual results up to obtain the spectrum of the CFA image.

Figure 5. One way to think of the effect of a Bayer CFA on the Spectrum captured by  a sensor is to consider separate color planes sampled every other pixel.  Here the two G channels are shown superimposed. Image courtesy of Cburnett, text modified.
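The decomposition in Figure 5 can be written down directly. A Python sketch (a random array stands in for the mosaic, and the RGGB layout is an arbitrary choice for illustration) builds one mask per Bayer site, subsamples the mosaic into four planes and confirms that the planes add back up to the original CFA image:

```python
import numpy as np

rng = np.random.default_rng(2)
cfa = rng.random((8, 8))             # stand-in for a raw Bayer mosaic

y, x = np.indices(cfa.shape)
masks = {                             # an RGGB layout, chosen for illustration
    'R':  (y % 2 == 0) & (x % 2 == 0),
    'G1': (y % 2 == 0) & (x % 2 == 1),
    'G2': (y % 2 == 1) & (x % 2 == 0),
    'B':  (y % 2 == 1) & (x % 2 == 1),
}

# Each plane keeps every other pixel and zeros the rest, i.e. it is
# sampled at twice the monochrome pitch in each direction.
planes = {k: cfa * m for k, m in masks.items()}

# Superposition: the four sub-sampled planes sum back to the full mosaic,
# so the CFA spectrum is the sum of the four sub-sampled spectra.
print(np.allclose(sum(planes.values()), cfa))   # True
```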

Each sub-sampled color plane has half the linear pixels of the fully populated monochrome sensor, so sampling pitch (p) is apparently doubled for a Bayer CFA sensor compared to monochrome.  Twice the pitch means convolving the baseband spectrum with combs with half the spacing between the Dirac deltas in equation (2), therefore halving the Nyquist frequency.  To verify this intuition we can take a look at the spectrum of a Siemens star, again captured in the raw data by the fine folks at DPReview.com, this time with a Bayer Color Filter Array Nikon D7200 at base ISO.  It is shown below with the same spatial frequency scale as in Figure 4.  Note the pixelation in the spatial image to the left, due to the different relative intensity in the four color planes resulting from the different Spectral Sensitivity Functions of each in the CFA:

Figure 6.  Left: Grayscale rendition of Siemens star captured in the raw file by DPReview.com with a Bayer CFA Nikon D7200 at base ISO.  Right: Spectrum of the raw file, computed as the magnitude of the DFT of the linear image to the left as-is.

Comparing Figure 6 to Figure 4, the spectrum of the Bayer CFA D7200 does indeed appear to have components packed twice as tightly as the Monochrome Typ 216, reducing by half the potentially pristine unaliased baseband signal.  Nyquist here seems to occur at -1/4 and 1/4 cycles per monochrome pitch, versus -1/2 and 1/2 before.

However, this intuitive explanation is somewhat dissatisfying.  For instance, why couldn’t a Bayer CFA sensor behave more like a monochrome sensor if the subject were neutral and the raw data properly white balanced before demosaicing?

A Frequency Domain Bayer CFA Model

David Alleysson[2] and, subsequently, Eric Dubois[3] came up with a mathematical model that clearly explains the effect of color on a Bayer CFA sampled image in the frequency domain; their papers are worth a read.  In this article I will use Dubois’ notation, which I find a little easier to follow.

Their insight is based on assuming three fully populated planes (i.e. not subsampled, each the size of the equivalent full size monochrome sensor), receiving exactly the same light from the scene, each behind a large color filter with spectral sensitivity equivalent to that of the respective R, G or B Color Filter Array on the digital sensor, as shown below left.

Figure 7.  Another way to model the impact of a Bayer CFA on the Spectrum captured by a digital sensor is to consider three full resolution images, each under one of the three color filters of a typical CFA, and to separate them into full resolution grayscale and chrominance components.

Following the simple math in Dubois’ letter we can see that the image information captured in the raw data by a Bayer CFA sensor can be expressed as follows in the spatial domain:

(3)   \begin{equation*} \begin{align*} CFA_{(x,y)} &= L_{(x,y)} \\ &+ C_{1(x,y)}\cdot e^{i\frac{2\pi(x+y)}{2}}\\ &+ C_{2(x,y)}\cdot (e^{i\frac{2\pi x}{2}}- e^{i\frac{2\pi y}{2}}) \end{align*} \end{equation*}

where:

  • CFA is the Bayer CFA image captured in the raw data
  • (x,y) are the coordinates of the center of each pixel on the full size sampling lattice: 0,1,2,3… #samples horizontally and vertically respectively
  • L is the input-referred full resolution baseband ‘luma’ component of the image on the sensing plane,  defined below
  • C_{1} and C_{2} are the two full resolution ‘chrominance’ components defined below
  • e are the exponential terms responsible for the checkered look of the Bayer CFA image; the 2 in the denominators indicates modulation at half the sampling frequency of the (x,y) lattice.

L, C_{1} and C_{2} are input-referred and defined in terms of full resolution R, G and B raw color plane data as follows:

(4)   \begin{equation*} \begin{align*} L_{(x,y)} &= +\tfrac{1}{4}R_{(x,y)} +  \tfrac{1}{2}G_{(x,y)} + \tfrac{1}{4}B_{(x,y)}\\ C_{1(x,y)} &= -\tfrac{1}{4}R_{(x,y)} +  \tfrac{1}{2}G_{(x,y)} - \tfrac{1}{4}B_{(x,y)}\\ C_{2(x,y)} &= -\tfrac{1}{4}R_{(x,y)} + \tfrac{1}{4}B_{(x,y)} \end{align*} \end{equation*}
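Equations (3) and (4) can be verified numerically. The sketch below (my own, with random arrays standing in for the full resolution color planes) rebuilds the Bayer mosaic from L, C_1, C_2 and the exponential carriers, which reduce to (-1)^{x+y}, (-1)^x and (-1)^y on the integer lattice:

```python
import numpy as np

rng = np.random.default_rng(3)
R, G, B = (rng.random((8, 8)) for _ in range(3))  # full resolution color planes

# Luma and chrominance, equation (4)
L  =  0.25 * R + 0.50 * G + 0.25 * B
C1 = -0.25 * R + 0.50 * G - 0.25 * B
C2 = -0.25 * R + 0.25 * B

y, x = np.indices(R.shape)

# Equation (3): on integer coordinates the exponentials reduce to ±1
cfa_model = L + C1 * (-1.0) ** (x + y) + C2 * ((-1.0) ** x - (-1.0) ** y)

# Direct mosaic implied by these signs: G where x+y is even,
# R where x is odd and y even, B where x is even and y odd
cfa_direct = np.where((x + y) % 2 == 0, G, np.where(x % 2 == 1, R, B))
print(np.allclose(cfa_model, cfa_direct))   # True
```

The sign conventions of equations (3) and (4) imply a layout with G on the diagonals; a different Bayer phase just flips some of the signs.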

Full Res Grayscale, Subsampled Chrominance

By full resolution I mean not sub-sampled – but with the same pixel layout and resolution as an equivalent monochrome sensor.  As Dubois says, after taking the Fourier Transform of each term in Equation (3), the image in each quadrant of the frequency domain ‘can be interpreted as a baseband luma component, a chrominance component modulated at the spatial frequency (0.5, 0.5), and a second chrominance component modulated at the two spatial frequencies (0.5, 0) and (0, 0.5), where spatial frequencies are expressed in cycles per pixel…’ – not too far from our earlier intuition.

Going from the spatial to the frequency domain, i.e. taking the Fourier Transform of Equation (3), e^{i2\pi x k} becomes a delta function at k and products become convolutions so it is clear that the exponential terms are responsible for performing the mentioned modulation, hence for the replicas of C_{1} at the corners and C_{2} at the cardinal points of the 2D spectrum.

Figure 8.  Same as Figure 6, with labels for the Fourier Transform of the luma (L) and chrominance (C1, C2) components.

This is interesting because it suggests that mathematically a Bayer CFA image consists of a full resolution baseband luma component L with Nyquist frequency at \frac{1}{2} monochrome c/p similarly to the monochrome sensor – and only the additive chrominance components C_{1} and C_{2} are sub-sampled at twice the pitch.  As a result the chrominance components  get modulated out to the equivalent monochrome Nyquist frequencies and potentially corrupt the otherwise pristine baseband signal, possibly halving monochrome Nyquist and the useful frequency range depending on their energy, as shown in Figure 8.

A Bayer CFA raw file contains a full resolution baseband luma image L because of the correlation between adjacent color pixels, which our earlier thought experiment ignored.

To Subsample or not to Subsample

The other insight comes from Equation (4), where we can easily see that the sub-sampling effect of the e terms in Equation (3) on the C_{1} and C_{2} full resolution images is immaterial when C_{1} and/or C_{2} are equal to zero.

For instance it is obvious that C_{2} disappears whenever R = B, that is when those two color channels have the same intensity.  I simulated this case by taking the full resolution raw data of the Monochrome Typ 216 and applying a factor of 0.6 to the pixels that would have corresponded to the R and B channels in a Bayer CFA file – a common scenario in raw data captured by current digital cameras in daylight.  Below to the left you can see the demosaiced image, to the right the Spectrum of the mosaiced CFA as described.  Note the missing C_2 components at the cardinal points:

Figure 9.  Left: The Siemens Star monochrome image in Figure 2 with assumptive R and B channels multiplied by 0.6 and then demosaiced.  Right: The spectrum of the relative CFA image.  Note the missing C2 replicas at the cardinal points.

The same will be true anywhere the R and B planes are equal.
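The vanishing of C_2 is easy to confirm in outline (a Python sketch, with a random array standing in for the monochrome capture): scaling the would-be R and B planes by the same factor drives C_2 of equation (4) to zero, so nothing gets modulated out to the cardinal points.

```python
import numpy as np

rng = np.random.default_rng(4)
mono = rng.random((8, 8))     # stand-in for the full resolution monochrome data

R = 0.6 * mono                # R and B scaled by the same factor, as in Figure 9
B = 0.6 * mono
G = mono

C2 = -0.25 * R + 0.25 * B     # equation (4)
print(np.allclose(C2, 0.0))   # True: nothing to modulate to the cardinal points
```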

On the other hand C_{1} disappears when the sum of the two G channels is equal to the sum of R and B.   In this case I applied factors of 1.4 and 0.6 to  the R and B planes respectively, note the missing C_1 components at the corners of the spectrum:

Figure 10. Left: The Siemens Star monochrome image in Figure 2 with assumptive R channel multiplied by 1.4 and the B channel multiplied by 0.6 and then demosaiced.  Right: The spectrum of the relative CFA image. Note the missing C1 replicas in the corners and that 1.4+0.6 = 2.

The same will be true anywhere the sum of the two G planes is equal to the sum of R and B.  Here for instance factors of 0.7 and 1.3 respectively were applied instead, to the same effect:

Figure 11. Left: The Siemens Star monochrome image in Figure 2 with assumptive R channel multiplied by 0.7 and the B channel multiplied by 1.3 and then demosaiced.  Right: The spectrum of the relative CFA image. Note the missing C1 replicas in the corners and that 0.7+1.3 = 2.
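The C_1 condition can be checked the same way (again a sketch with a random stand-in image): any pair of R and B factors summing to 2 relative to G zeroes C_1 in equation (4).

```python
import numpy as np

rng = np.random.default_rng(5)
mono = rng.random((8, 8))     # stand-in monochrome data

for rk, bk in [(1.4, 0.6), (0.7, 1.3)]:    # the factor pairs of Figures 10-11
    R, G, B = rk * mono, mono, bk * mono
    C1 = -0.25 * R + 0.5 * G - 0.25 * B    # equation (4)
    # rk + bk = 2, so 2G = R + B and the corner replicas vanish
    print(rk, bk, np.allclose(C1, 0.0))    # True in both cases
```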

Of course both C_{1} and C_{2} chrominance components disappear when R = G = B, the case where the subject is neutral and the color planes each see the same intensity.  Here the baseband luma component is left alone, uncorrupted by chrominance crosstalk:

Figure 12. Left: The grayscale Siemens Star monochrome image in Figure 2 with assumptive R, B and G channels multiplied by 1 and then demosaiced.  Right: The spectrum of the relative Bayer CFA image.  Note the absence of both C1 and C2 components.

What about in practice?

That’s how it works in theory.  In the real life Bayer CFA Spectrum of the earlier D7200 capture in Figure 6, white balancing the raw data before demosaicing effectively suppressed C_{1} and C_{2} as shown below in Figure 13.  What remains is the approximation of the full resolution baseband luma image L present in the raw data of a Bayer sensor (don’t forget that what is actually shown is the natural logarithm of the magnitude of the Discrete Fourier Transform, which tends to overemphasize low level signals).

Figure 13. Left: Grayscale rendering of a Siemens Star captured by the Bayer CFA D7200 in Figure 6, white balanced and then demosaiced.  Right: The natural logarithm of the Spectrum of the white balanced Bayer CFA.

White balancing the raw data was not able to completely eliminate C_{1} and C_{2} because of residual slight differences in the information collected by each channel, some physical and some due to non-idealities in the system: for instance shot noise, a non-monochromatic light source, channel-to-channel differences in the amount of noise, diffraction, chromatic aberrations – and non-uniformities in the sensor.

If you would like to see a couple of example black and white images of undemosaiced white balanced raw data you can take a look at those in an earlier article with a less formal take on this subject.

How a Bayer CFA Affects Sharpness

In conclusion we have seen that the effect of a Bayer CFA on the spatial frequencies and hence the ‘sharpness’ information captured by a sensor compared to those from the corresponding monochrome version can go from (almost) nothing to halving the potentially unaliased range, based on the chrominance content of the image and the direction in which the spatial frequencies are being stressed.

This kind of analysis was responsible for a flurry of papers on frequency domain demosaicing which were state-of-the-art about a decade ago and still pretty good today.

 

Appendix

1) What do L, C_1 and C_2 represent?

L, C_{1} and C_{2}, the luma and chrominance components under discussion,  make up a linear space.    Together I consider them an input-referred linear space because they represent the performance of the hardware directly, before any psychovisual weighting.  As such there is no one-for-one correspondence to standard output-referred Color Science terms like photometric luminance and chromaticity from the colorimetric XYZ color space, that are instead weighted by CIE functions purported to represent the response of the Human Visual System.

Of course unless the reference monochrome sensor had a spectral response exactly equal to \bar{y}(\lambda), which is virtually equal to the photopic luminosity function, its baseband response would also not correspond exactly to luminance.

In the Alleysson and Dubois papers luma L is defined to be exactly equal to 0.25r + 0.5g +0.25b,  with r,g,b representing white balanced raw data from a Bayer CFA sensor for the given illuminant.  On the other hand the estimate of photometric luminance Y_{D65} for the Nikon D7200 treated on this page is the result of a compromise, which I estimated off a ColorChecker 24 to be approximately equal to 0.23r + 0.93g – 0.16b.

Since LC_1C_2 and XYZ are both color spaces linked linearly to rgb, they are also linearly related to each other by simple matrix projection.  With a neutral subject, that is when r=g=b and therefore C_1=C_2=0, baseband grayscale image L is just as legitimate as Y and similar in both spaces, as explained in part 3) below.

What space is more appropriate for resolution evaluations is a good question.  I would argue that both have their place depending on context – and in fact perhaps there is a yet-to-be-defined third space more relevant to perceptual sharpness measurements.  Until then in my opinion LC_1C_2 is a convenient input-referred space useful for investigating the effects of imaging hardware on resolution.

2) LC_1C_2 Space Conversion Matrices

Since both LC_1C_2 and XYZ are linear spaces linearly related to linear white balanced raw data rgb, we can obtain matrices to convert from one space to the other.

(5)   \begin{equation*} \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] = \begin{bmatrix} 0.25 & 0.50 & 0.25 \\-0.25 & 0.50 & -0.25 \\ -0.25& 0 & 0.25\end{bmatrix} \times \left[ \begin{array}{c} r \\ g \\ b \end{array} \right] \end{equation*}

(6)   \begin{equation*} \left[ \begin{array}{c} r \\ g \\ b \end{array} \right] = \begin{bmatrix} 1 & -1 & -2 \\ 1 & 1 & 0 \\ 1& -1 & 2\end{bmatrix} \times \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] \end{equation*}
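As a quick sanity check, the matrix in Equation (6) is indeed the exact inverse of the one in Equation (5) (a sketch in Python/NumPy, with the matrices copied from the text):

```python
import numpy as np

# Equation (5): white balanced rgb -> LC1C2
M5 = np.array([[ 0.25, 0.50,  0.25],
               [-0.25, 0.50, -0.25],
               [-0.25, 0.00,  0.25]])

# Equation (6): LC1C2 -> rgb, which should be exactly the inverse of (5)
M6 = np.array([[1., -1., -2.],
               [1.,  1.,  0.],
               [1., -1.,  2.]])

print(np.allclose(np.linalg.inv(M5), M6))   # True
```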

On the other hand the following compromise matrices only apply approximately to the Bayer CFA Nikon D7200 in the conditions tested here:

(7)   \begin{equation*} \left[ \begin{array}{c} X_{D65} \\ Y_{D65} \\ Z_{D65} \end{array} \right] \approx  \begin{bmatrix} 0.5501 & 0.3039 & 0.0914 \\ 0.2329 & 0.9264 & -0.1593 \\ 0.0264 & -0.1745 & 1.2282 \end{bmatrix} \times \left[ \begin{array}{c} r \\ g \\ b \end{array} \right] \end{equation*}

Assuming the raw data was the result of a capture under illuminant D65, substituting rgb from Equation (6) into Equation (7) we obtain the compromise matrix for conversion from LC_1C_2 to XYZ_{D65}:

(8)   \begin{equation*} \left[ \begin{array}{c} X_{D65} \\ Y_{D65} \\ Z_{D65} \end{array} \right] \approx \begin{bmatrix} 0.945 & -0.338 & -0.918 \\ 1.000 & 0.853 & -0.784 \\ 1.080 & -1.429 & 2.404 \end{bmatrix} \times \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] \end{equation*}

Inverting Equation (8), we obtain the compromise matrix  from XYZ_{D65} to LC_1C_2:

(9)   \begin{equation*} \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] \approx \begin{bmatrix} 0.225 & 0.514 & 0.253 \\ -0.787 & 0.790 & -0.043 \\ -0.569 & 0.239 & 0.277 \end{bmatrix} \times \left[ \begin{array}{c} X_{D65} \\ Y_{D65} \\ Z_{D65} \end{array} \right] \end{equation*}
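The compromise matrices chain together as described. The following sketch (values copied from Equations (6)–(9); agreement is only to the printed three decimals) checks that Equation (8) equals (7)×(6) and that Equation (9) is the inverse of (8):

```python
import numpy as np

M6 = np.array([[1., -1., -2.], [1., 1., 0.], [1., -1., 2.]])   # eq. (6)
M7 = np.array([[0.5501,  0.3039,  0.0914],                     # eq. (7)
               [0.2329,  0.9264, -0.1593],
               [0.0264, -0.1745,  1.2282]])
M8 = np.array([[0.945, -0.338, -0.918],                        # eq. (8)
               [1.000,  0.853, -0.784],
               [1.080, -1.429,  2.404]])
M9 = np.array([[ 0.225, 0.514,  0.253],                        # eq. (9)
               [-0.787, 0.790, -0.043],
               [-0.569, 0.239,  0.277]])

print(np.allclose(M7 @ M6, M8, atol=2e-3))            # eq. (8) = (7) x (6)
print(np.allclose(np.linalg.inv(M8), M9, atol=2e-3))  # eq. (9) = inverse of (8)
```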

Finally for completeness we can take LC_1C_2 from the Nikon D7200 in this condition to two widely used linear colorimetric output RGB spaces, sRGB and Adobe RGB:

(10)   \begin{equation*} \left[ \begin{array}{c} R_{sRGB} \\ G_{sRGB} \\ B_{sRGB} \end{array} \right] \approx \begin{bmatrix} 1.00 & -1.69 & -2.97 \\ 1.00 & 1.87 & -0.48 \\ 1.00 & -1.70 & 2.65 \end{bmatrix} \times \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] \end{equation*}

(11)   \begin{equation*} \left[ \begin{array}{c} R_{aRGB} \\ G_{aRGB} \\ B_{aRGB} \end{array} \right] \approx \begin{bmatrix} 1.00 & -1.42 & -2.84 \\ 1.00 & 2.01 & -0.48 \\ 1.00 & -1.82 & 2.76\end{bmatrix} \times \left[ \begin{array}{c} L \\ C_1 \\ C_2 \end{array} \right] \end{equation*}

The first columns will always be ones – with a neutral subject C_1=C_2=0, so linearity requires R=G=B=L – while the second and third columns will vary with the camera and the illuminant.

3) Is luma L the Baseband Grayscale Image?

Assume that a neutral subject in daylight projects image I on a digital camera’s sensing plane, I representing the photometric luminance distribution of the subject.   A monochrome sensor at the sensing plane captures an approximation proportional to I in its raw file, call it I_M.  It is an approximation because the monochrome sensor’s response is not in practice the same as the photopic luminosity function that determines photometric luminance.

Now cover the monochrome sensor with a Bayer Color Filter Array.  After white balancing the captured raw data so that r=g=b in a uniform patch of the neutral subject, it also stores an approximation proportional to I in its file, call it I_B.

It is then clear from Equation (5) that L=I_B and C_1=C_2=0.  From Equation (8) Y=L=I_B.  Equations (10) and (11) confirm that, everything being linear, in the output RGB spaces a neutral subject results in R=G=B=L and therefore I_B.

So with a neutral input, the chrominance components disappear and luma image L represents a valid approximation of achromatic baseband image I throughout the chain – just as legitimate, when rendered in grayscale, as those reproduced in XYZ, RGB or any other space by the monochrome sensor, as evidenced by such renditions of the B&W magazine pictures in the previous article.
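The neutral-subject chain above can be traced numerically with the matrices from Equations (5) and (8) (a sketch; the level k is an arbitrary choice):

```python
import numpy as np

M5 = np.array([[ 0.25, 0.5,  0.25],          # rgb -> LC1C2, eq. (5)
               [-0.25, 0.5, -0.25],
               [-0.25, 0.0,  0.25]])
M8 = np.array([[0.945, -0.338, -0.918],      # LC1C2 -> XYZ_D65, eq. (8)
               [1.000,  0.853, -0.784],
               [1.080, -1.429,  2.404]])

k = 0.7                                      # any white balanced neutral level
L, C1, C2 = M5 @ np.array([k, k, k])
print(abs(C1) < 1e-12, abs(C2) < 1e-12)      # chrominance vanishes

X, Y, Z = M8 @ np.array([L, C1, C2])
print(np.isclose(Y, L))                      # True: Y = L for a neutral subject
```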

Notes and References

1. Introduction to Fourier Optics 3rd Edition, Joseph W. Goodman, p. 22.
2. Linear demosaicing inspired by the human visual system. David Alleysson, Sabine Susstrunk, Jeanny Herault. IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers, 2005, 14 (4), pp.439-449.
3. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (12), p. 847.
4. Adaptive Filtering for Color Filter Array demosaicing. IEEE Transactions on Image processing, vol. 16, no. 10, October 2007, Nai-Xiang Lian, LanlanChang, Yap-PengTan, and VitaliZagorodnov.
5. Eric Dubois CFA Demosaicing page.
6. I was introduced to this line of thinking by Cliff Rames in this fascinating thread with contributions by star physicist and AMAZE demosaic creator Emil J. Martinec.
