Linear Color Transforms

Building on a preceding article of this series: once demosaiced, raw data from a Bayer Color Filter Array sensor represents the captured image as a set of triplets, each corresponding to the estimated light intensity at a given pixel under each of the three spectral filters that make up the CFA. The filters are band-pass and named for the representative peak wavelength that they let through, typically red, green, blue or $r$ , $g$ , $b$ for short.

Since the resulting intensities are linearly independent they can form the basis of a 3D coordinate system, with each $rgb$ triplet representing a point within it. The system is bounded in the raw data by the extent of the Analog to Digital Converter, with all three channels spanning the same range, from Black Level with no light to clipping with maximum recordable light. Therefore it can be thought to represent a space in the form of a cube – or better, a parallelepiped – with the origin at [0,0,0] and the opposite vertex at the clipping value in Data Numbers, expressed as [1,1,1] if we normalize all data by it.
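As a rough illustration of the normalization just described, the sketch below maps raw Data Numbers to the [0,1] cube. The black and clipping levels are hypothetical 14-bit placeholders, not values measured for any particular camera, and the helper name `normalize_raw` is made up for this example.

```python
import numpy as np

# Hypothetical levels for a 14-bit ADC; real values depend on the camera.
BLACK_LEVEL = 600.0
CLIP_LEVEL = 16383.0

def normalize_raw(dn, black_level=BLACK_LEVEL, clip_level=CLIP_LEVEL):
    """Map raw Data Numbers to [0, 1]: black level -> 0, clipping -> 1."""
    dn = np.asarray(dn, dtype=float)
    scaled = (dn - black_level) / (clip_level - black_level)
    return np.clip(scaled, 0.0, 1.0)

# One triplet at black, one at clipping, one halfway in between.
triplets = np.array([[600.0, 600.0, 600.0],
                     [16383.0, 16383.0, 16383.0],
                     [8491.5, 8491.5, 8491.5]])
normalized = normalize_raw(triplets)
```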

Figure 1. The linear sRGB Cube, courtesy of Matlab toolbox Optprop.

The job of the color transform is to project demosaiced raw data $rgb$ to a standard output $RGB$ color space designed for viewing. Such spaces have names like $sRGB$ , $Adobe RGB$ or $Rec. 2020$ . The output space can also be shown in 3D as a parallelepiped with the origin at [0,0,0] with no light and the opposite vertex at [1,1,1] with maximum displayable light.

The projection is often expressed as follows, with $M'$ the needed color transformation

(1) $\begin{equation*} RGB \approx M'* rgb \end{equation*}$

The convention in Color Science is for the input and output data to be in column-vector format, i.e. 3xN, with N the number of tone triplets, as shown below. At its simplest the transform $M'$ is a 3×3 matrix made up of 9 coefficients $c$ related to the camera+lens, the scene, the light source and the output color space. In such case * represents matrix multiplication and linear algebra applies: just matrix multiply demosaiced raw data triplets $rgb$ by the appropriate matrix $M'$ to obtain image intensities in the $RGB$ color space of choice.

(2) $\begin{equation*} \left[ \begin{array}{c} R \\ G \\ B \end{array} \right] \approx \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} \left[ \begin{array}{c} r \\ g \\ b \end{array} \right] \end{equation*}$
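Equations (1) and (2) can be sketched in a few lines of NumPy. The matrix coefficients below are placeholders chosen only so that each row sums to one; they are not a real camera transform, and the helper `apply_matrix` is a hypothetical name used for illustration.

```python
import numpy as np

# Hypothetical 3x3 transform; placeholder coefficients, not measured values.
M_prime = np.array([[ 1.6, -0.5, -0.1],
                    [-0.2,  1.5, -0.3],
                    [ 0.0, -0.6,  1.6]])

# Demosaiced raw triplets in column-vector format: rows are r, g, b; N = 4 columns.
rgb = np.array([[0.0, 1.0, 0.2, 0.5],
                [0.0, 1.0, 0.4, 0.5],
                [0.0, 1.0, 0.1, 0.5]])

# Equation (1): one matrix product transforms all N triplets at once.
RGB = M_prime @ rgb          # shape (3, N)

def apply_matrix(image, M):
    """Apply a 3x3 matrix to an (H, W, 3) image by reshaping to 3xN and back."""
    h, w, _ = image.shape
    flat = image.reshape(-1, 3).T          # (3, H*W), column-vector format
    return (M @ flat).T.reshape(h, w, 3)
```

Because each row of the placeholder matrix sums to one, neutral input triplets stay neutral on output, which is a convenient sanity check when experimenting.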

In this article we will break down the transform into its components using as an example a sunny 5300K (D53) capture of a ColorChecker 24 target by the Nikon D5100 camera and lens featured recently.

Step by Step

If we had a set of raw data from our camera/lens and respective reference values in the colorimetric color space of choice we could solve for the transform directly as will be shown below. However, it is useful to first break down the process into its components to distill it into its essence and to better understand the factors at play.

Thanks to the associative property, we know that single matrix $M'$ can be split into the product of a number of 3×3 matrices representing every step of color conversion, from raw to $RGB$ . For a given camera and lens, demosaiced raw data $rgb$ is in sequence:

1. white balanced by the illuminant-dependent matrix containing the multipliers on the diagonal, $diag(K_{rgb})$ ;^[1]
2. projected to $XYZ$ by the appropriate scene-and-illuminant-dependent compromise color matrix $M_{wbr2XYZ}$ (or simply $M$ for consistency with previous articles);
3. Chromatically Adapted from the capture’s illuminant (here D53) White Point to that of the viewer (say D65) by matrix $M_{CA}$ ;
4. projected to $RGB$ , the output color space (for example $sRGB$ ), by standard matrix $M_{XYZ2RGB}$ .

This can be expressed as follows

(3) $\begin{equation*} M' = M_{XYZ2RGB}*M_{CA}*M_{wbr2XYZ}* diag(K_{rgb}) \end{equation*}$

The sequence of multiplication appears to be in reverse order because image intensities in Color Science are conventionally expected to be column vectors arranged in a 3xN array as shown in Equation (2).^[2] We will refer to the projection $M_{wbr2XYZ}$ from demosaiced, white balanced raw data to the $XYZ$ color space in Step 2 as $M$ for the rest of the article in order to simplify notation and because that’s what we have been calling it in this series.

Note that the final matrix to the chosen output $RGB$ color space in step 4 is constant (we can lift it from Bruce Lindbloom’s site^[9]) but the other three depend on the illuminant.
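The composition in Equation (3) can be sketched as follows. Several things here are assumptions made for illustration: the white balance multipliers and the compromise matrix are placeholders (the latter built only so that its rows sum to the D53 White Point quoted in this article), and the chromatic adaptation uses the Bradford method, which is one common choice and not necessarily the one used for the results in this series. The D65 White Point and the $XYZ$ to linear $sRGB$ matrix are the commonly tabulated CIE 1931 values, so mixing them with a White Point computed from other CMFs is itself an approximation.

```python
import numpy as np

# Bradford cone-response matrix (standard published coefficients).
BRADFORD = np.array([[ 0.8951,  0.2664, -0.1614],
                     [-0.7502,  1.7135,  0.0367],
                     [ 0.0389, -0.0685,  1.0296]])

# Step 4: XYZ (D65) to linear sRGB, as commonly tabulated.
M_XYZ2RGB = np.array([[ 3.2404542, -1.5371385, -0.4985314],
                      [-0.9692660,  1.8760108,  0.0415560],
                      [ 0.0556434, -0.2040259,  1.0572252]])

WP_D53 = np.array([0.9593, 1.0000, 0.8833])     # White Point quoted in this article
WP_D65 = np.array([0.95047, 1.00000, 1.08883])  # commonly tabulated D65

def chromatic_adaptation(wp_src, wp_dst):
    """Step 3: Bradford adaptation matrix taking wp_src to wp_dst."""
    gain = (BRADFORD @ wp_dst) / (BRADFORD @ wp_src)
    return np.linalg.inv(BRADFORD) @ np.diag(gain) @ BRADFORD

# Step 1: hypothetical white balance multipliers (placeholders).
K_rgb = np.array([2.0, 1.0, 1.5])

# Step 2: placeholder compromise matrix; rows sum to WP_D53 by construction.
M = np.array([[0.60,  0.25,  0.1093],
              [0.25,  0.85, -0.10],
              [0.02, -0.15,  1.0133]])

M_CA = chromatic_adaptation(WP_D53, WP_D65)

# Equation (3): the rightmost matrix is applied to the data first.
M_prime = M_XYZ2RGB @ M_CA @ M @ np.diag(K_rgb)

# Raw neutral at clipping before white balance should land close to RGB [1, 1, 1].
white_raw = 1.0 / K_rgb
white_RGB = M_prime @ white_raw
```

Reading the product from right to left reproduces the four steps in order, which is why the sequence in Equation (3) looks reversed.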

Pre-Cooked Transforms (M)

Assuming we knew the characteristics of the light source, however, we should be able to obtain the white balance multipliers in step 1 for the given camera and lens – for instance from the camera’s own estimate or from a gray card – and to calculate the chromatic adaptation matrix in step 3.^[8] Therefore, for a given illuminant, camera and lens the only remaining unknowns would be the 9 coefficients in the 3×3 matrix of step 2, the Compromise Color Matrix $M$ that projects white-balanced raw data to $XYZ$ in the given conditions. In addition to the above, these also require knowledge of the scene.^[3]
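As a minimal sketch of how step 1 multipliers might be derived from a gray card, the snippet below scales the red and blue channels so that a neutral patch reads equal in all three. The raw readings are made-up numbers, the data is assumed to be already black-level subtracted, and normalizing to green is a common convention assumed here and not something specified in this article.

```python
import numpy as np

def wb_multipliers(neutral_rgb):
    """White balance multipliers from the mean raw triplet of a neutral patch.

    Normalized so that the green multiplier is 1 (an assumed convention).
    """
    r, g, b = np.asarray(neutral_rgb, dtype=float)
    return np.array([g / r, 1.0, g / b])

# Hypothetical mean raw values of a gray card, black level already subtracted.
gray_card_raw = np.array([0.21, 0.45, 0.30])

K_rgb = wb_multipliers(gray_card_raw)

# After white balancing, the neutral patch has equal r, g and b.
balanced = np.diag(K_rgb) @ gray_card_raw
```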

If the camera/converter had a number of precomputed white balanced raw to $XYZ$ transforms for different scenes and illuminants, the main input variable in full-trip conversion from raw to output color space would be the illuminant: for a given scene figure out the illuminant, select the appropriate transform $M$ and the full-trip conversion $M'$ falls into place.

And so it is: every camera (and raw converter) does indeed come loaded with a set of pre-cooked transforms computed for its specific Spectral Sensitivity Functions as set up, each to be used with a particular scene and illuminant combination.

Solving for M by Linear Regression

To determine 3×3 matrix $M$ we typically capture a known target under a known illuminant in the raw data of the camera that we wish to characterize. We attempted such a feat less formally in the Forward Matrix article; it may be useful to give it a quick glance before reading on.

The target presents a number of representative diffuse reflectances, the more representative of the scenes we are planning to shoot the better. For general photography the 24 patches of a ColorChecker 24 can be unexpectedly effective.^[4]

Since there are typically more equations than unknowns by design, the problem is overdetermined. In linear algebra it is usually shown in the following form

(4) $\begin{equation*} Ax=b \end{equation*}$

which presumes data in row-vector format, i.e. Nx3 (also the format used by Matlab/Octave for vectorization). Compared to the notation used in matrix multiplication and Color Science therefore $A$ would correspond to demosaiced and white-balanced raw data $rgb^T$ , $b$ to the relative intensity in $XYZ^T$ and $x$ to $M^T$ , with $^T$ indicating the transpose of the relative array. Restating Equation (1) in the form of (4) we have for the problem at hand

(5) $\begin{equation*} rgb^T * M^T\approx XYZ^T \end{equation*}$

So for $rgb$ and reference $XYZ$ in row-vector format the resulting matrix $M^T$ is the transpose of that expected by Color Science. The Normal Equation provides a unique least-squares solution to the overdetermined problem as stated. In our case

(6) $\begin{equation*} M \approx [inv(A^T*A)*A^T*b]^T \end{equation*}$

with $inv$ the inverse, $^T$ the transpose of the relative array and $M$ referenced to Color Science friendly 3xN column-vector data. For instance, if $rgb$ is the BabelColor30 CC24 target ‘scene’ captured in the raw data^[5] by the Nikon D5100 with the mystery lens featured in the last few articles,^[6] under a daylight illuminant of correlated color temperature around 5300 kelvin, call it D53, the resulting matrix $M$ would be

with a mean residual CIEDE2000 error of 1.093 over the 18 color patches plus mid-gray patch 22. This matrix projects white balanced raw data to the $XYZ$ color space with white point equal to [0.9559 1.0000 0.8763], as can be gathered by multiplying it by the white balanced raw triplet corresponding to normalized maximum diffuse white, $[1 1 1]^T$ – equivalent to summing the coefficients in each of its rows.
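The Normal Equation solution of Equation (6) can be sketched in NumPy as shown here. The data is synthetic: `M_true` is a placeholder matrix and the 24 'patches' are random numbers with a little added noise, standing in for real white-balanced raw captures and reference $XYZ$ values, which are not reproduced in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 'ground truth' matrix, used only to generate synthetic references.
M_true = np.array([[0.60,  0.25,  0.1093],
                   [0.25,  0.85, -0.10],
                   [0.02, -0.15,  1.0133]])

# Synthetic white-balanced raw data and noisy XYZ references, both 3xN.
rgb = rng.uniform(0.05, 1.0, size=(3, 24))
XYZ = M_true @ rgb + rng.normal(0.0, 0.002, size=(3, 24))

# Row-vector (Nx3) form of Equation (4): A x = b, with x = M^T.
A = rgb.T
b = XYZ.T

# Equation (6): Normal Equation, transposed back to the column-vector convention.
M_ne = (np.linalg.inv(A.T @ A) @ A.T @ b).T

# The same least-squares problem solved by a numerically more robust routine.
M_lstsq = np.linalg.lstsq(A, b, rcond=None)[0].T

# White point implied by the matrix: the image of [1 1 1]^T, i.e. its row sums.
white_point = M_ne @ np.ones(3)
```

Forming the explicit inverse mirrors the equation as written; for badly conditioned data `np.linalg.lstsq` is generally the safer choice and returns the same matrix here.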

Solving for M with an Optimization Algorithm

The least-squares solution minimizes the sum of the squares of the differences between the entries on either side of the equal sign of Equation (5) with equal weight, which is often not ideal because the Human Visual System is more sensitive to some colors than others. We can fine tune the result by applying an efficient optimization algorithm (such as Matlab’s Genetic Pattern Search) to find more perceptually relevant solutions for $M$ . In this case I used fminunc to minimize the residual CIEDE2000 differences.^[10]

Now mean CIEDE2000 over the 19 patches is a little bit better at 0.950, though in other cases the differences are often more marked. This matrix projects white balanced raw data to the $XYZ$ color space with white point [0.9587 1.0000 0.8802]. Here are the residual errors for each of the CC24 patches after optimization

Figure 2. Residual CIEDE2000 error by the selected transform in each of the patches of the BabelColor.com 30 database ColorChecker 24. D53 illuminant and Nikon D5100 plus mystery lens from Darrodi et al.

As can be seen in the code linked to in the notes, the cost function for the optimization routine is the mean CIEDE2000 of all of the 24 patches, 6 of which are varying intensities of gray. This ensures that the result does not stray too far from the appropriate White Point for the given illuminant.
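A perceptual cost function of this kind can be sketched as follows. To keep the example short it uses the CIE 1976 color difference (Euclidean distance in $L^*a^*b^*$) as a simpler stand-in for CIEDE2000, so it is not the cost function used for the results above. The function names, the placeholder matrix and the sample triplets are all made up for illustration.

```python
import numpy as np

def xyz_to_lab(XYZ, wp):
    """CIE 1976 L*a*b* from XYZ. XYZ is 3xN, wp is the reference white (3-vector)."""
    t = np.asarray(XYZ, dtype=float) / np.asarray(wp, dtype=float)[:, None]
    eps, kappa = 216.0 / 24389.0, 24389.0 / 27.0
    f = np.where(t > eps, np.cbrt(t), (kappa * t + 16.0) / 116.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return np.stack([L, a, b])

def mean_delta_e76(m_flat, rgb, XYZ_ref, wp):
    """Mean CIE76 difference between M @ rgb and the reference XYZ values.

    m_flat holds the 9 matrix coefficients in row-major order, the form
    in which a general-purpose optimizer would vary them.
    """
    M = np.asarray(m_flat, dtype=float).reshape(3, 3)
    lab = xyz_to_lab(M @ rgb, wp)
    lab_ref = xyz_to_lab(XYZ_ref, wp)
    return float(np.mean(np.linalg.norm(lab - lab_ref, axis=0)))

# Tiny synthetic check with a placeholder matrix and three made-up triplets.
wp = np.array([0.9593, 1.0000, 0.8833])
M_true = np.array([[0.60,  0.25,  0.1093],
                   [0.25,  0.85, -0.10],
                   [0.02, -0.15,  1.0133]])
rgb = np.array([[0.2, 0.5, 0.9],
                [0.3, 0.5, 0.1],
                [0.4, 0.5, 0.6]])
XYZ_ref = M_true @ rgb

cost_exact = mean_delta_e76(M_true.ravel(), rgb, XYZ_ref, wp)
cost_off = mean_delta_e76((1.05 * M_true).ravel(), rgb, XYZ_ref, wp)
```

A function with this signature could be handed to a general-purpose minimizer, starting from the least-squares matrix as the initial guess; swapping in a CIEDE2000 implementation would bring it closer to what is described in the text.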

Note however that neither white point of the last two matrices is exactly equal to the White Point of the illuminant, D53, which is instead [0.9593 1.0000 0.8833] in the 380-730nm range we are using in this series of articles with CIE (2012) 2-deg “physiologically relevant” CMFs. These are small errors in this case but they can be substantial in others, causing unwelcome color casts.

White Preserving Normal Equation

The solution is to correct the Normal Equation by adding a Lagrange multiplier in order to force the sum of the rows to be equal to the White Point of the illuminant. I used the formula in the appendix of the relevant CIC97 paper by Finlayson and Drew,^[7] modified to allow specifying the White Point. You can find the code in a Matlab / Octave function referenced in the notes at the bottom.

Figure 3. Code to modify the Normal Equation solution so that it will produce a white point preserving matrix. Data is in row-vector Nx3 format, so take the transpose of M to show it in Color Science.

The result is the least squares solution constrained to the White Point of the illuminant:

And in fact the sum of the rows now corresponds to illuminant D53’s White Point [0.9593 1.0000 0.8833], though this transform’s weakness is still that it does not take into consideration the sensitivity of the Human Visual System to different tones.
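The idea can be sketched with the textbook closed form for equality-constrained least squares, derived with one Lagrange multiplier per output channel. This is an assumption-laden illustration and not a transcription of the Finlayson and Drew formula or of the Matlab function mentioned above; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def white_preserving_lstsq(A, b, wp, white=(1.0, 1.0, 1.0)):
    """Least squares for A x = b subject to white @ x = wp.

    A (Nx3) and b (Nx3) are in row-vector format; the returned matrix is
    M = x^T, for use with column-vector (3xN) data.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    wp = np.asarray(wp, dtype=float)
    w = np.asarray(white, dtype=float)

    G = np.linalg.inv(A.T @ A)
    x_ls = G @ A.T @ b                       # unconstrained Normal Equation solution
    Gw = G @ w
    # Lagrange correction: moves x_ls just enough for white to map to wp.
    correction = np.outer(Gw, wp - w @ x_ls) / (w @ Gw)
    return (x_ls + correction).T

# Synthetic stand-in data; the matrix and patches are placeholders.
rng = np.random.default_rng(1)
M_gen = np.array([[0.60,  0.25,  0.10],
                  [0.25,  0.85, -0.11],
                  [0.02, -0.15,  1.00]])
rgb = rng.uniform(0.05, 1.0, size=(3, 24))
XYZ = M_gen @ rgb + rng.normal(0.0, 0.002, size=(3, 24))

WP_D53 = np.array([0.9593, 1.0000, 0.8833])
M_wpp = white_preserving_lstsq(rgb.T, XYZ.T, WP_D53)
row_sums = M_wpp @ np.ones(3)
```

The constraint costs a little in residual error compared with the unconstrained solution, which is the trade-off discussed in the text.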

White Preserving Optimization

However, we can also set up the optimization algorithm to only look for solutions with the sum of the rows equal to the White Point. In fact this effectively reduces the number of unknowns in the 3×3 matrix from 9 to 6, since we are adding three equations specifying the weighted sums of the variables as three constants. Should you be interested you can see how this is accomplished in the Matlab code in the notes. Here is the relative White Preserved optimized matrix $M$

The sum of the rows does indeed correspond to the D53 White Point, [0.9593 1.0000 0.8833]. By doing so we have lost a tiny bit of fidelity: the CIEDE2000 average difference over the chosen 19 patches is now 0.954 instead of 0.950, immaterial in this case (though it can be material in other cases). These are the residual errors for each of the CC24 patches after white point preserving optimization

Figure 4. Residual CIEDE2000 error by the selected transform in each of the patches of the BabelColor.com 30 database ColorChecker 24. D53 illuminant and Nikon D5100 plus mystery lens from Darrodi et al.

In this case the WPP optimization is trivially different from the freely obtained one – but this is not always the case.
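One way to picture the reduction from 9 unknowns to 6 is to let the optimizer vary only the first two coefficients of each row and derive the third from the White Point. This parameterization is an assumption made for illustration (the Matlab code in the notes may do it differently), and the function name is hypothetical.

```python
import numpy as np

def matrix_from_free_params(p, wp):
    """Build a 3x3 matrix whose rows sum to wp from 6 free coefficients.

    p holds the first two coefficients of each row, in row-major order;
    the third coefficient of each row is whatever is left to reach wp.
    """
    first_two = np.asarray(p, dtype=float).reshape(3, 2)
    third = np.asarray(wp, dtype=float) - first_two.sum(axis=1)
    return np.column_stack([first_two, third])

WP_D53 = np.array([0.9593, 1.0000, 0.8833])

# Six placeholder parameters, the only values an optimizer would need to vary.
p = np.array([0.60, 0.25,
              0.25, 0.85,
              0.02, -0.15])

M = matrix_from_free_params(p, WP_D53)
white_point = M @ np.ones(3)
```

Every matrix produced this way maps $[1 1 1]^T$ to the chosen White Point, so the search never leaves the white point preserving set regardless of which cost function drives it.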