Connecting Photographic Raw Data to Tristimulus Color Science

Absolute Raw Data

In the previous article we determined that the three r_{_L}g_{_L}b_{_L} values recorded in the raw data at the center of the image plane – in units of Data Numbers per pixel, by a digital camera and lens, as a function of absolute spectral radiance L(\lambda) at the lens – can be estimated as follows:

(1)   \begin{equation*} r_{_L}g_{_L}b_{_L} =\frac{\pi p^2 t}{4N^2} \int\limits_{380}^{780}L(\lambda) \odot SSF_{rgb}(\lambda)  d\lambda \end{equation*}

with subscript _L indicating absolute-referred units and SSF_{rgb} the three system Spectral Sensitivity Functions.  In this series of articles \odot is wavelength-by-wavelength multiplication (what happens to the spectrum of light as it progresses through the imaging system) and the integral just means the area under each of the three resulting curves (integration is what the pixels do during exposure).  Together they represent an inner or dot product.  All variables in front of the integral were previously described and can be considered constant for a given photographic setup.
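A minimal numeric sketch of Equation (1) in Python, with toy sampled spectra and an illustrative pixel pitch, exposure time and f-number (all values hypothetical; a real computation would use measured L(\lambda) and SSF data):

```python
import math

# Hypothetical photographic setup (all values illustrative)
p = 4.78e-6      # pixel pitch, m
t = 1 / 30       # exposure time, s
N = 4            # f-number
dl = 5           # wavelength step, nm (380-780nm sampled every 5nm)
n = 81           # number of samples

# Toy sampled spectra: spectral radiance L(lambda) and one SSF channel,
# both sampled at the same wavelengths
L   = [1.0e-3] * n           # W/sr/m^2/nm, flat for simplicity
ssf = [0.5] * n              # unitless toy sensitivity curve

# Equation (1): wavelength-by-wavelength product, then area under the curve
area = sum(l * s for l, s in zip(L, ssf)) * dl
raw = math.pi * p**2 * t / (4 * N**2) * area
```

The discrete sum times the wavelength step stands in for the integral, and `zip` performs the \odot wavelength-by-wavelength product; a real SSF would also fold in the conversion to Data Numbers.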

The Connection to Luminance and Exposure

Radiance L_e represents the sum (or the integral) of the power contributed by  every small constant-width wavelength interval of spectral radiance L(\lambda) within the visible spectrum.  It has units of W/sr/m^2.

In Color Science and photography the corresponding photometric quantity is Luminance L_v, with watts W in the units of radiance replaced by lumens lm.  Lumens are just watts as perceived by the Human Visual System, which has spectral efficiency V(\lambda), see Figure 1 below.  Since in the SI System lm/sr is a candela (cd), the total luminance L_v contained in the visible wavelength range is usually expressed in cd/m^2 – a unit photographers are familiar with:

(2)   \begin{equation*} L_v = K_m \int\limits_{380}^{780} L(\lambda)\odot V(\lambda) d\lambda, \text{      $cd/m^2$} \end{equation*}

where V(\lambda) is the Standard Observer photopic luminous efficiency function shown in Figure 1 below and K_m is the luminous efficiency conversion constant, about 683.002 lm/W.[1]

For instance, if radiance L(\lambda) were equal to a spectrally flat stimulus of 1 mW/sr/m^2/nm, Luminance L_v would be equal to 77.2 cd/m^2.
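The arithmetic behind that figure can be checked directly from Equation (2); the area under V(\lambda), taken here as about 113.0 nm for the CIE (2012) function, is an assumed value chosen to be consistent with the stated result:

```python
K_m = 683.002          # lm/W, luminous efficiency conversion constant
L_flat = 1.0e-3        # W/sr/m^2/nm, spectrally flat radiance
area_V = 113.04        # nm, assumed area under the CIE (2012) V(lambda) curve

# Equation (2) for a flat spectrum: the integral factors into L_flat * area_V
L_v = K_m * L_flat * area_V   # cd/m^2, about 77.2
```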

It is obvious that if one of the Spectral Sensitivity Functions in Equation (1) were equal to V(\lambda) then by definition the respective  integral would resolve to Luminance L_v.  Taking the area of a pixel p^2 out of the equation, we would have for the given channel

(3)   \begin{equation*} H_v = \frac{\pi}{4} \frac{t}{N^2}L_v, \text{      $lx$-$s$} \end{equation*}

which if L_v is in cd/m^2 is the equation for photographic Luminous Exposure H_v in lx-s – for a camera and lens in the center of the field of view, ignoring in this context vignetting and lens transmission losses since we said that L(\lambda) here represents radiance at the front of the lens.  This is the basis of the Exposure Value system.

For instance with shutter speed t = 1/30 of a second, f-number N = 4 and L_v as above, Exposure H_v would be equal to 0.1263 lx-s.  For reference, many digital cameras today start to clip around one lx-s at ISO 100 in the raw data.
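Both numbers follow from Equation (3); a quick Python check:

```python
import math

L_v = 77.2        # cd/m^2, from the flat-spectrum example above
t = 1 / 30        # shutter speed, s
N = 4             # f-number

# Equation (3): photographic Luminous Exposure
H_v = (math.pi / 4) * t / N**2 * L_v   # lx-s, about 0.1263
```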

The Connection to Tristimulus XYZ Color Space

Conveniently it turns out that the \bar{y}(\lambda) Color Matching Function of the CIE 2° Standard Observer is in fact chosen to be identical or very similar to V(\lambda), as can be seen in Figure 1 below.  This is the reason why the Y channel in the CIE XYZ color space is often referred to as Luminance.  Ideally it is Luminance – usually in relative units, though it can be made absolute with appropriate scaling.

This tidbit reminds us that one of the central tenets of photographic Color Science is projecting spectral distributions L(\lambda) to the tristimulus XYZ color space as follows:

(4)   \begin{equation*} XYZ  = K  \int\limits_{380}^{780}L(\lambda) \odot CMF_{xyz}(\lambda) d\lambda \end{equation*}

with K an appropriately chosen constant and CMF_{xyz}(\lambda) the three CIE Color Matching Functions \bar{x}(\lambda), \bar{y}(\lambda) and \bar{z}(\lambda) in turn.  The integral, typically replaced by discrete summation in practice, results in one value for each of the three curves, designated in turn as tristimulus value X, Y and Z.

Figure 1. CIE (2012) 2° XYZ “physiologically-relevant” Color Matching Functions and photopic Luminous Efficiency Function (V) from cvrl.org, the Colour & Vision Research Laboratory at UCL. The items in the legend refer to CMF_X, CMF_Y and CMF_Z in the text respectively.  After normalization to unit area the set of curves is scaled for display so that Y = V peaks at 1.

Clearly if K is made equal to the luminous efficiency conversion constant, 683.002 lumens/watt, then Y is Luminance in cd/m^2 by definition, per Equation (2).

On the other hand in imaging it is typically more convenient to choose K so that Y becomes equal to 1 (or 100%) when L represents the spectrum of the brightest diffuse white under the given illuminant.  We accomplish this by making K the inverse of Luminance from a perfect diffuser L_{mdw}(\lambda), reflecting or transmitting the spectral distribution of the illuminant:

(5)   \begin{equation*} K_{Y} = \frac{1}{\int L_{mdw}(\lambda)\odot \bar{y} \text{ }d\lambda} \end{equation*}

Here K_Y is a single scalar and _{mdw} stands for maximum diffuse white.

CIE CMFs are Area Normalized to Begin with

CMF_{xyz} in this article represents the CIE (2012) 2-deg XYZ “physiologically-relevant” color matching functions, fine tuned over decades of psychophysical testing with live Observers.  We will drop the _{xyz} and _{rgb} subscripts in CMF and SSF from now on to avoid cluttering the relevant equations; just remember that each represents three curves.

When used with Equation (4) they are the most important functions in colorimetry because if two spectral stimuli result in the same XYZ tristimulus values they should look the same to the average Observer. And if they are sufficiently different they should look different.

Note that the CMF curves in Figure 1 are in arbitrary units, meaning that they have been normalized individually so that their areas are all equal under flat spectral radiance, aka equi-energy stimulus S_e.  They have all also been scaled by a common factor so that \bar{y}(\lambda) peaks at one, just like luminous efficiency function V(\lambda) does.  It is clear however that this scaling becomes irrelevant after the normalization provided by K_{Y} in Equation (5): for instance under maximum diffuse equi-energy stimulus the tristimulus values for XYZ will be [1 1 1], Standard Illuminant S_e's so-called White Point as we are used to seeing it in imaging.
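The effect of equal-area normalization can be demonstrated with toy curves in Python (the three shapes below are arbitrary stand-ins, not real CMFs):

```python
dl = 5   # wavelength step, nm
n = 81   # samples covering 380-780nm

# Arbitrary positive curve shapes, for illustration only
xbar = [max(0.0, 1 - ((i - 25) / 15) ** 2) for i in range(n)]
ybar = [max(0.0, 1 - ((i - 40) / 20) ** 2) for i in range(n)]
zbar = [max(0.0, 1 - ((i - 55) / 15) ** 2) for i in range(n)]

def unit_area(curve):
    # Normalize a sampled curve so the area under it equals one
    a = sum(curve) * dl
    return [v / a for v in curve]

cmf = [unit_area(c) for c in (xbar, ybar, zbar)]

# Under flat (equi-energy) stimulus S_e the three integrals resolve
# to the three equal areas, i.e. the White Point [1 1 1]
Se = [1.0] * n
XYZ = [sum(l * c for l, c in zip(Se, curve)) * dl for curve in cmf]
```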

SSFs are Area Normalized by White Balancing

To make the two sets of functions intuitively comparable, Spectral Sensitivity Functions can also be normalized so that the areas under their curves are all equal to one for radiance from the illuminant after maximum diffuse reflection, L_{mdw}(\lambda).  Unlike for CMF, however, the areas under the three SSF curves of our imaging systems do not usually start out equal, so this time we achieve the objective by dividing each function by the area under its own curve:

(6)   \begin{equation*} K_{rgb} = \frac{1}{\frac{\pi p^2 t}{4N^2} \int L_{mdw}(\lambda) \odot SSF(\lambda) d\lambda} \end{equation*}

Collecting that normalization and all constants in Equation (1) into K_{rgb} we see that it evaluates to three scalars under the given spectral radiance, one each for the \bar{r}(\lambda), \bar{g}(\lambda) and \bar{b}(\lambda) functions of the SSF.  These are the multipliers that are applied to the respective rgb color channels as part of the operation normally referred to as ‘white balance’ during raw conversion.
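A sketch of the white balance multipliers per Equation (6), with hypothetical SSF curves and illuminant (toy numbers throughout, constants in front of the integral folded into the responses):

```python
dl = 5   # wavelength step, nm
n = 81   # samples covering 380-780nm

# Toy un-normalized SSF channels and a hypothetical spectrum of the
# illuminant after maximum diffuse reflection
ssf_r = [0.2] * n
ssf_g = [0.5] * n
ssf_b = [0.3] * n
L_mdw = [1.0 + 0.002 * i for i in range(n)]

# Raw channel responses under maximum diffuse white
raw = [sum(l * s for l, s in zip(L_mdw, c)) * dl
       for c in (ssf_r, ssf_g, ssf_b)]

# Equation (6): one multiplier per channel -- the white balance multipliers
K_rgb = [1.0 / r for r in raw]

# White-balanced raw values for maximum diffuse white come out [1 1 1]
balanced = [k * r for k, r in zip(K_rgb, raw)]
```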

The normalized forms of Equations (1) and (4) allow us to concentrate on the shape and position of the individual functions as opposed to their amplitude.  This is what the Spectral Sensitivity Functions of the  Nikon D5100 camera and lens shown at the bottom of the last article look like after normalization by  K_{rgb} under equi-energy stimulus S_e (scaled, for comparison only, by the same CIE factor as CMFs were subjected to in Figure 1):

Figure 2. Area normalized Spectral Sensitivity Functions of a Nikon D5100 with an unknown lens as measured by Darrodi et al at the National Physical Laboratory, London, here scaled so that CMF_Y in Figure 1 and SSF_G have the same area.  Legend items refer to SSF_R, SSF_G and SSF_B in the text respectively.

Trichromatic Equation (1), producing rgb raw values, then takes the same form as Equation (4), producing XYZ tristimulus values:

(7)   \begin{equation*} rgb = K_{rgb}\int\limits_{380}^{780} L(\lambda)\odot SSF(\lambda)  d\lambda \end{equation*}

The rgb raw values produced by Equation (7) under maximum diffuse white S_e will then also be equal to [1 1 1] after normalization by K_{rgb}, as seen in Figure 3 below.

In fact they will always be [1 1 1] after white balance independently of the illuminant in L_{mdw} because we defined K_{rgb} as a function of all three \bar{r}, \bar{g}, \bar{b} curves.  This is not the case for XYZ values normalized by single scalar K_Y – so while Y will be equal to one after normalization, X and Z will vary with the illuminant.  Their values will be those of the illuminant’s White Point.

We are now firmly in the relative domain, having scaled each CMF and SSF curve individually at will, effectively giving up on absolute values.  This is ok because the curves are linearly independent (they are not mixed – yet) and because working with relative units simplifies the process considerably in photography.

Maximum Diffuse White at Y = g = 1

So for a given spectral radiance from the scene we get two sets of triplets: CIE XYZ tristimulus reference values per equation (4), and rgb trichromatic raw data per equation (7).  The absolute energy of radiance L(\lambda) determines the linear range of responses in both systems, which under standard normalizations go from a minimum value of 0 with no light to a value of 1 in the Y and  g  channels with maximum diffuse white L_{mdw}(\lambda) as the stimulus.

One is sometimes taken to mean 100(%) in Color Science, in which case the range of values to diffuse white is shown from 0 to 100 instead of from 0 to 1.  In image processing it is more convenient to use 1 because it simplifies the math, so that’s what we’ll do here.

Photographers and automatic exposure meters typically attempt to choose parameters for their captures that will result in maximum diffuse white landing somewhere around full scale in the raw data – so rgb intensities at clipping, say around 15000 DN for a current bias-subtracted 14-bit camera, get normalized to 1 by K_{rgb}.  Continuing the previous example, with a typical digital camera that clips around 1 lx-s at ISO 100, the Exposure of 0.1263 lx-s produced by Luminance of 77.2 cd/m^2 would then land around 12.6% of full scale in the raw data.
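In Python, continuing with the assumed 1 lx-s clipping point and an illustrative 15000 DN full scale:

```python
H_v = 0.1263           # lx-s, Exposure from the earlier example
H_clip = 1.0           # lx-s, assumed clipping Exposure at ISO 100
full_scale_DN = 15000  # bias-subtracted 14-bit clipping level, illustrative

fraction = H_v / H_clip          # about 12.6% of full scale
dn = fraction * full_scale_DN    # about 1895 DN in the raw data
```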

Unattainable Perfect Color

For ‘exact’ CIE color the response of the imaging system, rgb raw intensities, should mimic as closely as possible XYZ values under the same stimulus.   It is useful to normalize CMF and SSF by K_{Y} and K_{rgb} , shown as CMF' and SSF' below respectively, because it brings them to a similar scale and makes them easier to compare visually.   Equations (4) and (7) would then ideally produce identical results:

(8)   \begin{equation*} \int\limits_{380}^{780} L(\lambda)\odot CMF'(\lambda) d\lambda = \int\limits_{380}^{780} L(\lambda)\odot SSF'(\lambda) d\lambda \end{equation*}

with ' indicating normalization as discussed.  Clearly if SSF'(\lambda) were identical to CMF'(\lambda) we would have perfect color out-of-camera.   But it is obvious by comparing their plots under equi-energy stimulus S_e below that the two sets of normalized curves have quite different shapes and also appear somewhat shifted with respect to one another.

Figure 3. Left: CIE (2012) XYZ “physiologically relevant” Colour Matching Functions sampled every nm; they are scaled by a common factor so that the \bar{y} curve results in Y = 1 under illuminant S_e. Right: Nikon D5100 with unknown lens Spectral Sensitivity Functions, sampled every nm, each curve normalized so that its area also adds to 1 under the same stimulus.

On the other hand there are, not coincidentally, some similarities: both sides of Equation (8) resolve to [1 1 1] under equi-energy illuminant S_e; the curves are all mostly contained in the 400-700nm wavelength range; and each curve has a main peak more or less spaced out from the others (the second minor hump in \bar{x}(\lambda) is a byproduct of the math, see the next article).

I have never seen SSF' that look exactly like CMF' in current consumer cameras, in part because of physical limitations of sensors like saturation and noise, and in part because they need to be able to work in many different conditions, such as in candle light (much more red energy than blue) as well as on a bright sunny day on a Nordic glacier (vice versa).

In order to minimize the possibility of clipping and/or noise in rgb, it is usually useful to have a more efficient green channel in the center acting like the fulcrum of a seesaw – with weaker red and blue at either end moving up and down in opposite directions depending on scene and illuminant (refer to Figure 4 in the previous article). This forces other compromises that result in SSF curves different in shape and more spread out than the ideal.

Let’s Compromise

So next best is to try to minimize differences between the two sets of three curves:  since the system is assumed to be linear, it should be possible to obtain an improved approximation to CMF' by using a linear combination of the three SSF' under a given spectral stimulus.  Linear algebra tells us that we can find a 3×3 matrix M that minimizes the square of differences between the resulting curves so that approximately

(9)   \begin{equation*} L(\lambda)\odot CMF' \approx M * [L(\lambda)\odot SSF'(\lambda)] \end{equation*}

with * indicating matrix multiplication.   Absolute radiance can be replaced by relative spectral radiance L'(\lambda) because L(\lambda) appears on both sides of Equation (9) so its absolute strength cancels out: what remains is just its spectral signature, often normalized for convenience.  For instance if it represents just irradiance from the light source it is often normalized to have a value of one at the mean wavelength in the working range, or alternatively at 560nm.

After pixel integration we get the equivalent of Equation (8)

(10)   \begin{equation*} \int\limits_{380}^{780} L'(\lambda)\odot CMF'(\lambda) \approx  M * \int\limits_{380}^{780} L'(\lambda)\odot SSF'(\lambda) \end{equation*}

d\lambda‘s omitted because they can be assumed to be the same (and to make the Equation fit on one line:).  Equation (10), based on triplets of trichromatic values, is looser and more forgiving than Equation (9), which instead works on full spectra – because it also allows for metamers.  For a given camera and set of Spectral Sensitivity Functions we can solve for M as a function of the relative spectral radiance L'(\lambda) at the lens:

(11)   \begin{equation*} M_{(L'_{\lambda})} = [L'(\lambda) CMF'(\lambda)] * [L'(\lambda) SSF'(\lambda)]^{-1} \end{equation*}

with ^{-1} the inverse – or pseudo-inverse when the array is not square, as with full spectra – of the array in the square brackets and * matrix multiplication.  The other products are either element-wise or ‘dot’ depending on whether Equation (9) or (10) is applied respectively.   Clearly the resulting matrices will be similar but not exactly equal in the two cases, having minimized two slightly different functions from the same set of data: least-square wavelength-by-wavelength differences between the curves themselves vs differences between the relative integrated tristimulus values.
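A sketch of the trichromatic form of Equation (11) in pure Python, using three hypothetical training stimuli so that the 3×3 system inverts exactly (a real solution would least-square fit over full spectra or many patches, and the triplet values below are made up for illustration):

```python
# Triplets stored column-wise: each column of A holds the XYZ triplet of
# one training stimulus, each column of B the corresponding rgb triplet
A = [[0.9, 0.3, 0.2],
     [0.4, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
B = [[0.8, 0.2, 0.1],
     [0.3, 0.9, 0.2],
     [0.1, 0.1, 0.8]]

def inv3(m):
    # 3x3 inverse via the adjugate (cofactor) matrix
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[v / det for v in row] for row in adj]

def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

# Equation (11), trichromatic form: M = A * B^-1
M = matmul(A, inv3(B))

# Sanity check: M maps the rgb triplets back onto the XYZ triplets
recon = matmul(M, B)
```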

Note that there would also be a closed form solution for matrix M even if we hadn’t normalized CMF and SSF curves by their respective constants K_{Y} and K_{rgb} – the matrix would simply absorb the constants and work happily with the functions as they are before normalization (as we will see in the article on Linear Color Transforms).

Transform matrix M above is optimized for the spectral distribution of one radiance only, L'(\lambda).  Assuming flat spectral stimulus S_e at the lens and applying Equation (11) with full spectra, matrix M_{(S_{e})} for the D5100 with unknown lens in Figure 2 would be

Using that matrix to project the relative SSF' to  the XYZ connection color space,  this is how close (or not) its curves come to  the respective CMF':

Figure 4. CIE 2deg CMFs (solid curves) and given SSFs (dotted) projected to XYZ by matrix multiplication, assuming just spectrally flat stimulus Se and the same scaling as Figures 1 and 2 for comparison only.

The projected green curve \bar{g} does a decent job of staying close to its \bar{y} counterpart, the other two less so.  Of course some system SSFs are better than others: in 2013 Jun Jiang et al at the Rochester Institute of Technology measured the Root Mean Square Error resulting from the SSFs of 28 digital cameras with unknown lenses; early Canons did better than most.[2]

Locally Generalized Compromises

More generally, and more relevantly to photography, we can obtain matrices optimized for specific scene types and light source combinations – say sunny spring mountain landscapes – by generating a set of radiances L'_n(\lambda) based on a number of expected principal reflectances R_n(\lambda) at the scene under one illuminant at a time.  Recall that luminance L is reflected illuminance E so that

(12)   \begin{equation*} L'_n(\lambda) = E'(\lambda) \odot R_n(\lambda)  k_n \end{equation*}

where E'(\lambda) is the relative Spectral Power Distribution of the illuminant at the scene (normalized to one at 560nm), R_n(\lambda) the n principal diffuse reflectances expected in such a scene (in % spectral efficiency) and k_n constants that we can drop because we are only interested in normalized L'_n(\lambda) which appears on both sides of the relevant equations.
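A minimal sketch of Equation (12), with a hypothetical illuminant SPD and two toy reflectances:

```python
n_wl = 81   # samples covering 380-780nm every 5nm

# Hypothetical relative Spectral Power Distribution of the illuminant
E = [1.0 - 0.001 * i for i in range(n_wl)]

# Two toy diffuse reflectances, as fractions (illustration only)
R = [[0.5] * n_wl,
     [0.2 + 0.005 * i for i in range(n_wl)]]

# Equation (12), constants k_n dropped: one expected radiance per
# reflectance, wavelength by wavelength
L_n = [[e * r for e, r in zip(E, refl)] for refl in R]
```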

By applying Equation (11) repeatedly to each L'_n(\lambda) expected at a scene type we can generate a single mean compromise color matrix, call it M_{(L'_{n\lambda})}, that minimizes potential errors for that scene under the chosen illuminant.  Such compromise matrices can be calculated for specific applications like landscapes, studio or non-studio portraits, art reproduction, B&W etc.  Tellingly, similarly named camera profiles are often found in current raw converters.

Using the 24 reflectances of an X-Rite ColorChecker target as measured by BabelColor,[3] matrix M_{(L'_{n\lambda})} from full spectra for the previously mentioned Nikon D5100 and unknown lens under standard illuminant S_e would be

Such Compromise Color Matrices can be easily generated for other illuminants and sets of reflectances, as can be deduced from the short Matlab/Octave example linked in the notes below.[4]

With M_{(L'_{n\lambda})}  in hand for the expected radiance mix, XYZ tristimulus values  can be approximated as follows from captured rgb demosaiced raw data normalized as discussed earlier,

(13)   \begin{equation*} XYZ \approx M_{(L'_{n\lambda})} * rgb \end{equation*}

where data triplets are in 3xN column vector notation and * indicates matrix multiplication.  Equation (13) works just as well on trichromatic vs the respective full spectral data of Equation (10) because of the distributive property of multiplication.[5]  If we keep track of all the constants we can obtain approximate results in absolute units X_LY_LZ_L, although this is not necessary in photographic imaging.
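Equation (13) applied to a couple of hypothetical white-balanced raw triplets in the 3xN column convention (the matrix values are made up for illustration):

```python
# Hypothetical compromise color matrix (illustrative values only)
M = [[0.9, 0.1, 0.0],
     [0.3, 0.8, -0.1],
     [0.0, 0.1, 1.0]]

# Two white-balanced raw triplets, stored column-wise as a 3x2 array
rgb = [[0.2, 1.0],
       [0.5, 1.0],
       [0.1, 1.0]]

# Equation (13): XYZ ~= M * rgb, one output column per input triplet
cols = len(rgb[0])
xyz = [[sum(M[r][k] * rgb[k][c] for k in range(3)) for c in range(cols)]
       for r in range(3)]
```

Note that the second input column is the neutral [1 1 1]: its output is simply the row sums of M, a point picked up again in the Aside below.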

An Aside

Matrices obtained by minimizing the square difference between spectral or trichromatic data can be used to convert rgb raw data triplets to XYZ tristimulus values because what is good for the 31+ Dimensional goose is good for the 3D gander.[6]   However, minimizing errors between the two sets of curves without taking into consideration the different color sensitivity of the Human Visual System throughout the spectrum may not be the best way to obtain optimal perceptual results.

For instance the attentive reader may have noticed by summing up the respective rows that neither matrix presented above results in the expected white point for a neutral input of [1 1 1], both having the objective of minimizing the sum of square errors in the mix of spectral radiances instead.  Though this approach may result in smaller overall deviations between the curves, it can produce the appearance of objectionable color casts.  More controlled and relevant results can be had by constrained regression[7] and by using well established perceptual color difference metrics, as outlined in the Color Transform article.

She may also have wondered why try to mimic the bumpy, synthetic XYZ color space instead of one with more affinity to the physical world, such as the Long, Medium and Short LMS space where Cone Fundamentals reside.  Next.
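The row-sum check mentioned in the Aside can be sketched as follows; the rescaling shown is a crude illustration of white point preservation, not the constrained regression of reference [7], and the matrix values are made up:

```python
# A hypothetical least-squares matrix whose rows do not sum to one
M = [[0.95, 0.12, -0.03],
     [0.28, 0.81, -0.05],
     [-0.02, 0.09, 1.05]]

# Feeding a neutral [1 1 1] returns the row sums -- here an off-white
# 'white point', i.e. a potential color cast
white = [sum(row) for row in M]

# Crude fix for illustration: rescale each row so it sums to one,
# forcing neutral inputs to map to neutral outputs
M_wp = [[v / s for v in row] for row, s in zip(M, white)]
neutral = [sum(row) for row in M_wp]
```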

 

Notes and References


1. A description of the photopic Luminosity Function can be found here.
2. Jun Jiang, Dengyu Liu, Jinwei Gu and Sabine Susstrunk. What is the Space of Spectral Sensitivity Functions for Digital Color Cameras?. IEEE Workshop on the Applications of Computer Vision (WACV), 2013.
3. The BabelColor page with the Excel spreadsheet containing the 24 spectral reflectances of pre-2014 ColorChecker24 targets, measured and averaged over 30 samples with different instruments, can be found here.
4.  The routine used to estimate a Compromise Color Matrix given SSFs, CMFs and expected illuminant and reflectances at the scene can be downloaded by clicking here.
5. It is easy to prove to oneself that the distributive property of multiplication implies that multiplying spectral curves by matrix M and then summing the result into a triplet is the same as summing the curves first and then multiplying the resulting triplets by M.
6. Just like the tristimulus and raw value triplets can be considered to constitute unique vectors representing color in 3D XYZ and RGB coordinate systems, continuous spectra can be considered to be vectors of infinite length/dimensions.  In practice it usually suffices to sample the visible wavelengths at least every 10 nm.  So sampling a given spectral curve at a minimum from  400 to 700nm every 10nm results in a vector of at least 31 values/dimensions.
7. For constrained regression resulting in white point preserving matrices see for instance Finlayson, Graham D. and Mark S. Drew: “White-Point Preservation Enforces Positivity.” Color Imaging Conference (1998).
8. In this article  guidance and solace came from Hunt’s excellent book: Measuring Color, Third Edition, R.W.G. Hunt. Fountain Press, 1998.
