Now that we know how to create a 3×3 linear matrix to convert white balanced and demosaiced raw data into connection space – and where to obtain the 3×3 linear matrix to then convert it to a standard output color space like sRGB – we can take a closer look at the matrices and apply them to a real world capture chosen for its wide range of chromaticities.
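The two matrix steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `FORWARD_MATRIX` holds made-up placeholder coefficients (not those of any real camera), the connection space is taken to be XYZ with a D65 white point so that no chromatic adaptation is needed, and the output stays linear (the sRGB tone curve is left out). `XYZ_TO_SRGB` is the commonly published XYZ (D65) to linear sRGB matrix.

```python
import numpy as np

# Hypothetical camera matrix: white balanced, demosaiced raw RGB -> XYZ (D65).
# Placeholder values for illustration only; rows sum to roughly the D65 white
# point so that a neutral raw triplet (1, 1, 1) maps to white.
FORWARD_MATRIX = np.array([
    [0.70,  0.15,  0.10],
    [0.28,  0.85, -0.13],
    [0.02, -0.20,  1.27],
])

# Commonly published matrix from XYZ (D65) to linear sRGB.
XYZ_TO_SRGB = np.array([
    [ 3.2404542, -1.5371385, -0.4985314],
    [-0.9692660,  1.8760108,  0.0415560],
    [ 0.0556434, -0.2040259,  1.0570000],
])


def apply_matrix(pixels, matrix):
    """Apply a 3x3 matrix to an array of RGB triplets of shape (..., 3)."""
    return np.asarray(pixels, dtype=float) @ matrix.T


def raw_to_linear_srgb(raw_rgb):
    """Raw RGB -> connection space -> linear sRGB, as two linear steps."""
    xyz = apply_matrix(raw_rgb, FORWARD_MATRIX)
    return apply_matrix(xyz, XYZ_TO_SRGB)


# Because both steps are linear they can also be folded into one matrix.
COMBINED_MATRIX = XYZ_TO_SRGB @ FORWARD_MATRIX
```

Folding the two matrices into `COMBINED_MATRIX` gives the same result as applying them one after the other. Keeping them separate makes it easier to swap the output color space without touching the camera-specific part.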
We understand from the previous article that rendering color during raw conversion essentially means mapping raw data represented by RGB triplets into a standard color space via a Profile Connection Space in a two-step process.
The process I will use first white balances and demosaics the raw data, which at that stage we will refer to as , followed by converting it to Profile Connection Space through linear transformation by an unknown ‘Forward Matrix’ (as DNG calls it) of the form
Determining the nine coefficients of this matrix is the main subject of this article. Continue reading Color: Determining a Forward Matrix for Your Camera
How do we translate captured image information into a stimulus that will produce the appropriate perception of color? It’s actually not that complicated.
Recall from the introductory article that a photon absorbed by a cone type (, or ) in the fovea produces the same stimulus to the brain regardless of its wavelength. Take the example of the eye of an observer which focuses on the retina the image of a uniform object with a spectral photon distribution of 1000 photons/nm in the 400 to 720nm wavelength range and no photons outside of it.
Because the system is linear, cones in the foveola will weigh the incoming photons by their relative sensitivity (probability) functions and add the result up to produce a stimulus proportional to the area under the curves. For instance a cone will see about 321,000 photons arrive and produce a relative stimulus of about 94,700, the weighted area under the curve:
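The weighted sum described above can be sketched numerically. The flat 1000 photons/nm distribution over 400 to 720nm comes from the example. The Gaussian sensitivity curve, with its peak and width, is a stand-in assumed purely for illustration, so the resulting stimulus will not match the figure quoted for the real cone curve.

```python
import numpy as np

# Wavelength samples at 1 nm spacing, 400..720 nm inclusive (321 samples).
wavelengths = np.arange(400, 721)

# Uniform spectral photon distribution from the example: 1000 photons/nm.
photons_per_nm = np.full(wavelengths.shape, 1000.0)


def gaussian_sensitivity(wl_nm, peak_nm, width_nm):
    """Hypothetical relative sensitivity curve peaking at 1.0 (an assumption,
    not a measured cone fundamental)."""
    return np.exp(-0.5 * ((wl_nm - peak_nm) / width_nm) ** 2)


sensitivity = gaussian_sensitivity(wavelengths, peak_nm=570.0, width_nm=50.0)

# Photons arriving at the cone: the sum over all 1 nm bins.
total_photons = photons_per_nm.sum()

# Relative stimulus: each bin weighted by the sensitivity, then summed,
# i.e. the area under the weighted curve.
stimulus = (photons_per_nm * sensitivity).sum()
```

With 1nm bins the total comes to 321,000 photons, in line with the figure in the text. The stimulus is always smaller because the relative sensitivity never exceeds one.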
This article will set the stage for a discussion on how pleasing color is produced during raw conversion. The easiest way to understand how a camera captures and processes ‘color’ is to start with an example of how the human visual system does it.
An Example: Green
Light from the sun strikes leaves on a tree. The foliage of the tree absorbs some of the light and reflects the rest diffusely towards the eye of a human observer. The eye focuses the image of the foliage onto the retina at its back. Near the center of the retina there is a small circular area called the foveola which is dense with light receptors of well defined spectral sensitivities called cones. Information from the cones is pre-processed by neurons and carried by nerve fibers via the optic nerve to the brain where, after some additional psychovisual processing, we recognize the color of the foliage as green.
What are the basic low level steps involved in raw file conversion? In this article I will discuss what happens under the hood of digital camera raw converters in order to turn raw file data into a viewable image, a process sometimes referred to as ‘rendering’. We will use the following raw capture to show how image information is transformed at every step along the way:
Rendering = Raw Conversion + Editing
This post will continue looking at the spatial frequency response measured by MTF Mapper off slanted edges in DPReview.com raw captures and relative fits by the ‘sharpness’ model discussed in the last few articles. The model takes the physical parameters of the digital camera and lens as inputs and produces theoretical directional system MTF curves comparable to measured data. As we will see the model seems to be able to simulate these systems well – at least within this limited set of parameters.
The following fits refer to the green channel of a number of interchangeable lens digital camera systems with different lenses, pixel sizes and formats – from the current Medium Format 100MP champ to the 1/2.3″ 18MP sensor size also sometimes found in the best smartphones. Here is the roster with the cameras as set up:
The series of articles starting here outlines a model of how the various physical components of a digital camera and lens can affect the ‘sharpness’ – that is the spatial resolution – of the images captured in the raw data. In this one we will pit the model against MTF curves obtained through the slanted edge method from real world raw captures both with and without an anti-aliasing filter.
With a few simplifying assumptions, which include ignoring aliasing and phase, the spatial frequency response (SFR or MTF) of a photographic digital imaging system near the center can be expressed as the product of the Modulation Transfer Function of each component in it. For a current digital camera these would typically be the main ones:
all in two dimensions Continue reading Taking the Sharpness Model for a Spin
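As a rough sketch of the 'product of components' idea, here is a one dimensional version with just two components: diffraction from a circular aperture and a square pixel aperture. The choice of components, the default parameter values and the function names are assumptions made for illustration. The full model works in two dimensions and includes more terms.

```python
import numpy as np


def diffraction_mtf(freq_cy_per_mm, wavelength_mm, f_number):
    """MTF of an aberration-free lens with a circular aperture."""
    cutoff = 1.0 / (wavelength_mm * f_number)  # diffraction extinction frequency
    s = np.clip(np.asarray(freq_cy_per_mm, dtype=float) / cutoff, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s ** 2))


def pixel_aperture_mtf(freq_cy_per_mm, width_mm):
    """MTF of a uniform square pixel aperture along one axis.
    np.sinc is the normalized sinc, sin(pi x) / (pi x)."""
    return np.abs(np.sinc(np.asarray(freq_cy_per_mm, dtype=float) * width_mm))


def system_mtf(freq_cy_per_mm, wavelength_mm=0.00055, f_number=5.6,
               pixel_width_mm=0.005):
    """System MTF as the product of the component MTFs."""
    return (diffraction_mtf(freq_cy_per_mm, wavelength_mm, f_number)
            * pixel_aperture_mtf(freq_cy_per_mm, pixel_width_mm))


# Evaluate from 0 up to the Nyquist frequency of a 5 micron pitch (100 cy/mm).
frequencies = np.linspace(0.0, 100.0, 101)
mtf_curve = system_mtf(frequencies)
```

Since every component MTF lies between zero and one, the product can never exceed the weakest component at any given frequency.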
We now know how to calculate the two dimensional Modulation Transfer Function of a perfect lens affected by diffraction, defocus and third order Spherical Aberration – under monochromatic light at the given wavelength and f-number. In digital photography however we almost never deal with light of a single wavelength. So what effect does an illuminant with a wide spectral power distribution, going through the color filter of a typical digital camera CFA before the sensor have on the spatial frequency responses discussed thus far?
Monochrome vs Polychromatic Light
Not much, it turns out. Continue reading A Simple Model for Sharpness in Digital Cameras – Polychromatic Light
Spherical Aberration (SA) is one key component missing from our MTF toolkit for modeling an ideal imaging system’s ‘sharpness’ in the center of the field of view in the frequency domain. In this article formulas will be presented to compute the two dimensional Point Spread and Modulation Transfer Functions of the combination of diffraction, defocus and third order Spherical Aberration for an otherwise perfect lens with a circular aperture.
Spherical Aberrations result because most photographic lenses are designed with quasi spherical surfaces that do not necessarily behave ideally in all situations. For instance, they may focus light on slightly different planes depending on whether the respective ray goes through the exit pupil closer or farther from the optical axis, as shown below:
This article will discuss a simple frequency domain model for an AntiAliasing (or Optical Low Pass) Filter, a hardware component sometimes found in a digital imaging system. The filter typically sits right on top of the sensing plane and its objective is to block as much of the aliasing and moiré creating energy above the Nyquist spatial frequency while letting through as much as possible of the real image forming energy below that, hence the low-pass designation.
In consumer digital cameras it is often implemented by introducing one or two birefringent plates in the sensor’s filter stack. This is how Nikon shows it for one of its DSLRs:
This article is a little esoteric so one may want to skip it unless one is interested in the underlying mechanisms that cause quantization error as photographic signal and noise approach the darkest levels of acceptable dynamic range in our digital cameras: one least significant bit in the raw data. We will use our simplified camera model and deal with Poissonian Signal and Gaussian Read Noise separately – then attempt to bring them together.
Whether the human visual system perceives a displayed slow changing gradient of tones, such as a vast expanse of sky, as smooth or posterized depends mainly on two well known variables: the Weber-Fechner Fraction of the ‘steps’ in the reflected/produced light intensity (the subject of this article); and spatial dithering of the light intensity as a result of noise (the subject of a future one).
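To get a feel for the size of these 'steps', the sketch below computes the Weber fraction, that is the relative change in intensity ΔL/L, between adjacent code values. It assumes an 8-bit file encoded with the standard sRGB tone curve and an ideal display. Both are assumptions for illustration rather than part of the discussion above.

```python
import numpy as np


def srgb_to_linear(v):
    """Decode sRGB-encoded values in [0, 1] to linear intensity."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)


def weber_fraction(code, bits=8):
    """Relative intensity step between `code` and `code + 1` (code >= 1)."""
    max_code = 2 ** bits - 1
    lower = srgb_to_linear(code / max_code)
    upper = srgb_to_linear((code + 1) / max_code)
    return float((upper - lower) / lower)


# Relative step size in the deep shadows versus the upper midtones.
shadow_step = weber_fraction(20)
highlight_step = weber_fraction(200)
```

Under these assumptions the relative steps are much larger in the shadows than in the highlights, which is one reason posterization tends to show up in dark gradients first.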
We’ve seen how information about a photographic scene is collected in the ISOless/invariant range of a digital camera sensor, amplified, converted to digital data and stored in a raw file. For a given Exposure the best information quality (IQ) about the scene is available right at the photosites, only possibly degrading from there – but a properly designed** fully ISO invariant imaging system is able to store it in its entirety in the raw data. It is able to do so because the information carrying capacity (photographers would call it the dynamic range) of each subsequent stage is equal to or larger than the previous one. Cameras that are considered to be (almost) ISOless from base ISO include the Nikon D7000, D7200 and the Pentax K5. All digital cameras become ISO invariant above a certain ISO, the exact value determined by design compromises.
In this article we’ll look at a class of imagers that are not able to store the whole information available at the photosites in one go in the raw file for a substantial portion of their working ISOs. In such a case the photographer chooses, by selecting different in-camera ISOs, which smaller subset of the full information available at the photosites to store in the raw data. Such cameras are sometimes improperly referred to as ISOful. Most Canon DSLRs fall into this category today, as do kings of darkness such as the Sony a7S or Nikon D5.
In the last few posts I have made the case that Image Quality in a digital camera is entirely dependent on the light Information collected at a sensor’s photosites during Exposure. Any subsequent processing – whether analog amplification and conversion to digital in-camera and/or further processing in-computer – effectively applies a set of Information Transfer Functions to the signal that when multiplied together result in the data from which the final photograph is produced. Each step of the way can at best maintain the original Information Quality (IQ) but in most cases it will degrade it somewhat.
IQ: Only as Good as at Photosites’ Output
This point is key: in a well designed imaging system** the final image IQ is only as good as the scene information collected at the sensor’s photosites, independently of how this information is stored in the working data along the processing chain, on its way to being transformed into a pleasing photograph. As long as scene information is properly encoded by the system early on, before being written to the raw file – and information transfer is maintained in the data throughout the imaging and processing chain – final photograph IQ will be virtually the same independently of how its data’s histogram looks along the way.
We know that the best Information Quality possible collected from the scene by a digital camera is available right at the output of the sensor and it will only be degraded from there. This article will discuss what happens to this information as it is transferred through the imaging system and stored in the raw data. It will use the simple language outlined in the last post to explain how and why the strategy for Capturing the best Information or Image Quality (IQ) possible from the scene in the raw data involves only two simple steps:
1) Maximizing the collected Signal given artistic and technical constraints; and
2) Choosing what part of the Signal to store in the raw data and what part to leave behind.
The second step is only necessary if your camera is incapable of storing the entire Signal at once (that is it is not ISO invariant) and will be discussed in a future article. In this post we will assume an ISOless imaging system.
Olympus just announced the E-M5 Mark II, an updated version of its popular micro Four Thirds E-M5 model, with an interesting new feature: its 16MegaPixel sensor, presumably similar to the one in other E-Mx bodies, has a high resolution mode where it gets shifted around by the image stabilization servos during exposure to capture, as they say in their press release:
‘resolution that goes beyond full-frame DSLR cameras. 8 images are captured with 16-megapixel image information while moving the sensor by 0.5 pixel steps between each shot. The data from the 8 shots are then combined to produce a single, super-high resolution image, equivalent to the one captured with a 40-megapixel image sensor.’
A great idea that could give a welcome boost to the ‘sharpness’ of this handy system. This preliminary test shows that the E-M5 mk II 64MP High-Res mode gives in this case a 10-12% advantage in MTF50 linear spatial resolution compared to the Standard Shot 16MP mode. Plus it apparently virtually eliminates the possibility of aliasing and moiré. Great stuff, Olympus.
So, is it true that a Four Thirds lens needs to be about twice as ‘sharp’ as its Full Frame counterpart in order to be able to display an image of spatial resolution equivalent to the larger format’s?
It is, because of the simple geometry I will describe in this article. In fact with a few provisos one can generalize and say that lenses from any smaller format need to be ‘sharper’ by the ratio of their sensor linear sizes in order to produce the same linear resolution on same-sized final images.
This is one of the reasons why Ansel Adams shot 4×5 and 8×10 – and I would too, were it not for logistical and pecuniary concerns.
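The geometry behind the 'about twice as sharp' claim can be checked with simple arithmetic: resolution on the sensor in lp/mm becomes resolution in the final image, in line pairs per picture height, once multiplied by the sensor height. The sketch below uses nominal sensor heights of 24mm for Full Frame and 13mm for Four Thirds. The function names and the 50 lp/mm reference are illustrative assumptions.

```python
FULL_FRAME_HEIGHT_MM = 24.0
FOUR_THIRDS_HEIGHT_MM = 13.0


def lp_per_picture_height(lp_per_mm, sensor_height_mm):
    """Convert resolution on the sensor (lp/mm) to line pairs per picture height."""
    return lp_per_mm * sensor_height_mm


def required_lp_per_mm(reference_lp_per_mm, reference_height_mm, target_height_mm):
    """lp/mm needed on the target format to match the reference format
    when both images are displayed at the same size."""
    return reference_lp_per_mm * reference_height_mm / target_height_mm


# A Full Frame system delivering 50 lp/mm on the sensor...
full_frame_lp_ph = lp_per_picture_height(50.0, FULL_FRAME_HEIGHT_MM)

# ...needs this many lp/mm from a Four Thirds system to look equally sharp.
four_thirds_lp_mm = required_lp_per_mm(50.0, FULL_FRAME_HEIGHT_MM,
                                       FOUR_THIRDS_HEIGHT_MM)
```

Here the ratio of sensor heights is 24/13, or about 1.85, hence 'about twice' for this pair of formats.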
Equivalence – as we’ve discussed, one of the fairest ways to compare the performance of two cameras of different physical formats, characteristics and specifications – essentially boils down to two simple realizations for digital photographers:
- metrics need to be expressed in units of picture height (or diagonal where the aspect ratio is significantly different) in order to easily compare performance with images displayed at the same size; and
- focal length changes proportionally to sensor size in order to capture identical scene content on a given sensor, all other things being equal.
The first realization should be intuitive (future post). The second one is the subject of this post: I will deal with it through a couple of geometrical diagrams.
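The second realization can also be checked numerically before getting to the diagrams. The sketch below scales focal length by the ratio of sensor diagonals and confirms that the diagonal angle of view stays the same. It assumes a simple rectilinear lens focused at infinity and nominal sensor dimensions. The function names are illustrative.

```python
import math


def diagonal_mm(width_mm, height_mm):
    """Sensor diagonal from its width and height."""
    return math.hypot(width_mm, height_mm)


def equivalent_focal_length(focal_mm, from_diag_mm, to_diag_mm):
    """Focal length on the target format that captures the same scene content."""
    return focal_mm * to_diag_mm / from_diag_mm


def diagonal_angle_of_view_deg(focal_mm, diag_mm):
    """Diagonal angle of view of a rectilinear lens focused at infinity."""
    return math.degrees(2.0 * math.atan(diag_mm / (2.0 * focal_mm)))


FULL_FRAME_DIAG_MM = diagonal_mm(36.0, 24.0)
FOUR_THIRDS_DIAG_MM = diagonal_mm(17.3, 13.0)

# A 50mm lens on Full Frame and its Four Thirds equivalent.
focal_four_thirds = equivalent_focal_length(50.0, FULL_FRAME_DIAG_MM,
                                            FOUR_THIRDS_DIAG_MM)
aov_full_frame = diagonal_angle_of_view_deg(50.0, FULL_FRAME_DIAG_MM)
aov_four_thirds = diagonal_angle_of_view_deg(focal_four_thirds,
                                             FOUR_THIRDS_DIAG_MM)
```

With these dimensions the Four Thirds equivalent of a 50mm Full Frame lens comes out very close to 25mm, and both combinations share the same diagonal angle of view.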
The key variable as far as the tolerances required to position the lens for accurate focus are concerned (at least in a simplified ideal situation) is an appropriate approximate distance between the desired in-focus plane and the actual in-focus plane (which we are assuming is slightly out of focus). It is a distance in the direction perpendicular to the x-y plane normally used to describe position of the image on it, hence the designation delta z, or dz in this post. The lens’ allowable focus tolerance is therefore +/- dz, which we will show in this post to vary as the square of the format’s diagonal. Continue reading Focus Tolerance and Format Size
Is MTF50 a good proxy for perceived sharpness? It turns out that the spatial frequencies that are most closely related to our perception of sharpness vary with the size and viewing distance of the displayed image.
For instance if an image captured by a Full Frame camera is viewed at ‘standard’ distance (that is a distance equal to its diagonal) the portion of the MTF curve most representative of perceived sharpness appears to be around MTF90. Continue reading MTF50 and Perceived Sharpness