Just in case anyone was wondering (I was), it turns out that my smartphone camera produces a better SMI color score off a ColorChecker Passport target than a full frame Nikon D610 DSLR.
My latest phone, a late 2017 incarnation of the LG V34, produces raw DNG files, so I went poking around. From what I could gather the sensor is most likely Sony’s IMX 234, 1/2.6″, Back Side Illuminated, stacked and based on the latest and cleanest Exmor RS technology. The sensor’s 1.12µm pixels produce 16MP raw files with 10-bit depth, which I understand to be typical for current phone cameras. Other features include phase detect AF, an electronic shutter with variable integration time, HDR, hot pixel suppression and raw noise reduction (ugh!) – plus a slew of video features. Continue reading Phone Camera Color ‘Accuracy’→
While checking some out-of-gamut tones on an xy Chromaticity Diagram I started to wonder how far apart two tones needed to be in order for an observer to notice a difference. Were the tones in the yellow and red clusters below discernible or would they be indistinguishable, all being perceived as the same ‘color’?
We’ve seen how humans perceive color in daylight as a result of three types of photoreceptors in the retina called cones that absorb wavelengths of light from the scene with different sensitivities to the arriving spectrum.
A photographic digital imager attempts to mimic the workings of cones in the retina by having different color filters arranged in an array (CFA) on top of its photoreceptors, which we normally call pixels. In a Bayer CFA configuration there are three filters named for the predominant wavelengths that each lets through (red, green and blue) arranged in quartets such as shown below:
It is the quality of the combined filtering part of the imaging system (lenses, UV/IR, CFA, sensor etc.) that determines how accurately a digital camera is able to capture color information from the scene. So what are the characteristics of better systems and can perfection be achieved? In this article I will pick up the discussion where it was last left off and, ignoring noise for now, attempt to answer this question using CIE conventions, in the process gaining insight into the role of the compromise color matrix and developing a method to visualize its effects. Continue reading The Perfect Color Filter Array→
Over the last two posts we’ve been exploring some of the differences introduced by tweaks to the Color Filter Array of the Phase One IQ3 100MP Trichromatic Digital Back versus its original incarnation, the Standard Back. Refer to those for the background. In this article we will delve into some of these differences quantitatively.
Let’s start with the compromise color matrices we derived from David Chew’s captures of a ColorChecker 24 in the shade of a sunny November morning in Ohio. These are the matrices necessary to convert white balanced raw data to the perceptual CIE XYZ color space, where it is said there should be one-to-one correspondence with colors as perceived by humans, and therefore where most measurements are performed. They are optimized for each back in the current conditions but they are not perfect, the reason for the word ‘compromise’ in their name:
We have seen in the last post that Phase One apparently performed a couple of main tweaks to the Color Filter Array of its Medium Format IQ3 100MP back when it introduced the Trichromatic: it made the shapes of color filter sensitivities more symmetric by eliminating residual transmittance away from the peaks; and it boosted the peak sensitivity of the red (and possibly blue) filter. It did this with the objective of obtaining more accurate, less noisy color out of the hardware, requiring less processing and weaker purple fringing to boot.
Both changes carry the compromises discussed in the last article so the purpose of this one and the one that follows is to attempt to measure – within the limits of my tests, procedures and understanding – the effect of the CFA changes from similar raw captures by the IQ3 100MP Standard Back and Trichromatic, courtesy of David Chew. We will concentrate on color accuracy, leaving purple fringing for another time.
It is always interesting when innovative companies push the envelope of the state-of-the-art of a single component in their systems because a lot can be learned from before and after comparisons. I was therefore excited when Phase One introduced a Trichromatic version of their Medium Format IQ3 100MP Digital Back last September because it could allow us to isolate the effects of tweaks to their Bayer Color Filter Array, assuming all else stays the same.
Thanks to two virtually identical captures by David Chew at getDPI, and Erik Kaffehr’s intelligent questions at DPR, in the following articles I will explore the effect on linear color of the new Trichromatic CFA (TC) vs the old one on the Standard Back (SB). In the process we will discover that – within the limits of my tests, procedures and understanding – the Standard Back produces apparently more ‘accurate’ color while the Trichromatic produces better looking matrices, potentially resulting in ‘purer’ signals. Continue reading Phase One IQ3 100MP Trichromatic vs Standard Back Linear Color, Part I→
In this article we shall find that the effect of a Bayer CFA on the spatial frequencies and hence the ‘sharpness’ captured by a sensor compared to those from a corresponding monochrome imager can go from (almost) nothing to halving the potentially unaliased range based on the chrominance content of the image projected on the sensing plane and the direction in which the spatial frequencies are being stressed.
A Little Sampling Theory
We know from Goodman and previous articles that the sampled image ( ) captured in the raw data by a typical current digital camera can be represented mathematically as the continuous image on the sensing plane ( ) multiplied by a rectangular lattice of Dirac delta functions positioned at the center of each pixel:
with the functions representing the two dimensional grid of delta functions, sampling pitch apart horizontally and vertically. To keep things simple the sensing plane is considered here to be the imager’s silicon itself, which sits below microlenses and other filters, so the continuous image is assumed to incorporate their effects as well as those of the pixel aperture. Continue reading Bayer CFA Effect on Sharpness→
In the last article we saw that the Point Spread Function and the Modulation Transfer Function of a lens could be easily obtained numerically by applying Discrete Fourier Transforms to its generalized exit pupil function twice in sequence.
Obtaining the 2D DFTs is easy: simply feed MxN numbers representing the two dimensional complex image of the pupil function in its space to a Fast Fourier Transform routine and, presto, it produces MxN numbers that represent the amplitude of the PSF on the sensing plane. Figure 1a shows a simple case where the pupil function is a uniform disk representing the circular aperture of a perfect lens with MxN = 1024×1024. Figure 1b is the resulting intensity PSF.
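The two transforms in sequence can be sketched in a few lines of Python with NumPy. This is only an illustration under my own assumptions: the function name, the 256×256 grid (smaller than the 1024×1024 above, for speed) and the pupil radius in samples are arbitrary choices, not values from the article.

```python
import numpy as np

def psf_and_mtf(n=256, pupil_radius=16):
    """Return the intensity PSF and MTF of a uniform circular pupil on an n x n grid."""
    # Coordinates centered on sample n//2 (hypothetical grid and radius)
    y, x = np.indices((n, n)) - n // 2
    pupil = (x**2 + y**2 <= pupil_radius**2).astype(complex)

    # First DFT: pupil function -> amplitude PSF on the sensing plane
    amplitude = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(amplitude) ** 2          # intensity PSF
    psf /= psf.sum()                      # normalize to unit energy

    # Second DFT: intensity PSF -> OTF, whose magnitude is the MTF
    otf = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf)))
    return psf, np.abs(otf)

psf, mtf = psf_and_mtf()
```

With the PSF normalized to unit sum the MTF equals 1 at the origin, and it falls to zero beyond a radius of twice the pupil radius, which is the diffraction cutoff expressed in grid samples.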
Goodman, in his excellent Introduction to Fourier Optics, describes how an image is formed on a camera sensing plane starting from first principles, that is electromagnetic propagation according to Maxwell’s wave equation. If you want the play by play account I highly recommend his math intensive book. But for the budding photographer it is sufficient to know what happens at the exit pupil of the lens because after that the transformations to Point Spread and Modulation Transfer Functions are straightforward, as we will show in this article.
The following diagram exemplifies the last few millimeters of the journey that light from the scene has to travel in order to smash itself against our camera’s sensing medium. Light from the scene in the form of field arrives at the front of the lens. It goes through the lens being partly blocked and distorted by it (we’ll call this blocking/distorting function ) and finally arrives at its back end, the exit pupil. The complex light field at the exit pupil’s two dimensional plane is now as shown below:
Now that we know how to create a 3×3 linear matrix to convert white balanced and demosaiced raw data into connection space – and where to obtain the 3×3 linear matrix to then convert it to a standard output color space like sRGB – we can take a closer look at the matrices and apply them to a real world capture chosen for its wide range of chromaticities.
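As a rough sketch of what applying such a matrix means in practice, here is a Python/NumPy fragment. The white balance multipliers and the matrix entries are made-up numbers chosen only so that a neutral raw triplet lands on the D50 white point (0.9642, 1.0, 0.8249); they do not come from any real camera or from the matrices discussed in these articles.

```python
import numpy as np

# Hypothetical values for illustration only, not taken from any real camera.
wb_multipliers = np.array([2.0, 1.0, 1.5])        # R, G, B white balance gains
forward_matrix = np.array([                        # white balanced raw -> XYZ (D50)
    [0.70,  0.15,  0.1142],
    [0.28,  0.85, -0.1300],
    [0.02, -0.10,  0.9049],
])

def raw_to_pcs(raw_rgb):
    """Map demosaiced raw triplets of shape (..., 3) to XYZ: white balance, then matrix."""
    balanced = np.asarray(raw_rgb, dtype=float) * wb_multipliers
    return balanced @ forward_matrix.T

# A neutral patch as it might appear in the raw data under this illuminant
neutral_raw = np.array([0.5, 1.0, 1.0 / 1.5])
xyz_white = raw_to_pcs(neutral_raw)
```

Because each row of the hypothetical matrix sums to the corresponding white point coordinate, any white balanced neutral maps to a scaled copy of the white point, which is the property one would expect of a matrix of this kind.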
We understand from the previous article that rendering color during raw conversion essentially means mapping raw data in the form of triplets into a standard color space via a Profile Connection Space in a two step process
The first step white balances and demosaics the raw data, which at that stage we will refer to as , followed by converting it to Profile Connection Space through linear projection by an unknown ‘Forward Matrix’ (as DNG calls it) of the form
How do we translate captured image information into a stimulus that will produce the appropriate perception of color? It’s actually not that complicated.
Recall from the introductory article that a photon absorbed by a cone type (, or ) in the fovea produces the same stimulus to the brain regardless of its wavelength. Take the example of the eye of an observer which focuses on the retina the image of a uniform object with a spectral photon distribution of 1000 photons/nm in the 400 to 720nm wavelength range and no photons outside of it.
Because the system is linear, cones in the foveola will weigh the incoming photons by their relative sensitivity (probability) functions and add the result up to produce a stimulus proportional to the area under the curves. For instance a cone may see about 321,000 photons arrive and produce a relative stimulus of about 94,700, the weighted area under the curve:
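The weighting and summing can be sketched numerically as follows. The flat 1000 photons/nm spectrum from 400 to 720nm is the one in the example, but the sensitivity curve is a stand-in of my own (a Gaussian peaking at 540nm), not a real cone fundamental, so the resulting stimulus will not match the figure quoted above.

```python
import numpy as np

wavelengths = np.arange(400, 721)                 # nm, 1 nm bins, 400..720 inclusive
photons = np.full(wavelengths.shape, 1000.0)      # flat spectrum: 1000 photons/nm

# Hypothetical sensitivity (absorption probability): a Gaussian peaking at 540 nm.
# Real cone fundamentals have a different shape; this is only a stand-in.
sensitivity = np.exp(-0.5 * ((wavelengths - 540.0) / 40.0) ** 2)

arriving = photons.sum()                          # photons reaching the cone
stimulus = (photons * sensitivity).sum()          # sensitivity-weighted sum
```

The 321 one-nanometer bins at 1000 photons each give the 321,000 arriving photons mentioned in the text; the stimulus is the same sum with every bin scaled by its probability of absorption.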
This article will set the stage for a discussion on how pleasing color is produced during raw conversion. The easiest way to understand how a camera captures and processes ‘color’ is to start with an example of how the human visual system does it.
An Example: Green
Light from the sun strikes leaves on a tree. The foliage of the tree absorbs some of the light and reflects the rest diffusely towards the eye of a human observer. The eye focuses the image of the foliage onto the retina at its back. Near the center of the retina there is a small circular area called the foveola which is dense with light receptors of well defined spectral sensitivities called cones. Information from the cones is pre-processed by neurons and carried by nerve fibers via the optic nerve to the brain where, after some additional psychovisual processing, we recognize the color of the foliage as green.
What are the basic low level steps involved in raw file conversion? In this article I will discuss what happens under the hood of digital camera raw converters in order to turn raw file data into a viewable image, a process sometimes referred to as ‘rendering’. We will use the following raw capture by a Nikon D610 to show how image information is transformed at every step along the way:
This post will continue looking at the spatial frequency response measured by MTF Mapper off slanted edges in DPReview.com raw captures and relative fits by the ‘sharpness’ model discussed in the last few articles. The model takes the physical parameters of the digital camera and lens as inputs and produces theoretical directional system MTF curves comparable to measured data. As we will see the model seems to be able to simulate these systems well – at least within this limited set of parameters.
The following fits refer to the green channel of a number of interchangeable lens digital camera systems with different lenses, pixel sizes and formats – from the current Medium Format 100MP champ to the 1/2.3″ 18MP sensor size also sometimes found in the best smartphones. Here is the roster with the cameras as set up:
The series of articles starting here outlines a model of how the various physical components of a digital camera and lens can affect the ‘sharpness’ – that is the spatial resolution – of the images captured in the raw data. In this one we will pit the model against MTF curves obtained through the slanted edge method from real world raw captures both with and without an anti-aliasing filter.
With a few simplifying assumptions, which include ignoring aliasing and phase, the spatial frequency response (SFR or MTF) of a photographic digital imaging system near the center can be expressed as the product of the Modulation Transfer Function of each component in it. For a current digital camera these would typically be the main ones:
We now know how to calculate the two dimensional Modulation Transfer Function of a perfect lens affected by diffraction, defocus and third order Spherical Aberration – under monochromatic light at the given wavelength and f-number. In digital photography however we almost never deal with light of a single wavelength. So what effect does an illuminant with a wide spectral power distribution, going through the color filter of a typical digital camera CFA before the sensor, have on the spatial frequency responses discussed thus far?
Spherical Aberration (SA) is one key component missing from our MTF toolkit for modeling an ideal imaging system’s ‘sharpness’ in the center of the field of view in the frequency domain. In this article formulas will be presented to compute the two dimensional Point Spread and Modulation Transfer Functions of the combination of diffraction, defocus and third order Spherical Aberration for an otherwise perfect lens with a circular aperture.
Spherical Aberrations result because most photographic lenses are designed with quasi spherical surfaces that do not necessarily behave ideally in all situations. For instance, they may focus light on slightly different planes depending on whether the respective ray goes through the exit pupil closer or farther from the optical axis, as shown below:
This series of articles has dealt with modeling an ideal imaging system’s ‘sharpness’ in the frequency domain. We looked at the effects of the hardware on spatial resolution: diffraction, sampling interval, sampling aperture (e.g. a squarish pixel), anti-aliasing OLPAF filters. The next two posts will deal with modeling typical simple imperfections in the system: defocus and spherical aberrations.
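To make the hardware components listed above concrete, here is a one dimensional Python sketch that multiplies three textbook MTFs together: diffraction from a circular aperture, a square pixel aperture and a two-dot beam-splitting anti-aliasing filter. The function names and the example system (5.9µm pitch, f/5.6, 530nm light, 0.7 pixel split) are my own assumptions for illustration, and sampling itself is left out.

```python
import numpy as np

def diffraction_mtf(f, wavelength_um, f_number):
    """MTF of an aberration-free lens with a circular aperture; f in cycles/um."""
    # Normalized frequency: 1.0 at the diffraction cutoff 1/(lambda * N)
    s = np.clip(np.asarray(f, dtype=float) * wavelength_um * f_number, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s**2))

def pixel_aperture_mtf(f, width_um):
    """MTF of a square pixel of the given width, along one axis."""
    return np.abs(np.sinc(np.asarray(f) * width_um))   # np.sinc(x) = sin(pi x)/(pi x)

def olpf_mtf(f, split_um):
    """MTF of a two-dot beam splitter with the given displacement, along the split axis."""
    return np.abs(np.cos(np.pi * np.asarray(f) * split_um))

# Hypothetical system: 5.9 um pixels, f/5.6, 530 nm light, 0.7 pixel AA split
pitch = 5.9
f = np.linspace(0.0, 1.0 / pitch, 201)                 # 0 to 1 cycle/pixel
system_mtf = (diffraction_mtf(f, 0.53, 5.6)
              * pixel_aperture_mtf(f, pitch)
              * olpf_mtf(f, 0.7 * pitch))
```

The product starts at 1 at zero frequency and, with a pixel as wide as the pitch, reaches its first null at 1 cycle/pixel because of the pixel aperture term.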
Defocus = OOF
Defocus means that the sensing plane is not exactly where it needs to be for image formation in our ideal imaging system: the image is therefore out of focus (OOF). Said another way, light from a point source would go through the lens but converge either behind or in front of the sensing plane, as shown in the following diagram, for a lens with a circular aperture:
This article will discuss a simple frequency domain model for an AntiAliasing (or Optical Low Pass) Filter, a hardware component sometimes found in a digital imaging system. The filter typically sits right on top of the sensing plane and its objective is to block as much of the aliasing and moiré creating energy above the Nyquist spatial frequency while letting through as much as possible of the real image forming energy below that, hence the low-pass designation.
In consumer digital cameras it is often implemented by introducing one or two birefringent plates in the sensor’s filter stack. This is how Nikon shows it for one of its DSLRs:
This article is about specifying the units of the Discrete Fourier Transform of an image and the various ways that they can be expressed. This apparently simple task can be fiendishly unintuitive.
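A few of the most common unit conversions can be sketched as below. The helper names and the sensor figures (5.9µm pitch, 4000 pixels of picture height) are hypothetical, chosen only to have numbers to work with; the diffraction extinction frequency uses the standard 1/(λN) relation, evaluated here for 530nm light at f/16.

```python
def cycles_per_pixel_to_lp_per_mm(f_cy_px, pitch_um):
    """Cycles/pixel -> line pairs (cycles) per mm on the sensor."""
    return f_cy_px * 1000.0 / pitch_um

def cycles_per_pixel_to_lp_per_ph(f_cy_px, height_px):
    """Cycles/pixel -> line pairs per picture height."""
    return f_cy_px * height_px

def diffraction_cutoff_lp_per_mm(wavelength_nm, f_number):
    """Diffraction extinction frequency 1/(lambda * N), in cycles/mm."""
    return 1.0 / (wavelength_nm * 1e-6 * f_number)

# Hypothetical sensor: 5.9 um pitch, 4000 pixels tall
nyquist_lp_mm = cycles_per_pixel_to_lp_per_mm(0.5, 5.9)    # Nyquist is 0.5 cycles/pixel
nyquist_lp_ph = cycles_per_pixel_to_lp_per_ph(0.5, 4000)
cutoff_f16 = diffraction_cutoff_lp_per_mm(530, 16)
```

Cycles per pixel is the natural unit of the DFT; the other units follow from it once the pixel pitch and the image height in pixels are known.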
The image we will use as an example is the familiar Airy Disk from the last few posts, at f/16 with light of mean 530nm wavelength. Zoomed in to the left in Figure 1; and as it looks in its 1024×1024 sample image to the right:
Having shown that our simple two dimensional MTF model is able to predict the performance of the combination of a perfect lens and square monochrome pixel we now turn to the effect of the sampling interval on spatial resolution according to the guiding formula:
The hats in this case mean the Fourier Transform of the relative component normalized to 1 at the origin (), that is the individual MTFs of the perfect lens PSF, the perfect square pixel and the delta grid.
Sampling in the Spatial and Frequency Domains
Sampling is expressed mathematically as a Dirac delta function at the center of each pixel (the red dots below).
Now that we know from the introductory article that the spatial frequency response of a typical perfect digital camera and lens can be modeled simply as the product of the Modulation Transfer Function of the lens and pixel area, convolved with a Dirac delta grid at cycles-per-pixel spacing
The next few posts will describe a linear spatial resolution model that can help a photographer better understand the main variables involved in evaluating the ‘sharpness’ of photographic equipment and related captures. I will show numerically that the combined spatial frequency response (MTF) of a perfect AAless monochrome digital camera and lens in two dimensions can be described as the normalized product of the Fourier Transform (FT) of the lens Point Spread Function by the FT of the (square) pixel footprint, convolved with the FT of a rectangular grid of Dirac delta functions centered at each pixel, as better described in the article
With a few simplifying assumptions we will see that the effect of the lens and sensor on the spatial resolution of the continuous image on the sensing plane can be broken down into these simple components. The overall ‘sharpness’ of the captured digital image can then be estimated by combining the ‘sharpness’ of each of them. Continue reading A Simple Model for Sharpness in Digital Cameras – I→
While perusing Jim Kasson’s excellent Longitudinal Chromatic Aberration tests I was impressed by the quantity and quality of the information the resulting data provides. Longitudinal, or Axial, CA is a form of defocus and as such it cannot be effectively corrected during raw conversion, so having a lens well compensated for it will provide a real and tangible improvement in the sharpness of final images. How much of an improvement?
A number of interesting insights come to light once one realizes that as far as the slanted edge method (of measuring the Modulation Transfer Function of a Bayer CFA digital camera and lens from its raw data) is concerned it is as if it were dealing with identical full resolution images behind three color filters, each in their own separate color planes:
This is a vast and complex subject for which I do not have formal training. In this and the previous article I present my thoughts on how MTF50 results obtained from raw data of the four Bayer CFA channels off a uniformly illuminated neutral target captured with a typical digital camera through the slanted edge method can be combined to provide a meaningful composite MTF50 for the imaging system as a whole. Corrections, suggestions and challenges are welcome. Continue reading COMBINING BAYER CFA MTF Curves – II→
This is a vast and complex subject for which I do not have formal training. In this and the following article I will discuss my thoughts on how MTF50 results obtained from raw data of the four Bayer CFA color channels off a neutral target captured with a typical camera through the slanted edge method can be combined to provide a meaningful composite MTF50 for the imaging system as a whole. The scope is limited to neutral slanted edge measurements of Bayer CFA raw data for linear spatial resolution (‘sharpness’) evaluations of photographic hardware. Corrections, suggestions and challenges are welcome. Continue reading Combining Bayer CFA Modulation Transfer Functions – I→
For the purposes of ‘sharpness’ spatial resolution measurement in photography, cameras can be considered shift-invariant, linear systems.
Shift invariant means that the imaging system should respond exactly the same way no matter where light from the scene falls on the sensing medium. We know that in a strict sense this is not true because for instance a pixel has a square area so it cannot have an isotropic response by definition. However when using the slanted edge method of linear spatial resolution measurement we can effectively make it shift invariant by careful preparation of the testing setup. For example the edges should be slanted no more than this and no less than that. Continue reading Linearity in the Frequency Domain→
My camera has a 14-bit ADC. Can it accurately record information lower than 14 stops below full scale? Can it store sub-LSB signals in the raw data?
With a well designed sensor the answer, unsurprisingly if you’ve followed the last few posts, is yes it can. The key to being able to capture such tiny visual information in the raw data is a well behaved imaging system with a properly dithered ADC. Continue reading Sub Bit Signal→
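A quick numerical experiment illustrates the idea. In this Python sketch a signal of a quarter of one LSB is quantized by an ideal ADC with and without Gaussian noise acting as dither. The signal level, the 1 DN of noise, the black level offset and the patch size are all assumptions of mine, picked only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

signal_dn = 0.25          # true signal: a quarter of one LSB
read_noise_dn = 1.0       # Gaussian read noise acting as dither, in DN
black_level = 600         # hypothetical offset so negative noise is not clipped
n_pixels = 1_000_000      # size of the uniform patch being averaged

analog = black_level + signal_dn + rng.normal(0.0, read_noise_dn, n_pixels)
raw = np.round(analog)                        # ideal ADC: quantize to whole DN
recovered = raw.mean() - black_level          # mean of the patch, black subtracted

# Without any noise the same signal rounds away to nothing
undithered = np.round(black_level + signal_dn) - black_level
```

Averaged over the patch, the dithered data recovers the sub-LSB signal closely, while the noiseless case quantizes it to zero in every pixel.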
This article is a little esoteric so one may want to skip it unless one is interested in the underlying mechanisms that cause quantization error as photographic signal and noise approach the darkest levels of acceptable dynamic range in our digital cameras: one least significant bit in the raw data. We will use our simplified camera model and deal with Poissonian Signal and Gaussian Read Noise separately – then attempt to bring them together.
Physicists and mathematicians over the last few centuries have spent a lot of their time studying light and electrons, the key ingredients of digital photography. In so doing they have left us with a wealth of theories to explain their behavior in nature and in our equipment. In this article I will describe how to simulate the information generated by a uniformly illuminated imaging system using open source Octave (or equivalently Matlab) utilizing some of these theories. Since as you will see the simulations are incredibly (to me) accurate, understanding how the simulator works goes a long way in explaining the inner workings of a digital sensor at its lowest levels; and simulated data can be used to further our understanding of photographic science without having to run down the shutter count of our favorite SLRs. This approach is usually referred to as Monte Carlo simulation.
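The skeleton of such a simulator is short. The version below is a Python/NumPy sketch rather than the Octave code the article refers to, and it rests on a simplified model of my own choosing: Poisson shot noise, Gaussian read noise, a linear gain and an ideal ADC. The function and parameter names are illustrative only.

```python
import numpy as np

def simulate_patch(mean_e, read_noise_e, gain_e_per_dn, bits=14,
                   shape=(200, 200), seed=0):
    """Simulate the raw values (DN) of a uniformly illuminated patch.

    mean_e: mean photoelectrons per pixel; read_noise_e: read noise in e-;
    gain_e_per_dn: electrons per DN. All names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    electrons = rng.poisson(mean_e, shape).astype(float)     # photon shot noise
    electrons += rng.normal(0.0, read_noise_e, shape)        # Gaussian read noise
    dn = np.round(electrons / gain_e_per_dn)                 # gain, then quantization
    return np.clip(dn, 0, 2**bits - 1)                       # limit to the ADC range

patch = simulate_patch(mean_e=1000.0, read_noise_e=3.0, gain_e_per_dn=2.0)
mean_dn, std_dn = patch.mean(), patch.std()
```

With these inputs theory predicts a mean of about 500 DN and a standard deviation of about sqrt(1000 + 3²)/2 ≈ 15.9 DN, so the simulated statistics can be checked directly against the model.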
Whether the human visual system perceives a displayed slow changing gradient of tones, such as a vast expanse of sky, as smooth or posterized depends mainly on two well known variables: the Weber-Fechner Fraction of the ‘steps’ in the reflected/produced light intensity (the subject of this article); and spatial dithering of the light intensity as a result of noise (the subject of a future one).
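The fraction in question is simply the relative size of each step, ΔI/I. As a sketch, the Python fragment below computes it for adjacent levels of an 8-bit image, assuming (my assumption, not the article's) that the display follows the standard sRGB decoding curve.

```python
import numpy as np

def srgb_to_linear(c):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

levels = np.arange(1, 256)                    # skip level 0 to avoid dividing by zero
linear = srgb_to_linear(levels / 255.0)
weber = np.diff(linear) / linear[:-1]         # (I[k+1] - I[k]) / I[k] for each step
```

Here `weber[k - 1]` is the fraction for the step from level k to level k+1. Under this assumption the steps are relatively large in the deepest shadows and shrink to a percent or two in the midtones.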
We’ve seen how information about a photographic scene is collected in the ISOless/invariant range of a digital camera sensor, amplified, converted to digital data and stored in a raw file. For a given Exposure the best information quality (IQ) about the scene is available right at the photosites, only possibly degrading from there – but a properly designed** fully ISO invariant imaging system is able to store it in its entirety in the raw data. It is able to do so because the information carrying capacity (photographers would call it the dynamic range) of each subsequent stage is equal to or larger than the previous one. Cameras that are considered to be (almost) ISOless from base ISO include the Nikon D7000, D7200 and the Pentax K5. All digital cameras become ISO invariant above a certain ISO, the exact value determined by design compromises.
In this article we’ll look at a class of imagers that are not able to store the whole information available at the photosites in one go in the raw file for a substantial portion of their working ISOs. The photographer can in such a case choose out of the full information available at the photosites what smaller subset of it to store in the raw data by the selection of different in-camera ISOs. Such cameras are sometimes improperly referred to as ISOful. Most Canon DSLRs fall into this category today, as do kings of darkness such as the Sony a7S or Nikon D5.
In the last few posts I have made the case that Image Quality in a digital camera is entirely dependent on the light Information collected at a sensor’s photosites during Exposure. Any subsequent processing – whether analog amplification and conversion to digital in-camera and/or further processing in-computer – effectively applies a set of Information Transfer Functions to the signal that when multiplied together result in the data from which the final photograph is produced. Each step of the way can at best maintain the original Information Quality (IQ) but in most cases it will degrade it somewhat.
IQ: Only as Good as at Photosites’ Output
This point is key: in a well designed imaging system** the final image IQ is only as good as the scene information collected at the sensor’s photosites, independently of how this information is stored in the working data along the processing chain, on its way to being transformed into a pleasing photograph. As long as scene information is properly encoded by the system early on, before being written to the raw file – and information transfer is maintained in the data throughout the imaging and processing chain – final photograph IQ will be virtually the same independently of how its data’s histogram looks along the way.
In photography, digital cameras capture information about the scene carried by photons reflected by it and store the information as data in a raw file pretty well linearly. Data is the container, scene information is the substance. There may or may not be information in the data, no matter what its form. With a few limitations what counts is the substance, information, not the form, data.
A Simple Example
Imagine for instance that you are taking stock of the number of remaining pieces in your dinner place settings. You originally had a full set of 6 of everything but today, after many years of losses and breakage, this is the situation in each category: Continue reading The Difference Between Data and Information→
We know that the best Information Quality possible collected from the scene by a digital camera is available right at the output of the sensor and it will only be degraded from there. This article will discuss what happens to this information as it is transferred through the imaging system and stored in the raw data. It will use the simple language outlined in the last post to explain how and why the strategy for Capturing the best Information or Image Quality (IQ) possible from the scene in the raw data involves only two simple steps:
1) Maximizing the collected Signal given artistic and technical constraints; and
2) Choosing what part of the Signal to store in the raw data and what part to leave behind.
The second step is only necessary if your camera is incapable of storing the entire Signal at once (that is it is not ISO invariant) and will be discussed in a future article. In this post we will assume an ISOless imaging system.
Ever since Einstein we’ve been able to say that humans ‘see’ because information about the scene is carried to the eyes by photons reflected by it. So when we talk about Information in photography we are referring to information about the energy and distribution of photons arriving from the scene. The more complete this information, the better we ‘see’. No photons = no information = no see; few photons = little information = see poorly = poor IQ; more photons = more information = see better = better IQ.
Sensors in digital cameras work similarly, their output ideally being the energy and location of every photon incident on them during Exposure. That’s the full information ideally required to recreate an exact image of the original scene for the human visual system, no more and no less. In practice however we lose some of this information along the way during sensing, so we need to settle for approximate location and energy – in the form of photoelectron counts by pixels of finite area, often correlated to a color filter array.
When photographers talk about grayscale ‘tones’ they typically refer to the number of distinct gray levels present in a displayed image. They don’t want to see distinct levels in a natural slow changing gradient like a dark sky: if it’s smooth they want to perceive it as smooth when looking at their photograph. So they want to make sure that all possible tonal information from the scene has been captured and stored in the raw data by their imaging system.
My camera has an engineering Dynamic Range of 14 stops, how many bits do I need to encode that DR? Well, to encode the whole Dynamic Range 1 bit will suffice. The reason is simple, dynamic range is only concerned with the extremes, not with tones in between:
So in theory we only need 1 bit to encode it: zero for minimum signal and one for maximum signal, like so
Dynamic Range (DR) in Photography usually refers to the linear working tone range, from darkest to brightest, that the imaging system is capable of capturing and/or displaying. It is expressed as a ratio, in stops:
It is a key Image Quality metric because photography is all about contrast, and dynamic range limits the range of recordable/displayable tones. Different components in the imaging system have different working dynamic ranges and the system DR is equal to the dynamic range of the weakest performer in the chain.
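Both statements are easy to put into numbers. In the Python sketch below dynamic range in stops is the base 2 logarithm of the ratio of maximum to minimum signal, and the system figure is the smallest of its components. The sensor, ADC and display values are hypothetical.

```python
import math

def dynamic_range_stops(max_signal, min_signal):
    """Dynamic range as log2 of the maximum to minimum signal ratio."""
    return math.log2(max_signal / min_signal)

def system_dynamic_range(component_stops):
    """The chain is limited by its weakest component."""
    return min(component_stops)

# Hypothetical sensor: 80,000 e- at saturation, 5 e- noise floor
sensor_dr = dynamic_range_stops(80_000, 5)                      # log2(16000)
# Hypothetical chain: sensor, 14-stop ADC, 10-stop display
system_dr = system_dynamic_range([sensor_dr, 14.0, 10.0])
```

In this made-up chain the sensor delivers just under 14 stops, but the system as a whole is held to the 10 stops of its weakest link.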
Most of the photographs captured these days end up being viewed on a display of some sort, with at best 4K (4096×2160) but often no better than HD resolution (1920×1080). Since the cameras that capture them have typically several times that number of pixels, 6000×4000 being fairly normal today, most images need to be substantially downsized for viewing, even allowing for some cropping. Resizing algorithms built into browsers or generic image viewers tend to favor expediency over quality, so it behooves the IQ conscious photographer to manage the process, choosing the best image size and downsampling algorithm for the intended file and display medium.
When downsizing the objective is to maximize the original spatial resolution retained while minimizing the possibility of aliasing and moiré. In this article we will take a closer look at some common downsizing algorithms and their effect on spatial resolution information in the frequency domain.
I’ve mentioned in the past that I prefer to take spatial resolution measurements directly off the raw information in order to minimize often unknown subjective variables introduced by demosaicing and rendering algorithms unbeknownst to the operator, even when all relevant sliders are zeroed. In this post we discover that that is indeed the case for ACR/LR process 2010/2012 and for Capture NX-D – while DCRAW appears to be transparent and perform straight out demosaicing with no additional processing without the operator’s knowledge.
In fact the question is more generic than that. Smaller format lens designers try to compensate for their imaging system geometric resolution penalty (compared to a larger format when viewing final images at the same size) by designing ‘sharper’ lenses specifically for it, rather than recycling larger formats’ designs (feeling guilty APS-C?) – sometimes with excellent effect. Are they succeeding? I will use mFT only as an example here, but input is welcome for all formats, from phones to large format.
There are several ways to extract Sensor IQ metrics like read noise, Full Well Count, PRNU, Dynamic Range and others from mean and standard deviation statistics obtained from a uniform patch in a camera’s raw file. In the last post we saw how to do it by using such parameters to make observed data match the measured SNR curve. In this one we will achieve the same objective by fitting mean and standard deviation data. Since the measured data is identical, if the fit is good so should be the results.
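One simple way to set up such a fit, sketched here in Python rather than in a spreadsheet, is to note that under a commonly used noise model the variance in DN is a quadratic in the mean: read noise squared, plus mean/gain for shot noise, plus (PRNU × mean)². The model, the function name and the synthetic test values are my assumptions, and a plain unweighted polynomial fit stands in for whatever procedure the article actually uses.

```python
import numpy as np

def fit_photon_transfer(mean_dn, std_dn):
    """Fit var = rn^2 + mean/gain + (prnu*mean)^2 to patch statistics in DN.

    Inputs are black-subtracted means and standard deviations.
    Returns (read_noise_dn, gain_e_per_dn, prnu).
    """
    mean_dn = np.asarray(mean_dn, dtype=float)
    variance = np.asarray(std_dn, dtype=float) ** 2
    a2, a1, a0 = np.polyfit(mean_dn, variance, 2)     # coefficients, highest power first
    return np.sqrt(max(a0, 0.0)), 1.0 / a1, np.sqrt(max(a2, 0.0))

# Synthetic check with made-up parameters: 1.5 DN read noise, 2 e-/DN, 0.5% PRNU
s = np.geomspace(1.0, 8000.0, 30)
sigma = np.sqrt(1.5**2 + s / 2.0 + (0.005 * s) ** 2)
read_noise_dn, gain, prnu = fit_photon_transfer(s, sigma)
```

On noiseless synthetic data the fit returns the parameters it was built from. With real measurements the brightest patches tend to dominate an unweighted fit, so some weighting or a fit in log space would probably be needed to pin down the read noise.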
Sensor Metrics from Measured Mean and Standard Deviation in DN
We’ve seen how to model sensors and how to collect signal and noise statistics from the raw data of our digital cameras. In this post I am going to pull both things together allowing us to estimate sensor IQ metrics: input-referred read noise, clipping/saturation/Full Well Count, Dynamic Range, Pixel Response Non-Uniformities and gain/sensitivity.
There are several ways to extract these metrics from signal and noise data obtained from a camera’s raw file. I will show two related ones: via SNR in this post and via total noise N in the next. The procedure is similar and the results are identical.
Imperfections in an imaging system’s capture process manifest themselves in the form of deviations from the expected signal. We call these imperfections ‘noise’. The fewer the imperfections, the lower the noise, the higher the image quality. However, because the Human Visual System is adaptive within its working range, it’s not the absolute amount of noise that matters to perceived IQ as much as the amount of noise relative to the signal. That’s why to characterize the performance of a sensor in addition to noise we also need to determine its sensitivity and the maximum signal it can detect.
In this series of articles I will describe how to use the Photon Transfer method and a spreadsheet to determine basic IQ performance metrics of a digital camera sensor. It is pretty easy if we keep in mind the simple model of how light information is converted into raw data by digital cameras:
Over the last couple of years I’ve been using Frans van den Bergh’s excellent open source MTF Mapper to measure the Modulation Transfer Function of imaging systems off a slanted edge target, as you may have seen in these pages. As long as one understands how to get the most out of it I find it a solid product that gives reliable results, with MTF50 typically well within 2% of actual in less than ideal real-world situations (see below). I had little to compare it to other than to tests published by gear testing sites: they apparently mostly use a commercial package called Imatest for their slanted edge readings – and it seemed to correlate well with those.
Then recently Jim Kasson pointed out sfrmat3, the Matlab program written by Peter Burns, a slanted edge method expert who worked at Kodak and was a member of the committee responsible for ISO12233, the resolution and spatial frequency response standard for photography. sfrmat3 is considered to be a solid implementation of the standard and many, including Imatest, benchmark against it – so I was curious to see how MTF Mapper 0.4.1.6 would compare. It did well.
A reader suggested that a High-Res Olympus E-M5 Mark II image used in the previous post looked sharper than the equivalent Sony a6000 image, contradicting the relative MTF50 measurements, perhaps showing ‘the limitations of MTF50 as a methodology’. That would be surprising because MTF50 normally correlates quite well with perceived sharpness, so I decided to check this particular case out.
‘Who are you going to believe, me or your lying eyes’?
Olympus just announced the E-M5 Mark II, an updated version of its popular micro Four Thirds E-M5 model, with an interesting new feature: its 16MegaPixel sensor, presumably similar to the one in other E-Mx bodies, has a high resolution mode where it gets shifted around by the image stabilization servos during exposure to capture, as they say in their press release
‘resolution that goes beyond full-frame DSLR cameras. 8 images are captured with 16-megapixel image information while moving the sensor by 0.5 pixel steps between each shot. The data from the 8 shots are then combined to produce a single, super-high resolution image, equivalent to the one captured with a 40-megapixel image sensor.’
A great idea that could give a welcome boost to the ‘sharpness’ of this handy system. This preliminary test shows that the E-M5 mk II 64MP High-Res mode gives in this case a 10-12% advantage in MTF50 linear spatial resolution compared to the Standard Shot 16MP mode. Plus it apparently virtually eliminates the possibility of aliasing and moiré. Great stuff, Olympus.
So, is it true that a Four Thirds lens needs to be about twice as ‘sharp’ as its Full Frame counterpart in order to be able to display an image of spatial resolution equivalent to the larger format’s?
It is, because of the simple geometry I will describe in this article. In fact with a few provisos one can generalize and say that lenses from any smaller format need to be ‘sharper’ by the ratio of their sensor linear sizes in order to produce the same linear resolution on same-sized final images.
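The geometry can be put into numbers with a small sketch. It assumes nominal sensor heights of 24 mm for Full Frame and 13 mm for Four Thirds, and a hypothetical Full Frame lens delivering 40 lp/mm; the function name is made up for the example.

```python
def required_lp_per_mm(lp_per_mm_ref, ref_height_mm, new_height_mm):
    """lp/mm needed on the new format to match the reference format's
    resolution once both images are displayed at the same height."""
    lp_per_picture_height = lp_per_mm_ref * ref_height_mm
    return lp_per_picture_height / new_height_mm

ff_height_mm, ft_height_mm = 24.0, 13.0   # nominal sensor heights (assumed)
ff_lens_lp_per_mm = 40.0                  # hypothetical Full Frame reading

needed = required_lp_per_mm(ff_lens_lp_per_mm, ff_height_mm, ft_height_mm)
ratio = needed / ff_lens_lp_per_mm
print(f"Four Thirds needs {needed:.1f} lp/mm, {ratio:.2f}x the Full Frame figure")
```

Measured by picture height the factor comes out a little under two because the aspect ratios differ; measured by diagonal it is very close to two.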
This is one of the reasons why Ansel Adams shot 4×5 and 8×10 – and I would too, were it not for logistical and pecuniary concerns.
Equivalence – as we’ve discussed one of the fairest ways to compare the performance of two cameras of different physical formats, characteristics and specifications – essentially boils down to two simple realizations for digital photographers:
metrics need to be expressed in units of picture height (or diagonal where the aspect ratio is significantly different) in order to easily compare performance with images displayed at the same size; and
focal length changes proportionally to sensor size in order to capture identical scene content on a given sensor, all other things being equal.
The first realization should be intuitive (future post). The second one is the subject of this post: I will deal with it through a couple of geometrical diagrams.
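The second realization amounts to a one-line scaling, sketched here under the assumption that formats are compared by their diagonals (the aspect ratios differ) and using nominal sensor dimensions. The function name is hypothetical.

```python
import math

def equivalent_focal_length(f_ref_mm, ref_size_mm, new_size_mm):
    """Focal length on the new format that captures the same diagonal
    field of view as f_ref_mm does on the reference format."""
    ref_diag = math.hypot(*ref_size_mm)
    new_diag = math.hypot(*new_size_mm)
    return f_ref_mm * new_diag / ref_diag

# 50mm on Full Frame (36x24mm) versus Four Thirds (17.3x13mm), nominal sizes
f_ft = equivalent_focal_length(50.0, (36.0, 24.0), (17.3, 13.0))
print(f"{f_ft:.1f} mm")
```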
Several sites perform spatial resolution ‘sharpness’ testing of imaging systems for photographers (i.e. ‘lens+digital camera’) and publish results online. You can also measure your own equipment relatively easily to determine how sharp your hardware is. However comparing results from site to site and to your own can be difficult and/or misleading, starting from the multiplicity of units used: cycles/pixel, line pairs/mm, line widths/picture height, line pairs/image height, cycles/picture height etc.
This post will address the units involved in spatial resolution measurement using as an example readings from the slanted edge method.
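As a quick illustration of how these units relate, here is a minimal conversion sketch. It assumes the reading starts out in cycles/pixel, that one cycle equals one line pair (two line widths), and it uses a hypothetical camera with a 6 micron pixel pitch and an image 4000 pixels high.

```python
def convert_resolution(cy_per_px, pixel_pitch_um, image_height_px):
    """Express a cycles/pixel reading in other common units."""
    lp_per_mm = cy_per_px / (pixel_pitch_um / 1000.0)  # pitch converted to mm
    lp_per_ph = cy_per_px * image_height_px            # line pairs per picture height
    lw_per_ph = 2.0 * lp_per_ph                        # two line widths per pair
    return {"lp/mm": lp_per_mm, "lp/ph": lp_per_ph, "lw/ph": lw_per_ph}

units = convert_resolution(0.25, pixel_pitch_um=6.0, image_height_px=4000)
for name, value in units.items():
    print(f"{name}: {value:.1f}")
```

Only the per-picture-height figures can be compared directly across formats, since they already account for the size at which the image is displayed.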
Determining the Signal to Noise Ratio (SNR) curves of your digital camera at various ISOs and extracting from them the underlying IQ metrics of its sensor can help answer a number of questions useful to photography. For instance whether/when to raise ISO; what its dynamic range is; how noisy its output could be in various conditions; or how well it is likely to perform compared to other Digital Still Cameras. As it turns out obtaining the relevant data is a little time consuming but not that hard. All you need is your camera, a suitable target, a neutral density filter, dcraw and free ImageJ, Octave or (pay) Matlab.
Effective Quantum Efficiency as I calculate it is an estimate of the probability that a visible photon – from a ‘Daylight’ blackbody radiating source at a temperature of 5300K impinging on the sensor in question after making it through its IR filter, UV filter, AA low pass filter, microlenses, average Color Filter – will produce a photoelectron upon hitting silicon:
One of the fairest ways to compare the performance of two cameras of different physical characteristics and specifications is to ask a simple question: which photograph would look better if the cameras were set up side by side, captured identical scene content and their output were then displayed and viewed at the same size?
Achieving this set up and answering the question is anything but intuitive because many of the variables involved, like depth of field and sensor size, are not those we are used to dealing with when taking photographs. In this post I would like to attack this problem by first estimating the output signal of different cameras when set up to capture Equivalent images.
It’s a bit long so I will give you the punch line first: digital cameras of the same generation set up equivalently will typically generate more or less the same signal independently of format. Ignoring noise, lenses and aspect ratio for a moment and assuming the same camera gain and number of pixels, they will produce identical raw files. Continue reading Equivalence and Equivalent Image Quality: Signal→
You have obtained a raw file containing the image of a slanted edge captured with good technique. How do you get the MTF curve of the camera and lens combination that took it? Download and feast your eyes on open source MTF Mapper by Frans van den Bergh. No installation required, simply store it in its own folder.
My preferred method for measuring the spatial resolution performance of photographic equipment these days is the slanted edge method. It requires a minimum amount of additional effort compared to capturing and simply eye-balling a pinch, Siemens or other chart but it gives immensely more useful, accurate, quantitative information in the language and units that have been used to characterize optical systems for over a century: it produces a good approximation to the Modulation Transfer Function of the two dimensional Point Spread Function of the camera/lens system in the direction perpendicular to the edge.
Much of what there is to know about a system’s spatial resolution performance can be deduced by analyzing such a curve, starting from the perceptually relevant MTF50 metric, discussed a while back. And all of this simply from capturing the image of a black and white slanted edge, which one can easily produce and print at home.
Why Raw? The question is whether one is interested in measuring the objective, quantitative spatial resolution capabilities of the hardware or whether instead one would prefer to measure the arbitrary, qualitatively perceived sharpening prowess of (in-camera or in-computer) processing software as it turns the capture into a pleasing final image. Either is of course fine.
My take on this is that the better the IQ captured the better the final image will be after post processing. In other words I am typically more interested in measuring the spatial resolution information produced by the hardware comfortable in the knowledge that if I’ve got good quality data to start with its appearance will only be improved in post by the judicious use of software. By IQ here I mean objective, reproducible, measurable physical quantities representing the quality of the information captured by the hardware, ideally in scientific units.
You want to measure how sharp your camera/lens combination is to make sure it lives up to its specs. Or perhaps you’d like to compare how well one lens captures spatial resolution compared to another you own. Or perhaps again you are in the market for new equipment and would like to know what could be expected from the shortlist. Or an old faithful is not looking right and you’d like to check it out. So you decide to do some testing. Where to start? Continue reading How Sharp are my Camera and Lens?→
Now that we know how to determine how many photons impinge on a sensor we can estimate its Effective Quantum Efficiency, that is the efficiency with which it turns such a photon flux into photoelectrons (e⁻), which will then be converted to raw data to be stored in the capture’s raw file:
I call it ‘effective’ because it represents the probability that a photon arriving on the sensing plane from the scene will be converted to a photoelectron by a typical digital camera sensor. It therefore includes the effect of microlenses, fill factor, CFA and other filters on top of silicon in the pixel. It is usually expressed as a percentage. For instance if an average of 100 photons per pixel within the sensor’s passband were incident on a uniformly lit spot of the sensor and on average each pixel produced a signal of 20 photoelectrons we would say that the Effective Quantum Efficiency of the sensor is 20%. Clearly the higher the eQE the better for Image Quality parameters such as SNR and DR. Continue reading What is the Effective Quantum Efficiency of my Sensor?→
This is a recurring nightmare for a new photographer: they head out with their brand new state-of-the-art digital camera, capture a set of images with a vast expanse of sky or smoothly changing background, come home, fire them up on their computer, play with a few sliders and … gasp! … there are visible bands (posterization, stairstepping, quantization) all over the smoothly changing gradient. ‘Is my new camera broken?!’, they wonder in horror.
How many photons impinge on a pixel illuminated by a known light source during exposure? To answer this question in a photographic context we need to know the area of the pixel, the Spectral Power Distribution of the illuminant and the relative Exposure.
We know the pixel’s area and we know that the Spectral Power Distribution of a common class of light sources called blackbody radiators at temperature T is described by Spectral Radiant Exitance – so all we need to determine is what Exposure this irradiance corresponds to in order to obtain the answer.
How many photons are emitted by a light source? To answer this question we need to evaluate the following simple formula at every wavelength in the spectral range we are interested in and add the values up:
The astute reader will have realized that the units above are simply . Written more formally:
When first approaching photographic science a photographer is often confused by the unfamiliar units used. In high school we were taught energy and power in radiometric units like watts (W) – while in photography the same concepts are dealt with in photometric units like lumens (lm).
Once one realizes that both sets of units refer to the exact same physical process – energy transfer – but they are fine tuned for two slightly different purposes it becomes a lot easier to interpret the science behind photography through the theory one already knows.
It all boils down to one simple notion: lumens are watts as perceived by the Human Visual System.
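For monochromatic light that notion reduces to a single multiplication, sketched below. The table holds only a handful of points of the CIE 1924 photopic luminous efficiency function V(λ), and 683 lm/W is the standard maximum luminous efficacy at 555nm; the function name is made up for the example and a real calculation would integrate over the full spectrum.

```python
# A few tabulated points of the CIE 1924 photopic function V(lambda)
V_PHOTOPIC = {450: 0.038, 500: 0.323, 555: 1.000, 600: 0.631, 650: 0.107}
K_M = 683.0  # lm/W, maximum luminous efficacy, reached at 555 nm

def lumens_from_watts(watts, wavelength_nm):
    """Luminous flux (lm) of monochromatic light of given radiant power (W)."""
    return K_M * V_PHOTOPIC[wavelength_nm] * watts

green = lumens_from_watts(1.0, 555)  # where the eye is most sensitive
red = lumens_from_watts(1.0, 650)    # same watts, far fewer lumens
print(green, red)
```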
How many visible photons hit a pixel on my sensor? The answer depends on Exposure, Spectral Power Distribution of the arriving light and pixel area. With a few simplifying assumptions it is not too difficult to calculate that with a typical Daylight illuminant the number is roughly 11,850 photons per lx-s per square micron. Without the simplifying assumptions* it reduces to about 11,260. Continue reading How Many Photons on a Pixel→
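To put the rule of thumb to work, the sketch below treats the 11,850 figure as photons per lx-s per square micron of pixel area, which is an assumption about the intended units, and applies it to a hypothetical square pixel with a 5.9 micron pitch exposed at 1 lx-s.

```python
PHOTONS_PER_LXS_PER_UM2 = 11850.0  # Daylight rule of thumb quoted above

def photons_on_pixel(exposure_lxs, pixel_pitch_um):
    """Approximate visible photons landing on one square pixel."""
    pixel_area_um2 = pixel_pitch_um ** 2
    return PHOTONS_PER_LXS_PER_UM2 * exposure_lxs * pixel_area_um2

n_photons = photons_on_pixel(exposure_lxs=1.0, pixel_pitch_um=5.9)
print(f"about {n_photons:,.0f} photons")
```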
I measured the Spectral Photon Distribution of the three CFA filters of a Nikon D610 in ‘Daylight’ conditions with a cheap spectrometer. Taking a cue from this post I pointed it at light from the sun reflected off a gray card and took a raw capture of the spectrum it produced.
An ImageJ plot did the rest. I took a dozen captures at slightly different angles to catch the picture of the clearest spectrum. Shown are the three spectral curves averaged over the two best opposing captures, each proportional to the number of photons let through by the respective Color Filter. The units on the vertical axis are black-subtracted values from the raw file (DN) and are therefore proportional to the number of incident photons in each case. The Photopic Eye Luminous Efficiency Function (2 degree, Sharpe et al 2005) is also shown for reference, scaled to the same maximum as the green curve (although I have a feeling it is in units of energy, my bad). Continue reading Nikon CFA Spectral Power Distribution→
The key variable as far as the tolerances required to position the lens for accurate focus are concerned (at least in a simplified ideal situation) is an appropriate approximate distance between the desired in-focus plane and the actual in-focus plane (which we are assuming is slightly out of focus). It is a distance in the direction perpendicular to the x-y plane normally used to describe position of the image on it, hence the designation delta z, or dz in this post. The lens’ allowable focus tolerance is therefore +/- dz, which we will show in this post to vary as the square of the format’s diagonal. Continue reading Focus Tolerance and Format Size→
Is MTF50 a good proxy for perceived sharpness? It turns out that the spatial frequencies that are most closely related to our perception of sharpness vary with the size and viewing distance of the displayed image.
For instance if an image captured by a Full Frame camera is viewed at ‘standard’ distance (that is a distance equal to its diagonal) the portion of the MTF curve most representative of perceived sharpness appears to be around MTF90. On the other hand, when pixel peeping, the spatial frequencies around MTF50 look to be quite a decent indicator of it. Continue reading MTF50 and Perceived Sharpness→
The in-camera ISO dial is a ballpark milkshake of an indicator to help choose parameters that will result in a ‘good’ perceived picture. Key ingredients to obtain a ‘good’ perceived picture are 1) ‘good’ Exposure and 2) ‘good’ in-camera or in-computer processing. It’s easier to think about them as independent processes and that comes naturally to you because you shoot raw in manual mode and you like to PP, right? Continue reading Exposure and ISO→
Deconvolution is one of the processes by which we can attempt to undo blurring introduced by our hardware while capturing an image. It can be performed in the spatial domain via a kernel or in the frequency domain by dividing the spectrum of the image data by that of one or more Point Spread Functions. The best single deconvolution PSF to use when Capture Sharpening is the one that resulted in the blurring in the first place: the System PSF. It is often not easy or practical to determine it. Continue reading What is the Best Single Deconvolution PSF to Use for Capture Sharpening 1?→
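A minimal sketch of the frequency domain route follows, assuming a Gaussian PSF and circular (wrap-around) blur on a synthetic image. The small `eps` term is a Wiener-style regularization added here to keep the division stable where the PSF spectrum is close to zero; it and the function names are assumptions for illustration, not the method of the post.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Unit-sum Gaussian PSF centred on pixel (0, 0) with wrap-around."""
    y = np.fft.fftfreq(shape[0]) * shape[0]  # signed pixel offsets
    x = np.fft.fftfreq(shape[1]) * shape[1]
    yy, xx = np.meshgrid(y, x, indexing="ij")
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def deconvolve(image, psf, eps=1e-6):
    """Divide the image spectrum by the PSF spectrum, with regularization."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(image)
    F = G * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))                     # stand-in for a scene
psf = gaussian_psf(sharp.shape, sigma=1.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(psf)))
restored = deconvolve(blurred, psf)

err_blurred = np.sqrt(np.mean((blurred - sharp) ** 2))
err_restored = np.sqrt(np.mean((restored - sharp) ** 2))
print(err_blurred, err_restored)
```

In a real capture noise is amplified along with detail, which is why the choice of PSF and of the regularization strength matters so much in practice.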
A Point Spread Function is the image projected on the sensing plane when our cameras are pointed at a single, bright, infinitesimally small Point of light, like a distant star on a perfectly dark and clear night. Ideally, that’s also how it would appear on the sensing material (silicon) of our camera sensors: a singularly small yet bright point of light surrounded by pitch black. However a PSF can never look like a perfect point because in order to reach silicon it has to travel at least through an imperfect lens (1) of finite aperture (2), various filters (3) and only then finally land typically via a microlens on a squarish photosite of finite dimensions (4).
Each time it passes through one of these elements the Point of light is affected and spreads out a little more in slightly different ways, so that by the time it reaches silicon it is no longer a perfect Point but it is a slightly blurry Point instead: the image that this spread out Point makes on the sensing material is called the System’s Point Spread Function. It is what we try to undo through Capture Sharpening. Continue reading Point Spread Function and Capture Sharpening→
We have seen in the previous post how the radius for deconvolution capture sharpening by a Gaussian PSF can be estimated for a given setup in well behaved and characterized camera systems. Some parameters like pixel aperture and AA strength should remain stable for a camera/prime lens combination as f-numbers are increased (aperture is decreased) from about f/5.6 on up – the f/stops dear to Full Frame landscape photographers. But how should the radius for generic Gaussian deconvolution change as the f-number increases from there? Continue reading Deconvolution PSF Changes with Aperture→
The following approach will work if you know the MTF50 in cycles/pixel of your camera/lens combination as set up at the time that the capture you’d like to sharpen by deconvolution with a Gaussian PSF was taken.
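One way to turn such an MTF50 reading into a radius is sketched below, under the assumption that the whole system MTF is itself Gaussian, MTF(f) = exp(-2π²σ²f²) with f in cycles/pixel. Setting this to 0.5 at MTF50 gives σ = sqrt(2 ln 2) / (2π × MTF50) in pixels. The function names are made up for the example and the Gaussian assumption will only hold approximately for a real camera/lens combination.

```python
import math

def gaussian_sigma_from_mtf50(mtf50_cy_per_px):
    """Gaussian PSF standard deviation (pixels) whose MTF is 0.5 at MTF50."""
    return math.sqrt(2.0 * math.log(2.0)) / (2.0 * math.pi * mtf50_cy_per_px)

def gaussian_mtf(f_cy_per_px, sigma_px):
    """MTF of a Gaussian PSF with the given standard deviation."""
    return math.exp(-2.0 * math.pi**2 * sigma_px**2 * f_cy_per_px**2)

sigma = gaussian_sigma_from_mtf50(0.25)  # hypothetical MTF50 of 0.25 cycles/pixel
check = gaussian_mtf(0.25, sigma)        # should come back as one half
print(f"radius = {sigma:.3f} px, MTF at MTF50 = {check:.3f}")
```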
When capturing a typical photograph light from one or more sources is reflected from the scene, reaches the lens, goes through it and eventually hits the sensing plane. Continue reading What Is Exposure→