Several sites perform spatial resolution ‘sharpness’ testing of imaging systems for photographers (i.e. ‘lens+digital camera’) and publish results online. You can also measure your own equipment relatively easily to determine how sharp your hardware is. However, comparing results from site to site and to your own can be difficult and/or misleading, starting with the multiplicity of units used: cycles/pixel, line pairs/mm, line widths/picture height, line pairs/image height, cycles/picture height, etc.
This post will address the units involved in spatial resolution measurement using as an example readings from the slanted edge method.
The slanted edge method produces a Modulation Transfer Function (MTF) for a given hardware setup, that is, a curve that shows how well spatial resolution information is transferred from the scene to (ideally) the raw data. Its natural units are cycles per pixel. To see why, let’s follow the method step by step.
Cycles/pixel: Natural Units of MTF in Photography
The profile of the intensity of light reflected by the edge (rotated so that it is perfectly vertical), known as the Edge Spread Function (ESF), is shown below. Refer to the earlier link if you are interested in understanding how the ESF can be generated to that level of precision (key word = super-sampling).
The dark portion of the edge is on the left, the bright portion is on the right. The vertical axis represents raw levels normalized to 16 bit precision, which are proportional to the recorded intensity. Since the sensing elements are uniformly spaced pixels, the units of the horizontal axis are simply the center-to-center distance between horizontally contiguous pixels, otherwise known as horizontal pixel pitch. When dealing with units, pixel pitch is often shortened to just ‘pixel’, as for example in cycles/pixel (c/p).
The ESF of a non-existent perfect imaging system would ideally be a step function with the transition occurring at zero. However, blurring introduced by the physical hardware (the aperture, the lens, the sensor stack and sensor) spreads the step out to a stretched S shape as shown above. The shorter the rise in pixels, the better the performance of the lens/camera combination. As a first approximation we could arbitrarily say that this lens/sensor combination produces an edge which rises from 10% to 90% intensity within the space of a couple of pixels (center-to-center = pixel pitch).
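As a rough illustration of the 10% to 90% rise, here is a minimal Python sketch. It assumes the blur can be modeled as a Gaussian of standard deviation `sigma` pixels, which is only a convenient stand-in and not something the slanted edge method requires; the function names and the value of `sigma` are hypothetical.

```python
import math

def esf(x, sigma):
    """Model ESF: an ideal step at x = 0 blurred by a Gaussian (sigma in pixels)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def crossing(level, sigma, lo=-20.0, hi=20.0):
    """Position in pixels where the model ESF reaches `level`, by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if esf(mid, sigma) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rise_10_90(sigma):
    """Distance in pixels over which the model edge rises from 10% to 90%."""
    return crossing(0.9, sigma) - crossing(0.1, sigma)

# Assumed blur of 0.78 pixels: the rise comes out at about two pixels.
print(round(rise_10_90(0.78), 2))
```

With this model the rise scales linearly with `sigma`, so halving the blur halves the number of pixels the edge needs to go from dark to bright.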
By taking the derivative of the ESF we obtain a Line Spread Function (LSF), equivalent to the one dimensional intensity profile in the direction perpendicular to the edge that a distant, perfectly thin white line against a black background would project on the imaging plane, as captured in the raw data. The units of the horizontal axis are still the distance between two contiguous pixels in the direction under consideration:
A perfect imaging system would record the profile of the line as a spike of intensity at zero pixels only. In practice that’s physically impossible, but clearly the fewer pixels the LSF is spread over, the better the performance of the system. We could arbitrarily say for instance that one ‘line’ fits within about four pixels, from dark to bright to dark again.
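The step from ESF to LSF can be sketched numerically as follows. This is only an illustration on synthetic data: the Gaussian edge model, the value of `sigma` and the 4x super-sampling interval `dx` are assumptions, not values from a real measurement.

```python
import math
import numpy as np

sigma = 0.78                        # assumed Gaussian blur, in pixels
dx = 0.25                           # assumed 4x super-sampling: 4 samples per pixel
x = np.arange(-8.0, 8.0 + dx, dx)   # distance from the edge, in pixels

# Synthetic ESF: an ideal step blurred by the assumed Gaussian.
esf = np.array([0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0)))) for v in x])

# LSF = derivative of the ESF with respect to position (units: 1/pixel).
lsf = np.gradient(esf, dx)

area = float(np.sum(lsf) * dx)            # close to 1: the ESF rises from 0 to 1
peak_position = float(x[np.argmax(lsf)])  # the LSF peaks where the edge is steepest
print(area, peak_position)
```

The horizontal axis of `lsf` is the same `x` array, so its units remain pixel pitch, exactly as for the ESF.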
A Line Pair
What would happen on the imaging plane if we had more than one such line, parallel and side by side? Assuming the lines were lit incoherently (mostly true in nature) the resulting pattern on the imaging plane would simply be the sum of the individual LSFs, point by point, as represented by the continuous red curve below. That’s the intensity profile that would be recorded in the raw data from projections of two distant perfectly thin lines against a black background.
Two lines is one line pair or, interchangeably if you are a person of science, a cycle. The cycle refers to the bright to dark to bright transitions; in the case of the line pair above it goes peak-to-peak in 2.5 pixels. Spatially we would say that the period of one cycle (or one line pair) is 2.5 pixels. Frequency is one over the period, so we could also say that the spatial frequency corresponding to this line spacing is 0.4 cycles/pixel (or equivalently line pairs per pixel).
Clearly the resulting intensity swing from the brightest point to the darkest point in between the two lines changes depending on how much their line spread functions overlap. This relative intensity swing is called Michelson Contrast, defined as (Imax - Imin)/(Imax + Imin), and it is directly related to our ability to see detail in the image. If no contrast is lost at a specific line spacing (spatial frequency) it means that our imaging system is able to transfer the full intensity swings present in the scene to the raw data. If on the other hand all contrast is lost (that is, our imaging system is only able to record a uniform intensity where originally there was contrast) it means that no spatial resolution information from the scene was captured at that spatial separation/frequency. The wider the line spread function and/or the closer the two lines are spaced, the greater the overlap and the more contrast is lost, hence the more ‘sharpness’ is lost. See here for a slightly different take on this.
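To make the link between line spacing, overlap and contrast concrete, here is a small sketch. It assumes a Gaussian-shaped LSF of width `sigma` pixels (an illustrative model only) and uses hypothetical function names; the contrast is computed between the brightest point of the summed profile and the dip midway between the two lines.

```python
import math

def frequency_from_period(period_px):
    """Spatial frequency in cycles/pixel for a peak-to-peak period in pixels."""
    return 1.0 / period_px

def lsf(x, sigma):
    """Assumed Gaussian-shaped line spread function (peak normalized to 1)."""
    return math.exp(-0.5 * (x / sigma) ** 2)

def two_line_contrast(separation, sigma, steps=2001):
    """Michelson contrast between the peak and the central dip of two summed LSFs."""
    half = separation / 2.0
    # Sample the summed profile between the two line centers.
    xs = [-half + separation * i / (steps - 1) for i in range(steps)]
    profile = [lsf(x + half, sigma) + lsf(x - half, sigma) for x in xs]
    i_peak = max(profile)
    i_mid = 2.0 * lsf(half, sigma)   # intensity midway between the two lines
    return (i_peak - i_mid) / (i_peak + i_mid)

print(frequency_from_period(2.5))    # 2.5 pixel period, i.e. 0.4 cycles/pixel
for sep in (10.0, 4.0, 2.5, 1.5):    # closer lines overlap more and lose contrast
    print(sep, round(two_line_contrast(sep, sigma=1.0), 3))
```

In this model the two lines merge into a single bump once they get close enough, at which point the computed contrast drops to zero: the detail is no longer recorded.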
Measuring Response to All Frequencies at Once
The loss of contrast at decreasing spatial separation – or inversely at increasing spatial frequency – is what the slanted edge method measures objectively and quantitatively for a given imaging system in one go. It achieves this feat by taking the Fourier transform of the LSF as determined above (which can be considered the projection in one dimension of the two dimensional Point Spread Function), computing its normalized modulus and generating what we are after: the contrast transfer function of the imaging system in that one direction – commonly known as its Modulation Transfer Function.
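The Fourier step can be sketched in a few lines. This is a hedged illustration rather than the method's actual implementation: the LSF is a synthetic Gaussian with an assumed `sigma`, sampled at an assumed super-sampling interval `dx`, and `mtf_from_lsf` is a hypothetical helper name.

```python
import numpy as np

def mtf_from_lsf(lsf, dx):
    """Return (frequencies in cycles/pixel, MTF) for an LSF sampled every dx pixels."""
    spectrum = np.abs(np.fft.rfft(lsf))        # modulus of the Fourier transform
    mtf = spectrum / spectrum[0]               # normalize so that MTF(0) = 1
    freqs = np.fft.rfftfreq(len(lsf), d=dx)    # d in pixels, so result is cycles/pixel
    return freqs, mtf

sigma = 0.78                          # assumed Gaussian blur, in pixels
dx = 0.25                             # assumed 4 samples per pixel
x = np.arange(-16.0, 16.0, dx)        # position in pixels
lsf = np.exp(-0.5 * (x / sigma) ** 2) # synthetic LSF

freqs, mtf = mtf_from_lsf(lsf, dx)
for f, m in zip(freqs[:10], mtf[:10]):
    print(f"{f:.3f} c/p  MTF = {m:.3f}")
```

Because the sample spacing passed to `rfftfreq` is expressed in pixels, the frequency axis comes out directly in cycles per pixel, with no knowledge of the physical size of the sensor required.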
The MTF is normalized by definition so that its peak is one. One means all possible contrast information present in the scene was transferred, zero means no spatial resolution information (detail) was transferred to the raw file. The MTF curve below shows how much the contrast of a figurative set of ‘lines’ is attenuated as a function of the spatial frequency (one divided by the given spatial separation) indicated on the horizontal axis. As we have seen the units of frequency are naturally cycles per pixel pitch, or often cycles/pixel for short.
In photographic applications MTF curves of unprocessed images typically decrease monotonically from a peak at zero frequency (also known as DC), which occurs when the distance between two lines (the period) is infinite. In such a case, no matter how wide each individual line spread function is, the system is assumed to be able to transfer all possible contrast to the raw data.
One may be interested to know at what spatial frequency the imaging system is only able to transfer half of the possible contrast available at the scene to the raw data. We can simply read off the curve the frequency that corresponds to an MTF value of 0.5, customarily referred to as MTF50. In this case MTF50 occurs when the imaging system above is presented with figurative lines of detail alternating at a spatial frequency of 0.27 cycles/pixel (that is, the peaks of a line pair are separated by about 3.7 pixels). If one does not have access to the whole curve, MTF50 is considered to be a decent indicator of perceived sharpness.
The slanted edge method relies on information from the raw data only. It doesn’t know how tall the sensor is or how far apart the pixels are physically. Without additional information it can only produce the MTF curve as a function of the units for distance it knows: samples at pixel spacing in the raw data. So cycles per pixel pitch (often shortened to cycles/pixel, cy/px or c/p) are the natural units of the MTF curve produced by the slanted edge method. Recall, by the way, that it is a one dimensional result which only applies in the direction normal to the edge.
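Reading MTF50 off a sampled curve amounts to finding where it first crosses 0.5. A minimal sketch, assuming linear interpolation between samples is good enough; the function name and the sample values are made up for illustration and are not the measured curve discussed here.

```python
def mtf50(freqs, mtf, level=0.5):
    """Frequency at which a sampled MTF curve first drops to `level`."""
    for i in range(1, len(mtf)):
        if mtf[i] <= level < mtf[i - 1]:
            # Linear interpolation between the two bracketing samples.
            t = (mtf[i - 1] - level) / (mtf[i - 1] - mtf[i])
            return freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
    return None  # the curve never reaches the level

# Made-up samples of a decreasing MTF curve (frequencies in cycles/pixel).
freqs = [0.0, 0.1, 0.2, 0.3, 0.4]
mtf = [1.0, 0.9, 0.7, 0.4, 0.2]

f50 = mtf50(freqs, mtf)
print(round(f50, 3), "cycles/pixel, period of", round(1.0 / f50, 2), "pixels")
```

The period in pixels is just the reciprocal of the frequency, which is how 0.27 cycles/pixel corresponds to a peak-to-peak separation of about 3.7 pixels.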
Converting to Useful Units: lp/mm, lw/ph,…
If we have additional information, for instance how far apart the pixels are or how many usable pixels there are in the sensor (and we typically do), we can easily convert cycles per pixel pitch into some of the other spatial resolution units often seen in photography. For instance the D800e sensor’s pixel pitch is around 4.8um, so 0.27 cycles/pixel from the above MTF50 reading would correspond to 56.3 cycles/mm on the sensor as captured by the given imaging system:
56.3 cy/mm = 0.27 cy/px / (4.8 um/px) * 1000 um/mm.
Watch how the units cancel out to yield cycles per mm. One cycle is equivalent to one peak-to-peak contrast swing – or a line pair (lp). Units of line pairs per mm (lp/mm) are useful when interested in how well an imaging system performs around a specific spot of the capture (say the center), along that one direction.
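The same conversion expressed as a tiny helper, using the figures quoted in the text; the function name is hypothetical.

```python
def cycles_per_mm(cycles_per_pixel, pixel_pitch_um):
    """Convert cycles/pixel to cycles/mm (= lp/mm) given the pixel pitch in microns."""
    # (cy/px) / (um/px) = cy/um, then * 1000 um/mm = cy/mm
    return cycles_per_pixel / pixel_pitch_um * 1000.0

# MTF50 of 0.27 cy/px on a sensor with a pixel pitch of about 4.8 um.
print(cycles_per_mm(0.27, 4.8))   # 56.25, quoted as 56.3 once rounded
```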
But in practice, do 110lp/mm in the center of an RX100III capture represent better spatial resolution IQ (aka ‘sharpness’) in final images viewed at the same size than 56.3lp/mm in the center of a D800e capture?
Of course not. The D800e’s bigger sensor (24mm on the short side) will be able to fit more line pairs along its height than the smaller RX100III’s (8.8mm on the short side). More line pairs in a displayed image viewed at the same size mean better spatial resolution. Watch again how the units cancel out to yield line pairs per picture height (ph):
D800e = 1351 lp/ph (= 56.3lp/mm * 24mm/ph)
RX100III = 968 lp/ph (= 110lp/mm * 8.8mm/ph).
Units of line pairs per picture height are useful when comparing the performance of two imaging systems with the final image viewed at the same size. Picture Height (ph) is used interchangeably with Image Height (ih).
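A short sketch of the picture height comparison, using the lp/mm and sensor height figures quoted above; the function name is hypothetical.

```python
def lp_per_picture_height(lp_per_mm, sensor_height_mm):
    """Convert lp/mm on the sensor to line pairs per picture height."""
    # (lp/mm) * (mm/ph) = lp/ph
    return lp_per_mm * sensor_height_mm

d800e = lp_per_picture_height(56.3, 24.0)     # about 1351 lp/ph
rx100iii = lp_per_picture_height(110.0, 8.8)  # about 968 lp/ph
print(round(d800e), round(rx100iii))
```

The smaller sensor wins in lp/mm but loses in lp/ph, which is the figure that matters when both images are viewed at the same size.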
Sometimes line widths (lw) are used instead of line pairs (lp) or cycles (cy). It’s easy to convert between the three because there are two line widths in one line pair (or equivalently one cycle), so 1351 lp/ph correspond to 2702 lw/ph.
The same result could have been obtained simply by multiplying the original measurement in cycles per pixel pitch by the number of pixels on the side. For instance the D800e has 4924 usable pixels on the short side, so in lp/ph that would be
1330 lp/ph = 0.27 c/p * 4924 p/ph [ * 1 lp/c]
which of course would be equivalent to 2659 lw/ph. The figures are not identical because of slight inaccuracies in the information. The earlier figures rely on the D800e’s pixel pitch being exactly 4.80um and its usable sensor height being exactly 24.0mm, either of which could be slightly off. The latter figures for picture height are the more precise of the two because they rely only on the number of effective image pixels available for display, which is an accurately known number.
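Both routes to picture height units can be compared side by side with the figures from the text; the helper names are hypothetical.

```python
def lp_ph_from_pixels(cycles_per_pixel, pixels_on_side):
    """Line pairs per picture height straight from cycles/pixel and a pixel count."""
    # (cy/px) * (px/ph) = cy/ph, and one cycle is one line pair
    return cycles_per_pixel * pixels_on_side

def lp_to_lw(line_pairs):
    """There are two line widths in every line pair."""
    return 2.0 * line_pairs

via_pixels = lp_ph_from_pixels(0.27, 4924)   # about 1330 lp/ph
via_mm = 56.3 * 24.0                         # about 1351 lp/ph, from pitch and height
print(round(via_pixels), round(lp_to_lw(via_pixels)))
print(f"difference between the two routes: {100 * (via_mm / via_pixels - 1):.1f}%")
```

The gap of under two percent between the two results is the kind of discrepancy that rounded values for pixel pitch and sensor height can produce.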
Convention: Landscape Orientation
By convention the displayed image is assumed to be viewed in landscape orientation, so spatial resolution per picture ‘height’ is normally calculated by multiplying by the shorter sensor dimension. One could make a case that the length of the diagonal should be used instead to somehow level the playing field when aspect ratios differ significantly between sensors – but aspect ratio typically only makes a small difference to the final result so in practice it is often ignored.
One could also make a case that MTF figures should be given for more than one direction and in various key spots in the field of view in order to determine more completely the performance of an imaging system.
Aside from lens astigmatism, some current sensors have anti-aliasing filters active in one direction only, so that MTF can be quite different in one direction versus its perpendicular. In such cases, if the captured detail is not aligned with either direction, the actual spatial resolution performance of the system will vary sinusoidally between maximum and minimum depending on the angle of the detail relative to the direction of the astigmatism/AA. With a one-directional AA the manufacturer is counting on the fact that detail in natural scenes is typically not all aligned in the same direction so the effective resolution tends to average itself out, though this is often not the case with man-made subjects.
In the face of these many variables one could not be faulted for simply averaging orthogonal MTF readings, as many sites do.