The Nikon Z7’s Insane Sharpness

Ever since getting a Nikon Z7 MILC a few months ago I have been blown away by the level of sharpness it produces.  I thought that my surprise might be the result of moving up from 24 to 45.7MP, or the excellent pin-point focusing mode, or the lack of an anti-aliasing filter.  Well, it turns out that there is probably more at work than that.

This weekend I pulled out the largest cutter blade I could find and set it up, rough and ready, near vertically about 10 meters away to take a peek at what the MTF curves that produce such sharp results might look like.

Un-Believable Spatial Resolution

I was again blown away, clocking MTF50 at over 100 lp/mm in the green raw channel of the Z 50mm/1.8 S at f/4  in the center – that’s over 0.45 cycles per pixel, 2500 lp/ph, 5000 lw/ph, see the article on spatial frequency units.

Figure 1. Green raw channel MTF curve from Nikon Z7 + Z50mm/1.8S @f/4 in the center, vertical slanted edge, courtesy of open source MTF Mapper.
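As a sanity check on the units quoted above, here is a minimal conversion sketch, assuming a ~4.35 µm pixel pitch and the Z7's 5504-pixel image height (both figures are my assumptions here, not measured):

```python
# Rough unit-conversion sketch for the MTF50 figure quoted above.
pitch_mm = 0.00435     # assumed Z7 pixel pitch in mm (~4.35 um)
height_px = 5504       # Z7 raw image height in pixels

mtf50_cpp = 0.45       # measured MTF50 in cycles/pixel (from Figure 1)

lp_per_mm = mtf50_cpp / pitch_mm    # ~103 lp/mm
lp_per_ph = mtf50_cpp * height_px   # ~2480 lp/ph
lw_per_ph = 2 * lp_per_ph           # ~4950 lw/ph (two line widths per line pair)

print(f"{lp_per_mm:.0f} lp/mm, {lp_per_ph:.0f} lp/ph, {lw_per_ph:.0f} lw/ph")
```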

When I first saw these results I thought I had done something wrong because, you see, these results are not just insane, they are at first glance theoretically impossible.  This is what theory says is the best that a perfect lens and a perfect square pixel the full size of the pitch can be expected to do:

Figure 2. Solid curves: MTF of otherwise perfect circular aperture lens @ f/4 combined with perfect square pixel of 100% fill factor and increasing perfectly ‘focused’ 3rd order spherical aberrations; Dots: measured MTF curve of Z7 with Z 50mm/1.8S @ f/4 in the center of the FOV.  Units are lp/ph.

The solid lines represent the combined MTF of a perfect pixel and an ‘in-focus’ lens of increasing third order spherical aberrations.  The dots are one of my actually measured Z7 MTF curves at f/4 in the center, set up as described.

You’ll note that where the Z7+50mm MTF meets the 0.5 level (MTF50) the dots are actually above the nearest solid line, the blue one, which represents a pixel seeing pure diffraction.  And that’s physically impossible, because diffraction puts an upper limit on the actual performance of practical photographic lenses working in air.
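To see where that theoretical ceiling sits, here is a rough sketch of the blue curve: the diffraction MTF of an ideal circular-aperture lens at f/4 multiplied by the sinc MTF of a 100% fill factor square pixel.  The mean green wavelength (0.55 µm) and the 4.35 µm pitch are assumptions of mine; with them the combined MTF50 lands just below 0.45 c/p, which is why measured values above that look impossible at first glance.

```python
import numpy as np

# Sketch of the combined MTF of a diffraction-limited f/4 lens and a
# 100% fill factor square pixel, in cycles/pixel (assumed wavelength and pitch).
wavelength_mm = 0.00055   # assumed mean wavelength of the green channel
N = 4.0                   # f-number
pitch_mm = 0.00435        # assumed Z7 pixel pitch

f_cpp = np.linspace(0.001, 1.0, 2000)                      # spatial frequency, cycles/pixel
s = np.clip(f_cpp / pitch_mm * wavelength_mm * N, 0, 1)    # normalized to the diffraction cutoff

mtf_lens = (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s**2))  # circular aperture diffraction MTF
mtf_pixel = np.abs(np.sinc(f_cpp))     # np.sinc(x) = sin(pi x)/(pi x), first null at 1 c/p
mtf_system = mtf_lens * mtf_pixel

mtf50 = f_cpp[np.argmin(np.abs(mtf_system - 0.5))]
print(f"Theoretical MTF50, perfect lens + 100% FF pixel: {mtf50:.3f} c/p")
```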

Same Pitch, Smaller Effective Pixel Aperture

Well, it’s impossible if the combined effect of the sensor’s photodiode, microlenses and filters is a perfectly uniform square filling the entire area of a pixel – what is referred to as the pixel aperture.  Such a perfect pixel footprint acts as a low-pass filter, somewhat smearing the continuous image.  But if the effective area of the pixel is reduced, for example by using smaller microlenses, so is the smearing.  For instance, this is the same perfect lens setup as above, but with a square pixel aperture of 75% fill factor:

Figure 3. Same as above but pixel with square 75% effective fill factor.  Units are lp/ph.

Aha, now my Z7+50mm/1.8 @f/4 results in the center are back in the realm of (still mind-boggling to me) reality.

We could have inferred that the effective fill factor was less than that of the full uniform square suggested by pitch because the first MTF null in Figure 1 occurs at a horizontal spatial frequency higher than 1 cycle/pixel.  Recall that the spatial frequency response of a perfect square pixel alone is the product of sinc functions that hit their first zero at 1 c/p horizontally and vertically, as follows:

Figure 4. Spatial Frequency Response of 100% fill factor perfect square pixel in isolation, in the horizontal/vertical directions.

Since the MTF of such a pixel gets multiplied, frequency by frequency, by the lens blur MTF, if one of the two is zero so is the resulting system response there.  In this case a system first zero above one c/p suggests an effective pixel fill factor of less than 1 in the measured direction (vertical blade = horizontal direction), as better explained in the pixel aperture article.
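In formula terms, a uniform square pixel aperture of side $w$ (expressed in pixel pitches) has a horizontal/vertical MTF of

$$\mathrm{MTF}_{pixel}(f) = \left|\frac{\sin(\pi w f)}{\pi w f}\right|,$$

with its first null at $f = 1/w$ c/p.  Since the system MTF is the product of this and the lens MTF, a system null observed above 1 c/p implies $w < 1$.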

In Figure 1 it appears that the first null in the horizontal direction occurs around 1.12 c/p, which would be equivalent to an ideal uniform square pixel footprint with side equal to the inverse of that, or 0.89 px linearly, and therefore 0.79 times the area.  So this capture suggests about a 79% fill factor compared to a perfect uniform square pixel, +/- measurement error, which as mentioned may not be inconsequential in this case because of my rough setup: across this rough Z7 test set I have seen estimated nulls suggesting possible effective fill factors, relative to an ideal square pixel area, in the 65-80% range.
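The arithmetic, as a quick sketch (the 1.12 c/p null is read off Figure 1, so treat the result as approximate):

```python
# Back out the equivalent square fill factor from the observed first null.
null_cpp = 1.12            # first system null read off Figure 1, cycles/pixel
side = 1 / null_cpp        # equivalent square aperture side, in pixel pitches
fill_factor = side ** 2    # equivalent area relative to a full square pixel

print(f"side = {side:.2f} px, effective fill factor = {fill_factor:.1%}")
# -> side = 0.89 px, effective fill factor = 79.7%
```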

Pixel Aperture Effective Shape

Nor does the reduced fill factor necessarily need to come from a reduced square footprint: shape also has an effect.  Rather than simply shrinking the default square footprint, Nikon could instead have designed the resulting pixel aperture PSF to have a rounder shape, thereby reducing effective fill factor and directional effects at the same time.  For instance, if the effective pixel aperture were made to look like a perfect disk just fitting inside the ideal pixel footprint, with diameter equal to pitch, its MTF would look like the orange curve below:

Figure 5. Modulation Transfer Function of a uniform square and of a uniform circular area with diameter equal to pixel pitch.  The disk MTF is the sombrero (jinc, besinc) function, with first zero at 1.2197 c/p.

In this case the effective fill factor, calculated as the area of the disk relative to the perfect square reference, is about π/4, or 78.5%.  Compared to typical gapless designs, such a disk-like pixel aperture clearly passes more energy at higher frequencies and has the added benefit of being isotropic.  On the other hand it is less efficient at collecting the photons falling on the area of a pixel, in proportion to the effective fill factor, and it could result in more aliasing.
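For reference, here is a sketch reproducing the disk-aperture curve of Figure 5 from the standard jinc expression (my own reconstruction, with the disk diameter set equal to the pitch):

```python
import numpy as np
from scipy.special import j1

# MTF of a uniform disk pixel aperture with diameter d = 1 pixel pitch:
# MTF(f) = 2*J1(pi*d*f) / (pi*d*f), the 'sombrero'/jinc function.
d = 1.0                                  # disk diameter in pixel pitches
f = np.linspace(1e-6, 2.0, 4000)         # spatial frequency, cycles/pixel
x = np.pi * d * f
mtf_disk = 2 * j1(x) / x

first_zero = f[np.argmax(mtf_disk < 0)]  # first frequency where the jinc crosses zero
fill_factor = np.pi / 4 * d**2           # disk area over the unit square

print(f"first zero ~ {first_zero:.3f} c/p, fill factor ~ {fill_factor:.1%}")
# -> first zero ~ 1.220 c/p, fill factor ~ 78.5%
```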

On this page, however, effective fill factor figures are computed assuming that the pixel aperture’s PSF is a shrunk version of the reference square, unchanged in shape, which may result in an overestimation of the shrinkage, as better explained in the article linked above.  Without knowing the shape of the actual PSF, the rationale for this assumption is that I have been told in the past that engineers strive for a square active area and that the resulting PSF has the approximate shape of a square pillow, either of which may no longer be true in these days of back side illumination.

No Free Lunch: Lower QE, Higher Aliasing

Nikon is not the first to have thought of this trick, actually having taken a page from some Medium Format backs and Fuji (GFX-50, see Figure 8 in the link above, though its pitch is almost 25% greater than the Z7’s).  If your audience is mainly landscapers, who capture mainly natural scenes where aliasing and moiré are harder to spot – and you are trying to reduce sensitivity to achieve lower base ISOs – it makes sense to reduce the effective area of your pixels.  Not so of course if you are after maximum photon collection efficiency.

This may also explain in part why the Z7’s effective Quantum Efficiency (eQE) – as estimated from dxomark.com’s ‘Full SNR Curves’ – seems to be about 20% lower than the Z6’s, and why the Z7’s pixel response non-uniformity (PRNU) appears to be higher than the Z6’s:

Figure 6. Analysis of dxomark.com’s Full SNR curve data for Nikon Z7 and Z6 MILCs. The data refers to a weighted average of the performance of each individual color channel.
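As context, one simplified way such a relative estimate can be extracted (a sketch of the general approach, not necessarily the exact procedure behind Figure 6): in the shot-noise-limited portion of a Full SNR curve, $\mathrm{SNR}^2$ equals the number of photoelectrons collected, so at a given Exposure $H$

$$\mathrm{eQE} \;\propto\; \frac{\mathrm{SNR}^2}{H \cdot A_{pixel}},$$

with $A_{pixel}$ the pixel area; taking the ratio of this quantity for two cameras at the same $H$ yields a relative eQE that is independent of pixel pitch.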

If the Nikon engineering design brief was to have a lower base ISO, effective fill factor is one of the easiest parameters to vary. I would have preferred that they tweak the CFA instead (perhaps at the same time aiming for better ‘accuracy’) – and maybe they did.  But if more ‘sharpness’ was required, given these outstanding S lenses, there aren’t many options other than reducing effective fill factor.

Responsible Processing

With great sharpness comes great responsibility.  As you can see from the MTF curve in Figure 1, owners of a Z7 get some of the highest spatial resolution possible today in a Full Frame camera below the monochrome Nyquist frequency (0.5 c/p) – but as a consequence they also get some of the highest potential for energy beyond it, which can cause visible aliasing and moiré.

Capture sharpening during raw conversion will enhance both good and bad frequencies, bringing aliasing to the fore more quickly than in the past.  I find myself using tiny RL deconvolution radii of around 0.6 px for capture sharpening Z7 raw images, and even then aliasing can creep in.  So it behooves the Z7 owner to be aware of the incredibly sharp tool that Nikon put into our hands and to apply sharpening with humility and restraint.
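For the curious, here is a minimal sketch of that kind of gentle capture sharpening using Richardson–Lucy deconvolution with a small Gaussian PSF; interpreting the 0.6 px radius as the Gaussian sigma is an assumption of mine, and scikit-image is just one convenient implementation, not necessarily what your raw converter uses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import data, img_as_float
from skimage.restoration import richardson_lucy

# Gentle capture sharpening sketch: RL deconvolution with a tiny Gaussian PSF.
sigma = 0.6                                # 'radius' interpreted as Gaussian sigma (assumed)
size = 7                                   # small odd kernel, ample for sigma = 0.6
impulse = np.zeros((size, size))
impulse[size // 2, size // 2] = 1.0
psf = gaussian_filter(impulse, sigma)      # sampled Gaussian PSF
psf /= psf.sum()

image = img_as_float(data.camera())        # stand-in for a demosaiced raw image
sharpened = richardson_lucy(image, psf, 10, clip=True)   # 10 iterations, kept modest
```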

Z7: A Landscaper’s Camera

Conclusions:

  1. The effective fill factor of the Nikon Z7’s pixels appears to be less than 1 in the horizontal direction and likely in the 70-80% range of the ideal uniform square area suggested by pixel pitch
  2. This results in insane ‘sharpness’, so go easy with capture sharpening
  3. The downside is lower QE and greater potential for visible aliasing and moiré

The more I play with the Z7, the more it seems to me that the camera was designed with landscapes and Medium Format in mind: lower ISO and better ‘sharpness’, aliasing and QE be darned.  Time will tell, but as a landscaper often working around f/8 that’s been OK with me so far.

 

*BTW, my thoughts here are just conjectures and as mentioned the setup was rough so I welcome everybody’s thoughts and independent validation.

 

13 thoughts on “The Nikon Z7’s Insane Sharpness”

  1. Jack, many thanks for this investigation.

    Are the above measures on all raw channels combined?  Are the same phenomena true for the R & B channels when measured separately?

    1. Hi Ilias,

      Figure 1 refers to the horizontal spatial frequencies of the raw green channel alone. As mentioned I do not have the equipment for a professional test; I used pin-point focusing, which correctly maximizes the green channel at the expense of the other two; and I was not able to do a reliable through-focus set: the Focus Shift function in the Z7 is too coarse for it even with spacing at the minimum, I don’t have a suitable focus rail, and I haven’t figured out how to precisely control focus by wire manually.

      In addition, the edge at a 10+ meter distance is only about 70 pixels long, so the R and B pixel response is noisy at the higher frequencies (because of the Bayer layout there are half as many R and B pixels as G to project onto the ESF->LSF->MTF). But yes, under the proper conditions I would expect the R and B MTF curves to show similar nulls and therefore a similar effective FF.

  2. Hi Jack,

    As far as the Z 7 is concerned, effective light-receiving area (in relation to pixel size) is actually not the prime suspect for the IMX309’s (D850, Z 7) lower QE. While the PD opening area has been an issue for the IMX161 (GFX50, X1D) and many older FSI sensors due to blockage by wiring, it has ceased to plague QE since the implementation of BSI.

    Machines today with sub-100 ISO achieve this by allowing a higher full well capacity: in effect, while the fill efficiency stays the same, the pool of photons gets larger. Hence the greater DR and the seemingly slower saturation rate.

    Being an older design (early 2016), the IMX309 does have a tad less QE than the IMX410 (early 2018), but not by as much as DxO claims. DxO has a known history of producing unreliable data, and in this case I suspect they measured QE per pixel per channel, but not over the same sampling area. Obviously you get more noise, per the sqrt(incident) law, as you start to shrink the sample area. If we take that into account, we find the “20% decrease” is fairly well aligned with the difference in sqrt(per-pixel area).

    I would call aliasing a similar case: if you shrink the Z 7’s output to 24MP, it would actually be less aliased than native 24MP images taken w/o an OLPF, due to the substantial supersampling. In that case we can just do channel-specific smoothing in post to remove the higher frequencies, to simulate an OLPF. Or frequency-specific filtering for a coarser effect, like you’ve explained with deconvolution.

    1. Hi W,

      Good information, thanks. Effective fill factor does not depend on the light-receiving area only: it also depends on the presence, size and shape of microlenses and other structures before it. So for a given exposure, more or fewer photons are collected depending on effective fill factor even though pixel pitch is the same. This physical quality is what I refer to as effective QE. Since Exposure represents a number of photons per unit area, effective QE is independent of the area defined by pixel pitch. And FWC (the size of the glass) is independent of effective QE (how quickly it fills when it rains).

      Bernard Delley has attempted to measure absolute QE directly for a Nikon D850, also based on the IMX309, and came up with results broadly consistent with what is suggested by the DXO data. To be clear, DXO did not measure QE. They produced what they call ‘Full SNR’ curves, from which I extracted a relative estimate of effective QE averaged over the four color channels.

      Good point about the lower aliasing, the main advantage of having more pixels for most of us, all else equal.
      Jack

      1. Hi Jack,

        I’ve confirmed with my source: the IMX309’s QE was indeed deliberately cut with a reduced pixel well design to further achieve low ISO at a frugal cost. This was not very smart of Nikon.

        I mistook that previously, as this is not the common case with high-res low-ISO sensors: the IMX094, IMX251 and IMX455 all have absolute QE comparable to lower-res sensors of the same generation.

        OCLs are also largely a non-issue in modern sensors. Most, if not all, BSI sensors have achieved fully gapless construction and good waveguides even at the edges. CFAs maybe, depending on dye absorption and thickness.

        The major impacting factor was pixel well design, including pixel well insulation, diffractors at the top to keep photons in, and whether PDs are split (for dual-pixel AF). In designing the IMX309, Nikon chose to take away the diffractors in exchange for a better yield rate and a cheaper price, but this results in a higher escape rate of incident photons.

        Here is an article explaining these. I only found it in Japanese, but Google Translate seemed to work sufficiently well: pc.watch.impress.co.jp/docs/news/event/1098397.html

        Also thanks for the test data, it was good to see something concrete.

          1. Interesting information, W. The article is on the infrared performance of smartphone pixels; I wonder whether it applies to Full Frame sensors under visible light.

            1. Yes, they work the same way under visible light and most of the UV range. Most of the QE improvements we have seen in sensors in recent years, low-res and high-res alike, are direct products of these little things.

  3. Perhaps this article will clarify the topic of extraordinary sharpness (requires translation from Polish)

    “www.optyczne.pl/412.4-Test_aparatu-Nikon_Z7_Rozdzielczo%C5%9B%C4%87.html”

    1. Thank you, though I am not a fan of measuring hardware performance after subjective rendering adjustments have been applied.

      Jack

  4. “If your audience is mainly landscapers, who capture mainly natural scenes where aliasing and moiré are harder to spot – and you are trying to reduce sensitivity to achieve those high Dynamic Ranges at sub 100 ISOs – it makes sense to reduce the effective area of your pixels.”

    I am not sure that I follow. How would this help for dynamic range, since it affects photon collection across the board?

    1. Hi Mario,

      In a landscape situation, where exposure time is a somewhat flexible variable, a somewhat lower QE doesn’t need to affect DR: simply increase Exposure to compensate. Likewise, a smaller pixel aperture could result in more aliasing and moiré, which become especially evident with man-made subjects showing regular high-frequency patterns, like architecture and fabric. On the other hand such patterns are exceedingly rare in nature, so aliasing and its derivatives are much harder to notice. If you know where to look for it you will find it, though, and once you do it’s hard to unsee it.

      Jack

      1. Thanks for the response. QE not affecting DR was indeed my understanding, which is why I was surprised to read this, since the sentence almost makes it sound as though reducing sensitivity is a means to achieve the high dynamic range.
