This article and the following one will discuss the effect on resolution of digitizing a continuous optical image.
The sampling process carried out by the sensor results in digital values corresponding to the intensity at each pixel's location. These so-called Data Numbers are ideally stored as-is in the raw file and are proportional to infinitesimal point samples of a new continuous image: the optical image smoothed by the characteristics of the pixels' effective active area, known as the pixel aperture function.

Smoothing by a finite pixel area reduces resolution.
Pixel Aperture, IPS, PSF
A pixel’s effective active area is typically known as its aperture in imaging, though when produced as an intensity map it is also referred to as Intra Pixel Sensitivity or pixel Point Spread Function, depending on context.
The function depends on the sensor’s physical characteristics: whether it is Front or Back Side Illuminated, its architecture, presence and type of microlenses, filters, etc.

As far as we are concerned it is just a detailed two-dimensional map of how likely photons impinging on different portions of a pixel's area are to be captured. Think of it as a wavelength-dependent two-dimensional filter representing sub-pixel Quantum Efficiency weights.
In the rain-drops-in-bucket analogy, the effective aperture of a pixel can be envisioned as an uneven, leaky cheesecloth that funnels photons to a flow meter near its center. Its leakiness and shape can vary, from square to round to pillow-like, as determined by its physical makeup.
For instance, this is how the folks at RIT measured the Intra Pixel Sensitivity of the BSI sensor on the Kepler space mission; you can find the relevant paper in the notes:[1]

The intensity map to the left and the contour map to the right show, unsurprisingly, that the pixel is better at collecting photons landing near its center than towards the edges, with the rate of change dependent on wavelength. The map is normalized to a relative maximum weight of one, shown as pure white, worsening through increasingly dark grays down to black for zero. The shapes become less symmetrical and well behaved towards the edges of the sensor. The wavelength scale above is large because that particular sensor is apparently used for infrared work.
Pixel PSF in the Spatial Domain
BSI pixel aperture can be modeled as the convolution of hyperbolic secants or gaussians with the ideal aperture of a square window function.[2] Below I used gaussians with the suggested standard deviation for 4 μm pitch BSI pixels with light around 550 nm:

A single pixel in isolation is shown to emphasize the fact that the pixel’s PSF extends beyond the theoretical square area normally associated with it, because of physical effects like diffusion and crosstalk. These effects can extend to several pixels away but in practice today’s sensor manufacturers are working hard to contain them much closer to a pixel’s boundaries.[3]
Nevertheless it is obvious that the decreasing sensitivity towards the edges means that fewer impinging photons will make it through there than at the center, resulting in a 'Fill Factor' of potentially less than 100%: the effective area of the pixel is then smaller than that implied by pixel pitch.
An ideal pixel with 100% Fill Factor would appear in the Figure above as a perfectly sharp square and top hat respectively, with sides along the edges of the pixel. However, the depicted pixel would also qualify as 100% FF because of how it is modeled, with its effective area wider than the physical one without losses, as we will see in the next section.
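To make the model concrete, here is a minimal numerical sketch of it in Python/numpy: the ideal square window convolved with a gaussian standing in for diffusion and crosstalk. The grid span and the value of sigma are illustrative assumptions of mine, not the figures from reference [2].

```python
import numpy as np
from scipy.signal import fftconvolve

# Grid in units of pixel pitch (pitch normalized to 1), oversampled 64x
p = 1.0
dx = p / 64
x = np.arange(-2, 2, dx)
X, Y = np.meshgrid(x, x)

# Ideal aperture: square window of width p (100% fill factor)
square = ((np.abs(X) <= p / 2) & (np.abs(Y) <= p / 2)).astype(float)

# Gaussian modeling diffusion/crosstalk; sigma is illustrative only
sigma = 0.15 * p
g = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
g /= g.sum()

# Pixel aperture = square window convolved with the gaussian; peak
# normalized to 1 as in the measured IPS maps
aperture = fftconvolve(square, g, mode='same')
aperture /= aperture.max()
```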
Pixel PSF in the Frequency Domain
In the ideal case a perfect square aperture of width $w$ shows a frequency response equal to a sinc with first null at $1/w$. The above function is close to ideal, so its Spatial Frequency Response is near that of a theoretically perfect square aperture up to 1 cycle per pixel pitch (c/p), though the gaussian used to model diffusion/crosstalk abates it substantially at higher frequencies, as can be seen in the Figure below:

The coincident null at 1 c/p is to be expected in this case since the pixel aperture simulation is the result of multiplying gaussian responses by a perfect square aperture response, the latter forcing a zero there; see the Appendix.
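The same point can be checked directly in the frequency domain with a 1D slice, under the same illustrative sigma: the aperture's MTF is the product of the square window's sinc and the gaussian's own gaussian response, so the sinc's zero at 1 c/p survives the product, while a narrower effective width (lower Fill Factor, discussed next) instead pushes the first null higher.

```python
import numpy as np

p = 1.0
sigma = 0.15 * p             # same illustrative value as above
f = np.linspace(0, 2, 401)   # spatial frequency in cycles/pitch

mtf_square = np.abs(np.sinc(f * p))              # first null at 1 c/p
mtf_gauss = np.exp(-2 * (np.pi * f * sigma)**2)  # FT of the gaussian
mtf_pixel = mtf_square * mtf_gauss               # zero at 1 c/p survives

w = 0.7 * p                                      # reduced effective width
mtf_small = np.abs(np.sinc(f * w))               # first null at ~1.43 c/p
```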
However that is not a given when manufacturers tweak or remove microlenses in order to obtain an effectively smaller active area, with the objective of reducing its smoothing effect and therefore maximizing 'sharpness', as we've seen in some recent Medium and Full Format landscape cameras. In that case the effective top hat width $w$ would be less than pitch $p$, and the first null, induced by the lower Fill Factor, would move to a higher frequency than the ideal 1 c/p.
Front Side Illuminated pixels have smaller sensing areas and their IPSs are less well behaved than Back Side Illuminated ones since they have to deal with interference and blockage by wiring, circuits and other structures in front of the photodiode. However most camera sensors have moved to BSI over the last decade or so, in part because a larger effective area portends a higher Signal to Noise Ratio.
For additional detail and insight on pixel aperture in the frequency domain refer to this earlier article.
The Sampling Process
In what follows we will assume for simplicity flat, perfectly square, contiguous pixel apertures filling 100% of a pixel's area and no more, with effective aperture width $w$ the same as pitch $p$:

The job of pixels in an imaging sensor is to count (integrate) photons projected onto their area during Exposure – after attenuation by the pixel aperture function and losses due to photoelectric conversion (QE). The photoelectron count corresponding to each pixel is a local area sample of the intensity of the optical image. This in essence is what sampling an image means.
The camera then writes these values, up to a gain, to the raw file as so-called Data Numbers (DN), to be later retrieved in order to reconstruct the image for display. Ideally the values written to the raw file are unprocessed and maintain a linear relationship with the incoming light.
The integration of the intensity of an image in a moving neighborhood weighted by a function is known as 2D convolution. In the context of an imaging sensor, this mathematical concept is used to model the effect of the pixel’s physical structure.
Specifically, the light collection by the pixel aperture acts as a convolution of the continuous optical image projected by the lens onto the sensing plane, with the continuous pixel aperture function. The smoothed image resulting from this convolution can therefore also be considered to be continuous.
The camera then samples this smoothed, continuous image at discrete points corresponding to the center of each pixel.[*] While physically the pixel collects the total integrated charge across its area, mathematically, this process is modeled as taking an infinitesimally small point sample of the convolved image at each pixel’s center. This is accomplished by a Dirac delta function acting on the smoothed image, as shown in red in Figures 7 and 8.
In other words, the pixel aperture functions as an inherent, image-wide low-pass filter, convolved with the optical image. The smaller the effective photosensitive area, the further away the first null of the filter, and the less the image is smoothed before sampling occurs.

Note above right the smoothing and spreading out of the two perfectly imaged stars to the left, due to the filtering action of pixel aperture. Sampling is then brought home by a delta function picking the value of the intensity resulting from convolution at the location corresponding to the center of the pixel’s aperture (the red dots).
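For the curious, here is a one-dimensional sketch of this two-step process in Python/numpy: two narrow gaussians stand in for the perfectly imaged stars, the aperture is the flat 100% fill-factor box assumed above, and the star positions and widths are arbitrary illustrative choices of mine.

```python
import numpy as np

# Fine grid standing in for the continuous axis (units of pixel pitch)
p = 1.0
n_sub = 100                          # sub-samples per pixel pitch
x = np.arange(0, 16, p / n_sub)

# Continuous optical image: two narrow gaussian 'stars'
image = np.exp(-(x - 6.2)**2 / 0.02) + np.exp(-(x - 9.7)**2 / 0.02)

# Step 1: convolve with a flat aperture of width p (100% fill factor)
box = np.ones(n_sub) / n_sub
smoothed = np.convolve(image, box, mode='same')

# Step 2: point-sample the smoothed image at pixel centers (red deltas)
centers = np.arange(n_sub // 2, x.size, n_sub)
samples = smoothed[centers]          # ideal, noiseless raw values
```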
The Meaning of Raw Data
So as far as sampling is concerned, we can ideally think of values stored in the raw file as infinitesimally small point samples of the intensity of the image projected by the lens after smoothing by a single image-wide filter with the same characteristics as the pixel Aperture Function.
Below are the images of the two simulated stars in Figure 7 with their intensity projected to the horizontal axis for better clarity. It shows the effect of convolution by pixel aperture (orange curve) on the continuous optical image (blue curve) and the delta samples of the smoothed curve at the center of each pixel that make up the unprocessed raw data (sampled values are the height of the red deltas).

Image Reconstruction
The separation of the function of a pixel into two independent operations – filtering by its aperture followed by point sampling – has important implications because it allows immediate application of concepts like the sinc interpolation formula for ideal reconstruction, and more intuitive analysis in the frequency domain. The value of a pixel in the raw data represents a point on a curve, not a little square.[4]
In fact as long as the sampling rate is at least twice the highest spatial frequency present in the optical image, the Nyquist-Shannon theorem tells us that we should be able to reconstitute the smoothed continuous image from sampled data perfectly, at least in this ideal unprocessed, noiseless, monochrome case.[5]
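Continuing the 1D sketch above, ideal reconstruction per the theorem replaces each point sample with a sinc scaled to its value and sums the lot; a rough sketch only, assuming the smoothed image is adequately band limited and ignoring edge effects:

```python
import numpy as np

# Whittaker-Shannon reconstruction from the point samples above:
# each sample contributes a sinc centered on its pixel location
recon = np.zeros_like(x)
for xk, sk in zip(x[centers], samples):
    recon += sk * np.sinc((x - xk) / p)   # np.sinc(u) = sin(pi*u)/(pi*u)

# recon approximates 'smoothed' wherever the Nyquist criterion is met
```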
In most photographic scenarios, the quality of the optical image is ultimately limited by diffraction caused by the nearly circular lens aperture. This imposes a maximum spatial frequency, known as the extinction frequency ($f_{ext}$), which is equal to $\frac{1}{\lambda N}$, where $\lambda$ is the wavelength of light (e.g. in microns) and $N$ is the lens working f-number.
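Plugging in representative numbers makes the limit tangible; this is just the formula above evaluated at a mean visible wavelength and a mid-range f-number:

$f_{ext} = \frac{1}{\lambda N} = \frac{1}{0.55\,\mu m \times 8} \approx 0.227 \ \text{cycles}/\mu m = 227 \ \text{cycles/mm}$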
This optical limit means that the Modulation Transfer Function ($MTF$) of the image is zero beyond $f_{ext}$, effectively making any photographic image band limited. Since the act of light collection by the pixel aperture is modeled as a convolution with this optical image, the resulting signal's spectrum is the multiplication of the optical $MTF$ by the pixel aperture $MTF$. Because multiplication by zero is zero, the final sampled signal is also band limited by the optical $MTF$, regardless of the size of the pixel aperture (though a large aperture may impose an even stricter practical limit).
Therefore, to ensure perfect reconstruction independently of the subject (i.e., to satisfy the Nyquist criterion for the diffraction limit), we need the pixel pitch (sampling period) to be no greater than $0.5\,\lambda N$.

Assuming mean wavelengths of about 0.55 μm and f-numbers in the f/4-11 range, that would require pixel pitches of around 1-3 microns in typical landscape conditions.
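Spelled out, the arithmetic behind those figures is simply the $0.5\,\lambda N$ criterion evaluated at the two ends of the f-number range:

$0.5 \times 0.55\,\mu m \times 4 = 1.1\,\mu m \qquad \text{and} \qquad 0.5 \times 0.55\,\mu m \times 11 \approx 3.0\,\mu m$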
That criterion is not quite met in Figure 8 so we could expect some aliasing in the displayed image, though its amount would be limited because fortunately in photography there is diminishing energy at higher frequencies as a result of imperfections in the setup.
We can therefore often relax that constraint, potentially accepting less than ideal reconstruction, as we all know given the 3-5 micron pitches of current enthusiast digital cameras. Some cameras close the gap by using IBIS to shift the sensor by sub-pixel amounts and superimposing multiple captures; this works well with static subjects, reducing aliasing very effectively while leaving the spatial frequency response unaltered.
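A rough illustration of the pixel-shift idea, reusing the 1D sketch above: a second capture offset by half a pitch, interleaved with the first, doubles the sampling rate (Nyquist moves from 0.5 to 1 c/p) while the aperture smoothing, and hence the spatial frequency response, is unchanged.

```python
import numpy as np

# Second capture with the sensor shifted by half a pitch: the same
# smoothed image is sampled at centers offset by p/2
centers_2 = centers + n_sub // 2
centers_2 = centers_2[centers_2 < x.size]
samples_2 = smoothed[centers_2]

# Interleave the two captures: the sampling rate doubles while the
# aperture MTF does not change
order = np.argsort(np.concatenate([centers, centers_2]))
positions = np.concatenate([x[centers], x[centers_2]])[order]
values = np.concatenate([samples, samples_2])[order]
```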
In general, absent further processing, the best we can aim for in digital imaging is to reconstruct the continuous image smoothed by pixel aperture, though advanced post processing like deconvolution can often help to get us closer to the geometric image.
The effect of digitization on resolution is next.
Appendix for the Mathematically Inclined
Assuming a continuous optical image $I_c(x, y)$ representing photon intensity on the sensing plane and an $M \times N$ pixel sensor (e.g. 6000×4000) on the $(x, y)$ physical axes, Data Numbers written to the raw file at pixel index location $(m, n)$ are equal to
(1)   \[ \begin{aligned} DN[m, n]_\lambda &= QE \cdot g \cdot \left[ I_c \ast h \ast a \right](x, y) \cdot \text{III}\!\left(\frac{x}{p}, \frac{y}{p}\right) \\ &\qquad \text{evaluated at physical } (x, y) = (mp, np) \end{aligned} \]
where:
- all variables are wavelength dependent and $\ast$ denotes 2D convolution
- $\cdot$ element-wise multiplication
- $QE$ quantum efficiency in photoelectron/photon
- $g$ system gain in DN/photoelectron
- $h$ filter stack effects that we are ignoring here (e.g. AA)
- $a$ pixel aperture function
- $m = 0, \ldots, M-1$ (integer column index)
- $n = 0, \ldots, N-1$ (integer row index)
- $p$ pixel pitch in microns
- $\text{III}\left(\frac{x}{p}, \frac{y}{p}\right)$ is the 2D Dirac comb.
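As a sanity check of (1), here is a minimal sketch in Python/numpy; the filter stack $h$ is ignored and $QE$ and $g$ are scalar placeholders. The function name and grid conventions are mine, not taken from the downloadable code in note [6].

```python
import numpy as np
from scipy.signal import fftconvolve

def raw_dn(I_c, aperture, n_sub, qe=1.0, gain=1.0):
    """Sketch of Eq. (1): DN = QE*g*[I_c conv a](x,y) * III(x/p, y/p).

    I_c      : continuous image on a grid oversampled n_sub x per pitch
    aperture : pixel aperture function a, sampled on the same grid
    The filter stack h is ignored. Multiplying by the Dirac comb and
    evaluating at (mp, np) reduces to reading every n_sub-th sub-sample.
    """
    smoothed = fftconvolve(I_c, aperture / aperture.sum(), mode='same')
    c = n_sub // 2                       # first pixel center
    return qe * gain * smoothed[c::n_sub, c::n_sub]
```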
In the frequency domain convolutions become multiplications and vice versa, so:
(2)   \[ \widehat{DN}(f_x, f_y)_\lambda = QE \cdot g \cdot \left[ \hat{I}_c \cdot \hat{h} \cdot \hat{a} \right] \ast \text{III}\left(p f_x,\, p f_y\right) \]

with $f_x, f_y$ horizontal and vertical spatial frequencies respectively and $1/p$ the sampling frequency. See the article on physical units for how they are determined in practice.
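The replication implied by (2) can also be seen numerically, continuing the 1D star sketch from earlier: the spectrum of the pixel-rate samples is the smoothed image's spectrum replicated at multiples of the $1/p$ sampling frequency. Variable names carry over from that sketch.

```python
import numpy as np

# Spectrum of the finely-gridded smoothed image vs. that of the
# pixel-rate samples from the earlier 1D sketch: per Eq. (2) the
# latter repeats the former at multiples of 1/p
F_smooth = np.abs(np.fft.rfft(smoothed)) / smoothed.size
f_smooth = np.fft.rfftfreq(smoothed.size, d=p / n_sub)   # in c/p

F_samp = np.abs(np.fft.rfft(samples)) / samples.size
f_samp = np.fft.rfftfreq(samples.size, d=p)              # up to 0.5 c/p
```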
Notes and References
1. “Direct measurement of the intra-pixel response function of the Kepler Space Telescope’s CCDs“, Vorobiev et al, 2018
2. “Improved Intra-Pixel Sensitivity Characterization Based on Diffusion and Coupling Model for Infrared Focal Plane Array Photodetector“, Li Zhong et al, 2021.
3. I have looked for IPS maps for current enthusiast and smartphone cameras without success. I would be happy to hear from anyone who could point me to some non-paywalled sources.
4. “A Pixel Is Not A Little Square, A Pixel Is Not A Little Square, A Pixel Is Not A Little Square“, Alvy Ray Smith, January 1995
5. Jim Kasson has an excellent series of posts showing the basics of image reconstruction and its limitations.
6. The code used to produce the above plots can be downloaded by clicking here.
This is a very nice article.
As pixel sizes decrease, and approach the wavelength in size, another approach has to be taken based on full electromagnetic theory. Some of the latest phone cameras can have pixels as small as 0.6 micron square. To carry out a similar analysis to the one above for these small pixels, one needs EM finite element modelling analysis, unfortunately. There are novel effects, one aspect of which is described in the open access paper by Catrysse et al, “Subwavelength Bayer RGB color routers with perfect optical efficiency”, Nanophotonics 11(10) 2381-2387 (2022).
Thanks Chris, reference appreciated, interesting new tech. One of the sources I looked at for current tech suggested a smaller standard deviation for smaller pixels, around 0.1 for 1 micron pixels. But it seemed short on facts so I look forward to more robust sources.