Optical resolution describes the ability of an imaging system to resolve detail in the object that is being imaged.
An imaging system may have many individual components including a lens and recording and display components. Each of these contributes to the optical resolution of the system, as will the environment in which the imaging is done.
The ability of a lens to resolve detail is usually determined by the quality of the lens but is ultimately limited by diffraction. Light coming from a point in the object diffracts through the lens aperture such that it forms a diffraction pattern in the image which has a central spot and surrounding bright rings, separated by dark nulls; this pattern is known as an Airy pattern, and the central bright lobe as an Airy disk. The angular radius of the Airy disk (measured from the center to the first null) is given by
$$\sin\theta = 1.22\,\frac{\lambda}{D}$$
where
θ is the angular resolution, λ is the wavelength of light, and D is the diameter of the lens aperture.
Two adjacent points in the object give rise to two diffraction patterns. If the angular separation of the two points is significantly less than the Airy disk angular radius, then the two points cannot be resolved in the image, but if their angular separation is much greater than this, distinct images of the two points are formed and they can therefore be resolved. Rayleigh defined the somewhat arbitrary "Rayleigh criterion" that two points whose angular separation is equal to the Airy disk radius to first null can be considered to be resolved. It can be seen that the greater the diameter of the lens or its aperture, the finer the resolution. Astronomical telescopes have increasingly large lenses so they can 'see' ever finer detail in the stars.
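As a rough illustration of the Rayleigh criterion, the sketch below evaluates the usual small-angle form θ ≈ 1.22 λ/D. The function name and the example values (550 nm light, 100 mm aperture) are assumptions chosen for illustration, not figures from the text.

```python
import math


def rayleigh_angle(wavelength_m: float, aperture_m: float) -> float:
    """Angular radius of the Airy disk in radians (small-angle form 1.22 * lambda / D)."""
    return 1.22 * wavelength_m / aperture_m


# Hypothetical example: green light (550 nm) through a 100 mm aperture.
theta = rayleigh_angle(550e-9, 0.100)
arcsec = math.degrees(theta) * 3600.0
print(f"{theta:.3e} rad = {arcsec:.2f} arcsec")

# Doubling the aperture halves the resolvable angle.
theta_large = rayleigh_angle(550e-9, 0.200)
print(f"{theta_large:.3e} rad")
```

Doubling D halves θ, which is the sense in which a larger aperture gives finer resolution.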
Only the very highest quality lenses have diffraction limited resolution, however, and normally the quality of the lens limits its ability to resolve detail. This ability is expressed by the Optical Transfer Function which describes the spatial (angular) variation of the light signal as a function of spatial (angular) frequency. When the image is projected onto a flat plane, such as photographic film or a solid state detector, spatial frequency is the preferred domain, but when the image is referred to the lens alone, angular frequency is preferred. OTF may be broken down into the magnitude and phase components as follows:
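$$\mathrm{OTF}(\xi,\eta) = \mathrm{MTF}(\xi,\eta)\cdot\mathrm{PTF}(\xi,\eta)$$
where
$$\mathrm{MTF}(\xi,\eta) = \left|\mathrm{OTF}(\xi,\eta)\right|, \qquad \mathrm{PTF}(\xi,\eta) = e^{\,i\arg \mathrm{OTF}(\xi,\eta)}$$
and (ξ,η) are the spatial frequencies in the x- and y-directions, respectively.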
The OTF accounts for aberration, which the limiting frequency expression above does not. The magnitude is known as the Modulation Transfer Function (MTF) and the phase portion is known as the Phase Transfer Function (PTF).
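The split into magnitude and phase can be sketched for a single complex OTF sample. The sample value below is hypothetical, and `split_otf` is an illustrative helper rather than an established API.

```python
import cmath


def split_otf(otf: complex) -> tuple[float, float]:
    """Return (MTF, phase in radians) for one complex OTF sample."""
    return abs(otf), cmath.phase(otf)


# Hypothetical OTF sample at some spatial frequency (xi, eta).
sample = 0.3 + 0.4j
mtf, phase = split_otf(sample)
print(f"MTF = {mtf:.3f}, phase = {phase:.3f} rad")

# Magnitude and phase together reproduce the original sample.
rebuilt = mtf * cmath.exp(1j * phase)
```

A sensor that records only intensity keeps the first of the two returned values and discards the second.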
In imaging systems, the phase component is typically not captured by the sensor. Thus, the important measure with respect to imaging systems is the MTF.
Phase is critically important to adaptive optics and holographic systems.
Some optical sensors are designed to detect spatial differences in electromagnetic (EM) energy. These include photographic film, solid-state devices (CCD, CMOS detectors, and infrared detectors like PtSi and InSb), tube detectors (vidicon, plumbicon, and photomultiplier tubes used in night-vision devices), scanning detectors (mainly used for IR), pyroelectric detectors, and microbolometer detectors. The ability of such a detector to resolve those differences depends mostly on the size of the detecting elements.
Spatial resolution is typically expressed in line pairs per millimeter (lppmm), lines (of resolution, mostly for analog video), contrast vs. cycles/mm, or MTF (the modulus of the OTF). The MTF may be found by taking the two-dimensional Fourier transform of the spatial sampling function. Smaller pixels result in wider MTF curves and thus better detection of higher frequency energy.
This is analogous to taking the Fourier transform of a signal sampling function; as in that case, the dominant factor is the sampling period, which is analogous to the size of the picture element (pixel).
Other factors include pixel noise, pixel cross-talk, substrate penetration, and fill factor.
A common problem among non-technicians is the use of the number of pixels on the detector to describe the resolution. If all sensors were the same size, this would be acceptable. Since they are not, the use of the number of pixels can be misleading. For example, a 2 megapixel camera of 20 micrometre square pixels will have worse resolution than a 1 megapixel camera with 8 micrometre pixels, all else being equal.
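The comparison above can be made concrete with the sensor Nyquist frequency, 1/(2 × pitch). This sketch assumes the pixel pitch equals the quoted pixel size (no gaps between pixels); the function name is illustrative.

```python
def nyquist_cycles_per_mm(pixel_pitch_um: float) -> float:
    """Highest spatial frequency the sampling grid can represent, in cycles/mm."""
    pitch_mm = pixel_pitch_um / 1000.0
    return 1.0 / (2.0 * pitch_mm)


coarse = nyquist_cycles_per_mm(20.0)  # the 2 megapixel camera, 20 um pixels
fine = nyquist_cycles_per_mm(8.0)     # the 1 megapixel camera, 8 um pixels
print(f"20 um pixels: {coarse:.1f} cycles/mm")
print(f" 8 um pixels: {fine:.1f} cycles/mm")
```

The camera with fewer but smaller pixels has the higher limiting frequency at the sensor plane, regardless of pixel count.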
For resolution measurement, film manufacturers typically publish a plot of Response (%) vs. Spatial Frequency (cycles per millimeter). The plot is derived experimentally. Solid state sensor and camera manufacturers normally publish specifications from which the user may derive a theoretical MTF according to the procedure outlined below. A few may also publish MTF curves, while others (especially intensifier manufacturers) will publish the response (%) at the Nyquist limiting frequency, or, alternatively, publish the frequency at which the response is 50%.
To find a theoretical MTF curve for a sensor, it is necessary to know three characteristics of the sensor: the active sensing area, the area comprising the sensing area and the interconnection and support structures ("real estate"), and the total number of those areas (the pixel count). The total pixel count is almost always given. Sometimes the overall sensor dimensions are given, from which the real estate area can be calculated. Whether the real estate area is given or derived, if the active pixel area is not given, it may be derived from the real estate area and the fill factor, where fill factor is the ratio of the active area to the dedicated real estate area.

$$\mathrm{FF} = \frac{a \cdot b}{c \cdot d}$$
where
the active area of the pixel has dimensions a × b, and the pixel real estate has dimensions c × d.
In Gaskill's notation, the sensing area is a 2D comb(x, y) function of the distance between pixels (the pitch), convolved with a 2D rect(x, y) function of the active area of the pixel, bounded by a 2D rect(x, y) function of the overall sensor dimension. The Fourier transform of this is a comb(ξ,η) function governed by the distance between pixels, convolved with a sinc(ξ,η) function governed by the number of pixels, and multiplied by the sinc(ξ,η) function corresponding to the active area. That last function serves as an overall envelope to the MTF function; so long as the number of pixels is much greater than one, the active area size dominates the MTF.
Sampling function:
$$S(x,y) = \left[\operatorname{comb}\!\left(\frac{x}{c},\frac{y}{d}\right) \star\star\, \operatorname{rect}\!\left(\frac{x}{a},\frac{y}{b}\right)\right] \cdot \operatorname{rect}\!\left(\frac{x}{M \cdot c},\frac{y}{N \cdot d}\right)$$
where
the sensor has M x N pixels
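The envelope term, the sinc corresponding to the active area, can be sketched in one dimension. The example assumes a square active area centred in a square pixel cell, so the active width is pitch × √(fill factor); the pitch and fill factor values are hypothetical.

```python
import math


def active_width_um(pitch_um: float, fill_factor: float) -> float:
    """Active width of a square pixel, assuming a square active area in a square cell."""
    return pitch_um * math.sqrt(fill_factor)


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def detector_mtf(freq_cyc_per_mm: float, active_um: float) -> float:
    """One-dimensional envelope |sinc(a * xi)| for an active width a."""
    return abs(sinc(active_um / 1000.0 * freq_cyc_per_mm))


pitch_um = 10.0     # hypothetical pixel pitch
fill_factor = 0.64  # hypothetical fill factor
a_um = active_width_um(pitch_um, fill_factor)
nyquist = 1000.0 / (2.0 * pitch_um)  # cycles/mm
mtf_at_nyquist = detector_mtf(nyquist, a_um)
print(f"active width {a_um:.1f} um, MTF at Nyquist {mtf_at_nyquist:.3f}")
```

The first null of this envelope falls at 1/a cycles/mm, so a smaller active width pushes the null outward and widens the MTF.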
An imaging system running at 24 frames per second is essentially a discrete sampling system that samples a 2D area. The same limitations described by Nyquist apply to this system as to any signal sampling system.
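A minimal sketch of the temporal Nyquist limit: any periodic motion faster than half the frame rate is folded back to a lower apparent frequency. The helper name and the example frequencies are illustrative assumptions.

```python
def apparent_frequency(true_hz: float, frame_rate: float) -> float:
    """Alias of true_hz folded into the band [-frame_rate/2, +frame_rate/2]."""
    return true_hz - frame_rate * round(true_hz / frame_rate)


frame_rate = 24.0
nyquist_hz = frame_rate / 2.0

slow = apparent_frequency(10.0, frame_rate)  # below the Nyquist limit
fast = apparent_frequency(23.0, frame_rate)  # above the Nyquist limit
print(f"Nyquist limit: {nyquist_hz} Hz")
print(f"10 Hz appears as {slow} Hz, 23 Hz appears as {fast} Hz")
```

The negative result for the 23 Hz case corresponds to the familiar effect of a spoked wheel appearing to turn slowly backwards on film.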
All sensors have a characteristic time response. Film is limited at both the short-exposure and the long-exposure extremes by reciprocity breakdown. These are typically held to be exposures longer than 1 second and shorter than 1/10,000 second. Furthermore, film requires a mechanical system to advance it through the exposure mechanism, or a moving optical system to expose it. These limit the speed at which successive frames may be exposed.
CCD and CMOS are the modern preferences for video sensors. CCD is speed-limited by the rate at which the charge can be moved from one site to another. CMOS has the advantage of having individually addressable cells, and this has led to its advantage in the high speed photography industry.
Vidicons, Plumbicons, and image intensifiers have specific applications. The speed at which they can be sampled depends upon the decay rate of the phosphor used. For example, the P46 phosphor has a decay time of less than 2 microseconds, while the P43 decay time is on the order of 2-3 milliseconds. The P43 is therefore unusable at frame rates above 1000 frames per second. See External links for links to phosphor information.
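The phosphor comparison can be checked with a simple rule of thumb, assumed here for illustration only: the decay time should not exceed one frame period. Real systems may need a stricter margin, and the constants below are just the figures quoted above.

```python
def decay_fits_frame(decay_time_s: float, frames_per_second: float) -> bool:
    """True if the phosphor decays within one frame period (rule-of-thumb assumption)."""
    frame_period_s = 1.0 / frames_per_second
    return decay_time_s <= frame_period_s


P46_DECAY_S = 2e-6  # upper bound quoted for P46
P43_DECAY_S = 2e-3  # lower end of the 2-3 ms quoted for P43

for name, decay in (("P46", P46_DECAY_S), ("P43", P43_DECAY_S)):
    usable = decay_fits_frame(decay, 1000.0)
    print(f"{name} at 1000 frame/s: {'ok' if usable else 'too slow'}")
```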
Pyroelectric detectors respond to changes in temperature. Therefore, a static scene will not be detected, so they require choppers. They also have a decay time, so the pyroelectric system temporal response will be a bandpass, while the other detectors discussed will be a lowpass.
If objects within the scene are in motion relative to the imaging system, the resulting motion blur will result in lower spatial resolution. Short integration times will minimize the blur, but integration times are limited by sensor sensitivity. Furthermore, motion between frames in motion pictures will impact digital movie compression schemes (e.g. MPEG-1, MPEG-2). Finally, there are sampling schemes that require real or apparent motion inside the camera (scanning mirrors, rolling shutters) that may result in incorrect rendering of image motion. Therefore, sensor sensitivity and other time-related factors will have a direct impact on spatial resolution.
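The trade-off between integration time and motion blur can be sketched as follows. The sketch assumes uniform linear motion of the image across the sensor; the image-plane speed, integration time, and pixel pitch are hypothetical values.

```python
def blur_pixels(image_speed_mm_s: float, integration_s: float, pitch_um: float) -> float:
    """Blur extent in pixels for uniform motion of the image across the sensor."""
    blur_um = image_speed_mm_s * integration_s * 1000.0
    return blur_um / pitch_um


# Hypothetical case: image moving at 10 mm/s across 8 um pixels.
short_exposure = blur_pixels(10.0, 1.0 / 1000.0, 8.0)
long_exposure = blur_pixels(10.0, 1.0 / 60.0, 8.0)
print(f"1/1000 s: {short_exposure:.2f} px, 1/60 s: {long_exposure:.2f} px")
```

Once the blur spans more than about one pixel, the motion rather than the pixel size sets the effective spatial resolution along the direction of travel.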
The spatial resolution of digital systems (e.g. HDTV and VGA) is fixed independently of the analog bandwidth because each pixel is digitized, transmitted, and stored as a discrete value. Digital cameras, recorders, and displays must be selected so that the resolution is identical from camera to display. However, in analog systems, the resolution of the camera, recorder, cabling, amplifiers, transmitters, receivers, and display may all be independent and the overall system resolution is governed by the bandwidth of the lowest performing component.
In analog systems, each horizontal line is transmitted as a high-frequency analog signal. Each picture element (pixel) is therefore converted to an analog electrical value (voltage), and changes in values between pixels therefore become changes in voltage. The transmission standards require that the sampling be done in a fixed time (outlined below), so more pixels per line becomes a requirement for more voltage changes per unit time, i.e. higher frequency. Since such signals are typically band-limited by cables, amplifiers, recorders, transmitters, and receivers, the band-limitation on the analog signal acts as an effective low-pass filter on the spatial resolution. The difference in resolutions between VHS (240 discernible lines per scanline), Betamax (280 lines), and the newer ED Beta format (500 lines) is explained primarily by the difference in the recording bandwidth.
In the NTSC transmission standard, each field contains 262.5 lines, and 59.94 fields are transmitted every second. Each line must therefore take about 63.6 microseconds, 10.7 of which are for reset to the next line. Thus, the line (retrace) rate is 15.734 kHz. For the picture to appear to have approximately the same horizontal and vertical resolution (see Kell factor), it should be able to display 228 cycles per line, requiring a bandwidth of 4.28 MHz. If the line (sensor) width is known, this may be converted directly into cycles per millimeter, the unit of spatial resolution.
B/G/I/K television system signals (usually used with PAL colour encoding) transmit frames less often (50 Hz), but the frame contains more lines and is wider, so bandwidth requirements are similar.
Note that a "discernible line" forms one half of a cycle (a cycle requires a dark and a light line), so "228 cycles" and "456 lines" are equivalent measures.
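The NTSC arithmetic can be retraced from the figures quoted above. The sensor width used for the final conversion is a hypothetical value. The computed bandwidth comes out near 4.3 MHz, close to the 4.28 MHz quoted; the exact figure depends on the active line time assumed.

```python
LINES_PER_FIELD = 262.5
FIELDS_PER_SECOND = 59.94
BLANKING_US = 10.7       # per-line reset time quoted in the text
CYCLES_PER_LINE = 228

line_rate_hz = LINES_PER_FIELD * FIELDS_PER_SECOND
line_period_us = 1e6 / line_rate_hz
active_us = line_period_us - BLANKING_US

# Cycles per microsecond is numerically equal to MHz.
bandwidth_mhz = CYCLES_PER_LINE / active_us

# One cycle is a dark line plus a light line.
discernible_lines = 2 * CYCLES_PER_LINE

SENSOR_WIDTH_MM = 6.4    # hypothetical sensor width, for illustration only
cycles_per_mm = CYCLES_PER_LINE / SENSOR_WIDTH_MM

print(f"line rate   {line_rate_hz / 1000:.3f} kHz")
print(f"line period {line_period_us:.2f} us (active {active_us:.2f} us)")
print(f"bandwidth   {bandwidth_mhz:.2f} MHz for {discernible_lines} lines")
print(f"resolution  {cycles_per_mm:.1f} cycles/mm on a {SENSOR_WIDTH_MM} mm wide sensor")
```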
Display resolution is the one place that pixels are a legitimate measure, though a very technical analysis would include pixel pitch and size.
Analog tape recorders are also band limited. VHS, for example, is limited to 3 MHz, which results in a horizontal resolution of 240 discernible lines per scanline ("lines" in analog television parlance). VHS HQ, Betamax, and U-matic are also band limited to about 250 lines. Slower recording speeds also affect bandwidth, such that a VHS-EP tape or Betamax-III tape has a reduced resolution of 230 lines. Digital transmission and recording systems have a spatial resolution also, but it is more variable due to compression considerations.
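The link between recording bandwidth and discernible lines can be sketched under two assumptions that are not stated in the text: an active line time of about 52.86 microseconds (the NTSC line period less 10.7 microseconds of blanking), and the common convention of normalizing horizontal resolution to picture height by dividing by the 4:3 aspect ratio.

```python
def tv_lines(bandwidth_hz: float,
             active_line_s: float = 52.86e-6,
             aspect_ratio: float = 4.0 / 3.0) -> float:
    """Approximate discernible lines per picture height for a given bandwidth.

    Each cycle of the signal yields two lines (one dark, one light).
    """
    lines_per_picture_width = 2.0 * bandwidth_hz * active_line_s
    return lines_per_picture_width / aspect_ratio


vhs_lines = tv_lines(3.0e6)
print(f"3 MHz -> about {vhs_lines:.0f} lines")
```

Under these assumptions a 3 MHz limit gives a figure close to the 240 lines quoted for VHS, and the result scales linearly with bandwidth.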
Here's a list of modern-day, digital-type measurements (and traditional, analog horizontal resolutions) for various media. The list only includes popular formats, not rare formats, and all values are approximate (rounded to the nearest 10), since the actual quality can vary machine-to-machine or tape-to-tape. For ease-of-comparison all values are for the NTSC system, and listed in ascending order from lowest quality to highest quality.
There are two methods by which to determine system resolution. The first is to perform a series of two dimensional convolutions, first with the image and the lens, then the result of that procedure with the sensor, and so on through all of the components of the system. This is computationally expensive, and must be performed anew for each object to be imaged.
The other method is to transform each of the components of the system into the spatial frequency domain, and then to multiply the 2-D results. A system response may be determined without reference to an object. Although this method is considerably more difficult to comprehend conceptually, it becomes easier to use computationally, especially when different design iterations or imaged objects are to be tested.
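The frequency-domain method can be sketched with two components along a single frequency axis. This is a minimal illustration, not a full 2-D treatment: the lens is assumed aberration-free with a circular aperture (the standard diffraction-limited MTF), the detector is reduced to its active-area sinc envelope, and all parameter values are hypothetical.

```python
import math


def lens_mtf(freq: float, wavelength_mm: float, f_number: float) -> float:
    """Diffraction-limited MTF of an aberration-free circular aperture."""
    cutoff = 1.0 / (wavelength_mm * f_number)  # cycles/mm
    if freq >= cutoff:
        return 0.0
    s = freq / cutoff
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))


def pixel_mtf(freq: float, active_mm: float) -> float:
    """Detector envelope |sinc(a * freq)| for an active width a."""
    x = active_mm * freq
    if x == 0.0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))


def system_mtf(freq: float, components) -> float:
    """Product of the component MTFs at one spatial frequency."""
    result = 1.0
    for component in components:
        result *= component(freq)
    return result


# Hypothetical system: f/8 lens at 550 nm, 8 um active pixel width.
components = [
    lambda f: lens_mtf(f, 550e-6, 8.0),
    lambda f: pixel_mtf(f, 0.008),
]

for f in (0.0, 25.0, 50.0, 100.0):
    print(f"{f:6.1f} cycles/mm -> system MTF {system_mtf(f, components):.3f}")
```

Swapping a component for a different design only changes one entry in the list, which is why this approach suits design iteration; no object needs to be specified to obtain the system response.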