Robotics/Sensors/Computer Vision


A digitally acquired image comes in the form of a matrix of vectors. Each of these vectors is called a pixel and represents a specific color. If the matrix is displayed with each cell filled in with the corresponding color, it creates a picture of the scene from the point of view of the camera.

A field of study known as computer vision has formed around looking for patterns in matrices of this type that correspond to certain objects, such as faces.
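To make the idea concrete, here is a minimal sketch (using NumPy, with made-up pixel values) of an image stored as a matrix whose cells are red-green-blue vectors:

    import numpy as np

    # A tiny hypothetical 2 x 3 image: a matrix whose cells are (red, green, blue) vectors.
    image = np.array([[[255,   0,   0], [  0, 255,   0], [  0,   0, 255]],
                      [[  0,   0,   0], [128, 128, 128], [255, 255, 255]]],
                     dtype=np.uint8)

    print(image.shape)    # (rows, columns, colour channels) -> (2, 3, 3)
    print(image[0, 1])    # the pixel at row 0, column 1 is pure green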

Camera Calibration

Cameras convert 2-d or 3-d reality into 2-d representations. While post-processing can re-create 3-d representations from the initial 2-d images, the camera output is a projection of reality onto a 2-d plane. Because light is used as the projection mechanism, anything that alters the path or character of the light rays from the object to the sensor plane will affect the fidelity of that initial 2-d representation on the sensor plane.

Cameras are not perfect devices. Because they are built with real-world components having real-world characteristics, their ability to represent reality is limited by the physical properties of their constituent parts. Broadly speaking, several kinds of distortion can occur.

The focus of this chapter is on Geometric Distortion – its causes and remedies.

Exterior Orientation

Exterior Orientation is defined as the position (x,y,z) and angle (tip, tilt, yaw) of the camera relative to the object scene. Even with a perfect camera, distortions due to the exterior orientation will be introduced. These can easily be removed by knowing these six elements of exterior orientation and employing techniques of solid analytical geometry. Solving for unknown exterior orientation elements is covered under the topic of photogrammetry and is generally considered part of the calibration of the entire camera system (which includes its object scene environment). Camera calibration, per se, usually refers to correcting for a camera's interior orientation.
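As an illustration of that solid-geometry step, here is a minimal sketch (in Python with NumPy; the function name, argument order, and axis conventions are assumptions, not a standard) of expressing a scene point in the camera's frame once the six exterior orientation elements are known:

    import numpy as np

    def world_to_camera(point, camera_xyz, tip, tilt, yaw):
        """Express a scene point in the camera frame given exterior orientation.

        camera_xyz is the camera position (x, y, z); tip, tilt and yaw are
        rotations in radians about the x, y and z axes respectively. The axis
        and rotation-order conventions here are assumptions for illustration.
        """
        cx, sx = np.cos(tip), np.sin(tip)
        cy, sy = np.cos(tilt), np.sin(tilt)
        cz, sz = np.cos(yaw), np.sin(yaw)
        Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
        R = Rz @ Ry @ Rx                      # camera attitude in world coordinates
        return R.T @ (np.asarray(point, float) - np.asarray(camera_xyz, float))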

Interior Orientation

Interior Orientation is defined as the relative orientation and characteristics of components within the camera proper. These include the principal distance (focal length), the location of the principal point where the optical axis meets the sensor plane, and lens distortion.

Lens Distortion

Lens distortion causes image points to be displaced from their ideal “pin-hole model” locations on the sensor plane. These displacements can be further described as radial distortion (displacement of an image point toward or away from the principal point) and tangential, or decentering, distortion.

Ideal Pin-hole Model

The pin-hole model is used to represent the ideal lens. It simply enforces the idea that rays of light travel in straight lines from the object, through the pin hole, to the image (sensor) plane.

Expensive lenses approximate this pin-hole model behavior.
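A minimal sketch of the pin-hole projection, assuming the point is expressed in camera coordinates with the optical axis along Z and a principal distance f (names are illustrative):

    def pinhole_project(X, Y, Z, f):
        """Project a 3-d point onto the image plane of an ideal pin-hole camera.

        The point (X, Y, Z) is in camera coordinates with Z > 0 along the
        optical axis; f is the principal distance. The image location is where
        the straight ray from the point through the pin hole meets the plane.
        """
        return f * X / Z, f * Y / Z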

Goals of Camera Calibration

The goal of camera calibration is to correct the image displacements which occur due to elements of the camera's interior orientation. There are two general approaches used for camera calibration: model-based approaches and mapping-based approaches.

Model-based Approaches

With a model-based approach, one attempts to identify a few predominant factors contributing to error, model them, measure them, and correct for them. For each contributing factor, a mathematical equation is proposed to model the error. For example, radial lens distortion can be modeled with a polynomial in odd powers of the radial distance r:

delta_r = k1*r + k2*r^3 + k3*r^5 + k4*r^7 + k5*r^9 + ...

Usually, the first two or three terms are sufficient to describe the radial error. Note that this is a model of the error, not the actual error: a model can approximate the error, but never fully remove its effects.
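As a concrete (and purely illustrative) sketch, the following Python function applies the first three terms of the model above to shift an image point radially, assuming the coefficients k1 through k3 have already been estimated; the sign convention and function name are assumptions:

    import numpy as np

    def correct_radial(x, y, cx, cy, k1, k2, k3):
        """Shift an image point (x, y) by the modeled radial distortion.

        (cx, cy) is the principal point; k1..k3 are the (hypothetical) first
        three coefficients of the polynomial above. Sign conventions differ
        between calibration packages, so delta_r is subtracted here.
        """
        dx, dy = x - cx, y - cy
        r = np.hypot(dx, dy)
        if r == 0.0:
            return x, y                        # the principal point is undisplaced
        delta_r = k1 * r + k2 * r**3 + k3 * r**5
        scale = (r - delta_r) / r              # move the point radially
        return cx + dx * scale, cy + dy * scale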

After determining the appropriate model (assume we pick the first three terms above to model radial lens distortion), the next step is to determine the values of the coefficients which best model the observed error. This can be accomplished in one of two ways:

Summary of Model-based Approaches

Pros: only a handful of model coefficients need to be determined and stored, and applying the correction is computationally cheap.
Cons: only the error sources that were explicitly modeled are corrected, and a model can only approximate, never fully remove, the error.

Mapping-based Approaches

With a mapping-based approach, no attempt is made to understand the individual contributing causes of error. The entire focus is on generating a comprehensive reality-to-image (or image-to-reality) mapping function. Simple rubber-sheeting would be an example of such a transformation.

For example, imagine setting up a high-precision x-y plotter in front of a camera, oriented so its plane is perpendicular to the camera's optical axis. Next, a pin-hole light source is mounted on the plotter pen holder in such a way that it can be moved to any location in the plotter's x-y plane. Further, imagine that for every possible position of the light source, we can capture the row and column of the single pixel it illuminates.

For a VGA format image (640 columns by 480 rows), there are 307,200 pixels. If we were to drive the plotter to each of 307,200 positions, and record what real-world (x,y) coordinate mapped to each and every pixel, we would have built an explicit mapping function.

With this approach, all potential causes of error come out in the wash – whether they're known or not. All that matters is having the explicit image-to-reality mapping preserved.

In reality, nobody bothers to separately illuminate 307,200 pixels. However, the process can be approximated by collecting similar measurements at a much sparser grid of control points (one per 16 * 16-pixel patch, for example) and employing a piece-wise transformation within each patch. This process could be further automated by driving the plotter to each of the control point locations.
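As a rough sketch of how such a piece-wise mapping might be evaluated (the data layout and names here are assumptions): given the measured world coordinates at a coarse grid of control points, a pixel is located within its patch and the four surrounding control points are blended bilinearly:

    import numpy as np

    def pixel_to_world(col, row, grid_cols, grid_rows, world_x, world_y):
        """Piece-wise bilinear mapping from a pixel location to a world coordinate.

        grid_cols and grid_rows are 1-d arrays of control-point pixel positions;
        world_x and world_y are 2-d arrays holding the measured world coordinate
        at each control point (indexed [row, column]).
        """
        i = int(np.clip(np.searchsorted(grid_cols, col) - 1, 0, len(grid_cols) - 2))
        j = int(np.clip(np.searchsorted(grid_rows, row) - 1, 0, len(grid_rows) - 2))
        # Fractional position of the pixel inside its patch.
        u = (col - grid_cols[i]) / (grid_cols[i + 1] - grid_cols[i])
        v = (row - grid_rows[j]) / (grid_rows[j + 1] - grid_rows[j])

        def blend(table):
            return ((1 - u) * (1 - v) * table[j, i] + u * (1 - v) * table[j, i + 1]
                    + (1 - u) * v * table[j + 1, i] + u * v * table[j + 1, i + 1])

        return blend(world_x), blend(world_y)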

Summary of Mapping-based Approaches

Pros: every cause of error, known or unknown, is corrected, because the mapping captures their combined effect directly.
Cons: a large number of measurements must be collected and stored, and the mapping gives no insight into the individual error sources.

Summary of Camera Calibration

By employing either a model-based or mapping-based approach to camera calibration, most of the image displacement errors caused by elements of interior camera orientation can be removed prior to further processing.

Image Segmentation

The purpose of image segmentation is to split a source image into multiple destination images or Regions of Interest based on certain criteria. For example, it may be beneficial to find a single part out of a bin. For navigation systems, it may prove useful to extract only floor lines from an image.

Algorithm: Region Growing

Start by finding a single set pixel. Examine all the pixels around it; for every neighboring pixel that is also set, examine its neighbors in turn, and so on. This algorithm is not very efficient in terms of computational power, but it does extract regions with no post-processing required.
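A minimal sketch of this idea, assuming a binary image stored as a 2-d boolean NumPy array and 4-connectivity (the function name is illustrative):

    from collections import deque
    import numpy as np

    def grow_region(binary, seed):
        """Return the set of 'set' pixels connected to the seed (4-connectivity).

        binary is a 2-d boolean array; seed is a (row, col) of a set pixel.
        A simple breadth-first flood fill, as described above.
        """
        rows, cols = binary.shape
        region, frontier = set(), deque([seed])
        while frontier:
            r, c = frontier.popleft()
            if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
                continue
            if not binary[r, c]:
                continue
            region.add((r, c))
            frontier.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
        return region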

Algorithm: Edge Detection

Begin by searching for disparities in the image. Once a disparity surpasses a certain minimum size and threshold in luminosity or some other characteristic, it is treated as an edge. After all the edges have been found, look for regions bounded by edges. An image with very well-defined edges is simple to process; however, many real-world images have smoother gradients, making their edges harder to detect. Edge detection is also vulnerable to many types of noise, which will disrupt the detected edges.
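One simple way to realize this (a sketch only; the threshold value and names are assumptions) is to mark pixels where the local intensity gradient exceeds a luminosity threshold:

    import numpy as np

    def edge_mask(gray, threshold=30.0):
        """Mark pixels whose intensity gradient magnitude exceeds a threshold.

        gray is a 2-d array of pixel intensities; threshold is a hypothetical
        luminosity-difference limit. Returns a boolean array of edge pixels.
        """
        gy, gx = np.gradient(gray.astype(float))   # row-wise and column-wise differences
        magnitude = np.hypot(gx, gy)
        return magnitude > threshold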

Algorithm: MultiScaling

MultiScaling is a useful technique whenever multiple scales of an image can be obtained. Many camera image processors can emit a thumbnail along with the main image. This low-resolution thumbnail can be used as a plan for searching the main image. Specifically, any areas of the low-resolution image containing no set pixels likely represent empty or very sparse areas in the full image. If we are willing to sacrifice detection of small blobs for speed, MultiScaling is an efficient approach.
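A small sketch of using a thumbnail as a search plan, assuming each thumbnail pixel summarizes a scale-by-scale block of the full image (names and layout are assumptions):

    import numpy as np

    def candidate_blocks(thumbnail, scale):
        """List full-image blocks worth searching, based on a boolean thumbnail.

        Each set thumbnail pixel maps to a scale-by-scale block of the full
        image; empty thumbnail pixels are skipped, at the cost of possibly
        missing blobs smaller than one thumbnail pixel.
        """
        blocks = []
        for r, c in zip(*np.nonzero(thumbnail)):
            blocks.append((r * scale, c * scale, (r + 1) * scale, (c + 1) * scale))
        return blocks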

Algorithm: Sequential Searches

The goal of a sequential search is to examine each and every pixel once and only once. When a new set pixel is found, it is compared against previously found groups of pixels and inserted into the matching group. If a pixel is found to belong to two groups, the groups must be combined.
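One common way to realize this is a single raster scan with a union-find structure to combine groups; the sketch below assumes a boolean NumPy image and 4-connectivity (the final relabelling loop simply rewrites provisional labels, it does not re-search the image):

    import numpy as np

    def sequential_label(binary):
        """Label connected groups of set pixels with a single raster scan.

        binary is a 2-d boolean array. Each set pixel is visited once and
        compared with its already-visited neighbours (left and above); when a
        pixel bridges two groups, the groups are combined via union-find.
        """
        labels = np.zeros(binary.shape, dtype=int)
        parent = {}                                   # union-find forest of group ids

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]         # path halving
                a = parent[a]
            return a

        next_id = 1
        rows, cols = binary.shape
        for r in range(rows):
            for c in range(cols):
                if not binary[r, c]:
                    continue
                neighbours = [labels[r - 1, c] if r > 0 else 0,
                              labels[r, c - 1] if c > 0 else 0]
                roots = {find(n) for n in neighbours if n}
                if not roots:                         # a brand-new group
                    parent[next_id] = next_id
                    labels[r, c] = next_id
                    next_id += 1
                else:                                 # join (and merge) groups
                    keep = min(roots)
                    labels[r, c] = keep
                    for root in roots:
                        parent[root] = keep
        # Rewrite provisional labels with each group's final representative.
        for r in range(rows):
            for c in range(cols):
                if labels[r, c]:
                    labels[r, c] = find(labels[r, c])
        return labels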
