Computer vision is an interdisciplinary field that deals with how computers can gain high-level understanding from digital images or videos. It seeks to understand and automate tasks that the human visual system can do.
Computer vision tasks include acquiring, processing, analyzing and understanding digital images, and extracting high-dimensional data from the real world. The purpose is to produce numerical or symbolic information, in the form of decisions, for example. 'Understanding' in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This understanding of images may involve models constructed with the aid of geometry, physics, statistics, and learning theory.
The scientific discipline of computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a 3D scanner or a medical scanning device. The technological discipline of computer vision seeks to apply its theories and models to the construction of computer vision systems. With many years of experience in virtually all areas of computer vision, APIXA is your preferred partner for any specific application.
In many applications, valuable information can be extracted from a single image. Typical examples include knitting fault detection, print quality assessment, and microchip inspection. However, image processing is not limited to the inspection of flat objects or 2D scenes. Modern image processing technologies, such as deep learning, facilitate a better understanding of complicated scenes. This enables things like pedestrian detection, scene classification, and complex object segmentation.
Where multiple images of the same scene are available, these views can be combined to extract even more information, such as a 360-degree view of an object, its 3D shape, or even a 3D map of a large environment.
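The core technique behind combining views is triangulation. As a minimal sketch, the example below intersects two viewing rays in 2D; the camera positions and bearing angles are illustrative, and a real system would derive the bearings from calibrated images:

```python
import math

# Sketch: recovering a point's position from two views by triangulation,
# reduced to 2D for brevity. Camera positions and bearing angles are
# illustrative values.

def triangulate(c1, theta1, c2, theta2):
    """Intersect two rays c_i + t_i * (cos theta_i, sin theta_i)."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve t1*d1 - t2*d2 = c2 - c1 with Cramer's rule.
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    bx, by = c2[0] - c1[0], c2[1] - c1[1]
    t1 = (bx * (-d2[1]) - (-d2[0]) * by) / det
    return (c1[0] + t1 * d1[0], c1[1] + t1 * d1[1])

# Two cameras 2 m apart, both observing a point at (1, 1).
point = triangulate((0.0, 0.0), math.pi / 4, (2.0, 0.0), 3 * math.pi / 4)
```

Repeating this for many image features (and many camera positions) is what yields 3D shapes and maps of large environments.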
In situations where motion is important, computer vision technologies can be applied to video data. One typical example is object tracking. While a single image can reveal an object's position and orientation at one point in time, video also allows its speed and acceleration to be estimated. Furthermore, tracking can continue even if the object temporarily disappears from view, which greatly improves reliability.
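As a minimal sketch of the speed and acceleration estimate, the example below applies finite differences to per-frame positions; it assumes a tracker has already produced the positions, and the track and frame rate are illustrative:

```python
# Sketch: estimating speed and acceleration from tracked positions via
# finite differences. Assumes per-frame (x, y) positions from a tracker;
# the track and frame rate below are illustrative.

def finite_differences(positions, fps):
    """Per-frame velocity and acceleration vectors (pixels per second)."""
    dt = 1.0 / fps
    velocities = [
        ((x2 - x1) / dt, (y2 - y1) / dt)
        for (x1, y1), (x2, y2) in zip(positions, positions[1:])
    ]
    accelerations = [
        ((vx2 - vx1) / dt, (vy2 - vy1) / dt)
        for (vx1, vy1), (vx2, vy2) in zip(velocities, velocities[1:])
    ]
    return velocities, accelerations

# Object moving 2 px/frame in x at 50 fps -> 100 px/s, zero acceleration.
track = [(0.0, 10.0), (2.0, 10.0), (4.0, 10.0), (6.0, 10.0)]
vel, acc = finite_differences(track, fps=50)
```

In practice the raw differences would be smoothed (e.g. with a Kalman filter), which is also what allows tracking to coast through short occlusions.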
Object and feature tracking in videos can be a first step in more complex motion analysis applications, such as gesture recognition, visual odometry, or even autonomous driving.
3D point cloud processing
The output of a 3D scanning device or a structure-from-motion algorithm is typically a set of 3D points. Extracting information from point clouds requires specialized algorithms, such as RANSAC (random sample consensus), ICP (iterative closest point), and 3D object segmentation. Applications that process 3D point clouds can provide high-precision measurements of objects, or intricate 3D shape analyses.
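To make the idea concrete, here is an unoptimized pure-Python sketch of RANSAC plane fitting, one of the algorithms mentioned above. The synthetic point cloud (a flat plane plus a few gross outliers) is illustrative:

```python
import random

# Sketch: RANSAC plane fitting on a 3D point cloud. Repeatedly fit a
# plane to 3 random points and keep the plane with the most inliers.

def plane_from_points(p, q, r):
    """Plane through three points as (unit normal n, offset d), n.x + d = 0."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = sum(c * c for c in n) ** 0.5
    if norm == 0:          # degenerate (collinear) sample
        return None
    n = tuple(c / norm for c in n)
    return n, -sum(n[i] * p[i] for i in range(3))

def ransac_plane(points, iterations=200, threshold=0.05, rng=None):
    rng = rng or random.Random(0)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# 30 points on the z = 0 plane plus three gross outliers.
cloud = [(x * 0.1, y * 0.1, 0.0) for x in range(6) for y in range(5)]
cloud += [(0.2, 0.3, 1.0), (0.4, 0.1, -2.0), (0.1, 0.5, 3.0)]
plane, inliers = ransac_plane(cloud)
```

Production point cloud libraries provide tuned versions of this and of ICP, but the consensus-voting principle is the same.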
Simultaneous localization and mapping
When a sequence of images is generated by a moving camera, attached to a drone or an autonomous vehicle for example, SLAM (simultaneous localization and mapping) algorithms can be applied to construct a map of the environment and compute the location of the vehicle at the same time. This technique can be used in applications such as warehouse logistics, autonomous navigation, and drone-based crop inspection.
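The essence of SLAM can be sketched in one dimension: odometry accumulates drift, and a loop-closure observation (the vehicle ends where it started) is used to correct the trajectory. The measurements below are illustrative, and real SLAM systems optimize a full pose graph in 2D or 3D:

```python
# Sketch: drift correction by loop closure, the core idea behind SLAM,
# reduced to 1D. Odometry values below are illustrative.

def integrate(odometry):
    """Dead-reckoned positions from relative odometry measurements."""
    poses = [0.0]
    for step in odometry:
        poses.append(poses[-1] + step)
    return poses

def close_loop(poses):
    """Spread the loop-closure residual evenly over the trajectory."""
    drift = poses[-1] - poses[0]      # should be 0 for a closed loop
    n = len(poses) - 1
    return [p - drift * i / n for i, p in enumerate(poses)]

# True motion: four 1 m steps forward, four back; forward steps over-read by 5%.
odometry = [1.05, 1.05, 1.05, 1.05, -1.0, -1.0, -1.0, -1.0]
corrected = close_loop(integrate(odometry))
```

Evenly spreading the residual is the simplest possible correction; real systems weight each pose by its uncertainty.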
The results of many computer vision technologies can be improved considerably if the cameras are calibrated. Two forms of camera calibration can be distinguished:
- Geometric calibration aims to predict where a point in 3D space will appear in an image produced by the camera. This involves measuring both the internal (focal length, lens distortion, sensor skew, etc.) and external (camera position and orientation) camera parameters. This is particularly important if the camera will be used to measure objects, or if images from multiple cameras need to be combined. The calibration procedure typically requires images of a well-defined high-contrast object such as a checkerboard or a charuco board. If geometric calibration is necessary in your application, APIXA will design the calibration objects and procedures that are most suitable.
- Photometric calibration is needed in applications where the brightness and/or color of an object is critical. The goal is to quantify for each pixel in the image precisely how much light was reflected (or emitted) by the observed object, and (where applicable) also its color. Photometrically calibrated systems can be used in light metrology applications, RGB object classifiers, and where images from multiple cameras need to be combined.
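What geometric calibration delivers can be sketched with the pinhole camera model: given intrinsic parameters (focal length, principal point) and extrinsic parameters (rotation, translation), a 3D point can be projected to a pixel. All parameter values below are illustrative, and lens distortion is omitted for brevity:

```python
# Sketch: pinhole projection using calibrated camera parameters.
# Parameter values are illustrative; distortion is omitted.

def project(point, fx, fy, cx, cy, rotation, translation):
    """Project a 3D world point to pixel coordinates (u, v)."""
    # World -> camera coordinates: X_c = R * X_w + t
    xc = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
          for r in range(3)]
    # Perspective division, then intrinsic mapping to pixels.
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v

# Identity rotation, camera 2 m from the origin, 800 px focal length,
# principal point at the center of a 1280x720 image.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 2.0]
u, v = project([0.5, 0.25, 0.0], 800, 800, 640, 360, R, t)
```

Calibration is the inverse problem: given many known 3D points (e.g. checkerboard corners) and their observed pixels, solve for the parameters of this model.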
In applications where differences in shape and color are not enough to distinguish objects or materials, hyperspectral imaging may provide a solution. This requires a special type of camera: a hyperspectral camera, which captures not only the light that is reflected from each part of the object, but also its spectrum. Just like a color camera can make a distinction between regions of the same brightness by looking at color differences, a hyperspectral camera can distinguish regions of the same color and brightness by looking at differences in the spectrum. Spectral differences are often caused by differences in chemical composition. This technology is therefore ideally suited to problems where different materials of the same color need to be separated, or where (bio)chemical differences are important. Typical examples include waste sorting, inspection of geological samples, and early disease detection in crops.
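A common way to exploit these spectral differences is to compare each pixel's spectrum to reference spectra using the spectral angle, which is insensitive to overall brightness. The 4-band spectra below are illustrative; real hyperspectral cameras record dozens to hundreds of bands per pixel:

```python
import math

# Sketch: classifying a pixel by spectral angle against reference
# spectra. The 4-band material spectra below are illustrative.

def spectral_angle(a, b):
    """Angle between two spectra; invariant to overall brightness."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def classify(pixel, references):
    """Return the material whose reference spectrum is closest in angle."""
    return min(references, key=lambda m: spectral_angle(pixel, references[m]))

# Two materials of similar brightness but different spectral shape.
references = {
    "PET plastic": [0.2, 0.6, 0.5, 0.1],
    "PVC plastic": [0.2, 0.3, 0.6, 0.4],
}
# A brighter pixel with a PET-like spectral shape.
material = classify([0.4, 1.1, 1.0, 0.3], references)
```

Because the angle ignores brightness, the same material is recognized under varying illumination, which is exactly what applications like waste sorting need.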