Jeremy Cook | Arrow
The short answer to the question, “How do robots see?” is via machine vision or industrial vision systems. The details are much more involved. In this article, we’ll frame the question around physical robots that accomplish a real-world task, rather than software-only applications used for filtering visual materials on the internet.
Machine vision systems capture images with a digital camera (or multiple cameras) and process that data on a frame-by-frame basis. The robot then uses the interpreted data to interact with the physical world, whether through a robotic arm, a mobile agricultural system, an automated security setup, or any number of other applications.
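At its core, that frame-by-frame loop is simple. Here is a minimal sketch using OpenCV's Python bindings; it assumes a webcam at index 0, and the actual interpretation and actuation steps are left as placeholders:

```python
# Minimal frame-by-frame machine vision loop with OpenCV.
# Assumes a webcam at index 0; a real robot would swap in an
# industrial camera driver and an actuator interface.
import cv2

cap = cv2.VideoCapture(0)  # open the default camera
while True:
    ok, frame = cap.read()        # grab one frame
    if not ok:
        break                     # camera unplugged or stream ended
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # ...interpret the frame here, then drive the robot accordingly...
    cv2.imshow("robot view", gray)
    if cv2.waitKey(1) == 27:      # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```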
Computer vision became prominent in the latter part of the twentieth century, using a range of hard-coded criteria to determine simple facts about captured visual data. Text recognition is one such basic application. Checking for the presence of component x, or verifying the size of hole y, in an industrial assembly application are others. Today, computer vision applications have expanded dramatically by incorporating AI and machine learning.
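A typical hard-coded check of that era can be only a few lines long. The sketch below flags whether a drilled hole's area falls inside a fixed pixel tolerance; the image file and the tolerance band are hypothetical placeholders:

```python
# Hedged sketch of a classic rules-based inspection check:
# is a drilled hole's area within a hard-coded tolerance?
# part.png and the pixel-area limits are made-up placeholders.
import cv2

img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)
# Dark hole on a bright part becomes a white blob in the mask
_, mask = cv2.threshold(img, 60, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

MIN_AREA, MAX_AREA = 900, 1100  # acceptable hole size in pixels (assumed)
areas = [cv2.contourArea(c) for c in contours]
passed = any(MIN_AREA <= a <= MAX_AREA for a in areas)
print("PASS" if passed else "FAIL", areas)
```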
Importance of machine vision
While vision systems based on specific criteria are still in use, machine vision is now capable of much more, thanks to AI-based processing. In this paradigm, robot vision systems are no longer programmed explicitly to recognize conditions like a collection of pixels (a so-called “blob”) in the correct position. Instead, a robot vision system can be trained with a dataset of good and bad parts, conditions, or scenarios, allowing it to generate its own rules. So equipped, it can manage tasks like unlocking a door for humans but not animals, watering plants that look dry, or moving an autonomous vehicle when the stoplight is green.
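As a small-scale illustration of the train-on-examples approach, the sketch below fits an SVM classifier to HOG features using OpenCV's built-in ml module. The good/ and bad/ image folders and the file names are hypothetical placeholders; a production system would typically use a deep learning framework and far more data:

```python
# Sketch: let the system derive its own rules from labeled examples,
# here with HOG features and an SVM from OpenCV's ml module.
# The good/ and bad/ folders and unknown_part.png are placeholders.
import cv2
import numpy as np
from glob import glob

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def features(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 128))      # match the HOG window
    return hog.compute(img).flatten()

good = glob("good/*.png")
bad = glob("bad/*.png")
X = np.array([features(p) for p in good + bad], dtype=np.float32)
y = np.array([1] * len(good) + [0] * len(bad), dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)

# Classify a new part: 1 = good, 0 = bad
_, result = svm.predict(features("unknown_part.png").reshape(1, -1))
print(result)
```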
While we can use cloud-based computing to train an AI model, edge processing is typically preferable for real-time decision-making. Processing robotic vision tasks locally reduces latency and removes your dependence on cloud infrastructure for critical tasks. Autonomous vehicles provide a great example of why this matters, as a half-second machine vision delay can lead to an accident. Additionally, no one wants their vehicle to stop working when network resources are unavailable.
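If you want to know whether your own pipeline fits a real-time budget, a simple timing harness will tell you. In this sketch, Canny edge detection stands in for the interpretation step, and the 50 ms per-frame budget is an assumed figure:

```python
# Check per-frame processing latency against a real-time budget.
# Canny edge detection is only a placeholder interpretation step,
# and the 50 ms budget is an assumption for illustration.
import time
import cv2

cap = cv2.VideoCapture(0)
for _ in range(200):                      # sample a couple hundred frames
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)     # placeholder interpretation step
    dt = time.perf_counter() - t0
    if dt > 0.050:                        # assumed 50 ms budget
        print(f"frame took {dt * 1000:.1f} ms - over budget")
cap.release()
```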
Cutting-edge robotic vision technologies: multi-camera, 3D, AI techniques
While one camera allows the capture of 2D visual information, two cameras working together enable depth perception. For example, the NXP i.MX 8 family of processors can take stereo input from two cameras at 1080p resolution. With the proper hardware, multiple cameras and camera systems can be integrated via video stitching and other techniques. Other sensor types, such as LIDAR, IMUs, and sound, can be incorporated, giving a picture of a robot’s surroundings in 3D space and beyond.
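OpenCV makes basic stereo depth easy to experiment with. The following sketch computes a disparity map using the classic block-matching algorithm; left.png and right.png are placeholders for an already-rectified stereo pair:

```python
# Hedged sketch of two-camera depth perception with OpenCV's
# block-matching stereo algorithm. left.png and right.png are
# placeholders for a rectified stereo pair.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)   # larger values = closer objects

# Scale the raw disparity into a viewable 8-bit image
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("disparity.png", vis.astype("uint8"))
```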
The same class of technology that allows a robot to interpret captured images also allows a computer to generate new images and 3D models. One application of combining these two sides of the robotics vision coin is the field of augmented reality. Here, the visual camera and other inputs are interpreted, and the results are displayed for human consumption.
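In its simplest form, that interpret-then-display loop looks like the sketch below, which uses one of OpenCV's bundled face detectors as a stand-in for the interpretation step and draws the results back onto the live feed:

```python
# Sketch of the augmented-reality idea: interpret the camera feed,
# then overlay the results for human consumption. A bundled Haar
# cascade face detector stands in for "interpretation" here.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, "person", (x, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("augmented view", frame)
    if cv2.waitKey(1) == 27:              # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```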
How to get started with machine vision
We now have a wide range of options for getting started with machine vision. From a software standpoint, OpenCV is a great place to start: it is available for free, and it works with rules-based machine vision as well as newer deep learning models. You can get started with just your computer and a webcam, but specialized hardware like the Jetson Nano Developer Kit or the Google Coral line of products is well suited to vision and machine learning. The NVIDIA Jetson Orin NX 16GB offers 100 TOPS of AI performance in the familiar Jetson form factor.
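On the deep learning side, OpenCV's dnn module can run trained networks directly. The sketch below loads an ONNX classifier; model.onnx and test.jpg are placeholders, and the 224x224 input size and scaling are assumptions that depend on the model you export:

```python
# Sketch of running a deep learning model through OpenCV's dnn module.
# model.onnx and test.jpg are placeholders; the 224x224 input size
# and 1/255 scaling are assumptions that depend on your model.
import cv2

net = cv2.dnn.readNetFromONNX("model.onnx")
img = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0,
                             size=(224, 224), swapRB=True)
net.setInput(blob)
scores = net.forward()
print("predicted class:", scores.argmax())
```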
Companies like NVIDIA have a range of software assets available, including training datasets. If you would like to implement an AI application but would rather not source the needed pictures of people, cars, or other objects, these can give you a massive head start. Expect such datasets to keep improving, with cutting-edge AI techniques like attention and vision transformers enhancing how we use them.
Robot vision algorithms
Robots see via the constant interpretation of a stream of images, processing that data with human-coded algorithms or an AI-generated ruleset. Of course, on a philosophical level, one might flip the question and ask, “How do robots see themselves?” Given our ability to peer inside the code—as convoluted as an AI model may be—it could be a more straightforward question than how we see ourselves!