
An Introduction to Computer Vision

For humans, vision is a primary sense that enables us to perceive our world – we use sight to navigate, identify threats and interpret behaviour. The eyes are by far our most important sense organ; while many other species rely heavily on smell to gather information, humans are often estimated to take in as much as 80% of all sensory impressions through sight.

Computer vision aims to give machines the ability to see. This field has become increasingly important as our expectations for modern machines rise. If we want self-driving cars, industrial pick-and-place robots, and lifelike assistants that converse in natural speech, then we need to build machines with the same visual capacities that humans enjoy.

So, how do we enable machines to see? At their most basic, computer vision systems analyze each pixel in an image to determine whether a given feature is present. This process is called feature extraction. Generally, approaches to feature extraction fall under two broad categories: model-driven and data-driven methods.
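As a minimal sketch of pixel-level feature extraction, the hypothetical example below computes an edge-strength map from intensity gradients; edges are one of the simplest features a vision system can look for. The tiny image and the gradient scheme are illustrative choices, not a production method:

```python
import numpy as np

def edge_strength(image: np.ndarray) -> np.ndarray:
    """Per-pixel edge strength from horizontal and vertical intensity gradients."""
    gx = np.zeros_like(image, dtype=float)
    gy = np.zeros_like(image, dtype=float)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]  # horizontal central difference
    gy[1:-1, :] = image[2:, :] - image[:-2, :]  # vertical central difference
    return np.hypot(gx, gy)  # gradient magnitude at each pixel

# A tiny image: dark left half, bright right half, so there is a vertical edge
img = np.array([[0, 0, 10, 10],
                [0, 0, 10, 10],
                [0, 0, 10, 10]], dtype=float)
edges = edge_strength(img)  # strongest response along the middle columns
```

A classifier could then ask, for every pixel, whether the edge strength exceeds a threshold – a simple yes/no answer to "is this feature present here?".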

Traditional computer vision techniques are model-driven and involve hand-coding features one at a time. However, the rise of deep learning in the last decade has prompted a shift towards data-driven methodologies. In the following sections, we’ll describe the strengths and weaknesses of each approach and indicate the path forward for modern computer vision systems.

Model-Driven Approaches to Computer Vision

Traditional model-driven algorithms look for features that have been hand-coded by an expert engineer, and these feature definitions may contain many parameters. The idea is to identify all the features that define one class of object, and then use this set of features as a definition of the object. We can then use that definition to search for the object in other images.

For example, if we want to identify images that contain dogs, we would first identify all the features that dogs have in common such as fur or ears. We would also need to think of all the features that distinguish dogs from cats or horses. Once we specify which features define dogs, we can look for them in images.
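In this model-driven spirit, a hand-coded classifier is just a set of rules applied to pre-extracted features. The feature names and thresholds below are hypothetical illustrations, not a real dog detector:

```python
def looks_like_dog(features: dict) -> bool:
    """Apply hand-written rules over pre-extracted features.

    Every rule and threshold here was chosen by a human, which is
    exactly what makes the model-driven approach labour-intensive.
    """
    return (
        features.get("has_fur", False)
        and features.get("ear_count", 0) == 2
        and 0.2 <= features.get("snout_length_ratio", 0.0) <= 0.6
    )

dog_candidate = {"has_fur": True, "ear_count": 2, "snout_length_ratio": 0.4}
cat_candidate = {"has_fur": True, "ear_count": 2, "snout_length_ratio": 0.9}
```

Note that distinguishing dogs from cats forces the engineer to invent ever more rules and exceptions, which is precisely the weakness discussed next.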

Traditional model-driven approaches to computer vision are powerful because they rely on a strong understanding of the system and don’t require a large dataset to implement. However, making an exhaustive list of all the rules, exceptions and scenarios you need to accurately identify an object is hard and time-consuming.

To identify dogs, for example, we would need to hand-code the features of each breed, including size, shape and fur characteristics. It sounds overwhelming, but the advent of artificial neural networks turned this traditional approach on its head – today’s computer vision systems approach, and on some benchmarks even exceed, human-level accuracy.

Data-Driven Approaches to Computer Vision

Neural networks are algorithms that are loosely modelled on the structure of the human brain. They consist of many simple processing nodes – often millions in modern networks – that are organized in layers and deeply interconnected. To train a network, the connections between nodes are initialized to random weights, and data is passed through the network in one direction. The network’s output is then compared to the ground truth, and the weights are adjusted to close the gap between the two. Through this process, the network ‘learns’ to correctly label input data.
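The training loop described above can be sketched in a few lines. The example below trains a single-layer network (logistic regression, the simplest case) on a toy problem; the data, learning rate and iteration count are arbitrary choices for illustration, not a recipe for real images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 4-pixel inputs, labelled 1 if their mean intensity exceeds 0.5
X = rng.random((200, 4))
y = (X.mean(axis=1) > 0.5).astype(float)

# Initialize the weights randomly, as described above
w = rng.normal(scale=0.1, size=4)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(2000):
    pred = sigmoid(X @ w + b)         # forward pass: data flows one direction
    err = pred - y                    # gap between output and ground truth
    w -= lr * (X.T @ err) / len(X)    # adjust weights to close the gap
    b -= lr * err.mean()

accuracy = ((sigmoid(X @ w + b) > 0.5).astype(float) == y).mean()
```

Each pass through the loop nudges the weights in the direction that shrinks the error, which is the sense in which the network ‘learns’ from its mistakes.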

This method introduced the concept of end-to-end learning, where the machine works out the most descriptive features for each object definition on its own. With neural networks, you don’t have to manually decide which features are important – the machine does the work for you.

If you want to teach a neural network to recognize a dog, you don’t tell it to look for fur or a tail. Instead, you show it thousands of images containing dogs, and eventually, the network learns by example what a dog looks like. This process is called training and requires a human supervisor. If your network misclassifies cats as dogs, you simply label more training images and feed those to the network until its prediction accuracy improves.

Modern neural networks independently uncover patterns in an image by learning from labelled training images.

While adding complexity to a traditional computer vision model requires additional code, only the data and annotations change when amending a neural network – the framework remains the same. For this reason, deep neural networks are considered a data-driven approach to computer vision.

Which Approach to Use

There are clear trade-offs between model- and data-driven computer vision systems. With a data-driven approach, engineers can feed raw pixels to a deep neural network and classify objects of interest with higher accuracy and lower overhead than traditional computer vision systems allow. Data-driven systems are also far more versatile and can more readily accommodate complexity.

But since deep learning algorithms discover features by learning from examples, they need large data sets to achieve accurate results. To match the performance of a well-designed traditional computer vision algorithm, a neural network needs enough labelled training data to cover the full range of base cases and expected variations.

At Motion Metrics, we use deep neural networks to detect missing teeth on mining shovels. These networks are trained to account for occlusion and handle expected variations in lighting, pose, etc.

By comparison, traditional approaches are mature and proven, and they don’t need high-end hardware to get the job done. These systems are also fully transparent, whereas a deep neural network is a black box containing millions of parameters that are tuned during training.

Although deep neural networks are fabulous tools that have rapidly advanced the field of computer vision, they are not a panacea. Whether a computer vision problem is best solved with a deep neural network architecture or a more traditional algorithm depends on several factors, including your access to data, hardware and team resources.

Deep neural networks are better at handling large datasets with high dimensionality, but problems with limited expected variation are often best solved with traditional computer vision techniques that don’t consume excessive computing resources. In many cases, a hybrid approach can offer the best of both worlds through higher performance and better computing efficiency.
