Embedded Computer Vision

Video Bytes

Current vision computing systems are often constrained by low-level processing of image data. For example, many object detection algorithms rely on a form of feature detection (e.g., Haar features, points, edges, corners) that consumes the majority of the processing power. We believe that offloading the low-level processing onto an easy-to-use, dedicated hardware system will enable a revolution in the development of vision processing algorithms.

We are exploring how hardware pre-computation of low-level image features will ease the software processing requirements of vision algorithms, and hence increase their ability to correctly perform the task at hand. This is achieved through real-time hardware processing of the video stream using a variety of image processing techniques. Then, instead of sending the raw image, we transmit video bytes – asynchronous, pre-processed, spatiotemporal pieces of information derived from a video stream. By sending only video bytes, we achieve a significant reduction in data volume and free the CPU to focus only on the essential image information.

The figure shows how four video bytes are used to detect faces. The initial video is streamed into a skin classifier, which detects skin-colored pixels and turns the remaining pixels black. Morphological opening removes noise, and the image is converted to grayscale and fed to a face classification engine. The detected face locations are shown with green boxes in the final image. We developed a hardware acceleration system that accurately detects faces at 120 frames/sec, over 150 times faster than software.
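The pre-processing stages of this pipeline can be sketched in software. The following is a minimal, illustrative sketch (not the actual hardware design or its learned descriptors): a simple YCbCr box rule stands in for the skin classifier, and the morphological opening uses a cross-shaped 3x3 structuring element. All threshold values are assumptions for illustration.

```python
import numpy as np

def skin_mask(rgb):
    """Classify skin-colored pixels with a simple YCbCr box rule.
    Threshold values are illustrative, not the system's descriptors."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    cb = 128 - 0.169 * r - 0.331 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.419 * g - 0.081 * b
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

def erode(mask):
    """Erosion with a cross-shaped (4-neighborhood) structuring element:
    a pixel survives only if it and its 4 neighbors are all set."""
    out = mask.copy()
    out[1:, :] &= mask[:-1, :]
    out[:-1, :] &= mask[1:, :]
    out[:, 1:] &= mask[:, :-1]
    out[:, :-1] &= mask[:, 1:]
    return out

def dilate(mask):
    """Dilation with the same cross-shaped structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def preprocess(rgb):
    """Skin classification -> morphological opening -> grayscale,
    with non-skin pixels turned black."""
    mask = dilate(erode(skin_mask(rgb)))  # opening removes isolated noise
    gray = rgb.mean(axis=2)
    return np.where(mask, gray, 0.0)
```

The grayscale output would then be handed to the face classification engine; only the skin regions carry non-zero data, which is what makes the downstream stage cheap.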

Object Detection

Object detection is the act of identifying objects of interest in images regardless of size, position and circumstance. A successful algorithm will find the locations and sizes of all objects in the image stream that belong to a given class without producing false positives. We have developed a prototype hardware system for object detection using Haar classifiers. Our prototype system employs the Viola-Jones algorithm, similar to the one found in the OpenCV library. This algorithm requires considerable computational power due to the sheer number of Haar features that must be evaluated to detect an object. For example, the detection of a face requires the computation of over 2000 features, which are typically computed over a window of approximately 20 by 20 pixels. And while every window does not typically require the computation of all 2000 features – the detection is performed in stages, and windows that do not resemble a face are designed to be rejected quickly – this is still a substantial amount of computation. Therefore, this constitutes a bottleneck for real-time object detection. Our system can perform face detection at approximately 120 frames/sec, which is over 150 times faster than the equivalent software implementation.
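The reason each Haar feature is cheap individually – and the reason the sheer count still dominates – is the integral image: once it is built, any rectangle sum costs four lookups. A minimal sketch of that core computation (a two-rectangle feature only; the full Viola-Jones cascade and its trained thresholds are not shown):

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero border: ii[y, x] = sum of gray[:y, :x]."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1))
    ii[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum over any h-by-w rectangle in O(1): four integral-image lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle (left/right) Haar feature: left-half sum minus
    right-half sum. Real detectors evaluate thousands of such features
    per window, arranged in early-rejecting cascade stages."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

In hardware, many of these four-lookup evaluations can run in parallel, which is where the 150x speedup over the sequential software implementation comes from.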

Multi-view object detection identifies an object even if it is rotated. This is important in applications where the object does not always occur at the same orientation. We developed a multi-view face detection hardware system. It generates rotated image windows and their integral image windows for each classifier; the classifiers perform parallel classification operations to detect non-upright (rotated) and non-frontal (profile) faces in the images. Our multi-view face detection hardware system is capable of processing images at speeds over 14 frames/sec, which is approximately 10x faster than the equivalent software implementation.
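The rotate-then-classify structure can be sketched in a few lines. This is an illustrative software analogue, not the hardware design: it uses only 90-degree rotations (`np.rot90`), whereas the hardware window generator also covers intermediate in-plane angles, and `classifiers` is a hypothetical list of callables standing in for the parallel classifier engines.

```python
import numpy as np

def rotated_views(window):
    """Generate rotated copies of an image window for the classifiers.
    Only 90-degree steps here; the hardware also produces intermediate
    in-plane rotations."""
    return [np.rot90(window, k) for k in range(4)]

def detect_multi_view(window, classifiers):
    """Run every (orientation, classifier) pairing; report a detection
    if any combination fires. Each classifier is a callable returning
    True when its target pose appears upright in the given view."""
    return any(clf(view) for view in rotated_views(window) for clf in classifiers)
```

In the hardware system these orientation/classifier combinations run concurrently rather than in a loop, which is what keeps the multi-view search real-time.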

We have shared a portion of this code at the ERCBench website.

An example video shows the system functioning in real time.

Color Classification System

Color classification is an important feature in image processing, as it allows for fast processing and robustness to geometric variations; however, it requires a high frame processing rate and low latency to provide quick decisions. Color classification algorithms use a set of descriptors to discriminate between colors of interest. Once we have these descriptors, we look at each pixel and determine whether it lies within the color space(s) of interest. The major difficulty in color classification is accurately formulating the descriptors. We have an automated method to create these descriptors using the AdaBoost machine learning technique, and can automatically translate these descriptors into a functional hardware system. Our color classification hardware system is capable of processing an image at over 230 frames/sec, while its equivalent software implementation runs at 0.5 frames/sec, i.e., the hardware is over 460 times faster than the software.
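A boosted color classifier of this kind reduces to a weighted vote of simple threshold rules, which is what makes it so amenable to hardware translation. The sketch below shows the standard AdaBoost decision form on per-pixel channel thresholds; the specific rules and weights are made-up illustrations, not the descriptors our method learns.

```python
import numpy as np

# Weak learners in the form AdaBoost-style training produces:
# (channel, threshold, polarity, weight). Values are illustrative only.
DESCRIPTORS = [
    (0, 150, +1, 0.9),   # red channel above 150 votes "target color"
    (1, 100, -1, 0.6),   # green channel below 100 votes "target color"
    (2, 100, -1, 0.5),   # blue channel below 100 votes "target color"
]

def classify_pixels(rgb):
    """Per-pixel classification: accumulate weighted weak-learner votes
    and compare against half the total weight (standard AdaBoost rule)."""
    votes = np.zeros(rgb.shape[:2])
    for chan, thresh, polarity, weight in DESCRIPTORS:
        if polarity > 0:
            fires = rgb[..., chan] > thresh
        else:
            fires = rgb[..., chan] < thresh
        votes += weight * fires
    total = sum(w for _, _, _, w in DESCRIPTORS)
    return votes >= 0.5 * total
```

Because every rule is an independent compare-and-accumulate, the hardware can evaluate all descriptors for a pixel in a single cycle, which is the source of the 460x speedup over the sequential software loop.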

Relevant Publications

Jung Uk Cho, Shahnam Mirzaei, Jason Oberg and Ryan Kastner, “FPGA-Based Face Detection System Using Haar Classifiers”, International Symposium on Field Programmable Gate Arrays (FPGA), February 2009 (pdf)

Jung Uk Cho, Bridget Benson, Shahnam Mirzaei and Ryan Kastner, “Parallelized Architecture of Multiple Classifiers for Face Detection“, IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), July 2009 (pdf)

Jung Uk Cho, Bridget Benson and Ryan Kastner, “Hardware Acceleration of Multi-view Face Detection“, IEEE Symposium on Application Specific Processors (SASP), July 2009 (pdf)

Daniel Hefenbrock, Jason Oberg, Nhat Tan Nguyen Thanh, Ryan Kastner and Scott Baden, “Accelerating Viola-Jones Face Detection to FPGA-Level using GPUs“, IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2010

Jung Uk Cho, Bridget Benson, Sunsern Cheamanukul and Ryan Kastner, “Increased Performance of FPGA-Based Color Classification System“, IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2010 (pdf)