Real-time generalized convolutional neural network for point cloud object tracking

Date
2020-01-01
Authors
Garrett, Timothy
Advisor
Rafael Radkowski
Department
Electrical and Computer Engineering
Abstract

In recent years, Convolutional Neural Networks (CNNs) have been widely successful in numerous computer vision tasks using 2D color and 3D point data, such as object detection and classification, and 3D pose estimation. These successes come at a price: CNNs require a vast amount of labeled training data. Many training datasets are now available; however, they are mostly limited to general objects in the public domain, such as people, cars, buildings, and fruit. Thus, not all fields have sufficient training data. Even if an adequate number of data samples can be built into a dataset, manually collecting and labeling the training data remains a laborious task.

Researchers have proposed several approaches to reduce or eliminate the need for training data. The two most popular are training with synthetic data and training on generalized object features. Training with synthetic data refers to the use of computer graphics to generate training data. While this approach has demonstrated success, it comes with its own challenges. In contrast, training on generalized object features teaches a CNN to identify specific local 3D features or control points on objects. Local 3D features and their descriptors have proven to be universally applicable and effective for general object detection, and thus this method allows for a generalized CNN.

A generalized CNN refers to a network architecture that, after initial training, can be specialized for a new task with minimal or no new training data. This dissertation investigates an encoder-decoder architecture and determines its capability to generalize the CNN architecture for object detection tasks in point cloud data. A CNN is trained to recognize 3D features and, without retraining, is shown to apply successfully to point cloud object detection on disparate data. Additionally, the geometric consistency of the descriptors is evaluated to obtain further insight. The results demonstrate that the proposed 3D feature descriptors and their increased geometric consistency contribute to the improved performance of the encoder-decoder architecture. In summary, this dissertation contributes a state-of-the-art, generalized, feature descriptor-based CNN architecture that can be transferred to different objects without retraining. Furthermore, it provides insight explaining the increased performance of the CNN compared to the state of the art.

Copyright
2020-05-01