Real-time generalized convolutional neural network for point cloud object tracking

Garrett, Timothy

File

Garrett_iastate_0097E_18681.pdf (143 MB)

Date

2020-01-01

Authors

Garrett, Timothy

Advisor

Rafael Radkowski

Department

Electrical and Computer Engineering

Abstract

In recent years, Convolutional Neural Networks (CNNs) have been widely successful in numerous computer vision tasks using 2D color and 3D point data, such as object detection and classification and 3D pose estimation. These successes come at a price: CNNs require a vast amount of labeled training data. Many training datasets are now available; however, they are mostly limited to general objects in the public domain, such as people, cars, buildings, and fruit. Thus, not all fields have sufficient training data. Even when enough data samples can be gathered into a dataset, manually collecting and labeling the training data remains a laborious task.

Researchers have proposed several approaches to reduce or eliminate the need for training data. The two most popular are training with synthetic data and training on generalized object features. Training with synthetic data refers to the use of computer graphics to generate training data. While this approach has demonstrated success, it comes with its own challenges. In contrast, training on generalized object features teaches a CNN to identify specific local 3D features or control points on objects. Local 3D features and their descriptors have proven to be universally applicable and effective for general object detection, and thus this method allows for a generalized CNN.

A generalized CNN refers to a network architecture that, after initial training, can be specialized for a new task with minimal or no new training data. This dissertation investigates an encoder-decoder architecture and determines its capability to generalize the CNN architecture for object detection tasks in point cloud data. A CNN is trained to recognize 3D features and, without retraining, is successfully applied to point cloud object detection using disparate data. Additionally, the geometric consistency of the descriptors is evaluated to obtain further insight. The results demonstrate that the proposed 3D feature descriptors and their increased geometric consistency contribute to the increased performance of the encoder-decoder architecture. In summary, this dissertation contributes a state-of-the-art, generalized, feature descriptor-based CNN architecture that can be transferred to different objects without retraining. Furthermore, it provides insight explaining the increased performance of the CNN compared to the state of the art.

Copyright

2020-05-01

Collections

Theses and Dissertations
