Degree Type

Thesis

Date of Award

2020

Degree Name

Doctor of Philosophy

Department

Electrical and Computer Engineering

Major

Human Computer Interaction and Computer Engineering (Software Systems)

First Advisor

Rafael Radkowski

Abstract

In recent years, Convolutional Neural Networks (CNNs) have been widely successful for numerous computer vision tasks using 2D color and 3D point data, such as object detection and classification, and 3D pose estimation. These successes come at a price: CNNs require a vast amount of labeled training data to be successful. Many training datasets are now available; however, they are mostly limited to general objects in the public domain such as people, cars, buildings, fruit, and more. Thus, not all fields have sufficient training data. Even if an adequate number of data samples can be built into a dataset, manually collecting and labeling the training data remains a laborious task.

Researchers have proposed several approaches to reduce or eliminate the need for training data. The two most popular approaches are training with synthetic data and training on generalized object features. Training with synthetic data refers to the use of computer graphics to generate training data. While this approach has demonstrated success, it comes with its own challenges. In contrast, training a CNN on generalized object features identifies specific local 3D features or control points on objects. Local 3D features and their descriptors have proven in the past to be universally applicable and effective for general object detection, and thus, this method allows for a generalized CNN.

A generalized CNN refers to a network architecture that, after initial training, can be specialized for a new task with minimal or no new training data. This dissertation investigates an encoder-decoder architecture and determines its capability to generalize the CNN architecture for object detection tasks in point cloud data. A CNN is trained to recognize 3D features and, without retraining, is shown to be successfully applicable to point cloud object detection using disparate data. Additionally, the geometric consistency of the descriptors is evaluated to obtain further insight. The results demonstrate that the proposed 3D feature descriptors and their increased geometric consistency contribute to the improved performance of the encoder-decoder architecture. In summary, this dissertation contributes a state-of-the-art, generalized, feature descriptor-based CNN architecture that can be transferred to different objects without retraining. Furthermore, it provides insight explaining the increased performance of the CNN compared to the state-of-the-art.

DOI

https://doi.org/10.31274/etd-20200624-88

Copyright Owner

Timothy Garrett

Language

en

File Format

application/pdf

File Size

137 pages

Available for download on Tuesday, June 15, 2021
