Topics in model heterogeneity study and decentralized consensus learning over networks

Date
2021-01-01
Authors
Zhang, Xin
Major Professor
Zhengyuan Zhu
Department
Statistics
Abstract

This dissertation focuses on developing novel methods for studying model heterogeneity and efficient consensus learning algorithms for analyzing data over spatial networks. On the one hand, recent developments in remote sensing technology have made it possible to collect complex and massive spatial data sets. These data motivate us to explore model heterogeneity (i.e., to find model-based spatial clusters) over the spatial domain or network. In Chapter 2, we first focus on the linear regression model and study the scenario where the number of locations is fixed and the local models are identifiable. We propose an adaptive spanning tree-based fusion lasso approach, which simultaneously estimates the models and clusters the data sets over the spatial network. We show that simplifying the complex network topology to a tree structure significantly improves both estimation and computational efficiency. In Chapter 3, we extend the study to a spatial partial linear model and consider the case where each location has only one observation. In our model, a nonparametric intercept is adopted to absorb the spatial random effect and is estimated by bivariate splines over triangulation. For the coefficient clustering part, we propose a novel forest lasso penalty, in which an adaptive clustering tree structure is constructed by averaging a multitude of initial random spanning trees. We show that the proposed fusion penalty improves estimation accuracy at a limited computational cost.
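As a schematic illustration only (the notation below is ours and is not taken from the dissertation), a spanning tree-based fusion penalty for location-specific coefficients can be written as

\[
\min_{\beta_1,\dots,\beta_n}\; \sum_{i=1}^{n} \left\| y_i - X_i \beta_i \right\|_2^2 \;+\; \lambda \sum_{(i,j)\in\mathcal{T}} w_{ij}\, \left\| \beta_i - \beta_j \right\|_2,
\]

where \(\mathcal{T}\) is the edge set of a spanning tree over the spatial network and \(w_{ij}\) are adaptive weights; edges whose coefficient differences are shrunk exactly to zero merge neighboring locations into a common cluster. In this reading, the forest lasso of Chapter 3 replaces the single tree \(\mathcal{T}\) with an average over many random spanning trees.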

On the other hand, the massive volume of distributed data sets makes centralized analysis difficult. This motivates us to develop distributed learning methods that allow the computation to be carried out within the network. Although various network consensus learning algorithms have been proposed, existing methods remain unsatisfactory in terms of communication efficiency and data privacy. In Chapters 4 and 5, we focus on improving the communication performance of network consensus learning methods. Inspired by widely adopted compression techniques, we propose two differential-coded decentralized gradient descent algorithms, in which sparsified or quantized messages are communicated among the computation nodes to reduce communication cost. In Chapter 6, to address the privacy concern, we design a privacy-preserving decentralized method under the framework of differential privacy: a sparse differential Gaussian-masking decentralized stochastic gradient descent algorithm. We show that combining the Gaussian mechanism with sparsification yields a stronger privacy guarantee. Thorough numerical experiments verify the performance of our algorithms.
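As a minimal sketch of the kind of difference-compressed decentralized update described above (Python; all function and variable names, parameters, and the optional Gaussian mask are our own illustrative assumptions, not the dissertation's algorithms):

import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_decentralized_gd(grads, W, x0, steps=200, lr=0.05, k=5, sigma=0.0):
    """Difference-compressed decentralized gradient descent (illustrative).

    grads : list of callables; grads[i](x) returns node i's local gradient
    W     : (n, n) doubly stochastic mixing matrix of the network
    x0    : (n, d) initial local models, one row per node
    sigma : std. dev. of an optional Gaussian mask added before
            sparsification (a rough stand-in for the privacy mechanism)
    """
    n, d = x0.shape
    x = x0.copy()              # local models
    x_hat = np.zeros_like(x0)  # publicly shared estimates of the models
    rng = np.random.default_rng(0)
    for _ in range(steps):
        # Each node transmits only a sparsified (optionally noised)
        # difference between its model and its shared estimate.
        diff = x - x_hat + sigma * rng.standard_normal((n, d))
        q = np.stack([top_k(diff[i], k) for i in range(n)])
        x_hat = x_hat + q      # all nodes update the shared copies identically
        # Consensus correction on shared estimates plus a local gradient step.
        gossip = W @ x_hat - x_hat
        grad = np.stack([grads[i](x[i]) for i in range(n)])
        x = x + gossip - lr * grad
    return x

Transmitting only the compressed difference between each node's model and its publicly shared estimate keeps per-round messages small, while the consensus correction lets the shared estimates track the local models over the iterations.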

Copyright
May 2021