Time-varying optimization using primal-dual type methods: Solution tracking and model training

Date
2019-01-01
Authors
Zhang, Yijian
Major Professor
Mingyi Hong
Lizhi Wang
Department
Industrial and Manufacturing Systems Engineering
Abstract

This dissertation considers general time-varying optimization problems that arise in many network control and machine learning applications. The problem takes the form of minimizing a sum of separable convex/non-convex cost functions subject to equality constraints, where both the objective and the constraints are parameterized by time-varying parameters. To cope with the problem dynamics, we design dynamic/stochastic algorithms based on primal-dual type methods, e.g., the alternating direction method of multipliers (ADMM) and primal-dual decomposition (PDD). Depending on the specific application, our algorithms can accomplish two major tasks: i) continuously track the optimal solution at each time instance; ii) learn the general pattern of the given data and produce a single solution that fits all time-varying parameters.
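As a rough illustration of the problem class (the symbols below, such as \theta(t), x_k, and A_k, are assumed notation; the precise formulation is given in the dissertation chapters), the problem at time t can be written as

\[
\min_{x_1,\dots,x_K} \; \sum_{k=1}^{K} f_k\bigl(x_k;\,\theta(t)\bigr)
\qquad \text{s.t.} \qquad \sum_{k=1}^{K} A_k\bigl(\theta(t)\bigr)\,x_k \;=\; b\bigl(\theta(t)\bigr),
\]

where \theta(t) collects the time-varying parameters, each f_k is a separable convex or non-convex cost, and the equality constraint couples the blocks x_1,\dots,x_K. Tracking seeks a (near-)optimal solution for every t, while training seeks a single solution that works for all observed \theta(t).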

The first part of the dissertation focuses on problems whose optimal solutions change over time. We leverage proximal gradient steps in the primal updates and modify the dual steps by adding a perturbation to accommodate the time-varying nature of the problem. We show that, under mild assumptions, the proposed algorithm is able to track the changes of the problem, meaning that it always stays in a neighborhood of the optimal or approximately optimal solutions at each time instance. Moreover, our analysis reveals an interesting trade-off between solution accuracy and convergence speed. For cases where gradient information is unavailable or difficult to compute, we develop a time-varying algorithm that uses only function value information (also known as zeroth-order information). We observe performance similar to that of gradient-based methods, and convergence in expectation is proved under suitable assumptions. As an extension of this time-varying framework, static optimization with randomness in the updates is discussed, with applications to power systems. Specifically, an ADMM-based distributed optimal power flow (OPF) controller for renewable energy sources (RESs) is designed to steer RES output powers toward approximate AC OPF solutions.
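To make the tracking idea concrete, the following is a minimal Python sketch of a perturbed primal-dual update, assuming smooth costs (so the proximal step reduces to a plain gradient step) and a multiplicative shrinkage as the dual perturbation; the function names, step sizes, and shrinkage factor gamma are illustrative assumptions, not the dissertation's exact scheme.

```python
import numpy as np

def track_time_varying(grad_f, A_of_t, b_of_t, x0, lam0, T,
                       alpha=0.05, beta=0.05, gamma=0.9):
    """Illustrative perturbed primal-dual tracking loop (sketch only).

    grad_f(x, t): gradient of the time-t objective at x
    A_of_t(t), b_of_t(t): current equality-constraint data A(t), b(t)
    alpha, beta: primal/dual step sizes; gamma: dual perturbation factor
    """
    x, lam = x0.copy(), lam0.copy()
    trajectory = []
    for t in range(T):
        A, b = A_of_t(t), b_of_t(t)            # observe current problem data
        # primal step: gradient of the Lagrangian w.r.t. x
        # (a prox operator would appear here for non-smooth costs)
        x = x - alpha * (grad_f(x, t) + A.T @ lam)
        # perturbed dual step: shrink the multiplier, then ascend on the residual
        lam = gamma * lam + beta * (A @ x - b)
        trajectory.append(x.copy())
    return trajectory
```

Intuitively, the shrinkage discounts stale constraint information as A(t) and b(t) drift; tuning alpha, beta, and gamma reflects the accuracy-versus-speed trade-off mentioned above.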

In the second part of the dissertation, we further show that the time-varying framework also applies to cases where all changing parameters can be fit by a single solution, i.e., training. This type of problem lies at the core of many machine learning models that aim to extract data patterns. We focus specifically on deep neural networks (DNNs) and model DNN training as an equality-constrained optimization problem by introducing an auxiliary variable for each hidden layer. The resulting formulation is highly non-convex and time-varying, in the sense that only part of the data is available at any given time and data arrives sequentially. We design another primal-dual type method, called stochastic primal-dual decomposition (PDD), for the neural network training problem. We demonstrate the effectiveness of the developed algorithm by: 1) performing theoretical analysis to show that the stochastic PDD algorithm computes a stationary solution of the training problem; 2) conducting a comprehensive comparison with state-of-the-art algorithms, showing that the proposed algorithm achieves an early-stage advantage, i.e., the training error decreases faster in the first few epochs.
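As an illustration of the reformulation and of one stochastic primal-dual update, the sketch below treats a two-layer network, introduces a single auxiliary variable z1 for the hidden layer (constraint z1 = relu(W1 x)), and takes one mini-batch step on the augmented Lagrangian; the variable names, the squared loss, and the step sizes are assumptions for exposition, not the exact stochastic PDD updates analyzed in the dissertation.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def stochastic_pdd_step(W1, W2, z1, lam, xb, yb, rho=1.0, lr=0.01):
    """One illustrative stochastic primal-dual step (sketch only).

    Training is rewritten as the equality-constrained problem
        min 0.5*||W2 z1 - y||^2   s.t.   z1 = relu(W1 x),
    with auxiliary variable z1 for the hidden layer and multiplier lam.
    Shapes: xb (d, B) mini-batch, yb (c, B), W1 (h, d), W2 (c, h),
    z1 and lam (h, B).
    """
    h = relu(W1 @ xb)                          # current hidden-layer output
    # primal update for the auxiliary variable z1
    g_z1 = W2.T @ (W2 @ z1 - yb) + lam + rho * (z1 - h)
    z1 = z1 - lr * g_z1
    # primal update for the output-layer weights
    W2 = W2 - lr * ((W2 @ z1 - yb) @ z1.T)
    # primal update for the hidden-layer weights (through the ReLU mask)
    mask = (W1 @ xb > 0).astype(float)
    g_W1 = -(mask * (lam + rho * (z1 - h))) @ xb.T
    W1 = W1 - lr * g_W1
    # dual ascent on the layer constraint residual
    lam = lam + rho * (z1 - relu(W1 @ xb))
    return W1, W2, z1, lam
```

In this sketch the auxiliary variable and multiplier are tied to the current mini-batch; in a full-batch formulation one such pair would exist per training sample, which is one of the simplifications made here for brevity.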

Copyright
2019