Title
Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates
Campus Units
Supply Chain and Information Systems
Document Type
Article
Publication Version
Published Version
Publication Date
2-2002
Journal or Book Title
The Annals of Statistics
Volume
30
Issue
1
First Page
100
Last Page
121
DOI
10.1214/aos/1015362186
Abstract
We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
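The strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single covariate in [0, 1], a fixed-width histogram estimate of each arm's mean reward, and an exploration probability that decays as t^(-1/2). The class name `HistogramBandit`, the function `simulate`, the toy reward functions, and the decay schedule are all hypothetical choices made for illustration.

```python
import random


class HistogramBandit:
    """Randomized allocation with per-arm histogram estimates (illustrative sketch)."""

    def __init__(self, n_arms, n_bins, seed=0):
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.rng = random.Random(seed)
        # Running reward sums and pull counts for every (arm, bin) cell.
        self.sums = [[0.0] * n_bins for _ in range(n_arms)]
        self.counts = [[0] * n_bins for _ in range(n_arms)]

    def bin_of(self, x):
        # Assumption: the covariate x lies in [0, 1]; x == 1 falls in the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def estimate(self, arm, x):
        # Histogram estimate: mean observed reward in the bin containing x
        # (0.0 when the arm has not yet been played in that bin).
        b = self.bin_of(x)
        c = self.counts[arm][b]
        return self.sums[arm][b] / c if c else 0.0

    def select(self, x, explore_prob):
        # With probability explore_prob, pick an arm uniformly at random;
        # otherwise play the arm with the highest estimated reward at x.
        if self.rng.random() < explore_prob:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.estimate(a, x))

    def update(self, arm, x, reward):
        b = self.bin_of(x)
        self.sums[arm][b] += reward
        self.counts[arm][b] += 1


def simulate(n_rounds=5000, seed=1):
    # Toy problem (assumed): arm 0 is better for x < 0.5, arm 1 for x > 0.5.
    mean_reward = [lambda x: 1.0 - x, lambda x: x]
    rng = random.Random(seed)
    bandit = HistogramBandit(n_arms=2, n_bins=10, seed=seed)
    total, best_total = 0.0, 0.0
    for t in range(1, n_rounds + 1):
        x = rng.random()
        # Assumed schedule: exploration probability shrinks as t ** -0.5.
        arm = bandit.select(x, explore_prob=t ** -0.5)
        reward = mean_reward[arm](x) + rng.gauss(0.0, 0.1)
        bandit.update(arm, x, reward)
        # Compare expected rewards of the played arm and the best arm at x.
        total += mean_reward[arm](x)
        best_total += max(f(x) for f in mean_reward)
    return total / best_total
```

Calling `simulate()` returns the ratio of the expected reward accumulated by the sketch to that of always playing the best arm for each covariate value. A ratio approaching 1 as the number of rounds grows is the behavior that the consistency result in the abstract refers to.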
Copyright Owner
Institute of Mathematical Statistics
Copyright Date
2002
Language
en
File Format
application/pdf
Recommended Citation
Yang, Yuhong and Zhu, Dan, "Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates" (2002). Supply Chain Management Publications. 77.
https://lib.dr.iastate.edu/scm_pubs/77
Comments
This article is published as Yang, Yuhong; Zhu, Dan. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30 (2002), no. 1, 100–121. doi: 10.1214/aos/1015362186. Posted with permission.