Campus Units

Supply Chain and Information Systems

Document Type

Article

Publication Version

Published Version

Publication Date

2-2002

Journal or Book Title

The Annals of Statistics

Volume

30

Issue

1

First Page

100

Last Page

121

DOI

10.1214/aos/1015362186

Abstract

We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
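The idea summarized in the abstract, histogram estimates of each arm's reward as a function of the covariate combined with randomized arm selection, can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `HistogramBandit`, the scalar covariate on [0, 1), the equal-width bins, the default exploration schedule `min(1, n_arms / t)`, and the value 0.0 for unvisited bins are all assumptions made for illustration.

```python
import random


class HistogramBandit:
    """Histogram reward estimates with randomized arm selection.

    Illustrative assumptions (not taken from the article): a scalar
    covariate in [0, 1), equal-width bins, and uniform random
    exploration with probability explore(t) at step t.
    """

    def __init__(self, n_arms, n_bins=10, explore=None, seed=0):
        self.n_arms = n_arms
        self.n_bins = n_bins
        # Default schedule: explore often early on, rarely later.
        self.explore = explore or (lambda t: min(1.0, n_arms / t))
        self.rng = random.Random(seed)
        self.t = 0
        # Per-arm, per-bin running sums and counts of observed rewards.
        self.sums = [[0.0] * n_bins for _ in range(n_arms)]
        self.counts = [[0] * n_bins for _ in range(n_arms)]

    def _bin(self, x):
        # Map a covariate in [0, 1) to its histogram bin index.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def estimate(self, arm, x):
        # Histogram estimate: mean reward seen for this arm in x's bin
        # (0.0 if the bin has no observations yet).
        b = self._bin(x)
        n = self.counts[arm][b]
        return self.sums[arm][b] / n if n else 0.0

    def select(self, x):
        # With probability explore(t) pick a uniformly random arm;
        # otherwise play the arm with the highest estimated reward at x.
        self.t += 1
        if self.rng.random() < self.explore(self.t):
            return self.rng.randrange(self.n_arms)
        estimates = [self.estimate(a, x) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=estimates.__getitem__)

    def update(self, arm, x, reward):
        # Record the observed reward in the bin containing x.
        b = self._bin(x)
        self.sums[arm][b] += reward
        self.counts[arm][b] += 1
```

In use, each round would call `select(x)` on the observed covariate, play the returned arm, and pass the observed reward to `update(arm, x, reward)`. The randomization step is what keeps every arm sampled occasionally, so the per-bin estimates for currently unpromising arms can still improve.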

Comments

This article is published as Yang, Yuhong; Zhu, Dan. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30 (2002), no. 1, 100-121. doi: 10.1214/aos/1015362186. Posted with permission.

Copyright Owner

Institute of Mathematical Statistics

Language

en

File Format

application/pdf
