Uniform Title: Probably Approximately Correct (PAC) exploration in reinforcement learning
Name: Strehl, Alexander L. (author), Littman, Michael (chair), Hirsh, Haym (internal member), Szegedy, Mario (internal member), Kearns, Michael (outside member), Rutgers University, Graduate School - New Brunswick
Subject: Learning models (Stochastic processes)
Description: Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied, with an emphasis on the well-studied exploration problem. We provide a general RL framework that applies to all results in this thesis and to other results in RL that generalize the finite MDP assumption. We present two new versions of the Model-Based Interval Estimation (MBIE) algorithm and prove that they are both PAC-MDP. These algorithms are provably more efficient than any previously studied RL algorithms. We prove that many model-based algorithms (including R-MAX and MBIE) can be modified so that their worst-case per-step computational complexity is vastly improved without sacrificing their attractive theoretical guarantees. We show that it is possible to obtain PAC-MDP bounds with a model-free algorithm called Delayed Q-learning.
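The optimism-based exploration the abstract describes can be illustrated with a minimal sketch in the spirit of the MBIE-EB variant of Model-Based Interval Estimation: value iteration over the empirical model, plus an exploration bonus that shrinks as the visit count of a state-action pair grows. All names, the bonus coefficient `beta`, and the optimistic default for unvisited pairs are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def mbie_eb_q(counts, reward_sums, trans_counts, gamma=0.95, beta=0.5, iters=200):
    """Optimistic Q-values, MBIE-EB style (illustrative sketch, not the thesis's algorithm).

    counts[s, a]          -- visit counts n(s, a)
    reward_sums[s, a]     -- sum of rewards observed at (s, a)
    trans_counts[s, a, t] -- observed transitions from (s, a) to t
    Assumes rewards lie in [0, 1], so Vmax = 1 / (1 - gamma).
    """
    S, A = counts.shape
    vmax = 1.0 / (1.0 - gamma)
    n = np.maximum(counts, 1)                  # avoid division by zero
    r_hat = reward_sums / n                    # empirical mean rewards
    t_hat = trans_counts / n[:, :, None]       # empirical transition model
    bonus = beta / np.sqrt(n)                  # exploration bonus, shrinks with n(s, a)
    q = np.full((S, A), vmax)
    for _ in range(iters):
        v = q.max(axis=1)
        q_new = r_hat + bonus + gamma * t_hat @ v
        q_new[counts == 0] = vmax              # fully optimistic about unvisited pairs
        q = np.minimum(q_new, vmax)
    return q
```

Acting greedily with respect to these inflated Q-values drives the agent toward under-visited state-action pairs, which is the mechanism behind the PAC-MDP guarantees discussed above.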
Note: Includes bibliographical references (p. 133-136).
Collection: Graduate School - New Brunswick Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.