Del 17 - IEEE Press Series on Computational Intelligence

Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Inbunden, Engelska, 2013

Av Frank L. Lewis, Derong Liu, Frank L. (Georgia Institute of Technology) Lewis, Frank L Lewis

2 459 kr

Beställningsvara. Skickas inom 11-20 vardagar

Fri frakt för medlemmar vid köp för minst 249 kr.

Reinforcement learning (RL) and adaptive dynamic programming (ADP) has been one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human engineered systems, covering both single player decision and control and multi-player games. Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides an important and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and economic decision-making.

Produktinformation

Utgivningsdatum2013-02-12
Mått164 x 241 x 36 mm
Vikt1 066 g
FormatInbunden
SpråkEngelska
SerieIEEE Press Series on Computational Intelligence
Antal sidor648
FörlagJohn Wiley & Sons Inc
ISBN9781118104200

Tillhör följande kategorier

Dr. Frank Lewis is a Professor of Electrical Engineering at The University of Texas at Arlington, where he was awarded the Moncrief-O'Donnell Endowed Chair in 1990 at the Automation & Robotics Research Institute. He has served as Visiting Professor at Democritus University in Greece, Hong Kong University of Science and Technology, Chinese University of Hong Kong, City University of Hong Kong, National University of Singapore, Nanyang Technological University Singapore. Elected Guest Consulting Professor at Shanghai Jiao Tong University and South China University of Technology.Derong Liu received the B.S. degree in mechanical engineering from the East China Institute of Technology (now Nanjing University of Science and Technology), Nanjing, China, in 1982, the M.S. degree in automatic control theory and applications from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1987, and the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, in 1994.

PREFACE xix CONTRIBUTORS xxiiiPART I FEEDBACK CONTROL USING RL AND ADP1. Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead 3Paul J. Werbos1.1 Introduction 31.2 What is RLADP? 41.3 Some Basic Challenges in Implementing ADP 142. Stable Adaptive Neural Control of Partially Observable Dynamic Systems 31J. Nate Knight and Charles W. Anderson2.1 Introduction 312.2 Background 322.3 Stability Bias 352.4 Example Application 383. Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm 52Derong Liu and Ding Wang3.1 Background Material 533.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm 553.3 Generalization 673.4 Simulation Studies 683.5 Summary 744. Learning and Optimization in Hierarchical Adaptive Critic Design 78Haibo He, Zhen Ni, and Dongbin Zhao4.1 Introduction 784.2 Hierarchical ADP Architecture with Multiple-Goal Representation 804.3 Case Study: The Ball-and-Beam System 874.4 Conclusions and Future Work 945. Single Network Adaptive Critics Networks—Development, Analysis, and Applications 98Jie Ding, Ali Heydari, and S.N. Balakrishnan5.1 Introduction 985.2 Approximate Dynamic Programing 1005.3 SNAC 1025.4 J-SNAC 1045.5 Finite-SNAC 1085.6 Conclusions 1166. Linearly Solvable Optimal Control 119K. Dvijotham and E. Todorov6.1 Introduction 1196.2 Linearly Solvable Optimal Control Problems 1236.3 Extension to Risk-Sensitive Control and Game Theory 1306.4 Properties and Algorithms 1346.5 Conclusions and Future Work 1397. Approximating Optimal Control with Value Gradient Learning 142Michael Fairbank, Danil Prokhorov, and Eduardo Alonso7.1 Introduction 1427.2 Value Gradient Learning and BPTT Algorithms 1447.3 A Convergence Proof for VGL(1) for Control with Function Approximation 1487.4 Vertical Lander Experiment 1547.5 Conclusions 1598. A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming 162Silvia Ferrari, Keith Rudd, and Gianluca Di Muro8.1 Background 1638.2 Constrained Backpropagation (CPROP) Approach 1638.3 Solution of Partial Differential Equations in Nonstationary Environments 1708.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs 1748.5 Summary 1799. Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance 182Jennie Si, Lei Yang, Chao Lu, Kostas S. Tsakalis, and Armando A. Rodriguez9.1 Introduction 1839.2 Direct Heuristic Dynamic Programming 1849.3 A Control Theoretic View on the Direct HDP 1869.4 Direct HDP Design with Improved Performance Case 1—Design Guided by a Priori LQR Information 1939.5 Direct HDP Design with Improved Performance Case 2—Direct HDP for Coorindated Damping Control of Low-Frequency Oscillation 1989.6 Summary 20110. Reinforcement Learning Control with Time-Dependent Agent Dynamics 203Kenton Kirkpatrick and John Valasek10.1 Introduction 20310.2 Q-Learning 20510.3 Sampled Data Q-Learning 20910.4 System Dynamics Approximation 21310.5 Closing Remarks 21811. Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations 221Hassan Zargarzadeh, Qinmin Yang, and S. Jagannathan11.1 Introduction 22111.2 Background 22411.3 Reinforcement Learning Based Control 22511.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control 23411.5 Simulation Result 24712. An Actor–Critic–Identifier Architecture for Adaptive Approximate Optimal Control 258S. Bhasin, R. Kamalapurkar, M. Johnson, K.G. Vamvoudakis, F.L. Lewis, and W.E. Dixon12.1 Introduction 25912.2 Actor–Critic–Identifier Architecture for HJB Approximation 26012.3 Actor–Critic Design 26312.4 Identifier Design 26412.5 Convergence and Stability Analysis 27012.6 Simulation 27412.7 Conclusion 27513. Robust Adaptive Dynamic Programming 281Yu Jiang and Zhong-Ping Jiang13.1 Introduction 28113.2 Optimality Versus Robustness 28313.3 Robust-ADP Design for Disturbance Attenuation 28813.4 Robust-ADP for Partial-State Feedback Control 29213.5 Applications 29613.6 Summary 300PART II LEARNING AND CONTROL IN MULTIAGENT GAMES14. Hybrid Learning in Stochastic Games and Its Application in Network Security 305Quanyan Zhu, Hamidou Tembine, and Tamer Basar14.1 Introduction 30514.2 Two-Person Game 30814.3 Learning in NZSGs 31014.4 Main Results 31414.5 Security Application 32214.6 Conclusions and Future Works 32615. Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games 330Draguna Vrabie and F.L. Lewis15.1 Introduction 33115.2 Two-Player Games and Integral Reinforcement Learning 33315.3 Continuous-Time Value Iteration to Solve the Riccati Equation 33715.4 Online Algorithm to Solve Nonzero-Sum Games 33915.5 Analysis of the Online Learning Algorithm for NZS Games 34215.6 Simulation Result for the Online Game Algorithm 34515.7 Conclusion 34716. Online Learning Algorithms for Optimal Control and Dynamic Games 350Kyriakos G. Vamvoudakis and Frank L. Lewis16.1 Introduction 35016.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation 35216.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and Hamilton–Jacobi–Isaacs Equation 36016.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations 366PART III FOUNDATIONS IN MDP AND RL17. Lambda-Policy Iteration: A Review and a New Implementation 381Dimitri P. Bertsekas17.1 Introduction 38117.2 Lambda-Policy Iteration without Cost Function Approximation 386 17.3 Approximate Policy Evaluation Using Projected Equations 38817.4 Lambda-Policy Iteration with Cost Function Approximation 39517.5 Conclusions 40618. Optimal Learning and Approximate Dynamic Programming 410Warren B. Powell and Ilya O. Ryzhov18.1 Introduction 41018.2 Modeling 41118.3 The Four Classes of Policies 41218.4 Basic Learning Policies for Policy Search 41618.5 Optimal Learning Policies for Policy Search 42118.6 Learning with a Physical State 42719. An Introduction to Event-Based Optimization: Theory and Applications 432Xi-Ren Cao, Yanjia Zhao, Qing-Shan Jia, and Qianchuan Zhao19.1 Introduction 43219.2 Literature Review 43319.3 Problem Formulation 43419.4 Policy Iteration for EBO 43519.5 Example: Material Handling Problem 44119.6 Conclusions 44820. Bounds for Markov Decision Processes 452Vijay V. Desai, Vivek F. Farias, and Ciamac C. Moallemi20.1 Introduction 45220.2 Problem Formulation 45520.3 The Linear Programming Approach 45620.4 The Martingale Duality Approach 45820.5 The Pathwise Optimization Method 46120.6 Applications 46320.7 Conclusion 47021. Approximate Dynamic Programming and Backpropagation on Timescales 474John Seiffertt and Donald Wunsch21.1 Introduction: Timescales Fundamentals 47421.2 Dynamic Programming 47921.3 Backpropagation 48521.4 Conclusions 49222. A Survey of Optimistic Planning in Markov Decision Processes 494Lucian Busoniu, Remi Munos, and Robert Babu¡ska22.1 Introduction 49422.2 Optimistic Online Optimization 49722.3 Optimistic Planning Algorithms 50022.4 Related Planning Algorithms 50922.5 Numerical Example 51023. Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning 517Shalabh Bhatnagar, Vivek S. Borkar, and L.A. Prashanth23.1 Introduction 51723.2 The Framework 52023.3 The Feature Adaptation Scheme 52223.4 Convergence Analysis 52523.5 Application to Traffic Signal Control 52723.6 Conclusions 53224. Feature Selection for Neuro-Dynamic Programming 535Dayu Huang, W. Chen, P. Mehta, S. Meyn, and A. Surana24.1 Introduction 53524.2 Optimality Equations 53624.3 Neuro-Dynamic Algorithms 54224.4 Fluid Models 55124.5 Diffusion Models 55424.6 Mean Field Games 55624.7 Conclusions 55725. Approximate Dynamic Programming for Optimizing Oil Production 560Zheng Wen, Louis J. Durlofsky, Benjamin Van Roy, and Khalid Aziz25.1 Introduction 56025.2 Petroleum Reservoir Production Optimization Problem 56225.3 Review of Dynamic Programming and Approximate Dynamic Programming 56425.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization 56625.5 Simulation Results 57325.6 Concluding Remarks 57823.6 Conclusions 53224. Feature Selection for Neuro-Dynamic Programming 535Dayu Huang, W. Chen, P. Mehta, S. Meyn, and A. Surana24.1 Introduction 53524.2 Optimality Equations 53624.3 Neuro-Dynamic Algorithms 54224.4 Fluid Models 55124.5 Diffusion Models 55424.6 Mean Field Games 55624.7 Conclusions 55725. Approximate Dynamic Programming for Optimizing Oil Production 560Zheng Wen, Louis J. Durlofsky, Benjamin Van Roy, and Khalid Aziz25.1 Introduction 56025.2 Petroleum Reservoir Production Optimization Problem 56225.3 Review of Dynamic Programming and Approximate Dynamic Programming 56425.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization 56625.5 Simulation Results 57325.6 Concluding Remarks 57826. A Learning Strategy for Source Tracking in Unstructured Environments 582Titus Appel, Rafael Fierro, Brandon Rohrer, Ron Lumia, and John Wood26.1 Introduction 58226.2 Reinforcement Learning 58326.3 Light-Following Robot 58926.4 Simulation Results 59226.5 Experimental Results 59526.6 Conclusions and Future Work 599References 599INDEX 601