Preface xi
Acknowledgments xix
About the Authors xxi
1 Introduction: Multi-agent Coordination by Reinforcement Learning and Evolutionary Algorithms 1
1.1 Introduction 2
1.2 Single Agent Planning 4
1.2.1 Terminologies Used in Single Agent Planning 4
1.2.2 Single Agent Search-Based Planning Algorithms 10
1.2.2.1 Dijkstra's Algorithm 10
1.2.2.2 A* (A-star) Algorithm 11
1.2.2.3 D* (D-star) Algorithm 15
1.2.2.4 Planning by STRIPS-Like Language 15
1.2.3 Single Agent RL 17
1.2.3.1 Multi-armed Bandit Problem 17
1.2.3.2 DP and Bellman Equation 20
1.2.3.3 Correlation Between RL and DP 21
1.2.3.4 Single Agent Q-Learning 21
1.2.3.5 Single Agent Planning Using Q-Learning 24
1.3 Multi-agent Planning and Coordination 25
1.3.1 Terminologies Related to Multi-agent Coordination 25
1.3.2 Classification of MAS 26
1.3.3 Game Theory for Multi-agent Coordination 28
1.3.3.1 Nash Equilibrium 31
1.3.3.2 Correlated Equilibrium 36
1.3.3.3 Static Game Examples 38
1.3.4 Correlation Among RL, DP, and GT 40
1.3.5 Classification of MARL 40
1.3.5.1 Cooperative MARL 42
1.3.5.2 Competitive MARL 56
1.3.5.3 Mixed MARL 59
1.3.6 Coordination and Planning by MAQL 84
1.3.7 Performance Analysis of MAQL and MAQL-Based Coordination 85
1.4 Coordination by Optimization Algorithm 87
1.4.1 PSO Algorithm 88
1.4.2 Firefly Algorithm 91
1.4.2.1 Initialization 92
1.4.2.2 Attraction to Brighter Fireflies 92
1.4.2.3 Movement of Fireflies 93
1.4.3 Imperialist Competitive Algorithm 93
1.4.3.1 Initialization 94
1.4.3.2 Selection of Imperialists and Colonies 95
1.4.3.3 Formation of Empires 95
1.4.3.4 Assimilation of Colonies 96
1.4.3.5 Revolution 96
1.4.3.6 Imperialistic Competition 97
1.4.4 Differential Evolution Algorithm 98
1.4.4.1 Initialization 99
1.4.4.2 Mutation 99
1.4.4.3 Recombination 99
1.4.4.4 Selection 99
1.4.5 Off-line Optimization 99
1.4.6 Performance Analysis of Optimization Algorithms 99
1.4.6.1 Friedman Test 100
1.4.6.2 Iman-Davenport Test 100
1.5 Summary 101
References 101
2 Improving the Convergence Speed of Multi-agent Q-Learning for Cooperative Task Planning 111
2.1 Introduction 112
2.2 Literature Review 116
2.3 Preliminaries 118
2.3.1 Single Agent Q-Learning 119
2.3.2 Multi-agent Q-Learning 119
2.4 Proposed MAQL 123
2.4.1 Two Useful Properties 124
2.5 Proposed FCMQL Algorithms and Their Convergence Analysis 128
2.5.1 Proposed FCMQL Algorithms 129
2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms 130
2.6 FCMQL-Based Cooperative Multi-agent Planning 131
2.7 Experiments and Results 134
2.8 Conclusions 141
2.9 Summary 143
2.A More Details on Experimental Results 144
2.A.1 Additional Details of Experiment 2.1 144
2.A.2 Additional Details of Experiment 2.2 159
2.A.3 Additional Details of Experiment 2.4 161
References 162
3 Consensus Q-Learning for Multi-agent Cooperative Planning 167
3.1 Introduction 167
3.2 Preliminaries 169
3.2.1 Single Agent Q-Learning 169
3.2.2 Equilibrium-Based Multi-agent Q-Learning 170
3.3 Consensus 171
3.4 Proposed CoQL and Planning 173
3.4.1 Consensus Q-Learning 173
3.4.2 Consensus-Based Multi-robot Planning 175
3.5 Experiments and Results 176
3.5.1 Experimental Setup 176
3.5.2 Experiments for CoQL 177
3.5.3 Experiments for Consensus-Based Planning 177
3.6 Conclusions 179
3.7 Summary 180
References 180
4 Efficient Computation of Correlated Equilibrium for Cooperative Q-Learning-Based Multi-robot Planning 183
4.1 Introduction 183
4.2 Single Agent Q-Learning and Equilibrium-Based MAQL 186
4.2.1 Single Agent Q-Learning 187
4.2.2 Equilibrium-Based MAQL 187
4.3 Proposed Cooperative MAQL and Planning 188
4.3.1 Proposed Schemes and Their Applicability 189
4.3.2 Immediate Rewards in Scheme-I and -II 190
4.3.3 Scheme-I-Induced MAQL 190
4.3.4 Scheme-II-Induced MAQL 193
4.3.5 Algorithms for Scheme-I and -II 200
4.3.6 Constraint QL-I/QL-II (CQL-I/CQL-II) 201
4.3.7 Convergence 201
4.3.8 Multi-agent Planning 207
4.4 Complexity Analysis 209
4.4.1 Complexity of CQL 210
4.4.1.1 Space Complexity 210
4.4.1.2 Time Complexity 210
4.4.2 Complexity of the Proposed Algorithms 210
4.4.2.1 Space Complexity 211
4.4.2.2 Time Complexity 211
4.4.3 Complexity Comparison 213
4.4.3.1 Space Complexity 213
4.4.3.2 Time Complexity 214
4.5 Simulation and Experimental Results 215
4.5.1 Experimental Platform 215
4.5.1.1 Simulation 215
4.5.1.2 Hardware 216
4.5.2 Experimental Approach 217
4.5.2.1 Learning Phase 217
4.5.2.2 Planning Phase 217
4.5.3 Experimental Results 218
4.6 Conclusion 226
4.7 Summary 226
4.A Supporting Algorithm and Mathematical Analysis 227
References 228
5 A Modified Imperialist Competitive Algorithm for a Multi-robot Stick-Carrying Application 233
5.1 Introduction 234
5.2 Problem Formulation for Multi-robot Stick-Carrying 239
5.3 Proposed Hybrid Algorithm 242
5.3.1 An Overview of ICA 242
5.3.1.1 Initialization 242
5.3.1.2 Selection of Imperialists and Colonies 243
5.3.1.3 Formation of Empires 243
5.3.1.4 Assimilation of Colonies 244
5.3.1.5 Revolution 244
5.3.1.6 Imperialistic Competition 245
5.4 An Overview of FA 247
5.4.1 Initialization 247
5.4.2 Attraction to Brighter Fireflies 247
5.4.3 Movement of Fireflies 248
5.5 Proposed ICFA 248
5.5.1 Assimilation of Colonies 251
5.5.1.1 Attraction to Powerful Colonies 251
5.5.1.2 Modification of Empire Behavior 251
5.5.1.3 Union of Empires 252
5.6 Simulation Results 254
5.6.1 Comparative Framework 254
5.6.2 Parameter Settings 254
5.6.3 Analysis of the Explorative Power of ICFA 254
5.6.4 Comparison of the Quality of the Final Solution 255
5.6.5 Performance Analysis 258
5.7 Computer Simulation and Experiment 265
5.7.1 Average Total Path Deviation (ATPD) 265
5.7.2 Average Uncovered Target Distance (AUTD) 265
5.7.3 Experimental Setup in Simulation Environment 265
5.7.4 Experimental Results in Simulation Environment 266
5.7.5 Experimental Setup with Khepera Robots 268
5.7.6 Experimental Results with Khepera Robots 269
5.8 Conclusion 270
5.9 Summary 272
5.A Additional Comparison of ICFA 272
References 275
6 Conclusions and Future Directions 281
6.1 Conclusions 281
6.2 Future Directions 283
Index 285