Learning from Imbalanced Data Sets by Alberto Fernandez
This book provides a general and comprehensible overview of imbalanced learning. It contains a formal description of the problem and focuses on its main features and the most relevant proposed solutions. Additionally, it considers the different scenarios in Data Science in which imbalanced classification can create a real challenge.
This book stresses the gap with standard classification tasks by reviewing the case studies and ad-hoc performance metrics that are applied in this area. It also covers the different approaches that have traditionally been applied to address the binary skewed class distribution. Specifically, it reviews cost-sensitive learning, data-level preprocessing methods and algorithm-level solutions, also taking into account those ensemble-learning solutions that embed any of the former alternatives. Furthermore, it focuses on the extension of the problem to multi-class settings, where the former classical methods can no longer be applied in a straightforward way.
This book also focuses on the intrinsic data characteristics that, added to the uneven class distribution, are the main causes that truly hinder the performance of classification algorithms in this scenario. Then, some notes on data reduction are provided in order to understand the advantages related to the use of this type of approach.
Finally, this book introduces some novel areas of study that are attracting growing attention within the imbalanced data issue. Specifically, it considers the classification of data streams, non-classical classification problems, and the scalability related to Big Data. Examples of software libraries and modules to address imbalanced classification are provided.
This book is highly suitable for technical professionals and for senior undergraduate and graduate students in the areas of data science, computer science and engineering. It will also be useful for scientists and researchers to gain insight into the current developments in this area of study, as well as future research directions.