Big Data is a new field, and many technological challenges must be understood before it can be used to its full potential. These challenges arise at every stage of working with Big Data, beginning with data generation and acquisition. The storage and management phase presents two critical challenges: infrastructure, for storage and transportation, and conceptual models. Finally, extracting meaning from Big Data requires complex analysis. Here the authors propose metaheuristics as a solution to these challenges, for two reasons: metaheuristics can handle large problems, and they are flexible and therefore easily adapted to different types of data and different contexts. The first part of the book introduces and justifies the use of metaheuristics to overcome some of these data mining challenges, presents a specific protocol for evaluating the performance of algorithms, and then gives an introduction to metaheuristics. The second part details a number of data mining tasks, including clustering, association rules, supervised classification and feature selection, and explains how metaheuristics can be used to deal with each of them. The book is designed to be self-contained, so that readers can understand all of the concepts discussed within it, and to provide an overview of recent applications of metaheuristics to knowledge discovery problems in the context of Big Data.
Clarisse DHAENENS and Laetitia JOURDAN are Professors at the University of Lille in France; each belongs to a research team working with both the CRIStAL Laboratory (UMR CNRS) and Inria.
Acknowledgments
Introduction

Chapter 1 Optimization and Big Data
  1.1 Context of Big Data
    1.1.1 Examples of situations
    1.1.2 Definitions
    1.1.3 Big Data challenges
    1.1.4 Metaheuristics and Big Data
  1.2 Knowledge discovery in Big Data
    1.2.1 Data mining versus knowledge discovery
    1.2.2 Main data mining tasks
    1.2.3 Data mining tasks as optimization problems
  1.3 Performance analysis of data mining algorithms
    1.3.1 Context
    1.3.2 Evaluation among one or several dataset(s)
    1.3.3 Repositories and datasets
  1.4 Conclusion

Chapter 2 Metaheuristics – A Short Introduction
  2.1 Introduction
    2.1.1 Combinatorial optimization problems
    2.1.2 Solving a combinatorial optimization problem
    2.1.3 Main types of optimization methods
  2.2 Common concepts of metaheuristics
    2.2.1 Representation/encoding
    2.2.2 Constraint satisfaction
    2.2.3 Optimization criterion/objective function
    2.2.4 Performance analysis
  2.3 Single solution-based/local search methods
    2.3.1 Neighborhood of a solution
    2.3.2 Hill climbing algorithm
    2.3.3 Tabu Search
    2.3.4 Simulated annealing and threshold acceptance approach
    2.3.5 Combining local search approaches
  2.4 Population-based metaheuristics
    2.4.1 Evolutionary computation
    2.4.2 Swarm intelligence
  2.5 Multi-objective metaheuristics
    2.5.1 Basic notions in multi-objective optimization
    2.5.2 Multi-objective optimization using metaheuristics
    2.5.3 Performance assessment in multi-objective optimization
  2.6 Conclusion

Chapter 3 Metaheuristics and Parallel Optimization
  3.1 Parallelism
    3.1.1 Bit-level
    3.1.2 Instruction-level parallelism
    3.1.3 Task and data parallelism
  3.2 Parallel metaheuristics
    3.2.1 General concepts
    3.2.2 Parallel single solution-based metaheuristics
    3.2.3 Parallel population-based metaheuristics
  3.3 Infrastructure and technologies for parallel metaheuristics
    3.3.1 Distributed model
    3.3.2 Hardware model
  3.4 Quality measures
    3.4.1 Speedup
    3.4.2 Efficiency
    3.4.3 Serial fraction
  3.5 Conclusion

Chapter 4 Metaheuristics and Clustering
  4.1 Task description
    4.1.1 Partitioning methods
    4.1.2 Hierarchical methods
    4.1.3 Grid-based methods
    4.1.4 Density-based methods
  4.2 Big Data and clustering
  4.3 Optimization model
    4.3.1 A combinatorial problem
    4.3.2 Quality measures
    4.3.3 Representation
  4.4 Overview of methods
  4.5 Validation
    4.5.1 Internal validation
    4.5.2 External validation
  4.6 Conclusion

Chapter 5 Metaheuristics and Association Rules
  5.1 Task description and classical approaches
    5.1.1 Initial problem
    5.1.2 A priori algorithm
  5.2 Optimization model
    5.2.1 A combinatorial problem
    5.2.2 Quality measures
    5.2.3 A mono- or a multi-objective problem?
  5.3 Overview of metaheuristics for the association rules mining problem
    5.3.1 Generalities
    5.3.2 Metaheuristics for categorical association rules
    5.3.3 Evolutionary algorithms for quantitative association rules
    5.3.4 Metaheuristics for fuzzy association rules
  5.4 General table
  5.5 Conclusion

Chapter 6 Metaheuristics and (Supervised) Classification
  6.1 Task description and standard approaches
    6.1.1 Problem description
    6.1.2 K-nearest neighbor
    6.1.3 Decision trees
    6.1.4 Naive Bayes
    6.1.5 Artificial neural networks
    6.1.6 Support vector machines
  6.2 Optimization model
    6.2.1 A combinatorial problem
    6.2.2 Quality measures
    6.2.3 Methodology of performance evaluation in supervised classification
  6.3 Metaheuristics to build standard classifiers
    6.3.1 Optimization of K-NN
    6.3.2 Decision tree
    6.3.3 Optimization of ANN
    6.3.4 Optimization of SVM
  6.4 Metaheuristics for classification rules
    6.4.1 Modeling
    6.4.2 Objective function(s)
    6.4.3 Operators
    6.4.4 Algorithms
  6.5 Conclusion

Chapter 7 On the Use of Metaheuristics for Feature Selection in Classification
  7.1 Task description
    7.1.1 Filter models
    7.1.2 Wrapper models
    7.1.3 Embedded models
  7.2 Optimization model
    7.2.1 A combinatorial optimization problem
    7.2.2 Representation
    7.2.3 Operators
    7.2.4 Quality measures
    7.2.5 Validation
  7.3 Overview of methods
  7.4 Conclusion

Chapter 8 Frameworks
  8.1 Frameworks for designing metaheuristics
    8.1.1 Easylocal++
    8.1.2 HeuristicLab
    8.1.3 jMetal
    8.1.4 Mallba
    8.1.5 ParadisEO
    8.1.6 ECJ
    8.1.7 OpenBeagle
    8.1.8 JCLEC
  8.2 Framework for data mining
    8.2.1 Orange
    8.2.2 R and Rattle GUI
  8.3 Framework for data mining with metaheuristics
    8.3.1 RapidMiner
    8.3.2 Weka
    8.3.3 Keel
    8.3.4 MO-Mine
  8.4 Conclusion

Conclusion
Bibliography
Index