Del 96 - Wiley Series on Parallel and Distributed Computing

High-Performance Computing on Complex Environments

Inbunden, Engelska, 2014

1 819 kr

Beställningsvara. Skickas inom 7-10 vardagar

Fri frakt för medlemmar vid köp för minst 249 kr.

With recent changes in multicore and general-purpose computing on graphics processing units, the way parallel computers are used and programmed has drastically changed. It is important to provide a comprehensive study on how to use such machines written by specialists of the domain. The book provides recent research results in high-performance computing on complex environments, information on how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems, detailed studies on the impact of applying heterogeneous computing practices to real problems, and applications varying from remote sensing to tomography. The content spans topics such as Numerical Analysis for Heterogeneous and Multicore Systems; Optimization of Communication for High Performance Heterogeneous and Hierarchical Platforms; Efficient Exploitation of Heterogeneous Architectures, Hybrid CPU+GPU, and Distributed Systems; Energy Awareness in High-Performance Computing; and Applications of Heterogeneous High-Performance Computing. • Covers cutting-edge research in HPC on complex environments, following an international collaboration of members of the ComplexHPC • Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems • Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency

Produktinformation

Utgivningsdatum2014-07-01
Mått163 x 241 x 28 mm
Vikt839 g
FormatInbunden
SpråkEngelska
SerieWiley Series on Parallel and Distributed Computing
Antal sidor512
FörlagJohn Wiley & Sons Inc
ISBN9781118712054

Tillhör följande kategorier

Systemvetenskap och AI inom Data och it

EMMANUEL JEANNOT is a Senior Research Scientist at INRIA. He received his PhD in computer science from École Normale Supérieur de Lyon. His main research interests are processes placement, scheduling for heterogeneous environments and grids, data redistribution, algorithms, and models for parallel machines. JULIUS ILINSKAS is a Principal Researcher and a Head of Department at Vilnius University, Lithuania. His research interests include parallel computing, optimization, data analysis, and visualization.

Contributors xxiiiPreface xxviiPart I Introduction 11. Summary of the Open European Network for High-Performance Computing in Complex Environments 3Emmanuel Jeannot and Julius ilinskas1.1 Introduction and Vision 41.2 Scientific Organization 61.2.1 Scientific Focus 61.2.2 Working Groups 61.3 Activities of the Project 61.3.1 Spring Schools 61.3.2 International Workshops 71.3.3 Working Groups Meetings 71.3.4 Management Committee Meetings 71.3.5 Short-Term Scientific Missions 71.4 Main Outcomes of the Action 71.5 Contents of the Book 8Acknowledgment 10Part II Numerical Analysis for Heterogeneous and Multicore Systems 112. On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 13Dimitar Lukarski and Maya Neytcheva2.1 Introduction 142.2 General Description of Iterative Methods and Preconditioning 162.2.1 Basic Iterative Methods 162.2.2 Projection Methods: CG and GMRES 182.3 Preconditioning Techniques 202.4 Defect-Correction Technique 212.5 Multigrid Method 222.6 Parallelization of Iterative Methods 222.7 Heterogeneous Systems 232.7.1 Heterogeneous Computing 242.7.2 Algorithm Characteristics and Resource Utilization 252.7.3 Exposing Parallelism 262.7.4 Heterogeneity in Matrix Computation 262.7.5 Setup of Heterogeneous Iterative Solvers 272.8 Maintenance and Portability 292.9 Conclusion 30Acknowledgments 31References 313. Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 33Matja Depolli, Gregor Kosec, and Roman Trobec3.1 Introduction 343.2 Test Case 353.2.1 Governing Equations 353.2.2 Solution Procedure 363.3 Parallel Implementation 393.3.1 Intel PCM Library 393.3.2 OpenMP 403.4 Results 413.4.1 Results of Numerical Integration 413.4.2 Parallel Efficiency 423.5 Discussion 453.6 Conclusion 47Acknowledgment 47References 474. Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 51Natalija Tumanova and Raimondas Ciegis4.1 Introduction 514.2 Formulation of the Discrete Model 534.2.1 The 𝜃-Implicit Discrete Scheme 554.2.2 The Predictor–Corrector Algorithm I 574.2.3 The Predictor–Corrector Algorithm II 584.3 Parallel Algorithms 594.3.1 Parallel 𝜃-Implicit Algorithm 594.3.2 Parallel Predictor–Corrector Algorithm I 624.3.3 Parallel Predictor–Corrector Algorithm II 634.4 Computational Results 634.4.1 Experimental Comparison of Predictor–Corrector Algorithms 664.4.2 Numerical Experiment of Neuron Excitation 684.5 Conclusions 69Acknowledgments 70References 70Part III Communication and Storage Considerations in High-Performance Computing 735. An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 75Torsten Hoefler, Emmanuel Jeannot, and Guillaume Mercier5.1 Introduction 765.2 General Overview 765.2.1 A Key to Scalability: Data Locality 775.2.2 Data Locality Management in Parallel Programming Models 775.2.3 Virtual Topology: Definition and Characteristics 785.2.4 Understanding the Hardware 795.3 Formalization of the Problem 795.4 Algorithmic Strategies for Topology Mapping 815.4.1 Greedy Algorithm Variants 815.4.2 Graph Partitioning 825.4.3 Schemes Based on Graph Similarity 825.4.4 Schemes Based on Subgraph Isomorphism 825.5 Mapping Enforcement Techniques 825.5.1 Resource Binding 835.5.2 Rank Reordering 835.5.3 Other Techniques 845.6 Survey of Solutions 855.6.1 Algorithmic Solutions 855.6.2 Existing Implementations 855.7 Conclusion and Open Problems 89Acknowledgment 90References 906. Optimization of Collective Communication for Heterogeneous HPC Platforms 95Kiril Dichev and Alexey Lastovetsky6.1 Introduction 956.2 Overview of Optimized Collectives and Topology-Aware Collectives 976.3 Optimizations of Collectives on Homogeneous Clusters 986.4 Heterogeneous Networks 996.4.1 Comparison to Homogeneous Clusters 996.5 Topology- and Performance-Aware Collectives 1006.6 Topology as Input 1016.7 Performance as Input 1026.7.1 Homogeneous Performance Models 1036.7.2 Heterogeneous Performance Models 1056.7.3 Estimation of Parameters of Heterogeneous Performance Models 1066.7.4 Other Performance Models 1066.8 Non-MPI Collective Algorithms for Heterogeneous Networks 1066.8.1 Optimal Solutions with Multiple Spanning Trees 1076.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 1076.8.3 Network Models Inspired by BitTorrent 1086.9 Conclusion 111Acknowledgments 111References 1117. Effective Data Access Patterns on Massively Parallel Processors 115Gabriele Capannini, Ranieri Baraglia, Fabrizio Silvestri, and Franco Maria Nardini7.1 Introduction 1157.2 Architectural Details 1167.3 K-Model 1177.3.1 The Architecture 1177.3.2 Cost and Complexity Evaluation 1187.3.3 Efficiency Evaluation 1197.4 Parallel Prefix Sum 1207.4.1 Experiments 1257.5 Bitonic Sorting Networks 1267.5.1 Experiments 1317.6 Final Remarks 132Acknowledgments 133References 1338. Scalable Storage I/O Software for Blue Gene Architectures 135Florin Isaila, Javier Garcia, and Jesús Carretero8.1 Introduction 1358.2 Blue Gene System Overview 1368.2.1 Blue Gene Architecture 1368.2.2 Operating System Architecture 1368.3 Design and Implementation 1388.3.1 The Client Module 1398.3.2 The I/O Module 1418.4 Conclusions and Future Work 142Acknowledgments 142References 142Part IV Efficient Exploitation af Heterogeneous Architectures 1459. Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 147Hamid Arabnejad, Jorge G. Barbosa, and Frédéric Suter9.1 Introduction 1489.1.1 Application Model 1489.1.2 System Model 1519.1.3 Performance Metrics 1529.2 Concurrent Workflow Scheduling 1539.2.1 Offline Scheduling of Concurrent Workflows 1549.2.2 Online Scheduling of Concurrent Workflows 1559.3 Experimental Results and Discussion 1609.3.1 DAG Structure 1609.3.2 Simulated Platforms 1609.3.3 Results and Discussion 1629.4 Conclusions 165Acknowledgments 166References 16610. Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 169Roman Wyrzykowski, Marcin Wozniak, and Lukasz Kuczynski10.1 Introduction 16910.2 Related Works 17110.3 Reed–Solomon Codes and Linear Algebra Algorithms 17210.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture 17310.4.1 Cell/B.E. Architecture 17310.4.2 Basic Assumptions for Mapping 17410.4.3 Vectorization Algorithm and Increasing its Efficiency 17510.4.4 Performance Results 17710.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 17810.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 17810.5.2 Organization of GPU Threads 18010.6 Methods of Increasing the Algorithm Performance on GPUs 18110.6.1 Basic Modifications 18110.6.2 Stream Processing 18210.6.3 Using Shared Memory 18410.7 GPU Performance Evaluation 18510.7.1 Experimental Results 18510.7.2 Performance Analysis using the Roofline Model 18710.8 Conclusions and Future Works 190Acknowledgments 191References 19111. Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 193Daniele D’Agostino, Andrea Clematis, and Emanuele Danovaro11.1 Introduction 19411.2 A Low-Cost Heterogeneous Computing Environment 19611.2.1 Adopted Computing Environment 19911.3 First Case Study: The N-Body Problem 20011.3.1 The Sequential N-Body Algorithm 20111.3.2 The Parallel N-Body Algorithm for Multicore Architectures 20311.3.3 The Parallel N-Body Algorithm for CUDA Architectures 20411.4 Second Case Study: The Convolution Algorithm 20611.4.1 The Sequential Convolver Algorithm 20611.4.2 The Parallel Convolver Algorithm for Multicore Architectures 20711.4.3 The Parallel Convolver Algorithm for GPU Architectures 20811.5 Conclusions 211Acknowledgments 212References 21212. Efficient Application of Hybrid Parallelism in Electromagnetism Problems 215Alejandro Álvarez-Melcón, Fernando D. Quesada, Domingo Giménez, Carlos Pérez-Alcaraz, José-Ginés Picón, and Tomás Ramírez12.1 Introduction 21512.2 Computation of Green’s functions in Hybrid Systems 21612.2.1 Computation in a Heterogeneous Cluster 21712.2.2 Experiments 21812.3 Parallelization in Numa Systems of a Volume Integral Equation Technique 22212.3.1 Experiments 22212.4 Autotuning Parallel Codes 22612.4.1 Empirical Autotuning 22712.4.2 Modeling the Linear Algebra Routines 22912.5 Conclusions and Future Research 230Acknowledgments 231References 232Part V CPU + GPU Coprocessing 23513. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 237David Clarke, Aleksandar Ilic, Alexey Lastovetsky, Vladimir Rychkov, Leonel Sousa, and Ziming Zhong13.1 Introduction 23813.2 Related Work 24113.3 Data Partitioning Based on Functional Performance Model 24313.4 Example Application: Heterogeneous Parallel Matrix Multiplication 24513.5 Performance Measurement on CPUs/GPUs System 24713.6 Functional Performance Models of Multiple Cores and GPUs 24813.7 FPM-Based Data Partitioning on CPUs/GPUs System 25013.8 Efficient Building of Functional Performance Models 25113.9 FPM-Based Data Partitioning on Hierarchical Platforms 25313.10 Conclusion 257Acknowledgments 259References 25914. Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 261Aleksandar Ilic and Leonel Sousa14.1 Introduction: Heterogeneous CPU + GPU Systems 26214.1.1 Open Problems and Specific Contributions 26314.2 Background and Related Work 26514.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 26514.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 26814.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 26914.3.1 Multilevel Simultaneous Load Balancing Algorithm 27014.3.2 Algorithm for Multi-Installment Processing with Multidistributions 27314.4 Experimental Results 27514.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 27514.4.2 AMPMD Evaluation: 2D FFT Case Study 27714.5 Conclusions 279Acknowledgments 280References 28015. The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 283Hector Ortega-Arranz, Yuri Torres, Diego R. Llanos, and Arturo Gonzalez-Escribano15.1 Introduction 28315.2 Algorithmic Overview 28515.2.1 Graph Theory Notation 28515.2.2 Dijkstra’s Algorithm 28615.2.3 Parallel Version of Dijkstra’s Algorithm 28715.3 CUDA Overview 28715.4 Heterogeneous Systems and Load Balancing 28815.5 Parallel Solutions to The APSP 28915.5.1 GPU Implementation 28915.5.2 Heterogeneous Implementation 29015.6 Experimental Setup 29115.6.1 Methodology 29115.6.2 Target Architectures 29215.6.3 Input Set Characteristics 29215.6.4 Load-Balancing Techniques Evaluated 29215.7 Experimental Results 29315.7.1 Complete APSP 29315.7.2 512-Source-Node-to-All Shortest Path 29515.7.3 Experimental Conclusions 29615.8 Conclusions 297Acknowledgments 297References 297Part VI Efficient Exploitation of Distributed Systems 30116. Resource Management for HPC on the Cloud 303Marc E. Frincu and Dana Petcu16.1 Introduction 30316.2 On the Type of Applications for HPC and HPC2 30516.3 HPC on the Cloud 30616.3.1 General PaaS Solutions 30616.3.2 On-Demand Platforms for HPC 31016.4 Scheduling Algorithms for HPC2 31116.5 Toward an Autonomous Scheduling Framework 31216.5.1 Autonomous Framework for RMS 31316.5.2 Self-Management 31516.5.3 Use Cases 31716.6 Conclusions 319Acknowledgment 320References 32017. Resource Discovery in Large-Scale Grid Systems 323Konstantinos Karaoglanoglou and Helen Karatza17.1 Introduction and Background 32317.1.1 Introduction 32317.1.2 Resource Discovery in Grids 32417.1.3 Background 32517.2 The Semantic Communities Approach 32517.2.1 Grid Resource Discovery Using Semantic Communities 32517.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 32717.3 The P2P Approach 32917.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 32917.3.2 P2P Protocols for Resource Discovery in the Grid 33017.4 The Grid-Routing Transferring Approach 33317.4.1 Resource Discovery Based on Matchmaking Routers 33317.4.2 Acquiring Knowledge in a Large-Scale Grid System 33517.5 Conclusions 337Acknowledgment 338References 338Part VII Energy Awareness in High-Performance Computing 34118. Energy-Aware Approaches for HPC Systems 343Robert Basmadjian, Georges Da Costa, Ghislain Landry Tsafack Chetsa, Laurent Lefevre, Ariel Oleksiak, and Jean-Marc Pierson18.1 Introduction 34418.2 Power Consumption of Servers 34518.2.1 Server Modeling 34618.2.2 Power Prediction Models 34718.3 Classification and Energy Profiles of HPC Applications 35418.3.1 Phase Detection 35618.3.2 Phase Identification 35818.4 Policies and Leverages 35918.5 Conclusion 360Acknowledgements 361References 36119. Strategies for Increased Energy Awareness in Cloud Federations 365Gabor Kecskemeti, AttilaKertesz, Attila Cs. Marosi, and Zsolt Nemeth19.1 Introduction 36519.2 Related Work 36719.3 Scenarios 36919.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 36919.3.2 Energy Considerations in Commercial Cloud Federations 37219.3.3 Reduced Energy Footprint of Academic Cloud Federations 37419.4 Energy-Aware Cloud Federations 37419.4.1 Availability of Energy-Consumption-Related Information 37519.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 37619.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 37719.5 Conclusions 379Acknowledgments 380References 38020. Enabling Network Security in HPC Systems Using Heterogeneous CMPs 383Ozcan Ozturk and Suleyman Tosun20.1 Introduction 38420.2 Related Work 38620.3 Overview of Our Approach 38720.3.1 Heterogeneous CMP Architecture 38720.3.2 Network Security Application Behavior 38820.3.3 High-Level View 38920.4 Heterogeneous CMP Design for Network Security Processors 39020.4.1 Task Assignment 39020.4.2 ILP Formulation 39120.4.3 Discussion 39320.5 Experimental Evaluation 39420.5.1 Setup 39420.5.2 Results 39520.6 Concluding Remarks 397Acknowledgments 397References 397Part VIII Applications of Heterogeneous High-Performance Computing 40121. Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 403Timo van Kessel, NielsDrost, Jason Maassen, Henri E. Bal, Frank J. Seinstra, and Antonio J. Plaza21.1 Introduction 40421.2 CBIR For Hyperspectral Imaging Data 40721.2.1 Spectral Unmixing 40721.2.2 Proposed CBIR System 40921.3 Jungle Computing 41021.3.1 Jungle Computing: Requirements 41121.4 IBIS and Constellation 41221.5 System Design and Implementation 41521.5.1 Endmember Extraction 41821.5.2 Query Execution 41821.5.3 Equi-Kernels 41921.5.4 Matchmaking 42021.6 Evaluation 42021.6.1 Performance Evaluation 42121.7 Conclusions 426Acknowledgments 426References 42622. Taking Advantage of Heterogeneous Platforms in Image and Video Processing 429Sidi A. Mahmoudi, Erencan Ozkan, Pierre Manneback, and Suleyman Tosun22.1 Introduction 43022.2 Related Work 43122.2.1 Image Processing on GPU 43122.2.2 Video Processing on GPU 43222.2.3 Contribution 43322.3 Parallel Image Processing on GPU 43322.3.1 Development Scheme for Image Processing on GPU 43322.3.2 GPU Optimization 43422.3.3 GPU Implementation of Edge and Corner Detection 43422.3.4 Performance Analysis and Evaluation 43422.4 Image Processing on Heterogeneous Architectures 43722.4.1 Development Scheme for Multiple Image Processing 43722.4.2 Task Scheduling within Heterogeneous Architectures 43822.4.3 Optimization Within Heterogeneous Architectures 43822.5 Video Processing on GPU 43822.5.1 Development Scheme for Video Processing on GPU 43922.5.2 GPU Optimizations 44022.5.3 GPU Implementations 44022.5.4 GPU-Based Silhouette Extraction 44022.5.5 GPU-Based Optical Flow Estimation 44022.5.6 Result Analysis 44322.6 Experimental Results 44422.6.1 Heterogeneous Computing for Vertebra Segmentation 44422.6.2 GPU Computing for Motion Detection Using a Moving Camera 44522.7 Conclusion 447Acknowledgment 448References 44823. Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 451José Ignacio Agulleiro, Francisco Vazquez, Ester M. Garzon, and Jose J. Fernandez23.1 Introduction 45223.2 Tomographic Reconstruction 45323.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 45523.4 Hybrid CPU + GPU Tomographic Reconstruction 45723.5 Results 45923.6 Discussion and Conclusion 461Acknowledgments 463References 463Index 467