Volume 82 - Wiley Series on Parallel and Distributed Computing
Algorithms and Parallel Computing
Hardcover, English, 2011
By Fayez Gebali
1 919 kr
Special order item. Ships within 7-10 business days.
Free shipping for members on orders of at least 249 kr.

There is a software gap between the hardware potential and the performance that can be attained using today's parallel software development tools: the tools require manual intervention by the programmer to parallelize the code. Programming a parallel computer demands a closer study of the target algorithm or application than the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides techniques for exploring the possible ways to program a parallel computer for a given application.
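The dependency issue the blurb describes is easy to see in code. Below is a minimal C/OpenMP sketch (OpenMP is the platform covered in Chapter 6 of the book; the array names, sizes, and loops are illustrative assumptions, not taken from the text): the first loop has independent iterations and parallelizes with a single pragma, while the second carries a loop dependency that the same pragma would silently break.

```c
/* Minimal sketch (not from the book): how data dependencies decide
 * what can be parallelized. Arrays and sizes are illustrative. */
#include <stdio.h>

#define N 1000000

static double a[N], b[N];

int main(void) {
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* Independent loop: iteration i touches only a[i] and b[i],
     * so all iterations may run concurrently. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    /* Dependent loop: iteration i reads a[i-1], written by the
     * previous iteration. Adding "#pragma omp parallel for" here
     * would produce wrong results; the recurrence must first be
     * restructured (the kind of scheduling/projection analysis the
     * book develops) before it can be parallelized. */
    for (int i = 1; i < N; i++)
        a[i] += a[i - 1];

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```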
Product information
- Publication date: 2011-03-18
- Dimensions: 163 x 241 x 24 mm
- Weight: 644 g
- Format: Hardcover
- Language: English
- Series: Wiley Series on Parallel and Distributed Computing
- Number of pages: 368
- Publisher: John Wiley & Sons Inc
- ISBN: 9780470902103
Fayez Gebali, PhD, has taught at the University of Victoria since 1984 and has served as the Associate Dean of Engineering for undergraduate programs since 2002. He has contributed to dozens of journals and technical reports and has completed four books. Dr. Gebali's primary research interests include VLSI design, processor array design, algorithms for computer arithmetic, and communication system modeling.
Table of contents
- Preface xiii
- List of Acronyms xix
- 1 Introduction 1
  - 1.1 Introduction 1
  - 1.2 Toward Automating Parallel Programming 2
  - 1.3 Algorithms 4
  - 1.4 Parallel Computing Design Considerations 12
  - 1.5 Parallel Algorithms and Parallel Architectures 13
  - 1.6 Relating Parallel Algorithm and Parallel Architecture 14
  - 1.7 Implementation of Algorithms: A Two-Sided Problem 14
  - 1.8 Measuring Benefits of Parallel Computing 15
  - 1.9 Amdahl's Law for Multiprocessor Systems 19
  - 1.10 Gustafson–Barsis's Law 21
  - 1.11 Applications of Parallel Computing 22
- 2 Enhancing Uniprocessor Performance 29
  - 2.1 Introduction 29
  - 2.2 Increasing Processor Clock Frequency 30
  - 2.3 Parallelizing ALU Structure 30
  - 2.4 Using Memory Hierarchy 33
  - 2.5 Pipelining 39
  - 2.6 Very Long Instruction Word (VLIW) Processors 44
  - 2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors 45
  - 2.8 Multithreaded Processor 49
- 3 Parallel Computers 53
  - 3.1 Introduction 53
  - 3.2 Parallel Computing 53
  - 3.3 Shared-Memory Multiprocessors (Uniform Memory Access [UMA]) 54
  - 3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [NUMA]) 56
  - 3.5 SIMD Processors 57
  - 3.6 Systolic Processors 57
  - 3.7 Cluster Computing 60
  - 3.8 Grid (Cloud) Computing 60
  - 3.9 Multicore Systems 61
  - 3.10 SM 62
  - 3.11 Communication Between Parallel Processors 64
  - 3.12 Summary of Parallel Architectures 67
- 4 Shared-Memory Multiprocessors 69
  - 4.1 Introduction 69
  - 4.2 Cache Coherence and Memory Consistency 70
  - 4.3 Synchronization and Mutual Exclusion 76
- 5 Interconnection Networks 83
  - 5.1 Introduction 83
  - 5.2 Classification of Interconnection Networks by Logical Topologies 84
  - 5.3 Interconnection Network Switch Architecture 91
- 6 Concurrency Platforms 105
  - 6.1 Introduction 105
  - 6.2 Concurrency Platforms 105
  - 6.3 Cilk++ 106
  - 6.4 OpenMP 112
  - 6.5 Compute Unified Device Architecture (CUDA) 122
- 7 Ad Hoc Techniques for Parallel Algorithms 131
  - 7.1 Introduction 131
  - 7.2 Defining Algorithm Variables 133
  - 7.3 Independent Loop Scheduling 133
  - 7.4 Dependent Loops 134
  - 7.5 Loop Spreading for Simple Dependent Loops 135
  - 7.6 Loop Unrolling 135
  - 7.7 Problem Partitioning 136
  - 7.8 Divide-and-Conquer (Recursive Partitioning) Strategies 137
  - 7.9 Pipelining 139
- 8 Nonserial–Parallel Algorithms 143
  - 8.1 Introduction 143
  - 8.2 Comparing DAG and DCG Algorithms 143
  - 8.3 Parallelizing NSPA Algorithms Represented by a DAG 145
  - 8.4 Formal Technique for Analyzing NSPAs 147
  - 8.5 Detecting Cycles in the Algorithm 150
  - 8.6 Extracting Serial and Parallel Algorithm Performance Parameters 151
  - 8.7 Useful Theorems 153
  - 8.8 Performance of Serial and Parallel Algorithms on Parallel Computers 156
- 9 z-Transform Analysis 159
  - 9.1 Introduction 159
  - 9.2 Definition of z-Transform 159
  - 9.3 The 1-D FIR Digital Filter Algorithm 160
  - 9.4 Software and Hardware Implementations of the z-Transform 161
  - 9.5 Design 1: Using Horner's Rule for Broadcast Input and Pipelined Output 162
  - 9.6 Design 2: Pipelined Input and Broadcast Output 163
  - 9.7 Design 3: Pipelined Input and Output 164
- 10 Dependence Graph Analysis 167
  - 10.1 Introduction 167
  - 10.2 The 1-D FIR Digital Filter Algorithm 167
  - 10.3 The Dependence Graph of an Algorithm 168
  - 10.4 Deriving the Dependence Graph for an Algorithm 169
  - 10.5 The Scheduling Function for the 1-D FIR Filter 171
  - 10.6 Node Projection Operation 177
  - 10.7 Nonlinear Projection Operation 179
  - 10.8 Software and Hardware Implementations of the DAG Technique 180
- 11 Computational Geometry Analysis 185
  - 11.1 Introduction 185
  - 11.2 Matrix Multiplication Algorithm 185
  - 11.3 The 3-D Dependence Graph and Computation Domain D 186
  - 11.4 The Facets and Vertices of D 188
  - 11.5 The Dependence Matrices of the Algorithm Variables 188
  - 11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B 189
  - 11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables 192
  - 11.8 Data Scheduling 195
  - 11.9 Projection Operation Using the Linear Projection Operator 200
  - 11.10 Effect of Projection Operation on Data 205
  - 11.11 The Resulting Multithreaded/Multiprocessor Architecture 206
  - 11.12 Summary of Work Done in this Chapter 207
- 12 Case Study: One-Dimensional IIR Digital Filters 209
  - 12.1 Introduction 209
  - 12.2 The 1-D IIR Digital Filter Algorithm 209
  - 12.3 The IIR Filter Dependence Graph 209
  - 12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm 216
- 13 Case Study: Two- and Three-Dimensional Digital Filters 219
  - 13.1 Introduction 219
  - 13.2 Line and Frame Wraparound Problems 219
  - 13.3 2-D Recursive Filters 221
  - 13.4 3-D Digital Filters 223
- 14 Case Study: Multirate Decimators and Interpolators 227
  - 14.1 Introduction 227
  - 14.2 Decimator Structures 227
  - 14.3 Decimator Dependence Graph 228
  - 14.4 Decimator Scheduling 230
  - 14.5 Decimator DAG for s1 = [1 0] 231
  - 14.6 Decimator DAG for s2 = [1 −1] 233
  - 14.7 Decimator DAG for s3 = [1 1] 235
  - 14.8 Polyphase Decimator Implementations 235
  - 14.9 Interpolator Structures 236
  - 14.10 Interpolator Dependence Graph 237
  - 14.11 Interpolator Scheduling 238
  - 14.12 Interpolator DAG for s1 = [1 0] 239
  - 14.13 Interpolator DAG for s2 = [1 −1] 241
  - 14.14 Interpolator DAG for s3 = [1 1] 243
  - 14.15 Polyphase Interpolator Implementations 243
- 15 Case Study: Pattern Matching 245
  - 15.1 Introduction 245
  - 15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245
  - 15.3 Obtaining the Algorithm Dependence Graph 246
  - 15.4 Data Scheduling 247
  - 15.5 DAG Node Projection 248
  - 15.6 Design 1: Design Space Exploration When s = [1 1]^t 249
  - 15.7 Design 2: Design Space Exploration When s = [1 −1]^t 252
  - 15.8 Design 3: Design Space Exploration When s = [1 0]^t 253
- 16 Case Study: Motion Estimation for Video Compression 255
  - 16.1 Introduction 255
  - 16.2 FBMAs 256
  - 16.3 Data Buffering Requirements 257
  - 16.4 Formulation of the FBMA 258
  - 16.5 Hierarchical Formulation of Motion Estimation 259
  - 16.6 Hardware Design of the Hierarchy Blocks 261
- 17 Case Study: Multiplication over GF(2^m) 267
  - 17.1 Introduction 267
  - 17.2 The Multiplication Algorithm in GF(2^m) 268
  - 17.3 Expressing Field Multiplication as an RIA 270
  - 17.4 Field Multiplication Dependence Graph 270
  - 17.5 Data Scheduling 271
  - 17.6 DAG Node Projection 273
  - 17.7 Design 1: Using d1 = [1 0]^t 275
  - 17.8 Design 2: Using d2 = [1 1]^t 275
  - 17.9 Design 3: Using d3 = [1 −1]^t 277
  - 17.10 Applications of Finite Field Multipliers 277
- 18 Case Study: Polynomial Division over GF(2) 279
  - 18.1 Introduction 279
  - 18.2 The Polynomial Division Algorithm 279
  - 18.3 The LFSR Dependence Graph 281
  - 18.4 Data Scheduling 282
  - 18.5 DAG Node Projection 283
  - 18.6 Design 1: Design Space Exploration When s1 = [1 −1] 284
  - 18.7 Design 2: Design Space Exploration When s2 = [1 0] 286
  - 18.8 Design 3: Design Space Exploration When s3 = [1 −0.5] 289
  - 18.9 Comparing the Three Designs 291
- 19 The Fast Fourier Transform 293
  - 19.1 Introduction 293
  - 19.2 Decimation-in-Time FFT 295
  - 19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298
  - 19.4 Decimation-in-Frequency FFT 299
  - 19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303
- 20 Solving Systems of Linear Equations 305
  - 20.1 Introduction 305
  - 20.2 Special Matrix Structures 305
  - 20.3 Forward Substitution (Direct Technique) 309
  - 20.4 Back Substitution 312
  - 20.5 Matrix Triangularization Algorithm 312
  - 20.6 Successive over Relaxation (SOR) (Iterative Technique) 317
  - 20.7 Problems 321
- 21 Solving Partial Differential Equations Using Finite Difference Method 323
  - 21.1 Introduction 323
  - 21.2 FDM for 1-D Systems 324
- References 331
- Index 337