Software Optimization for High Performance Computing
(NOTE: Each chapter begins with an Introduction and concludes with a Summary.)

1. Introduction. Hardware Overview-Your Work Area. Software Techniques-The Tools. Applications-Using the Tools.

I. HARDWARE OVERVIEW-YOUR WORK AREA.

2. Processors: The Core of High Performance Computing. Types. Pipelining. Instruction Length. Registers. Functional Units. CISC and RISC Processors. Vector Processors. VLIW.
3. Data Storage. Caches. Virtual Memory Issues. Memory. Input/Output Devices. I/O Performance Tips for Application Writers.
4. An Overview of Parallel Processing. Parallel Models. Hardware Infrastructures for Parallelism. Control of Your Own Locality.

II. SOFTWARE TECHNIQUES-THE TOOLS.

5. How the Compiler Can Help and Hinder Performance. Compiler Terminology. Compiler Options. Compiler Directives and Pragmas. Metrics. Compiler Optimizations. Interprocedural Optimization. Change of Algorithm.
6. Predicting and Measuring Performance. Timers. Profilers. Predicting Performance.
7. Is High Performance Computing Language Dependent? Pointers and Aliasing. Complex Numbers. Subroutine or Function Call Overhead. Standard Library Routines. Odds and Ends.
8. Parallel Processing-An Algorithmic Approach. Process Parallelism. Thread Parallelism. Parallelism and I/O. Memory Allocation, ccNUMA, and Performance. Compiler Directives. The Message Passing Interface (MPI).

III. APPLICATIONS-USING THE TOOLS.

9. High Performance Libraries. Linear Algebra Libraries and APIs. Signal Processing Libraries and APIs. Self-Tuning Libraries. Commercial Libraries.
10. Mathematical Kernels: The Building Blocks of High Performance. Building Blocks. BLAS. Scalar Optimization. Vector Operations. Matrix Copy and Transpose. BLAS and Performance. Winograd's Matrix-Matrix Multiplication. Complex Matrix-Matrix Multiplication with Three Real Multiplications. Strassen's Matrix-Matrix Multiplication.
11. Faster Solutions for Systems of Equations. A Simple Example. LU Factorization. Cholesky Factorization. Factorization and Parallelization. Forward-Backward Substitution (FBS). Sparse Direct Systems of Equations. Iterative Techniques.
12. High Performance Algorithms and Approaches for Signal Processing. Convolutions and Correlations. DFTs/FFTs. The Relationship Between Convolutions and FFTs.

Index.