Optimizing Memory Layout and Instruction Order of a Finite Difference Code
Abstract
Time: 9:20 - 9:40
Current processor and graphics processor architectures rely heavily on
data and instruction parallelism at several levels. Floating-point
operations are grouped into vector instructions. Memory is organized in
a hierarchy of registers, caches, and local and distributed
memories. Many numerical algorithms tend to be limited by memory
bandwidth. This talk discusses a finite difference stencil computation
along with several techniques for optimizing its implementation,
such as a modified, interleaved, non-standard data layout, cache-aware
algorithms, loop unrolling, vectorization, parallelization, and
parameter tuning. These techniques lead to performance much closer to
peak compute performance than automatic compiler vectorization and
optimization achieve.
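
As an illustration of what a cache-aware traversal of a stencil can look like, here is a minimal sketch in C. It is not the code from the talk: the 5-point Laplacian, the row-major layout, the function names stencil_plain and stencil_tiled, and the tile parameter are all assumptions chosen for this example. The interleaved data layout mentioned in the abstract is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sweep: 5-point Laplacian on the interior of an n x n grid.
   Grids are stored row-major; boundary entries of out are left untouched. */
void stencil_plain(const double *in, double *out, size_t n)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j)
            out[i * n + j] = in[(i - 1) * n + j] + in[(i + 1) * n + j]
                           + in[i * n + j - 1] + in[i * n + j + 1]
                           - 4.0 * in[i * n + j];
}

/* Same sweep, visited in tile x tile blocks so that the rows touched by
   one block are more likely to stay in cache while they are reused.
   The tile size is a tuning parameter (hypothetical name). */
void stencil_tiled(const double *in, double *out, size_t n, size_t tile)
{
    assert(n >= 3 && tile >= 1);
    for (size_t ii = 1; ii + 1 < n; ii += tile) {
        /* Clamp the block to the last interior row (index n - 2). */
        size_t i_end = (ii + tile < n - 1) ? ii + tile : n - 1;
        for (size_t jj = 1; jj + 1 < n; jj += tile) {
            size_t j_end = (jj + tile < n - 1) ? jj + tile : n - 1;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    out[i * n + j] = in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                   + in[i * n + j - 1] + in[i * n + j + 1]
                                   - 4.0 * in[i * n + j];
        }
    }
}
```

Both functions evaluate the same expression per grid point, so they should produce identical results. Only the order in which points are visited differs, and a suitable tile value would have to be found by measurement on the target machine.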
Authors
- Gerhard Zumbusch (Friedrich-Schiller-Universität Jena)
Topic Area
Scientific Software
Session
Scientific Software / Education in CSE (09:00 - Tuesday, 24th October, 12th floor - Stratos)