by Tom Deakin and Timothy G. Mattson
GPUs are a standard fixture in high performance computing systems. You just aren't a well-rounded HPC programmer if you cannot write code that runs on a GPU. Unfortunately, the landscape of GPU programming models is crowded: there are CUDA, SYCL, HIP, OpenACC, OpenCL, and more. Compared with multithreading and cluster computing, where single programming models dominate (OpenMP and MPI, respectively), GPU programming is just plain confusing. We believe the solution to this GPU confusion is OpenMP.
OpenMP is widely supported across different GPU platforms. OpenMP, of course, also supports multithreading. It is unique among GPU programming models in that a single programming model covers both multithreading and GPU programming. Given the trend toward combining CPUs and GPUs in a single package, this feature will help OpenMP displace other approaches to GPU programming. With luck, OpenMP will come to dominate the GPU software landscape and remove the confusion surrounding GPU programming.
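As a minimal sketch of that claim (not code from the book), the same loop can be written once for CPU threads and once for a GPU using nothing but OpenMP directives. The function names saxpy_cpu and saxpy_gpu are hypothetical. The GPU version assumes a compiler with OpenMP offload support; when no device is available, the target region runs on the host instead, and when the code is compiled without OpenMP the pragmas are ignored and both functions run serially.

```c
#include <stddef.h>

/* y = a*x + y on the host: loop iterations are divided among CPU threads. */
void saxpy_cpu(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* The same loop offloaded to the default target device (e.g. a GPU).
   x is copied to the device; y is copied to the device and back. */
void saxpy_gpu(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Only the directive differs between the two versions: parallel for spreads the iterations over CPU threads, while target teams distribute parallel for moves the data to the device and spreads the iterations over teams of device threads.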
You can program your GPU with OpenMP. This book will show you how, starting with basic constructs to map loops onto the GPU and then moving to more complex GPU programming with asynchronous computing across multiple streams of kernel executions. It's all here in the latest book to help you master OpenMP: Programming Your GPU with OpenMP: Performance Portability for GPUs.
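The asynchronous style mentioned above can be sketched with the nowait and depend clauses on the target construct, which turn each kernel launch into a task that the host does not wait for. This is an illustrative sketch, not the book's code: the function scale_then_add is hypothetical, and whether the two independent kernels actually overlap on the device depends on the compiler and runtime.

```c
#include <stddef.h>

/* Doubles x, triples y, then sets z = x + y, using deferred target tasks. */
void scale_then_add(size_t n, double *x, double *y, double *z)
{
    /* Two independent kernels: nowait makes each a deferred task, so the
       host continues immediately and the kernels may run concurrently. */
    #pragma omp target nowait map(tofrom: x[0:n]) depend(out: x[0:n])
    #pragma omp teams distribute parallel for
    for (size_t i = 0; i < n; i++)
        x[i] *= 2.0;

    #pragma omp target nowait map(tofrom: y[0:n]) depend(out: y[0:n])
    #pragma omp teams distribute parallel for
    for (size_t i = 0; i < n; i++)
        y[i] *= 3.0;

    /* depend(in: ...) orders this kernel after both producers above. */
    #pragma omp target nowait map(to: x[0:n], y[0:n]) map(from: z[0:n]) \
            depend(in: x[0:n], y[0:n])
    #pragma omp teams distribute parallel for
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];

    /* Block the host until all outstanding target tasks have completed. */
    #pragma omp taskwait
}
```

For brevity each kernel maps its own arrays, so x and y move between host and device more than once; a real code would typically keep them resident on the device inside a target data region.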
Dr Tom Deakin: Find out more at hpc.tomdeakin.com.
Dr Timothy G. Mattson: Find out more at timmattson.com.
The example programs and figures in the book are available on GitHub.
Versions of those programs and figures in C++ and Fortran will be available soon.
Tim and Tom regularly teach OpenMP for GPUs using a set of tutorial slides available on GitHub.
A set of lecture slides aligned to each chapter in the book will be available soon.
The OpenMP website maintains a regularly updated list detailing OpenMP support in different compilers.
We tried our best; however, it is almost inevitable that minor errors crept into the printed text.
double Awrk[Bsize*Bsize]; // Team-local copies of tiles
double Bwrk[Bsize*Bsize];
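The declarations above appear to come from a tiled matrix multiplication in which each team keeps its own copies of the current tiles of A and B. The sketch below shows one way such team-local tiles might be used; it is not the book's listing, and the function name tiled_matmul, the tile size Bsize = 4, and the row-major layout are assumptions. Arrays declared inside a teams region are private to each team but shared by the threads of that team's nested parallel regions.

```c
#include <assert.h>

#define Bsize 4  /* assumed tile edge length; real codes tune this */

/* C += A*B for N x N row-major matrices; N must be a multiple of Bsize. */
void tiled_matmul(int N, const double *A, const double *B, double *C)
{
    assert(N % Bsize == 0);
    int nb = N / Bsize;  /* number of tiles along each matrix edge */

    #pragma omp target teams map(to: A[0:N*N], B[0:N*N]) map(tofrom: C[0:N*N])
    {
        double Awrk[Bsize*Bsize]; // Team-local copies of tiles
        double Bwrk[Bsize*Bsize];

        /* Tiles of C are distributed across teams; each team owns whole tiles. */
        #pragma omp distribute
        for (int t = 0; t < nb * nb; t++) {
            int ib = t / nb, jb = t % nb;
            for (int kb = 0; kb < nb; kb++) {
                /* The team's threads cooperatively copy tile (ib,kb) of A
                   and tile (kb,jb) of B into the team-local arrays. */
                #pragma omp parallel for collapse(2)
                for (int i = 0; i < Bsize; i++)
                    for (int j = 0; j < Bsize; j++) {
                        Awrk[i*Bsize + j] = A[(ib*Bsize + i)*N + kb*Bsize + j];
                        Bwrk[i*Bsize + j] = B[(kb*Bsize + i)*N + jb*Bsize + j];
                    }
                /* The implicit barrier ending the loop above guarantees the
                   tiles are complete before any thread reads them here. */
                #pragma omp parallel for collapse(2)
                for (int i = 0; i < Bsize; i++)
                    for (int j = 0; j < Bsize; j++) {
                        double sum = 0.0;
                        for (int k = 0; k < Bsize; k++)
                            sum += Awrk[i*Bsize + k] * Bwrk[k*Bsize + j];
                        C[(ib*Bsize + i)*N + jb*Bsize + j] += sum;
                    }
            }
        }
    }
}
```

Because the kernel accumulates into C, the caller must initialize C (for example, to zero) before the call. Each element of C is updated by exactly one thread of one team, so no atomics are needed.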