Your kernel runs. It's just not fast enough.
It compiles. It produces correct results. But it's running at a fraction of what the hardware can deliver, and none of the rules of thumb you've tried has closed the gap.
This book does.
AI-Optimized GPU Computing is the resource that takes you past documentation and surface-level advice, into the physical mechanisms that actually govern GPU performance. You'll develop the mental model to read a profiler trace and know — before touching a line of code — exactly what is wrong and what will fix it.
Then you'll go further: building learning systems that search optimization spaces no engineer can navigate manually, adapting to the production inputs your hand-tuned configurations never anticipated.
You'll learn why occupancy optimization sometimes destroys performance. Why your coalesced kernel is still bandwidth-limited. What Tensor Cores silently do wrong when alignment is off. Why the kernel that hits 91% in benchmarks delivers 54% in production.
The explanations are precise. The code is real. The failure cases come from production systems, not textbook examples. Every concept is grounded in how the hardware actually behaves — which means the knowledge transfers when architectures change, workloads shift, and the rules everyone else is following stop working.
This is not a book of rules. It is a book of mechanisms — written for engineers who are done guessing and ready to understand.
The hardware is capable of far more than your configurations are currently extracting. This book shows you exactly how to close that gap.
From: California Books, Miami, FL, U.S.A.
Condition: New. Print on Demand. Item number I-9798259219373
Quantity: More than 20 available