AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment - Paperback

Book 6 of 11: Production AI Engineering Series

Team, ChatVariety

9798199720021: AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Paperback

ISBN 13: 9798199720021

Publisher: Independently published, 2026

0 Used

5 New

From: EUR 13,56

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ, AWQ, and GGUF compression algorithms to scale down massive neural networks without sacrificing model accuracy.
Advanced Acceleration Methods: Implement speculative decoding with draft models (like Medusa and Eagle), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

The information in the "Synopsis" section may refer to different editions of this title.

Publisher: Independently published
Publication date: 2026
Language: English
ISBN 13: 9798199720021
Binding: Softcover
Number of pages: 95
Manufacturer contact: Manufactured by Amazon on behalf of the author
https://www.amazon.it/hz/contact-us

c/o Amazon Media EU S.à r.l., 38 Avenue John F. Kennedy
Luxembourg
L-1855
Luxembourg

Search results for AI Inference Optimization Engineering: Quantization,...

AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment (Production AI Engineering Series)

Team, ChatVariety

Publisher: Independently published, 2026

ISBN 13: 9798199720021

New Paperback

Print on Demand

From: California Books, Miami, FL, U.S.A.

Seller rating: 4 out of 5 stars

Condition: New. Print on Demand. Item code I-9798199720021

Buy new

EUR 13,56

Free shipping
Ships within U.S.A.

Quantity: More than 20 available

AI Inference Optimization Engineering

Team, Chatvariety

Publisher: Independently published, 2026

ISBN 13: 9798199720021

New PAP

From: PBShop.store US, Wood Dale, IL, U.S.A.

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item code L2-9798199720021

Buy new

EUR 14,11

Free shipping
Ships within U.S.A.

Quantity: More than 20 available

AI Inference Optimization Engineering

Team, Chatvariety

Publisher: Independently published, 2026

ISBN 13: 9798199720021

New PAP

From: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item code L2-9798199720021

Buy new

EUR 13,45

Shipping: EUR 3,85
Ships from United Kingdom to U.S.A.

Quantity: More than 20 available

AI Inference Optimization Engineering (Paperback)

Chatvariety Team

Publisher: Independently Published, 2026

ISBN 13: 9798199720021

New Paperback

Print on Demand

From: CitiRetail, Stevenage, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: New. This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Item code 9798199720021

Buy new

EUR 16,88

Shipping: EUR 43,35
Ships from United Kingdom to U.S.A.

Quantity: 1 available

AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Chatvariety Team

Publisher: Independently Published, Jun 2026

ISBN 13: 9798199720021

New Paperback

From: AHA-BUCH GmbH, Einbeck, Germany

Seller rating: 5 out of 5 stars

Paperback. Condition: New. New stock. Item code 9798199720021

Buy new

EUR 13,00

Shipping: EUR 60,71
Ships from Germany to U.S.A.

Quantity: 2 available
