Free shipping on orders from €69.99.

Free shipping with Zásilkovna on orders over €59.99

Shipping rates: SEUR courier €4.99, GLS courier €7.99, Correos courier €5.49, DHL courier €5.49, SEUR pickup point €3.99

AI Inference Optimization Engineering

Title: AI Inference Optimization Engineering
Publisher: Independently published
SKU: 52770465
Price: 12.09 EUR
Availability: InStock
Author: ChatVariety Team
ISBN: 9798199720021

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Language: English

Paperback

Libristo code: 52770465

Publisher: Independently published, June 2026

Coming soon, New

12.09 €

Expected release: 7 June 2026

30-day return policy

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to identify and reduce latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to shrink large neural networks with minimal loss of model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (with separate draft models or with draft heads such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by up to 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

About the book

Title: AI Inference Optimization Engineering

Author: ChatVariety Team

Language: English

Binding: Paperback

Publication date: 2026

Number of pages: 96

EAN: 9798199720021

Libristo code: 52770465

Publisher: Independently published

Weight: 142 g

Dimensions: 152 x 229 x 5 mm
