
Book Review: Architecting Generative AI Applications by Leonid Kuligin

A review of Architecting Generative AI Applications (Packt, 2026) covering LLMOps, RAG architectures, agentic systems, and the engineering reality of production GenAI.

Luca Berton
· 3 min read

I have started reading Architecting Generative AI Applications by Leonid Kuligin, published by Packt, and it tackles one of the most important shifts in AI today: going from a GenAI prototype to a production-ready application.

📖 Book link: Architecting Generative AI Applications

Why This Book Matters

What I like about the book is the practical angle. It is not just about prompts or demos. It covers the engineering reality behind GenAI systems: the gap that most teams struggle with after the initial excitement of a working prototype fades.

This resonates strongly with what I see in enterprise environments: building with LLMs is easy to start, but hard to operationalize well. The real value comes when teams can make GenAI applications reliable, measurable, secure, and scalable, not just impressive in a demo.

Key Topics Covered

The book addresses the full lifecycle of production GenAI systems:

Evaluation of LLM Outputs

This is one of the hardest problems in production AI: how do you know your model’s responses are actually good? The book covers:

  • Automated evaluation frameworks
  • Human-in-the-loop assessment
  • Regression testing for prompt changes
  • Quality metrics beyond simple accuracy
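To make regression testing for prompt changes concrete, here is a minimal Python sketch, not the book's framework. `call_model` is a hypothetical deterministic stub standing in for a real LLM call, and the golden cases and their checks are illustrative assumptions. The idea: run a fixed suite against each prompt version and block a rollout if the pass rate drops.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    question: str
    check: Callable[[str], bool]  # returns True if the answer is acceptable


def call_model(prompt_template: str, question: str) -> str:
    # Hypothetical stand-in for a real LLM call; deterministic so the harness is testable.
    if "concise" in prompt_template:
        return "Paris"
    return "The capital of France is Paris, a city known for..."


def pass_rate(prompt_template: str, cases: list[GoldenCase]) -> float:
    # Fraction of golden cases whose check accepts the model's answer.
    passed = sum(case.check(call_model(prompt_template, case.question)) for case in cases)
    return passed / len(cases)


def is_regression(old_rate: float, new_rate: float, tolerance: float = 0.0) -> bool:
    # Flag the new prompt if it passes fewer cases than the old one, minus an allowed tolerance.
    return new_rate < old_rate - tolerance


cases = [
    GoldenCase("What is the capital of France?", lambda a: "Paris" in a),
    GoldenCase("Capital of France, in at most five words?", lambda a: len(a.split()) <= 5),
]

v1 = pass_rate("Answer the question.", cases)
v2 = pass_rate("Answer concisely.", cases)
print(v1, v2, is_regression(v1, v2))  # 0.5 1.0 False
```

Real suites often replace the string checks with rubric-based or LLM-as-judge scoring, but the gating logic stays the same.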

RAG and Agentic Architectures

Retrieval-Augmented Generation has become the standard pattern for grounding LLM outputs in enterprise data. The book goes beyond basic RAG to cover:

  • Advanced retrieval strategies (hybrid search, reranking)
  • Agentic architectures where LLMs orchestrate multi-step workflows
  • Tool use and function calling patterns
  • Context architecture decisions that determine agent accuracy
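Hybrid search usually means merging a keyword ranking (such as BM25) with a vector-similarity ranking. One common way to merge them is reciprocal rank fusion (RRF); the sketch below uses RRF purely as an illustration, not necessarily the technique the book describes, and the document IDs are made up. A reranker, for example a cross-encoder, would then typically rescore the fused top results; that step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a BM25 index
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from embedding similarity
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists rise to the top, and because RRF uses only ranks, the two retrievers' raw scores never need to be calibrated against each other.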

From DevOps and MLOps to LLMOps

This is where platform engineering meets AI. The transition from traditional MLOps to LLMOps introduces new challenges:

  • Model versioning is different when you are versioning prompts, not weights
  • A/B testing LLM outputs requires different statistical approaches
  • Cost management becomes a first-class concern (token economics)
  • Latency budgets must account for multi-hop agent chains
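Because the prompt is now the artifact being versioned, one lightweight approach is to derive a version ID from the prompt text plus its generation settings, so every logged output can be traced back to the exact configuration that produced it. This is a sketch under that assumption: `PromptRegistry` and the model name are hypothetical, and real teams may use a prompt-management tool or plain Git instead.

```python
import hashlib
import json


class PromptRegistry:
    """Minimal in-memory registry that versions prompts by content hash (illustrative only)."""

    def __init__(self) -> None:
        self._versions: dict[str, dict] = {}

    def register(self, name: str, template: str, model: str, temperature: float) -> str:
        # Hash the template together with the generation settings, since changing
        # either can change outputs just as retraining weights would.
        record = {"name": name, "template": template, "model": model, "temperature": temperature}
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(payload).hexdigest()[:12]
        self._versions[version] = record
        return version

    def get(self, version: str) -> dict:
        return self._versions[version]


registry = PromptRegistry()
# "example-model" is a placeholder model name.
v1 = registry.register("support-answer", "Answer politely: {question}", "example-model", 0.2)
v2 = registry.register("support-answer", "Answer politely and briefly: {question}", "example-model", 0.2)
print(v1, v2)  # two different 12-character version IDs
```

Identical content always yields the same ID, so re-registering an unchanged prompt is harmless, while any edit to the wording, model, or temperature produces a new version.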

For teams already practicing platform engineering, LLMOps extends the internal developer platform with AI-specific capabilities.

Kubernetes, Infrastructure as Code, and Deployment

The book covers deployment patterns for production GenAI applications on Kubernetes, which align with the real-world enterprise infrastructure where most AI workloads land.

Security, Privacy, and Observability

GenAI introduces unique security challenges:

  • Prompt injection and OWASP Top 10 for LLMs
  • Data leakage through model outputs
  • PII handling in RAG pipelines
  • Audit trails for AI-generated decisions
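For PII handling in RAG pipelines, one simple safeguard is to redact obvious identifiers before chunks are embedded and indexed, so they are never retrieved into a prompt. The regex patterns below are deliberately naive assumptions for illustration. Note that the name "Jane" slips through, which is why dedicated PII detection tools are usually preferred in practice.

```python
import re

# Illustrative patterns only: real PII detection needs far broader coverage
# (names, addresses, national IDs) and is usually done with a dedicated detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace matched PII with typed placeholders before the text is embedded or indexed."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


chunk = "Contact Jane at jane.doe@example.com or +1 555 123 4567 about ticket 42."
print(redact(chunk))
# Contact Jane at [EMAIL] or [PHONE] about ticket 42.
```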

Observability for LLM applications goes beyond traditional metrics: you need to trace the reasoning chain, measure hallucination rates, and monitor cost per interaction.
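Per-step latency and cost per interaction can be captured with something as simple as one span per step of the chain. This is a minimal sketch with hypothetical per-token prices and a hand-rolled `Trace` class; production systems more commonly emit spans through a tracing standard such as OpenTelemetry. Measuring hallucination rates needs an evaluation signal and is not shown.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator

# Hypothetical prices in USD per 1M tokens; substitute your provider's actual rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


@dataclass
class Span:
    name: str
    duration_s: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_M_INPUT
                + self.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def step(self, name: str) -> Iterator[Span]:
        # Time one step of the chain; the caller records token counts on the yielded span.
        span = Span(name)
        start = time.perf_counter()
        try:
            yield span
        finally:
            span.duration_s = time.perf_counter() - start
            self.spans.append(span)

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.spans)


trace = Trace()
with trace.step("retrieve"):
    pass  # vector search would run here; no LLM tokens consumed
with trace.step("generate") as span:
    # In a real app these counts would come from the model API's usage metadata.
    span.input_tokens, span.output_tokens = 2_000, 500

for s in trace.spans:
    print(f"{s.name}: {s.duration_s * 1000:.1f} ms, ${s.cost_usd:.4f}")
print(f"cost per interaction: ${trace.total_cost_usd():.4f}")  # $0.0135
```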

A/B Testing and Online Experiments

Running experiments on LLM outputs is fundamentally different from testing button colors:

  • Outputs are non-deterministic
  • Quality is subjective and multi-dimensional
  • Statistical significance requires different sample sizes
  • User satisfaction doesn’t always correlate with output accuracy
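Because non-deterministic, subjective outputs add variance, it helps to estimate the required sample size before launching an experiment. Below is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion z-test, assuming the metric is a binary user signal such as a thumbs-up rate. That metric choice is an illustrative assumption, not a method taken from the book.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_variant: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect a difference between two rates
    with a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Baseline thumbs-up rate of 60%; we hope the new prompt reaches 65%.
print(sample_size_per_arm(0.60, 0.65))
```

With these inputs the estimate comes out to roughly 1,500 users per arm. Because n scales with 1/effect², halving the expected lift roughly quadruples the traffic needed.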

Who Should Read This

For engineers, architects, platform teams, and technical leaders working on GenAI applications, this is a very relevant resource. Specifically:

  • Platform engineers building internal AI platforms
  • ML engineers transitioning from traditional ML to LLM-based systems
  • Architects designing production GenAI at enterprise scale
  • Engineering managers who need to understand operational complexity

The Production Gap

The book addresses what I call the “demo to production gap”, the 90% of engineering effort that happens after you get a prototype working:

| Prototype | Production |
|---|---|
| Single prompt | Prompt versioning + evaluation |
| Local model | Multi-region inference fleet |
| Manual testing | Automated regression suites |
| Ignore costs | Token budget management |
| Trust outputs | Guardrails + human review |
| Single user | Multi-tenant isolation |

This gap is where most GenAI projects fail. Having a structured reference for navigating it is valuable.


Thanks to Packt, Nimisha Dua, and Leonid Kuligin for sharing this release with the community.

📖 Get the book: Architecting Generative AI Applications
