Iris Dyrmishi presenting Observability for Engineers at Miro Amsterdam office, SRE NL meetup

Who's Watching the Agents? AI Observability Meetup at

SRE NL and Coralogix hosted 'Who's Watching the Agents?' at Miro's Amsterdam office, covering AI-powered observability, MCP servers for telemetry, custom.

Luca Berton
· 13 min read

On May 21, 2026, SRE NL and Coralogix joined forces for “Who’s Watching the Agents?”, a community event hosted at Miro’s Amsterdam office exploring what happens when AI becomes your engineering partner, and who is responsible for observing the observers.

Iris Dyrmishi presenting "Empowering teams to solve their Observability problems using AI" to a packed room at Miro Amsterdam

Observability for Engineers: When AI Becomes Your Partner

Iris Dyrmishi (Senior Observability Engineer at Miro, CNCF Ambassador, KCD Porto Organizer, and Cloud Native Porto Organizer) kicked off the evening with a talk that reframed the entire AI-in-observability narrative.

Iris Dyrmishi introduction slide: Senior Observability Engineer at Miro, CNCF Ambassador

Observability is a Human Experience

Iris opened with a powerful perspective: “We build systems, they gave us signals, but understanding still comes from people.”

Observability is a Human Experience: we build systems, they gave us signals, but understanding still comes from people

This is the core tension. With tools like OpenTelemetry, collecting logs, metrics, and traces is no longer the hard part. In practice, this is where things get messy: telemetry keeps growing, dashboards get stale, costs go up, and alerts do not really help you understand what is going on. The problem is not more data; it is making sense of it.

In most cases, understanding what is happening still takes too much effort: jumping between tools, writing queries (often in different languages), and manually connecting the dots.

LLMs Are Good, But We Are Better

LLMs are good, but we are better: collaborative AI illustration showing humans and AI working together

Iris made a clear distinction: AI is a partner for engineers working with observability, helping navigate telemetry data, correlate signals, and get to answers faster. Not magic, just something practical that fits into day-to-day work.

Empowering Teams to Solve Observability Problems Using AI

The practical framework Iris presented is refreshingly simple, summed up as “What do we need? Let’s start small”:

Empowering teams to solve Observability problems using AI: 3 pillars (MCP servers, LLM, Skills)

  1. MCP servers: Vendor, Open Source (BYO), VictoriaMetrics, Grafana, etc.
  2. LLM of your choice: bring whatever model works for your organization
  3. SKILLS, Custom GPTs, Gems, etc.: tailored AI capabilities for your specific problems

This layered approach means teams can start with existing infrastructure (their MCP-compatible observability backends), connect any LLM, and build domain-specific skills on top, with no rip-and-replace required.

Creating Custom Skills for Every Problem

Creating Custom Skills: Engineer asks "I need to create an alert", AI responds considering PromQL best practices, MCP validation, and internal guidelines

The goal: go beyond the LLM's general knowledge and build skills based on internal organizational guidelines.

The example was perfect: an engineer says “I need to create an alert” and the AI Assistant responds: “Let me do that for you, considering PromQL best practices, validating the query using the MCP, and opening the PR as per internal guidelines.”

This is the difference between a generic chatbot and an AI that actually knows your stack: it validates queries against your MCP server, applies your team’s PromQL conventions, and follows your PR workflow, all in one interaction.
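As a rough illustration of the "internal guidelines" part of such a skill, here is a minimal Python sketch of a check that could run before a PR is opened. The rule fields mirror the Prometheus alerting-rule layout (alert, expr, for, labels, annotations). The guidelines themselves (an allowed severity list, a required runbook_url annotation, a non-zero for duration) are hypothetical stand-ins for an organization's conventions, and the MCP-based query validation from the talk is not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical internal guidelines; a real skill would load these from team docs.
ALLOWED_SEVERITIES = {"info", "warning", "critical"}


@dataclass
class AlertRule:
    """Fields follow the Prometheus alerting-rule layout."""
    alert: str
    expr: str
    for_: str = "0m"
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def check_guidelines(rule: AlertRule) -> list[str]:
    """Return a list of guideline violations (empty means the rule passes)."""
    problems = []
    if not rule.expr.strip():
        problems.append("expr must not be empty")
    if rule.for_ in ("", "0s", "0m"):
        problems.append("set a non-zero 'for' duration to avoid flapping alerts")
    if rule.labels.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("label 'severity' must be one of info/warning/critical")
    if "runbook_url" not in rule.annotations:
        problems.append("annotation 'runbook_url' is required")
    return problems


# A draft rule as an assistant might first propose it.
draft = AlertRule(
    alert="HighErrorRate",
    expr='sum(rate(http_requests_total{status=~"5.."}[5m])) > 10',
    labels={"severity": "page"},
)
for problem in check_guidelines(draft):
    print(problem)
```

The point of the sketch is that the checks are data the team owns: the assistant can fix the draft against them before a human ever reviews the PR.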

Repeat for Every Challenge

Repeat for every challenge: 7 reusable observability skill categories

The framework scales across the entire observability lifecycle. Iris showed 7 skill categories that teams can build once and reuse:

  • Dashboards: guides users through end-to-end dashboard creation, ensuring observability standards are followed
  • Alerts: covers alert creation and query troubleshooting, embedding best practices for reliable alert conditions
  • Finding Your Data: helps teams efficiently locate telemetry data for their service or team
  • Root Cause Investigation: teaches engineers to correlate data beyond the application layer using patterns from previous post-mortems
  • Tool Awareness: a living skill covering everything in the O11y toolset, kept in sync with documentation
  • Visualization: focused on creating, exploring, and editing panels and visualizations effectively
  • Observability from Scratch: for teams that want to improve observability but do not know where to start

Each skill encodes organizational knowledge that would otherwise live in wikis nobody reads or in the heads of senior engineers who leave.

Headless Observability

Headless Observability: visualization layer decoupled from the backend, MCP superhero

Iris introduced the concept of Headless Observability, the idea that observability is changing thanks to AI models and that the visualization layer is now decoupled from the backend:

  1. Unified storage for telemetry data is no longer a necessity, which will change the scope for observability vendors
  2. Users are able to “query” multiple streams of data using natural language, with no more PromQL expertise required for every team member
  3. It is our job to facilitate the transition of our teams to this new model and to make sure it is easy, accessible for everyone, and as cost-effective as possible

The MCP (Model Context Protocol) superhero illustration drove the point home: MCP servers become the bridge between your observability backends and AI models, enabling natural language access to telemetry without requiring a single unified data store.

The Basics Still Matter

The basics of Observability: a lot is changing, but the basic principles are still the same

Even with AI transforming observability, Iris grounded the talk with a reminder: “A lot is changing, but the basic principles are still the same”:

  • Data quality is still important: garbage in, garbage out, so high-quality telemetry data remains essential for meaningful insights
  • Safe and reliable transport: data must move from source to destination without loss, corruption, or unauthorized access
  • Good instrumentation: still critical, but modern frameworks and auto-instrumentation make it easier than ever
  • Scalable platform: a reliable, highly available observability platform remains a non-negotiable priority

The message: AI augments observability, it does not replace its foundations.

A Useful Framework: Observability Team vs Developers

A useful framework: Observability Team vs Devs arm wrestling, collaboration over competition

Iris closed with a framework for making observability enablement work in practice:

  • Listen to your engineers: nobody knows their struggles better than them; always rely on your observability friends and champions
  • Understand the specific needs of your organization: organizations have a lot in common, but the details matter
  • Enablement is a never-ending process: it requires constant iteration through several cycles
  • Always keep up with the industry standards: the ecosystem moves fast

The arm-wrestling illustration between “Observability Team” and “Devs” perfectly captured the dynamic: the goal is not for one side to win, but for both to work together toward better outcomes.

Who’s Watching the Agents? Observability for AI-Assisted Development

Thank You: Iris Dyrmishi and Lewis Isaac closing the talks to a packed room at Miro Amsterdam

Lewis Isaac, Developer Advocate at Coralogix (previously NS&I, IBM, BP), presented the second talk, addressing a critical blind spot: code agents are reshaping how engineering teams write, review, and ship software, but most organizations adopting these tools are flying blind.

Lewis Isaac: Developer Advocate at Coralogix, previously NS&I, IBM, BP

Chapter One: My First PR Review

Lewis opened with a story every engineer knows: the first pull request review. The rule his mentor taught him: “Never submit code you cannot explain.”

My First PR Review: "Never submit code you cannot explain" with retry helper code example

The example showed a session retry helper function: simple, readable, explainable. But what happens when AI generates code for you? Can you still explain every line? This sets up the central tension of the talk.

A Familiar Story: Observability Always Follows Complexity

A Familiar Story: distributed systems needed tracing, dynamic infra needed metrics, UX needed RUM

Lewis drew a historical parallel:

  • The moment systems became distributed → we needed distributed tracing
  • The moment infrastructure became dynamic → we needed better metrics and logs
  • The moment user experience became critical → we needed RUM

The pattern is clear: every time complexity increases, observability must follow. AI agents are the next complexity frontier.

The Observability Gap

The observability gap: we see production in detail, we barely see the agent that helped build it

The core insight: “We see production in detail. We barely see the agent that helped build it.”

What we already observe in Production (signal density: HIGH):

  • APIs, Infrastructure, RUM
  • Databases, CI/CD
  • Logs + Metrics + Traces

What we barely observe about Agents (signal density: LOW):

  • Tokens, Files touched, Tool calls, Regressions
  • Cost, Commands run, PR Impact, CI Impact

Most teams have no shared view yet of what their AI agents are actually doing.

The Constraint Has Moved

The engine is bigger, the brakes and dashboard have not changed: bottleneck shifted from writing to reviewing, testing, shipping

Lewis’s most powerful slide: “The engine is bigger. The brakes and dashboard haven’t changed.”

Before AI: Writing code was the bottleneck → Review → Testing → CI/CD → Deploy → Stability

After AI-assisted development: Writing shrinks dramatically, but the pressure shifts downstream to Review, Testing, CI/CD, Deploy, and Stability.

“AI has not removed the fundamentals of software delivery; it has moved the bottleneck from authoring to reviewing, testing, shipping, and stability.”
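To make the shift concrete, here is a small Python sketch that treats delivery as a pipeline whose throughput is capped by its slowest stage. The stage capacities are made-up numbers chosen for illustration, not figures from the talk.

```python
def bottleneck(capacity_per_week: dict[str, int]) -> tuple[str, int]:
    """Return the slowest stage and the throughput it imposes on the pipeline."""
    stage = min(capacity_per_week, key=capacity_per_week.get)
    return stage, capacity_per_week[stage]


# Illustrative capacities in changes per week (hypothetical numbers).
before_ai = {"writing": 20, "review": 30, "testing": 35, "ci_cd": 50, "deploy": 40}
# Assume agents quadruple authoring capacity while everything downstream is unchanged.
after_ai = {**before_ai, "writing": 80}

print(bottleneck(before_ai))  # ('writing', 20)
print(bottleneck(after_ai))   # ('review', 30)
```

Under these assumed numbers, authoring capacity grows fourfold while end-to-end throughput only rises from 20 to 30 changes per week, because review is now the limiting stage.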

Five Questions Every Team Needs to Answer

If you cannot answer these, you do not have agent observability: 5 questions on Cost, Quality, Flow, Security, Outcome

Lewis distilled agent observability into five essential questions: “If you can’t answer these, you don’t have agent observability.”

  1. Cost: How much are we spending? (Tokens, models, sessions, repos, teams, and the spend trajectory)
  2. Quality: Are agents introducing defects? (Reverts, regressions, CI failures, post-merge incidents)
  3. Flow: Are PRs slower to review? (PR size, review latency, churn, hand-offs, pre-merge throughput)
  4. Security: Are agents touching sensitive files? (Secrets, credentials, infra config, risky commands, policy violations)
  5. Outcome: Is AI improving delivery, or just activity? (The one question that ties cost, quality, flow, and security together)
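One way to picture what answering these questions involves is to compute a single number per question from per-session records. The Python sketch below does that under loud assumptions: the AgentSession fields are a hypothetical schema, the token prices are placeholders rather than real vendor pricing, and "cost per merged PR" is just one possible proxy for the Outcome question.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class AgentSession:
    """One agent session; every field name here is a hypothetical schema."""
    input_tokens: int
    output_tokens: int
    review_hours: float      # how long the resulting PR waited for review
    reverted: bool           # the PR was reverted after merge
    touched_sensitive: bool  # the session read or wrote a sensitive path
    merged: bool


# Placeholder prices in USD per million tokens, not real vendor pricing.
PRICE_IN, PRICE_OUT = 3.0, 15.0


def summarize(sessions: list[AgentSession]) -> dict[str, float]:
    """Compute one illustrative metric per question."""
    merged = [s for s in sessions if s.merged]
    cost = sum(
        s.input_tokens * PRICE_IN + s.output_tokens * PRICE_OUT for s in sessions
    ) / 1_000_000
    return {
        "cost_usd": cost,                                                      # Cost
        "revert_rate": sum(s.reverted for s in merged) / len(merged) if merged else 0.0,  # Quality
        "median_review_hours": median([s.review_hours for s in sessions]),     # Flow
        "sensitive_sessions": sum(s.touched_sensitive for s in sessions),      # Security
        "cost_per_merged_pr": cost / len(merged) if merged else 0.0,           # Outcome
    }


sessions = [
    AgentSession(200_000, 20_000, 4.0, False, False, True),
    AgentSession(500_000, 60_000, 30.0, True, True, True),
    AgentSession(100_000, 10_000, 2.0, False, False, False),
]
for name, value in summarize(sessions).items():
    print(f"{name}: {value}")
```

None of these numbers is meaningful alone; the fifth question is about reading them together, for example rising cost alongside a rising revert rate.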

The Real Question

Are developers faster? Can the engineering system safely absorb the work that agents now produce?

Lewis reframed the productivity debate: the question is not “Are developers faster?” but “Can the engineering system safely absorb the work that agents now produce?”

This shift from individual speed to system capacity is what separates teams that successfully adopt AI from those that accumulate technical debt faster than before.

Different Agents, One Observable Surface Area

Different agents, one observable surface area: Claude Code, Gemini CLI, Codex CLI all share common telemetry surfaces

Lewis showed that regardless of which agent you use (Claude Code, Gemini CLI, Codex CLI, or custom agents), the observable surfaces are shared: sessions, LLM calls (model, tokens, latency, cost), tool calls (framework, pass/fail, duration), file reads/writes (paths, diffs, scopes), PR creation (files, lines, scope, owners), terminal commands (exit codes, stderr, stdin), and CI/CD outcomes (build, deploy, rollback).

“The more autonomous the agent, the more critical observability becomes.”

OpenTelemetry: The Same Pattern That Solved Distributed Systems

We solved distributed systems with shared telemetry, same pattern for AI. Sources via OTLP to OpenTelemetry to Coralogix

The architecture is elegant: “We solved distributed systems with shared telemetry. Same pattern for AI.”

Sources → OTLP → OpenTelemetry → Destination:

  • Claude Code (native OTel), Gemini CLI (native OTel), Codex CLI (native OTel), Custom agents (SDK/wrapper)
  • OpenTelemetry: vendor-neutral protocol carrying traces, metrics, logs across the agent surface (traces, metrics, logs, gen_ai)
  • Coralogix: Code Agents Observability + AI CLI, unified across cost, quality, security, DX, delivery (sessions and tool-call traces, token and cost metrics, audit logs and policy alerts)

Shared instrumentation removes vendor lock-in and lets one backend correlate every agent in your fleet.

Introducing cx-cli: Production Context for AI Agents

cx-cli: production context for AI agents, query Coralogix from the terminal

Lewis closed with a product announcement: cx-cli, a CLI tool that gives AI agents direct access to production context:

  • Query Coralogix from the terminal: logs + spans + metrics + more
  • Logs and spans via DataPrime, metrics via PromQL
  • Dashboards, alerts, SLOs, incidents: all reachable
  • Agent-friendly output (-o) avoids context flooding
  • Bundled skills for Claude Code, Cursor, Codex, OpenCode and 40+ agents
  • cx schema: machine-readable command tree for self-discovery

This closes the loop: first observe what agents do, then empower them with the production data they need to make better decisions.

Live Demo: cx-cli in Action

Lewis Isaac live demo: cx-cli command tree showing alerts, incidents, notifications, parsing rules, enrichments, SLOs, quotas, integrations

Lewis showed a live demo with the cx-cli command tree, the full Coralogix API surface exposed as a CLI: alerts, incidents, notifications, webhooks, parsing-rules, enrichments, SLOs, quotas, archive, integrations, access management, profiles, and cleanup. He then demonstrated querying real Kubernetes pod logs with structured JSON output showing request IDs, session contexts, deployment metadata, and error messages.

Live cx-cli JSON output: structured Kubernetes pod logs with request IDs, session context, and error messages

Six Things to Take Back to Your Team

Six things to take back to your team: key takeaways from Lewis Isaac's talk

Lewis closed with six actionable takeaways:

  1. Code agents are part of the software delivery system
  2. AI has moved the constraint, not removed it
  3. More code is not the same as better delivery
  4. OTel gives a vendor-neutral way to observe agent workflows
  5. Code Agents Observability shows what agents are doing
  6. cx-cli gives agents and engineers production context for evidence-based investigation

Agent Sessions as Traces

A session is a trace, every action is a span: Coralogix waterfall showing codex_cli.rs spans

“A session is a trace. Every action is a span.” Lewis showed a live Coralogix trace waterfall from a Codex CLI session: each agent action (user_prompt, open_github_issue, create_sql_request) appears as a span with duration, allowing you to see exactly what the agent did, how long each step took, and where failures occurred.
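The session-as-trace model can be sketched in plain Python. This deliberately avoids the OpenTelemetry SDK so the example stays dependency-free; in practice the agent's native OTel export or the SDK would produce the spans. The span names are the ones visible in the waterfall, while the Span and Session classes and the simulated failure are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    name: str
    start: float
    end: float = 0.0
    status: str = "ok"

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


@dataclass
class Session:
    """One agent session maps to one trace; every action becomes a span."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def action(self, name: str):
        span = Span(self.trace_id, name, start=time.perf_counter())
        try:
            yield span
        except Exception:
            span.status = "error"  # record the failure, then let it propagate
            raise
        finally:
            span.end = time.perf_counter()
            self.spans.append(span)


session = Session()
with session.action("user_prompt"):
    pass
with session.action("open_github_issue"):
    pass
try:
    with session.action("create_sql_request"):
        raise TimeoutError("simulated failure")
except TimeoutError:
    pass

for span in session.spans:
    print(f"{span.name:<20} {span.status:<6} {span.duration_ms:.3f} ms")
```

Because every span carries the same trace_id, a backend can reassemble the session into a single waterfall and show which step failed.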

Three Ways to Instrument an Agent

Three ways to instrument an agent: wrapper, SDK/API, and native OTel (increasingly built-in)

Lewis presented the instrumentation spectrum:

  • Path 1: Wrapper instrumentation. Shim the agent CLI from outside, capture stdin/stdout, exit codes, file changes. Works with any agent, no code changes, but low fidelity (no token cost) and brittle across versions
  • Path 2: SDK/API instrumentation. Custom agents call OTel SDK directly with full control. Rich custom attributes, but tied to agent internals, requires building and maintaining, schema drift across teams
  • Path 3: Native OTel from the agent. Claude Code, Gemini CLI, Codex CLI emit OTLP. Zero code, full fidelity, tokens/cost/tools/files maintained by the agent, vendor-neutral OTLP

The trend is clear: native OTel is winning. export OTEL_EXPORTER_OTLP_... and you are done.
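For contrast with the native path, Path 1 can be sketched in a few lines of Python. This is a minimal illustration, not a production shim: it uses a Python one-liner as a stand-in for a real agent CLI, captures only what is visible from outside the process, and does not attempt the file-change tracking mentioned on the slide. The record's field names are assumptions.

```python
import subprocess
import sys
import time


def run_wrapped(command: list[str]) -> dict:
    """Run an agent CLI as a child process and record what is visible from outside."""
    started = time.time()
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": result.returncode,
        "duration_s": round(time.time() - started, 3),
        "stdout_bytes": len(result.stdout.encode()),
        "stderr_tail": result.stderr[-500:],
        # Token counts and cost are not visible from outside the process,
        # which is the low-fidelity trade-off of this path.
    }


# Stand-in for a real agent CLI so the example runs anywhere.
fake_agent = [sys.executable, "-c", "print('patched 2 files')"]
record = run_wrapped(fake_agent)
print(record["exit_code"], record["stdout_bytes"])
```

The wrapper works for any command, which is its strength, but everything it records is inferred from process boundaries, which is why it breaks when the CLI's output format changes.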

Logs as the Audit Trail

Every decision the agent made, queryable after the fact. Audit trail with Coralogix log explorer

“Every decision the agent made, queryable, after the fact.” What gets logged:

  • Agent decisions
  • Tool calls
  • File mutations
  • Commands executed
  • Errors and timeouts
  • Context overflows
  • Sensitive file access
  • Policy violations

The Coralogix log explorer showed real agent audit data, complete with bar charts, log detail panels, and structured JSON payloads. This is the forensic layer that answers “what did the agent do and why?” when something goes wrong.
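A structured audit line for one of these event types might look like the following Python sketch. The field names, the severity values, and the list of sensitive path patterns are all hypothetical; they illustrate the idea of a queryable JSON record with a policy flag, not the schema shown in the demo.

```python
import fnmatch
import json
from datetime import datetime, timezone

# Hypothetical policy: glob patterns an agent should not touch without review.
SENSITIVE_PATTERNS = [".env", "*.pem", "secrets/*", "terraform/*.tfstate"]


def audit_event(session_id: str, action: str, path: str) -> str:
    """Build one JSON log line for a file action and flag policy violations."""
    violation = any(fnmatch.fnmatch(path, p) for p in SENSITIVE_PATTERNS)
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "action": action,  # e.g. "file_read" or "file_write"
        "path": path,
        "sensitive": violation,
        "severity": "warning" if violation else "info",
    }
    return json.dumps(event)


print(audit_event("s-123", "file_write", "src/retry.py"))
print(audit_event("s-123", "file_read", "secrets/prod.yaml"))
```

Emitting one JSON object per line keeps the trail easy to filter later, for example by session_id or by sensitive = true.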

Two Halves of the Loop

Observe the agent, then give it production context: observing agents plus empowering agents

“Observe the agent. Then give it production context.” Lewis presented the full vision as two halves:

Half one, observing agents (visibility into what they are doing on your behalf): Sessions, Tokens, Cost, Tool calls, File changes, PRs, CI Impact, Policy events

Half two, empowering agents (governed access to the same production data engineers use): Logs, Metrics, Traces, RUM, Alerts, Dashboards, Incidents, SLOs

“Observe the agent, then give it governed access to production context.”

Schema Governance with OTel Weaver

Consistent telemetry is an enabling constraint: OTel Weaver schema gate normalizes attribute drift

“Consistent telemetry is an enabling constraint.” The problem: without governance, every agent uses different attribute names:

  • Claude Code: gen_ai.usage.input_tokens / gen_ai.usage.output_tokens
  • Gemini CLI: llm.tokens_in / input_tokens / llm.tokens_out / output_tokens
  • Codex CLI: prompt_tokens / input_tokens / completion_tokens / output_tokens

The solution: an OTel Weaver Schema gate that validates each attribute against the gen_ai semantic conventions before it ever reaches a dashboard, normalizing to a single form: gen_ai.system = "claude" | "google" | "openai" and gen_ai.request.model = "claude-3.5-sonnet".
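The normalization step can be illustrated in plain Python. The attribute names below are the ones listed on the slide; the mapping table and the normalize function are an illustration of the idea only and do not represent OTel Weaver's actual configuration format or behavior.

```python
# Variant attribute names from the slide, mapped to the gen_ai convention.
# This table is an illustration, not OTel Weaver's configuration format.
CANONICAL = {
    "gen_ai.usage.input_tokens": "gen_ai.usage.input_tokens",
    "llm.tokens_in": "gen_ai.usage.input_tokens",
    "input_tokens": "gen_ai.usage.input_tokens",
    "prompt_tokens": "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens": "gen_ai.usage.output_tokens",
    "llm.tokens_out": "gen_ai.usage.output_tokens",
    "output_tokens": "gen_ai.usage.output_tokens",
    "completion_tokens": "gen_ai.usage.output_tokens",
}


def normalize(attributes: dict) -> dict:
    """Rename known token attributes, pass others through, reject conflicts."""
    normalized = {}
    for key, value in attributes.items():
        target = CANONICAL.get(key, key)
        if target in normalized and normalized[target] != value:
            raise ValueError(f"conflicting values for {target}")
        normalized[target] = value
    return normalized


raw = {"prompt_tokens": 120, "completion_tokens": 30, "gen_ai.system": "openai"}
print(normalize(raw))
```

Rejecting conflicting duplicates, rather than silently keeping one, is a design choice in this sketch: a gate is more useful when drift is surfaced than when it is papered over.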

Packed room watching Lewis Isaac present the 5 questions of agent observability

The key problems he identified:

  • No visibility into token consumption and cost, so teams cannot budget or forecast AI spend
  • No tracking of which files are being touched by AI agents, which leaves security and compliance gaps
  • No measurement of how long tasks actually take, which makes it impossible to prove ROI
  • No detection of whether agent activity is introducing regressions into CI/CD pipelines

As organizations scale AI-assisted development from individual experiments to team-wide adoption, observability of the AI layer itself becomes critical infrastructure.

Panel Discussion

The evening concluded with a panel featuring:

  • Lewis Isaac (Coralogix): AI observability and cost visibility
  • Iris Dyrmishi (Miro): practical AI-powered observability workflows
  • Wilco Burggraaf (Hightech Innovators): innovation and engineering leadership
  • Ehsan Khodadadi (ING): enterprise-scale observability challenges

The discussion explored the intersection of traditional SRE practices with the new reality of AI agents in production, from monitoring LLM-powered features to ensuring AI-assisted code does not silently degrade system reliability.

CNCF Merge Forward

CNCF Merge Forward announcement: Building a stronger open source future together

The event also featured a community announcement about CNCF Merge Forward, a new initiative building a stronger open source future together. The community can join via #merge-forward on Slack and at community.cncf.io/merge-forward.

Networking

Luca Berton with Wilco Burggraaf at the Miro Amsterdam office during the SRE NL meetup, coffee bar and catering area in background

Great catching up with Wilco Burggraaf. The event wrapped up with drinks, pizza, and conversations at Miro’s impressive Amsterdam office.

Key Takeaways

  1. Observability is fundamentally human: AI helps navigate signals faster, but understanding still requires people
  2. Start small with MCP + LLM + Skills: you do not need to replace your stack, just augment it
  3. Custom skills beat generic AI: encoding organizational knowledge (PromQL conventions, PR workflows, internal guidelines) is where real value lives
  4. AI agents need observability too: token costs, file access patterns, and CI/CD regression detection are the new observability requirements
  5. The SRE role is evolving: from monitoring systems to monitoring both systems AND the AI that helps you monitor systems

Event Details

  • Event: Who’s Watching the Agents?
  • Organizers: SRE NL + Coralogix
  • Venue: Miro Amsterdam office
  • Date: May 21, 2026
  • Sponsors: Coralogix, NVIDIA, Palo Alto Networks, monday.com, Imperva, Trademarks, Adobe, Davos
