Kubernetes Audit Logging: Complete Setup and Analysis Guide

Why Audit Logging Matters

Kubernetes audit logs answer four questions for every API request:

Who made the request (user, service account, or anonymous)
What they did (verb: get, create, patch, delete)
When it happened (timestamp with microsecond precision)
On what resource (pods, secrets, configmaps, etc.)
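
As a rough illustration of how those four answers map onto the fields of a single audit event, the sketch below parses one JSON event in Python. The sample event is hand-written for illustration (the user, namespace, and secret name are made up); the field names follow the `audit.k8s.io/v1` Event schema.

```python
import json

# Hand-written sample event with illustrative values, not real cluster output.
SAMPLE_EVENT = """
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "get",
  "user": {"username": "system:serviceaccount:ci:deployer"},
  "objectRef": {"resource": "secrets", "namespace": "prod", "name": "db-password"},
  "requestReceivedTimestamp": "2024-05-01T10:15:30.123456Z"
}
"""


def summarize(event: dict) -> dict:
    """Extract who / what / when / on-what from one audit event."""
    ref = event.get("objectRef") or {}
    parts = (ref.get("resource"), ref.get("namespace"), ref.get("name"))
    return {
        "who": event.get("user", {}).get("username", "unknown"),
        "what": event.get("verb"),
        "when": event.get("requestReceivedTimestamp"),
        # Cluster-scoped or list requests may lack a namespace or name.
        "on_what": "/".join(part for part in parts if part),
    }


summary = summarize(json.loads(SAMPLE_EVENT))
print(summary)
```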

Without audit logging, you cannot detect:

Unauthorized secret access
Privilege escalation attempts
Accidental or malicious resource deletion
Compliance violations

Audit Policy Structure

The audit policy defines what to log and at what detail level:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Don't log read-only requests to certain endpoints
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services", "services/status"]

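  # Log secret and configmap access at Metadata level (who accessed, not the content)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
      # tokenreviews live in the authentication.k8s.io group, not the core group
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]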

  # Log all write operations with request body
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
      - group: "apps"
      - group: "batch"

  # Log everything else at metadata level
  - level: Metadata
    omitStages:
      - RequestReceived
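
Rule order matters: the API server checks rules top to bottom and the first matching rule decides the level. The sketch below is a deliberately simplified Python model of that behaviour, not the API server's implementation. The rule list is a hand-copied, trimmed version of the policy above, and the function names are hypothetical. It ignores userGroups, namespaces, nonResourceURLs, and wildcards.

```python
# Simplified model of audit rule matching: first matching rule wins.
POLICY = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services", "services/status"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]}]},
    {"level": "Request", "verbs": ["create", "update", "patch", "delete"],
     "resources": [{"group": ""}, {"group": "apps"}, {"group": "batch"}]},
    {"level": "Metadata"},  # catch-all
]


def rule_matches(rule, user, verb, group, resource):
    """Return True if every constraint present on the rule is satisfied."""
    if "users" in rule and user not in rule["users"]:
        return False
    if "verbs" in rule and verb not in rule["verbs"]:
        return False
    if "resources" in rule:
        # A group entry without a "resources" list covers every resource in that group.
        return any(
            entry["group"] == group
            and ("resources" not in entry or resource in entry["resources"])
            for entry in rule["resources"]
        )
    return True


def audit_level(user, verb, group, resource):
    """Return the level of the first matching rule."""
    for rule in POLICY:
        if rule_matches(rule, user, verb, group, resource):
            return rule["level"]
    return "None"  # no rule matched: the event is not logged


print(audit_level("alice", "delete", "", "secrets"))
print(audit_level("alice", "create", "apps", "deployments"))
```

In this model, deleting a secret resolves to Metadata rather than Request, because the secrets rule sits above the write-operations rule. Creating a Deployment resolves to Request.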

Audit Levels

Level	What Is Logged
`None`	Nothing; events matching the rule are not logged
`Metadata`	Request metadata (user, timestamp, resource, verb), without request or response bodies
`Request`	Metadata + request body (no response body)
`RequestResponse`	Metadata + request body + response body

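Warning: At the Request or RequestResponse level, secret values end up in the logged bodies (request bodies on create and update, response bodies on reads). Use Metadata for secrets and other sensitive resources.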
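
One way to catch this mistake before a policy ships is a small lint over the rule list. The sketch below is a hypothetical helper, not part of any Kubernetes tooling. It takes rules as plain Python dicts (for example, loaded from the policy YAML with a parser of your choice) and flags rules that explicitly name a sensitive core-group resource at a body-logging level. It does not account for rule order or for group-wide rules that omit the resources list, so treat its output as a starting point for review.

```python
BODY_LEVELS = {"Request", "RequestResponse"}
SENSITIVE = {"secrets", "configmaps"}  # assumption: adjust to your own list


def risky_rules(rules):
    """Return (rule index, level, resources) for rules that log bodies of sensitive resources."""
    findings = []
    for index, rule in enumerate(rules):
        if rule.get("level") not in BODY_LEVELS:
            continue
        for entry in rule.get("resources", []):
            hit = SENSITIVE & set(entry.get("resources", []))
            if entry.get("group") == "" and hit:
                findings.append((index, rule["level"], sorted(hit)))
    return findings


example_rules = [
    {"level": "RequestResponse",
     "resources": [{"group": "", "resources": ["secrets", "pods"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets"]}]},
]
print(risky_rules(example_rules))
```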

Enabling Audit Logging

Log Backend (File)

# kube-apiserver manifest
spec:
  containers:
    - command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit.log
        - --audit-log-maxage=30
        - --audit-log-maxbackup=10
        - --audit-log-maxsize=100
      volumeMounts:
        - name: audit-policy
          mountPath: /etc/kubernetes/audit-policy.yaml
          readOnly: true
        - name: audit-logs
          mountPath: /var/log/kubernetes
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-logs
      hostPath:
        path: /var/log/kubernetes
        type: DirectoryOrCreate

Webhook Backend (Real-time)

Send audit events to an external service:

apiVersion: v1
kind: Config
clusters:
  - name: audit-webhook
    cluster:
      server: https://audit-collector.internal:9443/audit
      certificate-authority: /etc/kubernetes/pki/webhook-ca.crt
contexts:
  - name: default
    context:
      cluster: audit-webhook
current-context: default

# kube-apiserver flags
- --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
- --audit-webhook-batch-max-size=10
- --audit-webhook-batch-max-wait=5s
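
On the receiving side, the webhook backend POSTs batches of events as a JSON `EventList`. The following is a minimal sketch of a collector using only the Python standard library. The handler class, the `serve` helper, and port 8080 are assumptions for local experimentation. It speaks plain HTTP, whereas the config above points at an HTTPS endpoint, so a real collector would need TLS in front of it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_events(body: bytes) -> list:
    """Parse one webhook POST body (an audit.k8s.io/v1 EventList) into a list of events."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("kind") != "EventList":
        raise ValueError("expected an EventList")
    return payload.get("items", [])


class AuditHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = extract_events(self.rfile.read(length))
        except ValueError:  # also covers json.JSONDecodeError
            self.send_response(400)
            self.end_headers()
            return
        for event in events:
            user = event.get("user", {}).get("username")
            print(event.get("auditID"), event.get("verb"), user)
        self.send_response(200)
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the collector until interrupted (local testing only, no TLS)."""
    HTTPServer((host, port), AuditHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the parsing in `extract_events` makes it testable without opening a socket.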

Production Audit Policy

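The following policy balances security visibility against log volume. Rules are evaluated top to bottom and the first match wins, so the None rules for noisy system traffic come first: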

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Skip noisy read-only system requests
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["nodes", "nodes/status"]

  - level: None
    users:
      - "system:kube-scheduler"
      - "system:kube-controller-manager"
    verbs: ["get", "list", "watch"]

  # Skip health checks and metrics
  - level: None
    nonResourceURLs:
      - "/healthz*"
      - "/readyz*"
      - "/livez*"
      - "/metrics"

  # CRITICAL: Log all secret access
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]

  # Log all RBAC requests (no verbs filter, so reads too) with request and response bodies
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]

  # Log authentication events
  - level: Metadata
    resources:
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"

  # Log all deletions with request body
  - level: Request
    verbs: ["delete", "deletecollection"]

  # Log pod exec and port-forward (high-risk)
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward", "pods/attach"]

  # Log workload mutations
  - level: Request
    verbs: ["create", "update", "patch"]
    resources:
      - group: "apps"
        resources: ["deployments", "daemonsets", "statefulsets"]
      - group: "batch"
        resources: ["jobs", "cronjobs"]

  # Default: metadata only
  - level: Metadata
    omitStages:
      - RequestReceived

Analyzing Audit Logs

Common Queries with jq

# Who accessed secrets in the last hour?
# Audit timestamps carry fractional seconds, which fromdateiso8601 rejects, so strip them first.
cat /var/log/kubernetes/audit.log | \
  jq -r 'select(.objectRef.resource == "secrets") |
    select((.requestReceivedTimestamp | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601) > (now - 3600)) |
    [.requestReceivedTimestamp, .user.username, .verb, .objectRef.namespace + "/" + .objectRef.name] |
    @tsv'

# Failed authentication (401) and authorization (403) attempts
cat /var/log/kubernetes/audit.log | \
  jq -r 'select(.responseStatus.code == 401 or .responseStatus.code == 403) |
    [.requestReceivedTimestamp, .user.username, .verb, .responseStatus.code, .responseStatus.reason] |
    @tsv'

# All pod exec sessions (potential breakglass); the command appears in the requestURI query string
cat /var/log/kubernetes/audit.log | \
  jq -r 'select(.objectRef.subresource == "exec") |
    [.requestReceivedTimestamp, .user.username, .objectRef.namespace + "/" + .objectRef.name, .requestURI] |
    @tsv'

# RBAC changes (privilege escalation detection)
cat /var/log/kubernetes/audit.log | \
  jq -r 'select(.objectRef.apiGroup == "rbac.authorization.k8s.io" and
    (.verb == "create" or .verb == "update" or .verb == "patch")) |
    [.requestReceivedTimestamp, .user.username, .verb, .objectRef.resource, .objectRef.name] |
    @tsv'
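
For analysis that outgrows one-liners, the same filtering can be done in a short script. The sketch below is one possible Python equivalent of the secrets query, assuming the file backend's one-JSON-event-per-line format. The function names are hypothetical. It counts only `ResponseComplete` events so that each request is counted once rather than once per stage.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an audit timestamp such as 2024-05-01T10:15:30.123456Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def secret_access_by_user(lines, since: datetime) -> Counter:
    """Count completed requests against secrets per user, newer than `since`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        ref = event.get("objectRef") or {}
        if ref.get("resource") != "secrets":
            continue
        if event.get("stage") != "ResponseComplete":
            continue
        ts = event.get("requestReceivedTimestamp")
        if not ts or parse_ts(ts) < since:
            continue
        counts[event.get("user", {}).get("username", "unknown")] += 1
    return counts


def report(path: str = "/var/log/kubernetes/audit.log", hours: int = 1) -> None:
    """Print secret access counts for the last `hours` hours, busiest users first."""
    since = datetime.now(timezone.utc) - timedelta(hours=hours)
    with open(path, encoding="utf-8") as log:
        for user, count in secret_access_by_user(log, since).most_common():
            print(f"{count}\t{user}")
```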

Shipping to EFK/Loki

# Fluent Bit config for audit logs
[INPUT]
    Name              tail
    Path              /var/log/kubernetes/audit.log
    Parser            json
    Tag               kube.audit.*
    Refresh_Interval  5

[FILTER]
    Name    modify
    Match   kube.audit.*
    Add     log_type kubernetes_audit

[OUTPUT]
    Name            es
    Match           kube.audit.*
    Host            elasticsearch.logging
    Port            9200
    Index           kube-audit
    Type            _doc

Security Alerting Rules

Prometheus Alerting Rules (via audit-exporter)

# Metric and label names depend on the audit exporter in use.
# kube_audit_event_total is a counter, so alert on its recent increase rather than its
# absolute value; otherwise an alert would keep firing after the first matching event.
groups:
  - name: kubernetes-audit-security
    rules:
      - alert: SecretAccessByUnknownUser
        expr: |
          sum by (user) (
            increase(kube_audit_event_total{
              resource="secrets",
              verb=~"get|list|watch",
              user!~"system:.*|admin"
            }[5m])
          ) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Unknown user {{ $labels.user }} accessed secrets"

      - alert: ClusterRoleBindingCreated
        expr: |
          increase(kube_audit_event_total{
            resource="clusterrolebindings",
            verb="create"
          }[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "New ClusterRoleBinding created: potential privilege escalation"

      - alert: PodExecDetected
        expr: |
          increase(kube_audit_event_total{
            subresource="exec"
          }[5m]) > 0
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "Pod exec by {{ $labels.user }} in {{ $labels.namespace }}"

Managed Kubernetes Audit Logging

EKS

# Enable via eksctl
eksctl utils update-cluster-logging \
  --cluster my-cluster \
  --enable-types audit \
  --approve

# Logs go to CloudWatch Logs group: /aws/eks/my-cluster/cluster
# Query via CloudWatch Insights:
fields @timestamp, user.username, verb, objectRef.resource, objectRef.name
| filter objectRef.resource = "secrets"
| sort @timestamp desc
| limit 50

GKE

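# Admin Activity logs are always on (free)
# Data Access logs (secret reads) must be enabled:
gcloud projects get-iam-policy PROJECT_ID --format=json > policy.json
# Add a "DATA_READ" auditLogConfigs entry for the "k8s.io" service under auditConfigs, then apply it:
gcloud projects set-iam-policy PROJECT_ID policy.json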

AKS

# Enable diagnostic settings
az monitor diagnostic-settings create \
  --resource /subscriptions/.../managedClusters/my-cluster \
  --name audit-logs \
  --logs '[{"category":"kube-audit","enabled":true}]' \
  --workspace /subscriptions/.../workspaces/my-workspace

Storage and Retention

Audit logs grow fast. Actual volume depends on the policy and on API traffic, but as rough planning figures:

Cluster Size	Daily Volume	30-Day Retention
Small (under 50 nodes)	1-5 GB/day	30-150 GB
Medium (50-200 nodes)	5-20 GB/day	150-600 GB
Large (200+ nodes)	20-100+ GB/day	600 GB - 3 TB

Cost optimization:

Use None level aggressively for known-safe traffic
Set omitStages: ["RequestReceived"] at the top level of the policy so it applies to every rule, not only the catch-all; this roughly halves the event count
Archive to cold storage after 7 days, keep hot for alerting
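
The hot/cold split is simple arithmetic, sketched below in Python. The function name and the 7-day hot window are illustrative. The sketch assumes a constant daily volume and ignores compression, which in practice would shrink the cold tier.

```python
def retention_gb(daily_gb: float, hot_days: int, total_days: int) -> tuple:
    """Return (hot_gb, cold_gb) retained for a constant daily audit log volume."""
    if not 0 <= hot_days <= total_days:
        raise ValueError("hot_days must be between 0 and total_days")
    hot = daily_gb * hot_days
    cold = daily_gb * (total_days - hot_days)
    return hot, cold


# Upper end of the medium row: 20 GB/day, 7 days hot, 30 days total.
hot, cold = retention_gb(20, hot_days=7, total_days=30)
print(f"hot: {hot} GB, cold: {cold} GB, total: {hot + cold} GB")
```

For that input, 140 GB stays hot for alerting and 460 GB moves to cold storage, which adds up to the 600 GB in the table.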

Deepen Your Kubernetes Skills

If you found this article useful, check out my books for hands-on Kubernetes mastery:

Kubernetes Recipes — A practical guide for container orchestration and deployment with real-world patterns
Ansible for Kubernetes by Example — Automate Kubernetes cluster operations with Ansible playbooks

Both books follow the same practical, example-driven approach you see in my articles.
