Why Agent Security Is Different
Traditional application security assumes software does what the developer programmed. AI agents donβt β they make autonomous decisions, invoke tools, and generate code at runtime. This fundamentally changes your threat model.
After securing several enterprise agent deployments, here are the patterns that work.
The Agent Threat Model
AI agents introduce unique attack vectors:
- Prompt injection: Malicious input tricks the agent into unauthorized actions
- Tool abuse: The agent uses legitimate tools in unintended ways
- Data exfiltration: The agent accesses sensitive data and leaks it through outputs
- Privilege escalation: The agent chains tools to gain access beyond its intended scope
- Resource exhaustion: Runaway agents consume unbounded compute or API credits
Sandboxing Architecture
Container-Level Isolation
Every agent tool execution should run in an isolated container:
apiVersion: v1
kind: Pod
metadata:
name: agent-sandbox
annotations:
container.apparmor.security.beta.kubernetes.io/sandbox: runtime/default
spec:
securityContext:
runAsNonRoot: true
runAsUser: 65534
fsGroup: 65534
seccompProfile:
type: RuntimeDefault
containers:
- name: sandbox
image: registry.internal/agent-sandbox:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
limits:
memory: "256Mi"
cpu: "500m"
ephemeral-storage: "100Mi"
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir:
sizeLimit: 50MiNetwork Isolation
Restrict what the sandbox can reach:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: agent-sandbox-policy
spec:
podSelector:
matchLabels:
role: agent-sandbox
policyTypes:
- Egress
- Ingress
egress:
- to:
- namespaceSelector:
matchLabels:
name: agent-tools
ports:
- port: 443
protocol: TCP
ingress: [] # No inbound trafficPermission Models
Least Privilege Tool Access
Define explicit tool permissions per agent role:
AGENT_PERMISSIONS = {
"infrastructure_agent": {
"allowed_tools": ["kubectl_get", "ansible_check", "prometheus_query"],
"denied_tools": ["kubectl_delete", "ansible_run", "shell_exec"],
"resource_limits": {
"max_api_calls": 50,
"max_tokens": 100000,
"timeout_seconds": 300,
},
"data_access": {
"namespaces": ["monitoring", "app-staging"],
"secrets": False,
"configmaps": True,
}
},
"data_agent": {
"allowed_tools": ["sql_read", "api_get"],
"denied_tools": ["sql_write", "sql_drop", "api_post"],
"resource_limits": {
"max_api_calls": 100,
"max_tokens": 50000,
"timeout_seconds": 120,
},
}
}
def enforce_permissions(agent_role: str, tool: str, params: dict) -> bool:
perms = AGENT_PERMISSIONS.get(agent_role)
if not perms:
return False
if tool in perms.get("denied_tools", []):
audit_log.warning(f"Denied: {agent_role} attempted {tool}")
return False
if tool not in perms.get("allowed_tools", []):
audit_log.warning(f"Denied: {agent_role} not authorized for {tool}")
return False
return TrueAction Classification
Classify every tool action by risk level:
class RiskLevel(Enum):
READ = "read" # No side effects
MODIFY = "modify" # Reversible changes
DELETE = "delete" # Destructive
EXTERNAL = "external" # Leaves the system boundary
TOOL_RISK = {
"kubectl_get": RiskLevel.READ,
"kubectl_apply": RiskLevel.MODIFY,
"kubectl_delete": RiskLevel.DELETE,
"send_email": RiskLevel.EXTERNAL,
"ansible_check": RiskLevel.READ,
"ansible_run": RiskLevel.MODIFY,
}
async def execute_with_approval(tool: str, params: dict, agent: str):
risk = TOOL_RISK.get(tool, RiskLevel.DELETE) # Default to highest risk
if risk == RiskLevel.READ:
return await execute(tool, params)
elif risk == RiskLevel.MODIFY:
await log_action(agent, tool, params)
return await execute(tool, params)
elif risk in (RiskLevel.DELETE, RiskLevel.EXTERNAL):
approval = await request_human_approval(agent, tool, params)
if approval:
return await execute(tool, params)
raise PermissionDenied(f"Human denied {tool}")Audit Logging
Log every agent decision and action immutably:
@dataclass
class AgentAuditEvent:
timestamp: datetime
agent_id: str
session_id: str
event_type: str # "decision", "tool_call", "tool_result", "error"
tool: str | None
params: dict | None
result: str | None
tokens_used: int
risk_level: str
approval_status: str | None
async def audit_log(event: AgentAuditEvent):
# Write to immutable append-only log
await audit_store.append(event)
# Real-time alerting for high-risk actions
if event.risk_level in ("delete", "external"):
await alert_security_team(event)Prompt Injection Defense
Input Sanitization
def sanitize_agent_input(user_input: str) -> str:
# Remove common injection patterns
dangerous_patterns = [
r"ignore previous instructions",
r"you are now",
r"system prompt",
r"</?(system|assistant|user)>",
r"ADMIN MODE",
]
sanitized = user_input
for pattern in dangerous_patterns:
if re.search(pattern, sanitized, re.IGNORECASE):
audit_log.warning(f"Potential injection detected: {pattern}")
sanitized = re.sub(pattern, "[FILTERED]", sanitized, flags=re.IGNORECASE)
return sanitizedOutput Validation
Always validate agent outputs before executing:
async def validate_tool_call(tool: str, params: dict) -> bool:
# Check params against schema
schema = TOOL_SCHEMAS[tool]
try:
schema.validate(params)
except ValidationError:
return False
# Check for suspicious patterns
for value in flatten_dict(params).values():
if isinstance(value, str):
if any(cmd in value for cmd in ["rm -rf", "DROP TABLE", "; curl"]):
audit_log.critical(f"Suspicious param value: {value}")
return False
return TrueKey Takeaways
- Assume the agent will misbehave β design for containment, not prevention
- Sandbox everything β separate containers, network policies, resource limits
- Least privilege always β explicit allow-lists, never deny-lists
- Log immutably β every decision, every action, every result
- Classify risk levels β automated for reads, human approval for destructive actions
- Defense in depth β input sanitization + output validation + runtime monitoring
Security isnβt optional for agent deployments. Itβs the foundation everything else builds on.
Deploying AI agents in a security-sensitive environment? I help organizations build secure agent architectures. Letβs talk.
