Agentic Runbooks for Kubernetes: Tools and Patterns (2026)
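Kubernetes is the most common target for agentic runbooks in 2026. High incident frequency, well-known failure modes, and strong RBAC primitives make it ideal for agent automation. This guide covers the incident patterns teams automate most often, the K8s-native vendors, and how to scope an agent's access with RBAC.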
Why Kubernetes is the primary agentic runbook target
Three properties make Kubernetes ideal for agentic automation. First, K8s failure modes are highly recognisable: CrashLoopBackOff, OOMKilled, ImagePullBackOff, and Pending pods are distinct states surfaced in pod status and events, so an agent can learn to identify them reliably. Second, Kubernetes exposes a well-structured API for both reads (the data behind kubectl describe and kubectl logs) and writes (rollout restart, scale), which makes it straightforward to define tool_scope boundaries. Third, RBAC is built in: you can scope an agent's kubectl access to a specific namespace and a specific set of verbs, which gives you a strong security boundary.
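A minimal sketch of a tool_scope boundary, assuming three outcomes: read verbs run automatically, write verbs wait for a human, and everything else (including any other namespace) is refused. ToolScope, Decision, and authorize are hypothetical names for illustration, not part of any vendor's product.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolScope:
    """Hypothetical tool_scope: one namespace, reads run freely, writes need a human."""
    namespace: str
    read_verbs: frozenset = frozenset({"get", "list", "watch"})
    write_verbs: frozenset = frozenset({"patch"})

    def authorize(self, verb: str, namespace: str) -> Decision:
        # Anything outside the agent's namespace is refused outright.
        if namespace != self.namespace:
            return Decision.DENY
        if verb in self.read_verbs:
            return Decision.AUTO_APPROVE
        if verb in self.write_verbs:
            return Decision.NEEDS_APPROVAL
        # Unlisted verbs (delete, deletecollection, ...) are never allowed.
        return Decision.DENY


scope = ToolScope(namespace="production")
print(scope.authorize("get", "production").value)     # auto_approve
print(scope.authorize("patch", "production").value)   # needs_approval
print(scope.authorize("delete", "production").value)  # deny
print(scope.authorize("get", "kube-system").value)    # deny
```

Defaulting to DENY keeps the scope an allowlist: a verb the agent was never meant to use fails closed rather than open.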
The 10 most commonly automated K8s incident patterns
CrashLoopBackOff pod
OOMKilled pod
ImagePullBackOff
Pending pod (scheduling failure)
HPA scaling anomaly
Stuck deployment rollout
Node NotReady
Certificate expiry
PVC full
Service mesh connectivity loss
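Four of the patterns above are pod-level states that can be read directly from the pod object. A minimal sketch of detecting them, assuming the agent already has the JSON from kubectl get pod NAME -o json; classify_pod and its return labels are hypothetical, and a real agent would also consult events and logs. The remaining patterns need node, HPA, certificate, volume, or mesh data and are not covered here.

```python
from typing import Optional


def classify_pod(pod: dict) -> Optional[str]:
    """Map a pod object to one of the pod-level patterns, or None if nothing matches."""
    status = pod.get("status", {})
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        terminated = {
            cs.get("state", {}).get("terminated", {}).get("reason"),
            cs.get("lastState", {}).get("terminated", {}).get("reason"),
        }
        # A container that is OOM-killed repeatedly also shows CrashLoopBackOff,
        # so report the more specific root cause first.
        if "OOMKilled" in terminated:
            return "OOMKilled"
        if waiting in ("ImagePullBackOff", "ErrImagePull"):
            return "ImagePullBackOff"
        if waiting == "CrashLoopBackOff":
            return "CrashLoopBackOff"
    # Unschedulable pods stay in phase Pending with PodScheduled=False.
    if status.get("phase") == "Pending":
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                return "Pending (scheduling failure)"
    return None


crashloop = {"status": {"phase": "Running", "containerStatuses": [{
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "Error"}},
}]}}
oom = {"status": {"phase": "Running", "containerStatuses": [{
    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    "lastState": {"terminated": {"reason": "OOMKilled"}},
}]}}
print(classify_pod(crashloop))  # CrashLoopBackOff
print(classify_pod(oom))        # OOMKilled
```

Checking OOMKilled before CrashLoopBackOff matters: the back-off is the symptom, while the memory limit is the thing a runbook would actually change.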
Tool deep-dive: K8s-native agentic runbook vendors
Komodor Klaudia
Kubernetes specialist, claimed 95% accuracy
According to Komodor, Klaudia is trained on thousands of production Kubernetes environments. It models the relationships between K8s objects: it knows that a failing deployment affects a service, which affects an ingress, which affects user traffic. This topology knowledge is its primary differentiator. The 95% accuracy claim applies to Kubernetes failure patterns specifically.
Shoreline.io
120+ pre-built K8s notebooks, claimed 75% MTTR reduction
Shoreline Notebooks are interactive runbooks that can be automated. The 120+ pre-built notebooks covering common K8s incidents are the primary value proposition: teams can start automating without writing runbooks from scratch. Shoreline's DSL, the Shoreline Language (Op-spec), is purpose-built for incident remediation.
Kubiya
Meta-agent orchestration, deterministic execution
Kubiya's architecture treats K8s, Terraform, and CI/CD as first-class tool domains. The meta-agent pattern (one orchestrator, multiple specialist agents) suits platform engineering teams managing complex, multi-tool environments. The deterministic execution guarantee matters for compliance: every action is a structured tool call that can be validated and audited, not free-form LLM output.
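A minimal sketch of the meta-agent pattern with deterministic execution, assuming the LLM's only job is to emit a structured ToolCall and that each specialist holds an allowlist of deterministic handlers. ToolCall, SpecialistAgent, and Orchestrator are hypothetical names, not Kubiya's API, and the handlers are dry-run stubs that return a command string instead of running it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    """A structured tool call: the only output the orchestrator accepts from the LLM."""
    domain: str  # which specialist handles it, e.g. "k8s" or "terraform"
    action: str  # must appear on that specialist's allowlist
    target: str


class SpecialistAgent:
    def __init__(self, domain: str, actions: Dict[str, Callable[[str], str]]):
        self.domain = domain
        self.actions = actions  # allowlist: action name -> deterministic handler

    def execute(self, call: ToolCall) -> str:
        handler = self.actions.get(call.action)
        if handler is None:
            raise PermissionError(f"{self.domain}: action {call.action!r} is not allowed")
        return handler(call.target)


class Orchestrator:
    """Routes each structured call to exactly one specialist; never runs free-form text."""

    def __init__(self) -> None:
        self.specialists: Dict[str, SpecialistAgent] = {}

    def register(self, agent: SpecialistAgent) -> None:
        self.specialists[agent.domain] = agent

    def dispatch(self, call: ToolCall) -> str:
        agent = self.specialists.get(call.domain)
        if agent is None:
            raise LookupError(f"no specialist registered for {call.domain!r}")
        return agent.execute(call)


# Dry-run handler: returns the command instead of executing it.
orchestrator = Orchestrator()
orchestrator.register(SpecialistAgent("k8s", {
    "rollout_restart": lambda name: f"kubectl -n production rollout restart deployment/{name}",
}))
print(orchestrator.dispatch(ToolCall("k8s", "rollout_restart", "checkout")))
# kubectl -n production rollout restart deployment/checkout
```

Because an unknown domain or action raises instead of falling through, every executed step corresponds to a logged, typed ToolCall, which is what makes the execution path auditable.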
OpenSRE (Tracer-Cloud)
Open-source AI SRE toolkit
OpenSRE is an open-source reference implementation for Kubernetes AI SRE agents. It provides LangGraph-based patterns for K8s incident agents, vector database integration for retrieving past postmortems, and integrations with Prometheus and PagerDuty. The GitHub repository Tracer-Cloud/opensre is the practical starting point for teams that decide to build rather than buy.
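Postmortem retrieval boils down to nearest-neighbour search: embed the current incident, then rank stored postmortems by similarity. A minimal sketch of that idea, not OpenSRE's code: it swaps the learned embedding model and vector database for a bag-of-words vector and cosine similarity so it runs with the standard library alone. embed, cosine, top_k, and the sample postmortems are illustrative.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


postmortems = [
    "checkout pods OOMKilled after memory limit lowered; raised limit to 512Mi",
    "ImagePullBackOff caused by expired registry credentials",
    "payments CrashLoopBackOff from missing DATABASE_URL secret",
]
query = "pod stuck in CrashLoopBackOff after secret rotation"
for score, doc in top_k(query, postmortems, k=1):
    print(doc)  # payments CrashLoopBackOff from missing DATABASE_URL secret
```

In a real deployment the embeddings would come from a model and the ranking from a vector database index; the retrieved postmortems are then added to the agent's context before it proposes a fix.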
RBAC and security: scoping an agent's kubectl access
An agent with unrestricted kubectl access (for example, one bound to cluster-admin) can delete any resource in any namespace. This is never the right configuration. Scope agent access to the minimum required permissions using Kubernetes RBAC.
```yaml
# Minimal RBAC for a CrashLoop remediation agent
# Scoped to the 'production' namespace only
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sre-agent
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sre-agent-role
  namespace: production # Namespace-scoped, not ClusterRole
rules:
  # Read-only: safe to auto_approve
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
  # Write: RBAC cannot require approval itself, so gate this verb
  # behind human approval in the agent's tool_scope
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["patch"] # kubectl rollout restart uses patch, not delete
  # Never grant: delete or deletecollection on any resource
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sre-agent-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: sre-agent
    namespace: production
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: sre-agent-role
```

The full security threat model, including prompt injection into K8s object names and audit trail tamper protection, is at /security-considerations.
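To reason about (or unit-test) what the sre-agent-role permits, it helps to remember that RBAC rules are purely additive: a request is allowed if any single rule matches it, and there are no deny rules. A minimal sketch, assuming the rules are copied into Python by hand; is_allowed and rollout_restart_patch are hypothetical helpers, and the API server remains the real enforcer. The patch body approximates what kubectl rollout restart sends: it sets the kubectl.kubernetes.io/restartedAt pod-template annotation, which changes the template and triggers a rollout without deleting anything.

```python
from datetime import datetime, timezone

# Python mirror of the sre-agent-role rules ("" is the core API group).
ROLE_RULES = [
    {"apiGroups": [""], "resources": ["pods", "pods/log", "events"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments", "replicasets"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["patch"]},
]


def is_allowed(api_group: str, resource: str, verb: str, rules=ROLE_RULES) -> bool:
    """Allowed if any one rule matches. Ignores "*" wildcards and resourceNames,
    which this Role does not use."""
    return any(
        api_group in r["apiGroups"] and resource in r["resources"] and verb in r["verbs"]
        for r in rules
    )


def rollout_restart_patch(now: datetime) -> dict:
    """Patch body in the spirit of `kubectl rollout restart`: bumping a pod-template
    annotation makes the Deployment replace its pods through a normal rollout."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"spec": {"template": {"metadata": {"annotations": {
        "kubectl.kubernetes.io/restartedAt": stamp,
    }}}}}


print(is_allowed("apps", "deployments", "patch"))  # True
print(is_allowed("", "pods", "delete"))            # False
print(is_allowed("apps", "replicasets", "patch"))  # False
print(rollout_restart_patch(datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

With the official kubernetes Python client, a body like this could be passed to AppsV1Api.patch_namespaced_deployment(name, namespace, body) while authenticated as the sre-agent ServiceAccount; a delete attempted with the same credentials would be rejected by the API server with 403 Forbidden.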