GitOps for 100+ Clusters
GitOps at small scale is straightforward. At large scale (hundreds of clusters, thousands of applications, multiple teams) it requires careful architecture. Here's what works.
ArgoCD vs Flux: Quick Decision
| Feature | ArgoCD | Flux |
|---|---|---|
| UI | Rich web UI | CLI + Grafana |
| Multi-tenancy | AppProjects + RBAC | Kustomization per tenant |
| Multi-cluster | Centralized hub | Decentralized (per-cluster) |
| Helm support | Native | HelmRelease CRD |
| Notification | Built-in | Notification Controller |
| Architecture | Hub-spoke | Distributed |
| Best for | Teams wanting a UI | Teams wanting simplicity |
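
In the hub model from the table, ArgoCD learns about each managed cluster through a labeled Secret in its own namespace, and those labels are what a cluster selector such as `environment: production` matches against. A minimal sketch of one registration follows; the cluster name, server URL, and `region` value are hypothetical, and the credentials are placeholders that should be delivered through a secrets mechanism rather than committed in plaintext.

```yaml
# Hypothetical cluster registration on the ArgoCD hub
apiVersion: v1
kind: Secret
metadata:
  name: cluster-prod-eu-west-1          # assumed naming scheme
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # marks this Secret as a cluster definition
    environment: production                   # matched by cluster selectors
    region: eu-west-1                         # available to templates as a cluster label
type: Opaque
stringData:
  name: prod-eu-west-1
  server: https://prod-eu-west-1.example.internal:6443
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-CA-certificate>"
      }
    }
```

Adding a cluster to the fleet then becomes a matter of adding one Secret with the right labels.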
ArgoCD at Scale: ApplicationSets
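The ApplicationSet in this section reads one `config.yaml` per app directory from the registry repo. A minimal sketch of such a file is shown here, assuming the keys are nested under a top-level `values` map so that they resolve as `.values.repo`, `.values.branch`, and `.values.namespace`; the app name and all values are hypothetical.

```yaml
# apps/checkout/config.yaml in the app-registry repo (hypothetical entry)
values:
  repo: payments/checkout     # appended to https://gitlab.internal/
  branch: main                # used as targetRevision
  namespace: checkout         # destination namespace on each cluster
```

With this layout, `.path.basename` is the directory name (`checkout`), so onboarding a new service is a one-file pull request against the registry.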
Manage hundreds of apps with a single definition:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  goTemplate: true
  generators:
    # Matrix: every app in the registry combined with every production cluster
    - matrix:
        generators:
          - git:
              repoURL: https://gitlab.internal/platform/app-registry
              revision: HEAD
              files:
                - path: "apps/*/config.yaml"
          - clusters:
              selector:
                matchLabels:
                  environment: production
  template:
    metadata:
      name: '{{.path.basename}}-{{.name}}'
    spec:
      project: default
      source:
        # .values.* assumes each config.yaml nests its keys under a top-level `values` map
        repoURL: 'https://gitlab.internal/{{.values.repo}}'
        targetRevision: '{{.values.branch}}'
        path: deploy/
        helm:
          valueFiles:
            - 'values-{{.metadata.labels.region}}.yaml'
      destination:
        server: '{{.server}}'
        namespace: '{{.values.namespace}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
Flux at Scale: Multi-Tenancy
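The per-tenant Kustomization in this section reconciles as `team-payments-sa`, so the isolation only holds if that account exists and is bound to permissions inside the tenant namespace alone. A minimal sketch of what the platform team might keep under `./tenants` follows; the RoleBinding name and the choice of `cluster-admin` as the bound role are assumptions, and a narrower role may suit stricter environments.

```yaml
# Hypothetical tenant onboarding manifests, applied by the platform Kustomization
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: team-payments-sa
  namespace: team-payments
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding                      # namespaced binding: rights apply only in team-payments
metadata:
  name: team-payments-reconciler       # assumed name
  namespace: team-payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                  # assumption; scoped to the namespace by the RoleBinding
subjects:
  - kind: ServiceAccount
    name: team-payments-sa
    namespace: team-payments
```

Because the binding is a RoleBinding rather than a ClusterRoleBinding, anything the tenant repo tries to create outside `team-payments` is rejected by the API server.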
```yaml
# Platform team: bootstrap
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tenants
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform-config
  path: ./tenants
  prune: true
---
# Per-tenant isolation
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: team-payments
  namespace: team-payments
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: team-payments-repo
  path: ./deploy
  prune: true
  serviceAccountName: team-payments-sa # RBAC isolation
  targetNamespace: team-payments
```
Progressive Delivery
Canary Deployments with Flagger
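```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  progressDeadlineSeconds: 600
  # A complete Canary also needs a `service` block (at least `service.port`);
  # the port depends on the workload.
  analysis:
    interval: 1m      # how often the checks run
    threshold: 5      # failed checks tolerated before rollback
    maxWeight: 50     # ceiling for canary traffic, in percent
    stepWeight: 10    # traffic added after each successful interval
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99     # percent of non-5xx responses
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500    # P99 latency in milliseconds
        interval: 1m
```
Key Patterns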
- Separate app config from app code: config lives in a dedicated repo, updated by CI
- Environment promotion via PRs: dev → staging → prod through Git PRs
- Drift detection alerts: alert when manual `kubectl` changes override Git state
- Sealed Secrets or External Secrets: never store plaintext secrets in Git
- Namespace-per-team: GitOps isolation boundaries match team boundaries
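
As an illustration of the secrets pattern above, here is a minimal sketch using the External Secrets Operator. Git holds only a reference, and the operator materializes the Kubernetes Secret in the cluster. The store name, remote key, and secret name are hypothetical, and the `apiVersion` may differ depending on the operator version installed.

```yaml
# Hypothetical ExternalSecret committed to the tenant repo instead of a plaintext Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: team-payments
spec:
  refreshInterval: 1h              # how often the value is re-read from the backend
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-vault           # assumed store managed by the platform team
  target:
    name: payments-db              # name of the Kubernetes Secret to create
  data:
    - secretKey: password          # key inside the generated Secret
      remoteRef:
        key: payments/db           # assumed path in the external backend
        property: password
```

Sealed Secrets takes the other route: the encrypted payload itself is committed, and only the in-cluster controller can decrypt it.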
