Plattr Team Guide¶
This guide covers deploying, operating, and maintaining Plattr. It assumes familiarity with Kubernetes, AWS, and infrastructure tooling.
Prerequisites¶
- AWS CLI configured with admin-level credentials
- AWS CDK v2 (`npm install -g aws-cdk`)
- kubectl connected to your EKS cluster
- Helm v3
- Node.js 18+ and npm
- Docker (for CDK asset bundling)
Infrastructure Overview¶
```
AWS Account
├── EKS Cluster
│   ├── plattr-system namespace
│   │   ├── Plattr Operator (Deployment)
│   │   ├── Keycloak (StatefulSet, 2 replicas)
│   │   ├── cert-manager
│   │   ├── external-dns
│   │   └── ingress-nginx (NLB)
│   ├── production namespace
│   ├── staging namespace
│   ├── uat namespace
│   └── preview-* namespaces (dynamic)
├── Aurora PostgreSQL (shared, schema-per-app)
├── S3 (bucket-per-app)
├── ECR
│   ├── plattr-operator (operator image)
│   └── plattr-apps (all app images)
├── Secrets Manager
│   └── plattr/db-admin (Aurora admin creds)
└── Route 53 (DNS, managed by external-dns)
```
CDK Deployment¶
The infrastructure is defined in two CDK stacks.
Stack 1: PlattrOperatorStack¶
Deploys the operator, add-ons, CRDs, and environment namespaces.
cd packages/cdk
# Review what will be deployed
npx cdk diff PlattrOperatorStack
# Deploy
npx cdk deploy PlattrOperatorStack
What it creates:
- plattr-system namespace with CRDs (Application, PreviewEnvironment)
- Secrets Manager secret (plattr/db-admin) for Aurora admin credentials
- ECR repository (plattr-operator) with 25-image retention
- IRSA role for the operator ServiceAccount with permissions for S3, Secrets Manager, and STS
- Operator Helm release with configuration (DB secret ARN, host, region, base domain, Keycloak URL)
- Environment namespaces (staging, uat, production) with resource quotas
- Add-ons (all enabled by default):
- cert-manager + Let's Encrypt ClusterIssuer
- external-dns with Route 53
- ingress-nginx with AWS NLB
- Keycloak 26.0 (2 replicas, external PostgreSQL, HTTPS Ingress)
Stack 2: PlattrCicdStack¶
Sets up CI/CD roles for GitHub Actions.
What it creates:
- GitHub OIDC provider for keyless authentication
- ECR repository (plattr-apps) with 50-image retention
- CI Deploy Role — non-prod deployments from any branch (ECR push + EKS describe)
- Prod Deploy Role — production deployments from main branch only
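It is deployed with the same CDK workflow as Stack 1:

```bash
cd packages/cdk

# Review what will be deployed
npx cdk diff PlattrCicdStack

# Deploy
npx cdk deploy PlattrCicdStack
```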
CDK Context Values¶
Configure in cdk.json or via -c flags:
| Key | Description | Example |
|---|---|---|
| `clusterName` | EKS cluster name | `platform-eks` |
| `kubectlRoleArn` | ARN of the kubectl admin role | `arn:aws:iam::role/...` |
| `oidcProviderArn` | EKS OIDC provider ARN | `arn:aws:iam::oidc-provider/...` |
| `dbHost` | Aurora cluster endpoint | `platform-db.cluster-xxx.us-east-1.rds.amazonaws.com` |
| `dbSecretArn` | Secrets Manager ARN for DB admin creds | `arn:aws:secretsmanager:...` |
| `baseDomain` | Plattr base domain | `platform.company.dev` |
| `keycloakDomain` | Keycloak domain | `auth.platform.company.dev` |
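For example, context values can be passed on the command line at deploy time (the values below are the illustrative ones from the table, not real infrastructure):

```bash
# Override cdk.json context with -c flags
npx cdk deploy PlattrOperatorStack \
  -c clusterName=platform-eks \
  -c baseDomain=platform.company.dev \
  -c keycloakDomain=auth.platform.company.dev
```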
Operator Management¶
How the Operator Works¶
The operator watches two Custom Resource types:
- Application — represents a deployed app. The operator reconciles it into database schemas, S3 buckets, Keycloak realms, Deployments, Services, Ingresses, and HPAs.
- PreviewEnvironment — represents a PR preview. The operator creates an isolated namespace with its own database schema, storage, and workload.
Reconciliation is triggered by Kubernetes informers (watch events). The operator uses generation-based tracking to avoid infinite loops from status updates.
Operator Configuration¶
The operator reads configuration from environment variables:
| Variable | Description | Required |
|---|---|---|
| `DB_ADMIN_URL` | PostgreSQL admin connection string (dev mode) | One of these two |
| `DB_SECRET_ARN` | AWS Secrets Manager ARN (prod mode) | One of these two |
| `DB_HOST` | Aurora cluster endpoint | Yes |
| `AWS_REGION` | AWS region | Yes |
| `BASE_DOMAIN` | Plattr base domain | Yes |
| `KEYCLOAK_URL` | Keycloak base URL | If auth enabled |
| `KEYCLOAK_ADMIN_USER` | Keycloak admin username | If auth enabled |
| `KEYCLOAK_ADMIN_PASSWORD` | Keycloak admin password | If auth enabled |
| `LEADER_ELECTION` | Enable leader election (`true`/`false`) | No (default: `false`) |
| `LEASE_NAMESPACE` | Namespace for leader lease | If leader election on |
| `AWS_ENDPOINT_URL` | Override AWS endpoint (for LocalStack) | No |
Viewing Operator Logs¶
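A typical way to tail the operator's logs, using the `app=plattr-operator` label referenced later in this guide:

```bash
# Follow the operator logs
kubectl logs -n plattr-system -l app=plattr-operator -f

# After a restart, include the previous container's logs
kubectl logs -n plattr-system -l app=plattr-operator --previous
```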
Leader Election¶
For high-availability (multiple operator replicas), enable leader election:
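In practice you would set this through the operator's Helm values; for a quick test you can set the environment variables from the table above directly on the Deployment (the Deployment name `plattr-operator` is an assumption):

```bash
# Enable leader election on the running operator (sketch)
kubectl set env deployment/plattr-operator -n plattr-system \
  LEADER_ELECTION=true \
  LEASE_NAMESPACE=plattr-system
```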
The operator creates a Lease resource (plattr-operator-leader). Only the leader processes reconciliation events. Lease duration is 15 seconds with 5-second renewal intervals.
Scaling the Operator¶
The operator runs as a single Deployment. With leader election enabled, you can run multiple replicas for failover:
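For example (a sketch, assuming the Deployment is named `plattr-operator`):

```bash
# Run two operator replicas; only the elected leader reconciles
kubectl scale deployment/plattr-operator -n plattr-system --replicas=2
```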
Only the leader processes events; standbys take over if the leader's lease expires.
CRD Management¶
Installing CRDs¶
kubectl apply -f manifests/crds/application.yaml
kubectl apply -f manifests/crds/preview-environment.yaml
Verifying CRDs¶
kubectl get crd applications.platform.internal
kubectl get crd previewenvironments.platform.internal
Creating an Application¶
```yaml
apiVersion: platform.internal/v1alpha1
kind: Application
metadata:
  name: my-frontend
  namespace: default
spec:
  repository: github.com/myorg/my-frontend
  framework: nextjs
  environment: production
  imageRef: 123456789.dkr.ecr.us-east-1.amazonaws.com/plattr-apps:my-frontend-abc123
  database:
    enabled: true
  storage:
    enabled: true
    buckets:
      - name: uploads
        public: false
  auth:
    enabled: true
    providers: [google, github]
  scaling:
    min: 2
    max: 20
    targetCPU: 70
```
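Apply it like any other manifest (assuming you saved the spec above as `my-frontend.yaml`):

```bash
kubectl apply -f my-frontend.yaml
```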
Checking Application Status¶
# Summary
kubectl get applications
# Detailed status
kubectl get application my-frontend -o yaml
# Status conditions
kubectl get application my-frontend -o jsonpath='{.status.conditions}' | python3 -m json.tool
Status phases: Pending → Provisioning → Running (or Failed)
See the full CRD Reference.
Supporting Add-Ons¶
cert-manager¶
Provides automatic TLS certificates via Let's Encrypt.
# Verify cert-manager is running
kubectl get pods -n cert-manager
# Check ClusterIssuer
kubectl get clusterissuer letsencrypt-prod
# Debug certificate issues
kubectl get certificates -A
kubectl describe certificate <name> -n <namespace>
external-dns¶
Automatically creates DNS records in Route 53 from Ingress resources.
# Verify external-dns is running
kubectl get pods -n kube-system -l app=external-dns
# Check logs for DNS sync
kubectl logs -n kube-system -l app=external-dns
ingress-nginx¶
Routes external traffic to services via an AWS Network Load Balancer.
# Verify ingress controller
kubectl get pods -n ingress-nginx
# Check NLB
kubectl get svc -n ingress-nginx ingress-nginx-controller
# List all Ingress resources
kubectl get ingress -A
Keycloak¶
Managed authentication provider running on EKS.
# Check Keycloak pods
kubectl get pods -n plattr-system -l app=keycloak
# Access admin console (port-forward)
kubectl port-forward -n plattr-system svc/keycloak 8443:443
# Open https://localhost:8443/admin
Monitoring¶
Prometheus Metrics¶
The operator exposes Prometheus metrics on port 9090 (default Express server):
| Metric | Type | Description |
|---|---|---|
| `application_phase` | Gauge | Current phase of each Application (labeled by name, namespace, phase) |
| `reconcile_total` | Counter | Total reconciliation count (labeled by name, result: success/failure) |
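To spot-check the metrics locally, you can port-forward the operator and scrape it once. This is a sketch; the Deployment name `plattr-operator` and the conventional `/metrics` path are assumptions:

```bash
# Port-forward the metrics port and fetch the Plattr metrics
kubectl port-forward -n plattr-system deploy/plattr-operator 9090:9090 &
curl -s http://localhost:9090/metrics | grep -E 'application_phase|reconcile_total'
```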
Health Checks¶
The operator exposes a health endpoint. Check it with:
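For example, assuming the endpoint is served at `/healthz` on the same port as the metrics server (both path and port are assumptions):

```bash
# Probe the operator's health endpoint
kubectl port-forward -n plattr-system deploy/plattr-operator 9090:9090 &
curl -s http://localhost:9090/healthz
```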
Resource Quotas¶
Each environment namespace has resource quotas (configured in CDK):
kubectl get resourcequota -n production
kubectl get resourcequota -n staging
kubectl get resourcequota -n uat
Troubleshooting¶
Application Stuck in "Provisioning"¶
- Check the operator logs for errors (see the first command after this list).
- Check the Application's conditions (see the second command after this list).
- Common causes:
  - DatabaseReady: False — Aurora connectivity issue, check `DB_ADMIN_URL` or `DB_SECRET_ARN`
  - StorageReady: False — S3/IAM permissions issue, check the operator IRSA role
  - AuthReady: False — Keycloak unreachable, check the Keycloak pods
  - DeploymentReady: False — pods failing to start, check `kubectl describe pod`
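Commands for the first two checks, using the label and jsonpath conventions shown elsewhere in this guide:

```bash
# 1. Operator logs
kubectl logs -n plattr-system -l app=plattr-operator --tail=200

# 2. Application conditions
kubectl get application my-frontend -o jsonpath='{.status.conditions}' | python3 -m json.tool
```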
Application Stuck in "Failed"¶
The operator retries up to 3 times with exponential backoff (1s, 2s, 4s). If it fails:
- Fix the underlying issue
- Trigger re-reconciliation by touching the spec:
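One way to do this, sketched below: any spec change bumps `metadata.generation`, which triggers a new reconciliation.

```bash
# Re-apply the (fixed) manifest
kubectl apply -f my-frontend.yaml

# Or edit a spec field in place
kubectl edit application my-frontend
```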
Preview Environment Not Cleaning Up¶
The TTL controller runs every 5 minutes. Check:
# List previews with expiry
kubectl get previewenvironments -o custom-columns=NAME:.metadata.name,EXPIRES:.status.expiresAt
# Manually delete
kubectl delete previewenvironment my-app-pr-42
PostgREST Sidecar Not Working¶
- Check both containers in the pod (see the commands after this list).
- Common issues:
  - Connection refused — the database schema or role doesn't exist yet (migration not run)
  - Empty response — no tables granted to the `_anon` role
  - Schema not found — schema name mismatch (check the `PGRST_DB_SCHEMAS` env var)
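A sketch of the first check; the `app=my-frontend` label and the sidecar container name `postgrest` are assumptions, so adjust them to your app:

```bash
# List the containers in the app pod
kubectl get pod -n production -l app=my-frontend \
  -o jsonpath='{.items[0].spec.containers[*].name}'

# Tail the PostgREST sidecar's logs
kubectl logs -n production -l app=my-frontend -c postgrest
```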
Database Connection Issues¶
# Check the DB secret exists
kubectl get secret my-app-db -n production -o yaml
# Test connectivity from a pod
kubectl run -it --rm pg-test --image=postgres:14 -- psql "$(kubectl get secret my-app-db -n production -o jsonpath='{.data.DATABASE_URL}' | base64 -d)"
Operator Crash Loop¶
# Check events
kubectl get events -n plattr-system --sort-by='.lastTimestamp'
# Check resource limits
kubectl describe pod -n plattr-system -l app=plattr-operator
Common causes:
- Missing IRSA role (check the ServiceAccount annotation)
- Invalid `DB_SECRET_ARN`
- Kubernetes API permissions (check RBAC)
Day-2 Operations¶
Upgrading the Operator¶
- Build and push new operator image to ECR
- Update the Helm values or CDK context with the new image tag
- Deploy via CDK, or update the Helm release directly (see the commands after this list)
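Both paths, sketched; the Helm release name and chart path are assumptions, so use whatever your installation used:

```bash
# Option A: deploy through CDK
npx cdk deploy PlattrOperatorStack

# Option B: update the Helm release directly
helm upgrade plattr-operator ./charts/plattr-operator \
  -n plattr-system \
  --set image.tag=<new-tag> \
  --reuse-values
```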
Upgrading PostgREST¶
The PostgREST version is pinned in the operator code (postgrest/postgrest:v12.2.3). To upgrade:
- Update the version in `packages/operator/src/reconcilers/workload.ts`
- Rebuild and deploy the operator
- All apps will get the new PostgREST version on the next reconciliation
Database Backups¶
Aurora handles automated backups. For manual snapshots:
aws rds create-db-cluster-snapshot \
--db-cluster-identifier plattr-aurora \
--db-cluster-snapshot-identifier manual-$(date +%Y%m%d)
Scaling Infrastructure¶
More app replicas: Adjust scaling.min and scaling.max in the Application spec. HPA handles the rest.
More Aurora capacity: Scale through AWS console or CDK.
More Keycloak capacity: Scale the StatefulSet:
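For example (the StatefulSet name `keycloak` is an assumption based on the labels used earlier):

```bash
# Scale Keycloak from 2 to 3 replicas
kubectl scale statefulset/keycloak -n plattr-system --replicas=3
```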
Adding a New Environment¶
- Add the namespace in CDK (`PlattrOperatorStack`)
- Deploy: `npx cdk deploy PlattrOperatorStack`
- Update the CLI's `resolveEnv()` function to map the new environment name to its namespace
- Update the CI/CD workflows to support the new environment
Rotating Database Credentials¶
- Update the secret in AWS Secrets Manager
- Restart the operator so it picks up the new credentials (see the first command after this list)
- App-level credentials (per-schema roles) are generated by the operator and stored in Kubernetes Secrets. To rotate one, delete the Secret and let the operator recreate it (see the second command after this list)
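Sketches of both steps; the Deployment name is an assumption, and the per-app Secret name follows the `my-app-db` convention shown in the troubleshooting section:

```bash
# Restart the operator so it re-reads the admin credentials from Secrets Manager
kubectl rollout restart deployment/plattr-operator -n plattr-system

# Rotate an app's per-schema credentials: delete its Secret and let the operator recreate it
kubectl delete secret my-app-db -n production
```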