Plattr Team Guide

This guide covers deploying, operating, and maintaining Plattr. It assumes familiarity with Kubernetes, AWS, and infrastructure tooling.

Prerequisites

  • AWS CLI configured with admin-level credentials
  • AWS CDK v2 — npm install -g aws-cdk
  • kubectl connected to your EKS cluster
  • Helm v3
  • Node.js 18+ and npm
  • Docker (for CDK asset bundling)

Infrastructure Overview

AWS Account
├── EKS Cluster
│   ├── plattr-system namespace
│   │   ├── Plattr Operator (Deployment)
│   │   ├── Keycloak (StatefulSet, 2 replicas)
│   │   ├── cert-manager
│   │   ├── external-dns
│   │   └── ingress-nginx (NLB)
│   ├── production namespace
│   ├── staging namespace
│   ├── uat namespace
│   └── preview-* namespaces (dynamic)
├── Aurora PostgreSQL (shared, schema-per-app)
├── S3 (bucket-per-app)
├── ECR
│   ├── plattr-operator (operator image)
│   └── plattr-apps (all app images)
├── Secrets Manager
│   └── plattr/db-admin (Aurora admin creds)
└── Route 53 (DNS, managed by external-dns)

CDK Deployment

The infrastructure is defined in two CDK stacks.

Stack 1: PlattrOperatorStack

Deploys the operator, add-ons, CRDs, and environment namespaces.

cd packages/cdk

# Review what will be deployed
npx cdk diff PlattrOperatorStack

# Deploy
npx cdk deploy PlattrOperatorStack

What it creates:

  • plattr-system namespace with CRDs (Application, PreviewEnvironment)
  • Secrets Manager secret (plattr/db-admin) for Aurora admin credentials
  • ECR repository (plattr-operator) with 25-image retention
  • IRSA role for the operator ServiceAccount with permissions for S3, Secrets Manager, and STS
  • Operator Helm release with configuration (DB secret ARN, host, region, base domain, Keycloak URL)
  • Environment namespaces (staging, uat, production) with resource quotas
  • Add-ons (all enabled by default):
      • cert-manager + Let's Encrypt ClusterIssuer
      • external-dns with Route 53
      • ingress-nginx with AWS NLB
      • Keycloak 26.0 (2 replicas, external PostgreSQL, HTTPS Ingress)

Stack 2: PlattrCicdStack

Sets up CI/CD roles for GitHub Actions.

npx cdk deploy PlattrCicdStack

What it creates:

  • GitHub OIDC provider for keyless authentication
  • ECR repository (plattr-apps) with 50-image retention
  • CI Deploy Role — non-prod deployments from any branch (ECR push + EKS describe)
  • Prod Deploy Role — production deployments from main branch only
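In a GitHub Actions workflow, these roles are assumed via OIDC rather than long-lived keys. A minimal sketch (the role ARN and region below are placeholders; the job also needs the id-token permission):

# Hypothetical job fragment; replace the role ARN with your CI Deploy Role
permissions:
  id-token: write
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/plattr-ci-deploy
      aws-region: us-east-1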

CDK Context Values

Configure in cdk.json or via -c flags:

Key             | Description                            | Example
--------------- | -------------------------------------- | -------
clusterName     | EKS cluster name                       | platform-eks
kubectlRoleArn  | ARN of kubectl admin role              | arn:aws:iam::role/...
oidcProviderArn | EKS OIDC provider ARN                  | arn:aws:iam::oidc-provider/...
dbHost          | Aurora cluster endpoint                | platform-db.cluster-xxx.us-east-1.rds.amazonaws.com
dbSecretArn     | Secrets Manager ARN for DB admin creds | arn:aws:secretsmanager:...
baseDomain      | Plattr base domain                     | platform.company.dev
keycloakDomain  | Keycloak domain                        | auth.platform.company.dev
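For example, the values can live under "context" in cdk.json. A sketch (all account IDs and ARNs here are placeholders):

{
  "context": {
    "clusterName": "platform-eks",
    "kubectlRoleArn": "arn:aws:iam::123456789012:role/eks-kubectl-admin",
    "oidcProviderArn": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "dbHost": "platform-db.cluster-xxx.us-east-1.rds.amazonaws.com",
    "dbSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:plattr/db-admin",
    "baseDomain": "platform.company.dev",
    "keycloakDomain": "auth.platform.company.dev"
  }
}

A single value can also be overridden at deploy time, e.g. npx cdk deploy PlattrOperatorStack -c clusterName=platform-eks.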

Operator Management

How the Operator Works

The operator watches two Custom Resource types:

  1. Application — represents a deployed app. The operator reconciles it into database schemas, S3 buckets, Keycloak realms, Deployments, Services, Ingresses, and HPAs.
  2. PreviewEnvironment — represents a PR preview. The operator creates an isolated namespace with its own database schema, storage, and workload.

Reconciliation is triggered by Kubernetes informers (watch events). The operator uses generation-based tracking to avoid infinite loops from status updates.
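As a rough illustration of generation-based tracking (a sketch only, not the operator's actual code; the names are invented): status-only writes do not bump metadata.generation, so the reconciler can skip any event whose generation it has already processed.

// Sketch: skip reconciles for generations we've already handled.
// Status updates don't increment metadata.generation, which breaks the loop.
const lastReconciled = new Map<string, number>(); // key: "namespace/name"

interface ObjectMeta {
  metadata: { namespace: string; name: string; generation: number };
}

function shouldReconcile(obj: ObjectMeta): boolean {
  const key = `${obj.metadata.namespace}/${obj.metadata.name}`;
  const seen = lastReconciled.get(key);
  // Spec unchanged since the last successful reconcile: nothing to do
  return seen === undefined || seen < obj.metadata.generation;
}

function markReconciled(obj: ObjectMeta): void {
  const key = `${obj.metadata.namespace}/${obj.metadata.name}`;
  lastReconciled.set(key, obj.metadata.generation);
}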

Operator Configuration

The operator reads configuration from environment variables:

Variable                | Description                                   | Required
----------------------- | --------------------------------------------- | --------
DB_ADMIN_URL            | PostgreSQL admin connection string (dev mode) | One of these two
DB_SECRET_ARN           | AWS Secrets Manager ARN (prod mode)           | One of these two
DB_HOST                 | Aurora cluster endpoint                       | Yes
AWS_REGION              | AWS region                                    | Yes
BASE_DOMAIN             | Plattr base domain                            | Yes
KEYCLOAK_URL            | Keycloak base URL                             | If auth enabled
KEYCLOAK_ADMIN_USER     | Keycloak admin username                       | If auth enabled
KEYCLOAK_ADMIN_PASSWORD | Keycloak admin password                       | If auth enabled
LEADER_ELECTION         | Enable leader election (true/false)           | No (default: false)
LEASE_NAMESPACE         | Namespace for leader lease                    | If leader election on
AWS_ENDPOINT_URL        | Override AWS endpoint (for LocalStack)        | No

Viewing Operator Logs

kubectl logs -n plattr-system -l app=plattr-operator -f --all-containers

Leader Election

For high availability (running multiple operator replicas), enable leader election:

env:
  - name: LEADER_ELECTION
    value: "true"
  - name: LEASE_NAMESPACE
    value: plattr-system

The operator creates a Lease resource (plattr-operator-leader). Only the leader processes reconciliation events. Lease duration is 15 seconds with 5-second renewal intervals.
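To check which replica currently holds the lease:

kubectl get lease plattr-operator-leader -n plattr-system -o jsonpath='{.spec.holderIdentity}'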

Scaling the Operator

The operator runs as a single Deployment. With leader election enabled, you can run multiple replicas for failover:

kubectl scale deployment plattr-operator -n plattr-system --replicas=2

Only the leader processes events; standbys take over if the leader's lease expires.

CRD Management

Installing CRDs

kubectl apply -f manifests/crds/application.yaml
kubectl apply -f manifests/crds/preview-environment.yaml

Verifying CRDs

kubectl get crd applications.platform.internal
kubectl get crd previewenvironments.platform.internal

Creating an Application

apiVersion: platform.internal/v1alpha1
kind: Application
metadata:
  name: my-frontend
  namespace: default
spec:
  repository: github.com/myorg/my-frontend
  framework: nextjs
  environment: production
  imageRef: 123456789.dkr.ecr.us-east-1.amazonaws.com/plattr-apps:my-frontend-abc123
  database:
    enabled: true
  storage:
    enabled: true
    buckets:
      - name: uploads
        public: false
  auth:
    enabled: true
    providers: [google, github]
  scaling:
    min: 2
    max: 20
    targetCPU: 70

Apply it:

kubectl apply -f my-app.yaml

Checking Application Status

# Summary
kubectl get applications

# Detailed status
kubectl get application my-frontend -o yaml

# Status conditions
kubectl get application my-frontend -o jsonpath='{.status.conditions}' | python3 -m json.tool

Status phases: Pending → Provisioning → Running (or Failed)
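To follow an Application as it moves through these phases:

kubectl get applications -w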

See the full CRD Reference.

Supporting Add-Ons

cert-manager

Provides automatic TLS certificates via Let's Encrypt.

# Verify cert-manager is running
kubectl get pods -n cert-manager

# Check ClusterIssuer
kubectl get clusterissuer letsencrypt-prod

# Debug certificate issues
kubectl get certificates -A
kubectl describe certificate <name> -n <namespace>

external-dns

Automatically creates DNS records in Route 53 from Ingress resources.

# Verify external-dns is running
kubectl get pods -n kube-system -l app=external-dns

# Check logs for DNS sync
kubectl logs -n kube-system -l app=external-dns

ingress-nginx

Routes external traffic to services via an AWS Network Load Balancer.

# Verify ingress controller
kubectl get pods -n ingress-nginx

# Check NLB
kubectl get svc -n ingress-nginx ingress-nginx-controller

# List all Ingress resources
kubectl get ingress -A

Keycloak

Managed authentication provider running on EKS.

# Check Keycloak pods
kubectl get pods -n plattr-system -l app=keycloak

# Access admin console (port-forward)
kubectl port-forward -n plattr-system svc/keycloak 8443:443
# Open https://localhost:8443/admin

Monitoring

Prometheus Metrics

The operator's built-in Express server exposes Prometheus metrics on port 9090 by default:

Metric            | Type    | Description
----------------- | ------- | -----------
application_phase | Gauge   | Current phase of each Application (labeled by name, namespace, phase)
reconcile_total   | Counter | Total reconciliation count (labeled by name, result: success/failure)
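To spot-check the metrics locally (the sample output lines are illustrative; label sets follow the table above):

# Port-forward the metrics port, then scrape it
kubectl port-forward -n plattr-system deploy/plattr-operator 9090:9090 &
curl -s localhost:9090/metrics | grep -E '^(application_phase|reconcile_total)'
# application_phase{name="my-frontend",namespace="production",phase="Running"} 1
# reconcile_total{name="my-frontend",result="success"} 14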

Health Checks

The operator exposes a health endpoint. Check it with:

kubectl exec -n plattr-system deploy/plattr-operator -- curl -s localhost:9090/healthz

Resource Quotas

Each environment namespace has resource quotas (configured in CDK):

kubectl get resourcequota -n production
kubectl get resourcequota -n staging
kubectl get resourcequota -n uat

Troubleshooting

Application Stuck in "Provisioning"

  1. Check operator logs for errors:

    kubectl logs -n plattr-system -l app=plattr-operator --tail=100
    

  2. Check Application conditions:

    kubectl get application my-app -o jsonpath='{.status.conditions}' | python3 -m json.tool
    

  3. Common causes:

     • DatabaseReady: False — Aurora connectivity issue, check DB_ADMIN_URL or DB_SECRET_ARN
     • StorageReady: False — S3/IAM permissions issue, check operator IRSA role
     • AuthReady: False — Keycloak unreachable, check Keycloak pods
     • DeploymentReady: False — Pods failing to start, check kubectl describe pod

Application Stuck in "Failed"

The operator retries up to 3 times with exponential backoff (1s, 2s, 4s). If all retries fail:

  1. Fix the underlying issue
  2. Trigger re-reconciliation by touching the spec:
    kubectl patch application my-app --type=merge -p '{"spec":{"scaling":{"min":2}}}'
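
The retry policy described above amounts to something like the following sketch (illustrative only, not the operator's actual code):

// Sketch of the documented retry policy: up to 3 retries at 1s, 2s, 4s.
async function reconcileWithRetry(
  reconcile: () => Promise<void>,
  maxRetries = 3,
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await reconcile();
      return;
    } catch (err) {
      // After the final retry, surface the error (Application moves to Failed)
      if (attempt >= maxRetries) throw err;
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}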
    

Preview Environment Not Cleaning Up

The TTL controller runs every 5 minutes. Check:

# List previews with expiry
kubectl get previewenvironments -o custom-columns=NAME:.metadata.name,EXPIRES:.status.expiresAt

# Manually delete
kubectl delete previewenvironment my-app-pr-42

PostgREST Sidecar Not Working

  1. Check both containers in the pod:

    kubectl get pods -l app.kubernetes.io/name=my-app
    kubectl logs <pod> -c postgrest
    

  2. Common issues:

     • Connection refused — database schema or role doesn't exist yet (migration not run)
     • Empty response — no tables granted to the _anon role (see the example below)
     • Schema not found — schema name mismatch (check PGRST_DB_SCHEMAS env var)
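For the "Empty response" case, granting read access to the anonymous role usually resolves it. A hedged example (schema and role names are placeholders; match them to your app's schema and PGRST_DB_SCHEMAS, and take DATABASE_URL from the app's Secret as shown under Database Connection Issues):

# Placeholder names: replace my_app / my_app_anon with your schema and anon role
psql "$DATABASE_URL" <<'SQL'
GRANT USAGE ON SCHEMA my_app TO my_app_anon;
GRANT SELECT ON ALL TABLES IN SCHEMA my_app TO my_app_anon;
SQL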

Database Connection Issues

# Check the DB secret exists
kubectl get secret my-app-db -n production -o yaml

# Test connectivity from a pod
kubectl run -it --rm pg-test --image=postgres:14 -- psql "$(kubectl get secret my-app-db -n production -o jsonpath='{.data.DATABASE_URL}' | base64 -d)"

Operator Crash Loop

# Check events
kubectl get events -n plattr-system --sort-by='.lastTimestamp'

# Check resource limits
kubectl describe pod -n plattr-system -l app=plattr-operator

Common causes:

  • Missing IRSA role (check ServiceAccount annotation)
  • Invalid DB_SECRET_ARN
  • Kubernetes API permissions (check RBAC)

Day-2 Operations

Upgrading the Operator

  1. Build and push new operator image to ECR
  2. Update the Helm values or CDK context with the new image tag
  3. Deploy:
    npx cdk deploy PlattrOperatorStack
    
    Or update the Helm release directly:
    helm upgrade plattr-operator ./chart -n plattr-system --set image.tag=new-version
    

Upgrading PostgREST

The PostgREST version is pinned in the operator code (postgrest/postgrest:v12.2.3). To upgrade:

  1. Update the version in packages/operator/src/reconcilers/workload.ts (see the snippet below)
  2. Rebuild and deploy the operator
  3. All apps will get the new PostgREST version on next reconciliation
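
The pin looks something like this (illustrative; the actual constant name in workload.ts may differ):

// packages/operator/src/reconcilers/workload.ts (sketch)
const POSTGREST_IMAGE = 'postgrest/postgrest:v12.2.3'; // bump this to upgrade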

Database Backups

Aurora handles automated backups. For manual snapshots:

aws rds create-db-cluster-snapshot \
  --db-cluster-identifier plattr-aurora \
  --db-cluster-snapshot-identifier manual-$(date +%Y%m%d)

Scaling Infrastructure

More app replicas: Adjust scaling.min and scaling.max in the Application spec. HPA handles the rest.

More Aurora capacity: Scale through AWS console or CDK.

More Keycloak capacity: Scale the StatefulSet:

kubectl scale statefulset keycloak -n plattr-system --replicas=3

Adding a New Environment

  1. Add the namespace in CDK (PlattrOperatorStack), as in the sketch below
  2. Deploy: npx cdk deploy PlattrOperatorStack
  3. Update the CLI's resolveEnv() function to map the new environment name to its namespace
  4. Update CI/CD workflows to support the new environment
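A rough sketch of step 1, assuming the stack holds an aws-eks Cluster in a variable named cluster (the environment name and quota values here are illustrative):

// Illustrative only: add a "qa" environment namespace plus a resource quota.
// Assumes `cluster` is an eks.Cluster from aws-cdk-lib/aws-eks.
cluster.addManifest('QaNamespace', {
  apiVersion: 'v1',
  kind: 'Namespace',
  metadata: { name: 'qa' },
});

cluster.addManifest('QaQuota', {
  apiVersion: 'v1',
  kind: 'ResourceQuota',
  metadata: { name: 'qa-quota', namespace: 'qa' },
  spec: {
    hard: { 'requests.cpu': '8', 'requests.memory': '16Gi', pods: '50' },
  },
});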

Rotating Database Credentials

  1. Update the secret in AWS Secrets Manager
  2. Restart the operator to pick up new credentials:
    kubectl rollout restart deployment plattr-operator -n plattr-system
    
  3. App-level credentials (per-schema roles) are generated by the operator and stored in Kubernetes Secrets. To rotate, delete the Secret and let the operator recreate it:
    kubectl delete secret my-app-db -n production
    # Operator will detect the missing Secret and recreate it with new credentials