
Operations

Dashboard

Agent Dashboard: atl-e.dashecorp.com
Kanban Board: kanban.dashecorp.com

The Automate-E dashboard shows:

  • Agent Info — name, bio, memory type
  • Active Sessions — Discord threads being tracked
  • MCP Servers — GitHub MCP status (green = connected)
  • Recent Tool Calls — GitHub API calls with latency and status
  • Token Usage & Cost — LLM calls, tokens, cost per model
  • Stats — Tool call success rate
  • Live Logs — Agent activity log

Checking Logs

# Gateway logs (Discord connection)
kubectl logs -n atl-e deploy/atl-e-automate-e-gateway

# Worker logs (agent loop + GitHub MCP); with -l, kubectl defaults to the last 10 lines per pod
kubectl logs -n atl-e -l app.kubernetes.io/component=worker --tail=100 --prefix

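# Latest cron run (kubectl logs has no --sort-by, so select the newest cron pod first)
kubectl logs -n atl-e --tail=30 \
  "$(kubectl get pods -n atl-e -l app.kubernetes.io/component=cron \
      --sort-by=.metadata.creationTimestamp -o name | tail -1)"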

# All pods
kubectl get pods -n atl-e

Common Issues

Workers not responding to Discord messages

Check that workers have the GITHUB_PERSONAL_ACCESS_TOKEN env var:

kubectl get deploy atl-e-automate-e-worker -n atl-e \
  -o jsonpath='{.spec.template.spec.containers[0].env[*].name}'

The output should include GITHUB_PERSONAL_ACCESS_TOKEN. If it is missing, ArgoCD most likely hasn't synced the latest chart; force a refresh as described under Restart.
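To run this check from a script (for example, a post-deploy smoke test), the sketch below tests whether the space-separated name list printed by the jsonpath query contains a given variable. has_env_var is a hypothetical helper, not part of Automate-E or kubectl.

```shell
# Hypothetical helper (not part of Automate-E or kubectl): succeed if the
# space-separated name list in $1 contains exactly the name in $2.
has_env_var() {
  local names="$1" wanted="$2" name
  # $names is left unquoted on purpose so the shell splits it on whitespace
  for name in $names; do
    if [ "$name" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Usage against the live deployment:
#   names=$(kubectl get deploy atl-e-automate-e-worker -n atl-e \
#     -o jsonpath='{.spec.template.spec.containers[0].env[*].name}')
#   has_env_var "$names" GITHUB_PERSONAL_ACCESS_TOKEN || echo "token missing"
```

Matching whole names avoids false positives such as a leftover GITHUB_PERSONAL_ACCESS_TOKEN_OLD variable.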

GitHub MCP server fails to connect

kubectl logs -n atl-e -l app.kubernetes.io/component=worker --tail=-1 | grep -iE "mcp|github"

Common causes:

  • GITHUB_PERSONAL_ACCESS_TOKEN not set or expired
  • Network policy blocking outbound HTTPS
  • npm registry issues (MCP server downloaded via npx)
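The first two causes can be told apart by calling the GitHub API directly with the same token. A minimal sketch, assuming curl is available where you run it and the token is exported as GITHUB_PERSONAL_ACCESS_TOKEN; both function names are hypothetical. GET https://api.github.com/user returns 200 for a working token and 401 for a bad or expired one, while a status of 000 means curl received no HTTP response at all, which points at networking rather than the token.

```shell
# Hypothetical helpers (not part of Automate-E).
# Map an HTTP status from GET https://api.github.com/user to a likely cause.
classify_github_status() {
  case "$1" in
    200) echo "ok: token accepted" ;;
    401) echo "token invalid or expired" ;;
    000) echo "no response: check network policy / outbound HTTPS" ;;
    *) echo "unexpected HTTP status $1" ;;
  esac
}

# Call the GitHub API with the token from GITHUB_PERSONAL_ACCESS_TOKEN.
# curl's %{http_code} is 000 when no HTTP response was received.
check_github_token() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer ${GITHUB_PERSONAL_ACCESS_TOKEN}" \
    https://api.github.com/user) || true
  classify_github_status "$code"
}
```

Run from a workstation, this only validates the token; ruling out a network policy requires running the same request from inside the atl-e namespace, which assumes an image that ships curl.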

Cron posts reasoning instead of just notifications

The character prompt must include strict output rules. Check that the character's personality text contains all of:

  • "Output ONLY the final notifications"
  • "Do NOT include reasoning, analysis, or chain-of-thought"
  • "Keep total response under 1500 characters"
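A quick way to verify the rules is a fixed-string check over the prompt text. This sketch assumes you can get the personality text onto stdin (how depends on where the character is defined); check_output_rules and within_limit are hypothetical helpers that use the exact phrases listed above.

```shell
# Hypothetical helper (not part of Automate-E): read a character prompt on
# stdin, print each required output rule that is absent, and return 1 if
# any rule is missing.
check_output_rules() {
  local prompt rule missing=0
  prompt=$(cat)
  for rule in \
    "Output ONLY the final notifications" \
    "Do NOT include reasoning, analysis, or chain-of-thought" \
    "Keep total response under 1500 characters"; do
    # The quoted "$rule" inside the pattern is matched literally, not as a glob
    case "$prompt" in
      *"$rule"*) ;;
      *) echo "missing: $rule"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Hypothetical helper: succeed if a cron response ($1) is under 1500 characters.
within_limit() {
  [ "${#1}" -lt 1500 ]
}

# Usage (personality.txt is a hypothetical export of the prompt):
#   check_output_rules < personality.txt
```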

Webhooks not arriving

Check gateway logs for webhook events:

kubectl logs -n atl-e deploy/atl-e-automate-e-gateway | grep -i webhook

Common causes:

  • GITHUB_WEBHOOK_SECRET env var missing or mismatched
  • Cloudflare Tunnel not routing to gateway port 3000
  • GitHub webhook delivery failures (check repo Settings → Webhooks → Recent Deliveries)
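To rule out a mismatched GITHUB_WEBHOOK_SECRET, you can recompute the value GitHub sends in the X-Hub-Signature-256 header: "sha256=" followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. A sketch assuming openssl is installed; the function names are hypothetical.

```shell
# Hypothetical helpers (not part of Automate-E); require openssl.
# Print the X-Hub-Signature-256 value GitHub would send for a payload file.
github_signature() {
  local secret="$1" payload_file="$2" digest
  # The hex digest is the last field of openssl's output line; the prefix
  # before it differs between OpenSSL versions, so take the last field only
  digest=$(openssl dgst -sha256 -hmac "$secret" < "$payload_file" | awk '{print $NF}')
  printf 'sha256=%s\n' "$digest"
}

# Succeed if a received header value ($3) matches the secret ($1) and payload file ($2).
verify_github_signature() {
  [ "$(github_signature "$1" "$2")" = "$3" ]
}
```

The HMAC covers the exact bytes delivered, so a payload that has been re-indented or re-encoded will not match even when the secret is correct.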

Postgres connection issues

# Check if Postgres is running
kubectl get pods -n atl-e -l app=postgres

# Check that the worker has DATABASE_URL set (this does not open a connection)
kubectl exec -n atl-e deploy/atl-e-automate-e-worker -- \
  node -e "console.log(process.env.DATABASE_URL ? 'DB URL set' : 'DB URL missing')"

# Check if facts are being saved
kubectl exec -n atl-e postgres-0 -- psql -U atl_e -d atl_e -c 'SELECT count(*) FROM facts'
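The DATABASE_URL check above only reports whether the variable is set. To see which host and database the worker actually targets without printing the password, you can mask it before it reaches your terminal. A sketch assuming a postgres://user:password@host:port/db style URL; redact_db_url is a hypothetical helper.

```shell
# Hypothetical helper (not part of Automate-E): replace the password in a
# postgres://user:password@host:port/db URL read from stdin with ***.
# URLs without a user:password@ part pass through unchanged.
redact_db_url() {
  sed -E 's#(://[^:/@]+:)[^@]*@#\1***@#'
}

# Usage (reuses the node binary already present in the worker image):
#   kubectl exec -n atl-e deploy/atl-e-automate-e-worker -- \
#     node -e "console.log(process.env.DATABASE_URL)" | redact_db_url
```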

High cost per cron run

Each run consumes roughly 80K input tokens, mostly because the GitHub MCP server returns verbose PR data, including full PR bodies. To reduce cost:

  • Reduce the number of monitored repos
  • Increase the cron interval (currently 1 hour)
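To judge how much each remedy helps, multiply tokens per run by runs per day. A sketch built around the ~80K input-token figure above; no price is hard-coded, so pass your provider's current per-million-token input rate. It ignores output tokens and any prompt caching. daily_input_cost is a hypothetical helper.

```shell
# Hypothetical helper (not part of Automate-E): estimate daily input-token
# volume and cost for a cron job.
#   $1 = input tokens per run (e.g. ~80000)
#   $2 = runs per day (24 for an hourly schedule)
#   $3 = USD per 1M input tokens (supply the current rate; none is assumed)
daily_input_cost() {
  awk -v t="$1" -v r="$2" -v p="$3" \
    'BEGIN { printf "%d tokens/day, $%.2f/day\n", t * r, t * r * p / 1000000 }'
}
```

For example, moving from hourly (24 runs/day) to every 2 hours (12 runs/day) halves the daily input volume at 80K tokens per run, from 1,920,000 to 960,000 tokens.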

Scaling

| Setting | Current | How to change |
| --- | --- | --- |
| Worker replicas | 2 | workers.replicas in values.yaml |
| Cron frequency | Every hour | cron.schedule in values.yaml |
| Model | Haiku (cheapest) | character.llm.model in values.yaml |
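If cron.schedule takes a standard five-field cron expression (as a Kubernetes CronJob schedule does; this is an assumption about the chart), widening the interval means changing the hour field. A sketch of a hypothetical helper that prints an expression for "every N hours" and rejects intervals that would leave an uneven gap across midnight.

```shell
# Hypothetical helper (not part of Automate-E): print a 5-field cron
# expression that fires at minute 0 every N hours, for use in cron.schedule.
# N must divide 24; e.g. */5 would fire at 20:00 and again at 00:00,
# leaving a 4-hour gap instead of 5.
cron_every_hours() {
  local n="$1"
  case "$n" in
    ''|*[!0-9]*|0?*) echo "not a positive integer: $n" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ] || [ "$n" -gt 24 ] || [ $((24 % $n)) -ne 0 ]; then
    echo "interval must divide 24: $n" >&2
    return 1
  fi
  if [ "$n" -eq 1 ]; then
    echo "0 * * * *"
  elif [ "$n" -eq 24 ]; then
    echo "0 0 * * *"
  else
    echo "0 */$n * * *"
  fi
}
```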

Restart

# Restart all Deployments in the namespace (gateway + workers)
kubectl rollout restart deploy -n atl-e

# Force an ArgoCD hard refresh (re-reads the chart; auto-sync, if enabled, then applies it)
kubectl -n argocd patch application atl-e --type merge \
  -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
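kubectl rollout restart returns as soon as the restart is requested. To block until the new pods are ready, a sketch that follows the restart with kubectl rollout status for each Deployment; restart_and_wait and its 120-second timeout are hypothetical choices, not part of Automate-E.

```shell
# Hypothetical helper (not part of Automate-E): restart every Deployment in
# the namespace ($1, default atl-e), then wait for each rollout to finish.
# Returns non-zero if the restart fails or any rollout does not complete
# within the timeout.
restart_and_wait() {
  local ns="${1:-atl-e}" deploy
  kubectl rollout restart deploy -n "$ns" || return 1
  # "-o name" prints entries like deployment.apps/<name>, which rollout status accepts
  for deploy in $(kubectl get deploy -n "$ns" -o name); do
    kubectl rollout status "$deploy" -n "$ns" --timeout=120s || return 1
  done
}

# Usage:
#   restart_and_wait atl-e
```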