Operations

How It Works

ATL-E runs as a k8s CronJob every 15 minutes. Each run:

  1. Loads persisted state from /data/state.json
  2. Polls all repos for open PRs and issues
  3. Computes PR states via the state machine
  4. Evaluates notification rules (respecting cooldowns)
  5. Checks agent online status via GitHub Events API
  6. Finds agent-ready issues and assigns them to online agents only
  7. Detects completed assignments (closed issues)
  8. Sends Discord notifications
  9. Saves state

Monitoring

# Check CronJob status
kubectl get cronjobs -n atl-agent

# See recent jobs
kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp

# View logs from last run
kubectl logs -n atl-agent job/$(kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}')

# Check state file (needs a pod that is still running; kubectl exec fails on completed pods)
kubectl exec -n atl-agent $(kubectl get pods -n atl-agent -o jsonpath='{.items[0].metadata.name}') -- cat /data/state.json

Issue Workflow

For the user (Stig-Johnny)

  1. Review issues in the backlog
  2. Add agent-ready label to approve for work
  3. Optionally add high-priority for urgency
  4. ATL-E handles the rest

What ATL-E does

  1. Detects agent-ready issues
  2. Checks capability labels (requires-macos, requires-tablez, etc.)
  3. Finds best available agent (by performance score)
  4. Posts assignment to agent's Discord channel
  5. Updates labels: removes agent-ready, adds in-progress + claimed-*
  6. When issue closes: frees agent, assigns next from queue
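
The agent-selection step above could look roughly like the following sketch. The `Agent` fields, the `pick_agent` helper, and the example scores are hypothetical; the real state schema and scoring formula are not shown in this document. Agent names and capabilities are taken from the Capability Labels table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    # Hypothetical shape: the real state schema is not documented here.
    name: str
    capabilities: set = field(default_factory=set)
    score: float = 0.0   # performance score, higher is better (assumed)
    online: bool = True
    busy: bool = False


def pick_agent(agents, required_capability=None) -> Optional[Agent]:
    """Return the highest-scoring agent that is online, idle, and capable."""
    candidates = [
        a for a in agents
        if a.online and not a.busy
        and (required_capability is None or required_capability in a.capabilities)
    ]
    # default=None covers the case where no agent qualifies.
    return max(candidates, key=lambda a: a.score, default=None)


# Example data with made-up scores.
agents = [
    Agent("Pi-E", {"tablez"}, score=0.8),
    Agent("Volt-E", {"tablez"}, score=0.9, online=False),
    Agent("iBuild-E", {"macos"}, score=0.7, busy=True),
]
chosen = pick_agent(agents, "tablez")
print(chosen.name if chosen else "no agent available")
```

In this example Volt-E has the higher score but is offline, so the issue would go to Pi-E; a requires-macos issue would stay queued because iBuild-E is busy.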

Priority order

  1. high-priority + bug (critical bugs)
  2. high-priority (priority features)
  3. bug (regular bugs)
  4. Everything else (oldest first)
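
The priority order above can be expressed as a sort key. This is a minimal sketch, not ATL-E's actual code: the function names and the issue dictionaries are hypothetical, and breaking ties by age inside every tier (not only the last one) is an assumption.

```python
def priority_rank(labels):
    """Map an issue's labels to a rank; lower ranks are assigned first."""
    high = "high-priority" in labels
    bug = "bug" in labels
    if high and bug:
        return 0  # critical bugs
    if high:
        return 1  # priority features
    if bug:
        return 2  # regular bugs
    return 3      # everything else


def sort_queue(issues):
    # Assumption: ties within a rank are broken by creation time, oldest first.
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    return sorted(issues, key=lambda i: (priority_rank(i["labels"]), i["created_at"]))


# Hypothetical queue for illustration.
issues = [
    {"number": 1, "labels": {"agent-ready"}, "created_at": "2024-01-01T00:00:00Z"},
    {"number": 2, "labels": {"agent-ready", "bug"}, "created_at": "2024-01-03T00:00:00Z"},
    {"number": 3, "labels": {"agent-ready", "high-priority"}, "created_at": "2024-01-02T00:00:00Z"},
    {"number": 4, "labels": {"agent-ready", "high-priority", "bug"}, "created_at": "2024-01-04T00:00:00Z"},
    {"number": 5, "labels": {"agent-ready"}, "created_at": "2023-12-31T00:00:00Z"},
]
order = [i["number"] for i in sort_queue(issues)]
print(order)
```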

Troubleshooting

ATL-E not running

# Check if CronJob is suspended
kubectl get cronjob atl-agent -n atl-agent -o jsonpath='{.spec.suspend}'

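# Check for jobs without a successful completion (failed or still running);
# Job field selectors support status.successful but not status.failed
kubectl get jobs -n atl-agent --field-selector status.successful=0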

Notifications not sending

  1. Check that the Discord webhook URL is valid
  2. Check the logs for "Discord webhook failed"
  3. Verify the notification is not still inside its cooldown window (default: 120 min between re-notifications)
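
To reason about item 3, a cooldown check can be sketched as below. The `should_notify` helper and its parameters are hypothetical; only the 120-minute default comes from this document, and how ATL-E stores the last-sent time is not shown here.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN_MINUTES = 120  # default stated in this document


def should_notify(last_sent, now, cooldown_minutes=COOLDOWN_MINUTES):
    """Return True if nothing was sent yet or the cooldown has elapsed."""
    if last_sent is None:
        return True
    return now - last_sent >= timedelta(minutes=cooldown_minutes)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sent_recently = now - timedelta(minutes=45)
sent_long_ago = now - timedelta(minutes=121)

print(should_notify(sent_recently, now))  # still inside the cooldown window
print(should_notify(sent_long_ago, now))  # cooldown elapsed
```

With a 15-minute schedule, a suppressed notification would be re-evaluated on each run until the window has passed.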

Agent not getting assigned

Possible causes:

  - Agent is offline: no GitHub activity in the last 6 hours (configurable via assignment.offlineThresholdMinutes)
  - Agent is already busy (has an active assignment in state)
  - No agent-ready issues match the agent's capabilities
  - Issue has the workflow-change or infra-change label (Codi-E only)
  - Issue is already in-progress or claimed

Agent Online Detection

ATL-E checks each agent's recent GitHub activity before assigning work. An agent is considered online if it has any GitHub event (push, comment, PR, etc.) within the configured threshold (default: 6 hours).

How it works:

  1. ATL-E calls the GitHub Events API for each agent's username
  2. Checks the timestamp of the most recent event
  3. If the event is older than offlineThresholdMinutes, the agent is skipped

What counts as activity:

  - Pushing commits
  - Creating/commenting on PRs or issues
  - Creating branches
  - Any GitHub API action under the agent's account

Logs show agent status each run:

Agent status: 3 online, 1 offline
Agent offline: Volt-E (last seen: 8h ago)

If all agents are offline, no assignments are made — issues stay in the agent-ready queue until an agent comes online.
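
The online check described in this section could be sketched as follows. It works on already-fetched event objects, so no network call is made; the helper names are hypothetical, and treating an agent with no events at all as offline is an assumption. The `created_at` field is the timestamp GitHub attaches to each event.

```python
from datetime import datetime, timedelta, timezone


def last_event_time(events):
    """Return the newest created_at among GitHub event objects, or None."""
    times = [
        # GitHub timestamps end in "Z"; normalise so fromisoformat accepts them.
        datetime.fromisoformat(e["created_at"].replace("Z", "+00:00"))
        for e in events
    ]
    return max(times, default=None)


def is_online(events, now, offline_threshold_minutes=360):
    latest = last_event_time(events)
    if latest is None:
        return False  # assumption: no visible events counts as offline
    return now - latest <= timedelta(minutes=offline_threshold_minutes)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = [{"type": "PushEvent", "created_at": "2024-01-01T08:00:00Z"}]
stale = [{"type": "PushEvent", "created_at": "2024-01-01T04:00:00Z"}]

print(is_online(recent, now))  # last event 4h ago
print(is_online(stale, now))   # last event 8h ago
```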

State file corrupted

# Delete state file to start fresh (loses cooldown/assignment tracking)
kubectl exec -n atl-agent <pod> -- rm /data/state.json
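
Deleting the file only helps if the loader tolerates a missing file. A minimal sketch of such a loader is shown below; `load_state` is a hypothetical name, and falling back to an empty state on unparsable JSON is an assumption rather than documented ATL-E behaviour.

```python
import json
import tempfile
from pathlib import Path


def load_state(path):
    """Load state JSON; fall back to an empty state if missing or unparsable."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # first run, or the file was deleted to start fresh
    except json.JSONDecodeError:
        return {}  # assumption: corrupt content is treated like a fresh start


# Demonstrate both fallbacks against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    missing = load_state(state_path)
    state_path.write_text("{not valid json", encoding="utf-8")
    corrupt = load_state(state_path)
    state_path.write_text('{"assignments": {}}', encoding="utf-8")
    valid = load_state(state_path)

print(missing, corrupt, valid)
```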

Capability Labels

Label            Required Capability  Eligible Agents
requires-macos   macos                iBuild-E
requires-tablez  tablez               Pi-E, Volt-E
docs-only        docs                 All (except Review-E)
workflow-change  -                    Codi-E only (ATL-E skips)
infra-change     -                    Codi-E only (ATL-E skips)
(no label)       -                    Any agent
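
The table can be read as a lookup from labels to required capabilities. The sketch below is illustrative only: the constant and function names are hypothetical, and collecting every matching capability when an issue carries several capability labels is an assumption.

```python
# Label-to-capability pairs copied from the table above.
CAPABILITY_FOR_LABEL = {
    "requires-macos": "macos",
    "requires-tablez": "tablez",
    "docs-only": "docs",
}

# Issues with these labels are left for Codi-E; ATL-E skips them.
SKIP_LABELS = {"workflow-change", "infra-change"}


def required_capabilities(labels):
    """Return the set of capabilities an issue needs, or None if ATL-E skips it."""
    labels = set(labels)
    if labels & SKIP_LABELS:
        return None
    # An empty set means any agent is eligible.
    return {CAPABILITY_FOR_LABEL[l] for l in labels if l in CAPABILITY_FOR_LABEL}


print(required_capabilities({"agent-ready", "requires-macos"}))
print(required_capabilities({"agent-ready", "infra-change"}))
print(required_capabilities({"agent-ready"}))
```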