Operations

How It Works

ATL-E runs as a k8s CronJob every 15 minutes. Each run:

Loads persisted state from /data/state.json
Polls all repos for open PRs and issues
Computes PR states via the state machine
Evaluates notification rules (respecting cooldowns)
Checks agent online status via GitHub Events API
Finds agent-ready issues and assigns them to online agents only
Detects completed assignments (closed issues)
Sends Discord notifications
Saves state
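
To run this cycle once without waiting for the next 15-minute slot, a one-off Job can be created from the CronJob's job template with `kubectl create job --from=cronjob/...`. This is a minimal sketch: the CronJob name `atl-agent` is the one used under Troubleshooting, while the `atl-agent-manual-<epoch>` job name and the `KUBECTL` override (set `KUBECTL=echo` to preview the command) are illustrative assumptions.

```shell
# Sketch: trigger one ATL-E run immediately from the CronJob template.
# Assumes CronJob "atl-agent" in namespace "atl-agent"; the job naming
# scheme is hypothetical. Set KUBECTL=echo to print instead of run.

# Build a unique job name from a Unix timestamp.
manual_job_name() {
  printf 'atl-agent-manual-%s\n' "$1"
}

# Create a Job from the CronJob's jobTemplate; it performs the same
# load -> poll -> notify -> save cycle as a scheduled run.
trigger_run() {
  local name
  name=$(manual_job_name "$(date +%s)")
  "${KUBECTL:-kubectl}" create job "$name" --from=cronjob/atl-agent -n atl-agent
}
```

When the Job finishes, `kubectl logs -n atl-agent job/<name>` shows the same output a scheduled run would.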

Monitoring

# Check CronJob status
kubectl get cronjobs -n atl-agent

# See recent jobs
kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp

# View logs from the most recent run (sorted so the newest job is last)
kubectl logs -n atl-agent job/$(kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].metadata.name}')

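# Check the state file (kubectl exec needs a running pod, so this only works while a run is in progress)
kubectl exec -n atl-agent $(kubectl get pods -n atl-agent --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}') -- cat /data/state.json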
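
Because runs are scheduled every 15 minutes, the CronJob's `.status.lastScheduleTime` should never be much older than that. The sketch below flags a stalled schedule; the 30-minute threshold (two missed slots) is an arbitrary choice, and the timestamp parsing assumes GNU `date`.

```shell
# Sketch: flag a stalled schedule from .status.lastScheduleTime.
# Assumes GNU date; the 30-minute default threshold is illustrative.

# Whole minutes between two Unix timestamps ($1 = earlier, $2 = later).
minutes_between() {
  echo $(( ($2 - $1) / 60 ))
}

# Succeeds when the last schedule is older than the threshold.
# $1 = last schedule (epoch), $2 = now (epoch), $3 = threshold minutes
is_stale() {
  [ "$(minutes_between "$1" "$2")" -gt "${3:-30}" ]
}

check_schedule() {
  local last
  last=$(kubectl get cronjob atl-agent -n atl-agent \
    -o jsonpath='{.status.lastScheduleTime}')
  # An empty value means the CronJob has never been scheduled.
  [ -n "$last" ] || { echo "no run scheduled yet"; return 1; }
  if is_stale "$(date -u -d "$last" +%s)" "$(date +%s)"; then
    echo "WARN: last scheduled run at $last"
  else
    echo "OK: last scheduled run at $last"
  fi
}
```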

Issue Workflow

For the user (Stig-Johnny)

Review issues in the backlog
Add the agent-ready label to approve an issue for work
Optionally add the high-priority label to mark it urgent
ATL-E handles the rest
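
The labelling step can also be done from a terminal with the GitHub CLI (`gh issue edit --add-label`). A minimal sketch; the `approve_issue` helper, its `urgent` argument and the `GH` override (set `GH=echo` to preview) are hypothetical conveniences, and `OWNER/REPO` must be one of the repositories ATL-E polls.

```shell
# Sketch: approve an issue for ATL-E with the GitHub CLI.
# approve_issue is a hypothetical helper; GH=echo previews the command.

# Usage: approve_issue OWNER/REPO NUMBER [urgent]
approve_issue() {
  local labels="agent-ready"
  # Optional third argument "urgent" also applies high-priority.
  if [ "${3:-}" = "urgent" ]; then
    labels="$labels,high-priority"
  fi
  "${GH:-gh}" issue edit "$2" --repo "$1" --add-label "$labels"
}
```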

What ATL-E does

Detects agent-ready issues
Checks capability labels (requires-macos, requires-tablez, etc.)
Finds the best available agent (ranked by performance score)
Posts the assignment to the agent's Discord channel
Updates labels: removes agent-ready, adds in-progress + claimed-*
When the issue closes: frees the agent and assigns the next issue from the queue
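
The label update on assignment amounts to a small transformation of the issue's label set. This sketch assumes the `claimed-*` suffix is the agent name in lowercase (for example `claimed-pi-e`); the actual suffix format is not stated here.

```shell
# Sketch of the label change made when an issue is assigned.
# Assumption: claimed-* uses the lowercased agent name.

# Reads current labels (one per line) on stdin; $1 = agent name.
# Prints the new label set: agent-ready dropped, in-progress and
# claimed-<agent> added.
assigned_labels() {
  local agent
  agent=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  grep -vx 'agent-ready'
  printf '%s\n' in-progress "claimed-$agent"
}
```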

Priority order

high-priority + bug (critical bugs)
high-priority (priority features)
bug (regular bugs)
Everything else (oldest first)
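
The order above maps naturally to a numeric rank followed by an age tiebreak. In this sketch the input line format (`NUMBER CREATED_AT LABELS`, with ISO-8601 dates and comma-separated labels) is hypothetical, and applying oldest-first within every tier is an assumption; the list only states it for the last tier.

```shell
# Sketch of the priority order as a sortable rank (1 = highest).
# Input line format "NUMBER CREATED_AT LABEL[,LABEL...]" is hypothetical.

# Prints the rank for a comma-separated label list.
priority_rank() {
  case ",$1," in
    *,high-priority,*)
      case ",$1," in *,bug,*) echo 1 ;; *) echo 2 ;; esac ;;
    *,bug,*) echo 3 ;;
    *) echo 4 ;;
  esac
}

# Sorts issues by rank, then oldest first (ISO dates sort as strings),
# and prints the issue numbers in queue order.
order_queue() {
  while read -r num created labels; do
    printf '%s %s %s\n' "$(priority_rank "$labels")" "$created" "$num"
  done | LC_ALL=C sort -k1,1n -k2,2 | cut -d' ' -f3
}
```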

Troubleshooting

ATL-E not running

# Check if CronJob is suspended
kubectl get cronjob atl-agent -n atl-agent -o jsonpath='{.spec.suspend}'

# Check for failed jobs (FAILED shows <none> for jobs with no failed pods)
kubectl get jobs -n atl-agent -o custom-columns=NAME:.metadata.name,FAILED:.status.failed
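
If the suspend check prints `true`, the schedule is paused; clearing the flag with `kubectl patch` resumes it. A minimal sketch, assuming the same CronJob name and namespace as the commands above; the `KUBECTL` override is only there so the command can be previewed with `KUBECTL=echo`.

```shell
# Sketch: resume a suspended ATL-E CronJob.
# Set KUBECTL=echo to print the command instead of running it.
resume_atl_e() {
  "${KUBECTL:-kubectl}" patch cronjob atl-agent -n atl-agent \
    -p '{"spec":{"suspend":false}}'
}
```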

Notifications not sending

Check that the Discord webhook URL is valid
Check the logs for "Discord webhook failed"
Check whether the item is still inside its cooldown window (default: 120 minutes between re-notifications)
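
The cooldown rule reduces to a time comparison: with the default, an item notified at 09:00 is not re-notified before 11:00. This sketch takes Unix timestamps as arguments; how ATL-E actually stores the last-notified time in its state file is not shown here, and whether the boundary is inclusive is an assumption.

```shell
# Sketch of a cooldown check (default 120 minutes).
# Assumption: a re-notification is allowed once exactly the cooldown
# has elapsed (inclusive boundary).
# $1 = last notified (epoch), $2 = now (epoch), $3 = cooldown minutes
cooldown_elapsed() {
  [ $(( $2 - $1 )) -ge $(( ${3:-120} * 60 )) ]
}
```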

Agent not getting assigned

Possible causes:
- Agent is offline: no GitHub activity in the last 6 hours (configurable via assignment.offlineThresholdMinutes)
- Agent is already busy (has an active assignment in state)
- No agent-ready issues match the agent's capabilities
- Issue has the workflow-change or infra-change label (Codi-E only)
- Issue is already in-progress or claimed
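
The label-based causes in this list can be checked directly against an issue's labels; the offline, busy and capability causes depend on run-time state and are not covered. A minimal sketch with a hypothetical `skip_reason` helper taking comma-separated labels; the wording of its messages is illustrative.

```shell
# Sketch: label-based reasons ATL-E would pass over an issue.
# Input: comma-separated labels. Prints a reason, or nothing when the
# labels alone do not block assignment.
skip_reason() {
  case ",$1," in
    *,workflow-change,*|*,infra-change,*) echo "reserved for Codi-E" ;;
    *,in-progress,*) echo "already in progress" ;;
    *,claimed-*) echo "already claimed" ;;
    *,agent-ready,*) ;;
    *) echo "not approved (no agent-ready label)" ;;
  esac
}
```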

Agent Online Detection

ATL-E checks each agent's recent GitHub activity before assigning work. An agent is considered online if it has any GitHub event (push, comment, PR, etc.) within the configured threshold (default: 6 hours).

How it works:
1. ATL-E calls the GitHub Events API for each agent's username
2. Checks the timestamp of the most recent event
3. If the event is older than offlineThresholdMinutes, the agent is skipped

What counts as activity:
- Pushing commits
- Creating/commenting on PRs or issues
- Creating branches
- Any other action under the agent's account that produces a GitHub event

Logs show agent status each run:

Agent status: 3 online, 1 offline
Agent offline: Volt-E (last seen: 8h ago)

If all agents are offline, no assignments are made — issues stay in the agent-ready queue until an agent comes online.
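
The same check can be reproduced by hand against the REST endpoint `GET /users/{username}/events`. A minimal sketch, assuming `curl`, `jq` and GNU `date`; it sends no token, so it is rate-limited and only sees public events, which may differ from what ATL-E sees with its own credentials. The 360-minute default mirrors the 6-hour `offlineThresholdMinutes` default.

```shell
# Sketch of the online check. Requires curl, jq and GNU date.
# Unauthenticated: rate-limited and limited to public events.

# Prints created_at of the user's most recent event (empty if none).
last_event_time() {
  curl -fsS "https://api.github.com/users/$1/events?per_page=1" |
    jq -r '.[0].created_at // empty'
}

# Succeeds when the last event is no older than the threshold.
# $1 = last event (epoch), $2 = now (epoch), $3 = threshold minutes
is_online() {
  [ $(( $2 - $1 )) -le $(( ${3:-360} * 60 )) ]
}

check_agent() {
  local ts
  ts=$(last_event_time "$1")
  if [ -n "$ts" ] && is_online "$(date -u -d "$ts" +%s)" "$(date +%s)"; then
    echo "online: $1 (last event $ts)"
  else
    echo "offline: $1 (last event ${ts:-none})"
  fi
}
```

With the default threshold, an agent whose last event was 8 hours ago (as in the Volt-E log line) is reported offline.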

State file corrupted

# Delete state file to start fresh (loses cooldown/assignment tracking)
kubectl exec -n atl-agent <pod> -- rm /data/state.json
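
Because deleting the file discards cooldown and assignment tracking, it can be worth copying it off the pod first. A minimal sketch; `backup_state` and the default `state.json.bak` destination are hypothetical, it needs a running pod that mounts `/data`, and `KUBECTL=echo` previews the command.

```shell
# Sketch: copy the state file off the pod before deleting it.
# Needs a running pod with /data mounted; KUBECTL=echo previews.
# $1 = pod name, $2 = local destination (default ./state.json.bak)
backup_state() {
  "${KUBECTL:-kubectl}" exec -n atl-agent "$1" -- cat /data/state.json \
    > "${2:-state.json.bak}"
}
```

Running `jq empty state.json.bak` afterwards exits non-zero if the copy is not valid JSON, which confirms the corruption before anything is deleted.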

Capability Labels

Label	Required Capability	Eligible Agents
`requires-macos`	`macos`	iBuild-E
`requires-tablez`	`tablez`	Pi-E, Volt-E
`docs-only`	`docs`	All (except Review-E)
`workflow-change`	—	Codi-E only (ATL-E skips)
`infra-change`	—	Codi-E only (ATL-E skips)
(no label)	—	Any agent
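
The table reads as a two-step lookup: label to required capability, then a check against the agent's capability set. In this sketch the per-agent capability strings (for example `"docs tablez"` for Pi-E) are hypothetical values consistent with the table, not ATL-E's actual configuration.

```shell
# Sketch of the capability table as a lookup.
# Agent capability lists passed to can_take are hypothetical.

# Label -> required capability (empty when none is required).
required_capability() {
  case "$1" in
    requires-macos)  echo macos ;;
    requires-tablez) echo tablez ;;
    docs-only)       echo docs ;;
    *)               echo "" ;;
  esac
}

# Succeeds if an agent with the given space-separated capabilities
# may take an issue carrying this label.
# $1 = label, $2 = agent capabilities, e.g. "docs tablez"
can_take() {
  local need
  # Labels reserved for Codi-E are never assigned by ATL-E.
  case "$1" in workflow-change|infra-change) return 1 ;; esac
  need=$(required_capability "$1")
  [ -z "$need" ] && return 0
  case " $2 " in *" $need "*) return 0 ;; *) return 1 ;; esac
}
```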