Operations
How It Works
ATL-E runs as a k8s CronJob every 15 minutes. Each run:
- Loads persisted state from /data/state.json
- Polls all repos for open PRs and issues
- Computes PR states via the state machine
- Evaluates notification rules (respecting cooldowns)
- Checks agent online status via GitHub Events API
- Finds agent-ready issues and assigns to online agents only
- Detects completed assignments (closed issues)
- Sends Discord notifications
- Saves state
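The load and save steps that bracket each run can be sketched as follows. This is a minimal illustration, not ATL-E's actual code: the function names and the empty-dict default for a first run are assumptions, and the real schema of state.json is not documented here.

```python
import json
import os
import tempfile

STATE_PATH = "/data/state.json"  # path taken from the run description above


def load_state(path=STATE_PATH):
    """Return the persisted state, or an empty dict if no file exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # assumption: a first run starts from empty state


def save_state(state, path=STATE_PATH):
    """Write state to a temp file in the same directory, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temp file behind
        raise
```

Writing to a temporary file first means a run that dies mid-write leaves the previous state file untouched, which is one way to reduce the chance of a corrupted state file.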
Monitoring
# Check CronJob status
kubectl get cronjobs -n atl-agent
# See recent jobs
kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp
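# View logs from the most recent run (jobs sorted oldest to newest, so the last item is the latest)
kubectl logs -n atl-agent job/$(kubectl get jobs -n atl-agent --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}')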
# Check state file
kubectl exec -n atl-agent $(kubectl get pods -n atl-agent -o jsonpath='{.items[0].metadata.name}') -- cat /data/state.json
Issue Workflow
For the user (Stig-Johnny)
- Review issues in the backlog
- Add agent-ready label to approve for work
- Optionally add high-priority for urgency
- ATL-E handles the rest
What ATL-E does
- Detects agent-ready issues
- Checks capability labels (requires-macos, requires-tablez, etc.)
- Finds best available agent (by performance score)
- Posts assignment to agent's Discord channel
- Updates labels: removes agent-ready, adds in-progress + claimed-*
- When issue closes: frees agent, assigns next from queue
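The agent-selection step can be pictured as a filter followed by a max. The sketch below is only an illustration under stated assumptions: the Agent fields (capabilities, online, busy, score) and the pick_agent name are hypothetical, and how ATL-E actually computes the performance score is not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    # All fields are hypothetical stand-ins for whatever ATL-E tracks per agent.
    name: str
    capabilities: set = field(default_factory=set)
    online: bool = True
    busy: bool = False
    score: float = 0.0  # performance score; higher is assumed to be better


def pick_agent(agents, required_capability: Optional[str] = None) -> Optional[Agent]:
    """Return the highest-scoring online, idle agent that has the capability."""
    candidates = [
        a for a in agents
        if a.online
        and not a.busy
        and (required_capability is None or required_capability in a.capabilities)
    ]
    # default=None covers the case where no agent qualifies
    return max(candidates, key=lambda a: a.score, default=None)
```

When no agent qualifies, the function returns None, which corresponds to the issue staying in the queue.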
Priority order
1. high-priority + bug (critical bugs)
2. high-priority (priority features)
3. bug (regular bugs)
4. Everything else (oldest first)
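The ordering above amounts to sorting by a rank derived from labels, then by age. The following is a small sketch, assuming each issue is a dict with a "labels" list and a "created_at" ISO 8601 UTC timestamp (both hypothetical shapes). Applying oldest-first as the tie-breaker within every rank, not only the last one, is also an assumption.

```python
def priority_rank(labels):
    """Map an issue's labels to a rank; lower ranks are assigned first."""
    high = "high-priority" in labels
    bug = "bug" in labels
    if high and bug:
        return 0  # critical bugs
    if high:
        return 1  # priority features
    if bug:
        return 2  # regular bugs
    return 3      # everything else


def sort_queue(issues):
    """Order issues by rank, then oldest first within a rank."""
    # ISO 8601 UTC strings in the same format sort chronologically as text.
    return sorted(issues, key=lambda i: (priority_rank(i["labels"]), i["created_at"]))
```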
Troubleshooting
ATL-E not running
# Check if CronJob is suspended
kubectl get cronjob atl-agent -n atl-agent -o jsonpath='{.spec.suspend}'
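# Check for jobs with no successful completion (also lists jobs still running);
# Jobs do not support a status.failed field selector, only status.successful
kubectl get jobs -n atl-agent --field-selector status.successful=0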
Notifications not sending
- Check that the Discord webhook URL is valid
- Check the logs for "Discord webhook failed"
- Verify that the cooldown is not still active (default: 120 min between re-notifications)
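To reason about whether a notification is being suppressed, the cooldown rule can be modeled as a simple time comparison. This is a hedged sketch: the should_notify name and the last_sent argument are hypothetical, and treating the exact 120-minute boundary as "allowed" is an assumption.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_COOLDOWN = timedelta(minutes=120)  # default stated above


def should_notify(last_sent, now=None, cooldown=DEFAULT_COOLDOWN):
    """Return True if nothing was sent yet or the cooldown has elapsed."""
    if last_sent is None:
        return True  # first notification for this item
    now = now or datetime.now(timezone.utc)
    return now - last_sent >= cooldown
```

If the state file records a last-sent time less than 120 minutes ago, a check like this would return False and the re-notification would be held back until a later run.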
Agent not getting assigned
Possible causes:
- Agent is offline: no GitHub activity within the threshold (default: 6 hours; configurable via assignment.offlineThresholdMinutes)
- Agent is already busy (has an active assignment in state)
- No agent-ready issues match agent's capabilities
- Issue has workflow-change or infra-change label (Codi-E only)
- Issue is already in-progress or claimed
Agent Online Detection
ATL-E checks each agent's recent GitHub activity before assigning work. An agent is considered online if it has any GitHub event (push, comment, PR, etc.) within the configured threshold (default: 6 hours).
How it works:
1. ATL-E calls the GitHub Events API for each agent's username
2. Checks the timestamp of the most recent event
3. If the event is older than offlineThresholdMinutes, the agent is skipped
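The three steps above can be sketched in Python. The endpoint is GitHub's public "list events for a user" REST route. The function names, the optional token parameter, and the choice to treat an empty event list as offline are assumptions for illustration, not ATL-E's actual implementation.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def fetch_events(username, token=None):
    """Fetch a user's recent events from the GitHub REST API."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{username}/events",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # optional; unauthenticated requests have lower rate limits
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def is_online(events, now, offline_threshold_minutes=360):
    """True if the newest event is within the threshold (360 min = 6 hours)."""
    if not events:
        return False  # assumption: no visible events means offline
    timestamps = [
        datetime.strptime(e["created_at"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        for e in events
    ]
    return now - max(timestamps) <= timedelta(minutes=offline_threshold_minutes)
```

A caller would combine the two, for example `is_online(fetch_events("some-agent"), datetime.now(timezone.utc))`, where "some-agent" is a placeholder username.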
What counts as activity:
- Pushing commits
- Creating/commenting on PRs or issues
- Creating branches
- Any GitHub API action under the agent's account
Logs show agent status each run:
Agent status: 3 online, 1 offline
Agent offline: Volt-E (last seen: 8h ago)
If all agents are offline, no assignments are made — issues stay in the agent-ready queue until an agent comes online.
State file corrupted
# Delete state file to start fresh (loses cooldown/assignment tracking)
kubectl exec -n atl-agent <pod> -- rm /data/state.json
Capability Labels
| Label | Required Capability | Eligible Agents |
|---|---|---|
| requires-macos | macos | iBuild-E |
| requires-tablez | tablez | Pi-E, Volt-E |
| docs-only | docs | All (except Review-E) |
| workflow-change | none | Codi-E only (ATL-E skips) |
| infra-change | none | Codi-E only (ATL-E skips) |
| (no label) | none | Any agent |
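The table can also be read as a lookup from issue labels to required capabilities. The sketch below transcribes the rows as data. The dictionary layout, the function name, and the rule that an issue carrying several capability labels requires all of them are assumptions.

```python
# Rows from the table above, transcribed as data (the structure is hypothetical).
CAPABILITY_FOR_LABEL = {
    "requires-macos": "macos",
    "requires-tablez": "tablez",
    "docs-only": "docs",
}
SKIPPED_LABELS = {"workflow-change", "infra-change"}  # Codi-E only; ATL-E skips


def required_capabilities(issue_labels):
    """Return the set of required capabilities, or None if ATL-E skips the issue."""
    labels = set(issue_labels)
    if labels & SKIPPED_LABELS:
        return None
    return {CAPABILITY_FOR_LABEL[name] for name in labels if name in CAPABILITY_FOR_LABEL}
```

An empty set corresponds to the "(no label)" row, meaning any agent is eligible.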