C4 Architecture Diagrams
Diagram-first design. If a diagram is too complex, the code is too complex.
Level 1: System Context
Who uses ATL-E and what external systems does it talk to?
graph TB
User["Stig-Johnny<br/>(Product Owner)"]
ATL["ATL-E<br/>Agent Coordinator"]
GH["GitHub<br/>Issues, PRs, CI"]
DC["Discord<br/>Agent Communication"]
Agents["Agent Fleet<br/>Pi-E, iBuild-E, Volt-E, Review-E"]
User -->|Views dashboard<br/>Labels issues agent-ready| ATL
ATL -->|Polls issues, PRs, CI<br/>Updates labels| GH
ATL -->|Reads agent activity<br/>Posts assignments, pings| DC
ATL -->|Assigns work<br/>Monitors progress| Agents
Agents -->|Push code, create PRs| GH
Agents -->|Post updates, respond to pings| DC
style ATL fill:#4a90d9,color:#fff
style User fill:#2d6a4f,color:#fff
style GH fill:#333,color:#fff
style DC fill:#5865F2,color:#fff
style Agents fill:#e07c24,color:#fff
Kept simple: ATL-E sits between the user and the agents. Three external systems: GitHub, Discord, the agents themselves.
Level 2: Container Diagram
What runs inside ATL-E?
graph TB
subgraph "ATL-E Service"
WEB["Dashboard<br/>(Blazor Server)"]
HUB["SignalR Hub<br/>(Real-time updates)"]
GP["GitHub Poller<br/>(Background Service)"]
DB["Discord Bot<br/>(Background Service)"]
SD["Stall Detector<br/>(Background Service)"]
AE["Assignment Engine"]
ST["State Store<br/>(JSON file)"]
end
Browser["Browser"] -->|HTTPS| WEB
WEB -->|Push updates| HUB
HUB -->|WebSocket| Browser
GP -->|Poll every 2 min| GitHub["GitHub API"]
GP -->|Issue/PR changes| HUB
GP -->|New data| AE
DB -->|Read messages| Discord["Discord API"]
DB -->|Post assignments, pings| Discord
DB -->|Agent activity| HUB
SD -->|Check timeouts| AE
SD -->|Ping agents| DB
SD -->|Escalate| DB
AE -->|Read/Write| ST
AE -->|Update labels| GP
style WEB fill:#4a90d9,color:#fff
style HUB fill:#7c3aed,color:#fff
style GP fill:#333,color:#fff
style DB fill:#5865F2,color:#fff
style SD fill:#dc2626,color:#fff
style AE fill:#059669,color:#fff
style ST fill:#78716c,color:#fff
Five containers, each with one job, plus the in-process Assignment Engine and the State Store they share:
| Container | Type | Responsibility |
|---|---|---|
| Dashboard | Blazor Server page | Kanban board, agent status |
| SignalR Hub | WebSocket hub | Push real-time updates to browser |
| GitHub Poller | Background service | Poll issues/PRs/CI, update state |
| Discord Bot | Background service | Read/write Discord, track agent activity |
| Stall Detector | Background service | Detect stalls, ping agents, escalate |
| Assignment Engine | In-process logic | Match issues to agents, manage state |
| State Store | JSON file on PVC | Persist across restarts |
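The State Store row (a JSON file that survives restarts) can be illustrated with a small sketch. This is Python for illustration only, since ATL-E itself is a .NET service. The class shape, the default keys, and the temp-file-then-rename write are assumptions, not the actual implementation.

```python
import json
import os
import tempfile
from typing import Any, Dict


class StateStore:
    """In-memory state mirrored to a single JSON file (hypothetical sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Assumed top-level keys, mirroring the fields in the class diagram.
        self.data: Dict[str, Any] = {
            "issues": {}, "prs": {}, "agents": {}, "assignments": [],
        }

    def load(self) -> None:
        """Read state from disk; keep the empty defaults on first start."""
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                self.data.update(json.load(fh))

    def save(self) -> None:
        """Write to a temp file, then rename it over the real file,
        so a reader never sees a half-written state file."""
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.data, fh)
        os.replace(tmp_path, self.path)
```

Writing through a temporary file matters with a single replica on a PVC: if the pod is killed mid-write, the previous `state.json` is still intact on the next start.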
Level 3: Component Diagram — GitHub Poller
graph LR
subgraph "GitHub Poller"
POLL["PollLoop<br/>Timer: 2 min"]
IM["IssueMonitor<br/>Open issues + labels"]
PM["PrMonitor<br/>Open PRs + CI + reviews"]
LM["LabelManager<br/>Add/remove labels"]
AM["ActivityMonitor<br/>Agent GitHub events"]
end
POLL --> IM
POLL --> PM
POLL --> AM
IM -->|IssueSnapshot[]| STATE["State Store"]
PM -->|PrSnapshot[]| STATE
AM -->|AgentActivity[]| STATE
LM -->|Update labels| GH["GitHub API"]
STATE -->|Changes| HUB["SignalR Hub"]
No new concepts: the same monitors from v1, wrapped in a timer loop instead of a CronJob.
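The timer loop around the monitors could look roughly like the sketch below. It is written in Python purely for illustration (the service is .NET). `poll_once`, `poll_loop`, the monitor callables, and the dict used as state are hypothetical stand-ins for PollLoop, the three monitors, and the State Store.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical monitor signature: each call returns a list of snapshots.
Monitor = Callable[[], list]


def poll_once(monitors: Dict[str, Monitor], state: Dict[str, list]) -> List[str]:
    """Run every monitor once, store its snapshots, return the keys that changed."""
    changed = []
    for key, monitor in monitors.items():
        try:
            snapshots = monitor()
        except Exception as exc:  # one failing monitor must not stop the others
            print(f"{key} monitor failed: {exc}")
            continue
        if state.get(key) != snapshots:
            state[key] = snapshots
            changed.append(key)
    return changed


def poll_loop(monitors: Dict[str, Monitor], state: Dict[str, list],
              on_change: Callable[[List[str]], None],
              interval_s: float = 120.0,
              max_cycles: Optional[int] = None) -> None:
    """Call poll_once every interval_s seconds (2 minutes in the diagram)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        changed = poll_once(monitors, state)
        if changed:
            on_change(changed)  # e.g. forward the changes to the SignalR hub
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
```

Only changed keys are forwarded, so an idle cycle produces no dashboard traffic. `max_cycles` exists only to make the loop testable.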
Level 3: Component Diagram — Discord Bot
graph LR
subgraph "Discord Bot"
CLIENT["DiscordClient<br/>discord.net"]
READER["MessageReader<br/>Watch agent channels"]
WRITER["MessageWriter<br/>Post via bot (not webhooks)"]
TRACKER["ActivityTracker<br/>Last message per agent"]
end
CLIENT -->|Message received| READER
READER -->|Agent spoke| TRACKER
READER -->|Error keyword| ALERT["Stall Detector"]
TRACKER -->|Last seen times| STATE["State Store"]
STATE -->|Changes| HUB["SignalR Hub"]
WRITER -->|Send message| CLIENT
Key change from v1: a bot token replaces webhooks, so ATL-E can now read Discord, not just write to it.
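Reading Discord enables the MessageReader to ActivityTracker flow in the diagram. The sketch below shows that flow in Python for illustration only. The agent names come from the context diagram, while the error keywords, `handle_message`, and the tracker's shape are assumptions. No Discord library calls are shown.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Agent names from the Level 1 diagram; the keyword list is an assumption.
AGENTS = {"Pi-E", "iBuild-E", "Volt-E", "Review-E"}
ERROR_KEYWORDS = ("error", "failed", "blocked")


class ActivityTracker:
    """Keeps the timestamp of the latest message seen from each agent."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, datetime] = {}

    def record(self, agent: str, when: datetime) -> None:
        previous = self.last_seen.get(agent)
        # Messages can arrive out of order; never move a timestamp backwards.
        if previous is None or when > previous:
            self.last_seen[agent] = when


def handle_message(tracker: ActivityTracker, author: str, text: str,
                   when: datetime) -> Optional[str]:
    """Update the tracker; return the agent name if the text hints at trouble."""
    if author not in AGENTS:
        return None  # ignore humans and unrelated bots
    tracker.record(author, when)
    lowered = text.lower()
    if any(word in lowered for word in ERROR_KEYWORDS):
        return author  # the caller would hand this to the Stall Detector
    return None
```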
Level 3: Component Diagram — Stall Detector
graph TD
subgraph "Stall Detector"
LOOP["CheckLoop<br/>Timer: 5 min"]
RULES["StallRules"]
end
LOOP --> RULES
RULES -->|"Claimed >60min, no PR"| PING["Ping agent via Discord"]
RULES -->|"Pinged >15min, no response"| ESCALATE["Escalate to #admin"]
RULES -->|"PR approved, not merged >60min"| NOTIFY["Notify #admin"]
RULES -->|"CI failed >30min, no fix"| PING
RULES -->|"Agent offline >6h"| REPORT["Report in dashboard"]
PING --> WRITER["Discord Writer"]
ESCALATE --> WRITER
NOTIFY --> WRITER
5 stall rules, 3 kinds of action: ping the agent, alert #admin (escalate or notify), and report in the dashboard.
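The five rules can be expressed as plain threshold checks. The sketch below is Python for illustration only. The thresholds come from the diagram, but `WorkItem`, its field names, and the convention that `ci_failed_at` is cleared once a fix lands are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class WorkItem:
    """Hypothetical snapshot of one assignment; all field names are assumed."""
    claimed_at: Optional[datetime] = None
    has_pr: bool = False
    pinged_at: Optional[datetime] = None
    responded: bool = False
    approved_at: Optional[datetime] = None
    merged: bool = False
    ci_failed_at: Optional[datetime] = None  # assumed reset to None on a fix
    agent_last_seen: Optional[datetime] = None


def _older_than(ts: Optional[datetime], now: datetime, minutes: int) -> bool:
    return ts is not None and now - ts > timedelta(minutes=minutes)


def evaluate(item: WorkItem, now: datetime) -> List[str]:
    """Return the actions the stall rules call for, without duplicates."""
    actions = []
    if _older_than(item.claimed_at, now, 60) and not item.has_pr:
        actions.append("ping")
    if _older_than(item.pinged_at, now, 15) and not item.responded:
        actions.append("escalate")
    if _older_than(item.approved_at, now, 60) and not item.merged:
        actions.append("notify")
    if _older_than(item.ci_failed_at, now, 30):
        actions.append("ping")
    if _older_than(item.agent_last_seen, now, 6 * 60):
        actions.append("report")
    # Two rules can both ask for a ping; keep the first, preserve order.
    return list(dict.fromkeys(actions))
```

Keeping the rules as pure functions of a snapshot and a clock makes each threshold testable without Discord or GitHub.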
Level 3: Component Diagram — Dashboard
graph TD
subgraph "Dashboard (Blazor Server)"
BOARD["KanbanBoard.razor<br/>Issues by status"]
AGENTS["AgentPanel.razor<br/>Online/offline + current task"]
FEED["ActivityFeed.razor<br/>Recent events"]
end
HUB["SignalR Hub"] -->|Issue updates| BOARD
HUB -->|Agent status| AGENTS
HUB -->|Events| FEED
BOARD -->|5 columns| COL["Backlog | Ready | In Progress | In Review | Done"]
3 Blazor components. The SignalR hub pushes all data — no polling from the browser.
Kanban Columns
| Column | Source | Cards show |
|---|---|---|
| Backlog | `label:backlog` | Issue title, repo, age |
| Ready | `label:agent-ready` | Issue title, repo, priority |
| In Progress | `label:in-progress` | Issue title, agent, time elapsed |
| In Review | Open PR linked to issue | PR title, CI status, reviewer |
| Done | Closed in last 24h | Issue title, agent, duration |
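The column table can be read as a small classification function. The sketch below is Python for illustration only. The issue dict shape (`labels`, `open_pr`, `closed_at`) and the precedence when several sources match are assumptions, since the table does not specify either.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed precedence: the most advanced label wins when several are present.
LABEL_COLUMNS = (
    ("in-progress", "In Progress"),
    ("agent-ready", "Ready"),
    ("backlog", "Backlog"),
)


def kanban_column(issue: dict, now: datetime) -> Optional[str]:
    """Map an issue to its column name, or None if it is not on the board."""
    closed_at = issue.get("closed_at")
    if closed_at is not None:
        # Closed issues stay in Done for 24 hours, then drop off the board.
        return "Done" if now - closed_at <= timedelta(hours=24) else None
    if issue.get("open_pr"):
        return "In Review"
    labels = set(issue.get("labels", []))
    for label, column in LABEL_COLUMNS:
        if label in labels:
            return column
    return None
```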
Agent Panel
| Field | Source |
|---|---|
| Name | Config |
| Status | GitHub Events API + Discord last message |
| Current task | State Store (assignments) |
| Last seen | Latest of GitHub activity or Discord message |
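The "Last seen" and "Status" fields combine two activity sources. The sketch below is Python for illustration only. The function names and the 30-minute online window are assumptions, as the document does not state when an agent counts as online.

```python
from datetime import datetime, timedelta
from typing import Optional


def last_seen(github_at: Optional[datetime],
              discord_at: Optional[datetime]) -> Optional[datetime]:
    """Latest of the two activity timestamps, or None if neither exists."""
    candidates = [t for t in (github_at, discord_at) if t is not None]
    return max(candidates) if candidates else None


def status(seen: Optional[datetime], now: datetime,
           online_window: timedelta = timedelta(minutes=30)) -> str:
    """'online' if the agent was seen within the (assumed) window."""
    if seen is None:
        return "offline"
    return "online" if now - seen <= online_window else "offline"
```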
Level 4: Code — Key Classes
classDiagram
class AtlService {
+IServiceProvider Services
+Start()
+Stop()
}
class GitHubPoller {
-Timer _timer
-IssueMonitor _issues
-PrMonitor _prs
-ActivityMonitor _activity
+StartAsync()
+StopAsync()
}
class DiscordBotService {
-DiscordSocketClient _client
-MessageReader _reader
-ActivityTracker _tracker
+StartAsync()
+StopAsync()
+SendMessage(channelId, text)
}
class StallDetector {
-Timer _timer
-StallRules _rules
-DiscordBotService _discord
+StartAsync()
+CheckAll()
}
class DashboardHub {
+SendIssueUpdate(issue)
+SendAgentUpdate(agent)
+SendEvent(event)
}
class StateStore {
-Dictionary issues
-Dictionary prs
-Dictionary agents
-List assignments
+Update(data)
+Save()
+Load()
}
class AssignmentEngine {
+FindAssignments(issues, agents)
+IsAgentOnline(agent)
+IsAgentBusy(agent)
}
AtlService --> GitHubPoller
AtlService --> DiscordBotService
AtlService --> StallDetector
AtlService --> DashboardHub
GitHubPoller --> StateStore
GitHubPoller --> DashboardHub
DiscordBotService --> StateStore
DiscordBotService --> DashboardHub
StallDetector --> AssignmentEngine
StallDetector --> DiscordBotService
AssignmentEngine --> StateStore
7 classes. If this grows beyond ~10 key classes, the design is too complex.
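To make `FindAssignments`, `IsAgentOnline`, and `IsAgentBusy` concrete, here is a minimal sketch in Python, for illustration only. The first-come pairing and passing the online and busy sets as arguments are assumptions. The real engine may match on skills or priority.

```python
from typing import List, Set, Tuple


def find_assignments(ready_issues: List[str], agents: List[str],
                     online: Set[str], busy: Set[str]) -> List[Tuple[str, str]]:
    """Pair ready issues with agents that are online and not busy.

    Issues and agents are taken in list order; anything left over
    simply waits for the next cycle.
    """
    free = [agent for agent in agents if agent in online and agent not in busy]
    return list(zip(ready_issues, free))
```

For example, with three ready issues and only one free agent, only the first issue is assigned. The other two stay in the Ready column.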
Deployment
graph LR
subgraph "Dell k3s"
subgraph "atl-agent namespace"
DEP["Deployment<br/>(always-on, 1 replica)"]
PVC["PVC<br/>state.json"]
SVC["Service<br/>:8080"]
end
end
CF["Cloudflare Tunnel<br/>atl.dashecorp.com"] --> SVC
Browser --> CF
DEP --> PVC
DEP -->|HTTPS| GH["GitHub API"]
DEP -->|WSS| DC["Discord Gateway"]
Change from v1: CronJob → Deployment. Add Service + Cloudflare Tunnel for dashboard access.
Summary
| Metric | Value |
|---|---|
| Containers | 5 (Dashboard, Hub, GitHub Poller, Discord Bot, Stall Detector) |
| Blazor components | 3 (Board, Agents, Feed) |
| Background services | 3 |
| Key classes | 7 |
| External integrations | 2 (GitHub, Discord) |
| New credentials needed | 1 (Discord bot token for ATL-E) |