IT Service Management ▾
Incidents
89 records
INC0042891
12 min ago
RoadTrip agent fleet unresponsive -- 12 agents offline
P1 Critical Assigned: Silas
INC0042890
28 min ago
CarKeys vault API returning 503 on credential retrieval
P1 Critical Assigned: Octavia
INC0042889
1 hr ago
RoadCode IDE -- syntax highlighting broken for .rs files
P2 High Assigned: Silas
INC0042888
2 hrs ago
Memory system journal chain hash mismatch on alice node
P2 High Assigned: Alexa
INC0042887
3 hrs ago
Road web gateway -- /api/chat returns stale agent roster
P3 Medium Assigned: Roadie
INC0042886
5 hrs ago
BackRoad social feed -- post rendering delay > 3s on mobile
P3 Medium Assigned: Octavia
INC0042885
Yesterday
RoadView search index rebuild taking 4x longer than expected
P4 Low Assigned: Silas
INC0042884
Yesterday
OfficeRoad doc export -- PDF margins incorrect on A4
P4 Low Unassigned
INC0042883
2 days ago
RoadBand mesh -- cecilia node drops Tailscale connection daily
P3 Medium Assigned: Alexa
INC0042891
RoadTrip agent fleet unresponsive -- 12 agents offline
In Progress SLA: 18 min remaining · Opened 12 min ago · Updated 2 min ago
Incident Details
Number
INC0042891
State
In Progress
Priority
1 - Critical
Impact
1 - Enterprise
Urgency
1 - Critical
Category
Infrastructure
Subcategory
Agent Runtime
Assignment Group
Assigned To
Caller
Configuration Item
Business Service
Description
12 of 27 RoadTrip agents went offline simultaneously at 20:12 UTC. Affected agents: cecilia, olympia, gematria, portia, atticus, cicero, valeria, celeste, elias, ophelia, gaia, anastasia. Agents on alice and blackroad Pi hosts are still responding. WebSocket connections dropped and heartbeat pings are timing out. No deployment or config change in the last 6 hours. Suspect network partition or NATS broker issue on the secondary cluster.
Activity Stream
Silas
2 min ago
Work Note
Confirmed NATS broker on secondary cluster is unreachable. Failover didn't trigger because the health check interval was set to 60s instead of 10s. Patching now.
$ nats-server --signal reload
Reloading config on nats-secondary...
Cluster reconnection in progress: 8/12 agents recovered
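For reference, a rough sketch of why the 60s interval matters compared with 10s. This is an illustration only: it assumes failover fires after a fixed number of consecutive failed health checks, and the `misses_required` value of 3 is hypothetical, not taken from the actual broker configuration.

```python
def worst_case_detection_seconds(interval_s: float, misses_required: int = 3) -> float:
    """Upper bound on the time from broker failure to failover trigger.

    Assumption (hypothetical): failover triggers after `misses_required`
    consecutive failed checks, and checks run every `interval_s` seconds.
    A failure just after a successful check waits almost a full interval
    before the first failed check, so the bound is interval * misses.
    """
    return interval_s * misses_required


if __name__ == "__main__":
    for interval in (60, 10):
        bound = worst_case_detection_seconds(interval)
        print(f"interval={interval}s -> failover within ~{bound:.0f}s")
```

Under these assumptions, the misconfigured interval stretches worst-case detection by a factor of six, which would be consistent with failover not having triggered early in the outage.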
Octavia
8 min ago
Work Note
Escalated to P1. Impact is enterprise-wide -- any product using RoadTrip agent delegation is affected (RoadWork, Roadie, BackRoad).
Roadie
10 min ago
System
Auto-detected: 12 agents failed heartbeat check. Correlation rule fired: NATS cluster partition. Recommended KB article: KB0008412 "NATS broker failover procedure".
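A correlation rule of this kind could look roughly like the sketch below. It is not the actual rule: the function name, the agent-to-cluster mapping, and the threshold of 5 are all hypothetical and chosen only to show the idea of grouping heartbeat failures by cluster.

```python
from collections import Counter


def correlate_heartbeat_failures(failed_agents, agent_cluster, threshold=5):
    """Return clusters where at least `threshold` agents failed heartbeat.

    `agent_cluster` maps agent name -> cluster name (hypothetical mapping).
    Agents missing from the mapping are ignored rather than guessed at.
    """
    counts = Counter(
        agent_cluster[agent] for agent in failed_agents if agent in agent_cluster
    )
    return sorted(cluster for cluster, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    failed = ["cecilia", "olympia", "gematria", "portia", "atticus", "cicero",
              "valeria", "celeste", "elias", "ophelia", "gaia", "anastasia"]
    # Assumed for illustration: all failed agents sit on the secondary cluster.
    mapping = {name: "secondary" for name in failed}
    print(correlate_heartbeat_failures(failed, mapping))
```

Many simultaneous failures concentrated in one cluster point toward a shared cause such as a broker or network partition, rather than twelve independent agent faults.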
Alexa Amundson
12 min ago
Caller
Opened incident. "12 agents just dropped off the convoy. alice and blackroad hosts are fine but everything on the secondary cluster is gone. Need immediate help."