Incidents
89 records
INC0042891 · 12 min ago · RoadTrip agent fleet unresponsive -- 12 agents offline
INC0042890 · 28 min ago · CarKeys vault API returning 503 on credential retrieval
INC0042889 · 1 hr ago · RoadCode IDE -- syntax highlighting broken for .rs files
INC0042888 · 2 hrs ago · Memory system journal chain hash mismatch on alice node
INC0042887 · 3 hrs ago · Road web gateway -- /api/chat returns stale agent roster
INC0042886 · 5 hrs ago · BackRoad social feed -- post rendering delay > 3s on mobile
INC0042885 · Yesterday · RoadView search index rebuild taking 4x longer than expected
INC0042884 · Yesterday · OfficeRoad doc export -- PDF margins incorrect on A4
INC0042883 · 2 days ago · RoadBand mesh -- cecilia node drops Tailscale connection daily
INC0042891
RoadTrip agent fleet unresponsive -- 12 agents offline
In Progress · SLA: 18 min remaining · Opened 12 min ago · Updated 2 min ago
Incident Details
Number: INC0042891
State: In Progress
Priority: 1 - Critical
Impact: 1 - Enterprise
Urgency: 1 - Critical
Category: Infrastructure
Subcategory: Agent Runtime
Assignment Group: Platform Engineering
Assigned To: Silas
Caller: Alexa Amundson
Configuration Item: roadtrip.blackroad.io
Business Service: RoadTrip -- Agent Fleet
Description
12 of 27 RoadTrip agents went offline simultaneously at 20:12 UTC. Affected agents: cecilia, olympia, gematria, portia, atticus, cicero, valeria, celeste, elias, ophelia, gaia, anastasia. Agents on alice and blackroad Pi hosts are still responding. WebSocket connections dropped and heartbeat pings are timing out. No deployment or config change in the last 6 hours. Suspect network partition or NATS broker issue on the secondary cluster.
Activity Stream
SI · 2 min ago · Work Note
Confirmed NATS broker on secondary cluster is unreachable. Failover didn't trigger because the health check interval was set to 60s instead of 10s. Patching now.
$ nats-server --signal reload
Reloading config on nats-secondary...
Cluster reconnection in progress: 8/12 agents recovered
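To illustrate why the health check interval matters, here is a minimal sketch of worst-case detection delay. It assumes failover fires only after a fixed number of consecutive failed probes; that threshold (`FAILURES_REQUIRED`) and the function name are hypothetical, since the work note only gives the 60s and 10s intervals.

```python
def worst_case_detection_seconds(interval_s: int, failures_required: int) -> int:
    """Upper bound on the delay between a broker outage and failover firing.

    An outage can start just after a successful probe, so the first failed
    probe may be a full interval away; every additional required failure
    adds one more interval.
    """
    if interval_s <= 0 or failures_required <= 0:
        raise ValueError("interval and failure threshold must be positive")
    return interval_s * failures_required


# Hypothetical threshold: the work note gives the intervals, not this value.
FAILURES_REQUIRED = 3

for label, interval_s in (("misconfigured", 60), ("intended", 10)):
    delay = worst_case_detection_seconds(interval_s, FAILURES_REQUIRED)
    print(f"{label}: up to {delay}s before failover triggers")
```

Under this assumption the detection window scales linearly with the interval, so a 60s setting leaves the cluster undetected six times longer than the intended 10s.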
OC · 8 min ago · Work Note
Escalated to P1. Impact is enterprise-wide -- any product using RoadTrip agent delegation is affected (RoadWork, Roadie, BackRoad).
RD · 10 min ago · System
Auto-detected: 12 agents failed heartbeat check. Correlation rule fired: NATS cluster partition. Recommended KB article: KB0008412 "NATS broker failover procedure".
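A correlation rule of this kind could be sketched roughly as follows. This is not the platform's actual rule: the function name, the `min_failed` threshold, the cluster labels, and the placeholder healthy agents are all assumptions made for illustration. Only the 12 offline agent names and the 27-agent total come from the incident description.

```python
from typing import Iterable, Mapping, Optional


def correlate_partition(
    failed: Iterable[str],
    cluster_of: Mapping[str, str],
    min_failed: int = 3,
) -> Optional[str]:
    """Return the cluster that looks partitioned, or None.

    The signature checked here: at least `min_failed` agents missed their
    heartbeat, they all live on one cluster, and no agent on that cluster
    is still healthy.
    """
    failed_set = set(failed)
    if len(failed_set) < min_failed:
        return None
    clusters = {cluster_of[agent] for agent in failed_set}
    if len(clusters) != 1:
        return None  # failures span clusters, so not a single partition
    cluster = next(iter(clusters))
    members = {a for a, c in cluster_of.items() if c == cluster}
    return cluster if members <= failed_set else None


# The 12 agents named in the incident description.
offline = [
    "cecilia", "olympia", "gematria", "portia", "atticus", "cicero",
    "valeria", "celeste", "elias", "ophelia", "gaia", "anastasia",
]
# Hypothetical inventory: cluster labels and the healthy placeholders are invented.
inventory = {name: "secondary" for name in offline}
inventory.update({f"healthy-{i}": "primary" for i in range(1, 16)})

print(correlate_partition(offline, inventory))
```

Requiring that every member of the cluster is offline keeps the rule from firing on a handful of unrelated agent crashes that happen to share a cluster.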
AA · 12 min ago · Caller
Opened incident. "12 agents just dropped off the convoy. alice and blackroad hosts are fine but everything on the secondary cluster is gone. Need immediate help."