Resilience control plane (demo gateway active)

Agent Control Plane

Monitor failover, MCP health, gateway policy, and graceful user experience during infrastructure chaos.

Resilience Score: 98 (policy-weighted continuity)
Fallbacks: 0 (automatic recovery paths)
Retries: 0 (budgeted attempts this run)
Incidents: 0 (degraded events contained)
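The Retries card counts budgeted attempts for the current run. The Python below is a minimal sketch of one way a per-run retry budget could work. It assumes a single counter shared by every call in the run. `RetryBudget` and `call_with_budget` are hypothetical names, not part of ContinuityOps or the TrueFoundry gateway.

```python
class RetryBudget:
    """Caps the total number of retries one run may spend across all calls (hypothetical)."""

    def __init__(self, max_retries: int):
        self.max_retries = max_retries
        self.used = 0

    def try_acquire(self) -> bool:
        """Reserve one retry; return False once the run's budget is spent."""
        if self.used >= self.max_retries:
            return False
        self.used += 1
        return True


def call_with_budget(call, budget: RetryBudget):
    """Make one attempt, then retry only while the shared budget allows it."""
    while True:
        try:
            return call()
        except Exception:
            if not budget.try_acquire():
                raise  # budget exhausted: re-raise so the caller can fall back
```

Sharing one budget across calls stops a wide outage from multiplying load through retries. Once the budget is spent, failures go straight to the fallback path.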

Incident Commander

Submit the user-facing request and observe the recovery path.

Status: ready
Request ID: req_pending
Gateway route: tfy-gateway/us-east/primary-gpt
Confidence: 92%

Live Agent Health

Model mesh, gateway, and MCP fleet status.

Model mesh (continuityops/incident-agent): success, 118 ms
AI Gateway (tfy-gateway/us-east): success, 99.95%
MCP Gateway (5 governed tool servers): success, 5 servers

Recovery Analytics

Fallback activity across the current incident run

Weekly activity (Monday to Sunday): average 62%, peak 90%

Route Distribution

Where the agent found continuity

Primary routes: 5
Recovered routes: 2
Degraded routes: 1
Human approval: 0
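The four route buckets could be derived from per-request outcomes as sketched below. The `RouteEvent` fields are assumptions for illustration. So is the precedence order: human approval first, then primary, recovered, and degraded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RouteEvent:
    # Hypothetical record of how one request was ultimately served.
    request_id: str
    served_by_primary: bool
    recovered: bool        # a retry or fallback produced a full answer
    needs_approval: bool   # escalated to a human before responding


def bucket(event: RouteEvent) -> str:
    """Map one request outcome to a Route Distribution bucket."""
    if event.needs_approval:
        return "Human approval"
    if event.served_by_primary:
        return "Primary routes"
    if event.recovered:
        return "Recovered routes"
    return "Degraded routes"


def route_distribution(events):
    """Count outcomes per bucket, reporting empty buckets as 0 in dashboard order."""
    counts = Counter(bucket(e) for e in events)
    order = ["Primary routes", "Recovered routes", "Degraded routes", "Human approval"]
    return {name: counts.get(name, 0) for name in order}
```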

Resilience Timeline

Every gateway, model, policy, and MCP step in order.

Primary request accepted (08:48:24 AM)

req_9df3a21 routed through tfy-gateway/us-east

TrueFoundry AI Gateway, 14 ms
Provider instability detected (08:48:25 AM)

Claude unavailable; request routed through fallback policy

TrueFoundry AI Gateway, 610 ms, preview_recovery_path
MCP evidence recovered (08:48:26 AM)

Metrics server returned HTTP 500; switched to cached tool response

TrueFoundry MCP Gateway, 900 ms, preview_recovery_path
User response preserved (08:48:27 AM)

Gateway selected a backup model based on latency and reliability score

TrueFoundry AI Gateway, 142 ms, preview_recovery_path
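The last timeline step selects a backup model by latency and reliability score. The sketch below shows one way to do a weighted selection. It assumes reliability is a 0 to 1 success rate and that latency is normalized against a budget. The weights (0.7 and 0.3), the 2000 ms budget, and the candidate names are all hypothetical, not the gateway's actual policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    model: str
    p50_latency_ms: float
    reliability: float      # assumed 0.0..1.0, e.g. recent success rate
    healthy: bool = True


def select_backup(candidates: List[Candidate],
                  latency_budget_ms: float = 2000.0,
                  w_reliability: float = 0.7,
                  w_latency: float = 0.3) -> Optional[Candidate]:
    """Return the healthy candidate with the highest weighted score, or None."""

    def score(c: Candidate) -> float:
        # 1.0 means instantaneous; 0.0 means at or over the latency budget.
        latency_score = max(0.0, 1.0 - c.p50_latency_ms / latency_budget_ms)
        return w_reliability * c.reliability + w_latency * latency_score

    healthy = [c for c in candidates if c.healthy]
    return max(healthy, key=score, default=None)
```

Unhealthy models are filtered out before scoring. This means a provider marked unavailable is never chosen, however good its historical numbers are.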

Recovery Path

Model fallback, retry policy, and user continuity

Status: armed
Primary: openai/gpt-4.1 (healthy)
Gateway policy: tfy-gateway/us-east/primary-gpt (reliability score 0.94)
User response: preserved (2.4 s recovery)
Gateway will preserve the user response through degraded mode.

When a provider, gateway route, or MCP server fails, ContinuityOps records the failure, applies retry and fallback policy, and returns a transparent response instead of a dead-end error.
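The record, retry, fallback, and transparent-response sequence described above can be sketched as follows. `respond_with_continuity`, `Outcome`, and the route callables are hypothetical. A production version would also add backoff between attempts and would retry only errors known to be transient.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Outcome:
    text: str
    route: str                      # which route produced the answer ("none" if all failed)
    degraded: bool
    failures: List[str] = field(default_factory=list)


def respond_with_continuity(prompt: str,
                            routes: Sequence[Tuple[str, Callable[[str], str]]],
                            retries_per_route: int = 1) -> Outcome:
    """Try each route in order with retries; never end in an unhandled error."""
    failures: List[str] = []
    for index, (name, call) in enumerate(routes):
        for attempt in range(1 + retries_per_route):
            try:
                text = call(prompt)
                # Any answer from a non-primary route is marked degraded.
                return Outcome(text, name, degraded=index > 0, failures=failures)
            except Exception as err:  # broad catch for the sketch; record it for the timeline
                failures.append(f"{name} attempt {attempt + 1}: {err}")
    # Every route failed: return a transparent message instead of raising.
    return Outcome(
        "All model routes are currently unavailable; the failures have been "
        "recorded and the request can be retried.",
        route="none", degraded=True, failures=failures,
    )
```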

Chaos Lab

Inject realistic infrastructure failures.
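The wrapper below is a minimal example of the kind of fault injection a chaos lab might use. It assumes failures are injected at the call site by a seeded random generator, so a chaos run can be replayed exactly. `with_chaos` and `InjectedFault` are hypothetical names.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised by the chaos wrapper in place of a real call."""


def with_chaos(call: Callable[..., T], failure_rate: float, seed: int = 0) -> Callable[..., T]:
    """Wrap a call so roughly failure_rate of invocations fail, reproducibly."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    rng = random.Random(seed)  # private seeded generator: same seed, same failure pattern

    def wrapped(*args, **kwargs) -> T:
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {getattr(call, '__name__', 'call')}")
        return call(*args, **kwargs)

    return wrapped
```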

MCP Server Fleet

Tool status, latency, and recovery action.

MCP server (tool): status, latency; recovery note

Metrics (metrics.query): success, 84 ms; Prometheus adapter healthy
Logs (logs.search): success, 116 ms; log search index current
Runbooks (runbook.lookup): success, 42 ms; cached runbooks warmed
Statuspage (statuspage.check): success, 64 ms; provider status reachable
Ticketing (ticket.create): success, 95 ms; write path authorized
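The timeline step "switched to cached tool response" suggests that each MCP tool sits behind a last-known-good cache. The sketch below works under that assumption. `call_tool` stands in for whatever client actually invokes the MCP server. `ToolServerError` is a hypothetical exception carrying the HTTP status, and the 300 s freshness window is arbitrary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ToolServerError(Exception):
    """Hypothetical error raised when a tool server returns an HTTP error status."""

    def __init__(self, status: int):
        super().__init__(f"tool server returned HTTP {status}")
        self.status = status


class CachedToolClient:
    """Serve the last good response for a tool call when the server returns 5xx."""

    def __init__(self, call_tool: Callable[[str, dict], Any], max_age_s: float = 300.0):
        self._call_tool = call_tool
        self._max_age_s = max_age_s
        self._cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def call(self, tool: str, args: dict) -> Tuple[Any, str]:
        """Return (result, source), where source is "live" or "cached"."""
        key = (tool, repr(sorted(args.items())))
        try:
            result = self._call_tool(tool, args)
        except ToolServerError as err:
            if err.status < 500:
                raise  # client errors are not masked by the cache
            cached = self._cache.get(key)
            if cached and time.monotonic() - cached[0] <= self._max_age_s:
                return cached[1], "cached"
            raise  # no fresh cache entry: surface the failure to the recovery path
        self._cache[key] = (time.monotonic(), result)
        return result, "live"
```

Returning the source alongside the result lets the response tell the operator that the evidence came from cache, which keeps the degraded answer transparent.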

User Experience Continuity

The response the operator receives after degraded-mode recovery.

System: Ready

Run the demo to show how the user receives a useful response even when the model route or MCP server fails.