CF Sentinel
The AI that actually manages your Cloudflare account
Think OpenClaw — but for infrastructure health
Built on Agents SDK · Containers · Sandbox · AI Gateway · Workers AI
Peer Point Challenge · 2026
Inspired by OpenClaw
openclaw.ai — "The AI that actually does things"
OpenClaw (Personal)
- Clears your inbox, sends emails
- Manages your calendar
- Checks you in for flights
- Works via WhatsApp / Telegram
- Connects to real services, takes real actions
- Not a chatbot — an agent
CF Sentinel (Infrastructure)
- Monitors error rates, detects anomalies
- Watches audit logs for suspicious changes
- Manages SSL renewals, DNS health
- Works via dashboard / chat / alerts
- Connects to CF APIs, takes real actions
- Not a dashboard — an agent
Same philosophy: AI that acts, not AI that summarizes.
The Problem
Scattered Visibility
- Analytics dashboard for traffic
- Security tab for WAF/DDoS
- Separate audit log viewer
- SSL status buried in settings
- Workers metrics in another panel
- No unified health view
Reactive, Not Proactive
- Alert fatigue from noisy notifications
- No correlation between signals
- Manual root-cause investigation
- No historical pattern analysis
- Config drift goes unnoticed
- Audit log insights require SQL skills
What if your Cloudflare account had its own OpenClaw?
An AI agent that continuously monitors, correlates, analyzes,
and acts on what's happening — before you even ask.
"Your 5xx rate on api.example.com spiked after a WAF rule change 12 min ago.
I've prepared a rollback. Approve?"
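The example alert above depends on linking an anomaly to a recent config change. A minimal sketch of that correlation step follows, assuming hypothetical `AuditEvent` and `Anomaly` shapes and a 30-minute lookback window; none of these names or values come from the Cloudflare API.

```typescript
// Hypothetical record shapes; real audit log entries carry more fields.
interface AuditEvent {
  id: string;
  action: string; // e.g. "waf_rule.update"
  zone: string;
  timestamp: number; // epoch milliseconds
}

interface Anomaly {
  zone: string;
  metric: string; // e.g. "5xx_rate"
  detectedAt: number; // epoch milliseconds
}

// Return audit events in the same zone that happened within `windowMs`
// before the anomaly, most recent first.
function findCandidateCauses(
  anomaly: Anomaly,
  events: AuditEvent[],
  windowMs: number = 30 * 60 * 1000, // assumed lookback window
): AuditEvent[] {
  return events
    .filter((e) => e.zone === anomaly.zone)
    .filter((e) => e.timestamp <= anomaly.detectedAt)
    .filter((e) => anomaly.detectedAt - e.timestamp <= windowMs)
    .sort((a, b) => b.timestamp - a.timestamp);
}

const now = Date.UTC(2026, 0, 15, 12, 0, 0);
const minutes = (n: number): number => n * 60 * 1000;

const sampleEvents: AuditEvent[] = [
  { id: "a1", action: "waf_rule.update", zone: "api.example.com", timestamp: now - minutes(12) },
  { id: "a2", action: "dns_record.create", zone: "www.example.com", timestamp: now - minutes(5) },
  { id: "a3", action: "ssl.setting.update", zone: "api.example.com", timestamp: now - minutes(90) },
];

const causes = findCandidateCauses(
  { zone: "api.example.com", metric: "5xx_rate", detectedAt: now },
  sampleEvents,
);
console.log(causes.map((e) => e.action));
```

Time proximity only nominates candidates; the LLM analysis and the human approver still decide whether the change is the actual cause.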
CF Sentinel — What It Does
Continuous Monitoring
- Error rates (4xx/5xx), origin health
- WAF/DDoS events, SSL expiry
- DNS health, Workers errors
Audit Intelligence
- Real-time audit log analysis
- Config change correlation
- API token usage anomalies
AI-Powered Analysis
- Anomaly detection + root cause
- Past incident lookup (RAG)
- Natural language summaries
Actionable Alerts
- Smart dedup + severity scoring
- Human-in-the-loop approval
- Auto-remediation (with consent)
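"Smart dedup + severity scoring" could be sketched as below. The thresholds, the cooldown, and the names `severityFor` and `AlertDeduper` are illustrative assumptions, not part of any Cloudflare SDK.

```typescript
type Severity = "info" | "warning" | "critical";

interface AlertCandidate {
  zone: string;
  signal: string; // e.g. "5xx_rate"
  deviation: number; // standard deviations from baseline
  timestamp: number; // epoch milliseconds
}

// Illustrative thresholds, not tuned values.
function severityFor(deviation: number): Severity {
  const d = Math.abs(deviation);
  if (d >= 6) return "critical";
  if (d >= 3) return "warning";
  return "info";
}

class AlertDeduper {
  private lastSent = new Map<string, number>();
  private cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true (and records the alert) when no alert with the same
  // zone, signal, and severity was sent within the cooldown.
  shouldSend(alert: AlertCandidate): boolean {
    const key = `${alert.zone}:${alert.signal}:${severityFor(alert.deviation)}`;
    const last = this.lastSent.get(key);
    if (last !== undefined && alert.timestamp - last < this.cooldownMs) {
      return false;
    }
    this.lastSent.set(key, alert.timestamp);
    return true;
  }
}

const deduper = new AlertDeduper(10 * 60 * 1000);
const base = { zone: "api.example.com", signal: "5xx_rate" };
const first = deduper.shouldSend({ ...base, deviation: 3.5, timestamp: 0 });
const repeat = deduper.shouldSend({ ...base, deviation: 3.8, timestamp: 60_000 });
const escalated = deduper.shouldSend({ ...base, deviation: 7, timestamp: 120_000 });
const afterCooldown = deduper.shouldSend({ ...base, deviation: 3.2, timestamp: 11 * 60_000 });
console.log({ first, repeat, escalated, afterCooldown });
```

Severity is part of the dedup key, so an escalation from warning to critical is delivered even while the warning is still in its cooldown.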
Built on Bleeding-Edge Cloudflare
Every component runs on Cloudflare. No external dependencies are required; an external LLM provider is only an optional fallback.
Agents SDK NEW
— Stateful AI agent orchestration on Durable Objects. Tool use, memory, scheduling, human-in-the-loop. The brain of CF Sentinel.
Containers NEW
— Full Docker containers on Workers. Run complex analysis pipelines, Python ML models, custom monitoring scripts that exceed Workers limits.
Sandbox SDK BETA
— Isolated code execution for AI agents. Safely run LLM-generated diagnostic scripts, query builders, and remediation code.
AI Gateway GA
— Proxy & manage all LLM calls. Caching, rate limiting, cost tracking, fallback routing. Controls the AI spend.
Workers AI GA
— Serverless LLM inference (Llama 3, Mistral, embeddings). Powers analysis, summarization, anomaly explanation.
Architecture
Cron Triggers (every 1-5 min) → Agent (DO, Agents SDK) → CF APIs (GraphQL + REST)
Queues (event buffer) → Workers AI (via AI Gateway) → Analysis (Containers + Sandbox) → Alerts (Email / Webhook)
- D1: metrics & config
- R2: raw logs & reports
- Vectorize: incident embeddings
- KV: status cache
- Pages: dashboard UI
Agents SDK — The Brain
Stateful AI agents on Durable Objects · npm: agents
Key Capabilities
- Persistent state + scheduled alarms
- Tool calling + human-in-the-loop
- WebSocket real-time UI
Sentinel Tools
queryAnalytics · getAuditLogs
checkSSL · analyzeFirewall
compareBaseline · sendAlert
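import { Agent } from "agents";

// Env, SentinelState, and the helpers (queryAnalytics, detectAnomalies,
// buildAnalysisPrompt, scoreSeverity, sendAlert) are project code, not SDK exports.
export class SentinelAgent extends Agent<Env, SentinelState> {
  // Assumed to be registered as a schedule callback,
  // e.g. this.schedule("*/5 * * * *", "onSchedule").
  async onSchedule() {
    const metrics = await queryAnalytics(this.env, { last: "5m" });
    const anomalies = detectAnomalies(metrics, this.state.history);
    if (anomalies.length > 0) {
      const analysis = await this.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
        messages: [{ role: "user", content: buildAnalysisPrompt(anomalies) }]
      });
      // The model returns only text in `response`, so severity is scored in code
      // (scoreSeverity is a hypothetical helper).
      await sendAlert(this.env, { severity: scoreSeverity(anomalies), summary: analysis.response });
    }
    // Keep a bounded window (288 samples = 24 h at 5-minute intervals)
    // so the agent's state does not grow without limit.
    const history = [...this.state.history, metrics].slice(-288);
    this.setState({ ...this.state, lastCheck: Date.now(), history });
  }
}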
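The agent code calls `detectAnomalies` without showing it. One simple way such a helper could work is a one-sided z-score test against recent history, sketched below. The `MetricSample` shape, the threshold of 3, and the 12-sample minimum are assumptions for illustration, not the project's actual detector.

```typescript
// Hypothetical sample shape: fraction of requests that returned 5xx.
interface MetricSample {
  errorRate: number; // 0..1
}

interface DetectedAnomaly {
  metric: "errorRate";
  value: number;
  zScore: number;
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Population standard deviation around a known mean.
function stddev(xs: number[], mu: number): number {
  const variance = xs.reduce((sum, x) => sum + (x - mu) ** 2, 0) / xs.length;
  return Math.sqrt(variance);
}

// Flags the current sample when it sits `threshold` or more standard
// deviations above the historical mean. Only upward spikes are flagged.
function detectAnomalies(
  current: MetricSample,
  past: MetricSample[],
  threshold: number = 3,
  minSamples: number = 12,
): DetectedAnomaly[] {
  if (past.length < minSamples) return []; // not enough baseline yet
  const values = past.map((s) => s.errorRate);
  const mu = mean(values);
  const sigma = stddev(values, mu);
  if (sigma === 0) {
    // Perfectly flat baseline: any increase counts as anomalous.
    return current.errorRate > mu
      ? [{ metric: "errorRate", value: current.errorRate, zScore: Infinity }]
      : [];
  }
  const zScore = (current.errorRate - mu) / sigma;
  return zScore >= threshold
    ? [{ metric: "errorRate", value: current.errorRate, zScore }]
    : [];
}

// Baseline alternates between 1.0% and 1.2% error rate.
const baseline: MetricSample[] = Array.from({ length: 12 }, (_, i) => ({
  errorRate: i % 2 === 0 ? 0.01 : 0.012,
}));

const spike = detectAnomalies({ errorRate: 0.05 }, baseline);
const normal = detectAnomalies({ errorRate: 0.012 }, baseline);
console.log(spike.length, normal.length);
```

A z-score is cheap enough to run inside the agent on every tick; heavier seasonal models are the kind of work the deck assigns to Containers.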
Containers — Heavy Lifting
Full Docker containers on Cloudflare Workers
What They Enable
- Python/Go/Rust analysis pipelines
- Anomaly detection (scipy, numpy)
- Escape Workers CPU/memory limits
Use in CF Sentinel
- Time-series anomaly detection
- Batch log processing from R2
- Compliance report generation
# wrangler.toml
[[containers]]
name = "anomaly-detector"
class_name = "AnomalyDetector"  # container-backed Durable Object class (name assumed)
image = "cf-sentinel/anomaly-detector:latest"
max_instances = 3
Sandbox SDK — Safe Execution
Isolated environments for AI-generated code
The Challenge
AI agents need to run dynamically generated code:
diagnostic queries, data transformations, remediation scripts.
Running untrusted LLM output directly is dangerous.
Sandbox Solution
Sandbox SDK provides isolated, container-backed execution environments
with controlled access to APIs, time limits, and memory caps.
The agent can execute generated code with a much smaller blast radius.
CF Sentinel Use Cases
- Dynamic GraphQL query construction
- LLM-generated diagnostic scripts
- Custom alert rule evaluation
- User-defined monitoring expressions
- Safe "what-if" config analysis
- Ad-hoc data transformation
// Agent generates a diagnostic script, then runs it in isolation.
// env.SANDBOX.run() is shown as an illustrative interface; the published
// Sandbox SDK API may differ.
const diagnosticCode = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  messages: [{ role: "user", content: `Write JS to analyze: ${anomaly.description}` }]
});
const result = await env.SANDBOX.run({
  code: diagnosticCode.response,
  timeout: 5000, // milliseconds
  bindings: { ANALYTICS: env.ANALYTICS_API },
});
AI Gateway — Cost & Quality Control
Proxy layer managing all LLM interactions
Caching
- Semantic cache for similar queries
- Exact match cache for repeated analysis
- Projected 40-60% reduction in token spend
- Sub-100ms cached responses
Rate Limiting
- Per-zone request budgets
- Priority queue for critical alerts
- Graceful degradation under load
- Cost ceiling enforcement
Observability
- Token usage per analysis type
- Latency percentiles
- Error rate tracking
- Full request/response logging
Resilience
- Fallback: Workers AI → external provider
- Automatic retries with backoff
- Model routing (fast vs accurate)
- Guardrails & content filtering
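The per-zone request budgets listed under Rate Limiting could be tracked by the agent with a token bucket per zone, as in the sketch below. This is application-side logic under assumed names (`ZoneBudget`, `tryAcquire`), separate from whatever rate limiting is configured on the gateway. Letting critical alerts bypass the budget is a simplification of the priority queue.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number; // epoch milliseconds
}

class ZoneBudget {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Try to spend one LLM request for `zone` at time `nowMs`.
  // Critical requests always pass and do not consume budget.
  tryAcquire(zone: string, nowMs: number, critical: boolean = false): boolean {
    if (critical) return true;
    const bucket = this.buckets.get(zone) ?? { tokens: this.capacity, updatedAt: nowMs };
    const elapsedSec = Math.max(0, (nowMs - bucket.updatedAt) / 1000);
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSecond);
    bucket.updatedAt = nowMs;
    this.buckets.set(zone, bucket);
    if (bucket.tokens < 1) return false; // budget exhausted: degrade gracefully
    bucket.tokens -= 1;
    return true;
  }
}

// Two requests of burst capacity, refilled at one request per second.
const budget = new ZoneBudget(2, 1);
const results = [
  budget.tryAcquire("api.example.com", 0),
  budget.tryAcquire("api.example.com", 0),
  budget.tryAcquire("api.example.com", 0),
  budget.tryAcquire("api.example.com", 1000),
  budget.tryAcquire("api.example.com", 1000, true),
];
console.log(results);
```

The clock is passed in rather than read from `Date.now()`, which keeps the limiter deterministic and easy to test.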
What CF Sentinel Monitors
| Signal | Source | Frequency | AI Analysis |
| HTTP Error Rates | GraphQL Analytics | 1 min | Anomaly detection, trend correlation |
| WAF / DDoS Events | Firewall Events API | 1 min | Attack pattern classification |
| Origin Health | Health Check API | 1 min | Degradation prediction |
| SSL Certificates | SSL API | 1 hour | Expiry risk scoring |
| DNS Health | DNS Analytics | 5 min | Resolution failure analysis |
| Workers Errors | Workers Invocations | 1 min | Error clustering, root cause |
| Audit Log | Audit Logs API | 1 min | Suspicious activity detection |
| Bot Traffic | Bot Management API | 5 min | Bot score distribution shifts |
| Config Changes | Audit Logs + Zone API | 1 min | Drift detection, impact assessment |
| API Token Usage | Audit Logs | 5 min | Anomalous access patterns |
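With mixed polling frequencies, one option is a single one-minute cron tick that fans out only to the signals that are due. The sketch below encodes the Frequency column of the table; the `dueSignals` helper and the tick-counting convention are assumptions.

```typescript
interface SignalSchedule {
  signal: string;
  everyMinutes: number;
}

// Frequencies copied from the monitoring table.
const schedules: SignalSchedule[] = [
  { signal: "HTTP Error Rates", everyMinutes: 1 },
  { signal: "WAF / DDoS Events", everyMinutes: 1 },
  { signal: "Origin Health", everyMinutes: 1 },
  { signal: "SSL Certificates", everyMinutes: 60 },
  { signal: "DNS Health", everyMinutes: 5 },
  { signal: "Workers Errors", everyMinutes: 1 },
  { signal: "Audit Log", everyMinutes: 1 },
  { signal: "Bot Traffic", everyMinutes: 5 },
  { signal: "Config Changes", everyMinutes: 1 },
  { signal: "API Token Usage", everyMinutes: 5 },
];

// `tickMinute` is the number of whole minutes since the epoch for this tick.
function dueSignals(tickMinute: number, table: SignalSchedule[] = schedules): string[] {
  return table
    .filter((s) => tickMinute % s.everyMinutes === 0)
    .map((s) => s.signal);
}

console.log(dueSignals(7).length);   // one-minute signals only
console.log(dueSignals(10).length);  // adds the five-minute signals
console.log(dueSignals(120).length); // every signal, including SSL
```

In a Worker, `tickMinute` could be derived from the scheduled event time as `Math.floor(scheduledTime / 60000)`.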
Incident Memory — RAG Pipeline
Incident (anomaly detected) → Embed (bge-base-en) → Vectorize (similar incidents) → LLM (via AI Gateway) → Action (fix + explain)
Vectorize stores
- Incident embeddings + resolutions
- Root cause classifications
- Affected zones & TTR metrics
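The similar-incident lookup amounts to a nearest-neighbor search over stored embeddings. The in-memory sketch below stands in for the vector index to show the idea; the `IncidentRecord` shape and the 3-dimensional toy vectors are assumptions (real embeddings have hundreds of dimensions), and it does not use the Vectorize API.

```typescript
interface IncidentRecord {
  id: string;
  rootCause: string;
  resolution: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (normA * normB);
}

// Rank stored incidents by similarity to the query embedding; keep the best k.
function topK(queryEmbedding: number[], incidents: IncidentRecord[], k: number) {
  return incidents
    .map((incident) => ({
      incident,
      score: cosineSimilarity(queryEmbedding, incident.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const pastIncidents: IncidentRecord[] = [
  { id: "42", rootCause: "origin timeout after deploy", resolution: "roll back Worker", embedding: [1, 0, 0] },
  { id: "17", rootCause: "expired certificate", resolution: "reissue certificate", embedding: [0, 1, 0] },
  { id: "8", rootCause: "DNS misconfiguration", resolution: "restore record", embedding: [0, 0, 1] },
];

const matches = topK([0.9, 0.1, 0], pastIncidents, 2);
console.log(matches.map((m) => `#${m.incident.id} (${m.score.toFixed(2)})`));
```

The top matches and their stored resolutions would then be placed in the LLM prompt so the suggested fix is grounded in what worked before.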
AI produces
- "Similar to incident #42 (3 weeks ago)"
- "Cause: origin timeout after deploy"
- "Fix: roll back Worker v1.2.3 (87% confidence)"
Human-in-the-Loop Remediation
AI suggests, human approves, agent executes
Detect (5xx spike on api.example.com) → Analyze (correlate with audit log change) → Propose ("Roll back WAF rule #12345") → Approve (human clicks ✓ in dashboard) → Execute (agent applies via CF API)
Safety Levels
- Auto — cache purge, alert escalation
- Approve — rule changes, DNS updates
- Manual — account settings, SSL config
Powered By
- Agents SDK requestHumanApproval()
- WebSocket push to dashboard (DO)
- Sandbox for safe "dry run" preview
- Full audit trail in D1
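The three safety levels can be expressed as a small policy table that the agent consults before acting. The action identifiers below are hypothetical labels for the examples on the slide. Unknown actions fall back to the most restrictive level.

```typescript
type SafetyLevel = "auto" | "approve" | "manual";
type Decision = "execute" | "queue_for_approval" | "notify_only";

// Hypothetical action names mapped to the safety levels from the slide.
const policy = new Map<string, SafetyLevel>([
  ["cache.purge", "auto"],
  ["alert.escalate", "auto"],
  ["waf_rule.update", "approve"],
  ["dns_record.update", "approve"],
  ["account.settings.update", "manual"],
  ["ssl.config.update", "manual"],
]);

function safetyLevelFor(action: string): SafetyLevel {
  // Anything not explicitly listed is treated as manual-only.
  return policy.get(action) ?? "manual";
}

function decide(action: string, approved: boolean): Decision {
  const level = safetyLevelFor(action);
  if (level === "auto") return "execute";
  if (level === "approve") return approved ? "execute" : "queue_for_approval";
  return "notify_only"; // manual: the agent explains, a human acts
}

console.log(decide("cache.purge", false));
console.log(decide("waf_rule.update", false));
console.log(decide("waf_rule.update", true));
console.log(decide("ssl.config.update", true));
```

Defaulting to manual means a newly added tool cannot run unattended until someone classifies it.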
Dashboard — Pages + Durable Objects
Real-time health view with WebSocket updates
Account Overview
- Health score per zone (0-100)
- Active incidents & alerts
- Error rate sparklines
- Traffic volume trends
Incident Timeline
- Chronological event feed
- AI-generated summaries
- Correlated audit log entries
- Resolution status tracking
Chat Interface
- "Why is zone X error rate high?"
- "Show me audit changes today"
- "Compare this week vs last"
- Natural language → analytics
Approval Queue
- Pending remediation actions
- AI reasoning & confidence
- One-click approve/reject
- Action history & rollback
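The per-zone health score (0-100) in the Account Overview needs some scoring rule. The sketch below shows one possible penalty-based formula; every input field and weight is an invented placeholder to illustrate the shape of the calculation, not a calibrated model.

```typescript
// Hypothetical inputs; a real score would likely draw on more signals.
interface ZoneHealthInputs {
  errorRate5xx: number; // fraction of requests, 0..1
  originHealthy: boolean;
  daysUntilSslExpiry: number;
  activeIncidents: number;
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Start at 100 and subtract penalties. All weights are placeholders.
function healthScore(z: ZoneHealthInputs): number {
  let score = 100;
  score -= clamp(z.errorRate5xx * 1000, 0, 40); // 1% 5xx costs 10 points, capped at 40
  if (!z.originHealthy) score -= 30;
  if (z.daysUntilSslExpiry < 7) score -= 20;
  else if (z.daysUntilSslExpiry < 30) score -= 5;
  score -= clamp(z.activeIncidents * 5, 0, 10); // 5 points per incident, capped at 10
  return Math.round(clamp(score, 0, 100));
}

const healthy = healthScore({ errorRate5xx: 0, originHealthy: true, daysUntilSslExpiry: 90, activeIncidents: 0 });
const degraded = healthScore({ errorRate5xx: 0.02, originHealthy: true, daysUntilSslExpiry: 5, activeIncidents: 1 });
const worst = healthScore({ errorRate5xx: 1, originHealthy: false, daysUntilSslExpiry: 0, activeIncidents: 10 });
console.log({ healthy, degraded, worst });
```

Capping each penalty keeps one noisy signal from hiding the others in the final number.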
The Complete Stack
| Layer | Service | Role |
| Orchestration | Agents SDK + Durable Objects | Stateful agent lifecycle, scheduling, tool use |
| Compute | Workers + Cron Triggers | API calls, data processing, routing |
| Heavy Compute | Containers | ML models, batch analysis, report generation |
| Safe Execution | Sandbox SDK | LLM-generated code, dynamic queries |
| AI Inference | Workers AI | LLM analysis, embeddings, classification |
| AI Management | AI Gateway | Caching, rate limiting, cost control, fallbacks |
| Relational Data | D1 | Metrics history, config, incidents, alert rules |
| Object Storage | R2 | Raw logs (Logpush), reports, snapshots |
| Vector Search | Vectorize | Incident embeddings for RAG |
| Cache | Workers KV | Current status, dashboard state, config cache |
| Messaging | Queues | Decouple collection → analysis → alerting |
| Frontend | Pages | Dashboard SPA with real-time WebSocket |
| Notifications | Email Workers | Alert delivery, daily digests |
Why All-Cloudflare?
<50ms
API Latency (same network)
$5
Workers Paid Plan / month
Technical Advantages
- Same-network API calls = minimal latency
- Native auth via Service Bindings
- No egress costs (R2)
- Automatic global distribution
- Single deploy target (wrangler)
Operational Benefits
- Single vendor = one bill, one support
- Unified auth & permissions model
- Deployable with wrangler deploy
- Scales from 1 zone to 1000+
- Open-sourceable reference architecture
Roadmap
Phase 1 — Foundation Now
Core monitoring agent with Agents SDK. Cron-based data collection from GraphQL Analytics + Audit Logs. D1 storage. Basic alerting via Email Workers.
Phase 2 — Intelligence
Workers AI analysis via AI Gateway. Vectorize RAG for incident memory. Anomaly detection. Natural language summaries. Chat interface on Pages dashboard.
Phase 3 — Autonomy
Container-based ML pipelines. Sandbox for dynamic diagnostics. Human-in-the-loop remediation. Multi-account support. Compliance reporting.
Phase 4 — Platform
Open-source release. Custom monitoring plugin API. Community-contributed detection rules. Integration with Cloudflare Notifications system.
Demo Scenarios
Scenario 1: 5xx spike detected → agent correlates with WAF rule change in audit log → suggests rollback → human approves → agent executes
Scenario 2: SSL certificate expiring in 7 days → agent checks renewal status → finds validation stuck → alerts with specific fix steps
Scenario 3: Unusual API token activity → agent detects token used from new IP range → cross-references audit log → flags for security review
Scenario 4: "Why is latency high?" in chat → agent queries analytics, finds origin degradation, checks health checks, summarizes in plain English
CF Sentinel
OpenClaw for your Cloudflare account.
The AI that actually manages your infrastructure.
100% Cloudflare Stack · Agents SDK · Containers · Sandbox · AI Gateway · Workers AI
Questions? Let's build this.