Sentinel Observability
ML-powered anomaly detection that catches production failures 15 minutes before users notice — eliminating 89% of alert noise in the process.
Finding failures before users do.
A fintech platform was entirely reactive to outages — customers were always the first to notice degraded service. The team had 40+ dashboards with no unified alert routing, and alert fatigue had made engineers ignore pages.
ML-powered anomaly detection with correlated traces.
We built an ingestion pipeline on ClickHouse processing 2M spans/minute, with a Python ML model detecting anomalies 15 minutes before user-visible errors surfaced. A single Slack channel now shows only actionable, deduplicated incidents.
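As a rough illustration of the detect-then-deduplicate flow described above, here is a minimal Python sketch. It is not the production model: the case study does not say which algorithm was used, so a simple rolling z-score stands in for it. All names here (RollingZScoreDetector, IncidentDeduplicator, payments-api, p99_latency_ms), the window size, the threshold, and the 15-minute cooldown are hypothetical choices made for the example.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Anomaly:
    service: str
    metric: str
    value: float
    zscore: float


class RollingZScoreDetector:
    """Stand-in detector (assumption): flags samples far from a trailing baseline."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples
        self._history = {}  # (service, metric) -> deque of recent normal values

    def observe(self, service, metric, value):
        key = (service, metric)
        history = self._history.setdefault(key, deque(maxlen=self.window))
        anomaly = None
        if len(history) >= self.min_samples:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) >= self.threshold:
                    anomaly = Anomaly(service, metric, value, z)
        # Only normal samples update the baseline, so a sustained
        # incident does not gradually become the new "normal".
        if anomaly is None:
            history.append(value)
        return anomaly


class IncidentDeduplicator:
    """Suppresses repeat notifications for the same service/metric pair."""

    def __init__(self, cooldown_s=900.0):
        self.cooldown_s = cooldown_s
        self._last_sent = {}  # (service, metric) -> time of last notification

    def should_notify(self, anomaly, now_s):
        key = (anomaly.service, anomaly.metric)
        last = self._last_sent.get(key)
        if last is not None and now_s - last < self.cooldown_s:
            return False
        self._last_sent[key] = now_s
        return True


def run_demo():
    """Feeds one sample per minute: a steady baseline, then a sustained spike."""
    detector = RollingZScoreDetector()
    dedup = IncidentDeduplicator(cooldown_s=900.0)
    samples = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0,
               250.0, 260.0, 255.0]
    notified = []
    for minute, value in enumerate(samples):
        anomaly = detector.observe("payments-api", "p99_latency_ms", value)
        if anomaly and dedup.should_notify(anomaly, now_s=minute * 60.0):
            notified.append((minute, value))
    return notified


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, three consecutive anomalous samples yield a single notification, which is the behaviour a deduplicated incident channel aims for. A real deployment would read span-derived metrics from the ingestion store and post to a chat webhook instead of returning a list.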
The Impact
Quantifiable improvements across technical and business metrics.
15 min
Average early-warning time before user-visible errors.
−89%
Reduction in alert noise after ML-driven routing.
0
Major outages in the 6 months since deployment.
“We went from fighting fires to preventing them.”
— CTO, ClearPay