AI & Observability / 2024

Sentinel Observability

ML-powered anomaly detection that flags production failures an average of 15 minutes before users notice, while cutting alert noise by 89%.

Python · FastAPI · React · ClickHouse · OpenTelemetry
Role

Backend Architecture

Data Pipeline

Timeline

4-month delivery

The Challenge

Finding failures before the users do.

A fintech platform handled outages purely reactively: customers were always the first to notice degraded service. The team maintained more than 40 dashboards with no unified alert routing, and alert fatigue had taught engineers to ignore pages.

The Solution

ML-powered anomaly detection with correlated traces.

We built a ClickHouse-backed ingestion pipeline that processes 2M spans per minute, plus a Python ML model that detects anomalies an average of 15 minutes before user-visible errors surface. A single Slack channel now shows only actionable, deduplicated incidents.
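The case study doesn't describe the model itself, so the following is only a hedged illustration of the two steps named above: flag anomalies per service, then collapse them into deduplicated incidents. It is a minimal Python sketch that swaps in a rolling z-score in place of the unpublished ML model, and it treats anomalies for the same service that arrive within a few minutes of each other as one incident. `Sample`, `Incident`, `RollingZScore`, `detect`, `deduplicate`, the per-minute error-rate signal, and every threshold are assumptions, not part of Sentinel.

```python
# Hypothetical sketch: a rolling z-score stands in for Sentinel's unpublished
# ML model, and a per-service time-gap rule stands in for its routing logic.
from __future__ import annotations

from collections import deque
from collections.abc import Iterable
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Sample:
    service: str       # service that emitted the spans (assumed field)
    minute: int        # minute bucket, e.g. epoch seconds // 60
    error_rate: float  # share of spans with an error status in that minute


@dataclass
class Incident:
    service: str
    first_minute: int
    last_minute: int
    count: int = 1     # anomalous minutes folded into this incident


class RollingZScore:
    """Flags values more than `threshold` standard deviations from the mean
    of the last `window` non-anomalous values."""

    def __init__(self, window: int = 30, threshold: float = 3.0,
                 min_samples: int = 10, std_floor: float = 1e-3) -> None:
        self.threshold = threshold
        self.min_samples = min_samples  # warm-up before any scoring
        self.std_floor = std_floor      # avoids dividing by ~0 on a flat baseline
        self.history: deque[float] = deque(maxlen=window)

    def update(self, value: float) -> bool:
        n = len(self.history)
        if n >= self.min_samples:
            mean = sum(self.history) / n
            var = sum((x - mean) ** 2 for x in self.history) / (n - 1)
            std = max(sqrt(var), self.std_floor)
            if abs(value - mean) / std > self.threshold:
                # Anomalies stay out of the baseline so a sustained incident
                # keeps firing (trade-off: a permanent level shift never
                # becomes the new normal).
                return True
        self.history.append(value)
        return False


def detect(samples: Iterable[Sample], window: int = 30,
           threshold: float = 3.0) -> list[tuple[str, int]]:
    """Runs one detector per service; returns (service, minute) anomalies."""
    detectors: dict[str, RollingZScore] = {}
    anomalies: list[tuple[str, int]] = []
    for s in sorted(samples, key=lambda s: s.minute):
        det = detectors.get(s.service)
        if det is None:
            det = detectors[s.service] = RollingZScore(window, threshold)
        if det.update(s.error_rate):
            anomalies.append((s.service, s.minute))
    return anomalies


def deduplicate(anomalies: Iterable[tuple[str, int]],
                gap: int = 5) -> list[Incident]:
    """Folds anomalies for the same service into one incident while each
    arrives within `gap` minutes of the previous one."""
    open_by_service: dict[str, Incident] = {}
    incidents: list[Incident] = []
    for service, minute in sorted(anomalies, key=lambda a: a[1]):
        inc = open_by_service.get(service)
        if inc is not None and minute - inc.last_minute <= gap:
            inc.last_minute = minute
            inc.count += 1
        else:
            inc = Incident(service, minute, minute)
            open_by_service[service] = inc
            incidents.append(inc)
    return incidents


if __name__ == "__main__":
    # Synthetic data: a noisy 1-2% baseline with error spikes at minutes 40-42 and 50.
    spikes = {40, 41, 42, 50}
    data = [Sample("payments", m, 0.3 if m in spikes else (0.01 if m % 2 == 0 else 0.02))
            for m in range(60)]
    for incident in deduplicate(detect(data)):
        print(incident)
```

Four anomalous minutes become two incidents here, which is the kind of collapse that keeps a single alert channel readable. Keying incidents on service alone is the simplest possible fingerprint; a production router would plausibly add metric, endpoint, or trace attributes before posting anything to Slack.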

The Impact

Quantifiable improvements across technical and business metrics.

15min

Average early-warning time before user-visible errors.

−89%

Reduction in alert noise after ML-driven routing.

0

Major outages in the 6 months since deployment.

“We went from fighting fires to preventing them.”

— CTO, ClearPay