PromCon 2024 - Practical Anomaly Detection at Scale With PromQL

287 views

Prometheus Monitoring

1 day ago

Speakers: Jorge Creixell, Manoj Acharya
At Grafana Labs, we built a scalable anomaly detection system to aid in debugging. It answers questions such as: within a specified time range, did a particular service exhibit any anomalies in its USE metrics, RED metrics, or other KPIs? This helps us narrow the scope to a few “interesting” services when debugging errors in a complex system.
We started by building automatic per-service baselines for common USE and RED metrics, then expanded the system to include any counter or gauge that users tag with a special label. The baselines and alerting are based on standard deviation but account for seasonality over the past two weeks. This approach avoids noisy alerts during regular daily or weekly spikes. We have been running this system in production at scale and are about to roll it out to customers.
In this talk, we will present why we chose to adopt anomaly detection and the framework (to be open-sourced at PromCon) that we use to detect anomalies purely with PromQL. We will demonstrate how the baselines can be visualized in Grafana and how we group these alerts for troubleshooting purposes (DO NOT PAGE ON ANOMALY ALERTS!).
Additionally, we will showcase the flexibility of the framework and how our users can add anomaly detection to their custom metrics by simply adding a single label: asserts_anomaly="gauge".
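
The seasonality-aware, standard-deviation baseline mentioned above can be illustrated with a small Python sketch. It is not the framework from the talk, which is expressed in PromQL, and the description does not say how the seasonal history is combined with the recent one. The sketch assumes one simple choice: compute a mean ± k·σ band for the recent window and for the same window one week earlier, then take the union of the two bands. The function names, the one-week comparison, and the default `k = 2.0` are all hypothetical.

```python
from statistics import mean, pstdev


def band(samples, k=2.0):
    """Return (lower, upper) as mean -/+ k population standard deviations."""
    mu = mean(samples)
    sigma = pstdev(samples)
    return mu - k * sigma, mu + k * sigma


def seasonal_band(recent, same_window_last_week, k=2.0):
    """Assumed combination rule: widen the recent band so that it also
    covers the band computed for the same window one week earlier."""
    lo_recent, hi_recent = band(recent, k)
    lo_season, hi_season = band(same_window_last_week, k)
    return min(lo_recent, lo_season), max(hi_recent, hi_season)


def is_anomalous(value, recent, same_window_last_week, k=2.0):
    """Flag a value only if it falls outside the combined band."""
    lower, upper = seasonal_band(recent, same_window_last_week, k)
    return value < lower or value > upper


# A flat recent window, and a window from last week with a regular spike.
recent = [10, 10, 10, 10]
last_week = [10, 50, 10, 50]

# Without seasonal history the spike looks anomalous; with it, it does not.
print(is_anomalous(45, recent, recent))
print(is_anomalous(45, recent, last_week))
```

Under this assumption, a spike that also occurred at the same time last week stays inside the band, which matches the stated goal of avoiding noisy alerts during regular daily or weekly spikes.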
promcon.io/202...
