Monitoring, Logging, And Alerting In Kubernetes

28,070 views

DevOps Toolkit

A day ago

What is the best combination of tools for monitoring, logging, and alerting in Kubernetes?
#prometheus #grafana #loki #robusta
Consider joining the channel: / devopstoolkit
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Gist with the commands: gist.github.com/e593ff9fa1a34...
🔗 Prometheus: prometheus.io
🔗 Robusta: robusta.dev
🔗 Loki: grafana.com/oss/loki
🔗 Grafana: grafana.com/oss/grafana
🎬 Kubernetes Notifications, Troubleshooting, And Automation With Robusta: • Kubernetes Notificatio...
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please use calendly.com/vfarcic/meet to book a timeslot that suits you, and we'll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ Twitter: / vfarcic
➡ LinkedIn: / viktorfarcic
▬▬▬▬▬▬ 🚀 Courses, books, and podcasts 🚀 ▬▬▬▬▬▬
📚 Books and courses: www.devopstoolkitseries.com
🎤 Podcast: www.devopsparadox.com/
💬 Live streams: / devopsparadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Introduction to monitoring, logging, and alerting
00:35 Metrics And Alerting With Prometheus
08:04 Notifications With Robusta
10:01 Logs Collection With Loki
13:45 Dashboards With Grafana

Comments: 58
@DevOpsToolkit 2 years ago
IMPORTANT: I made a mistake in the video by saying that AlertManager is querying Prometheus. That's incorrect. It's the other way around. Prometheus is evaluating the rules and sending alerts to AlertManager which, in turn, is forwarding them to final destinations like Slack, email, etc. What do you use for monitoring, logging, and alerting? What's your favorite stack?
@oftheriverinthenight 2 years ago
Prometheus, Loki, alert manager, grafana, slack
@sriveralopez 2 years ago
pin this comment!! great video, loving it
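The corrected flow from the pinned comment (Prometheus evaluates the rules and pushes firing alerts, while Alertmanager only routes them to destinations) can be sketched as a toy model. This is a minimal Python sketch under stated assumptions, not how either tool is implemented: the class names, the `routes` mapping, the rule names, and the metric keys are all hypothetical and exist only to show which component does what.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """Hypothetical alerting rule: a name plus a condition over metric values."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    severity: str = "warning"


@dataclass
class AlertManager:
    """Toy stand-in for Alertmanager: it receives alerts and routes them."""
    routes: Dict[str, str]  # severity -> receiver name (illustrative only)
    delivered: List[Tuple[str, str]] = field(default_factory=list)

    def receive(self, alert_name: str, severity: str) -> None:
        # No metric queries happen here; the alert is only routed.
        receiver = self.routes.get(severity, "default")
        self.delivered.append((receiver, alert_name))


@dataclass
class Prometheus:
    """Toy stand-in for Prometheus: it owns the rules and evaluates them."""
    rules: List[Rule]
    alertmanager: AlertManager

    def evaluate(self, metrics: Dict[str, float]) -> None:
        for rule in self.rules:
            if rule.condition(metrics):
                # The firing alert is pushed from Prometheus to Alertmanager.
                self.alertmanager.receive(rule.name, rule.severity)


am = AlertManager(routes={"critical": "slack", "warning": "email"})
prom = Prometheus(
    rules=[
        Rule("HighErrorRate", lambda m: m["error_rate"] > 0.05, "critical"),
        Rule("HighLatency", lambda m: m["p99_latency_seconds"] > 1.0),
    ],
    alertmanager=am,
)

prom.evaluate({"error_rate": 0.10, "p99_latency_seconds": 0.4})
print(am.delivered)  # [('slack', 'HighErrorRate')]
```

Only `Prometheus.evaluate` ever looks at metrics; `AlertManager` has no reference back to Prometheus, which mirrors the direction of the arrow described in the correction.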
@coocoobau 2 years ago
I highly recommend kube-prometheus-stack, an all-in-one Helm chart that deploys Prometheus, Grafana, and Alertmanager, each with its own operator. So instead of pre-defining things in values.yaml, you can use CRs to define targets, rules, alerts, dashboards, datasources, etc., in a Kubernetes way. For the logging part, I found banzaicloud's logging-operator to be very interesting. It is, again, a way to simplify the deployment of software for log collection, aggregation, and shipment (Loki being just one possible destination). It is also built around an operator and deploys instances of fluentd and fluentbit.
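As an illustration of defining alerts through CRs rather than values.yaml, here is a minimal Python sketch that builds a `PrometheusRule` manifest (the Prometheus Operator resource, `monitoring.coreos.com/v1`) and prints it as JSON. The helper `prometheus_rule`, the resource name, the namespace, the alert name, and the threshold are all hypothetical choices made for this example. The expression assumes kube-state-metrics is installed and exposing `kube_pod_container_status_restarts_total`.

```python
import json


def prometheus_rule(name: str, namespace: str, alert: str, expr: str,
                    for_: str, severity: str, summary: str) -> dict:
    """Build a PrometheusRule manifest with one group and one alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": f"{name}.rules",
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,
                            "for": for_,  # how long expr must hold before firing
                            "labels": {"severity": severity},
                            "annotations": {"summary": summary},
                        }
                    ],
                }
            ]
        },
    }


# Hypothetical example values; adjust to your own cluster and conventions.
manifest = prometheus_rule(
    name="pod-restarts",
    namespace="monitoring",
    alert="PodRestartingOften",
    expr="increase(kube_pod_container_status_restarts_total[15m]) > 3",
    for_="5m",
    severity="warning",
    summary="A container restarted more than 3 times in 15 minutes",
)

print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -`. Whether the operator picks the rule up depends on the rule selector configured on the Prometheus resource, which varies by installation, so extra metadata labels may be needed.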
@nas1k 2 years ago
I think we should include tracing here. It can be Jaeger, Tempo, or something else. And all those things should be standardized by OpenTelemetry.
@DevOpsToolkit 2 years ago
You're right. I should have added tracing to that video. I'll work on making a follow up with tracing
@DevOpsToolkit A year ago
Here's Jaeger: kzbin.info/www/bejne/fHyTpptjbNN3ick
@luismorteo6112 A year ago
As ever, you are right. I tried Loki and wow! It works perfectly with Grafana. Thanks a lot, genius!
@soubinan 2 years ago
Great video!!! Observability is so important and allows a lot of evolution not yet explored today. I have the mnemonic AMLET for alerting, monitoring, logging, eventing (context and others), and tracing. I also think that Grafana is the de facto place to have all the data to observe (even as a SaaS). Thanks to Tempo and Loki, we can add more meaning to metrics dashboards (and I have a small preference for Sensu Go over Robusta to serve as the glue around all that) and leverage all that with a runbook system for auto-remediation (StackStorm, AWX, Ansible platform, Jenkins, Rundeck...). The dream!
@RaviSharma-vw7py 2 years ago
Thanks, Viktor, for your nice and informative video. Really helpful.
@azerbaijan50 A year ago
Nice explanation! Thank you very much.
@fpvclub7256 2 years ago
Fantastic!
@daivol666 2 years ago
It would be interesting to see an example using OpenTelemetry to gather the observability data (avoiding agent vendor lock-in) and using the OTel pipelines to expose the data to different vendor solutions.
@DevOpsToolkit 2 years ago
Adding it to my TODO list... :)
@vMILYAv 2 years ago
VictoriaMetrics (Operator)
@onemanops 2 years ago
Yes!
@MusheghDavtyan 2 years ago
very interesting
@andriespiitso9797 A month ago
I suggest you also add the alert example just like you do with querying. Otherwise great video enjoyed it 👍🏽
@jirityr 2 years ago
If you are serious about monitoring, you need to set up your own monitoring system even on managed Kubernetes like EKS, GKE, and AKS. I hope you will take the topic of monitoring further with an introduction to Prometheus Operator, Grafana Cloud Agent (and GCA Operator), Grafana Operator, and perhaps also Grafana Tempo. I would also love to see a separate video about VictoriaMetrics, which is much better than Prometheus itself.
@richarmunicosamaniego8216 A year ago
MELT stack = Monitoring, Event (alerting +OnCall), Logging and Tracing
@jemag 2 years ago
Would be interesting to have a deeper dive, things like Thanos, Tempo, Mimir, etc. Also, what do you think of using their jsonnet libraries to manage those? I found the community helm charts to be not that well maintained and jsonnet is actually pretty flexible for an enterprise setup
@DevOpsToolkit 2 years ago
I have Thanos on my TODO list. Adding Tempo and Mimir as well... :)
@claytoncastro2734 2 years ago
I second that request for Tempo. Great video as always, Viktor.
@ZachLanich A year ago
You're awesome
@DevOpsToolkit A year ago
Thanks a ton Zach
@oftheriverinthenight 2 years ago
In the latest versions, Grafana also shows the Alertmanager alerts, and they can be silenced from there too (bell icon).
@DevOpsToolkit 2 years ago
I was not aware of that. That's great news.
@gaetanbloch7119 A year ago
Thanks!
@DevOpsToolkit A year ago
Thanks a ton!
@gaetanbloch7119 A year ago
@DevOpsToolkit My pleasure! Can't help but contribute a bit as I'm binge-watching your videos. Plus, it was a pleasure collaborating with you on Geekle's conferences. Keep up the great work!
@vn7057 A year ago
A few months later, the Grafana stack has been extended and made more flexible: Grafana Tempo + OpenTelemetry for auto-instrumentation + Grafana Agent, Grafana Loki, Prometheus, and Grafana Alertmanager. It basically includes metrics, logs, APM/tracing, and alerts. Grafana can also add silences through the UI, so we don't need to expose Prometheus Alertmanager to mute alerts.
@SerhiiHromov A year ago
How did I miss this video, 2 days wasted. Thanks
@lhxperimental 2 years ago
This could not have come at a better time! Looking forward to part 2 with tracing, OpenTelemetry, etc., and maybe also covering the maintenance aspects. Prometheus does automatic data purging, which makes it maintenance-free. How does Loki compare? With logs, the data volumes are much larger and much more workload-dependent, so one could easily overwhelm the system. Plus, some organizations may need log archives to be kept for several years. How Loki supports that use case would be interesting to see. My organization uses Elasticsearch. Can Loki be a replacement for Elasticsearch today, or in the future? The reason I would prefer Loki over Elastic is that I can correlate logs with metrics, events, and maybe even traces. In the case of Java/Spring Boot based apps, tracing can be very simple to achieve with auto-instrumentation. This would provide great visibility into the workings of the application. I am exploring it myself this week.
@DevOpsToolkit 2 years ago
Adding those to my TODO list... :) In the meantime, you might want to join kzbin.info/www/bejne/gKLIaJ6hjb-tedE and ask those questions there as well.
@lhxperimental 2 years ago
@DevOpsToolkit Sure, added a reminder
@DevOpsToolkit A year ago
OpenTelemetry is finally finished and available at kzbin.info/www/bejne/pZaYioyebtKbsNk. Tracing is coming next.
@dmsalomon 2 years ago
Great stuff, but in my opinion the really tricky part is managing these things at scale. First of all, there is the storage aspect, but Prometheus also seems to break down when the cluster gets too big. At that point you need either a federated setup or something else, and it would be useful to hear your thoughts on that.
@DevOpsToolkit 2 years ago
You're right. When running at scale, a single Prometheus does not work (it cannot scale). I already have it on my TODO list to tackle that subject in one of the upcoming videos.
@dmsalomon 2 years ago
@DevOpsToolkit Amazing! Very much looking forward to that
@joebowbeer 2 years ago
I concur with others that Tracing is conspicuously absent, as is OpenTelemetry (OTEL), which is the emerging standard that ties all these CNCF pieces together with others such as Fluent Bit
@DevOpsToolkit 2 years ago
You're right. I got complaints in the past that the videos were too long, so I started making shorter ones, and that often results in things missing. I'll work on a second part of that video with tracing and OpenTelemetry.
@DevOpsToolkit A year ago
I finally got around making a video about OpenTelemetry (I wanted to explore it separately first). It's available at kzbin.info/www/bejne/pZaYioyebtKbsNk. Tracing is coming next.
@AhmedAyman-gs7oz A month ago
Great video. Are there any other self-managed logging solutions other than ELK/EFK and Loki-Grafana?
@DevOpsToolkit A month ago
There's fluentd and fluentbit for shipping logs.
@AhmedAyman-gs7oz A month ago
@DevOpsToolkit Thanks. I mean more on the storage side. For example, if I want to move away from AWS OpenSearch to decrease cost, are there any alternatives other than Loki?
@DevOpsToolkit A month ago
@AhmedAyman-gs7oz Loki is the only one I used besides elasticsearch (excluding managed solutions).
@samehammar8062 2 years ago
Thanks for this great video. What about the blackbox exporter?
@DevOpsToolkit 2 years ago
I should probably create a separate video about a selection of Prometheus exporters.
@samehammar8062 2 years ago
@DevOpsToolkit This would be nice 😊. Thank you 🙏
@jgarfield 2 years ago
How do we configure SSO for Grafana login?
@DevOpsToolkit 2 years ago
As far as I know, there is no SSO in Grafana open-source version (not sure about Enterprise). I would probably try something like Teleport (you'll find a video in this channel).
@vn7057 2 years ago
Grafana is a great tool, but it is somewhat like Jenkins: if a plugin is no longer maintained, you may get into trouble. There is a chance you won't be able to upgrade Grafana, or else the graphs will break. But I do love the direction they are taking with Loki + Tempo + displaying Prometheus. I would love to see a tool that generates dashboards and integrates the three of them. Do you plan to make, or already have, a video about Tempo?
@DevOpsToolkit 2 years ago
Tempo is on my todo list :)
@student_voice 11 months ago
Thanks, man. I got the error LogfmtErr. How do I solve it?
@DevOpsToolkit 11 months ago
Not sure without taking a closer look at what you have...
@student_voice 11 months ago
Can you share your email so that I can share a screenshot?
@student_voice 11 months ago
It is showing as: 👇 __error__ : LogfmtParserErr
@DevOpsToolkit 11 months ago
Please send me a DM on Twitter or LinkedIn. You'll find my info in the description of any video.