Cortex: How to Run a Rock Solid Multi-Tenant Prometheus - Friedrich Gonzalez, Adobe & Alan Protasio, Amazon Web Services
Cortex is a CNCF open-source project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. Friedrich will first introduce Cortex's current architecture and project status. The core of the talk then covers the resilience strategies and features in Cortex that prevent or reduce failures so that metrics keep flowing, including which ones were added recently and how operators can use all of them in 2023. The first important feature is the hash ring with a replication factor, which lets the cluster tolerate process crashes. Zone-aware replication helps tolerate zone outages. No less important are tenant limits, which help control cost and usage for specific tenants. Instance limits prevent individual processes from being overloaded. Finally, shuffle sharding reduces the blast radius of an outage.
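To make the hash ring, replication factor, and zone-aware replication more concrete, here is a minimal Python sketch. It is not Cortex's implementation: the `Ring` class, the instance and zone names, the token count, and the majority-quorum rule in `write_succeeds` are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, instances, tokens_per_instance=64):
        # instances: dict mapping instance name -> zone name
        self.zones = dict(instances)
        # Each instance owns several tokens spread around the ring.
        self.tokens = sorted(
            (_hash(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self._keys = [token for token, _ in self.tokens]

    def replicas(self, series_key, replication_factor=3, zone_aware=True):
        # Walk clockwise from the key's position, collecting distinct instances.
        start = bisect.bisect_right(self._keys, _hash(series_key))
        chosen, used_zones = [], set()
        for offset in range(len(self.tokens)):
            _, name = self.tokens[(start + offset) % len(self.tokens)]
            if name in chosen:
                continue
            # With zone awareness, take at most one replica per zone.
            if zone_aware and self.zones[name] in used_zones:
                continue
            chosen.append(name)
            used_zones.add(self.zones[name])
            if len(chosen) == replication_factor:
                break
        # May return fewer replicas if there are fewer zones than requested.
        return chosen


def write_succeeds(replicas, healthy):
    # Assumption: a write needs a majority of its replicas to acknowledge it.
    quorum = len(replicas) // 2 + 1
    return sum(1 for r in replicas if r in healthy) >= quorum


instances = {
    "ingester-a1": "zone-a", "ingester-a2": "zone-a",
    "ingester-b1": "zone-b", "ingester-b2": "zone-b",
    "ingester-c1": "zone-c", "ingester-c2": "zone-c",
}
ring = Ring(instances)
replicas = ring.replicas('tenant-1/http_requests_total{job="api"}')

# Simulate a full outage of zone-a, then of zone-a and zone-b together.
without_a = {n for n, z in instances.items() if z != "zone-a"}
only_c = {n for n, z in instances.items() if z == "zone-c"}
one_zone_down_ok = write_succeeds(replicas, without_a)
two_zones_down_ok = write_succeeds(replicas, only_c)
```

With three zones and a replication factor of three, each series lands on one instance per zone. Losing any single zone therefore still leaves two of the three replicas, which is enough for the assumed majority quorum, while losing two zones is not.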