monitoring: HomelabPrometheusTargetDown + HomelabDiskCritical

Closes the two high-priority gaps identified in ALERT_RULES.md:

- up == 0 (5m) as critical in the new group homelab-meta: scrape targets (node-exporter/cadvisor/blackbox/traefik) no longer fail silently.
- Disk critical at >95% (5m) as critical, in addition to the existing warning at >85%: covers write stalls affecting DB/appdata/cache.

ALERT_RULES.md tables and status section updated.

Takes effect after a Prometheus reload via a Komodo redeploy of the monitoring stack.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@@ -57,6 +57,15 @@ groups:
           summary: "Disk usage high on {{ $labels.mountpoint }}"
           description: "{{ $labels.mountpoint }} is above 85% used."
 
+      - alert: HomelabDiskCritical
+        expr: 100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 95
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Disk critically full on {{ $labels.mountpoint }}"
+          description: "{{ $labels.mountpoint }} is above 95% used. Writes may start to fail (DB, appdata, cache)."
+
       - alert: HomelabHighMemoryUsage
         expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 90
         for: 10m
@@ -130,3 +139,14 @@ groups:
         annotations:
           summary: "Critical container is down: {{ $labels.name }}"
           description: "The host textfile exporter reports that critical container {{ $labels.name }} is not running."
+
+  - name: homelab-meta
+    rules:
+      - alert: HomelabPrometheusTargetDown
+        expr: up == 0
+        for: 5m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Prometheus target down: {{ $labels.job }} / {{ $labels.instance }}"
+          description: "Scrape target {{ $labels.instance }} (job {{ $labels.job }}) is unreachable. Metrics from this target are silent — alerts built on them will not fire."
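
The new critical disk rule could be checked offline before the redeploy. The following is a minimal sketch of a promtool unit test, not part of this commit. It assumes the rules live in a file named alerts.yml (hypothetical; the diff does not show the filename) and uses made-up instance, mountpoint, and fstype labels. The input series put the filesystem at 96% used, so the alert should be firing once the 5m hold time has elapsed.

```yaml
# Sketch of a promtool unit test for HomelabDiskCritical.
# Assumption: the rule file is called alerts.yml and sits next to this test file.
rule_files:
  - alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 4 of 100 bytes available -> 100 * (1 - 4/100) = 96% used, above the 95% threshold.
      # Label values below are illustrative only.
      - series: 'node_filesystem_avail_bytes{instance="host:9100", mountpoint="/data", fstype="ext4"}'
        values: '4+0x10'
      - series: 'node_filesystem_size_bytes{instance="host:9100", mountpoint="/data", fstype="ext4"}'
        values: '100+0x10'

    alert_rule_test:
      # The condition holds from t=0, so after "for: 5m" the alert is firing at 6m.
      - eval_time: 6m
        alertname: HomelabDiskCritical
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: host:9100
              mountpoint: /data
              fstype: ext4
            exp_annotations:
              summary: "Disk critically full on /data"
              description: "/data is above 95% used. Writes may start to fail (DB, appdata, cache)."
```

Saved under a name of your choice, the file would be run with `promtool test rules <file>`. An analogous test for HomelabPrometheusTargetDown would feed an `up` series with value 0 and expect the `job` and `instance` labels on the alert.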