Fix operations report warnings
+1
-1
@@ -71,11 +71,11 @@ Deliberately not now - reasons in `docs/DECISIONS.md`, here only topic and T

 ## Recently completed (short log, max. 5 entries)

+- **2026-06-17** Komodo/Gitea webhooks normalized: active Komodo hooks for `Micha/homelab-infra` use the branch filter `master`; DB backup created before the host hotfix. Workflow rule updated to match.
 - **2026-06-13** Home Assistant MQTT integration connected in production: config entry `smarthome-mosquitto` is `loaded`, Mosquitto sees the HA client `homeassistant`; `check_config` green.
 - **2026-06-13** HA Energy Dashboard configured: grid, PV and storage set from SolarEdge Local, `energy/validate` without issues; HA backup created afterwards.
 - **2026-06-13** SolarEdge connected locally: `solaredge_modbus_multi` v3.2.5 via `192.168.178.111:1502`, device ID `1`; 68 entities incl. inverter, smart meter and battery; HA backup created afterwards.
 - **2026-06-13** Home Assistant restore test successful: isolated test from HA-native backup + Mosquitto appdata + domain-repo clone, HA HTTP/API/check_config green, MQTT publish/subscribe and retained topic after broker restart green. Report: `/mnt/user/backups/restore-reports/homeassistant-2026-06-13.md`.
-- **2026-06-13** Home Assistant foundation live: `smart-home` created in Komodo, Gitea webhook active, Authelia onboarding guard removed, HA-native auth + login ban active, HA backup created/verified and MQTT broker smoke test successful.

 ---

+9
-3
@@ -124,14 +124,20 @@ Mandatory steps when creating:
 1. Create the stack in Komodo from Gitea
 2. Enable `webhook_enabled` in Komodo
 3. Create the matching Gitea webhook for the current stack ID
-4. Check the Gitea hook against `http://komodo-core:9120/listener/github/stack/<stack-id>/deploy`
-5. Trigger a push or test delivery and check `last_status`/the Komodo deploy
-6. Document exceptions explicitly
+4. Set the branch filter in the Gitea hook to the production branch, currently `master`
+5. Check the Gitea hook against `http://komodo-core:9120/listener/github/stack/<stack-id>/deploy`
+6. Trigger a push or test delivery and check `last_status`/the Komodo deploy
+7. Document exceptions explicitly

 **Rule:** No new production GitOps stack without a working Gitea->Komodo webhook. Deliberate exceptions must be documented in the same change block, including the reason and the alternative deploy path.

 The default case uses the global `KOMODO_WEBHOOK_SECRET` from the Komodo host `.env`, unless Komodo explicitly shows a dedicated per-stack secret for the stack.

+The Gitea branch filter must not stay empty or `*` as long as the Komodo stack
+expects a specific repo branch. Otherwise feature/work branches trigger all
+stack listeners, Komodo rejects them with `request branch does not match expected`
+and the operations report gets useless Komodo/Traefik noise.
+
 ### Exception: Komodo access model

 Komodo **deliberately** stays without a central Traefik ForwardAuth middleware.
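The branch-filter rule in this hunk can be illustrated with a small sketch. It only models the decision described in the text: whether the hook fires for a pushed branch, and whether the stack listener then accepts the request. The function names are hypothetical, and approximating the filter with Python's `fnmatchcase` is an assumption, not Gitea's or Komodo's actual implementation.

```python
from fnmatch import fnmatchcase


def hook_fires(branch_filter: str, pushed_branch: str) -> bool:
    """Decide whether the webhook is delivered for a push.

    Assumption: an empty filter or "*" matches every branch; any other
    filter is treated as a glob pattern.
    """
    if branch_filter in ("", "*"):
        return True
    return fnmatchcase(pushed_branch, branch_filter)


def classify_delivery(branch_filter: str, expected_branch: str, pushed_branch: str) -> str:
    """Return "not sent", "deploy" or "rejected" for one push."""
    if not hook_fires(branch_filter, pushed_branch):
        return "not sent"      # filtered out on the Gitea side, no noise
    if pushed_branch == expected_branch:
        return "deploy"        # listener accepts the request
    return "rejected"          # listener was hit but the branch does not match


for branch_filter in ("*", "master"):
    for pushed in ("master", "feature/report"):
        print(branch_filter, pushed, classify_delivery(branch_filter, "master", pushed))
```

With the filter set to `master`, the feature-branch push never reaches the listener, which is the effect the new step 4 aims for.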
@@ -13,6 +13,7 @@ CERT_MAX_ROWS="${CERT_MAX_ROWS:-12}"
 IMAGE_AGE_WARN_DAYS="${IMAGE_AGE_WARN_DAYS:-180}"
 IMAGE_AGE_ALLOW_FILE="${IMAGE_AGE_ALLOW_FILE:-/mnt/user/services/homelab-infra/services/posture-check/image-age-allow.patterns}"
 LOG_VOLUME_TOP_N="${LOG_VOLUME_TOP_N:-10}"
+LOG_VOLUME_OBSERVE_THRESHOLD="${LOG_VOLUME_OBSERVE_THRESHOLD:-100000}"
 DISK_USAGE_WARN_PCT="${DISK_USAGE_WARN_PCT:-85}"
 CERT_WARN_DAYS="${CERT_WARN_DAYS:-21}"
 BACKUP_DRIFT_FACTOR="${BACKUP_DRIFT_FACTOR:-2.0}"
@@ -217,6 +218,73 @@ derive_report_status() {
   set_summary "report_status" "$REPORT_STATUS"
 }

+print_status_reasons() {
+  local count=0
+
+  add_reason() {
+    printf '%s\n' "- $1"
+    count=$((count + 1))
+  }
+
+  [ "${borg_status:-unknown}" != "completed" ] && add_reason "Borg Backup ist \`${borg_status:-unknown}\` statt \`completed\`."
+  [ "${prometheus_alerts:-0}" = "unknown" ] && add_reason "Prometheus Alerts konnten nicht sicher gelesen werden."
+  [ "${cert_warnings:-0}" != "0" ] && add_reason "Zertifikatswarnungen: \`${cert_warnings:-0}\`."
+  [ "${disk_warnings:-0}" != "0" ] && add_reason "Storage-Warnungen: \`${disk_warnings:-0}\`."
+  if [ "${image_warnings:-0}" != "0" ]; then
+    if [ -n "${image_warning_names:-}" ]; then
+      add_reason "Image-Warnungen: \`${image_warnings:-0}\` (${image_warning_names})."
+    else
+      add_reason "Image-Warnungen: \`${image_warnings:-0}\`."
+    fi
+  fi
+  [ "${containers_exited_nonzero:-0}" != "0" ] && add_reason "Container exited non-zero: \`${containers_exited_nonzero:-0}\`."
+  [ "${host_recent_boot:-0}" = "1" ] && add_reason "Host-Reboot innerhalb der letzten 24 Stunden."
+  [ "${backup_duration_drift:-0}" = "1" ] && add_reason "Backup-Dauer-Drift erkannt."
+  [ "${noise_threshold_exceeded:-0}" != "0" ] && add_reason "Noise-Pattern ueber Eskalations-Schwelle: \`${noise_threshold_exceeded:-0}\`."
+
+  if [ "${prometheus_alerts_pending:-0}" != "0" ] && [ "${prometheus_alerts_pending:-0}" != "unknown" ]; then
+    add_reason "Prometheus pending Alerts: \`${prometheus_alerts_pending:-0}\`."
+  fi
+  if [ "${prometheus_alerts_firing:-0}" != "0" ] && [ "${prometheus_alerts_firing:-0}" != "unknown" ]; then
+    add_reason "Prometheus firing Alerts: \`${prometheus_alerts_firing:-0}\`."
+  fi
+  [ "${containers_unhealthy:-0}" != "0" ] && add_reason "Unhealthy Container: \`${containers_unhealthy:-0}\`."
+
+  if [ "$count" -eq 0 ]; then
+    printf '%s\n' "- Keine direkten Ampel-Ausloeser im Summary-Set gefunden."
+  fi
+}
+
+print_notable_observations() {
+  local count=0
+
+  add_observation() {
+    printf '%s\n' "- $1"
+    count=$((count + 1))
+  }
+
+  if [ "${traefik_5xx:-0}" != "0" ] && [ "${traefik_5xx:-0}" != "unknown" ]; then
+    if [ -n "${traefik_5xx_top:-}" ] && [ "${traefik_5xx_top:-none}" != "none" ]; then
+      add_observation "Traefik 5xx: \`${traefik_5xx:-0}\` (Top-Gruppe: \`${traefik_5xx_top}\`)."
+    else
+      add_observation "Traefik 5xx: \`${traefik_5xx:-0}\`."
+    fi
+  fi
+  if [ "${log_highlights:-0}" != "0" ] && [ "${log_highlights:-0}" != "unknown" ]; then
+    add_observation "Log-Highlights: \`${log_highlights:-0}\` handlungsrelevante Treffer; Beispiele stehen in der Log-Auswertung."
+  fi
+  if printf '%s' "${log_volume_total:-0}" | grep -Eq '^[0-9]+$' && [ "${log_volume_total:-0}" -ge "$LOG_VOLUME_OBSERVE_THRESHOLD" ]; then
+    add_observation "Log-Volumen: \`${log_volume_total:-0}\` Zeilen im Zeitraum; Top-Verursacher stehen im Log-Volumen-Abschnitt."
+  fi
+  if [ "${docker_events:-0}" != "0" ] && [ "${docker_events:-0}" != "unknown" ]; then
+    add_observation "Docker Critical Events: \`${docker_events:-0}\`."
+  fi
+
+  if [ "$count" -eq 0 ]; then
+    printf '%s\n' "- Keine zusaetzlichen auffaelligen Beobachtungen im Management-Summary."
+  fi
+}
+
 collect_borg() {
   append "## Borg Backup"
   append ""
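The log-volume check in `print_notable_observations` first verifies that the value is purely numeric before comparing it, because summary values may also be strings such as `unknown`, and a numeric `[ ... -ge ... ]` test would fail on those. A minimal Python sketch of the same guard follows; the function name is hypothetical and the default threshold simply mirrors `LOG_VOLUME_OBSERVE_THRESHOLD`.

```python
import re


def exceeds_log_volume_threshold(raw: str, threshold: int = 100000) -> bool:
    """Mirror the shell guard: only compare when raw consists of ASCII digits."""
    if re.fullmatch(r"[0-9]+", raw) is None:
        return False           # "unknown", empty or malformed values never trigger
    return int(raw) >= threshold


for value in ("250000", "99999", "unknown", ""):
    print(repr(value), exceeds_log_volume_threshold(value))
```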
@@ -584,6 +652,7 @@ collect_image_freshness() {
   local image_file="$TMP_DIR/images.tsv"
   local image_warnings=0
   local image_allowed=0
+  local image_warning_names=""
   local now_epoch
   : > "$image_file"
   now_epoch="$(date +%s)"
@@ -630,6 +699,7 @@ collect_image_freshness() {
       else
         note="ueberaltert"
         image_warnings=$((image_warnings + 1))
+        image_warning_names="${image_warning_names:+$image_warning_names,}$name:${age_days}d"
       fi
     fi
     printf '%d\t%s\t%s\t%s\n' "$age_days" "$name" "$image_tag" "$note" >> "$image_file"
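The added line builds a comma-separated list with the `${var:+...}` expansion, which only emits the existing value plus a comma when the variable is already non-empty. A small Python sketch of the same accumulation, with a hypothetical helper name and made-up container names:

```python
def append_warning(names: str, name: str, age_days: int) -> str:
    """Append "name:<age>d" to a comma-separated string without a leading comma."""
    prefix = f"{names}," if names else ""   # same role as ${names:+$names,}
    return f"{prefix}{name}:{age_days}d"


acc = ""
for container, age in (("example-a", 200), ("example-b", 365)):
    acc = append_warning(acc, container, age)
print(acc)
```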
@@ -637,6 +707,7 @@ collect_image_freshness() {

   set_summary "image_warnings" "$image_warnings"
   set_summary "image_allowed" "$image_allowed"
+  set_summary "image_warning_names" "$image_warning_names"

   if [ ! -s "$image_file" ]; then
     append "- Keine Image-Daten verfuegbar."
@@ -781,8 +852,16 @@ collect_traefik_5xx() {
   set_summary "traefik_5xx" "$count"

   if [ "$count" -eq 0 ]; then
+    set_summary "traefik_5xx_top" "none"
     append "- Keine 5xx-Antworten."
   else
+    local top_group
+    top_group="$(awk '{ code=$9; service=$12; gsub(/"/, "", service); counts[service " " code]++ } END { for (k in counts) print counts[k], k }' "$file" \
+      | sort -nr \
+      | head -n 1 \
+      | awk '{ print $2 ":" $3 ":" $1 }' \
+      | sed -E 's#[^A-Za-z0-9_.:@/-]+#_#g')"
+    set_summary "traefik_5xx_top" "${top_group:-none}"
     append "- 5xx-Antworten: $count"
     append ""
     append "### Gruppiert nach Service/Code"
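The `top_group` pipeline counts `(service, code)` pairs, keeps the most frequent one, prints it as `service:code:count` and replaces unexpected characters with `_`. The sketch below reproduces that logic in Python under stated assumptions: the input lines are already limited to 5xx responses, fields are whitespace-separated with the status code in field 9 and the quoted service name in field 12 (as the `awk` call assumes), and the sample log lines are invented. Unlike the shell version, it skips lines with fewer than 12 fields and resolves ties by first occurrence.

```python
import re
from collections import Counter


def top_5xx_group(lines):
    """Return "service:code:count" for the most frequent pair, or "none"."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue                      # sketch-only guard for malformed lines
        code = fields[8]                  # awk $9
        service = fields[11].replace('"', "")  # awk $12 with quotes removed
        counts[(service, code)] += 1
    if not counts:
        return "none"
    (service, code), hits = max(counts.items(), key=lambda item: item[1])
    # Same character whitelist as the sed expression in the hunk.
    return re.sub(r"[^A-Za-z0-9_.:@/-]+", "_", f"{service}:{code}:{hits}")


sample = [
    '10.0.0.5 - - [17/Jun/2026:10:00:00 +0000] "GET /api HTTP/1.1" 502 0 "-" "immich@docker" "http://10.0.0.9:2283" 12ms',
    '10.0.0.5 - - [17/Jun/2026:10:00:01 +0000] "GET / HTTP/1.1" 500 0 "-" "grafana@docker" "http://10.0.0.8:3000" 3ms',
    '10.0.0.5 - - [17/Jun/2026:10:00:02 +0000] "GET /api HTTP/1.1" 502 0 "-" "immich@docker" "http://10.0.0.9:2283" 9ms',
]
print(top_5xx_group(sample))
```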
@@ -1181,10 +1260,20 @@ write_report() {
   if [ "$REPORT_STATUS" = "OK" ]; then
     printf 'Im betrachteten Zeitraum zeigt das Homelab eine stabile Betriebslage. Das letzte Borg-Backup ist erfolgreich abgeschlossen, Prometheus meldet keine firing Alerts, keine unhealthy Container, Zertifikate und Storage im erwarteten Bereich.\n\n'
   elif [ "$REPORT_STATUS" = "WARNUNG" ]; then
-    printf 'Im betrachteten Zeitraum gibt es Punkte, die Aufmerksamkeit verdienen. Der Betrieb ist nicht automatisch als kompromittiert zu bewerten, aber mindestens ein Signal (Backup, Pending Alert, Zertifikat, Storage, Image-Alter, Drift oder Reboot) weicht vom Normalzustand ab.\n\n'
+    printf 'Im betrachteten Zeitraum gibt es Punkte, die Aufmerksamkeit verdienen. Der Betrieb ist nicht automatisch als kompromittiert zu bewerten; die konkreten Ampel-Ausloeser stehen direkt darunter.\n\n'
   else
     printf 'Im betrachteten Zeitraum liegt ein kritisches Betriebssignal vor. Der Bericht sollte zeitnah gelesen und die betroffenen Komponenten priorisiert geprueft werden.\n\n'
   fi
+  printf '### Warum dieser Status?\n\n'
+  if [ "$REPORT_STATUS" = "OK" ]; then
+    printf '%s\n\n' "- Keine Ampel-Ausloeser im Summary-Set."
+  else
+    print_status_reasons
+    printf '\n'
+  fi
+  printf '### Weitere auffaellige Beobachtungen\n\n'
+  print_notable_observations
+  printf '\n'
   printf '### Management-Bewertung\n\n'
   printf '%s\n' "- Status: \`$REPORT_STATUS\`"
   printf '%s\n' "- Borg Backup: \`${borg_status:-unknown}\`"
@@ -28,3 +28,9 @@ immich_postgres 2026-09-10
 # (Dec 2025). The image age is only build age, not an outdated version.
 # Re-check: whether a blackbox_exporter version > v0.28.0 has been released.
 monitoring-blackbox-exporter 2026-09-10
+
+# glance-docker-socket-proxy: as of 2026-06-17, v0.4.2 is still the newest
+# stable tag / latest. Newer tags are only master/nightly and are deliberately
+# not used in production for the read-only Glance socket proxy.
+# Re-check: whether a stable tag > v0.4.2 has been released.
+glance-docker-socket-proxy 2026-09-17
@@ -87,3 +87,11 @@ adguard.*bad question section.*only 1 question allowed
 # this lookup is harmless and does not affect any dashboard.
 # Re-check: only if Amazon Prometheus is added as a datasource.
 monitoring-grafana.*grafana-amazonprometheus-datasource not found
+
+# cAdvisor stale container filesystem stats on Unraid.
+# Why: cAdvisor can keep reporting an already removed Docker container path in
+# fsHandler even though the container and path no longer exist. This is a
+# collector bookkeeping issue, not a failed workload or missing data path.
+# Re-check: if the message references an existing/running container, if
+# Prometheus target health fails, or if broader cAdvisor errors appear.
+monitoring-cadvisor.*failed to collect filesystem stats.*var/lib/docker/containers/[0-9a-f]{64}
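A quick way to sanity-check the new cAdvisor noise pattern is to run it against a sample line. The sketch below assumes the pattern is applied as a regular expression to a line that starts with the container name, which is how the existing entries read; the log line itself is invented for illustration.

```python
import re

# Pattern copied from the hunk above.
CADVISOR_NOISE = re.compile(
    r"monitoring-cadvisor.*failed to collect filesystem stats"
    r".*var/lib/docker/containers/[0-9a-f]{64}"
)


def is_known_noise(line: str) -> bool:
    """Return True if the line matches the cAdvisor stale-container pattern."""
    return CADVISOR_NOISE.search(line) is not None


container_id = "a" * 64  # placeholder for a full 64-character hex container ID
sample_line = (
    "monitoring-cadvisor fsHandler failed to collect filesystem stats - "
    f'could not stat "/var/lib/docker/containers/{container_id}"'
)
print(is_known_noise(sample_line))
```

Because the pattern requires the full 64-character ID, messages that mention a shortened ID or another path are not suppressed.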
@@ -431,24 +431,24 @@ def render_summary_grid(entries):
         status = classify(label, value)
         theme = STATUS_THEMES.get(status, STATUS_THEMES["UNKNOWN"])
         cards.append(
-            '<td style="padding:6px;width:33.33%;vertical-align:top">'
+            '<td style="padding:6px;width:50%;vertical-align:top">'
             f'<div style="background:{theme["card_bg"]};'
             f'border:1px solid {theme["card_border"]};'
-            'border-radius:8px;padding:12px 14px">'
+            'border-radius:8px;padding:11px 12px;min-height:74px">'
             f'<div style="font-size:11px;color:#1e293b;'
-            'text-transform:uppercase;letter-spacing:0.08em;font-weight:700;'
-            f'line-height:1.3;opacity:0.78">{html.escape(label)}</div>'
-            f'<div style="font-size:17px;font-weight:700;'
+            'text-transform:uppercase;letter-spacing:0.04em;font-weight:700;'
+            f'line-height:1.35;opacity:0.78;overflow-wrap:anywhere">{html.escape(label)}</div>'
+            f'<div style="font-size:16px;font-weight:700;'
             f'color:{theme["card_text"]};margin-top:5px;line-height:1.25;'
-            f'word-break:break-word;font-variant-numeric:tabular-nums">'
+            f'word-break:normal;overflow-wrap:anywhere;font-variant-numeric:tabular-nums">'
             f'{html.escape(value)}</div>'
             '</div></td>'
         )
     rows_html = []
-    for chunk_start in range(0, len(cards), 3):
-        chunk = cards[chunk_start:chunk_start + 3]
-        while len(chunk) < 3:
-            chunk.append('<td style="padding:6px;width:33.33%"></td>')
+    for chunk_start in range(0, len(cards), 2):
+        chunk = cards[chunk_start:chunk_start + 2]
+        while len(chunk) < 2:
+            chunk.append('<td style="padding:6px;width:50%"></td>')
         rows_html.append("<tr>" + "".join(chunk) + "</tr>")
     return (
         '<table role="presentation" cellpadding="0" cellspacing="0" border="0" width="100%" '
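The row-building loop changed here can be tried in isolation. The following sketch extracts the chunk-and-pad step into a hypothetical helper `chunk_rows` (not part of the commit) so the effect of moving from three to two cards per row is easy to see; the card strings are simplified placeholders.

```python
def chunk_rows(cards, per_row=2, filler='<td style="padding:6px;width:50%"></td>'):
    """Group card cells into <tr> rows, padding the last row with empty cells."""
    rows = []
    for start in range(0, len(cards), per_row):
        chunk = cards[start:start + per_row]   # slicing copies, so cards is not mutated
        while len(chunk) < per_row:
            chunk.append(filler)
        rows.append("<tr>" + "".join(chunk) + "</tr>")
    return rows


demo_cards = ["<td>a</td>", "<td>b</td>", "<td>c</td>"]
for row in chunk_rows(demo_cards, filler="<td></td>"):
    print(row)
```

Padding the last row keeps every row at the same cell count, so a single trailing card does not stretch across the full table width.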