homelab-infra

Author	SHA1	Message	Date
Micha	4cf9e3226e	Use credentials file for Dawarich metrics scrape	2026-06-21 23:00:00 +02:00
Micha	2bb6eaa267	Run Prometheus as root for file secrets	2026-06-21 22:43:28 +02:00
Micha	cb80e2d2c0	Fix Dawarich Prometheus secret permissions	2026-06-21 22:41:46 +02:00
Micha	725e3b0125	Add Dawarich stack	2026-06-21 22:32:41 +02:00
Micha	5559aa3f24	chore(deps): update gitea and cadvisor images	2026-06-21 21:20:11 +02:00
Micha	424772dcfa	Merge Renovate: renovate/minor-patch-updates	2026-06-21 19:28:53 +02:00
Micha	f296338530	monitoring + backup: stale-handle hardening and dead man's switch Completes the local code state for two open MASTER_TODO items. monitoring: converted the remaining single-file bind mounts (alertmanager, blackbox, loki, promtail, alertmanager-ntfy-bridge) to directory mounts, analogous to the Prometheus fix of 2026-06-19. Avoids "Stale NFS file handle" on the /mnt/user FUSE share during git/Komodo updates. grafana-provisioning was already a directory mount. `docker compose config` passes. --force-recreate is required on deploy because the mount target paths change. backup: endpoint-agnostic dead man's switch (Healthchecks-compatible, cloud or self-hosted) in pull-critical-backups.ps1 and pre-borg.sh. Pings /start, success, and /fail; a no-op when no URL is configured, so it never breaks a run. Ping URLs are capability URLs and stay outside the repo as secrets. Docs: SECRETS_MAP, Nearline README, and MASTER_TODO updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 17:54:53 +02:00
renovate	3861eaa0d1	chore(deps): update minor-and-patch-updates	2026-06-20 22:20:40 +00:00
Micha	80385c4560	monitoring: Prometheus config as directory mount (FUSE stale-handle fix) Single-file bind mounts of alerts.yml/prometheus.yml break on the Unraid FUSE share during git/Komodo updates with "Stale NFS file handle" (inode change) -> config reload loads 0 rules, and only --force-recreate fixes it. Switched to a stable directory mount ./prometheus:/etc/prometheus/config:ro plus adjusted --config.file and rule_files. From now on a reload is sufficient. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 10:20:35 +02:00
Micha	fc9e4aad8e	fix: raise influxdb3 query-file-limit (weather panels no data) InfluxDB 3 Core does not compact; frequent HA writes pushed "°C"/"%"/"hPa" into the 432-file query limit -> No data in Grafana. Raised --query-file-limit to 20000 (stopgap; long term, Enterprise compaction or fewer writes). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-16 22:19:24 +02:00
renovate	9847baf327	chore(deps): update minor-and-patch-updates	2026-06-10 14:32:08 +00:00
Micha	ce747f687f	ops-report: cert dedup, blackbox DNS on AdGuard, new noise patterns Fixes three findings from the operations report of 2026-06-10: - daily-status-report.sh: certificates are deduplicated per domain set before evaluation; only the certificate with the longest remaining validity counts. During renewal, Traefik keeps both the old and the new cert in acme.json, which previously triggered a false KRITISCH (critical) warning (traefik.kaleschke.info 5 days) even though the new cert has 65 days of validity left. - monitoring/blackbox-exporter: DNS switched from 1.1.1.1/8.8.8.8 to AdGuard (172.23.0.3 via dns_net). External resolvers returned the WAN IP, which caused hairpin-NAT timeouts (9.5 s) on probes of cloud/glances (662 errors/day). - log-noise.patterns: Fritz!Box SOA errors (AdGuard, RFC 1035 violation) and the missing grafana-amazonprometheus-datasource plugin classified as known noise (~1800 lines/day). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 10:06:52 +02:00
Micha	30f076c85a	monitoring/grafana: OIDC SSO via Authelia (stage 1 proof) - generic_oauth against Authelia (client_id grafana, PKCE, client_secret via __FILE from /mnt/user/appdata/secrets/grafana_oidc_client_secret) - Traefik middleware authelia@file removed -> OIDC is now the auth; the local Grafana admin remains as fallback - Authelia client was created host-side (secret only as a host file + hash in the Authelia config) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 13:11:00 +02:00
Micha	e835dfd6ed	fix: let grafana read host secrets	2026-05-31 21:33:09 +02:00
Micha	6e928b6944	chore: harden grafana 13 provisioning	2026-05-31 21:31:58 +02:00
Micha	60015c1e2c	chore: upgrade grafana to 13	2026-05-31 21:28:59 +02:00
renovate	90ef6374a5	chore(deps): update minor-and-patch-updates	2026-05-31 10:20:19 +00:00
Micha	1a4929f9ef	Pin monitoring stack images by digest Reads live RepoDigests of each running monitoring container and freezes the compose to the exact image manifest. Brings the monitoring stack to the same digest-pin discipline as the stateful tier-1 services. influxdb3-core was already pinned. Affected: prometheus, alertmanager, alertmanager-ntfy-bridge, blackbox-exporter, loki, promtail, grafana, node-exporter, cadvisor (plus a second python:3.13-alpine for the bootstrap dashboard importer). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 15:23:03 +02:00
Micha	8e111d1e04	Prepare monitoring alert rules	2026-05-27 06:38:57 +02:00
Micha	db7dc3f2af	Add ntfy alert delivery for monitoring	2026-05-17 11:34:19 +02:00
Micha	c748236886	Prune monitoring dashboard imports	2026-05-17 11:30:00 +02:00
Micha	8aa850df40	Set Grafana DNS resolvers	2026-05-17 11:26:27 +02:00
Micha	b7050812d4	Fix blackbox DNS resolution	2026-05-17 11:24:20 +02:00
Micha	c95fa601f0	Add monitoring replacement baseline	2026-05-17 11:22:38 +02:00
Micha	0c308ff352	Preserve InfluxDB data in monitoring stack	2026-05-17 10:47:57 +02:00
Micha	53216e50c1	Fix monitoring InfluxDB volume permissions	2026-05-17 10:45:32 +02:00
Micha	b7dfdad621	Consolidate monitoring target stack	2026-05-17 10:41:29 +02:00
Micha	61625a7a1c	ops: keep monitoring importer running for komodo	2026-05-16 22:39:09 +02:00
Micha	6e28ea94d2	ops: wire monitoring stack to traefik metrics	2026-05-16 22:10:43 +02:00
Micha	58eb53a6a8	ops: add monitoring compose stack	2026-05-16 21:59:20 +02:00

30 Commits