Commit Graph

846 Commits

Author SHA1 Message Date
Micha 79657d526c Record alert-chain end-to-end verification in MASTER_TODO
Codex drills proved all six alert paths to the phone (send-ntfy, restore wrapper, freshness negative, Alertmanager->bridge->ntfy, docker-critical-watcher smoke, Borg pre-hook failure). Add a Kurzlog entry, note the send-ntfy exec-bit beleg-bug (6870ae5), and the one optional gap (Prometheus->AM leg via temp rule). Trim oldest entry to stay at max 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 19:12:57 +02:00
Micha 6870ae53da Mark send-ntfy.sh executable so restore-failure alerts fire
run-restore-job-with-ntfy.sh execs send-ntfy.sh directly; without the exec bit the failure-alert path errored with Permission denied (found during Codex alert drill 2026-06-23). Set the exec bit in the repo to match the live fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 17:25:24 +02:00
Micha 46d6010c66 Record backend_net internal:true after live flip; close audit remediation
backend_net was recreated with --internal (Codex live): egress from postgresql17 blocked, all 12 members reattached, frontends and DB connections verified. Move the parked #17 item to the MASTER_TODO Kurzlog and confirm the live state in NETWORK_INVENTORY. No dawarich_egress needed (sidekiq makes no external connections).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:48:59 +02:00
Micha 5a0a4c9d56 Park backend_net internal:true hardening with egress prereq in MASTER_TODO
Capture the audit egress analysis durably so the deferred maintenance window keeps the prep. backend_net -> internal:true is the only remaining P3 item; the single risk is dawarich_sidekiq (the only backend_net-only worker), all DB/cache and dual-homed containers are safe. If sidekiq needs egress, use a dedicated dawarich_egress net (immich_egress precedent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:22:33 +02:00
Micha c4ba67b55c Activate Hetzner snapshot restore test after live validation
Codex first live run passed (SUCCESS, 7 snapshots, single-file restore from .zfs/snapshot; report hetzner-snapshot-2026-06-23.md) with no ENV overrides. Set runbook status to active, document the run, and add the monthly cadence (15th, cron 0 6 15 * *) to schedule.md and the restore-tests README. Remaining host step: create the Unraid User Script restore-hetzner-snapshot-monthly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 13:15:58 +02:00
Micha 275558b2db Record audit-2026-06-23 remediation status in MASTER_TODO
Add an Aktiv row for the remaining Codex/operator follow-ups (#19 snapshot-test validation, #12 default-bridge recreate, #14 drift redeploy, #15 immich path) and a Kurzlog entry summarizing the closed P1/P2 core (Vault /admin + Komodo de-publicized, snapshots proven, auth-matrix). Trim oldest Kurzlog entry to stay at max 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:23:59 +02:00
Micha 3e9c12eb75 Add Hetzner Storage Box snapshot restore test
Make the off-site snapshot protection a repeatable, monitored proof (DECISIONS 2026-06-11/-23): a read-only restore-test that lists .zfs/snapshot on the Storage Box, checks retention and newest-snapshot age, and SFTP-fetches one small file from the newest snapshot (size + SHA256). Connection is derived from the borg-ui repo URL and runs via docker exec borg-ui; no secret in the script, no write access. Wired into the run-restore-checks.sh dispatcher; runbook documents the pending one-time live validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:18:44 +02:00
Micha 813d3bd303 Mirror Komodo IP-allowlist labels and document de-publicization
Codex applied the ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24) to the Komodo router live in the inline-managed self-stack; public now returns 403. Mirror the labels in ops/komodo/docker-compose.yml for parity (not auto-deployed), record the decision in docs/DECISIONS.md, and update docs/AUTH_MATRIX.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:11:45 +02:00
Micha ad47979000 Add consolidated Auth-Matrix doc
Consolidate effective access policy per public domain (Authelia bypass/two_factor, native exceptions, Tailscale-only, IP-allowlist) into a single reviewable matrix, surfacing the Authelia bypass list that previously lived only in the live config. Indexed in docs/README.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
Micha 23a6975a67 Restrict Vaultwarden /admin to trusted networks (Tailscale + LAN)
Audit 2026-06-23 (P1): /admin was publicly reachable (200). Add a higher-priority Traefik router scoped to PathPrefix(/admin) with an ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24); the main router stays native for browser and mobile clients. Documented in docs/DECISIONS.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
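
The allowlist decision described in this commit can be illustrated with a minimal sketch. It is not Traefik's implementation, only the same CIDR membership test on the two ranges named above. The helper names `is_trusted` and `admin_status` are hypothetical, and the 403 for rejected clients is taken from the Komodo entry, which reports that public access now returns 403.

```python
import ipaddress

# CIDR ranges quoted in the commit message (Tailnet + LAN).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("192.168.178.0/24"),
]


def is_trusted(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the trusted ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def admin_status(client_ip: str) -> int:
    """Hypothetical helper: status an allowlisted route would answer with."""
    return 200 if is_trusted(client_ip) else 403
```

A Tailnet address such as 100.100.1.2 passes, while a public address such as 8.8.8.8 is rejected.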
Micha 81151d8af4 Fix Dawarich Grafana datasource database config 2026-06-22 20:45:09 +02:00
Micha 45ff8286cf Use table-format Dawarich Grafana panels 2026-06-22 20:20:07 +02:00
Micha f318d80477 Simplify Dawarich Grafana dashboard query model 2026-06-22 20:12:14 +02:00
Micha b8d9bba5d3 Replace Dawarich Grafana dashboard 2026-06-22 19:55:47 +02:00
Micha 3bebc03a8f chore(deps): move dawarich redis to v8 track
dawarich_redis was the last redis instance still on 7-alpine; the
closed PR #10 kept it as an "Ignored or Blocked" entry in the Renovate
Dependency Dashboard (issue #6). Bump to the already-running
redis:8.8.0-alpine digest and add apps/dawarich to the renovate redis
8.x allowedVersions pin. Data path /mnt/user/appdata/dawarich/redis
unchanged; redis 8 loads the existing RDB snapshots.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 19:51:36 +02:00
Micha 0f1e78e0ca Fix Dawarich Grafana readonly user init 2026-06-22 19:01:39 +02:00
Micha 658750bc19 Allow Dawarich mobile track point route 2026-06-22 16:55:18 +02:00
Micha 5afba298e9 Allow Dawarich mobile API routes 2026-06-22 16:45:26 +02:00
Micha bd0deea90d Use internal Dawarich metrics scrape 2026-06-21 23:21:21 +02:00
Micha 5a6ab2cc37 Bypass Authelia for Dawarich healthcheck 2026-06-21 23:17:38 +02:00
Micha 30c3435ddf Bypass Authelia for Dawarich metrics 2026-06-21 23:04:58 +02:00
Micha b236eaeeaa Restore Dawarich metrics basic auth config 2026-06-21 23:02:26 +02:00
Micha 4cf9e3226e Use credentials file for Dawarich metrics scrape 2026-06-21 23:00:00 +02:00
Micha 699b1f118e Scrape Dawarich metrics over HTTPS 2026-06-21 22:52:57 +02:00
Micha db886c9eb2 Send forwarded proto for Dawarich metrics 2026-06-21 22:47:51 +02:00
Micha 2a342614db Allow internal Dawarich scrape host 2026-06-21 22:44:48 +02:00
Micha 2bb6eaa267 Run Prometheus as root for file secrets 2026-06-21 22:43:28 +02:00
Micha cb80e2d2c0 Fix Dawarich Prometheus secret permissions 2026-06-21 22:41:46 +02:00
Micha 201b201657 Fix Dawarich healthcheck behind HTTPS 2026-06-21 22:39:15 +02:00
Micha 725e3b0125 Add Dawarich stack 2026-06-21 22:32:41 +02:00
Micha 1de6ffc5ac chore(deps): update nextcloud to v34 2026-06-21 21:29:37 +02:00
Micha 5559aa3f24 chore(deps): update gitea and cadvisor images 2026-06-21 21:20:11 +02:00
Micha ed61fda0ec docs(renovate): record routine merge round 2026-06-21 in the Betriebsstand (operational status)
Merged six Renovate PRs (minor-patch group, unbound/traefik/postgres
digest refreshes, n8n 2.27, nextcloud-33 digest); the Nextcloud 34 major was
deliberately held back.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:31:09 +02:00
Micha 85d8270898 Merge Renovate: renovate/docker.n8n.io-n8nio-n8n-2.x 2026-06-21 19:28:53 +02:00
Micha a904955d11 Merge Renovate: renovate/nextcloud-33.0.5-apache 2026-06-21 19:28:53 +02:00
Micha 882ea5ad01 Merge Renovate: renovate/postgres-18.4 2026-06-21 19:28:53 +02:00
Micha e3ae97bbaf Merge Renovate: renovate/traefik-v3.7 2026-06-21 19:28:53 +02:00
Micha 64976e0c0e Merge Renovate: renovate/shaanmajid-unbound-1.25.1 2026-06-21 19:28:53 +02:00
Micha 424772dcfa Merge Renovate: renovate/minor-patch-updates 2026-06-21 19:28:53 +02:00
Micha c8380b5755 docs: document H: nearline as a restore source for the DR case
Until now the H: nearline copy was described only as a backup target
(CAPACITY_AND_LIFECYCLE), not as a restore source. In an emergency, the
hint was missing that a fresh local copy of all dumps + bundles resides
on baerchen.

- ops/h-drive-nearline/README.md: new section "Restore aus H:/ (DR-Fall)"
  with contents, restore path (-> RESTORE_MATRIX / SERVICES_RECOVERY) and a
  mandatory freshness check (covers the S4U stale case).
- docs/DISASTER_RECOVERY.md: the baerchen section now points to the
  H: fallback restore source and the runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:16:09 +02:00
Micha 83d7988c72 nearline: S4U fix applied and verified (LogonType Interactive)
Task "KalliLab H Drive Nearline Pull" switched on 2026-06-21 from S4U to LogonType
Interactive ("Run only when user is logged on").
No stored password is needed, since michi is the permanently logged-in
console user. Triggered via the scheduler, result 0x0 verified
(SMB access available, mirror fresh).

Docs corrected: the README now describes Interactive as the applied solution
(Password was only the unused alternative). MASTER_TODO: root cause
fixed, only the optional Healthchecks URL remains open.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:06:55 +02:00
Micha 1a4593110a nearline: S4U root cause documented + exit-code leak fixed
Diagnosis 2026-06-21: the scheduled task "KalliLab H Drive Nearline Pull"
ran as LogonType S4U (without a stored password) and therefore had no
network credentials for the SMB share \\192.168.178.58\backups.
Every scheduled 05:30 run aborted silently with exit 1, without a report; the
nearline mirror was stale from 2026-06-19 to 2026-06-21. Caught up manually,
mirror fresh again.

pull-critical-backups.ps1: explicit `exit 0` on the success path, so that the
last robocopy exit code (1 = "files copied") does not leak as the process exit
code and the scheduled task reports a truthful result.

README: mandatory note that the task must run with a stored password (not S4U).
MASTER_TODO: root cause + remaining operator step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:02:58 +02:00
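
The exit-code leak fixed in this commit comes from robocopy's exit codes being bit flags, where values below 8 mean the copy completed without failures. A minimal sketch of that mapping follows. It is an illustration in Python, not the PowerShell change itself, and the function names are hypothetical.

```python
def robocopy_succeeded(code: int) -> bool:
    """Robocopy exit codes are bit flags; 8 and above signal at least one failure."""
    return 0 <= code < 8


def process_exit_code(robocopy_code: int) -> int:
    """Translate a robocopy result into a conventional 0/1 process exit code."""
    return 0 if robocopy_succeeded(robocopy_code) else 1
```

With this mapping, a run that copied files (robocopy code 1) is reported as 0, so a scheduler that treats any non-zero exit as a failure sees a truthful result.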
Micha f296338530 monitoring + backup: stale-handle hardening and dead man's switch
Completes the local code state for two open MASTER_TODO items.

monitoring: remaining single-file bind mounts (alertmanager, blackbox,
loki, promtail, alertmanager-ntfy-bridge) converted to directory mounts,
analogous to the Prometheus fix of 2026-06-19. Avoids "Stale NFS file handle"
on the /mnt/user FUSE share during git/Komodo updates. grafana-provisioning
was already a directory mount. `docker compose config` is green. The deploy
needs --force-recreate because the mount target paths change.

backup: endpoint-agnostic dead man's switch (Healthchecks-compatible,
cloud or self-hosted) in pull-critical-backups.ps1 and pre-borg.sh.
Pings /start, success and /fail; no-op without a configured URL, so it never
breaks a run. Ping URLs are capability URLs and stay outside the repo
as secrets.

Docs: SECRETS_MAP, nearline README and MASTER_TODO updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:54:53 +02:00
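
The dead man's switch behaviour described in this commit (start, success and fail pings, no-op without a URL, never breaking a run) can be sketched as follows. This is a Python illustration under assumptions, not the code in pull-critical-backups.ps1 or pre-borg.sh: the function names are hypothetical, and the URL layout follows the Healthchecks convention of pinging the base URL for success and appending /start or /fail.

```python
import urllib.request
from typing import Optional


def ping_url(base_url: Optional[str], event: str = "success") -> Optional[str]:
    """Build the ping URL for an event; None means monitoring is not configured."""
    if not base_url:
        return None
    base = base_url.rstrip("/")
    if event == "success":
        return base
    if event in ("start", "fail"):
        return f"{base}/{event}"
    raise ValueError(f"unknown event: {event}")


def ping(base_url: Optional[str], event: str = "success", timeout: float = 10.0) -> bool:
    """Send the ping; swallow network errors so monitoring cannot break a backup run."""
    url = ping_url(base_url, event)
    if url is None:
        return False  # no-op without a configured URL
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False
```

A backup script would call `ping(url, "start")` before the job and then either `ping(url)` or `ping(url, "fail")` afterwards, reading `url` from a secret kept outside the repository.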
renovate df2e308c65 chore(deps): update shaanmajid/unbound:1.25.1 docker digest to 6fa3d52 2026-06-21 10:20:57 +00:00
renovate 3861eaa0d1 chore(deps): update minor-and-patch-updates 2026-06-20 22:20:40 +00:00
Micha 7ff6a24c9d weather day-report: handle No-data on dry days
- Rating banner: LEFT JOIN from the always-present temperature series instead of
  CROSS JOIN; an empty gw3000a_daily_rain no longer kills the row. The result is a
  numeric code 0-4 with value mapping to text+color (robust against strings).
- Rain card: noValue "0 mm" instead of "No data" when there are no rain samples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 11:45:28 +02:00
Micha ac1fa5b8e9 weather day-report: layout in the style of the weather archive (gauges + charts)
- 6 round gauges as in the weather archive (temp max/min, humidity avg, gust max,
  UV max, solar max) with the same continuous color gradients
- Strip of 3 stat cards (feels-like max, rain, air pressure avg) with sparkline
- Rating banner and 2 daily charts (temperature, solar+UV)
Finally replaces the hard-to-read single-line Markdown block.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:38:43 +02:00
Micha f0735265eb monitoring/README: daily-report description adapted to the visual layout
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:35:17 +02:00
Micha d99082a3a7 weather day-report: visual report layout instead of Markdown text block
The markdown-html table cell renders in Grafana 13 as plain text (one
overflowing line). Replaced with native panels:
- color-coded rating banner (stat, background-color per mapping)
- 8 metric cards with mini sparkline (T min/max, rain, UV, gust,
  humidity, air pressure, solar) incl. thresholds in blue/cyan/amber/green
- 2 daily charts: temperature (outdoor/feels-like/dew point) and solar+UV
Same $__timeFilter queries as the weather archive dashboard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:34:56 +02:00
Micha 536a6fd0cd monitoring: make weather daily reports discoverable in Grafana
- weather-report-history.json (ha-weather-report-history): finder table,
  one row per day (date, short rating, T min/max/mean, rain, UV, gust)
  with a drilldown data link to the daily-report dashboard
- weather-day-report.json: time zone Europe/Berlin, info panel for day selection,
  nav dropdown to the weather dashboards
- monitoring/README: section on weather daily reports (find, choose date,
  source InfluxDB SQL instead of Markdown index, deploy, Explore test)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 09:27:26 +02:00