Commit Graph

301 Commits

Author SHA1 Message Date
Micha 7d4d5f901a Add Renovate GitHub token support 2026-06-23 20:42:19 +02:00
Micha ad8010767d Healthchecks: Gitea->Komodo webhook active, mark webhook step done
Webhook authenticated and triggered a successful DeployStack (komodo-core
log 18:39:00). Only remaining step is wiring internal jobs as checks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:40:42 +02:00
Micha 02389ed292 Healthchecks: mark live after API deploy; document host secret files
Stack deployed to Komodo (id 6a3acf2ca7867a4fbab9bfc1), both containers
healthy, Traefik route + LE cert OK, DNS resolves, superuser created and
auth-verified. Flip status to live in ARCHITECTURE 7.6, SERVICE_CATALOG,
MASTER_TODO and the stack README. Document the new host secret files
(secret_key, superuser_password = login password, webhook_secret) in
SECRETS_MAP. Remaining operator step: the Gitea->Komodo webhook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:31:10 +02:00
Micha cbfbb8ca4f Add self-hosted Healthchecks stack for internal job monitoring (hybrid)
Self-hosted Healthchecks (ops/healthchecks/) as the hub for internal
cron/job heartbeats. The three host-down/backup watchdogs (Borg pre-hook,
baerchen nearline pull, monitoring watchdog #8) deliberately stay on
healthchecks.io cloud, since an on-host watcher cannot report a host outage.

- frontend_net + dedicated PostgreSQL 18 in healthchecks_internal
- native Healthchecks auth; ping/API exempt from Authelia (n8n/Komodo pattern)
- registered as middleware_exempt in ops/policy-checks/exceptions.json
- docs: DECISIONS, ARCHITECTURE (3.1/4.2/7.6/10), SERVICE_CATALOG,
  SECRETS_MAP, MASTER_TODO, README index

docker compose config validated (exit 0). Not yet deployed: host secret file,
appdata dir, Komodo stack + ENV and Gitea webhook remain operator steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:09:56 +02:00
Micha ee0d450a27 docs: close audit remediation status 2026-06-23 19:50:48 +02:00
Micha 79657d526c Record alert-chain end-to-end verification in MASTER_TODO
Codex drills proved all six alert paths to the phone (send-ntfy, restore wrapper, freshness negative, Alertmanager->bridge->ntfy, docker-critical-watcher smoke, Borg pre-hook failure). Add a Kurzlog entry, note the send-ntfy exec-bit beleg-bug (6870ae5), and the one optional gap (Prometheus->AM leg via temp rule). Trim oldest entry to stay at max 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 19:12:57 +02:00
Micha 46d6010c66 Record backend_net internal:true after live flip; close audit remediation
backend_net was recreated with --internal (Codex live): egress from postgresql17 blocked, all 12 members reattached, frontends and DB connections verified. Move the parked #17 item to the MASTER_TODO Kurzlog and confirm the live state in NETWORK_INVENTORY. No dawarich_egress needed (sidekiq makes no external connections).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:48:59 +02:00
Micha 5a0a4c9d56 Park backend_net internal:true hardening with egress prereq in MASTER_TODO
Capture the audit egress analysis durably so the deferred maintenance window keeps the prep. backend_net -> internal:true is the only remaining P3 item; the single risk is dawarich_sidekiq (the only backend_net-only worker), all DB/cache and dual-homed containers are safe. If sidekiq needs egress, use a dedicated dawarich_egress net (immich_egress precedent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:22:33 +02:00
Micha 275558b2db Record audit-2026-06-23 remediation status in MASTER_TODO
Add an Aktiv row for the remaining Codex/operator follow-ups (#19 snapshot-test validation, #12 default-bridge recreate, #14 drift redeploy, #15 immich path) and a Kurzlog entry summarizing the closed P1/P2 core (Vault /admin + Komodo de-publicized, snapshots proven, auth-matrix). Trim oldest Kurzlog entry to stay at max 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:23:59 +02:00
Micha 813d3bd303 Mirror Komodo IP-allowlist labels and document de-publicization
Codex applied the ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24) to the Komodo router live in the inline-managed self-stack; public now returns 403. Mirror the labels in ops/komodo/docker-compose.yml for parity (not auto-deployed), record the decision in docs/DECISIONS.md, and update docs/AUTH_MATRIX.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:11:45 +02:00
Micha ad47979000 Add consolidated Auth-Matrix doc
Consolidate effective access policy per public domain (Authelia bypass/two_factor, native exceptions, Tailscale-only, IP-allowlist) into a single reviewable matrix, surfacing the Authelia bypass list that previously lived only in the live config. Indexed in docs/README.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
Micha 23a6975a67 Restrict Vaultwarden /admin to trusted networks (Tailscale + LAN)
Audit 2026-06-23 (P1): /admin was publicly reachable (200). Add a higher-priority Traefik router scoped to PathPrefix(/admin) with an ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24); the main router stays native for browser and mobile clients. Documented in docs/DECISIONS.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
Micha 3bebc03a8f chore(deps): move dawarich redis to v8 track
dawarich_redis was the last redis instance still on 7-alpine; the
closed PR #10 kept it as an "Ignored or Blocked" entry in the Renovate
Dependency Dashboard (issue #6). Bump to the already-running
redis:8.8.0-alpine digest and add apps/dawarich to the renovate redis
8.x allowedVersions pin. Data path /mnt/user/appdata/dawarich/redis
unchanged; redis 8 loads the existing RDB snapshots.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 19:51:36 +02:00
Micha bd0deea90d Use internal Dawarich metrics scrape 2026-06-21 23:21:21 +02:00
Micha b236eaeeaa Restore Dawarich metrics basic auth config 2026-06-21 23:02:26 +02:00
Micha 4cf9e3226e Use credentials file for Dawarich metrics scrape 2026-06-21 23:00:00 +02:00
Micha 725e3b0125 Add Dawarich stack 2026-06-21 22:32:41 +02:00
Micha 1de6ffc5ac chore(deps): update nextcloud to v34 2026-06-21 21:29:37 +02:00
Micha ed61fda0ec docs(renovate): Routine-Merge-Runde 2026-06-21 im Betriebsstand festhalten
Sechs Renovate-PRs gemergt (minor-patch-Gruppe, unbound/traefik/postgres
Digest-Refreshes, n8n 2.27, nextcloud-33 Digest); Nextcloud-34-Major bewusst
gehalten.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:31:09 +02:00
Micha c8380b5755 docs: H:-Nearline als Restore-Quelle im DR-Fall dokumentieren
Bisher war die H:-Nearline-Kopie nur als Backup-Ziel beschrieben
(CAPACITY_AND_LIFECYCLE), nicht als Restore-Quelle. Im Ernstfall fehlte der
Hinweis, dass auf baerchen eine frische lokale Kopie aller Dumps + Bundles
liegt.

- ops/h-drive-nearline/README.md: neuer Abschnitt "Restore aus H:/ (DR-Fall)"
  mit Inhalt, Rueckspiel-Weg (-> RESTORE_MATRIX / SERVICES_RECOVERY) und
  Pflicht-Frische-Pruefung (deckt den S4U-Stale-Fall ab).
- docs/DISASTER_RECOVERY.md: baerchen-Abschnitt verweist jetzt auf die
  H:-Fallback-Restore-Quelle und das Runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:16:09 +02:00
Micha 83d7988c72 nearline: S4U-Fix angewendet und verifiziert (LogonType Interactive)
Task "KalliLab H Drive Nearline Pull" am 2026-06-21 von S4U auf LogonType
Interactive ("Nur ausfuehren, wenn der Benutzer angemeldet ist") umgestellt.
Kein gespeichertes Passwort noetig, da michi der dauerhaft angemeldete
Konsolen-User ist. Per Scheduler ausgeloest, Ergebnis 0x0 verifiziert
(SMB-Zugriff vorhanden, Spiegel frisch).

Doku korrigiert: README beschreibt jetzt Interactive als angewendete Loesung
(Password war nur die nicht genutzte Alternative). MASTER_TODO: Root-Cause
behoben, nur noch optionale Healthchecks-URL offen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:06:55 +02:00
Micha 1a4593110a nearline: S4U-Root-Cause dokumentiert + Exitcode-Leak gefixt
Diagnose 2026-06-21: Der Scheduled Task "KalliLab H Drive Nearline Pull"
lief als LogonType S4U (ohne gespeichertes Passwort) und hatte damit keine
Netzwerk-Anmeldeinformationen fuer den SMB-Share \192.168.178.58\backups.
Jeder geplante 05:30-Lauf brach still mit Exit 1 ab, ohne Report; der
Nearline-Spiegel war 2026-06-19 bis 2026-06-21 veraltet. Manuell nachgezogen,
Spiegel wieder frisch.

pull-critical-backups.ps1: explizites `exit 0` auf dem Erfolgspfad, damit der
letzte robocopy-Exitcode (1 = "Dateien kopiert") nicht als Prozess-Exit leakt
und der Scheduled Task ein wahrheitsgemaesses Ergebnis meldet.

README: Pflicht-Hinweis, dass der Task mit gespeichertem Passwort (nicht S4U)
laufen muss. MASTER_TODO: Root-Cause + verbleibender Operator-Schritt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:02:58 +02:00
Micha f296338530 monitoring + backup: Stale-Handle-Hardening und Dead-Man's-Switch
Schliesst den lokalen Code-Stand fuer zwei offene MASTER_TODO-Punkte ab.

monitoring: restliche Einzeldatei-Bind-Mounts (alertmanager, blackbox,
loki, promtail, alertmanager-ntfy-bridge) auf Directory-Mounts umgestellt,
analog zum Prometheus-Fix vom 2026-06-19. Vermeidet "Stale NFS file handle"
auf dem /mnt/user-FUSE-Share bei git/Komodo-Updates. grafana-provisioning
war bereits Directory-Mount. `docker compose config` gruen. Beim Deploy
--force-recreate noetig, da sich Mount-Zielpfade aendern.

backup: endpoint-agnostischer Dead-Man's-Switch (Healthchecks-kompatibel,
Cloud oder self-hosted) in pull-critical-backups.ps1 und pre-borg.sh.
Pings /start, Erfolg und /fail; No-Op ohne konfigurierte URL, bricht also
keinen Lauf. Ping-URLs sind Capability-URLs und bleiben als Secret
ausserhalb des Repos.

Doku: SECRETS_MAP, Nearline-README und MASTER_TODO nachgezogen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:54:53 +02:00
Micha c39ae5cdfa monitoring: Prometheus-Directory-Mount-Hardening als erledigt vermerkt
Prometheus laeuft jetzt mit stabilem Directory-Mount (recreated, 25 Regeln
aktiv). Verbleibendes Einzeldatei-Muster bei den uebrigen Monitoring-Services
als Folge-Item praezisiert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 10:24:45 +02:00
Micha 7587ee4e77 Backup-Hardening: Live-Verifikation + Monitoring-Bind-Mount-Befund
Dump-Alerts live verifiziert (25 Regeln geladen, 0 feuern, Scope-Drift 0).
Stale-Handle der Prometheus-alerts.yml (FUSE-Einzeldatei-Mount) per
--force-recreate behoben; Directory-Mount-Hardening als TODO aufgenommen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 10:17:58 +02:00
Micha a3c5610934 Backup-Hardening: TODO-Status an verifizierten Live-Stand angleichen
Host-Clone-Pull und Borg-UI-Scope sind erledigt/verifiziert (Live-Drift 0,
alle 33 Quellen konfiguriert). Offen bleiben nur der Prometheus-Config-Reload
fuer die neuen Dump-Alerts und der Nearline-Dead-Man's-Switch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 08:08:04 +02:00
Micha bc9ace315a Backup-Audit-Hardening: Dump-Frische-Monitoring und Scope-Konsistenz
Findings aus dem Backup-/Restore-Audit 2026-06-18 umgesetzt:

- Dump-Frische als Prometheus-Metrik (homelab_borg_dump_present /
  homelab_borg_dump_age_seconds) im Host-Exporter; schliesst den
  Blindfleck, dass Borg weiterlaeuft und stale Dumps archiviert, ohne
  Job-Fehler.
- Neue Alerts HomelabBorgDumpMissing / HomelabBorgDumpStale (critical)
  plus ALERT_RULES.md.
- Freshness-Gate (.sh + .ps1) und H:-Nearline-Pull um n8n.sqlite.dump
  und postgresql17-globals.sql ergaenzt.
- Critical-Container-Watch um mail-archiver, n8n, homeassistant,
  smarthome-mosquitto erweitert.
- BACKUP_SCOPE: /mnt/user/projekte und sonstige User-Shares ausserhalb
  App-Scope als bewusste offene Operator-Entscheidung dokumentiert;
  Hermes-data-Pfad als geparkt klargestellt.
- MASTER_TODO: Nearline-Pull-Ueberwachung, Host-Pull-Nachzug und
  projekte-Scope-Entscheidung aufgenommen.

Enthaelt ausserdem die zuvor vorbereiteten Scope-Erweiterungen
(nextcloud html+data, n8n, filebrowser, influxdb3) und Scope-Drift-/
Retention-/Compact-/Check-Alerts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 20:25:54 +02:00
Micha 0ecb2aceca Refresh current homelab todo state 2026-06-17 22:30:12 +02:00
Micha 1160f50663 Clear completed Glance token todo 2026-06-17 22:04:52 +02:00
Micha 88c48faab1 Tidy AdGuard and DNS repo drift 2026-06-17 21:59:59 +02:00
Micha 861f70da58 Fix operations report warnings 2026-06-17 21:49:33 +02:00
Micha 15b351fa25 @
fix(immich): ML-Egress-Netz fuer Modell-Download

immich_machine_learning hing nur in immich_default (internal: true) ->
kein DNS/Egress, /cache leer, Logs "Failed to resolve huggingface.co".
Container healthy, aber Smart Search + Gesichtserkennung faktisch tot.

Fix: dediziertes nicht-internes Netz immich_egress nur an ML + explizites
dns 1.1.1.1/8.8.8.8 (DNS-Regel docs/WORKFLOW.md). DB/Redis bleiben in
immich_default isoliert (P3). Bewusst nicht frontend_net (unauth. ML-API).

Doku: Architektur-Zielbild (Netze + ML-Zeile), SERVICE_CATALOG, DECISIONS-ADR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@
2026-06-16 10:51:16 +02:00
Micha e18720d1f8 docs: record ha influxdb weather archive and glance ha token
- DECISIONS: HA -> InfluxDB 3 Core Wetterarchiv (monitoring_net-Attach,
  Admin-Token-Trade-off, Grafana-Datasource/Dashboard)
- SECRETS_MAP: ha_influxdb_token, Agent-Tokens, GLANCE_HA_TOKEN

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:51:07 +02:00
Micha c9bd4af2a8 docs: record ha energy dashboard setup 2026-06-13 16:02:10 +02:00
Micha 5927b478fa docs: record local solaredge integration 2026-06-13 15:02:41 +02:00
Micha 0fabed4d1a docs: record ecowitt lan-only ingress decision
LAN-only Host-Bind 192.168.178.58:8123 fuer den Ecowitt-HTTP-Push
dokumentiert: DECISIONS-Eintrag (loest Phase-2-Frage), Architektur-Master
Ausnahme 10, SERVICE_CATALOG. Webhook + LAN-Endpunkt verifiziert; offen
bleibt nur die GW3000-Customized-Server-Konfiguration am Geraet.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 10:06:29 +02:00
Micha 170a7dcc1f docs: record ha mqtt integration 2026-06-13 08:49:46 +02:00
Micha 0f5045ea8e docs: close home assistant restore gate 2026-06-13 08:40:47 +02:00
Micha 2eb8da1cd4 docs: clarify mqtt broker smoke status 2026-06-13 08:33:01 +02:00
Micha 2acbc1adde docs: record home assistant foundation status 2026-06-13 08:30:53 +02:00
Micha 4ab6dcefd2 fix: protect ha onboarding with authelia 2026-06-12 21:52:45 +02:00
Micha 25a4ada891 fix: guard home assistant onboarding 2026-06-12 21:50:15 +02:00
Micha ce6f5c72dd feat: add smart home runtime foundation 2026-06-12 20:38:03 +02:00
Micha 290cb8949e ops: glance dashboard v2 - split config, stack widgets, releases page
- Config per $include aufgeteilt (glance.yml -> pages/home/infrastructure/ops, containers-map zentral)
- Neue Widgets: Komodo Stacks, Gitea GitOps, Paperless, Mealie, Scrutiny Disk Health, Wetter, To-do
- Neue Seite Ops und Releases (releases-Widget fuer gepinnte Images, RSS, Commit-Log)
- Homelab-Status in Tab-Gruppen Core/Apps/Ops, Speedtest-Widget mit ehrlichem Leerzustand
- Theme-Presets (Catppuccin, Gruvbox, Light) + custom.css via Assets-Mount
- Compose: 5 neue read-only Token-ENVs, Doku in SECRETS_MAP/MASTER_TODO nachgezogen

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:06:42 +02:00
Micha baedf9f932 docs: record komodo-stack-hygiene-weekly activation
Cron registered in /boot/config/plugins/user.scripts and live in
/etc/cron.d/root after update_cron. First scheduled run: Sun 05:00.
End-to-end smoke test on host: 6 warnings, 0 critical.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-12 12:57:06 +02:00
Micha b387757e87 ops: add komodo stack hygiene posture-check
Catches the failure class that let immich_new slip through: stacks
without a configured repo, project_missing, hash drift, and repo
compose files without a matching Komodo stack. Dry-run on host found
6 honest warnings, 0 critical. Wrapper as Unraid User Script for
weekly cadence is tracked in MASTER_TODO.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-12 12:51:07 +02:00
Micha 3eedbcbe16 docs: record immich stack cleanup 2026-06-12 08:24:27 +02:00
Micha 9033724b15 docs: record host DNS fallback as active
eth0 DNS server 2 = 192.168.178.1 (FRITZ!Box) is set as failover behind
AdGuard. Mark the komodo-bulk-deploy-dns runbook immediate measure as
implemented. Closes the AdGuard SPOF for Docker image pulls.
Ref: docs/homelab-optimierung.md recommendation 3a.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:26:22 +02:00
Micha aae176f1b7 docs: record Hetzner Storage Box automatic snapshots as active
Daily snapshots at 05:30 UTC (after the 04:30 local Borg run), 7 days
retention, snapshot directory visible for single-file restore via
.zfs/snapshot/. Closes the ransomware/misuse gap left open by the
explicit decision against Borg append-only (2026-06-01).
Ref: docs/homelab-optimierung.md recommendation 2.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:25:01 +02:00
Micha 3e486b95f6 docs: add pdf cleanup and quarterly doc gardening to MASTER_TODO
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:55:15 +02:00