Commit Graph

314 Commits

Author SHA1 Message Date
Micha 0847d839e7 Healthchecks heartbeats for renovate and gitea-bundle-mirror (6h jobs)
Same endpoint-agnostic ping via EXIT trap. These two jobs have no warning
level, so only rc==0 pings success, any non-zero pings /fail. gitea-bundle
edit is POSIX-sh clean (script is /bin/sh). Capability URLs from per-job host
secret files. bash -n verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:09:21 +02:00
renovate 4079b1cbce chore(deps): update minor-and-patch-updates 2026-06-23 18:45:55 +00:00
Micha 7d4d5f901a Add Renovate GitHub token support 2026-06-23 20:42:19 +02:00
Micha 02389ed292 Healthchecks: mark live after API deploy; document host secret files
Stack deployed to Komodo (id 6a3acf2ca7867a4fbab9bfc1), both containers
healthy, Traefik route + LE cert OK, DNS resolves, superuser created and
auth-verified. Flip status to live in ARCHITECTURE 7.6, SERVICE_CATALOG,
MASTER_TODO and the stack README. Document the new host secret files
(secret_key, superuser_password = login password, webhook_secret) in
SECRETS_MAP. Remaining operator step: the Gitea->Komodo webhook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:31:10 +02:00
Micha cbfbb8ca4f Add self-hosted Healthchecks stack for internal job monitoring (hybrid)
Self-hosted Healthchecks (ops/healthchecks/) as the hub for internal
cron/job heartbeats. The three host-down/backup watchdogs (Borg pre-hook,
baerchen nearline pull, monitoring watchdog #8) deliberately stay on
healthchecks.io cloud, since an on-host watcher cannot report a host outage.

- frontend_net + dedicated PostgreSQL 18 in healthchecks_internal
- native Healthchecks auth; ping/API exempt from Authelia (n8n/Komodo pattern)
- registered as middleware_exempt in ops/policy-checks/exceptions.json
- docs: DECISIONS, ARCHITECTURE (3.1/4.2/7.6/10), SERVICE_CATALOG,
  SECRETS_MAP, MASTER_TODO, README index

docker compose config validated (exit 0). Not yet deployed: host secret file,
appdata dir, Komodo stack + ENV and Gitea webhook remain operator steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:09:56 +02:00
Micha ee0d450a27 docs: close audit remediation status 2026-06-23 19:50:48 +02:00
Micha 6870ae53da Mark send-ntfy.sh executable so restore-failure alerts fire
run-restore-job-with-ntfy.sh execs send-ntfy.sh directly; without the exec bit the failure-alert path errored with Permission denied (found during Codex alert drill 2026-06-23). Set the exec bit in the repo to match the live fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 17:25:24 +02:00
Micha c4ba67b55c Activate Hetzner snapshot restore test after live validation
Codex first live run passed (SUCCESS, 7 snapshots, single-file restore from .zfs/snapshot; report hetzner-snapshot-2026-06-23.md) with no ENV overrides. Set runbook status to active, document the run, and add the monthly cadence (15th, cron 0 6 15 * *) to schedule.md and the restore-tests README. Remaining host step: create the Unraid User Script restore-hetzner-snapshot-monthly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 13:15:58 +02:00
Micha 3e9c12eb75 Add Hetzner Storage Box snapshot restore test
Make the off-site snapshot protection a repeatable, monitored proof (DECISIONS 2026-06-11/-23): a read-only restore-test that lists .zfs/snapshot on the Storage Box, checks retention and newest-snapshot age, and SFTP-fetches one small file from the newest snapshot (size + SHA256). Connection is derived from the borg-ui repo URL and runs via docker exec borg-ui; no secret in the script, no write access. Wired into the run-restore-checks.sh dispatcher; runbook documents the pending one-time live validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:18:44 +02:00
Micha 813d3bd303 Mirror Komodo IP-allowlist labels and document de-publicization
Codex applied the ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24) to the Komodo router live in the inline-managed self-stack; public now returns 403. Mirror the labels in ops/komodo/docker-compose.yml for parity (not auto-deployed), record the decision in docs/DECISIONS.md, and update docs/AUTH_MATRIX.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:11:45 +02:00
Micha 424772dcfa Merge Renovate: renovate/minor-patch-updates 2026-06-21 19:28:53 +02:00
Micha c8380b5755 docs: H:-Nearline als Restore-Quelle im DR-Fall dokumentieren
Bisher war die H:-Nearline-Kopie nur als Backup-Ziel beschrieben
(CAPACITY_AND_LIFECYCLE), nicht als Restore-Quelle. Im Ernstfall fehlte der
Hinweis, dass auf baerchen eine frische lokale Kopie aller Dumps + Bundles
liegt.

- ops/h-drive-nearline/README.md: neuer Abschnitt "Restore aus H:/ (DR-Fall)"
  mit Inhalt, Rueckspiel-Weg (-> RESTORE_MATRIX / SERVICES_RECOVERY) und
  Pflicht-Frische-Pruefung (deckt den S4U-Stale-Fall ab).
- docs/DISASTER_RECOVERY.md: baerchen-Abschnitt verweist jetzt auf die
  H:-Fallback-Restore-Quelle und das Runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:16:09 +02:00
Micha 83d7988c72 nearline: S4U-Fix angewendet und verifiziert (LogonType Interactive)
Task "KalliLab H Drive Nearline Pull" am 2026-06-21 von S4U auf LogonType
Interactive ("Nur ausfuehren, wenn der Benutzer angemeldet ist") umgestellt.
Kein gespeichertes Passwort noetig, da michi der dauerhaft angemeldete
Konsolen-User ist. Per Scheduler ausgeloest, Ergebnis 0x0 verifiziert
(SMB-Zugriff vorhanden, Spiegel frisch).

Doku korrigiert: README beschreibt jetzt Interactive als angewendete Loesung
(Password war nur die nicht genutzte Alternative). MASTER_TODO: Root-Cause
behoben, nur noch optionale Healthchecks-URL offen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:06:55 +02:00
Micha 1a4593110a nearline: S4U-Root-Cause dokumentiert + Exitcode-Leak gefixt
Diagnose 2026-06-21: Der Scheduled Task "KalliLab H Drive Nearline Pull"
lief als LogonType S4U (ohne gespeichertes Passwort) und hatte damit keine
Netzwerk-Anmeldeinformationen fuer den SMB-Share \192.168.178.58\backups.
Jeder geplante 05:30-Lauf brach still mit Exit 1 ab, ohne Report; der
Nearline-Spiegel war 2026-06-19 bis 2026-06-21 veraltet. Manuell nachgezogen,
Spiegel wieder frisch.

pull-critical-backups.ps1: explizites `exit 0` auf dem Erfolgspfad, damit der
letzte robocopy-Exitcode (1 = "Dateien kopiert") nicht als Prozess-Exit leakt
und der Scheduled Task ein wahrheitsgemaesses Ergebnis meldet.

README: Pflicht-Hinweis, dass der Task mit gespeichertem Passwort (nicht S4U)
laufen muss. MASTER_TODO: Root-Cause + verbleibender Operator-Schritt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:02:58 +02:00
Micha f296338530 monitoring + backup: Stale-Handle-Hardening und Dead-Man's-Switch
Schliesst den lokalen Code-Stand fuer zwei offene MASTER_TODO-Punkte ab.

monitoring: restliche Einzeldatei-Bind-Mounts (alertmanager, blackbox,
loki, promtail, alertmanager-ntfy-bridge) auf Directory-Mounts umgestellt,
analog zum Prometheus-Fix vom 2026-06-19. Vermeidet "Stale NFS file handle"
auf dem /mnt/user-FUSE-Share bei git/Komodo-Updates. grafana-provisioning
war bereits Directory-Mount. `docker compose config` gruen. Beim Deploy
--force-recreate noetig, da sich Mount-Zielpfade aendern.

backup: endpoint-agnostischer Dead-Man's-Switch (Healthchecks-kompatibel,
Cloud oder self-hosted) in pull-critical-backups.ps1 und pre-borg.sh.
Pings /start, Erfolg und /fail; No-Op ohne konfigurierte URL, bricht also
keinen Lauf. Ping-URLs sind Capability-URLs und bleiben als Secret
ausserhalb des Repos.

Doku: SECRETS_MAP, Nearline-README und MASTER_TODO nachgezogen.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:54:53 +02:00
renovate 3861eaa0d1 chore(deps): update minor-and-patch-updates 2026-06-20 22:20:40 +00:00
Micha bc9ace315a Backup-Audit-Hardening: Dump-Frische-Monitoring und Scope-Konsistenz
Findings aus dem Backup-/Restore-Audit 2026-06-18 umgesetzt:

- Dump-Frische als Prometheus-Metrik (homelab_borg_dump_present /
  homelab_borg_dump_age_seconds) im Host-Exporter; schliesst den
  Blindfleck, dass Borg weiterlaeuft und stale Dumps archiviert, ohne
  Job-Fehler.
- Neue Alerts HomelabBorgDumpMissing / HomelabBorgDumpStale (critical)
  plus ALERT_RULES.md.
- Freshness-Gate (.sh + .ps1) und H:-Nearline-Pull um n8n.sqlite.dump
  und postgresql17-globals.sql ergaenzt.
- Critical-Container-Watch um mail-archiver, n8n, homeassistant,
  smarthome-mosquitto erweitert.
- BACKUP_SCOPE: /mnt/user/projekte und sonstige User-Shares ausserhalb
  App-Scope als bewusste offene Operator-Entscheidung dokumentiert;
  Hermes-data-Pfad als geparkt klargestellt.
- MASTER_TODO: Nearline-Pull-Ueberwachung, Host-Pull-Nachzug und
  projekte-Scope-Entscheidung aufgenommen.

Enthaelt ausserdem die zuvor vorbereiteten Scope-Erweiterungen
(nextcloud html+data, n8n, filebrowser, influxdb3) und Scope-Drift-/
Retention-/Compact-/Check-Alerts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 20:25:54 +02:00
Micha 0ecb2aceca Refresh current homelab todo state 2026-06-17 22:30:12 +02:00
Micha 88c48faab1 Tidy AdGuard and DNS repo drift 2026-06-17 21:59:59 +02:00
Micha a1e6a03f79 chore: remove redundant berlin weather tile from glance
Die generische Berlin-Wetterkachel ist durch die lokale Ecowitt-Kachel
(custom-api aus HA) ersetzt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:51:07 +02:00
Micha 8200697258 fix: parse glance weather gust as float via gjson
toFloat erwartet eine Zahl, der HA-State kommt aber als String -> Template-
Fehler. Boee jetzt direkt per (.Subrequest "gust").JSON.Float "state" lesen,
gjson parst den numerischen String korrekt fuer den Schwellenvergleich.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:44:42 +02:00
Micha 05b12c4802 fix: pass GLANCE_HA_TOKEN into glance container
Die Compose-environment-Sektion listet die GLANCE_*-Vars einzeln; der neue
GLANCE_HA_TOKEN fehlte und kam daher nie im Container an (Glance: variable
not found). Jetzt durchgereicht analog der anderen Tokens.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:26:46 +02:00
Micha 8d01c3537a feat: add glance weather tile from home assistant
custom-api Wetterkachel zieht die Ecowitt-Sensoren live aus HA
(intern http://homeassistant:8123, frontend_net) im Neon-Ops-Stil.
Boeen > 40 km/h werden rot markiert (analog HA-Warnautomation).
Benoetigt GLANCE_HA_TOKEN als Glance-Stack-ENV.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 18:52:43 +02:00
Micha dfa3acc21e ops: add home assistant restore test 2026-06-13 08:37:33 +02:00
Micha 25a4ada891 fix: guard home assistant onboarding 2026-06-12 21:50:15 +02:00
Micha ce6f5c72dd feat: add smart home runtime foundation 2026-06-12 20:38:03 +02:00
Micha 630ee8dd90 ops: glance server-stats balken kraeftiger (13px, innenschatten)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 20:00:38 +02:00
Micha b1ca9ef19c ops: glance server-stats balken - rund, gradient, glow, warnfarbe ab 85%
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 19:57:36 +02:00
Micha 1c949d3fcc ops: glance internet-widget - bytes/s nach mbit/s umrechnen
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 19:55:05 +02:00
Micha cfa6c01768 ops: glance komodo/immich widgets - stat-leisten mit trennlinien, pills, gradient-bars
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 19:24:12 +02:00
Micha 3474d53ce5 ops: glance borg-widget fix - alter via promql berechnen statt now.Unix im template
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 19:05:48 +02:00
Micha ca81b959cc ops: glance borg-backup-widget via prometheus + synthwave/matrix presets
- glance zusaetzlich in monitoring_net (nur lesende Prometheus-Query, kein neuer Listener)
- Borg-Widget: Backup-Alter aus homelab_borg_last_completed_timestamp_seconds, Status aus homelab_borg_last_success
- Theme-Presets synthwave und matrix

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 19:01:32 +02:00
Micha 23764dff38 ops: glance farbschema entlilat - neutraler grund, akzente blau/cyan/amber/gruen
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 18:44:00 +02:00
Micha 3c4a48d7e5 ops: glance neon-ops v2 - rotierende akzentfarben, gradient-zahlen, animierte header
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 18:30:38 +02:00
Micha c0a39f5dfc ops: glance neon-ops look - card styling, glows, sattere theme-farben
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 18:27:30 +02:00
Micha a1d7b6e433 ops: glance layout verdichtet - internet kombiniert, container-tabs, mealie/commit-fixes
- Home rechte Spalte: Internet+Speed in einem Widget, DNS-und-VPN-Monitor entfernt, Container-Listen als Tab-Gruppe
- Infrastructure: Container-Listen als Tab-Gruppe, Mealie-Statistik auf /api/admin/about/statistics (404-Fix)
- Commit-Widgets: toRelativeTime als span-Attribut, nur erste Commit-Zeile

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:47:25 +02:00
Micha 45f43da659 ops: glance speedtest widget - ookla raw data fallbacks (data.data.*)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:44:18 +02:00
Micha 290cb8949e ops: glance dashboard v2 - split config, stack widgets, releases page
- Config per $include aufgeteilt (glance.yml -> pages/home/infrastructure/ops, containers-map zentral)
- Neue Widgets: Komodo Stacks, Gitea GitOps, Paperless, Mealie, Scrutiny Disk Health, Wetter, To-do
- Neue Seite Ops und Releases (releases-Widget fuer gepinnte Images, RSS, Commit-Log)
- Homelab-Status in Tab-Gruppen Core/Apps/Ops, Speedtest-Widget mit ehrlichem Leerzustand
- Theme-Presets (Catppuccin, Gruvbox, Light) + custom.css via Assets-Mount
- Compose: 5 neue read-only Token-ENVs, Doku in SECRETS_MAP/MASTER_TODO nachgezogen

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:06:42 +02:00
Micha a4f4696b0d docs: anchor documentation rules, rebuild index, archive proposal
- REPO_MAP.md: replace Arbeitsregel with 8 binding documentation rules
  (one fact one home, done leaves the working copy, file types, header
  convention, quarterly gardening)
- WORKFLOW.md Dokumentationspflicht and CLAUDE.md aligned to the rules
- docs/README.md index rebuilt for the consolidated state
- H drive docs merged into ops/h-drive-nearline/README.md (scheduled
  task + no-MIR rule added); docs/H_DRIVE_NEARLINE_PULL.md removed
- implemented proposal archived to docs/archive/2026/

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:14:11 +02:00
Micha 1fcdb68221 docs: consolidate restore documentation into ops/restore-tests
- merge RESTORE_HANDBOOK.md into ops/restore-tests/README.md (single
  operations doc; restore status lives only in RESTORE_MATRIX maturity
  table)
- RESTORE_MATRIX.md: extract embedded runbook drafts (261 -> 141 lines);
  unraid-flash and tailscale stubs become ops/restore-tests runbooks,
  adguard/redis checklists superseded by validated scripts
- delete six historical pre-first-run *-plan.md files (runbook + script
  are the source of truth since the validated first runs)
- SERVICES_RECOVERY: drop completed task table; DISASTER_RECOVERY:
  point related docs and section 11 to MASTER_TODO/schedule

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:11:16 +02:00
Micha c80b51f585 docs: introduce docs/archive, remove finished sprint boards and generated report
- docs/archive/2026/ with index README: DR tabletop drill, workstation
  audits, HA/Ecowitt draft, pre-Borg backup audit, finished windows
  reinstall project docs
- delete weekend sprint boards (content preserved in MASTER_TODO done log
  and git history)
- untrack generated ops/policy-checks/last-report.md and gitignore it
- fix references (CLAUDE.md, docs/README.md, ops/windows-reinstall/README.md)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:02:57 +02:00
Micha 42ed59a4d7 docs: commit pending status updates from 2026-06-06 sprint wrap-up
Preserves uncommitted working-copy updates (Veeam recovery test done,
BitLocker decision, ACL rollout, freshness negative test) before the
documentation consolidation restructures these files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 07:00:25 +02:00
Micha e80e5dd49f renovate: komodo-Stack (inline-managed) aus Tracking nehmen
Der komodo-Stack wird in Komodo inline (file_contents) verwaltet, nicht aus dem Repo deployed. Renovate-PRs darauf wirken zur Laufzeit nicht und erzeugen Git-Komodo-Scheinsicherheit. Daher: ops/komodo/** in ignorePaths, mongo-Digest auf den real laufenden Stand zurueckgesetzt, Inline-Ausnahme in docs/RENOVATE.md und im Compose-Header dokumentiert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 20:34:52 +02:00
Micha 3c339474a7 Merge pull request 'chore(deps): update mongo:8.0.23 docker digest to 73ee318' (#13) from renovate/mongo-8.0.23 into master
Reviewed-on: #13
2026-06-10 18:27:13 +00:00
renovate 9847baf327 chore(deps): update minor-and-patch-updates 2026-06-10 14:32:08 +00:00
renovate cf11b4d75b chore(deps): update mongo:8.0.23 docker digest to 73ee318 2026-06-10 04:21:10 +00:00
Micha cf9ca59eb1 docs: close baerchen veeam recovery test 2026-06-06 13:27:31 +02:00
Micha d2a9c3b8cb docs: record baerchen veeam recovery usb boot 2026-06-06 13:25:53 +02:00
Micha a4c79d9d81 docs: record guest iot preflight 2026-06-06 13:14:07 +02:00
Micha 18a90fbb4b ops: add guest iot network preflight 2026-06-06 13:13:01 +02:00