Commit Graph

864 Commits

Author SHA1 Message Date
Micha 0847d839e7 Healthchecks heartbeats for renovate and gitea-bundle-mirror (6h jobs)
Same endpoint-agnostic ping via EXIT trap. These two jobs have no warning
level, so only rc==0 pings success; any non-zero rc pings /fail. gitea-bundle
edit is POSIX-sh clean (script is /bin/sh). Capability URLs from per-job host
secret files. bash -n verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:09:21 +02:00
Micha f775685cd2 Healthchecks heartbeats for compose-drift, komodo-hygiene, daily-report
Add endpoint-agnostic Healthchecks pings to the three remaining scheduled
host-audit jobs via an EXIT-trap merge (start + success/fail), so the body of
each script (incl. the 1400-line daily-status-report) stays untouched. Exit
0/1/2 = ran (ok/warning/critical); only rc>2 pings /fail. Capability URLs come
from per-job host secret files (healthchecks_<job>_url), never in the repo.
bash -n verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:06:02 +02:00
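The EXIT-trap heartbeat described in the commit above can be sketched as a small sourceable POSIX sh helper. This is a minimal illustration under stated assumptions, not the repo's actual code. The file name hc-lib.sh, all function names, and the HEALTHCHECKS_URL, HC_SECRET_FILE and HC_WARN_MAX variables are hypothetical; the repo uses per-job names such as HEALTHCHECKS_POSTURE_URL and healthchecks_<job>_url. The sketch relies only on the Healthchecks ping convention of appending /start or /fail to the ping URL.

```shell
#!/bin/sh
# hc-lib.sh: HYPOTHETICAL sourceable helper sketching the EXIT-trap
# Healthchecks heartbeat. POSIX sh, so bash scripts can source it too.

# Highest exit code still counted as "ran": 2 means 0/1/2 = ok/warning/critical.
# Jobs without a warning level would set HC_WARN_MAX=0 (only rc 0 = success).
: "${HC_WARN_MAX:=2}"

# Resolve the capability ping URL: environment variable first, then the
# host secret file. Prints nothing if neither is available.
hc_url() {
  if [ -n "${HEALTHCHECKS_URL:-}" ]; then
    printf '%s' "$HEALTHCHECKS_URL"
  elif [ -n "${HC_SECRET_FILE:-}" ] && [ -r "$HC_SECRET_FILE" ]; then
    tr -d '[:space:]' < "$HC_SECRET_FILE"
  fi
}

# Map an exit code to the ping suffix: "" (success) or "/fail".
hc_suffix_for_rc() {
  if [ "$1" -le "$HC_WARN_MAX" ]; then
    printf ''
  else
    printf '/fail'
  fi
}

# Send one ping; a monitoring hiccup must never change the job's result.
hc_ping() {
  _hc_url=$(hc_url)
  [ -n "$_hc_url" ] || return 0   # no URL configured: stay silent
  curl -fsS -m 10 --retry 3 -o /dev/null "${_hc_url}$1" || true
}

# EXIT trap handler: ping success or /fail, then preserve the exit code.
hc_on_exit() {
  _hc_rc=$?
  hc_ping "$(hc_suffix_for_rc "$_hc_rc")"
  exit "$_hc_rc"
}

# Call once near the top of a job script: ping /start and arm the trap.
hc_install() {
  hc_ping /start
  trap hc_on_exit EXIT
}
```

A job script would source the helper and call hc_install before its unchanged body (for example `. /path/to/hc-lib.sh; hc_install`, with a placeholder path), which keeps the script body untouched. Routing the trap through a function preserves the original exit code, so a run ending in warning or critical (rc 1/2) still pings success while a real abort pings /fail. If a script already sets its own EXIT trap, the two handlers would need merging, which this sketch does not attempt.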
Micha a137129c75 cert-token-check: Healthchecks heartbeat; document internal ping URLs
Add the same endpoint-agnostic Healthchecks ping wrapper to cert-token-check.sh
(daily) as in posture-check.sh; capability URL from host secret file
healthchecks_cert_token_url. SECRETS_MAP: document the per-job internal ping
URL files. MASTER_TODO: posture-check + cert-token-check wired and verified
(status up); project KalliLab CORE + ntfy integration created.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:56:13 +02:00
Micha 5ca4922d8d Merge Renovate minor and patch updates 2026-06-23 20:49:44 +02:00
Micha 09381d932a Merge Renovate n8n update 2026-06-23 20:49:40 +02:00
Micha 1c183df8d2 Merge Renovate Traefik digest update 2026-06-23 20:49:37 +02:00
Micha acc92e84e1 Merge Renovate Redis digest update 2026-06-23 20:49:33 +02:00
Micha 2844b63b37 posture-check: endpoint-agnostic Healthchecks heartbeat ping
Wrap main() with a Healthchecks ping (start + success/fail). The capability
ping URL is read from $HEALTHCHECKS_POSTURE_URL or the host secret file
/mnt/user/appdata/secrets/healthchecks_posture_url (never in the repo, same
pattern as pre-borg.sh). Exit code preserved; warning/critical still count as
"ran" (posture alerts stay on ntfy), only a real abort (rc>2) pings /fail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:49:30 +02:00
renovate 99a8d9fb6a chore(deps): update docker.n8n.io/n8nio/n8n docker tag to v2.28.0 2026-06-23 18:46:08 +00:00
renovate 8002b197af chore(deps): update traefik:v3.7 docker digest to e4d9815 2026-06-23 18:46:02 +00:00
renovate 4613da82e2 chore(deps): update redis:8.8.0-alpine docker digest to 9d31717 2026-06-23 18:45:58 +00:00
renovate 4079b1cbce chore(deps): update minor-and-patch-updates 2026-06-23 18:45:55 +00:00
Micha 7ded74aeef Rebase stale Renovate branches 2026-06-23 20:45:12 +02:00
Micha 7d4d5f901a Add Renovate GitHub token support 2026-06-23 20:42:19 +02:00
Micha ad8010767d Healthchecks: Gitea->Komodo webhook active, mark webhook step done
Webhook authenticated and triggered a successful DeployStack (komodo-core
log 18:39:00). Only remaining step is wiring internal jobs as checks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:40:42 +02:00
Micha 02389ed292 Healthchecks: mark live after API deploy; document host secret files
Stack deployed to Komodo (id 6a3acf2ca7867a4fbab9bfc1), both containers
healthy, Traefik route + LE cert OK, DNS resolves, superuser created and
auth-verified. Flip status to live in ARCHITECTURE 7.6, SERVICE_CATALOG,
MASTER_TODO and the stack README. Document the new host secret files
(secret_key, superuser_password = login password, webhook_secret) in
SECRETS_MAP. Remaining operator step: the Gitea->Komodo webhook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:31:10 +02:00
Micha cbfbb8ca4f Add self-hosted Healthchecks stack for internal job monitoring (hybrid)
Self-hosted Healthchecks (ops/healthchecks/) as the hub for internal
cron/job heartbeats. The three host-down/backup watchdogs (Borg pre-hook,
baerchen nearline pull, monitoring watchdog #8) deliberately stay on
healthchecks.io cloud, since an on-host watcher cannot report a host outage.

- frontend_net + dedicated PostgreSQL 18 in healthchecks_internal
- native Healthchecks auth; ping/API exempt from Authelia (n8n/Komodo pattern)
- registered as middleware_exempt in ops/policy-checks/exceptions.json
- docs: DECISIONS, ARCHITECTURE (3.1/4.2/7.6/10), SERVICE_CATALOG,
  SECRETS_MAP, MASTER_TODO, README index

docker compose config validated (exit 0). Not yet deployed: host secret file,
appdata dir, Komodo stack + ENV and Gitea webhook remain operator steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:09:56 +02:00
Micha ee0d450a27 docs: close audit remediation status 2026-06-23 19:50:48 +02:00
Micha 79657d526c Record alert-chain end-to-end verification in MASTER_TODO
Codex drills proved all six alert paths through to the phone (send-ntfy, restore wrapper, freshness negative, Alertmanager->bridge->ntfy, docker-critical-watcher smoke, Borg pre-hook failure). Add a Kurzlog entry, note the send-ntfy exec-bit bug the drill uncovered (6870ae5), and record the one optional gap (Prometheus->AM leg via a temporary rule). Trim the oldest entry to stay at a maximum of 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 19:12:57 +02:00
Micha 6870ae53da Mark send-ntfy.sh executable so restore-failure alerts fire
run-restore-job-with-ntfy.sh execs send-ntfy.sh directly; without the exec bit the failure-alert path errored with Permission denied (found during Codex alert drill 2026-06-23). Set the exec bit in the repo to match the live fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 17:25:24 +02:00
Micha 46d6010c66 Record backend_net internal:true after live flip; close audit remediation
backend_net was recreated with --internal (applied live by Codex): egress from postgresql17 is blocked, all 12 members were reattached, and frontends and DB connections were verified. Move the parked #17 item to the MASTER_TODO Kurzlog and confirm the live state in NETWORK_INVENTORY. No dawarich_egress net is needed (sidekiq makes no external connections).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:48:59 +02:00
Micha 5a0a4c9d56 Park backend_net internal:true hardening with egress prereq in MASTER_TODO
Capture the audit egress analysis durably so the preparation survives until the deferred maintenance window. backend_net -> internal:true is the only remaining P3 item; the single risk is dawarich_sidekiq (the only backend_net-only worker), while all DB/cache and dual-homed containers are safe. If sidekiq needs egress, use a dedicated dawarich_egress net (immich_egress precedent).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 15:22:33 +02:00
Micha c4ba67b55c Activate Hetzner snapshot restore test after live validation
Codex's first live run passed (SUCCESS, 7 snapshots, single-file restore from .zfs/snapshot; report hetzner-snapshot-2026-06-23.md) with no ENV overrides. Set the runbook status to active, document the run, and add the monthly cadence (15th, cron 0 6 15 * *) to schedule.md and the restore-tests README. Remaining host step: create the Unraid User Script restore-hetzner-snapshot-monthly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 13:15:58 +02:00
Micha 275558b2db Record audit-2026-06-23 remediation status in MASTER_TODO
Add an Aktiv row for the remaining Codex/operator follow-ups (#19 snapshot-test validation, #12 default-bridge recreate, #14 drift redeploy, #15 immich path) and a Kurzlog entry summarizing the closed P1/P2 core (Vault /admin + Komodo de-publicized, snapshots proven, auth-matrix). Trim oldest Kurzlog entry to stay at max 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:23:59 +02:00
Micha 3e9c12eb75 Add Hetzner Storage Box snapshot restore test
Make the off-site snapshot protection a repeatable, monitored proof (DECISIONS 2026-06-11/-23): a read-only restore-test that lists .zfs/snapshot on the Storage Box, checks retention and newest-snapshot age, and SFTP-fetches one small file from the newest snapshot (size + SHA256). Connection is derived from the borg-ui repo URL and runs via docker exec borg-ui; no secret in the script, no write access. Wired into the run-restore-checks.sh dispatcher; runbook documents the pending one-time live validation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:18:44 +02:00
Micha 813d3bd303 Mirror Komodo IP-allowlist labels and document de-publicization
Codex applied the ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24) to the Komodo router live in the inline-managed self-stack; public now returns 403. Mirror the labels in ops/komodo/docker-compose.yml for parity (not auto-deployed), record the decision in docs/DECISIONS.md, and update docs/AUTH_MATRIX.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 12:11:45 +02:00
Micha ad47979000 Add consolidated Auth-Matrix doc
Consolidate effective access policy per public domain (Authelia bypass/two_factor, native exceptions, Tailscale-only, IP-allowlist) into a single reviewable matrix, surfacing the Authelia bypass list that previously lived only in the live config. Indexed in docs/README.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
Micha 23a6975a67 Restrict Vaultwarden /admin to trusted networks (Tailscale + LAN)
Audit 2026-06-23 (P1): /admin was publicly reachable (200). Add a higher-priority Traefik router scoped to PathPrefix(/admin) with an ipallowlist middleware (Tailnet 100.64.0.0/10 + LAN 192.168.178.0/24); the main router stays native for browser and mobile clients. Documented in docs/DECISIONS.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:03:26 +02:00
Micha 81151d8af4 Fix Dawarich Grafana datasource database config 2026-06-22 20:45:09 +02:00
Micha 45ff8286cf Use table-format Dawarich Grafana panels 2026-06-22 20:20:07 +02:00
Micha f318d80477 Simplify Dawarich Grafana dashboard query model 2026-06-22 20:12:14 +02:00
Micha b8d9bba5d3 Replace Dawarich Grafana dashboard 2026-06-22 19:55:47 +02:00
Micha 3bebc03a8f chore(deps): move dawarich redis to v8 track
dawarich_redis was the last redis instance still on 7-alpine; the
closed PR #10 kept it as an "Ignored or Blocked" entry in the Renovate
Dependency Dashboard (issue #6). Bump to the already-running
redis:8.8.0-alpine digest and add apps/dawarich to the renovate redis
8.x allowedVersions pin. Data path /mnt/user/appdata/dawarich/redis
unchanged; redis 8 loads the existing RDB snapshots.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 19:51:36 +02:00
Micha 0f1e78e0ca Fix Dawarich Grafana readonly user init 2026-06-22 19:01:39 +02:00
Micha 658750bc19 Allow Dawarich mobile track point route 2026-06-22 16:55:18 +02:00
Micha 5afba298e9 Allow Dawarich mobile API routes 2026-06-22 16:45:26 +02:00
Micha bd0deea90d Use internal Dawarich metrics scrape 2026-06-21 23:21:21 +02:00
Micha 5a6ab2cc37 Bypass Authelia for Dawarich healthcheck 2026-06-21 23:17:38 +02:00
Micha 30c3435ddf Bypass Authelia for Dawarich metrics 2026-06-21 23:04:58 +02:00
Micha b236eaeeaa Restore Dawarich metrics basic auth config 2026-06-21 23:02:26 +02:00
Micha 4cf9e3226e Use credentials file for Dawarich metrics scrape 2026-06-21 23:00:00 +02:00
Micha 699b1f118e Scrape Dawarich metrics over HTTPS 2026-06-21 22:52:57 +02:00
Micha db886c9eb2 Send forwarded proto for Dawarich metrics 2026-06-21 22:47:51 +02:00
Micha 2a342614db Allow internal Dawarich scrape host 2026-06-21 22:44:48 +02:00
Micha 2bb6eaa267 Run Prometheus as root for file secrets 2026-06-21 22:43:28 +02:00
Micha cb80e2d2c0 Fix Dawarich Prometheus secret permissions 2026-06-21 22:41:46 +02:00
Micha 201b201657 Fix Dawarich healthcheck behind HTTPS 2026-06-21 22:39:15 +02:00
Micha 725e3b0125 Add Dawarich stack 2026-06-21 22:32:41 +02:00
Micha 1de6ffc5ac chore(deps): update nextcloud to v34 2026-06-21 21:29:37 +02:00
Micha 5559aa3f24 chore(deps): update gitea and cadvisor images 2026-06-21 21:20:11 +02:00