ops-report: cert dedup, blackbox DNS switched to AdGuard, new noise patterns
Fixes three findings from the operations report of 2026-06-10:

- daily-status-report.sh: certificates are now deduplicated per domain set before evaluation; only the certificate with the longest remaining validity counts. During renewal, Traefik keeps both the old and the new certificate in acme.json, which previously triggered a false KRITISCH (critical) warning (traefik.kaleschke.info, 5 days) even though the new certificate has 65 days remaining.
- monitoring/blackbox-exporter: DNS switched from 1.1.1.1/8.8.8.8 to AdGuard (172.23.0.3 via dns_net). The external resolvers returned the WAN IP, which caused hairpin NAT timeouts (9.5 s) for the cloud/glances probes (662 errors/day).
- log-noise.patterns: Fritz!Box SOA errors (AdGuard, RFC 1035 violation) and the missing grafana-amazonprometheus-datasource plugin are now classified as known noise (~1800 lines/day).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
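The certificate deduplication described in the first item can be sketched roughly as follows. This is a minimal Python illustration, not the code in daily-status-report.sh: the `Cert` record and the `dedupe_certs` helper are hypothetical names, and it assumes the domain set and remaining days have already been extracted from acme.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical record; the real script derives these values from acme.json."""
    domains: frozenset  # main domain plus any SANs
    days_left: int      # remaining validity in days


def dedupe_certs(certs):
    """Keep only the longest-running certificate for each domain set."""
    best = {}
    for cert in certs:
        current = best.get(cert.domains)
        if current is None or cert.days_left > current.days_left:
            best[cert.domains] = cert
    return list(best.values())


# During renewal both certificates exist for the same domain set.
certs = [
    Cert(frozenset({"traefik.kaleschke.info"}), 5),   # old certificate
    Cert(frozenset({"traefik.kaleschke.info"}), 65),  # renewed certificate
]
remaining = [c.days_left for c in dedupe_certs(certs)]
print(remaining)
```

With the two entries from the report, only the renewed certificate with 65 days remains, so an expiry check run on the deduplicated list would no longer see the 5-day entry.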
@@ -18,7 +18,7 @@
 # Removing a pattern: replace with a fresh attention example in the next
 # daily report and consult before reintroducing.
 #
-# Last reviewed: 2026-05-21
+# Last reviewed: 2026-06-10

 # Loki internal query cancellations / scheduler chatter.
 # Why: Loki cancels internal queries continuously when downstream Promtails
@@ -72,3 +72,18 @@ authelia.*Request timeout occurred.*status_code=408
 # noise becomes overwhelming, add a *narrow* pattern restricted to
 # push contexts only (e.g. `vaultwarden.*push.*(ResolveError|...)`).
 vaultwarden.*(Token has expired|Invalid refresh token|Failed to decode.*refresh_token|POST /identity/connect/token => 401 Unauthorized)
+
+# AdGuard: Fritz!Box sends malformed SOA queries for myfritz.net / myfritz.link.
+# Why: AVM Fritz!Box devices send multi-question DNS SOA queries that violate
+# RFC 1035 ("only 1 question allowed"). AdGuard rejects them with an error
+# but they have no operational impact.
+# Re-check: if the same error appears for non-AVM domains, or if rate spikes
+# well above 1000/day without a Fritz!Box reboot explaining it.
+adguard.*bad question section.*only 1 question allowed
+
+# Grafana: usage-stats collector looks for the Amazon Prometheus plugin, which
+# is not installed in this setup. The error is emitted once per stats cycle.
+# Why: GF_PLUGINS_PREINSTALL_DISABLED=true keeps the plugin list minimal;
+# this lookup is harmless and does not affect any dashboard.
+# Re-check: only if Amazon Prometheus is added as a datasource.
+monitoring-grafana.*grafana-amazonprometheus-datasource not found
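A quick way to sanity-check the two new patterns before relying on them is to run them against sample lines. The sketch below assumes the patterns are applied as unanchored regular-expression searches on single log lines that begin with the container name; the commit does not show how log-noise.patterns is consumed, and the sample lines are invented for illustration rather than taken from real logs.

```python
import re

# The two patterns added in this commit, copied verbatim.
NOISE_PATTERNS = [
    r"adguard.*bad question section.*only 1 question allowed",
    r"monitoring-grafana.*grafana-amazonprometheus-datasource not found",
]
COMPILED = [re.compile(p) for p in NOISE_PATTERNS]


def is_noise(line):
    """Return True if any noise pattern matches somewhere in the line."""
    return any(rx.search(line) for rx in COMPILED)


# Invented sample lines (assumption: the container name prefixes each line).
samples = [
    "adguard [error] dnsproxy: bad question section: only 1 question allowed",
    "monitoring-grafana level=error msg=\"plugin grafana-amazonprometheus-datasource not found\"",
    "adguard [error] upstream 192.0.2.1:53 timed out",
]
results = [is_noise(line) for line in samples]
print(results)
```

The third sample is a different AdGuard error and should stay visible in the report, which is the point of keeping each pattern narrow.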