feat(telemetry): add per-app metrics and log telemetry to hub reports (v0.28.0)

- New internal/metrics/telemetry.go: MetricsStore.GetContainerTelemetry() aggregates container memory/CPU from SQLite over the last 15 min
- New internal/metrics/logscanner.go: ScanContainerLogs() scans docker logs for errors/warnings, deduplicates via fingerprinting (strips timestamps, replaces 6+ digit numbers, hex strings, UUIDs)
- New internal/report/telemetry.go: buildAppTelemetrySection() assembles per-stack AppTelemetry by aggregating container metrics and log summaries
- internal/report/types.go: added AppTelemetry field to Report struct plus AppTelemetry type with memory/CPU/log fields and LogIssue references
- internal/report/builder.go: calls buildAppTelemetrySection() in BuildReport()
- Backward-compatible: old Hub versions silently ignore app_telemetry field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -4,7 +4,7 @@
A single, lightweight Go container that replaces Portainer + scattered systemd scripts with a unified, Hungarian-language web dashboard for managing Docker Compose stacks, backups, storage, monitoring, and notifications on customer hardware.

**Current version: v0.27.2**
**Current version: v0.28.0**

---
@@ -846,9 +846,27 @@ Periodic JSON push (default every 15 min) to the central felhom-hub service:
- Health: current status, issues, warnings
- Stacks: deployed apps with versions and states
- Config hash: SHA256 of `controller.yaml` for Hub-side config comparison
- **App telemetry** (v0.28.0+): Per-stack memory (current/avg/peak) and CPU averages from the last 15 minutes of metrics data, plus log scan results (error/warning counts with deduplicated issues). Only non-protected, deployed stacks are included. Backward-compatible: old Hub versions silently ignore this field.
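
The config hash in the list above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the function names `configHash` and `configHashFile` and the sample YAML content are hypothetical, and it assumes the hash is taken over the raw file bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// configHash returns the lowercase hex SHA-256 digest of raw config bytes.
// (Hypothetical helper; hashing raw bytes is an assumption.)
func configHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// configHashFile hashes the file at path, for example controller.yaml.
func configHashFile(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return configHash(data), nil
}

func main() {
	// Write a throwaway config so the example runs anywhere.
	// The YAML keys below are made up for the demo.
	f, err := os.CreateTemp("", "controller-*.yaml")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("hub:\n  interval: 15m\n"); err != nil {
		panic(err)
	}
	if err := f.Close(); err != nil {
		panic(err)
	}

	h, err := configHashFile(f.Name())
	if err != nil {
		panic(err)
	}
	fmt.Println(h)
}
```

Hashing the raw bytes means that whitespace or comment changes also change the hash, which is usually acceptable when the Hub only needs to detect that two configs differ.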

Pushes use bearer token authentication and retry up to 3 times with a 5-second backoff. Push status is tracked in the `PushStatus` struct (LastAttempt, LastSuccess, LastError, consecutive failures), which the monitoring page and alert system use to show Hub connection health.
#### App Telemetry (`internal/metrics/telemetry.go`, `internal/metrics/logscanner.go`, `internal/report/telemetry.go`)

Each report push now includes per-app telemetry data:

**Metrics collection** (`telemetry.go`):

- `MetricsStore.GetContainerTelemetry(since)` aggregates container-level memory (avg, peak, current) and CPU averages from the `container_metrics` SQLite table for the last 15 minutes.

**Log scanning** (`logscanner.go`):

- `ScanContainerLogs(containerNames, since, logger)` runs `docker logs --since=15m --tail=1000` sequentially on all non-protected deployed containers.
- Classifies lines by keyword match (errors: `error`, `fatal`, `panic`, `crit`, `oom`, `killed`, `exception`, `traceback`; warnings: `warn`, `warning`) on the first 5 words (case-insensitive).
- Deduplicates via fingerprinting: strips timestamps, replaces 6+ digit numbers with `<N>`, 8+ char hex with `<HEX>`, UUIDs with `<UUID>`. Groups identical fingerprints, keeps top 10 per container.
- Returns `[]ContainerLogSummary` with `ErrorCount`, `WarnCount`, `RecentIssues []LogIssue`.
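
The classification and fingerprinting rules above could look roughly like this. It is a sketch under several assumptions that the text does not confirm: keywords are matched as substrings, errors take precedence over warnings, only a leading ISO-style timestamp is stripped, UUIDs are replaced before numbers and hex so they are not broken apart, and "top 10" means the most frequent fingerprints. The `LogIssue` fields and the function names `classify`, `fingerprint`, and `summarize` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var (
	errorWords = []string{"error", "fatal", "panic", "crit", "oom", "killed", "exception", "traceback"}
	warnWords  = []string{"warn", "warning"}

	// Assumed patterns; the real regular expressions are not shown in the text.
	tsRe   = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?\s*`)
	uuidRe = regexp.MustCompile(`\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b`)
	numRe  = regexp.MustCompile(`\b\d{6,}\b`)
	hexRe  = regexp.MustCompile(`\b[0-9a-fA-F]{8,}\b`)
)

type LogIssue struct {
	Level   string
	Message string
	Count   int
}

type ContainerLogSummary struct {
	ErrorCount   int
	WarnCount    int
	RecentIssues []LogIssue
}

// classify looks for keywords in the first 5 words, case-insensitively.
// It returns "error", "warning", or "" when nothing matches.
func classify(line string) string {
	words := strings.Fields(strings.ToLower(line))
	if len(words) > 5 {
		words = words[:5]
	}
	head := strings.Join(words, " ")
	for _, k := range errorWords {
		if strings.Contains(head, k) {
			return "error"
		}
	}
	for _, k := range warnWords {
		if strings.Contains(head, k) {
			return "warning"
		}
	}
	return ""
}

// fingerprint normalizes variable parts so repeated messages collapse.
func fingerprint(line string) string {
	s := tsRe.ReplaceAllString(line, "")
	s = uuidRe.ReplaceAllString(s, "<UUID>")
	s = numRe.ReplaceAllString(s, "<N>")
	s = hexRe.ReplaceAllString(s, "<HEX>")
	return strings.TrimSpace(s)
}

// summarize counts errors and warnings and keeps the most frequent issues.
func summarize(lines []string, maxIssues int) ContainerLogSummary {
	var sum ContainerLogSummary
	byFP := map[string]*LogIssue{}
	var order []string // first-seen order, for stable output
	for _, l := range lines {
		level := classify(l)
		if level == "" {
			continue
		}
		if level == "error" {
			sum.ErrorCount++
		} else {
			sum.WarnCount++
		}
		fp := fingerprint(l)
		if issue, ok := byFP[fp]; ok {
			issue.Count++
			continue
		}
		byFP[fp] = &LogIssue{Level: level, Message: fp, Count: 1}
		order = append(order, fp)
	}
	for _, fp := range order {
		sum.RecentIssues = append(sum.RecentIssues, *byFP[fp])
	}
	sort.SliceStable(sum.RecentIssues, func(i, j int) bool {
		return sum.RecentIssues[i].Count > sum.RecentIssues[j].Count
	})
	if len(sum.RecentIssues) > maxIssues {
		sum.RecentIssues = sum.RecentIssues[:maxIssues]
	}
	return sum
}

func main() {
	lines := []string{
		"2024-05-01T12:00:00Z ERROR request 1234567 failed",
		"2024-05-01T12:00:05Z ERROR request 7654321 failed",
		"2024-05-01T12:00:09Z WARN slow query",
		"2024-05-01T12:00:10Z listening on port 8080",
	}
	fmt.Printf("%+v\n", summarize(lines, 10))
}
```

Note that substring matching is deliberately loose: `crit` also catches `critical`, but a short keyword such as `oom` would also match unrelated words that contain it.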

**Report integration** (`report/telemetry.go`):

- `buildAppTelemetrySection()` calls both collectors, then `buildAppTelemetry()` aggregates by stack: it sums container metrics, merges issues, and caps them at 10 per app.
- Results are stored as `[]AppTelemetry` in the `Report` struct and serialized as the `app_telemetry` field.
#### Infrastructure Backup to Hub (`internal/report/infra_backup.go`)

After each backup cycle (including manual Tier 2 triggers via `OnCrossDriveComplete` callback), the controller pushes a full infrastructure snapshot to the Hub for disaster recovery. This snapshot includes: