From 87e79548b0a7b60a88bf600ede3aa0935e4fd9bc Mon Sep 17 00:00:00 2001 From: kisfenyo Date: Mon, 16 Feb 2026 10:01:43 +0100 Subject: [PATCH] v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store --- TASK.md | 1008 ++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 806 insertions(+), 202 deletions(-) diff --git a/TASK.md b/TASK.md index 1b9be42..4d3d08c 100644 --- a/TASK.md +++ b/TASK.md @@ -1,171 +1,131 @@ -# TASK.md — v0.4.7: Protected Stack Detail Pages + Backup Page Caching +# TASK.md — v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store -> Version bump: **v0.4.7** -> Scope: UX fix + performance fix +> Version bump: **v0.5.0** +> Scope: 2 backup fixes + new metrics subsystem + new monitoring page --- -## Task 1: Protected stacks — enable detail page + click-through +## Overview + +1. **Bugfix**: "Helyi mentés" shows "–" after controller restart (in-memory `LastBackup` lost) +2. **Performance**: Backup page caching — `GetFullStatus()` calls restic/docker on every page load (3-4s) +3. **Feature**: New **Rendszermonitor** (System Monitor) page with historical metrics stored in SQLite, rendered with Chart.js + +--- + +## Task 1: Fix "Helyi mentés" status after restart ### Problem -Protected stacks (filebrowser, traefik, cloudflared, felhom-controller) are excluded from the app detail page in two ways: +After controller restart, `LastBackup` is nil (in-memory only). The template checks `{{if .Backup.LastBackup}}` and falls through to the "–" state. However, `snapshotHistory` IS loaded on startup from `LoadSnapshotHistory()` — so we know backups exist. -1. **No click-through**: `data-href="/apps/{{.Meta.Slug}}"` is gated behind `{{if not .Protected}}` on both `dashboard.html` and `stacks.html` -2. **No "Részletek" button**: The protected section in `stacks.html` only shows "Védett rendszerkomponens" badge + restart button — no "Részletek" link -3. 
**Detail page works**: Manually navigating to `/apps/filebrowser` renders fine — so the handler and template already support it +### Fix -### Fix — Templates +In `GetFullStatus()`, after copying snapshot history, synthesize a `LastBackup` if it's nil but snapshots exist: -**stacks.html**: Add `data-href` for protected stacks that have a slug, and add "Részletek" button: +```go +// After m.mu.Unlock() and snapshot reversal... -Change the card opening div from: -```html -
-``` -To: -```html -
+// Synthesize LastBackup from snapshot history if not in memory (e.g., after restart) +if status.LastBackup == nil && len(status.SnapshotHistory) > 0 { + latest := status.SnapshotHistory[0] // already reversed, newest first + status.LastBackup = &BackupStatus{ + LastRun: latest.Time, + Success: latest.Success, + Snapshot: &SnapshotResult{ + SnapshotID: latest.SnapshotID, + }, + } +} ``` -This lets any stack with a slug (including protected ones with `.felhom.yml`) be clickable. Stacks without `.felhom.yml` (no slug) won't have the click handler — which is correct. +Also do the same for `LastDBDump` — synthesize from `DumpFiles` on disk: -In the protected actions section, add "Részletek" link: -```html -{{if .Protected}} - Védett rendszerkomponens - {{if isOperational .State}} - - {{end}} - {{if .Meta.Slug}} - Részletek - {{end}} -{{else if not .Deployed}} -``` - -**dashboard.html**: Same `data-href` fix for the compact card: - -Change from: -```html -
-``` -To: -```html -
-``` - -### Fix — FileBrowser .felhom.yml (resources) - -The manually created `.felhom.yml` on the demo node is missing `resources`. Update it to include: - -```yaml -resources: - mem_request: "128M" - mem_limit: "256M" - pi_compatible: true - needs_hdd: true -``` - -Also add this to the `.felhom.yml` created by `install_filebrowser()` in `scripts/docker-setup.sh`. - -**Manual fix for demo node** (run after deploy): -```bash -cat >> /opt/docker/stacks/filebrowser/.felhom.yml << 'EOF' -resources: - mem_request: "128M" - mem_limit: "256M" - pi_compatible: true - needs_hdd: true -EOF +```go +if status.LastDBDump == nil && len(status.DumpFiles) > 0 { + var results []DumpResult + var latestTime time.Time + for _, f := range status.DumpFiles { + results = append(results, DumpResult{ + DB: DiscoveredDB{StackName: f.StackName, DBType: f.DBType, ContainerName: f.StackName}, + FilePath: f.FileName, + Size: f.Size, + }) + if f.ModTime.After(latestTime) { + latestTime = f.ModTime + } + } + status.LastDBDump = &DBDumpStatus{ + LastRun: latestTime, + Results: results, + Success: true, + } +} ``` ### Verification -- Clicking FileBrowser card on Alkalmazások page opens `/apps/filebrowser` detail page -- "Részletek" button appears next to "Újraindítás" on FileBrowser -- Detail page shows memory badges (~128M, HDD szükséges, Pi kompatibilis) -- App info section shows use cases, first steps, prerequisites -- Other protected stacks without `.felhom.yml` (traefik, cloudflared) don't show "Részletek" (no slug) +Restart the controller (`docker compose restart`), then load the backup page. "Helyi mentés" should show green checkmark "aktív" (not "–"). 
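The synthesis rule can also be checked without restarting the container. A minimal sketch, assuming simplified stand-ins for the real backup types (only the fields used in the fix above are modeled, and `synthesizeLastBackup` is a hypothetical helper name, not an existing function in the backup package):

```go
package main

import "time"

// Simplified stand-ins for the real backup types; only the fields
// referenced by the Task 1 fix are modeled here.
type SnapshotRecord struct {
	Time       time.Time
	SnapshotID string
	Success    bool
}

type SnapshotResult struct {
	SnapshotID string
}

type BackupStatus struct {
	LastRun  time.Time
	Success  bool
	Snapshot *SnapshotResult
}

// synthesizeLastBackup (hypothetical helper) keeps the in-memory value when
// it exists and otherwise rebuilds a BackupStatus from the newest history
// entry. history must already be ordered newest first.
func synthesizeLastBackup(last *BackupStatus, history []SnapshotRecord) *BackupStatus {
	if last != nil || len(history) == 0 {
		return last
	}
	latest := history[0]
	return &BackupStatus{
		LastRun:  latest.Time,
		Success:  latest.Success,
		Snapshot: &SnapshotResult{SnapshotID: latest.SnapshotID},
	}
}
```

Keeping the rule in one small function would let `GetFullStatus()` call it in both the cached and uncached paths, which is one way to avoid the two copies drifting apart.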
--- -## Task 2: Backup page — cache expensive data +## Task 2: Backup page caching ### Problem -`GetFullStatus()` is called synchronously on every page load of `/backups` and runs three blocking subprocess calls: +`GetFullStatus()` runs `restic stats --json`, `restic snapshots --json`, `docker ps`, and `docker inspect` synchronously on **every page load** of `/backups`. Takes 3-4 seconds. -1. `m.restic.Stats()` — executes `restic stats --json` + `restic snapshots --json` (~2-3 seconds) -2. `ListDumpFiles()` — directory listing (fast, ~1ms) -3. `DiscoverDatabases()` — `docker ps` + `docker inspect` per container (~0.5s) +### Fix: Background cache with periodic refresh + -Total: 3-4 seconds per page load. This is unacceptable for a dashboard page. - -### Solution: Background cache with periodic refresh - -Add a cached `FullBackupStatus` to the Manager that refreshes periodically via the scheduler, instead of computing on every page load. - -#### New fields in Manager: +Add cached status to Manager: ```go type Manager struct { // ... existing fields ... - - mu sync.Mutex - lastDBDump *DBDumpStatus - lastBackup *BackupStatus - running bool - snapshotHistory []SnapshotRecord - - // Cached status for page rendering - cachedStatus *FullBackupStatus - cacheTime time.Time + cachedStatus *FullBackupStatus + cacheTime time.Time } ``` -#### New method: `RefreshCache()` +New method: ```go -// RefreshCache updates the cached full status. Called by scheduler every 5 minutes -// and after each backup run. +// RefreshCache updates the cached backup status in the background. +// Called by scheduler every 5 minutes and after each backup run. func (m *Manager) RefreshCache(nextDBDump, nextBackup time.Time) { - // Same logic as current GetFullStatus() — run restic stats, list dump files, discover DBs + // Execute all the expensive calls (restic stats, docker ps, etc.) + // Same logic currently in GetFullStatus()... status := &FullBackupStatus{ ... } - // ... all the expensive calls ... 
- m.mu.Lock() m.cachedStatus = status m.cacheTime = time.Now() m.mu.Unlock() - - m.logger.Printf("[INFO] Backup status cache refreshed") } ``` -#### Modified `GetFullStatus()`: read from cache +Modified `GetFullStatus()` — reads from cache, updates only cheap dynamic fields: ```go -// GetFullStatus returns the cached backup status for page rendering. -// Returns instantly — no subprocess calls. func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupStatus { m.mu.Lock() defer m.mu.Unlock() if m.cachedStatus != nil { - // Update dynamic fields that don't need subprocess calls + // Update dynamic fields without subprocess calls m.cachedStatus.Running = m.running m.cachedStatus.NextDBDump = nextDBDump m.cachedStatus.NextBackup = nextBackup m.cachedStatus.LastDBDump = m.lastDBDump m.cachedStatus.LastBackup = m.lastBackup - // Update snapshot history m.cachedStatus.SnapshotHistory = make([]SnapshotRecord, len(m.snapshotHistory)) copy(m.cachedStatus.SnapshotHistory, m.snapshotHistory) + // Apply snapshot reversal + LastBackup synthesis (from Task 1) return m.cachedStatus } - // No cache yet — return a minimal status (first page load before cache is populated) + // No cache yet — return minimal status (before first refresh completes) return &FullBackupStatus{ Enabled: m.cfg.Backup.Enabled, Running: m.running, @@ -178,137 +138,781 @@ func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupSta NextBackup: nextBackup, Retention: m.cfg.Backup.Retention, RepoPath: m.cfg.Backup.ResticRepo, - LastCheckTime: m.lastCheckTime, - LastCheckOK: m.lastCheckOK, - BackupPaths: []string{ - m.cfg.Paths.StacksDir, - m.cfg.Paths.DBDumpDir, - "/opt/docker/felhom-controller/controller.yaml", - }, + BackupPaths: []string{m.cfg.Paths.StacksDir, m.cfg.Paths.DBDumpDir, "/opt/docker/felhom-controller/controller.yaml"}, } } ``` -#### Scheduler integration (in `main.go`): +### Scheduler + lifecycle integration -Register a periodic job that refreshes the 
backup cache: +In `main.go`: ```go -if cfg.Backup.Enabled && backupMgr != nil { - // ... existing daily jobs ... +// Register cache refresh job (every 5 min) +sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error { + nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule) + nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule) + backupMgr.RefreshCache(nextDBDump, nextBackup) + return nil +}) - // Cache refresh: every 5 minutes - sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error { - nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule) - nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule) - backupMgr.RefreshCache(nextDBDump, nextBackup) - return nil - }) -} -``` - -#### Refresh after backup completion: - -At the end of `RunFullBackup()` and `RunBackup()`, call `RefreshCache()` so the page shows updated data immediately after a backup: - -```go -func (m *Manager) RunFullBackup(ctx context.Context) error { - // ... existing logic ... - - // Refresh cache after backup completes - m.RefreshCache( - scheduler.NextDailyRun(m.cfg.Backup.DBDumpSchedule), - scheduler.NextDailyRun(m.cfg.Backup.ResticSchedule), +// Initial cache population (non-blocking) +go func() { + backupMgr.RefreshCache( + scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule), + scheduler.NextDailyRun(cfg.Backup.ResticSchedule), ) - return nil // or the backup error -} +}() ``` -**Note**: `RefreshCache()` needs to import `scheduler.NextDailyRun`. To avoid a circular import (backup → scheduler → backup), either: -- Pass the next run times as parameters (already the pattern used) -- Make `NextDailyRun` a standalone utility function that both packages can import -- Or just call `RefreshCache` from `main.go` via a callback +At the end of `RunFullBackup()` and `RunBackup()`, call `RefreshCache()` so the page shows updated data immediately. 
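One detail to watch in the cached `GetFullStatus()` above: it hands the shared `m.cachedStatus` pointer to the caller and rewrites its fields on every call, so a template may still be reading the struct after the mutex is released while another request updates it. A hedged sketch of a copy-on-read variant follows. The types are trimmed stand-ins, `RepoSizeMB` is a hypothetical placeholder for the expensive cached fields, and `RefreshCache` takes the finished status as an argument here purely to keep the example short (the spec's version takes the next-run times instead).

```go
package main

import (
	"sync"
	"time"
)

// FullBackupStatus is a trimmed stand-in for the real struct.
type FullBackupStatus struct {
	Running    bool
	NextBackup time.Time
	CacheTime  time.Time
	RepoSizeMB int // hypothetical placeholder for the expensive, cached fields
}

type Manager struct {
	mu           sync.Mutex
	running      bool
	cachedStatus *FullBackupStatus
	cacheTime    time.Time
}

// RefreshCache stores a freshly computed status. The expensive work that
// builds status is assumed to happen before this call, outside the lock.
func (m *Manager) RefreshCache(status *FullBackupStatus) {
	now := time.Now()
	m.mu.Lock()
	m.cachedStatus = status
	m.cacheTime = now
	m.mu.Unlock()
}

// GetFullStatus returns a copy, so callers never share memory with the cache.
func (m *Manager) GetFullStatus(nextBackup time.Time) FullBackupStatus {
	m.mu.Lock()
	defer m.mu.Unlock()

	var out FullBackupStatus
	if m.cachedStatus != nil {
		out = *m.cachedStatus // shallow copy; slice fields would need an explicit copy
		out.CacheTime = m.cacheTime
	}
	// Cheap dynamic fields are filled in on every call.
	out.Running = m.running
	out.NextBackup = nextBackup
	return out
}
```

Returning a value instead of a pointer changes the handler signature slightly, so whether this is worth it depends on how the templates consume the status.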
-Simplest approach: `RefreshCache` takes `nextDBDump, nextBackup time.Time` params (same as `GetFullStatus`). The scheduler job and the post-backup refresh both compute the times before calling. +**Import note**: `RefreshCache` takes `nextDBDump, nextBackup time.Time` as params to avoid circular import with scheduler package. -#### Initial cache population on startup: - -In `main.go`, after scheduler starts, trigger initial cache refresh in a goroutine (don't block startup): +### Add `CacheTime` to FullBackupStatus ```go -if cfg.Backup.Enabled && backupMgr != nil { - go func() { - nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule) - nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule) - backupMgr.RefreshCache(nextDBDump, nextBackup) - }() +type FullBackupStatus struct { + // ... existing ... + CacheTime time.Time // when cache was last refreshed } ``` -### Result - -- Page load: instant (reads cached struct) -- Cache refresh: every 5 minutes in background (user never waits) -- After manual backup: cache refreshes immediately -- First page load after startup: may show minimal data for a few seconds until goroutine completes - -### Cache staleness indicator (optional) - -Add `CacheTime time.Time` to `FullBackupStatus`. The template can optionally show "Utolsó frissítés: X perccel ezelőtt" at the bottom of the page in a muted font. Not critical, but helpful for debugging. +Template can optionally show staleness at the bottom of the page in muted text. --- -## Implementation order +## Task 3: Monitoring Page — Metrics Store + Charts -### Step 1: Protected stack detail pages -1. Fix `data-href` gating in `stacks.html` and `dashboard.html` — use `{{if .Meta.Slug}}` instead of `{{if not .Protected}}` -2. Add "Részletek" button to protected stack section in `stacks.html` -3. Update `install_filebrowser()` in `docker-setup.sh` — add `resources` to `.felhom.yml` +### Architecture -### Step 2: Backup page caching -1. 
Add `cachedStatus`, `cacheTime` fields to `Manager` -2. Create `RefreshCache()` method -3. Modify `GetFullStatus()` to read from cache -4. Register `backup-cache` scheduler job in `main.go` -5. Call `RefreshCache()` at end of `RunFullBackup()` and `RunBackup()` -6. Add initial cache goroutine in `main.go` +``` + ┌──────────────────────────┐ + │ felhom-controller │ + │ │ + /proc/stat ────────────►│ MetricsCollector │ + /proc/meminfo ─────────►│ (every 60s) │ + /sys/thermal ──────────►│ ↓ │ + docker stats ──────────►│ SQLite DB │ + │ (/data/metrics.db) │ + │ ↓ │ + │ REST API │ + │ /api/metrics/* │ + │ ↓ │ + │ Chart.js in browser │ + └──────────────────────────┘ +``` -### Step 3: Build, deploy, verify -1. Build v0.4.7 -2. Deploy to demo node -3. Update `/opt/docker/stacks/filebrowser/.felhom.yml` on demo node (add resources) -4. Verify FileBrowser card is clickable → detail page with memory badges -5. Verify backup page loads instantly -6. Trigger manual backup → verify page updates after completion +No new containers. Everything runs inside the existing controller. 
-### Step 4: Documentation +### 3A: SQLite Metrics Store (`internal/metrics/store.go`) + +**Database**: `/opt/docker/felhom-controller/data/metrics.db` (inside the controller-data volume — persists across restarts) + +**Tables**: + +```sql +-- System-wide metrics (1 row per sample) +CREATE TABLE IF NOT EXISTS system_metrics ( + ts INTEGER NOT NULL, -- Unix timestamp + cpu_percent REAL NOT NULL, + mem_used_mb INTEGER NOT NULL, + mem_total_mb INTEGER NOT NULL, + temp_celsius REAL, + load_avg_1 REAL, + load_avg_5 REAL, + load_avg_15 REAL, + disk_used_gb REAL, + disk_total_gb REAL, + hdd_used_gb REAL, + hdd_total_gb REAL +); +CREATE INDEX IF NOT EXISTS idx_system_ts ON system_metrics(ts); + +-- Per-container metrics (1 row per container per sample) +CREATE TABLE IF NOT EXISTS container_metrics ( + ts INTEGER NOT NULL, -- Unix timestamp + container_name TEXT NOT NULL, + cpu_percent REAL NOT NULL, + mem_usage_mb REAL NOT NULL, + mem_limit_mb REAL, + net_rx_bytes INTEGER, + net_tx_bytes INTEGER, + block_read_bytes INTEGER, + block_write_bytes INTEGER +); +CREATE INDEX IF NOT EXISTS idx_container_ts ON container_metrics(ts); +CREATE INDEX IF NOT EXISTS idx_container_name ON container_metrics(container_name, ts); +``` + +**Go struct**: + +```go +type MetricsStore struct { + db *sql.DB + logger *log.Logger +} + +func NewMetricsStore(dbPath string, logger *log.Logger) (*MetricsStore, error) +func (s *MetricsStore) Close() error +func (s *MetricsStore) InsertSystemMetrics(m SystemSample) error +func (s *MetricsStore) InsertContainerMetrics(samples []ContainerSample) error +func (s *MetricsStore) QuerySystemMetrics(from, to time.Time, resolution int) ([]SystemSample, error) +func (s *MetricsStore) QueryContainerMetrics(name string, from, to time.Time, resolution int) ([]ContainerSample, error) +func (s *MetricsStore) QueryContainerSummary() ([]ContainerCurrentStats, error) +func (s *MetricsStore) Prune(olderThan time.Duration) (int64, error) +``` + +**Resolution/downsampling**: 
The `resolution` parameter controls how many data points to return. For example, resolution=100 means "return ~100 points, averaging intermediate samples". Implementation: compute bucket size as `(to-from)/resolution`, group by `ts/bucketSeconds`, average the values. + +**Auto-prune**: Delete rows older than 30 days. Run daily via scheduler. + +**SQLite pragmas on open**: +```sql +PRAGMA journal_mode=WAL; +PRAGMA synchronous=NORMAL; +PRAGMA busy_timeout=5000; +``` + +WAL mode is critical — allows concurrent reads (page loads) while writer (collector) inserts. Without WAL, page loads would block during writes. + +**Dependencies**: Use `modernc.org/sqlite` (pure Go, no CGO — works in the existing Alpine-based Docker image without extra libs). Add to `go.mod`: +``` +go get modernc.org/sqlite +``` + +If `modernc.org/sqlite` is problematic (large binary size increase), alternative: use `github.com/mattn/go-sqlite3` but this requires CGO and the Dockerfile would need `gcc` + `musl-dev` in the build stage. Prefer `modernc.org/sqlite` for simplicity. 
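The bucket arithmetic behind the downsampling rule in 3A can be sketched in plain Go. This is an in-memory illustration under stated assumptions, not the store implementation: `Point`, `bucketSeconds`, and `downsample` are hypothetical names, and timestamps are Unix seconds as in the schema.

```go
package main

// Point is a simplified (timestamp, value) sample; TS is Unix seconds.
type Point struct {
	TS    int64
	Value float64
}

// bucketSeconds returns the bucket width for a query window, never below 1s.
func bucketSeconds(fromUnix, toUnix int64, resolution int) int64 {
	if resolution <= 0 {
		return 1
	}
	b := (toUnix - fromUnix) / int64(resolution)
	if b < 1 {
		return 1
	}
	return b
}

// downsample averages all points that fall into the same ts/bucket group.
// points must be sorted by TS ascending; each output TS is the bucket start.
func downsample(points []Point, bucket int64) []Point {
	if bucket < 1 {
		bucket = 1
	}
	var out []Point
	var sum float64
	var n int
	var cur int64
	for i, p := range points {
		b := p.TS / bucket
		if i > 0 && b != cur {
			// Bucket changed: emit the average of the finished bucket.
			out = append(out, Point{TS: cur * bucket, Value: sum / float64(n)})
			sum, n = 0, 0
		}
		cur = b
		sum += p.Value
		n++
	}
	if n > 0 {
		out = append(out, Point{TS: cur * bucket, Value: sum / float64(n)})
	}
	return out
}
```

In the store itself the same grouping would more likely be pushed into SQL, for example `SELECT (ts / ?) * ? AS bucket, AVG(cpu_percent) FROM system_metrics WHERE ts BETWEEN ? AND ? GROUP BY ts / ? ORDER BY bucket`, relying on integer division of the `INTEGER` column. That query is an untested suggestion.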
+ +### 3B: Metrics Collector (`internal/metrics/collector.go`) + +A single goroutine that samples both system and container metrics every 60 seconds: + +```go +type MetricsCollector struct { + store *MetricsStore + cpuCollector *system.CPUCollector + hddPath string + logger *log.Logger + cancel context.CancelFunc +} + +func NewMetricsCollector(store *MetricsStore, cpuCollector *system.CPUCollector, hddPath string, logger *log.Logger) *MetricsCollector +func (c *MetricsCollector) Start(ctx context.Context) +func (c *MetricsCollector) Stop() +``` + +**System sampling** (reuse existing functions): +```go +func (c *MetricsCollector) sampleSystem() SystemSample { + info := system.GetInfo(c.hddPath, c.cpuCollector) + return SystemSample{ + Timestamp: time.Now().Unix(), + CPUPercent: info.CPUPercent, + MemUsedMB: int(info.UsedMemMB), + MemTotalMB: int(info.TotalMemMB), + TempCelsius: info.TemperatureCelsius, + LoadAvg1: info.LoadAvg1, + LoadAvg5: info.LoadAvg5, + LoadAvg15: info.LoadAvg15, + DiskUsedGB: info.DiskUsedGB, + DiskTotalGB: info.DiskTotalGB, + HDDUsedGB: info.HDDUsedGB, + HDDTotalGB: info.HDDTotalGB, + } +} +``` + +**Container sampling** via `docker stats --no-stream`: + +```go +func (c *MetricsCollector) sampleContainers() []ContainerSample { + ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + + cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream", + "--format", "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}") + out, err := cmd.Output() + // Parse each line... +} +``` + +Parse the output fields. `docker stats` returns values like: +- CPU%: "2.50%" → parse as float 2.50 +- MemUsage: "150.5MiB / 512MiB" → parse numerator as MB +- NetIO: "1.5MB / 2.3MB" → parse rx/tx bytes +- BlockIO: "50MB / 10MB" → parse read/write bytes + +**Filter out**: Skip infrastructure containers if desired (felhom-controller itself — avoid self-referential metrics noise). 
Actually, include all containers including infra — useful for debugging. + +**Collector loop**: +```go +func (c *MetricsCollector) loop(ctx context.Context) { + ticker := time.NewTicker(60 * time.Second) + defer ticker.Stop() + + // Sample immediately on start + c.sample() + + for { + select { + case <-ctx.Done(): + return + case <-ticker.C: + c.sample() + } + } +} + +func (c *MetricsCollector) sample() { + sys := c.sampleSystem() + if err := c.store.InsertSystemMetrics(sys); err != nil { + c.logger.Printf("[WARN] Failed to store system metrics: %v", err) + } + + containers := c.sampleContainers() + if err := c.store.InsertContainerMetrics(containers); err != nil { + c.logger.Printf("[WARN] Failed to store container metrics: %v", err) + } +} +``` + +### 3C: System Info Provider (`internal/metrics/sysinfo.go`) + +Static system information (doesn't change between samples): + +```go +type StaticSystemInfo struct { + Hostname string + OS string // e.g., "Debian GNU/Linux 12 (bookworm)" + Kernel string // e.g., "6.1.0-18-amd64" + Architecture string // e.g., "x86_64" + CPUModel string // e.g., "Intel N100" + CPUCores int // e.g., 4 + Uptime time.Duration + UptimeSince time.Time +} + +func GetStaticInfo() StaticSystemInfo +``` + +Read from: +- Hostname: `os.Hostname()` or `/etc/hostname` +- OS: `/etc/os-release` → `PRETTY_NAME` +- Kernel: `/proc/version` or `uname -r` (read `/proc/sys/kernel/osrelease`) +- Architecture: `runtime.GOARCH` (but for display, read `/proc/cpuinfo` or use `uname -m`) +- CPU model: `/proc/cpuinfo` → `model name` field +- CPU cores: `/proc/cpuinfo` → count `processor` lines, or `runtime.NumCPU()` +- Uptime: `/proc/uptime` → first field (seconds since boot) + +**IMPORTANT**: These are read from *inside the container*. `/proc/uptime`, `/proc/cpuinfo`, and `/proc/version` reflect the **host** (not the container) because they're from the host kernel. `/etc/os-release` inside the container shows the container's OS (Debian), not the host's. 
For displaying host OS: +- Mount `/etc/os-release` read-only from host: add to docker-compose.yml +- Or: Read from `/host/etc/os-release` with a `/etc:/host/etc:ro` mount + +**Decision**: Add `/etc/os-release:/host/etc/os-release:ro` to docker-compose.yml. Read host OS from `/host/etc/os-release`, fall back to container's `/etc/os-release`. + +### 3D: REST API Endpoints (`internal/api/router.go`) + +New endpoints: + +``` +GET /api/metrics/system?range=24h&resolution=200 +GET /api/metrics/system?from=2026-02-15T00:00:00Z&to=2026-02-16T00:00:00Z&resolution=200 +GET /api/metrics/containers/summary +GET /api/metrics/containers/{name}?range=7d&resolution=200 +GET /api/metrics/sysinfo +``` + +**Range presets**: `1h`, `6h`, `24h`, `7d`, `30d` — parsed in the handler and converted to `from`/`to` timestamps. + +**Response format** for system metrics: +```json +{ + "ok": true, + "data": { + "labels": [1708041600, 1708041660, ...], + "cpu": [5.2, 4.8, 6.1, ...], + "memory": [3200, 3250, 3180, ...], + "temp": [42, 41, 43, ...], + "load1": [0.3, 0.2, 0.4, ...] + } +} +``` + +Flat arrays are more efficient for Chart.js than arrays of objects. 
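The range presets and the flat-array response shape above can be sketched together. Note that Go's `time.ParseDuration` does not accept a `d` unit, so `7d` and `30d` need an explicit table. All names below (`rangePresets`, `parseRange`, `systemSeries`, `toSeries`) are hypothetical, and `SystemSample` is trimmed to three fields for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// rangePresets lists the presets named in the spec.
var rangePresets = map[string]time.Duration{
	"1h":  time.Hour,
	"6h":  6 * time.Hour,
	"24h": 24 * time.Hour,
	"7d":  7 * 24 * time.Hour,
	"30d": 30 * 24 * time.Hour,
}

// parseRange converts a preset into a [from, to] window in Unix seconds
// that ends at nowUnix. Unknown presets are rejected.
func parseRange(preset string, nowUnix int64) (from, to int64, err error) {
	d, ok := rangePresets[preset]
	if !ok {
		return 0, 0, fmt.Errorf("unsupported range %q", preset)
	}
	return nowUnix - int64(d/time.Second), nowUnix, nil
}

// SystemSample is trimmed to the fields used in this sketch.
type SystemSample struct {
	Timestamp  int64
	CPUPercent float64
	MemUsedMB  int
}

// systemSeries mirrors the flat-array JSON shape shown above.
type systemSeries struct {
	Labels []int64   `json:"labels"`
	CPU    []float64 `json:"cpu"`
	Memory []int     `json:"memory"`
}

// toSeries turns row-oriented samples into column-oriented arrays.
// Slices are pre-allocated so an empty result encodes as [] rather than null.
func toSeries(samples []SystemSample) systemSeries {
	s := systemSeries{
		Labels: make([]int64, 0, len(samples)),
		CPU:    make([]float64, 0, len(samples)),
		Memory: make([]int, 0, len(samples)),
	}
	for _, m := range samples {
		s.Labels = append(s.Labels, m.Timestamp)
		s.CPU = append(s.CPU, m.CPUPercent)
		s.Memory = append(s.Memory, m.MemUsedMB)
	}
	return s
}
```

A handler would presumably call `parseRange` only when `range` is present and fall back to parsing `from`/`to` as RFC 3339 timestamps otherwise, matching the second URL form listed above.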
+ +**Response format** for container summary: +```json +{ + "ok": true, + "data": [ + {"name": "immich-server", "cpu_percent": 2.5, "mem_usage_mb": 350, "mem_limit_mb": 2048}, + {"name": "paperless-webserver", "cpu_percent": 0.1, "mem_usage_mb": 280, "mem_limit_mb": 1024} + ] +} +``` + +### 3E: Monitoring Page Template (`internal/web/templates/monitoring.html`) + +Five sections: + +#### Section 1: Rendszer áttekintés (System Overview) + +Static system info card: + +``` +╔══════════════════════════════════════════════════════╗ +║ Rendszer áttekintés ║ +╠══════════════════════════════════════════════════════╣ +║ ║ +║ Gépnév: demo-felhom ║ +║ Operációs rendszer: Debian GNU/Linux 12 (bookworm) ║ +║ Kernel: 6.1.0-18-amd64 ║ +║ Processzor: Intel N100 (4 mag) ║ +║ Üzemidő: 15 nap, 3 óra ║ +║ Indítás: 2026-02-01 05:12 ║ +║ ║ +╚══════════════════════════════════════════════════════╝ +``` + +Hungarian labels: +- "Rendszer áttekintés" = System overview +- "Gépnév" = Hostname +- "Operációs rendszer" = Operating system +- "Kernel" = Kernel +- "Processzor" = Processor +- "mag" = cores +- "Üzemidő" = Uptime +- "nap" = days, "óra" = hours +- "Indítás" = Started at + +#### Section 2: Rendszer metrikák (System Metrics) — Charts + +Time range selector (pill buttons like filter bar): + +``` +[ 1 óra ] [ 6 óra ] [ 24 óra ] [ 7 nap ] [ 30 nap ] +``` + +Four charts in a 2×2 grid: + +``` +┌─────────────────────────┐ ┌─────────────────────────┐ +│ CPU használat (%) │ │ Memória használat (GB) │ +│ ▁▂▃▂▁▂▄▃▂▁▃▂▁▂▃ │ │ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ │ +│ │ │ │ +└─────────────────────────┘ └─────────────────────────┘ +┌─────────────────────────┐ ┌─────────────────────────┐ +│ Hőmérséklet (°C) │ │ Terhelés (Load Average) │ +│ ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁ │ │ ▁▂▃▂▁▂▁▂▃▂▁▂▁▂▁ │ +│ │ │ │ +└─────────────────────────┘ └─────────────────────────┘ +``` + +Chart.js line charts with: +- Dark theme styling (match site theme — dark background, light grid, colored lines) +- Tooltips showing exact value + timestamp +- Y-axis: CPU 
0-100%, Memory in GB (auto-scale), Temp in °C, Load auto-scale +- X-axis: time labels, format varies by range (HH:MM for 1h-24h, MM-DD for 7d-30d) +- Fill under the line with low-opacity color +- Responsive (resize with container) + +#### Section 3: Alkalmazás erőforrások (Application Resources) — Current snapshot + +Bar chart showing per-container resource usage RIGHT NOW: + +``` +╔══════════════════════════════════════════════════════════════╗ +║ Alkalmazás erőforrások ║ +╠══════════════════════════════════════════════════════════════╣ +║ ║ +║ CPU használat Memória használat ║ +║ immich-server ████████░░ 8.2% ████████░ 350 MB ║ +║ paperless-web ██░░░░░░░░ 1.5% ██████░░░ 280 MB ║ +║ immich-ml █░░░░░░░░░ 0.8% ████░░░░░ 180 MB ║ +║ romm █░░░░░░░░░ 0.3% ██░░░░░░░ 90 MB ║ +║ filebrowser ░░░░░░░░░░ 0.1% █░░░░░░░░ 25 MB ║ +║ ║ +╚══════════════════════════════════════════════════════════════╝ +``` + +Two horizontal bar charts side by side (Chart.js horizontal bar). Container names on the Y-axis. Sorted by CPU usage descending. + +Each container name is a clickable link that opens the detail view (Section 4 below). + +#### Section 4: Per-container detail (expandable / click-to-show) + +When clicking a container name, show a historical chart panel below the bar chart (or in a modal/expanded section): + +``` +╔══════════════════════════════════════════════════════════════╗ +║ immich-server — Erőforrás előzmények ║ +║ [1 óra] [6 óra] [24 óra] [7 nap] ║ +║ ║ +║ ┌─────────────────────┐ ┌─────────────────────┐ ║ +║ │ CPU % (line chart) │ │ Memória MB (line) │ ║ +║ └─────────────────────┘ └─────────────────────┘ ║ +║ ║ +╚══════════════════════════════════════════════════════════════╝ +``` + +Implementation: When a container name is clicked, JS fetches `GET /api/metrics/containers/{name}?range=24h&resolution=150` and renders two Chart.js line charts in an expandable panel below the bar charts. 
+ +#### Section 5: Tárhely (Storage overview) + +Disk usage bars for all mounted filesystems: + +``` +╔══════════════════════════════════════════════════════════════╗ +║ Tárhely ║ +╠══════════════════════════════════════════════════════════════╣ +║ ║ +║ SSD (/) ████░░░░░░░░░░░ 17.5 GB / 460 GB (4%) ║ +║ Külső HDD (/mnt) ████████░░░░░░░ 500 GB / 1000 GB (50%) ║ +║ ║ +╚══════════════════════════════════════════════════════════════╝ +``` + +Reuse the progress bar styling from the dashboard's system info card. + +### 3F: Chart.js Integration + +**CDN import**: Include Chart.js from CDN in the monitoring page template (not site-wide — only needed on this page): + +```html + +``` + +Wait — the controller runs on customer hardware that may not have internet access (Cloudflare Tunnel handles external requests, but internal pages are loaded locally). The Chart.js library must be **embedded** or served locally. + +**Options**: +1. Download `chart.umd.min.js` (~200KB) and embed it via `//go:embed` in the binary, serve at `/static/chart.min.js` +2. Include it in the Docker image at build time + +**Decision**: Download chart.umd.min.js, place it at `internal/web/static/chart.min.js`, embed it alongside the templates. Serve it at `/static/chart.min.js`. + +Actually, the existing static serving only handles `style.css` and `felhom-logo.svg` via hardcoded handlers. Either: +- Add another hardcoded handler for `/static/chart.min.js` +- Or switch to a proper embedded static FS + +For simplicity, add another handler: +```go +case path == "/static/chart.min.js": + s.serveChartJS(w, r) +``` + +### 3G: Docker Compose changes + +Add host OS release mount: +```yaml +volumes: + # ... existing ... + # Host OS info — for monitoring page system info + - /etc/os-release:/host/etc/os-release:ro +``` + +### 3H: Config additions + +No new config needed. The metrics collection interval (60s) and retention (30 days) can be hardcoded for v0.5.0. Make them configurable later if needed. 
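One piece of Task 3 that is easy to get wrong is the `docker stats` field parsing described in 3B. A minimal sketch of the string handling follows, assuming the human-readable units Docker normally prints (binary units such as `MiB` for memory, decimal units such as `kB` and `MB` for network and block I/O). That unit list is an assumption to verify against the Docker version on the target nodes, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unitBytes maps the unit suffixes this sketch expects to byte multipliers.
// Assumption: these are the suffixes docker stats emits; verify on the target.
var unitBytes = map[string]float64{
	"B":  1,
	"kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
	"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30, "TiB": 1 << 40,
}

// parsePercent turns "2.50%" into 2.5.
func parsePercent(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// parseSize turns a value such as "150.5MiB" into bytes.
func parseSize(s string) (float64, error) {
	s = strings.TrimSpace(s)
	// Find the first character that is not part of the number.
	i := strings.IndexFunc(s, func(r rune) bool {
		return (r < '0' || r > '9') && r != '.'
	})
	if i <= 0 {
		return 0, fmt.Errorf("missing number or unit in %q", s)
	}
	n, err := strconv.ParseFloat(s[:i], 64)
	if err != nil {
		return 0, err
	}
	mult, ok := unitBytes[strings.TrimSpace(s[i:])]
	if !ok {
		return 0, fmt.Errorf("unknown unit in %q", s)
	}
	return n * mult, nil
}

// parsePair splits "a / b" fields (MemUsage, NetIO, BlockIO) into two byte counts.
func parsePair(s string) (a, b float64, err error) {
	left, right, ok := strings.Cut(s, "/")
	if !ok {
		return 0, 0, fmt.Errorf("expected \"a / b\", got %q", s)
	}
	if a, err = parseSize(left); err != nil {
		return 0, 0, err
	}
	if b, err = parseSize(right); err != nil {
		return 0, 0, err
	}
	return a, b, nil
}
```

Returning bytes everywhere and converting to MB only when filling `ContainerSample` keeps the unit handling in one place. Requesting `--format '{{json .}}'` from `docker stats` may be a sturdier alternative to tab splitting, but the size strings would still need the same parsing.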
+ +--- + +## Navigation + +### Sidebar (layout.html) + +Add fourth nav item: + +```html + +``` + +### Web route (server.go ServeHTTP) + +```go +case path == "/monitoring": + s.monitoringHandler(w, r) +``` + +### Handler (handlers.go) + +```go +func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) { + data := s.baseData("monitoring", "Rendszermonitor") + data["SysInfo"] = metrics.GetStaticInfo() + data["SystemInfo"] = system.GetInfo(s.cfg.Paths.HDDPath, s.cpuCollector) + s.render(w, "monitoring", data) +} +``` + +The page itself is mostly JS-driven — static info is server-rendered, charts fetch data via API calls after page load. + +--- + +## Scheduler Integration + +New jobs in `main.go`: + +```go +// Metrics collection — every 60s +sched.Every("metrics-collect", 60*time.Second, func(ctx context.Context) error { + metricsCollector.Sample() + return nil +}) + +// Metrics pruning — daily at 04:00 +sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { + deleted, err := metricsStore.Prune(30 * 24 * time.Hour) + if err != nil { + return err + } + logger.Printf("[INFO] Pruned %d old metric rows", deleted) + return nil +}) +``` + +Note: The collector could run its own internal ticker, but using the scheduler is cleaner and consistent with the rest of the codebase. However, 60s is right at the scheduler's "quiet mode" boundary (30s). Make sure it doesn't spam logs — set quiet mode threshold to 60s or make the metrics job quiet explicitly. + +Actually, better approach: have the collector run its own internal ticker (like CPUCollector does), not via the scheduler. The scheduler is designed for heavier tasks. The collector is a lightweight background loop. Register prune-only via scheduler. + +```go +metricsCollector.Start(ctx) // starts internal 60s ticker +defer metricsCollector.Stop() + +sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { ... 
})
+```
+
+---
+
+## Implementation Order
+
+### Step 1: SQLite metrics store
+1. Add `modernc.org/sqlite` to `go.mod`
+2. Create `internal/metrics/store.go` — schema, CRUD, prune, query with downsampling
+3. Write basic tests if convenient
+
+### Step 2: Static system info
+1. Create `internal/metrics/sysinfo.go` — read hostname, OS, kernel, CPU, uptime
+2. Handle Docker mount of `/etc/os-release`
+
+### Step 3: Metrics collector
+1. Create `internal/metrics/collector.go` — system + container sampling
+2. Parse `docker stats --no-stream` output
+3. Start collector in `main.go`, register prune job
+
+### Step 4: REST API endpoints
+1. Add `/api/metrics/*` routes to `router.go`
+2. Implement handlers: system metrics, container summary, container history, sysinfo
+3. Wire `MetricsStore` into API router
+
+### Step 5: Chart.js embedding
+1. Download `chart.umd.min.js` (v4.4.x)
+2. Place in `internal/web/static/` (or alongside templates)
+3. Add serving handler in `server.go`
+
+### Step 6: Monitoring page template + CSS
+1. Create `internal/web/templates/monitoring.html` — all 5 sections
+2. Add JavaScript for Chart.js rendering, time range switching, container detail expand
+3. Add CSS styles to `style.css`
+4. Update `layout.html` — add sidebar nav item
+
+### Step 7: Backup page fixes (Tasks 1 + 2)
+1. Fix "Helyi mentés" in `GetFullStatus()` — synthesize from snapshot history
+2. Add caching: `RefreshCache()`, modify `GetFullStatus()`, register scheduler job
+3. Add initial cache goroutine in `main.go`
+
+### Step 8: Docker Compose changes
+1. Add `/etc/os-release:/host/etc/os-release:ro` mount
+2. Update `controller.yaml.example` if needed
+
+### Step 9: Build, deploy, verify
+1. Build v0.5.0
+2. Deploy to demo node (sync full docker-compose.yml)
+3. Verify backup page loads instantly
+4. Verify "Helyi mentés" shows green after restart
+5. Verify monitoring page renders with system info
+6. Wait 5 minutes for metrics to accumulate
+7. Verify charts render with data
+8. Verify container bar charts show current usage
+9. Click a container → verify historical chart loads
+10. Test time range switching (1h/6h/24h/7d/30d)
+
+### Step 10: Documentation
 1. Update CONTEXT.md, README
 2. Bump version

 ---

+## Files to create
+
+```
+internal/metrics/store.go                — SQLite metrics store
+internal/metrics/collector.go            — System + container metrics collector
+internal/metrics/sysinfo.go              — Static system info (OS, kernel, CPU, uptime)
+internal/metrics/types.go                — Shared types (SystemSample, ContainerSample, etc.)
+internal/web/templates/monitoring.html   — Monitoring page template
+internal/web/static/chart.min.js         — Chart.js library (embedded)
+```
+
+Note: `chart.min.js` can be placed alongside templates if using a shared `//go:embed` directive, or in a separate `static/` embed. Check existing embed patterns in `embed.go`.
+
 ## Files to modify

 ```
-internal/backup/backup.go — add cachedStatus, RefreshCache(), modify GetFullStatus()
-internal/web/templates/stacks.html — fix data-href gating, add Részletek button
-internal/web/templates/dashboard.html — fix data-href gating
-scripts/docker-setup.sh — add resources to filebrowser .felhom.yml
-cmd/controller/main.go — register backup-cache job, initial goroutine
+internal/backup/backup.go                — Fix Helyi mentés synthesis + add caching
+internal/api/router.go                   — Add /api/metrics/* endpoints
+internal/web/server.go                   — Add /monitoring route, /static/chart.min.js handler, accept metricsStore
+internal/web/handlers.go                 — Add monitoringHandler()
+internal/web/templates/layout.html       — Add sidebar nav item
+internal/web/templates/style.css         — Monitoring page styles (charts, tables, bars)
+internal/web/embed.go                    — Include chart.min.js in embedded FS (if needed)
+cmd/controller/main.go                   — Wire MetricsStore + collector, backup cache job, initial cache goroutine
+controller/docker-compose.yml            — Add /etc/os-release mount
+go.mod                                   — Add modernc.org/sqlite dependency
+Dockerfile                               — May need adjustments for SQLite (verify modernc.org/sqlite works with current build)
 ```

 ---

-## Verification checklist
+## Data Types Reference

-- [ ] FileBrowser card on Alkalmazások page is clickable → opens `/apps/filebrowser`
-- [ ] FileBrowser has "Részletek" button next to "Újraindítás"
-- [ ] FileBrowser detail page shows ~128M / HDD szükséges / Pi kompatibilis badges
-- [ ] FileBrowser detail page shows use cases, first steps, prerequisites
-- [ ] Traefik/cloudflared cards do NOT show "Részletek" (no .felhom.yml/slug)
-- [ ] Felhom-controller card does NOT show "Részletek" (no .felhom.yml)
-- [ ] Backup page loads in <500ms (instant, cached)
-- [ ] Backup page shows correct data after initial cache population
-- [ ] Manual backup → page shows updated data after completion
-- [ ] Cache refreshes every 5 minutes (check logs for "[INFO] Backup status cache refreshed")
-- [ ] No regressions on dashboard, app detail pages, deploy flow
\ No newline at end of file
+```go
+// internal/metrics/types.go
+
+type SystemSample struct {
+    Timestamp   int64   `json:"ts"`
+    CPUPercent  float64 `json:"cpu"`
+    MemUsedMB   int     `json:"mem_used"`
+    MemTotalMB  int     `json:"mem_total"`
+    TempCelsius float64 `json:"temp"`
+    LoadAvg1    float64 `json:"load1"`
+    LoadAvg5    float64 `json:"load5"`
+    LoadAvg15   float64 `json:"load15"`
+    DiskUsedGB  float64 `json:"disk_used"`
+    DiskTotalGB float64 `json:"disk_total"`
+    HDDUsedGB   float64 `json:"hdd_used"`
+    HDDTotalGB  float64 `json:"hdd_total"`
+}
+
+type ContainerSample struct {
+    Timestamp       int64   `json:"ts"`
+    ContainerName   string  `json:"name"`
+    CPUPercent      float64 `json:"cpu"`
+    MemUsageMB      float64 `json:"mem_usage"`
+    MemLimitMB      float64 `json:"mem_limit"`
+    NetRxBytes      int64   `json:"net_rx"`
+    NetTxBytes      int64   `json:"net_tx"`
+    BlockReadBytes  int64   `json:"blk_read"`
+    BlockWriteBytes int64   `json:"blk_write"`
+}
+
+type ContainerCurrentStats struct {
+    ContainerName string  `json:"name"`
+    CPUPercent    float64 `json:"cpu_percent"`
+    MemUsageMB    float64 `json:"mem_usage_mb"`
+    MemLimitMB    float64 `json:"mem_limit_mb"`
+}
+
+type StaticSystemInfo struct {
+    Hostname      string    `json:"hostname"`
+    OS            string    `json:"os"`
+    Kernel        string    `json:"kernel"`
+    Architecture  string    `json:"architecture"`
+    CPUModel      string    `json:"cpu_model"`
+    CPUCores      int       `json:"cpu_cores"`
+    UptimeSeconds int64     `json:"uptime_seconds"`
+    BootTime      time.Time `json:"boot_time"`
+}
+```
+
+---
+
+## Chart.js Dark Theme Configuration
+
+All charts should use a consistent dark theme:
+
+```javascript
+const chartDefaults = {
+    backgroundColor: 'rgba(0, 136, 204, 0.1)',  // accent-blue with low opacity
+    borderColor: '#0088cc',                      // accent-blue
+    borderWidth: 2,
+    pointRadius: 0,                              // hide dots (too many points)
+    pointHitRadius: 10,                          // but clickable for tooltips
+    tension: 0.3,                                // smooth curves
+    fill: true
+};
+
+const chartOptions = {
+    responsive: true,
+    maintainAspectRatio: false,
+    plugins: {
+        legend: { display: false },
+        tooltip: {
+            backgroundColor: '#1c2128',
+            titleColor: '#e6edf3',
+            bodyColor: '#8b949e',
+            borderColor: '#30363d',
+            borderWidth: 1
+        }
+    },
+    scales: {
+        x: {
+            grid: { color: 'rgba(48,54,61,0.5)' },
+            ticks: { color: '#8b949e', maxTicksLimit: 8 }
+        },
+        y: {
+            grid: { color: 'rgba(48,54,61,0.5)' },
+            ticks: { color: '#8b949e' },
+            beginAtZero: true
+        }
+    }
+};
+```
+
+Different chart colors per metric:
+- CPU: `#0088cc` (blue)
+- Memory: `#238636` (green)
+- Temperature: `#d29922` (yellow)
+- Load: `#db6d28` (orange)
+- Container CPU bars: `#0088cc` (blue)
+- Container Memory bars: `#238636` (green)
+
+---
+
+## Edge Cases
+
+- **No metrics yet**: First load after deploy — charts show "Még nincsenek adatok" (No data yet) message
+- **Container appears/disappears**: Containers may start/stop between samples. The summary shows only currently running containers. Historical charts handle gaps gracefully (Chart.js `spanGaps: true`)
+- **Pi 3B+ with 1G RAM**: SQLite + metrics should be lightweight. 60s interval = 1440 system rows/day + ~10k container rows/day (10 containers). With 30-day retention ≈ 350k rows max. SQLite handles this easily.
+- **Docker socket permissions**: `docker stats` already works because the socket is mounted. No new permissions needed.
+- **Timezone**: All timestamps stored as UTC Unix epoch. Chart.js tooltip formatter converts to Europe/Budapest for display.
+- **Chart.js file size**: ~200KB minified. Acceptable for a single page load. Only loaded on the monitoring page.
+
+---
+
+## Verification Checklist
+
+### Backup fixes
+- [ ] "Helyi mentés" shows green ✓ after controller restart (not "–")
+- [ ] Backup page loads in <500ms (cached, no subprocess calls)
+- [ ] Cache refreshes after manual backup ("Mentés most")
+- [ ] Backup page still shows correct data after cache refresh
+
+### Monitoring page
+- [ ] Sidebar shows 4 nav items, "Rendszermonitor" is highlighted when active
+- [ ] System overview card shows hostname, OS, kernel, CPU model, cores, uptime
+- [ ] System metrics charts render (CPU, Memory, Temperature, Load)
+- [ ] Time range buttons work (1h/6h/24h/7d/30d) — charts update
+- [ ] Container resource bar charts show current per-container CPU + memory
+- [ ] Clicking a container name shows historical charts
+- [ ] Storage section shows SSD + HDD usage bars
+- [ ] Charts use dark theme matching site design
+- [ ] Page works on mobile (responsive charts)
+- [ ] Metrics accumulate over time (check after 10+ minutes)
+- [ ] SQLite DB created in data volume (survives restart)
+- [ ] Metrics prune job runs daily (check scheduler logs)
+- [ ] No regressions on dashboard, apps, backup pages
\ No newline at end of file