
TASK.md — v0.5.0: Backup Bugfixes + Monitoring Page with Metrics Store

Version bump: v0.5.0
Scope: 2 backup fixes + new metrics subsystem + new monitoring page


Overview

  1. Bugfix: "Helyi mentés" shows "" after controller restart (in-memory LastBackup lost)
  2. Performance: Backup page caching — GetFullStatus() calls restic/docker on every page load (3-4s)
  3. Feature: New Rendszermonitor (System Monitor) page with historical metrics stored in SQLite, rendered with Chart.js

Task 1: Fix "Helyi mentés" status after restart

Problem

After controller restart, LastBackup is nil (in-memory only). The template checks {{if .Backup.LastBackup}} and falls through to the "" state. However, snapshotHistory IS loaded on startup from LoadSnapshotHistory() — so we know backups exist.

Fix

In GetFullStatus(), after copying snapshot history, synthesize a LastBackup if it's nil but snapshots exist:

// After m.mu.Unlock() and snapshot reversal...

// Synthesize LastBackup from snapshot history if not in memory (e.g., after restart)
if status.LastBackup == nil && len(status.SnapshotHistory) > 0 {
    latest := status.SnapshotHistory[0] // already reversed, newest first
    status.LastBackup = &BackupStatus{
        LastRun:  latest.Time,
        Success:  latest.Success,
        Snapshot: &SnapshotResult{
            SnapshotID: latest.SnapshotID,
        },
    }
}

Also do the same for LastDBDump — synthesize from DumpFiles on disk:

if status.LastDBDump == nil && len(status.DumpFiles) > 0 {
    var results []DumpResult
    var latestTime time.Time
    for _, f := range status.DumpFiles {
        results = append(results, DumpResult{
            DB:       DiscoveredDB{StackName: f.StackName, DBType: f.DBType, ContainerName: f.StackName},
            FilePath: f.FileName,
            Size:     f.Size,
        })
        if f.ModTime.After(latestTime) {
            latestTime = f.ModTime
        }
    }
    status.LastDBDump = &DBDumpStatus{
        LastRun: latestTime,
        Results: results,
        Success: true,
    }
}

Verification

Restart the controller (docker compose restart), then load the backup page. "Helyi mentés" should show green checkmark "aktív" (not "").


Task 2: Backup page caching

Problem

GetFullStatus() runs restic stats --json, restic snapshots --json, docker ps, and docker inspect synchronously on every page load of /backups. Takes 3-4 seconds.

Fix: Background cache with periodic refresh

<!!! Should be already implemented, verify only !!!>

Add cached status to Manager:

type Manager struct {
    // ... existing fields ...
    cachedStatus *FullBackupStatus
    cacheTime    time.Time
}

New method:

// RefreshCache updates the cached backup status in the background.
// Called by scheduler every 5 minutes and after each backup run.
func (m *Manager) RefreshCache(nextDBDump, nextBackup time.Time) {
    // Execute all the expensive calls (restic stats, docker ps, etc.)
    // Same logic currently in GetFullStatus()...
    status := &FullBackupStatus{ ... }

    m.mu.Lock()
    m.cachedStatus = status
    m.cacheTime = time.Now()
    m.mu.Unlock()
}

Modified GetFullStatus() — reads from cache, updates only cheap dynamic fields:

func (m *Manager) GetFullStatus(nextDBDump, nextBackup time.Time) *FullBackupStatus {
    m.mu.Lock()
    defer m.mu.Unlock()

    if m.cachedStatus != nil {
        // Work on a shallow copy so the returned value is never mutated by a
        // later GetFullStatus() call while a template is still rendering it.
        status := *m.cachedStatus
        // Update dynamic fields without subprocess calls
        status.Running = m.running
        status.NextDBDump = nextDBDump
        status.NextBackup = nextBackup
        status.LastDBDump = m.lastDBDump
        status.LastBackup = m.lastBackup
        status.CacheTime = m.cacheTime
        status.SnapshotHistory = make([]SnapshotRecord, len(m.snapshotHistory))
        copy(status.SnapshotHistory, m.snapshotHistory)
        // Apply snapshot reversal + LastBackup synthesis (from Task 1)
        return &status
    }

    // No cache yet — return minimal status (before first refresh completes)
    return &FullBackupStatus{
        Enabled:        m.cfg.Backup.Enabled,
        Running:        m.running,
        LastDBDump:     m.lastDBDump,
        LastBackup:     m.lastBackup,
        DBDumpSchedule: m.cfg.Backup.DBDumpSchedule,
        ResticSchedule: m.cfg.Backup.ResticSchedule,
        PruneSchedule:  m.cfg.Backup.PruneSchedule,
        NextDBDump:     nextDBDump,
        NextBackup:     nextBackup,
        Retention:      m.cfg.Backup.Retention,
        RepoPath:       m.cfg.Backup.ResticRepo,
        BackupPaths:    []string{m.cfg.Paths.StacksDir, m.cfg.Paths.DBDumpDir, "/opt/docker/felhom-controller/controller.yaml"},
    }
}

Scheduler + lifecycle integration

In main.go:

// Register cache refresh job (every 5 min)
sched.Every("backup-cache", 5*time.Minute, func(ctx context.Context) error {
    nextDBDump := scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule)
    nextBackup := scheduler.NextDailyRun(cfg.Backup.ResticSchedule)
    backupMgr.RefreshCache(nextDBDump, nextBackup)
    return nil
})

// Initial cache population (non-blocking)
go func() {
    backupMgr.RefreshCache(
        scheduler.NextDailyRun(cfg.Backup.DBDumpSchedule),
        scheduler.NextDailyRun(cfg.Backup.ResticSchedule),
    )
}()

At the end of RunFullBackup() and RunBackup(), call RefreshCache() so the page shows updated data immediately.

Import note: RefreshCache takes nextDBDump, nextBackup time.Time as params to avoid circular import with scheduler package.

Add CacheTime to FullBackupStatus

type FullBackupStatus struct {
    // ... existing ...
    CacheTime time.Time  // when cache was last refreshed
}

Template can optionally show staleness at the bottom of the page in muted text.


Task 3: Monitoring Page — Metrics Store + Charts

Architecture

                          ┌──────────────────────────┐
                          │  felhom-controller       │
                          │                          │
  /proc/stat ────────────►│  MetricsCollector        │
  /proc/meminfo ─────────►│  (every 60s)             │
  /sys/thermal ──────────►│    ↓                     │
  docker stats ──────────►│  SQLite DB               │
                          │  (/data/metrics.db)      │
                          │    ↓                     │
                          │  REST API                │
                          │  /api/metrics/*          │
                          │    ↓                     │
                          │  Chart.js in browser     │
                          └──────────────────────────┘

No new containers. Everything runs inside the existing controller.

3A: SQLite Metrics Store (internal/metrics/store.go)

Database: /opt/docker/felhom-controller/data/metrics.db (inside the controller-data volume — persists across restarts)

Tables:

-- System-wide metrics (1 row per sample)
CREATE TABLE IF NOT EXISTS system_metrics (
    ts            INTEGER NOT NULL,  -- Unix timestamp
    cpu_percent   REAL    NOT NULL,
    mem_used_mb   INTEGER NOT NULL,
    mem_total_mb  INTEGER NOT NULL,
    temp_celsius  REAL,
    load_avg_1    REAL,
    load_avg_5    REAL,
    load_avg_15   REAL,
    disk_used_gb  REAL,
    disk_total_gb REAL,
    hdd_used_gb   REAL,
    hdd_total_gb  REAL
);
CREATE INDEX IF NOT EXISTS idx_system_ts ON system_metrics(ts);

-- Per-container metrics (1 row per container per sample)
CREATE TABLE IF NOT EXISTS container_metrics (
    ts                INTEGER NOT NULL,  -- Unix timestamp
    container_name    TEXT    NOT NULL,
    cpu_percent       REAL    NOT NULL,
    mem_usage_mb      REAL    NOT NULL,
    mem_limit_mb      REAL,
    net_rx_bytes      INTEGER,
    net_tx_bytes      INTEGER,
    block_read_bytes  INTEGER,
    block_write_bytes INTEGER
);
CREATE INDEX IF NOT EXISTS idx_container_ts ON container_metrics(ts);
CREATE INDEX IF NOT EXISTS idx_container_name ON container_metrics(container_name, ts);

Go struct:

type MetricsStore struct {
    db     *sql.DB
    logger *log.Logger
}

func NewMetricsStore(dbPath string, logger *log.Logger) (*MetricsStore, error)
func (s *MetricsStore) Close() error
func (s *MetricsStore) InsertSystemMetrics(m SystemSample) error
func (s *MetricsStore) InsertContainerMetrics(samples []ContainerSample) error
func (s *MetricsStore) QuerySystemMetrics(from, to time.Time, resolution int) ([]SystemSample, error)
func (s *MetricsStore) QueryContainerMetrics(name string, from, to time.Time, resolution int) ([]ContainerSample, error)
func (s *MetricsStore) QueryContainerSummary() ([]ContainerCurrentStats, error)
func (s *MetricsStore) Prune(olderThan time.Duration) (int64, error)

Resolution/downsampling: The resolution parameter controls how many data points to return. For example, resolution=100 means "return ~100 points, averaging intermediate samples". Implementation: compute bucket size as (to-from)/resolution, group by ts/bucketSeconds, average the values.
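The bucket arithmetic described above can be sketched in plain Go, independent of SQLite. This is a minimal illustration, not the store's actual query: `Point`, `bucketSeconds`, and `downsample` are hypothetical names, and the clamp to a minimum bucket of one second is an added assumption to avoid dividing by zero when the window is shorter than the resolution.

```go
package main

import "fmt"

// Point is a hypothetical (timestamp, value) pair standing in for one
// column of SystemSample.
type Point struct {
	TS    int64 // Unix seconds
	Value float64
}

// bucketSeconds returns the bucket width (to-from)/resolution, clamped to at
// least 1 so the division in downsample can never be by zero.
func bucketSeconds(from, to int64, resolution int) int64 {
	if resolution < 1 {
		resolution = 1
	}
	b := (to - from) / int64(resolution)
	if b < 1 {
		b = 1
	}
	return b
}

// downsample averages points whose ts/bucket key is equal. The input must be
// sorted by TS ascending; each output point carries its bucket's start time.
func downsample(points []Point, from, to int64, resolution int) []Point {
	bucket := bucketSeconds(from, to, resolution)
	var out []Point
	var sum float64
	var n int
	var cur int64
	for i, p := range points {
		key := p.TS / bucket
		if i > 0 && key != cur {
			out = append(out, Point{TS: cur * bucket, Value: sum / float64(n)})
			sum, n = 0, 0
		}
		cur = key
		sum += p.Value
		n++
	}
	if n > 0 {
		out = append(out, Point{TS: cur * bucket, Value: sum / float64(n)})
	}
	return out
}

func main() {
	pts := []Point{{0, 10}, {60, 20}, {120, 30}, {180, 50}}
	// Four 60s samples over a 240s window at resolution=2 give 120s buckets.
	fmt.Println(downsample(pts, 0, 240, 2)) // prints [{0 15} {120 40}]
}
```

With a 1h range and resolution=200 the bucket is 18 seconds, shorter than the 60s sampling interval, so each bucket holds at most one sample and the data passes through unchanged.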

Auto-prune: Delete rows older than 30 days. Run daily via scheduler.

SQLite pragmas on open:

PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;

WAL mode is critical — allows concurrent reads (page loads) while writer (collector) inserts. Without WAL, page loads would block during writes.

Dependencies: Use modernc.org/sqlite (pure Go, no CGO — works in the existing Alpine-based Docker image without extra libs). Add to go.mod:

go get modernc.org/sqlite

If modernc.org/sqlite proves problematic (for example, a large binary size increase), the alternative is github.com/mattn/go-sqlite3, but it requires CGO, so the Dockerfile would need gcc + musl-dev in the build stage. Prefer modernc.org/sqlite for simplicity.

3B: Metrics Collector (internal/metrics/collector.go)

A single goroutine that samples both system and container metrics every 60 seconds:

type MetricsCollector struct {
    store        *MetricsStore
    cpuCollector *system.CPUCollector
    hddPath      string
    logger       *log.Logger
    cancel       context.CancelFunc
}

func NewMetricsCollector(store *MetricsStore, cpuCollector *system.CPUCollector, hddPath string, logger *log.Logger) *MetricsCollector
func (c *MetricsCollector) Start(ctx context.Context)
func (c *MetricsCollector) Stop()

System sampling (reuse existing functions):

func (c *MetricsCollector) sampleSystem() SystemSample {
    info := system.GetInfo(c.hddPath, c.cpuCollector)
    return SystemSample{
        Timestamp:   time.Now().Unix(),
        CPUPercent:  info.CPUPercent,
        MemUsedMB:   int(info.UsedMemMB),
        MemTotalMB:  int(info.TotalMemMB),
        TempCelsius: info.TemperatureCelsius,
        LoadAvg1:    info.LoadAvg1,
        LoadAvg5:    info.LoadAvg5,
        LoadAvg15:   info.LoadAvg15,
        DiskUsedGB:  info.DiskUsedGB,
        DiskTotalGB: info.DiskTotalGB,
        HDDUsedGB:   info.HDDUsedGB,
        HDDTotalGB:  info.HDDTotalGB,
    }
}

Container sampling via docker stats --no-stream:

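func (c *MetricsCollector) sampleContainers() []ContainerSample {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    cmd := exec.CommandContext(ctx, "docker", "stats", "--no-stream",
        "--format", "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}")
    out, err := cmd.Output()
    if err != nil {
        c.logger.Printf("[WARN] docker stats failed: %v", err)
        return nil
    }

    var samples []ContainerSample
    for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
        _ = line // Parse each tab-separated line into a ContainerSample...
    }
    return samples
}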

Parse the output fields. docker stats returns values like:

  • CPU%: "2.50%" → parse as float 2.50
  • MemUsage: "150.5MiB / 512MiB" → parse the numerator as usage and the denominator as the limit (units are binary: KiB/MiB/GiB)
  • NetIO: "1.5MB / 2.3MB" → parse rx/tx bytes
  • BlockIO: "50MB / 10MB" → parse read/write bytes
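The field formats listed above could be parsed along these lines. This is a sketch under assumptions: `parsePercent`, `parseSize`, and `parsePair` are hypothetical helper names, and the unit table assumes docker prints binary suffixes (KiB, MiB, GiB) for memory and decimal suffixes (kB, MB, GB) for network and block I/O. Any value that does not match yields an error, so the caller can skip that line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unitFactors maps size suffixes to byte multipliers.
var unitFactors = []struct {
	suffix string
	factor float64
}{
	{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
	{"kB", 1e3}, {"MB", 1e6}, {"GB", 1e9}, {"TB", 1e12},
	{"B", 1}, // must come last: every other suffix also ends in "B"
}

// parsePercent turns "2.50%" into 2.5.
func parsePercent(s string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSuffix(strings.TrimSpace(s), "%"), 64)
}

// parseSize turns "150.5MiB" or "1.5MB" into a byte count.
func parseSize(s string) (float64, error) {
	s = strings.TrimSpace(s)
	for _, u := range unitFactors {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("parse size %q: %w", s, err)
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("parse size %q: unknown unit", s)
}

// parsePair splits "A / B" and parses both sides as sizes in bytes.
func parsePair(s string) (a, b float64, err error) {
	left, right, ok := strings.Cut(s, "/")
	if !ok {
		return 0, 0, fmt.Errorf("parse pair %q: missing '/'", s)
	}
	if a, err = parseSize(left); err != nil {
		return 0, 0, err
	}
	if b, err = parseSize(right); err != nil {
		return 0, 0, err
	}
	return a, b, nil
}

func main() {
	cpu, _ := parsePercent("2.50%")
	used, limit, _ := parsePair("150.5MiB / 512MiB")
	fmt.Printf("cpu=%.2f mem=%.1f/%.0f MiB\n", cpu, used/(1<<20), limit/(1<<20))
}
```

Returning bytes everywhere keeps a single code path for MemUsage, NetIO, and BlockIO; converting to the `*_mb` columns is then one division at insert time.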

Filtering: Do not filter. Include all containers, including infrastructure ones such as felhom-controller itself. The self-referential metrics add a little noise, but they are useful for debugging.

Collector loop:

func (c *MetricsCollector) loop(ctx context.Context) {
    ticker := time.NewTicker(60 * time.Second)
    defer ticker.Stop()

    // Sample immediately on start
    c.sample()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            c.sample()
        }
    }
}

func (c *MetricsCollector) sample() {
    sys := c.sampleSystem()
    if err := c.store.InsertSystemMetrics(sys); err != nil {
        c.logger.Printf("[WARN] Failed to store system metrics: %v", err)
    }

    containers := c.sampleContainers()
    if err := c.store.InsertContainerMetrics(containers); err != nil {
        c.logger.Printf("[WARN] Failed to store container metrics: %v", err)
    }
}

3C: System Info Provider (internal/metrics/sysinfo.go)

Static system information (doesn't change between samples):

type StaticSystemInfo struct {
    Hostname      string
    OS            string    // e.g., "Debian GNU/Linux 12 (bookworm)"
    Kernel        string    // e.g., "6.1.0-18-amd64"
    Architecture  string    // e.g., "x86_64"
    CPUModel      string    // e.g., "Intel N100"
    CPUCores      int       // e.g., 4
    UptimeSeconds int64     // seconds since boot (matches Data Types Reference)
    BootTime      time.Time // time of boot (matches Data Types Reference)
}

func GetStaticInfo() StaticSystemInfo

Read from:

  • Hostname: os.Hostname() or /etc/hostname
  • OS: /etc/os-release → PRETTY_NAME
  • Kernel: /proc/version or uname -r (read /proc/sys/kernel/osrelease)
  • Architecture: runtime.GOARCH (but for display, read /proc/cpuinfo or use uname -m)
  • CPU model: /proc/cpuinfo → model name field
  • CPU cores: /proc/cpuinfo → count processor lines, or runtime.NumCPU()
  • Uptime: /proc/uptime → first field (seconds since boot)

IMPORTANT: These are read from inside the container. /proc/uptime, /proc/cpuinfo, and /proc/version reflect the host (not the container) because they come from the host kernel. /etc/os-release inside the container, however, describes the container image's OS, not the host's. To display the host OS:

  • Mount /etc/os-release read-only from host: add to docker-compose.yml
  • Or: Read from /host/etc/os-release with a /etc:/host/etc:ro mount

Decision: Add /etc/os-release:/host/etc/os-release:ro to docker-compose.yml. Read host OS from /host/etc/os-release, fall back to container's /etc/os-release.
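A minimal sketch of the two parsers and the host-first fallback follows. The function names (`prettyName`, `parseUptime`, `readOS`) are hypothetical, and `readOS` takes the file reader as a parameter purely so the fallback order can be exercised without touching the filesystem.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// prettyName extracts PRETTY_NAME from the contents of an os-release file.
func prettyName(osRelease string) string {
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), "=")
		if ok && key == "PRETTY_NAME" {
			return strings.Trim(val, `"'`)
		}
	}
	return ""
}

// parseUptime reads the first field of /proc/uptime (seconds since boot).
func parseUptime(procUptime string) (time.Duration, error) {
	fields := strings.Fields(procUptime)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty uptime")
	}
	secs, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, err
	}
	return time.Duration(secs * float64(time.Second)), nil
}

// readOS prefers the host mount and falls back to the container's own file.
func readOS(read func(path string) ([]byte, error)) string {
	for _, p := range []string{"/host/etc/os-release", "/etc/os-release"} {
		if b, err := read(p); err == nil {
			if name := prettyName(string(b)); name != "" {
				return name
			}
		}
	}
	return ""
}

func main() {
	fmt.Println("OS:", readOS(os.ReadFile))
	if b, err := os.ReadFile("/proc/uptime"); err == nil {
		if up, err := parseUptime(string(b)); err == nil {
			boot := time.Now().Add(-up) // boot time = now minus uptime
			fmt.Println("Uptime:", up.Round(time.Second), "Boot:", boot.Format(time.RFC3339))
		}
	}
}
```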

3D: REST API Endpoints (internal/api/router.go)

New endpoints:

GET /api/metrics/system?range=24h&resolution=200
GET /api/metrics/system?from=2026-02-15T00:00:00Z&to=2026-02-16T00:00:00Z&resolution=200
GET /api/metrics/containers/summary
GET /api/metrics/containers/{name}?range=7d&resolution=200
GET /api/metrics/sysinfo

Range presets: 1h, 6h, 24h, 7d, 30d — parsed in the handler and converted to from/to timestamps.
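Since time.ParseDuration has no day unit, the presets need their own lookup. One way to do this is a fixed table that rejects anything outside the five listed values; `rangePresets`, `parseRange`, and `window` are hypothetical names used only for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// rangePresets lists the only values accepted for the range parameter.
var rangePresets = map[string]time.Duration{
	"1h":  time.Hour,
	"6h":  6 * time.Hour,
	"24h": 24 * time.Hour,
	"7d":  7 * 24 * time.Hour,
	"30d": 30 * 24 * time.Hour,
}

// parseRange converts a preset into a duration, rejecting unknown values.
func parseRange(s string) (time.Duration, error) {
	d, ok := rangePresets[s]
	if !ok {
		return 0, fmt.Errorf("invalid range %q", s)
	}
	return d, nil
}

// window returns the from/to pair for a preset, ending at now.
func window(preset string, now time.Time) (from, to time.Time, err error) {
	d, err := parseRange(preset)
	if err != nil {
		return time.Time{}, time.Time{}, err
	}
	return now.Add(-d), now, nil
}

func main() {
	from, to, err := window("7d", time.Now().UTC())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(from.Unix(), to.Unix())
}
```

A closed table keeps the handler from accepting arbitrarily large windows, which matters because the window size feeds directly into the downsampling bucket width.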

Response format for system metrics:

{
  "ok": true,
  "data": {
    "labels": [1708041600, 1708041660, ...],
    "cpu":    [5.2, 4.8, 6.1, ...],
    "memory": [3200, 3250, 3180, ...],
    "temp":   [42, 41, 43, ...],
    "load1":  [0.3, 0.2, 0.4, ...]
  }
}

Flat arrays are more efficient for Chart.js than arrays of objects.
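Turning row-oriented samples into that column layout might look like the sketch below. `systemSeries`, `envelope`, and `toSeries` are hypothetical names, and the `SystemSample` here is trimmed to the fields the response uses. The slices are created with make so an empty result serializes as `[]` rather than `null`, which is easier for the chart code to handle.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SystemSample is reduced to the fields this response needs.
type SystemSample struct {
	Timestamp   int64
	CPUPercent  float64
	MemUsedMB   int
	TempCelsius float64
	LoadAvg1    float64
}

// systemSeries is the column-oriented payload placed under "data".
type systemSeries struct {
	Labels []int64   `json:"labels"`
	CPU    []float64 `json:"cpu"`
	Memory []int     `json:"memory"`
	Temp   []float64 `json:"temp"`
	Load1  []float64 `json:"load1"`
}

// envelope matches the {"ok": ..., "data": ...} wrapper.
type envelope struct {
	OK   bool         `json:"ok"`
	Data systemSeries `json:"data"`
}

// toSeries pivots rows into parallel arrays, one entry per sample.
func toSeries(samples []SystemSample) systemSeries {
	n := len(samples)
	s := systemSeries{
		Labels: make([]int64, 0, n),
		CPU:    make([]float64, 0, n),
		Memory: make([]int, 0, n),
		Temp:   make([]float64, 0, n),
		Load1:  make([]float64, 0, n),
	}
	for _, m := range samples {
		s.Labels = append(s.Labels, m.Timestamp)
		s.CPU = append(s.CPU, m.CPUPercent)
		s.Memory = append(s.Memory, m.MemUsedMB)
		s.Temp = append(s.Temp, m.TempCelsius)
		s.Load1 = append(s.Load1, m.LoadAvg1)
	}
	return s
}

func main() {
	samples := []SystemSample{
		{Timestamp: 1708041600, CPUPercent: 5.2, MemUsedMB: 3200, TempCelsius: 42, LoadAvg1: 0.3},
		{Timestamp: 1708041660, CPUPercent: 4.8, MemUsedMB: 3250, TempCelsius: 41, LoadAvg1: 0.2},
	}
	b, err := json.Marshal(envelope{OK: true, Data: toSeries(samples)})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b))
}
```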

Response format for container summary:

{
  "ok": true,
  "data": [
    {"name": "immich-server", "cpu_percent": 2.5, "mem_usage_mb": 350, "mem_limit_mb": 2048},
    {"name": "paperless-webserver", "cpu_percent": 0.1, "mem_usage_mb": 280, "mem_limit_mb": 1024}
  ]
}

3E: Monitoring Page Template (internal/web/templates/monitoring.html)

Five sections:

Section 1: Rendszer áttekintés (System Overview)

Static system info card:

╔══════════════════════════════════════════════════════╗
║  Rendszer áttekintés                                 ║
╠══════════════════════════════════════════════════════╣
║                                                      ║
║  Gépnév:          demo-felhom                        ║
║  Operációs rendszer: Debian GNU/Linux 12 (bookworm)  ║
║  Kernel:           6.1.0-18-amd64                    ║
║  Processzor:       Intel N100 (4 mag)                ║
║  Üzemidő:          15 nap, 3 óra                     ║
║  Indítás:          2026-02-01 05:12                   ║
║                                                      ║
╚══════════════════════════════════════════════════════╝

Hungarian labels:

  • "Rendszer áttekintés" = System overview
  • "Gépnév" = Hostname
  • "Operációs rendszer" = Operating system
  • "Kernel" = Kernel
  • "Processzor" = Processor
  • "mag" = cores
  • "Üzemidő" = Uptime
  • "nap" = days, "óra" = hours
  • "Indítás" = Started at

Section 2: Rendszer metrikák (System Metrics) — Charts

Time range selector (pill buttons like filter bar):

[ 1 óra ] [ 6 óra ] [ 24 óra ] [ 7 nap ] [ 30 nap ]

Four charts in a 2×2 grid:

┌─────────────────────────┐  ┌─────────────────────────┐
│  CPU használat (%)       │  │  Memória használat (GB)  │
│  ▁▂▃▂▁▂▄▃▂▁▃▂▁▂▃       │  │  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁    │
│                          │  │                          │
└─────────────────────────┘  └─────────────────────────┘
┌─────────────────────────┐  ┌─────────────────────────┐
│  Hőmérséklet (°C)       │  │  Terhelés (Load Average)  │
│  ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁      │  │  ▁▂▃▂▁▂▁▂▃▂▁▂▁▂▁       │
│                          │  │                          │
└─────────────────────────┘  └─────────────────────────┘

Chart.js line charts with:

  • Dark theme styling (match site theme — dark background, light grid, colored lines)
  • Tooltips showing exact value + timestamp
  • Y-axis: CPU 0-100%, Memory in GB (auto-scale), Temp in °C, Load auto-scale
  • X-axis: time labels, format varies by range (HH:MM for 1h-24h, MM-DD for 7d-30d)
  • Fill under the line with low-opacity color
  • Responsive (resize with container)

Section 3: Alkalmazás erőforrások (Application Resources) — Current snapshot

Bar chart showing per-container resource usage RIGHT NOW:

╔══════════════════════════════════════════════════════════════╗
║  Alkalmazás erőforrások                                      ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  CPU használat                           Memória használat   ║
║  immich-server     ████████░░ 8.2%       ████████░ 350 MB   ║
║  paperless-web     ██░░░░░░░░ 1.5%       ██████░░░ 280 MB   ║
║  immich-ml         █░░░░░░░░░ 0.8%       ████░░░░░ 180 MB   ║
║  romm              █░░░░░░░░░ 0.3%       ██░░░░░░░ 90 MB    ║
║  filebrowser       ░░░░░░░░░░ 0.1%       █░░░░░░░░ 25 MB    ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

Two horizontal bar charts side by side (Chart.js horizontal bar). Container names on the Y-axis. Sorted by CPU usage descending.

Each container name is a clickable link that opens the detail view (Section 4 below).

Section 4: Per-container detail (expandable / click-to-show)

When clicking a container name, show a historical chart panel below the bar chart (or in a modal/expanded section):

╔══════════════════════════════════════════════════════════════╗
║  immich-server — Erőforrás előzmények                        ║
║  [1 óra] [6 óra] [24 óra] [7 nap]                           ║
║                                                              ║
║  ┌─────────────────────┐  ┌─────────────────────┐           ║
║  │ CPU % (line chart)   │  │ Memória MB (line)    │           ║
║  └─────────────────────┘  └─────────────────────┘           ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

Implementation: When a container name is clicked, JS fetches GET /api/metrics/containers/{name}?range=24h&resolution=150 and renders two Chart.js line charts in an expandable panel below the bar charts.

Section 5: Tárhely (Storage overview)

Disk usage bars for all mounted filesystems:

╔══════════════════════════════════════════════════════════════╗
║  Tárhely                                                     ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  SSD (/)           ████░░░░░░░░░░░  17.5 GB / 460 GB  (4%) ║
║  Külső HDD (/mnt)  ████████░░░░░░░  500 GB / 1000 GB (50%) ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

Reuse the progress bar styling from the dashboard's system info card.

3F: Chart.js Integration

Chart.js is needed only on the monitoring page, so include it from that template rather than site-wide. A CDN import would look like this:

<script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.7/dist/chart.umd.min.js"></script>

However, a CDN is not an option here: the controller runs on customer hardware that may not have internet access (Cloudflare Tunnel handles external requests, but internal pages are loaded locally). The Chart.js library must be embedded or served locally.

Options:

  1. Download chart.umd.min.js (~200KB) and embed it via //go:embed in the binary, serve at /static/chart.min.js
  2. Include it in the Docker image at build time

Decision: Download chart.umd.min.js, place it at internal/web/static/chart.min.js, embed it alongside the templates. Serve it at /static/chart.min.js.

Note that the existing static serving only handles style.css and felhom-logo.svg via hardcoded handlers. Either:

  • Add another hardcoded handler for /static/chart.min.js
  • Or switch to a proper embedded static FS

For simplicity, add another handler:

case path == "/static/chart.min.js":
    s.serveChartJS(w, r)
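The handler itself can stay very small. The sketch below is written as a free function rather than a Server method so it compiles on its own, and it uses a placeholder byte slice where the real code would use a variable filled by a //go:embed directive. The Cache-Control value is an assumption, not something the existing handlers are known to set.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// chartJS would normally be populated by a //go:embed directive pointing at
// chart.min.js; a placeholder keeps this sketch self-contained.
var chartJS = []byte("/* chart.js placeholder */")

// serveChartJS writes the embedded library with a JavaScript content type.
func serveChartJS(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/javascript; charset=utf-8")
	// Assumed caching policy: the file only changes with a new controller build.
	w.Header().Set("Cache-Control", "public, max-age=86400")
	_, _ = w.Write(chartJS)
}

// probe runs the handler against an in-memory request and reports the result.
func probe() (status int, contentType, body string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/static/chart.min.js", nil)
	serveChartJS(rec, req)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, ct, body := probe()
	fmt.Println(status, ct, len(body))
}
```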

3G: Docker Compose changes

Add host OS release mount:

volumes:
  # ... existing ...
  # Host OS info — for monitoring page system info
  - /etc/os-release:/host/etc/os-release:ro

3H: Config additions

No new config needed. The metrics collection interval (60s) and retention (30 days) can be hardcoded for v0.5.0. Make them configurable later if needed.


Navigation

Sidebar (layout.html)

Add fourth nav item:

<ul class="nav-links">
    <li><a href="/" class="...">Vezérlőpult</a></li>
    <li><a href="/stacks" class="...">Alkalmazások</a></li>
    <li><a href="/backups" class="...">Biztonsági mentés</a></li>
    <li><a href="/monitoring" class="...">Rendszermonitor</a></li>
</ul>

Web route (server.go ServeHTTP)

case path == "/monitoring":
    s.monitoringHandler(w, r)

Handler (handlers.go)

func (s *Server) monitoringHandler(w http.ResponseWriter, _ *http.Request) {
    data := s.baseData("monitoring", "Rendszermonitor")
    data["SysInfo"] = metrics.GetStaticInfo()
    data["SystemInfo"] = system.GetInfo(s.cfg.Paths.HDDPath, s.cpuCollector)
    s.render(w, "monitoring", data)
}

The page itself is mostly JS-driven — static info is server-rendered, charts fetch data via API calls after page load.


Scheduler Integration

The collector runs its own internal 60s ticker (like CPUCollector does) instead of being a scheduler job. Running it through the scheduler was considered for consistency with the rest of the codebase, but the scheduler is designed for heavier tasks, and a 60s job sits near the scheduler's "quiet mode" boundary (30s), so it could spam the logs unless it is made quiet explicitly. The collector is a lightweight background loop; only the daily prune is registered with the scheduler.

Prune job in main.go:

// Metrics pruning: daily at 04:00
sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error {
    deleted, err := metricsStore.Prune(30 * 24 * time.Hour)
    if err != nil {
        return err
    }
    logger.Printf("[INFO] Pruned %d old metric rows", deleted)
    return nil
})

In main.go, start the collector next to the prune registration (prune body as above):

metricsCollector.Start(ctx) // starts internal 60s ticker
defer metricsCollector.Stop()

sched.Daily("metrics-prune", "04:00", func(ctx context.Context) error { ... })
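Start and Stop are not spelled out in the spec, so here is one possible shape for them. `tickerLoop` is a hypothetical stand-in for MetricsCollector with the sampling work injected as a function. Stop cancels the context and then waits for the goroutine to exit, so no insert can race with closing the store afterwards.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tickerLoop owns a goroutine that calls sample once immediately and then
// once per interval until stopped.
type tickerLoop struct {
	interval time.Duration
	sample   func()
	cancel   context.CancelFunc
	done     chan struct{}
}

// Start launches the background goroutine and returns immediately.
func (l *tickerLoop) Start(ctx context.Context) {
	ctx, l.cancel = context.WithCancel(ctx)
	l.done = make(chan struct{})
	go func() {
		defer close(l.done)
		ticker := time.NewTicker(l.interval)
		defer ticker.Stop()
		l.sample() // sample immediately on start
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				l.sample()
			}
		}
	}()
}

// Stop cancels the loop and blocks until the goroutine has returned.
func (l *tickerLoop) Stop() {
	if l.cancel == nil {
		return // never started
	}
	l.cancel()
	<-l.done
}

// runBriefly starts a loop with a long interval, stops it right away, and
// returns how many samples were taken.
func runBriefly() int {
	n := 0
	l := &tickerLoop{interval: time.Hour, sample: func() { n++ }}
	l.Start(context.Background())
	l.Stop()
	return n
}

func main() {
	fmt.Println(runBriefly()) // prints 1: only the immediate sample ran
}
```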

Implementation Order

Step 1: SQLite metrics store

  1. Add modernc.org/sqlite to go.mod
  2. Create internal/metrics/store.go — schema, CRUD, prune, query with downsampling
  3. Write basic tests if convenient

Step 2: Static system info

  1. Create internal/metrics/sysinfo.go — read hostname, OS, kernel, CPU, uptime
  2. Handle Docker mount of /etc/os-release

Step 3: Metrics collector

  1. Create internal/metrics/collector.go — system + container sampling
  2. Parse docker stats --no-stream output
  3. Start collector in main.go, register prune job

Step 4: REST API endpoints

  1. Add /api/metrics/* routes to router.go
  2. Implement handlers: system metrics, container summary, container history, sysinfo
  3. Wire MetricsStore into API router

Step 5: Chart.js embedding

  1. Download chart.umd.min.js (v4.4.x)
  2. Place in internal/web/static/ (or alongside templates)
  3. Add serving handler in server.go

Step 6: Monitoring page template + CSS

  1. Create internal/web/templates/monitoring.html — all 5 sections
  2. Add JavaScript for Chart.js rendering, time range switching, container detail expand
  3. Add CSS styles to style.css
  4. Update layout.html — add sidebar nav item

Step 7: Backup page fixes (Tasks 1 + 2)

  1. Fix "Helyi mentés" in GetFullStatus() — synthesize from snapshot history
  2. Add caching: RefreshCache(), modify GetFullStatus(), register scheduler job
  3. Add initial cache goroutine in main.go

Step 8: Docker Compose changes

  1. Add /etc/os-release:/host/etc/os-release:ro mount
  2. Update controller.yaml.example if needed

Step 9: Build, deploy, verify

  1. Build v0.5.0
  2. Deploy to demo node (sync full docker-compose.yml)
  3. Verify backup page loads instantly
  4. Verify "Helyi mentés" shows green after restart
  5. Verify monitoring page renders with system info
  6. Wait 5 minutes for metrics to accumulate
  7. Verify charts render with data
  8. Verify container bar charts show current usage
  9. Click a container → verify historical chart loads
  10. Test time range switching (1h/6h/24h/7d/30d)

Step 10: Documentation

  1. Update CONTEXT.md, README
  2. Bump version

Files to create

internal/metrics/store.go              — SQLite metrics store
internal/metrics/collector.go          — System + container metrics collector
internal/metrics/sysinfo.go            — Static system info (OS, kernel, CPU, uptime)
internal/metrics/types.go              — Shared types (SystemSample, ContainerSample, etc.)
internal/web/templates/monitoring.html — Monitoring page template
internal/web/static/chart.min.js       — Chart.js library (embedded)

Note: chart.min.js can be placed alongside templates if using a shared //go:embed directive, or in a separate static/ embed. Check existing embed patterns in embed.go.

Files to modify

internal/backup/backup.go             — Fix Helyi mentés synthesis + add caching
internal/api/router.go                — Add /api/metrics/* endpoints
internal/web/server.go                — Add /monitoring route, /static/chart.min.js handler, accept metricsStore
internal/web/handlers.go              — Add monitoringHandler()
internal/web/templates/layout.html    — Add sidebar nav item
internal/web/templates/style.css      — Monitoring page styles (charts, tables, bars)
internal/web/embed.go                 — Include chart.min.js in embedded FS (if needed)
cmd/controller/main.go                — Wire MetricsStore + collector, backup cache job, initial cache goroutine
controller/docker-compose.yml         — Add /etc/os-release mount
go.mod                                — Add modernc.org/sqlite dependency
Dockerfile                            — May need adjustments for SQLite (verify modernc.org/sqlite works with current build)

Data Types Reference

// internal/metrics/types.go

type SystemSample struct {
    Timestamp   int64   `json:"ts"`
    CPUPercent  float64 `json:"cpu"`
    MemUsedMB   int     `json:"mem_used"`
    MemTotalMB  int     `json:"mem_total"`
    TempCelsius float64 `json:"temp"`
    LoadAvg1    float64 `json:"load1"`
    LoadAvg5    float64 `json:"load5"`
    LoadAvg15   float64 `json:"load15"`
    DiskUsedGB  float64 `json:"disk_used"`
    DiskTotalGB float64 `json:"disk_total"`
    HDDUsedGB   float64 `json:"hdd_used"`
    HDDTotalGB  float64 `json:"hdd_total"`
}

type ContainerSample struct {
    Timestamp       int64   `json:"ts"`
    ContainerName   string  `json:"name"`
    CPUPercent      float64 `json:"cpu"`
    MemUsageMB      float64 `json:"mem_usage"`
    MemLimitMB      float64 `json:"mem_limit"`
    NetRxBytes      int64   `json:"net_rx"`
    NetTxBytes      int64   `json:"net_tx"`
    BlockReadBytes  int64   `json:"blk_read"`
    BlockWriteBytes int64   `json:"blk_write"`
}

type ContainerCurrentStats struct {
    ContainerName string  `json:"name"`
    CPUPercent    float64 `json:"cpu_percent"`
    MemUsageMB    float64 `json:"mem_usage_mb"`
    MemLimitMB    float64 `json:"mem_limit_mb"`
}

type StaticSystemInfo struct {
    Hostname      string    `json:"hostname"`
    OS            string    `json:"os"`
    Kernel        string    `json:"kernel"`
    Architecture  string    `json:"architecture"`
    CPUModel      string    `json:"cpu_model"`
    CPUCores      int       `json:"cpu_cores"`
    UptimeSeconds int64     `json:"uptime_seconds"`
    BootTime      time.Time `json:"boot_time"`
}

Chart.js Dark Theme Configuration

All charts should use a consistent dark theme:

const chartDefaults = {
    backgroundColor: 'rgba(0, 136, 204, 0.1)',  // accent-blue with low opacity
    borderColor: '#0088cc',                       // accent-blue
    borderWidth: 2,
    pointRadius: 0,                              // hide dots (too many points)
    pointHitRadius: 10,                          // but clickable for tooltips
    tension: 0.3,                                 // smooth curves
    fill: true
};

const chartOptions = {
    responsive: true,
    maintainAspectRatio: false,
    plugins: {
        legend: { display: false },
        tooltip: {
            backgroundColor: '#1c2128',
            titleColor: '#e6edf3',
            bodyColor: '#8b949e',
            borderColor: '#30363d',
            borderWidth: 1
        }
    },
    scales: {
        x: {
            grid: { color: 'rgba(48,54,61,0.5)' },
            ticks: { color: '#8b949e', maxTicksLimit: 8 }
        },
        y: {
            grid: { color: 'rgba(48,54,61,0.5)' },
            ticks: { color: '#8b949e' },
            beginAtZero: true
        }
    }
};

Different chart colors per metric:

  • CPU: #0088cc (blue)
  • Memory: #238636 (green)
  • Temperature: #d29922 (yellow)
  • Load: #db6d28 (orange)
  • Container CPU bars: #0088cc (blue)
  • Container Memory bars: #238636 (green)

Edge Cases

  • No metrics yet: First load after deploy — charts show "Még nincsenek adatok" (No data yet) message
  • Container appears/disappears: Containers may start/stop between samples. The summary shows only currently running containers. Historical charts handle gaps gracefully (Chart.js spanGaps: true)
  • Pi 3B+ with 1 GB RAM: SQLite + metrics should be lightweight. A 60s interval yields 1,440 system rows/day + 14,400 container rows/day (10 containers). With 30-day retention that is ≈ 475k rows max. SQLite handles this easily.
  • Docker socket permissions: docker stats already works because the socket is mounted. No new permissions needed.
  • Timezone: All timestamps stored as UTC Unix epoch. Chart.js tooltip formatter converts to Europe/Budapest for display.
  • Chart.js file size: ~200KB minified. Acceptable for a single page load. Only loaded on the monitoring page.

Verification Checklist

Backup fixes

  • "Helyi mentés" shows green ✓ after controller restart (not "")
  • Backup page loads in <500ms (cached, no subprocess calls)
  • Cache refreshes after manual backup ("Mentés most")
  • Backup page still shows correct data after cache refresh

Monitoring page

  • Sidebar shows 4 nav items, "Rendszermonitor" is highlighted when active
  • System overview card shows hostname, OS, kernel, CPU model, cores, uptime
  • System metrics charts render (CPU, Memory, Temperature, Load)
  • Time range buttons work (1h/6h/24h/7d/30d) — charts update
  • Container resource bar charts show current per-container CPU + memory
  • Clicking a container name shows historical charts
  • Storage section shows SSD + HDD usage bars
  • Charts use dark theme matching site design
  • Page works on mobile (responsive charts)
  • Metrics accumulate over time (check after 10+ minutes)
  • SQLite DB created in data volume (survives restart)
  • Metrics prune job runs daily (check scheduler logs)
  • No regressions on dashboard, apps, backup pages