deploy-felhom-compose/TASK.md (2026-02-23 10:31:19 +01:00)

TASK: App Telemetry & Analytics

Controller: v0.27.3 → v0.28.0
Hub: v0.3.8 → v0.4.0

Overview

Add per-app (per-stack) memory/CPU telemetry and container log error scanning to the controller's report push cycle, then build fleet-wide analytics dashboard pages in the hub.


Spec Issues Found (corrections already applied in this plan)

  1. Wrong column name in metrics DB: Spec uses memory_bytes — actual column is mem_usage_mb (already in MB). No byte→MB conversion needed.
  2. ts column is INTEGER (Unix timestamp), not datetime: WHERE clauses must compare it against a Unix timestamp such as since.Unix().
  3. metricsStore already passed to BuildReport() — no main.go wiring change needed for that dependency.
  4. Chart.js is NOT in the hub — needs to be added (copy from controller's internal/web/static/chart.min.js).
  5. Hub nav is header-based, not sidebar — add "Alkalmazások" to the header <nav>.
  6. StackInfo type undefined in spec — use existing stacks.Stack directly (report already imports stacks).
  7. Fingerprinting threshold: Use 6+ digits instead of 4+ to avoid mangling HTTP status codes (404, 503) and port numbers.

Phase 1: Controller — Metrics Telemetry

File: controller/internal/metrics/telemetry.go (NEW)

Create this new file in the existing metrics package.

package metrics

import (
    "time"
)

// ContainerTelemetry holds aggregated resource stats for one container.
type ContainerTelemetry struct {
    ContainerName   string  `json:"container_name"`
    MemoryCurrentMB float64 `json:"memory_current_mb"`
    MemoryAvgMB     float64 `json:"memory_avg_mb"`
    MemoryPeakMB    float64 `json:"memory_peak_mb"`
    CPUAvgPercent   float64 `json:"cpu_avg_percent"`
    SampleCount     int     `json:"sample_count"`
}

// GetContainerTelemetry queries the metrics DB for per-container resource
// summaries since the given time. Returns empty slice (not error) if no data.
func (s *MetricsStore) GetContainerTelemetry(since time.Time) ([]ContainerTelemetry, error) {
    sinceUnix := since.Unix()

    // Get averages and peaks
    rows, err := s.db.Query(`
        SELECT container_name,
               AVG(mem_usage_mb),
               MAX(mem_usage_mb),
               AVG(cpu_percent),
               COUNT(*)
        FROM container_metrics
        WHERE ts > ?
        GROUP BY container_name`, sinceUnix)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []ContainerTelemetry
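    for rows.Next() {
        var ct ContainerTelemetry
        if err := rows.Scan(&ct.ContainerName, &ct.MemoryAvgMB, &ct.MemoryPeakMB,
            &ct.CPUAvgPercent, &ct.SampleCount); err != nil {
            // Skip rows that cannot be scanned (for example NULL aggregates).
            continue
        }
        results = append(results, ct)
    }
    // rows.Next() returns false on iteration errors too, so check explicitly.
    if err := rows.Err(); err != nil {
        return nil, err
    }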

    // Get current (most recent) memory per container using QueryContainerSummary
    if stats, err := s.QueryContainerSummary(); err == nil {
        currentMap := make(map[string]float64, len(stats))
        for _, st := range stats {
            currentMap[st.ContainerName] = st.MemUsageMB
        }
        for i := range results {
            if cur, ok := currentMap[results[i].ContainerName]; ok {
                results[i].MemoryCurrentMB = cur
            }
        }
    }

    if results == nil {
        results = []ContainerTelemetry{}
    }
    return results, nil
}

Key details:

  • Method on existing *MetricsStore — no new struct needed
  • ts column is Unix INTEGER — compare with since.Unix()
  • mem_usage_mb is already in MB — no conversion
  • Uses existing QueryContainerSummary() for current values (returns latest row per container, ordered by CPU DESC)
  • Returns empty slice on no data, not error

Phase 2: Controller — Log Scanner

File: controller/internal/metrics/logscanner.go (NEW)

Create in the metrics package (it's data collection, same domain as metrics).

package metrics

import (
    "context"
    "log"
    "os/exec"
    "regexp"
    "sort"
    "strings"
    "time"
    "unicode/utf8"
)

Types:

type ContainerLogSummary struct {
    ContainerName string     `json:"container_name"`
    ErrorCount    int        `json:"error_count"`
    WarnCount     int        `json:"warn_count"`
    RecentIssues  []LogIssue `json:"recent_issues,omitempty"`
}

type LogIssue struct {
    Severity string    `json:"severity"`
    Message  string    `json:"message"`
    Count    int       `json:"count"`
    LastSeen time.Time `json:"last_seen"`
}

Function: ScanContainerLogs(containerNames []string, since time.Duration, logger *log.Logger) []ContainerLogSummary

Implementation notes:

  • Iterate containerNames sequentially (not parallel — avoid load spikes)
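  • For each container, run: exec.CommandContext(ctx, "docker", "logs", "--since="+since.String(), "--tail=1000", containerName), so the window follows the since parameter (15m0s for the 15-minute window the report builder passes) instead of a hardcoded 15m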
    • Context timeout: 10 seconds per container
    • Merge stderr into stdout (cmd.CombinedOutput())
    • On error: log at DEBUG, skip container, continue
  • Skip non-UTF-8 lines using utf8.Valid([]byte(line))
  • Truncate lines to 500 chars before matching
  • Pattern matching — check first 5 space-separated words of each line (case-insensitive):
    • Error patterns: error, fatal, panic, crit, oom, killed, exception, traceback
    • Warning patterns: warn, warning
  • Fingerprinting for deduplication:
    • Strip leading timestamp (regex: ^\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}:\d{2}[.\d]*[Z ]? and syslog-style ^[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} )
    • Replace UUIDs ([0-9a-f]{8}-[0-9a-f]{4}-...) with <UUID>. Apply this rule first, so the two rules below cannot split a UUID into fragments
    • Replace hex strings of 8+ chars with <HEX>
    • Replace sequences of 6+ digits with <N> (NOT 4+, which would mangle HTTP status codes and port numbers)
    • Trim whitespace, lowercase for grouping key
  • Group by fingerprint, keep count + last_seen time
  • Limits: Max 10 RecentIssues per container (sorted by count DESC, then last_seen DESC). Cap total issues across all containers at 50.
  • Total scan warning: If total scan takes > 5 minutes, log a warning
  • Return []ContainerLogSummary (nil-safe — return empty slice)
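
The severity rule from the notes above (first five words, case-insensitive) could be sketched as follows. The function name classifySeverity, the returned strings "error" and "warning", and the choice of substring matching are assumptions for illustration; the plan only lists the patterns.

```go
package main

import (
	"fmt"
	"strings"
)

// Patterns from the implementation notes. "warning" needs no entry of its
// own because it contains "warn".
var (
	errorPatterns = []string{"error", "fatal", "panic", "crit", "oom", "killed", "exception", "traceback"}
	warnPatterns  = []string{"warn"}
)

// classifySeverity returns "error", "warning" or "" for a single log line.
// Assumption: a pattern counts as a hit when it is a substring of one of the
// first five whitespace-separated words.
func classifySeverity(line string) string {
	if r := []rune(line); len(r) > 500 {
		line = string(r[:500]) // truncate by runes so the UTF-8 stays valid
	}
	words := strings.Fields(strings.ToLower(line))
	if len(words) > 5 {
		words = words[:5]
	}
	severity := ""
	for _, w := range words {
		for _, p := range errorPatterns {
			if strings.Contains(w, p) {
				return "error" // errors win over warnings
			}
		}
		for _, p := range warnPatterns {
			if strings.Contains(w, p) {
				severity = "warning"
			}
		}
	}
	return severity
}

func main() {
	fmt.Println(classifySeverity("2026-02-23T10:31:19Z ERROR db connection refused"))
	fmt.Println(classifySeverity(`level=warn msg="slow query"`))
}
```

Substring matching catches forms such as level=error or [ERROR], at the cost of occasional false positives (a word like "zoom" contains "oom"); a stricter token match may be preferable if that turns out to be noisy.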

The caller (report builder) is responsible for filtering out infrastructure containers before calling this function. The function doesn't need config access.
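
The fingerprinting rules from the implementation notes might be implemented roughly like this. The function name fingerprint is hypothetical, and the sketch makes two choices the notes leave open: it lowercases before substituting so the placeholders keep their case, and it replaces UUIDs before hex strings and digit runs so a UUID is never split.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Leading timestamps, using the two regexes given in the notes.
	isoTS    = regexp.MustCompile(`^\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}:\d{2}[.\d]*[Z ]?`)
	syslogTS = regexp.MustCompile(`^[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} `)
	// The patterns below run on the lowercased line.
	uuidRe   = regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)
	hexRe    = regexp.MustCompile(`\b[0-9a-f]{8,}\b`)
	digitsRe = regexp.MustCompile(`\d{6,}`)
)

// fingerprint returns the grouping key for a log line.
func fingerprint(line string) string {
	line = isoTS.ReplaceAllString(line, "")
	line = syslogTS.ReplaceAllString(line, "")
	line = strings.ToLower(line)
	line = uuidRe.ReplaceAllString(line, "<UUID>") // first, so a UUID stays whole
	line = hexRe.ReplaceAllString(line, "<HEX>")
	line = digitsRe.ReplaceAllString(line, "<N>") // 6+ digits: 404 and 8080 survive
	return strings.TrimSpace(line)
}

func main() {
	fmt.Println(fingerprint("2026-02-23T10:31:19Z ERROR request 1234567 failed for user 550e8400-e29b-41d4-a716-446655440000 status 404"))
	fmt.Println(fingerprint("error: upstream 10.0.0.5:8080 returned 503"))
}
```

One side effect of this ordering: a purely numeric run of eight or more digits is reported as <HEX> rather than <N>. That does not affect deduplication, since both placeholders collapse the varying part.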


Phase 3: Controller — Report Integration

File: controller/internal/report/types.go (MODIFY)

Add these types and the new field to Report:

// Add to Report struct:
AppTelemetry []AppTelemetry `json:"app_telemetry,omitempty"`

// New types (add after StacksReport):

// AppTelemetry holds per-app (per-stack) resource and log telemetry.
type AppTelemetry struct {
    AppName         string             `json:"app_name"`
    DisplayName     string             `json:"display_name"`
    Containers      []string           `json:"containers"`
    MemoryCurrentMB float64            `json:"memory_current_mb"`
    MemoryAvgMB     float64            `json:"memory_avg_mb"`
    MemoryPeakMB    float64            `json:"memory_peak_mb"`
    CPUAvgPercent   float64            `json:"cpu_avg_percent"`
    CatalogEstimate string             `json:"catalog_estimate"`
    CatalogLimit    string             `json:"catalog_limit"`
    LogErrors       int                `json:"log_errors"`
    LogWarnings     int                `json:"log_warnings"`
    Issues          []metrics.LogIssue `json:"issues,omitempty"`
}

Note: LogIssue is defined in the metrics package (from logscanner.go). The report package already imports metrics.

File: controller/internal/report/telemetry.go (NEW)

Function: buildAppTelemetry(allStacks []stacks.Stack, telemetry []metrics.ContainerTelemetry, logs []metrics.ContainerLogSummary) []AppTelemetry

Private function (lowercase b), called from BuildReport. Logic:

  1. Build lookup maps: containerName → ContainerTelemetry and containerName → ContainerLogSummary
  2. Iterate allStacks. Skip stacks where s.Protected || !s.Deployed
  3. For each stack:
    • Collect container names from s.Containers
    • Sum MemoryCurrentMB, MemoryAvgMB, MemoryPeakMB, CPUAvgPercent across all containers in the stack
    • Sum ErrorCount, WarnCount across all containers
    • Merge RecentIssues from all containers, sort by count DESC, cap at 10
    • Get CatalogEstimate from s.Meta.Resources.MemRequest and CatalogLimit from s.Meta.Resources.MemLimit
    • Get DisplayName from s.Meta.DisplayName
  4. Return slice sorted by AppName
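
The aggregation steps above can be sketched with stand-in types. Everything in this block is illustrative: the real Stack, ContainerTelemetry, ContainerLogSummary and AppTelemetry types live in the stacks, metrics and report packages, the Name field on Stack is an assumption, and the catalog fields are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Stand-in types for illustration only.
type Container struct{ Name string }

type Stack struct {
	Name        string // assumed field name for the app name
	DisplayName string // s.Meta.DisplayName in the real type
	Protected   bool
	Deployed    bool
	Containers  []Container
}

type ContainerTelemetry struct {
	ContainerName   string
	MemoryCurrentMB float64
	MemoryAvgMB     float64
	MemoryPeakMB    float64
	CPUAvgPercent   float64
}

type LogIssue struct {
	Severity string
	Message  string
	Count    int
}

type ContainerLogSummary struct {
	ContainerName string
	ErrorCount    int
	WarnCount     int
	RecentIssues  []LogIssue
}

type AppTelemetry struct {
	AppName, DisplayName                                      string
	Containers                                                []string
	MemoryCurrentMB, MemoryAvgMB, MemoryPeakMB, CPUAvgPercent float64
	LogErrors, LogWarnings                                    int
	Issues                                                    []LogIssue
}

func buildAppTelemetry(all []Stack, tel []ContainerTelemetry, logs []ContainerLogSummary) []AppTelemetry {
	telBy := make(map[string]ContainerTelemetry, len(tel))
	for _, t := range tel {
		telBy[t.ContainerName] = t
	}
	logBy := make(map[string]ContainerLogSummary, len(logs))
	for _, l := range logs {
		logBy[l.ContainerName] = l
	}

	out := []AppTelemetry{}
	for _, s := range all {
		if s.Protected || !s.Deployed {
			continue
		}
		app := AppTelemetry{AppName: s.Name, DisplayName: s.DisplayName}
		for _, c := range s.Containers {
			app.Containers = append(app.Containers, c.Name)
			t := telBy[c.Name] // zero value when the container has no samples
			app.MemoryCurrentMB += t.MemoryCurrentMB
			app.MemoryAvgMB += t.MemoryAvgMB
			app.MemoryPeakMB += t.MemoryPeakMB
			app.CPUAvgPercent += t.CPUAvgPercent
			l := logBy[c.Name]
			app.LogErrors += l.ErrorCount
			app.LogWarnings += l.WarnCount
			app.Issues = append(app.Issues, l.RecentIssues...)
		}
		// Most frequent issues first, capped at 10 per app.
		sort.SliceStable(app.Issues, func(i, j int) bool {
			return app.Issues[i].Count > app.Issues[j].Count
		})
		if len(app.Issues) > 10 {
			app.Issues = app.Issues[:10]
		}
		out = append(out, app)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].AppName < out[j].AppName })
	return out
}

func main() {
	apps := buildAppTelemetry(
		[]Stack{{Name: "wiki", Deployed: true, Containers: []Container{{Name: "wiki-app"}}}},
		[]ContainerTelemetry{{ContainerName: "wiki-app", MemoryAvgMB: 100}},
		nil,
	)
	fmt.Println(len(apps), apps[0].AppName, apps[0].MemoryAvgMB)
}
```

Note that summing per-container peaks gives an upper bound for the stack's peak, since the containers need not have peaked at the same moment.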

File: controller/internal/report/builder.go (MODIFY)

In BuildReport(), add AFTER the stacks section (before the final debug log), approximately at line 151:

// App telemetry (metrics + log scan)
r.AppTelemetry = buildAppTelemetrySection(stackMgr, metricsStore, logger)

Create helper function buildAppTelemetrySection:

func buildAppTelemetrySection(stackMgr *stacks.Manager, metricsStore *metrics.MetricsStore, logger *log.Logger) []AppTelemetry {
    allStacks := stackMgr.GetStacks()

    // 1. Get metrics telemetry (last 15 minutes)
    var telemetry []metrics.ContainerTelemetry
    if metricsStore != nil {
        var err error
        telemetry, err = metricsStore.GetContainerTelemetry(time.Now().Add(-15 * time.Minute))
        if err != nil && logger != nil {
            logger.Printf("[WARN] Telemetry metrics query failed: %v", err)
        }
    }

    // 2. Collect non-protected container names for log scan
    var containerNames []string
    for _, s := range allStacks {
        if s.Protected || !s.Deployed {
            continue
        }
        for _, c := range s.Containers {
            containerNames = append(containerNames, c.Name)
        }
    }

    // 3. Scan logs
    logs := metrics.ScanContainerLogs(containerNames, 15*time.Minute, logger)

    // 4. Build per-app telemetry
    return buildAppTelemetry(allStacks, telemetry, logs)
}

Key: uses s.Protected to skip infrastructure containers, with no hardcoded names. The manager sets s.Protected on each stack during ScanStacks() from the config's protected list, so the helper never calls cfg.IsProtectedStack() and takes no *config.Config parameter.

File: controller/cmd/controller/main.go (NO CHANGE)

No changes to main.go needed. metricsStore is already passed to BuildReport(), and BuildReport() already receives cfg *config.Config (line 25), so its signature stays as it is.


Phase 4: Hub — Store Changes

File: hub/internal/store/store.go (MODIFY)

4a. Add table creation in migrate() function

Add after the existing table creation statements:

CREATE TABLE IF NOT EXISTS app_telemetry (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id TEXT NOT NULL,
    app_name TEXT NOT NULL,
    display_name TEXT NOT NULL DEFAULT '',
    reported_at DATETIME NOT NULL,
    memory_current_mb REAL DEFAULT 0,
    memory_avg_mb REAL DEFAULT 0,
    memory_peak_mb REAL DEFAULT 0,
    cpu_avg_percent REAL DEFAULT 0,
    catalog_estimate TEXT DEFAULT '',
    catalog_limit TEXT DEFAULT '',
    log_errors INTEGER DEFAULT 0,
    log_warnings INTEGER DEFAULT 0,
    containers_json TEXT DEFAULT '[]',
    issues_json TEXT DEFAULT '[]'
);

CREATE INDEX IF NOT EXISTS idx_app_telemetry_lookup
    ON app_telemetry(app_name, reported_at);
CREATE INDEX IF NOT EXISTS idx_app_telemetry_customer
    ON app_telemetry(customer_id, app_name, reported_at);
CREATE INDEX IF NOT EXISTS idx_app_telemetry_prune
    ON app_telemetry(reported_at);

CREATE TABLE IF NOT EXISTS app_log_issues (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    app_name TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    severity TEXT NOT NULL,
    message TEXT NOT NULL,
    first_seen DATETIME NOT NULL,
    last_seen DATETIME NOT NULL,
    occurrence_count INTEGER DEFAULT 1,
    affected_customers TEXT DEFAULT '[]',
    UNIQUE(app_name, fingerprint)
);

CREATE INDEX IF NOT EXISTS idx_app_log_issues_app
    ON app_log_issues(app_name, last_seen DESC);

4b. Add types (top of file or separate types section)

type AppTelemetryRecord struct {
    AppName         string   `json:"app_name"`
    DisplayName     string   `json:"display_name"`
    Containers      []string `json:"containers"`
    MemoryCurrentMB float64  `json:"memory_current_mb"`
    MemoryAvgMB     float64  `json:"memory_avg_mb"`
    MemoryPeakMB    float64  `json:"memory_peak_mb"`
    CPUAvgPercent   float64  `json:"cpu_avg_percent"`
    CatalogEstimate string   `json:"catalog_estimate"`
    CatalogLimit    string   `json:"catalog_limit"`
    LogErrors       int      `json:"log_errors"`
    LogWarnings     int      `json:"log_warnings"`
    Issues          []struct {
        Severity string    `json:"severity"`
        Message  string    `json:"message"`
        Count    int       `json:"count"`
        LastSeen time.Time `json:"last_seen"`
    } `json:"issues,omitempty"`
}

type FleetAppSummary struct {
    AppName         string
    DisplayName     string
    DeploymentCount int
    AvgMemoryMB     float64
    PeakMemoryMB    float64  // max across fleet
    P95MemoryMB     float64
    AvgCPU          float64
    TotalErrors     int
    TotalWarnings   int
    CatalogEstimate string
    CatalogLimit    string
}

type AppTelemetryPoint struct {
    ReportedAt      time.Time
    CustomerID      string
    MemoryAvgMB     float64
    MemoryPeakMB    float64
    CPUAvgPercent   float64
    LogErrors       int
    LogWarnings     int
}

type AppCustomerStats struct {
    CustomerID      string
    AvgMemoryMB     float64
    PeakMemoryMB    float64
    AvgCPU          float64
    TotalErrors     int
    LastReport      time.Time
}

type CustomerAppSummary struct {
    AppName         string
    DisplayName     string
    MemoryCurrentMB float64
    MemoryAvgMB     float64
    MemoryPeakMB    float64
    CatalogLimit    string
    LogErrors       int
    LogWarnings     int
}

type AppIssue struct {
    ID                int
    AppName           string
    Fingerprint       string
    Severity          string
    Message           string
    FirstSeen         time.Time
    LastSeen          time.Time
    OccurrenceCount   int
    AffectedCustomers []string
}

4c. Add store methods

SaveAppTelemetry(customerID string, reportedAt time.Time, records []AppTelemetryRecord) error

  • Insert each record into app_telemetry table
  • Serialize Containers to JSON for containers_json
  • Serialize Issues to JSON for issues_json
  • Use a transaction for batch insert
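  • For each record with non-empty issues, call upsertAppIssue for each issue. The report payload carries no fingerprint field, so the hub has to derive one from the issue message (for example the trimmed, lowercased message)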

upsertAppIssue(appName, fingerprint, severity, message, customerID string, lastSeen time.Time) error

  • Private helper
  • Use INSERT INTO app_log_issues ... ON CONFLICT(app_name, fingerprint) DO UPDATE SET last_seen=MAX(last_seen, excluded.last_seen), occurrence_count=occurrence_count+1, ...
  • For affected_customers: parse existing JSON array, add customerID if not present, re-serialize
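
The affected_customers update is a read-modify-write on a JSON array. A minimal sketch of the merge step follows; the helper name addCustomer is hypothetical, and in the real store the read and the write would presumably happen inside the same transaction as the upsert.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addCustomer returns the JSON array with customerID appended when it is
// not already present. An empty string is treated as an empty array.
func addCustomer(existingJSON, customerID string) (string, error) {
	var ids []string
	if existingJSON != "" {
		if err := json.Unmarshal([]byte(existingJSON), &ids); err != nil {
			return "", err
		}
	}
	for _, id := range ids {
		if id == customerID {
			return existingJSON, nil // already recorded, nothing to change
		}
	}
	ids = append(ids, customerID)
	out, err := json.Marshal(ids)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	updated, err := addCustomer(`["acme"]`, "beta")
	fmt.Println(updated, err)
}
```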

GetFleetAppSummary(since time.Time) ([]FleetAppSummary, error)

SELECT app_name,
       MAX(display_name) as display_name,
       COUNT(DISTINCT customer_id) as deployment_count,
       AVG(memory_avg_mb) as avg_memory_mb,
       MAX(memory_peak_mb) as peak_memory_mb,
       AVG(cpu_avg_percent) as avg_cpu,
       SUM(log_errors) as total_errors,
       SUM(log_warnings) as total_warnings,
       MAX(catalog_estimate) as catalog_estimate,
       MAX(catalog_limit) as catalog_limit
FROM app_telemetry
WHERE reported_at > ?
GROUP BY app_name
ORDER BY deployment_count DESC, avg_memory_mb DESC

For P95 memory: separate query per app (can be batched in Go):

SELECT memory_peak_mb FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
ORDER BY memory_peak_mb ASC
LIMIT 1 OFFSET (
    SELECT CAST(COUNT(*) * 0.95 AS INTEGER)
    FROM app_telemetry WHERE app_name = ? AND reported_at > ?
)
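
If the peak values for an app are already loaded in Go, the same percentile can be computed without the nested query. The sketch below mirrors the SQL (ascending sort, row at offset floor(0.95 * n)); the function name p95 is hypothetical, and integer arithmetic is used for the offset to avoid floating-point rounding.

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns the value at offset floor(0.95*n) of the ascending-sorted
// input, and false when the input is empty. The input is not modified.
func p95(values []float64) (float64, bool) {
	n := len(values)
	if n == 0 {
		return 0, false
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	return sorted[n*95/100], true // n*95/100 <= n-1 for every n >= 1
}

func main() {
	v, ok := p95([]float64{120, 80, 300, 95, 110})
	fmt.Println(v, ok)
}
```

With few samples the offset lands on the largest value (for example, any n below 20), so the P95 equals the fleet peak until enough reports have accumulated.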

GetAppTelemetryHistory(appName string, since time.Time) ([]AppTelemetryPoint, error)

SELECT reported_at, customer_id, memory_avg_mb, memory_peak_mb, cpu_avg_percent, log_errors, log_warnings
FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
ORDER BY reported_at ASC

GetAppCustomerBreakdown(appName string, since time.Time) ([]AppCustomerStats, error)

SELECT customer_id, AVG(memory_avg_mb), MAX(memory_peak_mb), AVG(cpu_avg_percent),
       SUM(log_errors), MAX(reported_at)
FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
GROUP BY customer_id
ORDER BY AVG(memory_avg_mb) DESC

GetCustomerAppSummary(customerID string, since time.Time) ([]CustomerAppSummary, error)

Run two simple queries and merge the results in Go. A single query that joins the latest row per app against the period aggregates is possible, but it is harder to read and easy to get wrong.

-- Averages and peaks over the requested period (7d by default)
SELECT app_name, MAX(display_name), AVG(memory_avg_mb), MAX(memory_peak_mb),
       MAX(catalog_limit), SUM(log_errors), SUM(log_warnings)
FROM app_telemetry WHERE customer_id = ? AND reported_at > ?
GROUP BY app_name ORDER BY AVG(memory_avg_mb) DESC

Then for current memory, get the latest record per app:

-- The table aliases matter: without them, app_name in the subquery would be
-- compared with itself and only apps in the newest report would match.
SELECT t.app_name, t.memory_current_mb FROM app_telemetry t
WHERE t.customer_id = ? AND t.reported_at = (
    SELECT MAX(t2.reported_at) FROM app_telemetry t2
    WHERE t2.customer_id = ? AND t2.app_name = t.app_name
)

GetAppIssues(appName string, limit int) ([]AppIssue, error)

SELECT id, app_name, fingerprint, severity, message, first_seen, last_seen,
       occurrence_count, affected_customers
FROM app_log_issues WHERE app_name = ?
ORDER BY last_seen DESC LIMIT ?

Parse affected_customers from JSON string to []string in Go.

GetRecentIssuesAllApps(limit int) ([]AppIssue, error)

SELECT id, app_name, fingerprint, severity, message, first_seen, last_seen,
       occurrence_count, affected_customers
FROM app_log_issues ORDER BY last_seen DESC LIMIT ?

PruneAppTelemetry(before time.Time) (int64, error)

DELETE FROM app_telemetry WHERE reported_at < ?

PruneStaleIssues(notSeenSince time.Time) (int64, error)

DELETE FROM app_log_issues WHERE last_seen < ?

Phase 5: Hub — API Handler Changes

File: hub/internal/api/handler.go (MODIFY)

In the report handler (POST /api/v1/report), AFTER store.SaveReport(customerID, body):

// Parse and save app telemetry (backward-compatible — old controllers won't have this field)
var telemetryPayload struct {
    AppTelemetry []store.AppTelemetryRecord `json:"app_telemetry"`
}
if err := json.Unmarshal(body, &telemetryPayload); err == nil && len(telemetryPayload.AppTelemetry) > 0 {
    if err := h.store.SaveAppTelemetry(customerID, time.Now(), telemetryPayload.AppTelemetry); err != nil {
        h.logger.Printf("[WARN] Failed to save app telemetry for %s: %v", customerID, err)
    }
}

This is non-breaking: if app_telemetry is absent or null, the slice will be empty and nothing is stored.
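
The backward-compatibility claim rests on how encoding/json treats missing and null fields. The following sketch demonstrates it with a trimmed-down record type; appTelemetryRecord and parseTelemetry are illustrative names, not part of the hub.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appTelemetryRecord is a reduced stand-in for store.AppTelemetryRecord.
type appTelemetryRecord struct {
	AppName     string  `json:"app_name"`
	MemoryAvgMB float64 `json:"memory_avg_mb"`
}

type telemetryPayload struct {
	AppTelemetry []appTelemetryRecord `json:"app_telemetry"`
}

// parseTelemetry returns the records found in body. It returns nil when the
// field is absent or null, or when the body is not valid JSON.
func parseTelemetry(body []byte) []appTelemetryRecord {
	var p telemetryPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil
	}
	return p.AppTelemetry
}

func main() {
	oldReport := []byte(`{"hostname":"demo"}`)
	newReport := []byte(`{"hostname":"demo","app_telemetry":[{"app_name":"wiki","memory_avg_mb":120.5}]}`)
	fmt.Println(len(parseTelemetry(oldReport)), len(parseTelemetry(newReport)))
}
```

Unknown fields in the report body are ignored by json.Unmarshal by default, which is why the same body can be decoded into this narrow struct after SaveReport has stored it.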


Phase 6: Hub — Prune Updates

File: hub/cmd/hub/main.go (MODIFY)

In the prune goroutine (find where store.Prune(maxDays) is called), add after it:

if n, err := st.PruneAppTelemetry(time.Now().Add(-90 * 24 * time.Hour)); err != nil {
    logger.Printf("[ERROR] Prune app telemetry: %v", err)
} else if n > 0 {
    logger.Printf("[INFO] Pruned %d old app telemetry rows", n)
}
if n, err := st.PruneStaleIssues(time.Now().Add(-30 * 24 * time.Hour)); err != nil {
    logger.Printf("[ERROR] Prune stale issues: %v", err)
} else if n > 0 {
    logger.Printf("[INFO] Pruned %d stale app issues", n)
}

Phase 7: Hub — Web Dashboard Pages

7a. Add Chart.js to hub

Copy controller/internal/web/static/chart.min.js to hub/internal/web/static/chart.min.js.

Modify hub/internal/web/embed.go. The file already contains //go:embed templates/* and var templateFS embed.FS; keep those and add the chart.min.js directive below them:

package web

import "embed"

//go:embed templates/*
var templateFS embed.FS

//go:embed static/chart.min.js
var chartJS []byte

Add route in server.go ServeHTTP:

case path == "/static/chart.min.js":
    w.Header().Set("Content-Type", "application/javascript")
    w.Header().Set("Cache-Control", "public, max-age=86400")
    w.Write(chartJS)

7b. Add routes in server.go ServeHTTP

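Add BEFORE the case strings.HasPrefix(path, "/customers/") block. Keep the exact /apps and /apps/ matches ahead of the /apps/ prefix match, otherwise the prefix case would handle the list page with an empty app name: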

case path == "/apps" || path == "/apps/":
    s.handleApps(w, r)
case strings.HasPrefix(path, "/apps/"):
    appName := strings.TrimPrefix(path, "/apps/")
    s.handleAppDetail(w, r, appName)

7c. Navigation update

In hub/internal/web/templates/dashboard.html (or wherever the header nav is defined), add between "Dashboard" and "Customers" links:

<a href="/apps" class="nav-link">Alkalmazások</a>

If the nav is in a shared layout/header included in all templates, update it there. If each template has its own nav block, update all of them.

7d. Handler: handleApps

File: hub/internal/web/server.go (or a new hub/internal/web/apps.go for cleaner organization)

func (s *Server) handleApps(w http.ResponseWriter, r *http.Request) {
    // Parse time range from query param: ?period=24h|7d|30d (default 7d)
    period := r.URL.Query().Get("period")
    since := parsePeriod(period, 7*24*time.Hour) // helper: returns time.Now().Add(-duration)

    // Sort param: ?sort=memory|deployments|errors&order=asc|desc
    sortBy := r.URL.Query().Get("sort")
    order := r.URL.Query().Get("order")

    summary, err := s.store.GetFleetAppSummary(since)
    if err != nil {
        // Assumption: a plain 500; match the hub's existing error handling if it differs.
        http.Error(w, "failed to load app telemetry", http.StatusInternalServerError)
        return
    }

    // Sort in Go based on query params (default: by deployment count DESC)
    sortFleetSummary(summary, sortBy, order)

    // Summary cards
    totalApps := len(summary)
    totalDeployments := 0
    appsWithErrors := 0
    for _, app := range summary { // "app" rather than "s", to avoid shadowing the receiver
        totalDeployments += app.DeploymentCount
        if app.TotalErrors > 0 {
            appsWithErrors++
        }
    }

    data := map[string]interface{}{
        "Apps": summary, "Period": period,
        "TotalApps": totalApps, "TotalDeployments": totalDeployments,
        "AppsWithErrors": appsWithErrors,
        "Sort": sortBy, "Order": order,
        "CSRFToken": csrfToken, // not defined here: obtain it the way the existing handlers do
    }
    s.templates.ExecuteTemplate(w, "apps.html", data)
}

Add helper parsePeriod(s string, defaultDur time.Duration) time.Time:

func parsePeriod(s string, defaultDur time.Duration) time.Time {
    switch s {
    case "24h": return time.Now().Add(-24 * time.Hour)
    case "7d": return time.Now().Add(-7 * 24 * time.Hour)
    case "30d": return time.Now().Add(-30 * 24 * time.Hour)
    default: return time.Now().Add(-defaultDur)
    }
}
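
handleApps also calls sortFleetSummary, which this plan does not define. One possible shape is sketched below, using a reduced stand-in for store.FleetAppSummary; the fallback to deployment count for unknown sort values is an assumption that doubles as input validation for the query parameter.

```go
package main

import (
	"fmt"
	"sort"
)

// FleetAppSummary is a reduced stand-in for store.FleetAppSummary.
type FleetAppSummary struct {
	AppName         string
	DeploymentCount int
	AvgMemoryMB     float64
	TotalErrors     int
}

// sortFleetSummary sorts apps in place. sortBy is "memory", "errors" or
// anything else (deployments); order is "asc" or anything else (descending).
func sortFleetSummary(apps []FleetAppSummary, sortBy, order string) {
	var less func(a, b FleetAppSummary) bool
	switch sortBy {
	case "memory":
		less = func(a, b FleetAppSummary) bool { return a.AvgMemoryMB < b.AvgMemoryMB }
	case "errors":
		less = func(a, b FleetAppSummary) bool { return a.TotalErrors < b.TotalErrors }
	default:
		less = func(a, b FleetAppSummary) bool { return a.DeploymentCount < b.DeploymentCount }
	}
	ascending := order == "asc"
	sort.SliceStable(apps, func(i, j int) bool {
		if ascending {
			return less(apps[i], apps[j])
		}
		return less(apps[j], apps[i])
	})
}

func main() {
	apps := []FleetAppSummary{
		{AppName: "a", DeploymentCount: 1, AvgMemoryMB: 300},
		{AppName: "b", DeploymentCount: 5, AvgMemoryMB: 100},
	}
	sortFleetSummary(apps, "memory", "asc")
	fmt.Println(apps[0].AppName, apps[1].AppName)
}
```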

7e. Handler: handleAppDetail

func (s *Server) handleAppDetail(w http.ResponseWriter, r *http.Request, appName string) {
    period := r.URL.Query().Get("period")
    since := parsePeriod(period, 7*24*time.Hour)

    // Get customer breakdown
    customers, _ := s.store.GetAppCustomerBreakdown(appName, since)

    // Get telemetry history for chart
    history, _ := s.store.GetAppTelemetryHistory(appName, since)

    // Get issues
    issues, _ := s.store.GetAppIssues(appName, 20)

    // Compute fleet summary for this single app (for overview card)
    fleetAll, _ := s.store.GetFleetAppSummary(since)
    var appSummary *store.FleetAppSummary // the type lives in the store package
    for i := range fleetAll {
        if fleetAll[i].AppName == appName {
            appSummary = &fleetAll[i]
            break
        }
    }

    // Suggested mem_limit: P95 * 1.2, rounded up to the next multiple of 32 MB
    var suggestedLimit int
    if appSummary != nil && appSummary.P95MemoryMB > 0 {
        raw := appSummary.P95MemoryMB * 1.2
        // math.Ceil (import "math") before the integer rounding: a plain int(raw)
        // truncates, so 64.5 would yield 64 instead of 96.
        suggestedLimit = ((int(math.Ceil(raw)) + 31) / 32) * 32
    }

    // Prepare chart data (aggregate by time bucket for fleet-wide view)
    // Group history points by reported_at hour, compute avg of avgs and max of peaks
    chartData := aggregateHistoryForChart(history)

    data := map[string]interface{}{
        "AppName": appName, "Summary": appSummary,
        "Customers": customers, "Issues": issues,
        "ChartData": chartData, "SuggestedLimit": suggestedLimit,
        "Period": period, "CSRFToken": csrfToken,
    }
    s.templates.ExecuteTemplate(w, "app_detail.html", data)
}

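aggregateHistoryForChart groups the data points into hourly buckets and returns {Labels []string, AvgMemory []float64, PeakMemory []float64, CatalogLimit []float64} for Chart.js. AppTelemetryPoint has no catalog-limit field, so the CatalogLimit series has to be filled from the app summary's CatalogLimit string (parsed to MB).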

7f. Extend handleCustomerUnified

In hub/internal/web/configs.go where handleCustomerUnified builds its template data, add:

// App telemetry section
appTelemetry, _ := s.store.GetCustomerAppSummary(customerID, time.Now().Add(-7*24*time.Hour))
// Only show section if data exists
data["AppTelemetry"] = appTelemetry
data["HasAppTelemetry"] = len(appTelemetry) > 0

7g. Template: hub/internal/web/templates/apps.html (NEW)

Fleet-wide app list page. Follow existing hub template patterns:

  • Same dark theme, same table styles
  • Header nav with "Alkalmazások" active
  • Summary cards row at top (3 cards: Total apps, Total deployments, Apps with errors)
  • Time range selector buttons (24h / 7d / 30d) — links to ?period=...
  • Main table with columns: App name (link to /apps/{name}), Deployments, Avg Memory, P95 Memory, Catalog Estimate, Catalog Limit, Estimate Accuracy (icon), Errors (badge), Warnings (badge). The error and warning counts are totals over the selected period
  • Sortable column headers (links to ?sort=...&order=...)
  • Estimate accuracy, checked in this order: red dot if P95 > limit, yellow if P95 > 50% of limit, green otherwise
  • All text in Hungarian

7h. Template: hub/internal/web/templates/app_detail.html (NEW)

Per-app detail page with:

  1. Overview card: App name, display name, catalog estimates, deployment count, suggested mem_limit
  2. Memory trend chart: <canvas id="memoryChart">, Chart.js line chart with:
    • Lines: Avg Memory (blue), Peak Memory (red)
    • Dashed horizontal line: Catalog mem_limit (green)
    • Data from chartData (injected via {{ json .ChartData }})
  3. Customer breakdown table: Customer link, Avg Memory, Peak Memory, CPU, Errors, Last Report
  4. Common issues table: Severity badge, Message, Occurrences, Affected Customers, First/Last Seen

Include Chart.js: <script src="/static/chart.min.js"></script>

7i. Template: hub/internal/web/templates/customer_unified.html (MODIFY)

Add new section "Alkalmazás telemetria" (conditionally shown):

{{ if .HasAppTelemetry }}
<div class="section">
  <h2>Alkalmazás telemetria</h2>
  <table class="data-table">
    <thead>
      <tr>
        <th>Alkalmazás</th>
        <th>Memória (jelenlegi)</th>
        <th>Memória (átlag 7d)</th>
        <th>Memória (csúcs 7d)</th>
        <th>Katalógus limit</th>
        <th>Hibák (7d)</th>
        <th>Figyelmeztetések (7d)</th>
      </tr>
    </thead>
    <tbody>
      {{ range .AppTelemetry }}
      <tr>
        <td><a href="/apps/{{ .AppName }}">{{ .DisplayName }}</a></td>
        <td>{{ formatFloat .MemoryCurrentMB }} MB</td>
        <td>{{ formatFloat .MemoryAvgMB }} MB</td>
        <td>{{ formatFloat .MemoryPeakMB }} MB</td>
        <td>{{ .CatalogLimit }}</td>
        <td>{{ if gt .LogErrors 0 }}<span class="badge badge-error">{{ .LogErrors }}</span>{{ else }}0{{ end }}</td>
        <td>{{ if gt .LogWarnings 0 }}<span class="badge badge-warn">{{ .LogWarnings }}</span>{{ else }}0{{ end }}</td>
      </tr>
      {{ end }}
    </tbody>
  </table>
</div>
{{ end }}

Memory cell coloring: Use a CSS class based on current memory as a percentage of catalog_limit. This requires a template function that parses the limit string and compares: add a memoryColor(currentMB float64, limitStr string) string template function that returns one of the class names defined in style.css (mem-ok, mem-warn, mem-danger), and apply it to the memory cells.
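
A possible memoryColor is sketched below. It returns a class name (mem-ok, mem-warn, mem-danger) rather than a raw color. Two things are assumptions, not taken from this plan: the limit format (Docker-style strings such as "512M" or "1G", with a bare number read as bytes) and the thresholds (70% for warn, 90% for danger).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseLimitMB converts strings such as "512M", "1g" or "256mb" to MB.
// Assumed format: a number with an optional k/m/g suffix; no suffix = bytes.
func parseLimitMB(s string) (float64, bool) {
	s = strings.ToLower(strings.TrimSpace(s))
	s = strings.TrimSuffix(s, "b") // "512mb" -> "512m"
	mult := 1.0 / (1024 * 1024)    // bare number: bytes
	switch {
	case strings.HasSuffix(s, "g"):
		mult, s = 1024, strings.TrimSuffix(s, "g")
	case strings.HasSuffix(s, "m"):
		mult, s = 1, strings.TrimSuffix(s, "m")
	case strings.HasSuffix(s, "k"):
		mult, s = 1.0/1024, strings.TrimSuffix(s, "k")
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil || !(v > 0) || math.IsInf(v, 0) {
		return 0, false
	}
	return v * mult, true
}

// memoryColor returns a CSS class for a memory cell, or "" when the limit
// is missing or cannot be parsed. The thresholds are illustrative.
func memoryColor(currentMB float64, limitStr string) string {
	limitMB, ok := parseLimitMB(limitStr)
	if !ok {
		return ""
	}
	switch ratio := currentMB / limitMB; {
	case ratio >= 0.9:
		return "mem-danger"
	case ratio >= 0.7:
		return "mem-warn"
	default:
		return "mem-ok"
	}
}

func main() {
	fmt.Println(memoryColor(100, "512M"), memoryColor(400, "512M"), memoryColor(500, "512M"))
}
```

In the template the function would be used along the lines of class="{{ memoryColor .MemoryCurrentMB .CatalogLimit }}" on the memory cells, once it is registered in the funcMap.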

7j. CSS additions in hub/internal/web/templates/style.css (MODIFY)

Add styles for:

  • .badge, .badge-error (red), .badge-warn (yellow)
  • .summary-cards (flex row of 3 cards)
  • .summary-card (dark card with number + label)
  • .chart-container (responsive canvas wrapper)
  • .period-selector (button group for time ranges)
  • .accuracy-dot (small colored circle for estimate accuracy)
  • Memory cell colors: .mem-ok (green text), .mem-warn (yellow), .mem-danger (red)

Follow existing hub dark theme color variables.


Phase 8: Version Bumps & Changelog

Controller

  • Update version constant to v0.28.0 in the appropriate file (likely cmd/controller/main.go or a version.go file)
  • Add CHANGELOG.md entry

Hub

  • Update version constant to v0.4.0
  • Add CHANGELOG.md entry (if hub has one)

Implementation Checklist

Controller (can be deployed and tested independently)

  • internal/metrics/telemetry.go — GetContainerTelemetry method
  • internal/metrics/logscanner.go — ScanContainerLogs + types
  • internal/report/types.go — Add AppTelemetry field + type definitions
  • internal/report/telemetry.go — buildAppTelemetry + buildAppTelemetrySection
  • internal/report/builder.go — Call buildAppTelemetrySection in BuildReport
  • Test: build, deploy to demo, verify app_telemetry in report JSON

Hub (deploy after controller verified)

  • internal/store/store.go — Tables + types + all store methods
  • internal/api/handler.go — Parse & save telemetry from reports
  • cmd/hub/main.go — Add prune calls
  • internal/web/static/chart.min.js — Copy from controller
  • internal/web/embed.go — Add chart.min.js embed
  • internal/web/server.go — Routes + handlers + chart.js serving + parsePeriod helper
  • internal/web/templates/apps.html — Fleet app list page
  • internal/web/templates/app_detail.html — App detail page with chart
  • internal/web/templates/customer_unified.html — Add telemetry section
  • internal/web/templates/style.css — New styles
  • Navigation update in all templates (add "Alkalmazások" link)
  • Test: deploy hub, verify pages after a few report cycles

Notes for Implementation

  1. Run go build ./... after each phase to catch compile errors early
  2. The report package already imports metrics and stacks — no new import cycles
  3. Hub templates use template.Must(template.New("").Funcs(funcMap).ParseFS(templateFS, "templates/*.html")) — new .html files are automatically picked up
  4. Hub already has embed.go at hub/internal/web/embed.go — add chart.min.js embed directive there. Create hub/internal/web/static/ directory
  5. All hub template functions are defined in server.go New() constructor — add any new functions (like memoryColor) there
  6. CSRF tokens: All POST forms need <input type="hidden" name="_csrf" value="{{ .CSRFToken }}"> — but the apps pages are read-only (GET only), so no CSRF needed
  7. Hub's ServeHTTP uses hasSuffix/hasPrefix routing: put the exact matches (/apps and /apps/) BEFORE the /apps/ prefix match that serves /apps/{name}