# TASK: App Telemetry & Analytics
**Controller:** v0.27.3 → v0.28.0
**Hub:** v0.3.8 → v0.4.0
## Overview
Add per-app (per-stack) memory/CPU telemetry and container log error scanning to the controller's report push cycle, then build fleet-wide analytics dashboard pages in the hub.
---
## Spec Issues Found (corrections already applied in this plan)
1. **Wrong column name in metrics DB**: Spec uses `memory_bytes` — actual column is `mem_usage_mb` (already in MB). No byte→MB conversion needed.
2. **`ts` column is INTEGER (Unix timestamp)**, not datetime. WHERE clauses must compare it against `since.Unix()`.
3. **`metricsStore` already passed to `BuildReport()`** — no main.go wiring change needed for that dependency.
4. **Chart.js is NOT in the hub** — needs to be added (copy from controller's `internal/web/static/chart.min.js`).
5. **Hub nav is header-based**, not sidebar. Add "Alkalmazások" ("Applications") to the header `<nav>`.
6. **`StackInfo` type undefined** in spec — use existing `stacks.Stack` directly (report already imports stacks).
7. **Fingerprinting threshold**: Use 6+ digits instead of 4+ to avoid mangling HTTP status codes (404, 503) and port numbers.
---
## Phase 1: Controller — Metrics Telemetry
### File: `controller/internal/metrics/telemetry.go` (NEW)
Create this new file in the existing `metrics` package.
```go
package metrics

import (
	"time"
)

// ContainerTelemetry holds aggregated resource stats for one container.
type ContainerTelemetry struct {
	ContainerName   string  `json:"container_name"`
	MemoryCurrentMB float64 `json:"memory_current_mb"`
	MemoryAvgMB     float64 `json:"memory_avg_mb"`
	MemoryPeakMB    float64 `json:"memory_peak_mb"`
	CPUAvgPercent   float64 `json:"cpu_avg_percent"`
	SampleCount     int     `json:"sample_count"`
}

// GetContainerTelemetry queries the metrics DB for per-container resource
// summaries since the given time. Returns empty slice (not error) if no data.
func (s *MetricsStore) GetContainerTelemetry(since time.Time) ([]ContainerTelemetry, error) {
	sinceUnix := since.Unix() // ts is stored as a Unix INTEGER

	// Get averages and peaks per container over the window.
	rows, err := s.db.Query(`
		SELECT container_name,
		       AVG(mem_usage_mb),
		       MAX(mem_usage_mb),
		       AVG(cpu_percent),
		       COUNT(*)
		FROM container_metrics
		WHERE ts > ?
		GROUP BY container_name`, sinceUnix)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	// Start from an empty (non-nil) slice so callers never see nil.
	results := []ContainerTelemetry{}
	for rows.Next() {
		var ct ContainerTelemetry
		if err := rows.Scan(&ct.ContainerName, &ct.MemoryAvgMB, &ct.MemoryPeakMB,
			&ct.CPUAvgPercent, &ct.SampleCount); err != nil {
			return nil, err // surface scan failures instead of silently dropping rows
		}
		results = append(results, ct)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// Get current (most recent) memory per container using QueryContainerSummary.
	// A failure here is non-fatal: MemoryCurrentMB simply stays 0.
	if stats, err := s.QueryContainerSummary(); err == nil {
		currentMap := make(map[string]float64, len(stats))
		for _, st := range stats {
			currentMap[st.ContainerName] = st.MemUsageMB
		}
		for i := range results {
			if cur, ok := currentMap[results[i].ContainerName]; ok {
				results[i].MemoryCurrentMB = cur
			}
		}
	}
	return results, nil
}
```
**Key details:**
- Method on existing `*MetricsStore` — no new struct needed
- `ts` column is Unix INTEGER — compare with `since.Unix()`
- `mem_usage_mb` is already in MB — no conversion
- Uses existing `QueryContainerSummary()` for current values (returns latest row per container, ordered by CPU DESC)
- Returns empty slice on no data, not error
---
## Phase 2: Controller — Log Scanner
### File: `controller/internal/metrics/logscanner.go` (NEW)
Create in the `metrics` package (it's data collection, same domain as metrics).
```go
package metrics
import (
	"context"
	"log"
	"os/exec"
	"regexp"
	"sort"
	"strings"
	"time"
	"unicode/utf8"
)
```
**Types:**
```go
type ContainerLogSummary struct {
	ContainerName string     `json:"container_name"`
	ErrorCount    int        `json:"error_count"`
	WarnCount     int        `json:"warn_count"`
	RecentIssues  []LogIssue `json:"recent_issues,omitempty"`
}

type LogIssue struct {
	Severity string    `json:"severity"`
	Message  string    `json:"message"`
	Count    int       `json:"count"`
	LastSeen time.Time `json:"last_seen"`
}
```
**Function: `ScanContainerLogs(containerNames []string, since time.Duration, logger *log.Logger) []ContainerLogSummary`**
Implementation notes:
- Iterate `containerNames` **sequentially** (not parallel — avoid load spikes)
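- For each container, run: `exec.CommandContext(ctx, "docker", "logs", "--since="+since.String(), "--tail=1000", containerName)`. Derive the flag from the `since` parameter instead of hardcoding `15m`; the caller passes `15*time.Minute`, which renders as the Go duration string `15m0s`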
- Context timeout: 10 seconds per container
- Merge stderr into stdout (`cmd.CombinedOutput()`)
- On error: log at DEBUG, skip container, continue
- **Skip non-UTF-8 lines** using `utf8.Valid([]byte(line))`
- **Truncate lines** to 500 chars before matching
- **Pattern matching** — check first 5 space-separated words of each line (case-insensitive):
- Error patterns: `error`, `fatal`, `panic`, `crit`, `oom`, `killed`, `exception`, `traceback`
- Warning patterns: `warn`, `warning`
- **Fingerprinting for deduplication:**
- Strip leading timestamp (regex: `^\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}:\d{2}[.\d]*[Z ]?` and syslog-style `^[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} `)
- Replace UUIDs (`[0-9a-f]{8}-[0-9a-f]{4}-...`) with `<UUID>` first, so the hex and digit rules below cannot split them
- Then replace hex strings of 8+ chars with `<HEX>`
- Finally replace sequences of 6+ digits with `<N>` (NOT 4+, which would mangle HTTP status codes and port numbers)
- Trim whitespace, lowercase for grouping key
- **Group by fingerprint**, keep count + last_seen time
- **Limits:** Max 10 `RecentIssues` per container (sorted by count DESC, then last_seen DESC). Cap total issues across all containers at 50.
- **Total scan warning:** If total scan takes > 5 minutes, log a warning
- Return `[]ContainerLogSummary` (nil-safe — return empty slice)
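The fingerprinting rules above can be sketched as follows. This is a minimal illustration, not the final implementation: the function name `fingerprint` and the exact hex pattern (word-bounded, lowercase after normalization) are assumptions. It applies the UUID rule before the hex and digit rules so that digit runs inside a UUID are not rewritten first.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	// Leading ISO-style timestamp, e.g. "2026-02-23T10:31:19.123Z ".
	isoTS = regexp.MustCompile(`^\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}:\d{2}[.\d]*[Z ]?`)
	// Leading syslog-style timestamp, e.g. "Feb 23 10:31:19 ".
	syslogTS = regexp.MustCompile(`^[A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} `)
	uuidRe   = regexp.MustCompile(`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)
	// ASSUMPTION: a "hex string" is a whole word of 8+ hex characters.
	hexRe    = regexp.MustCompile(`\b[0-9a-f]{8,}\b`)
	digitsRe = regexp.MustCompile(`\d{6,}`)
)

// fingerprint normalizes a log line into a grouping key: timestamps are
// stripped, the text is lowercased, and variable parts become placeholders.
func fingerprint(line string) string {
	s := isoTS.ReplaceAllString(line, "")
	s = syslogTS.ReplaceAllString(s, "")
	s = strings.ToLower(strings.TrimSpace(s))
	s = uuidRe.ReplaceAllString(s, "<UUID>") // most specific rule first
	s = hexRe.ReplaceAllString(s, "<HEX>")
	s = digitsRe.ReplaceAllString(s, "<N>") // 6+ digits only; "404" survives
	return s
}
```

Two lines that differ only in timestamp and request ID map to the same key, while a short number such as an HTTP status code stays part of the key. Note that with this hex pattern a pure digit run of 8 or more characters becomes `<HEX>` rather than `<N>`; either placeholder serves the grouping purpose.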
**The caller (report builder) is responsible for filtering out infrastructure containers before calling this function.** The function doesn't need config access.
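The pattern-matching rule from the implementation notes (truncate to 500 characters, look only at the first 5 words, case-insensitive) could look roughly like this. The name `classify` is hypothetical, and substring matching is an assumption: the notes do not say whether a pattern must match a whole word.

```go
package main

import "strings"

var (
	errorPatterns = []string{"error", "fatal", "panic", "crit", "oom", "killed", "exception", "traceback"}
	warnPatterns  = []string{"warn", "warning"}
)

// classify returns "error", "warn", or "" (no match) for one log line.
// Only the first five whitespace-separated words are inspected.
func classify(line string) string {
	// Truncate by runes so a multi-byte character is never cut in half.
	if r := []rune(line); len(r) > 500 {
		line = string(r[:500])
	}
	words := strings.Fields(strings.ToLower(line))
	if len(words) > 5 {
		words = words[:5]
	}
	head := strings.Join(words, " ")
	// Error patterns win over warning patterns when both appear.
	for _, p := range errorPatterns {
		if strings.Contains(head, p) {
			return "error"
		}
	}
	for _, p := range warnPatterns {
		if strings.Contains(head, p) {
			return "warn"
		}
	}
	return ""
}
```

Substring matching catches forms such as `[ERROR]` or `level=warn`, but it also means `oom` matches "room" and `crit` matches "criteria". If that proves noisy in practice, matching on word boundaries is the obvious tightening.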
---
## Phase 3: Controller — Report Integration
### File: `controller/internal/report/types.go` (MODIFY)
Add these types and the new field to `Report`:
```go
// Add to Report struct:
AppTelemetry []AppTelemetry `json:"app_telemetry,omitempty"`

// New types (add after StacksReport):

// AppTelemetry holds per-app (per-stack) resource and log telemetry.
type AppTelemetry struct {
	AppName         string             `json:"app_name"`
	DisplayName     string             `json:"display_name"`
	Containers      []string           `json:"containers"`
	MemoryCurrentMB float64            `json:"memory_current_mb"`
	MemoryAvgMB     float64            `json:"memory_avg_mb"`
	MemoryPeakMB    float64            `json:"memory_peak_mb"`
	CPUAvgPercent   float64            `json:"cpu_avg_percent"`
	CatalogEstimate string             `json:"catalog_estimate"`
	CatalogLimit    string             `json:"catalog_limit"`
	LogErrors       int                `json:"log_errors"`
	LogWarnings     int                `json:"log_warnings"`
	Issues          []metrics.LogIssue `json:"issues,omitempty"`
}
```
Note: `LogIssue` is defined in the `metrics` package (from logscanner.go). The `report` package already imports `metrics`.
### File: `controller/internal/report/telemetry.go` (NEW)
**Function: `buildAppTelemetry(allStacks []stacks.Stack, telemetry []metrics.ContainerTelemetry, logs []metrics.ContainerLogSummary) []AppTelemetry`**
Private function (lowercase `b`), called from `BuildReport`. Logic:
1. Build lookup maps: `containerName → ContainerTelemetry` and `containerName → ContainerLogSummary`
2. Iterate `allStacks`. Skip stacks where `s.Protected || !s.Deployed`
3. For each stack:
- Collect container names from `s.Containers`
- Sum `MemoryCurrentMB`, `MemoryAvgMB`, `MemoryPeakMB`, `CPUAvgPercent` across all containers in the stack
- Sum `ErrorCount`, `WarnCount` across all containers
- Merge `RecentIssues` from all containers, sort by count DESC, cap at 10
- Get `CatalogEstimate` from `s.Meta.Resources.MemRequest` and `CatalogLimit` from `s.Meta.Resources.MemLimit`
- Get `DisplayName` from `s.Meta.DisplayName`
4. Return slice sorted by `AppName`
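The aggregation steps above can be sketched with simplified stand-in types. Everything named here (`stack`, `containerStats`, `containerLogs`, `aggregateApps`) is illustrative and does not mirror the real `stacks.Stack` or `metrics` types; the catalog and display-name fields are left out for brevity.

```go
package main

import "sort"

// Simplified stand-ins for the real stack, telemetry and log summary types.
type stack struct {
	Name       string
	Protected  bool
	Deployed   bool
	Containers []string
}

type containerStats struct {
	MemCurrentMB, MemAvgMB, MemPeakMB, CPUAvg float64
}

type issue struct {
	Message string
	Count   int
}

type containerLogs struct {
	Errors, Warns int
	Issues        []issue
}

type appTelemetry struct {
	AppName                                   string
	MemCurrentMB, MemAvgMB, MemPeakMB, CPUAvg float64
	LogErrors, LogWarnings                    int
	Issues                                    []issue
}

// aggregateApps sums per-container stats and log counts into one entry per
// deployed, non-protected stack, and returns the entries sorted by name.
func aggregateApps(all []stack, stats map[string]containerStats, logs map[string]containerLogs) []appTelemetry {
	out := []appTelemetry{}
	for _, s := range all {
		if s.Protected || !s.Deployed {
			continue
		}
		app := appTelemetry{AppName: s.Name}
		for _, name := range s.Containers {
			if st, ok := stats[name]; ok {
				app.MemCurrentMB += st.MemCurrentMB
				app.MemAvgMB += st.MemAvgMB
				app.MemPeakMB += st.MemPeakMB
				app.CPUAvg += st.CPUAvg
			}
			if lg, ok := logs[name]; ok {
				app.LogErrors += lg.Errors
				app.LogWarnings += lg.Warns
				app.Issues = append(app.Issues, lg.Issues...)
			}
		}
		// Most frequent issues first, capped at 10 per app.
		sort.SliceStable(app.Issues, func(i, j int) bool {
			return app.Issues[i].Count > app.Issues[j].Count
		})
		if len(app.Issues) > 10 {
			app.Issues = app.Issues[:10]
		}
		out = append(out, app)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].AppName < out[j].AppName })
	return out
}
```

Summing per-container peaks gives an upper bound for the stack, since containers rarely peak at the same moment. The sketch also does not merge identical issues reported by two containers of the same stack; whether that matters depends on how similar their logs are.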
### File: `controller/internal/report/builder.go` (MODIFY)
In `BuildReport()`, add AFTER the stacks section (before the final debug log), approximately at line 151:
```go
// App telemetry (metrics + log scan)
r.AppTelemetry = buildAppTelemetrySection(stackMgr, metricsStore, logger)
```
Create helper function `buildAppTelemetrySection`. It takes no `cfg` parameter: protection status comes from `s.Protected`, which the manager already sets on each stack during `ScanStacks()`, so `cfg.IsProtectedStack()` is never called here.
```go
func buildAppTelemetrySection(stackMgr *stacks.Manager, metricsStore *metrics.MetricsStore, logger *log.Logger) []AppTelemetry {
	allStacks := stackMgr.GetStacks()

	// 1. Get metrics telemetry (last 15 minutes)
	var telemetry []metrics.ContainerTelemetry
	if metricsStore != nil {
		var err error
		telemetry, err = metricsStore.GetContainerTelemetry(time.Now().Add(-15 * time.Minute))
		if err != nil && logger != nil {
			logger.Printf("[WARN] Telemetry metrics query failed: %v", err)
		}
	}

	// 2. Collect non-protected container names for log scan
	var containerNames []string
	for _, s := range allStacks {
		if s.Protected || !s.Deployed {
			continue
		}
		for _, c := range s.Containers {
			containerNames = append(containerNames, c.Name)
		}
	}

	// 3. Scan logs
	logs := metrics.ScanContainerLogs(containerNames, 15*time.Minute, logger)

	// 4. Build per-app telemetry
	return buildAppTelemetry(allStacks, telemetry, logs)
}
```
**Key: uses `s.Protected` (from config's protected list) to skip infrastructure containers**, so no container names are hardcoded.
### File: `controller/cmd/controller/main.go` (NO CHANGE)
**No changes to main.go needed.** `metricsStore` is already passed to `BuildReport()`, and `BuildReport` already receives `cfg *config.Config` (line 25), so its signature stays as it is. The telemetry builder needs no config dependency of its own because it reads `s.Protected` from the stacks returned by `stackMgr.GetStacks()`.
---
## Phase 4: Hub — Store Changes
### File: `hub/internal/store/store.go` (MODIFY)
#### 4a. Add table creation in `migrate()` function
Add after the existing table creation statements:
```sql
CREATE TABLE IF NOT EXISTS app_telemetry (
id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id TEXT NOT NULL,
app_name TEXT NOT NULL,
display_name TEXT NOT NULL DEFAULT '',
reported_at DATETIME NOT NULL,
memory_current_mb REAL DEFAULT 0,
memory_avg_mb REAL DEFAULT 0,
memory_peak_mb REAL DEFAULT 0,
cpu_avg_percent REAL DEFAULT 0,
catalog_estimate TEXT DEFAULT '',
catalog_limit TEXT DEFAULT '',
log_errors INTEGER DEFAULT 0,
log_warnings INTEGER DEFAULT 0,
containers_json TEXT DEFAULT '[]',
issues_json TEXT DEFAULT '[]'
);
CREATE INDEX IF NOT EXISTS idx_app_telemetry_lookup
ON app_telemetry(app_name, reported_at);
CREATE INDEX IF NOT EXISTS idx_app_telemetry_customer
ON app_telemetry(customer_id, app_name, reported_at);
CREATE INDEX IF NOT EXISTS idx_app_telemetry_prune
ON app_telemetry(reported_at);
CREATE TABLE IF NOT EXISTS app_log_issues (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_name TEXT NOT NULL,
fingerprint TEXT NOT NULL,
severity TEXT NOT NULL,
message TEXT NOT NULL,
first_seen DATETIME NOT NULL,
last_seen DATETIME NOT NULL,
occurrence_count INTEGER DEFAULT 1,
affected_customers TEXT DEFAULT '[]',
UNIQUE(app_name, fingerprint)
);
CREATE INDEX IF NOT EXISTS idx_app_log_issues_app
ON app_log_issues(app_name, last_seen DESC);
```
#### 4b. Add types (top of file or separate types section)
```go
type AppTelemetryRecord struct {
	AppName         string   `json:"app_name"`
	DisplayName     string   `json:"display_name"`
	Containers      []string `json:"containers"`
	MemoryCurrentMB float64  `json:"memory_current_mb"`
	MemoryAvgMB     float64  `json:"memory_avg_mb"`
	MemoryPeakMB    float64  `json:"memory_peak_mb"`
	CPUAvgPercent   float64  `json:"cpu_avg_percent"`
	CatalogEstimate string   `json:"catalog_estimate"`
	CatalogLimit    string   `json:"catalog_limit"`
	LogErrors       int      `json:"log_errors"`
	LogWarnings     int      `json:"log_warnings"`
	Issues          []struct {
		Severity string    `json:"severity"`
		Message  string    `json:"message"`
		Count    int       `json:"count"`
		LastSeen time.Time `json:"last_seen"`
	} `json:"issues,omitempty"`
}

type FleetAppSummary struct {
	AppName         string
	DisplayName     string
	DeploymentCount int
	AvgMemoryMB     float64
	PeakMemoryMB    float64 // max across fleet
	P95MemoryMB     float64
	AvgCPU          float64
	TotalErrors     int
	TotalWarnings   int
	CatalogEstimate string
	CatalogLimit    string
}

type AppTelemetryPoint struct {
	ReportedAt    time.Time
	CustomerID    string
	MemoryAvgMB   float64
	MemoryPeakMB  float64
	CPUAvgPercent float64
	LogErrors     int
	LogWarnings   int
}

type AppCustomerStats struct {
	CustomerID   string
	AvgMemoryMB  float64
	PeakMemoryMB float64
	AvgCPU       float64
	TotalErrors  int
	LastReport   time.Time
}

type CustomerAppSummary struct {
	AppName         string
	DisplayName     string
	MemoryCurrentMB float64
	MemoryAvgMB     float64
	MemoryPeakMB    float64
	CatalogLimit    string
	LogErrors       int
	LogWarnings     int
}

type AppIssue struct {
	ID                int
	AppName           string
	Fingerprint       string
	Severity          string
	Message           string
	FirstSeen         time.Time
	LastSeen          time.Time
	OccurrenceCount   int
	AffectedCustomers []string
}
```
#### 4c. Add store methods
**`SaveAppTelemetry(customerID string, reportedAt time.Time, records []AppTelemetryRecord) error`**
- Insert each record into `app_telemetry` table
- Serialize `Containers` to JSON for `containers_json`
- Serialize `Issues` to JSON for `issues_json`
- Use a transaction for batch insert
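- For each record with non-empty issues, call `upsertAppIssue` for each issue. The report payload carries no fingerprint field, so derive the `fingerprint` argument from the issue's `Message` before the upsert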
**`upsertAppIssue(appName, fingerprint, severity, message, customerID string, lastSeen time.Time) error`**
- Private helper
- Use `INSERT INTO app_log_issues ... ON CONFLICT(app_name, fingerprint) DO UPDATE SET last_seen=MAX(last_seen, excluded.last_seen), occurrence_count=occurrence_count+1, ...`
- For `affected_customers`: parse existing JSON array, add customerID if not present, re-serialize
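The `affected_customers` update described above is a read-modify-write on a JSON array. A minimal sketch of that step, with the hypothetical helper name `addAffectedCustomer`:

```go
package main

import "encoding/json"

// addAffectedCustomer parses the stored JSON array, appends customerID if it
// is not already present, and returns the re-serialized array.
func addAffectedCustomer(existingJSON, customerID string) (string, error) {
	customers := []string{}
	if existingJSON != "" {
		if err := json.Unmarshal([]byte(existingJSON), &customers); err != nil {
			return "", err
		}
	}
	for _, c := range customers {
		if c == customerID {
			return existingJSON, nil // already recorded, nothing to change
		}
	}
	customers = append(customers, customerID)
	out, err := json.Marshal(customers)
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

Because the value is read and written back, the read and the upsert should run inside the same transaction that `SaveAppTelemetry` opens; otherwise two reports arriving together could overwrite each other's additions.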
**`GetFleetAppSummary(since time.Time) ([]FleetAppSummary, error)`**
```sql
SELECT app_name,
MAX(display_name) as display_name,
COUNT(DISTINCT customer_id) as deployment_count,
AVG(memory_avg_mb) as avg_memory_mb,
MAX(memory_peak_mb) as peak_memory_mb,
AVG(cpu_avg_percent) as avg_cpu,
SUM(log_errors) as total_errors,
SUM(log_warnings) as total_warnings,
MAX(catalog_estimate) as catalog_estimate,
MAX(catalog_limit) as catalog_limit
FROM app_telemetry
WHERE reported_at > ?
GROUP BY app_name
ORDER BY deployment_count DESC, avg_memory_mb DESC
```
For P95 memory: separate query per app (can be batched in Go):
```sql
SELECT memory_peak_mb FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
ORDER BY memory_peak_mb ASC
LIMIT 1 OFFSET (
SELECT CAST(COUNT(*) * 0.95 AS INTEGER)
FROM app_telemetry WHERE app_name = ? AND reported_at > ?
)
```
**`GetAppTelemetryHistory(appName string, since time.Time) ([]AppTelemetryPoint, error)`**
```sql
SELECT reported_at, customer_id, memory_avg_mb, memory_peak_mb, cpu_avg_percent, log_errors, log_warnings
FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
ORDER BY reported_at ASC
```
**`GetAppCustomerBreakdown(appName string, since time.Time) ([]AppCustomerStats, error)`**
```sql
SELECT customer_id, AVG(memory_avg_mb), MAX(memory_peak_mb), AVG(cpu_avg_percent),
SUM(log_errors), MAX(reported_at)
FROM app_telemetry
WHERE app_name = ? AND reported_at > ?
GROUP BY customer_id
ORDER BY AVG(memory_avg_mb) DESC
```
**`GetCustomerAppSummary(customerID string, since time.Time) ([]CustomerAppSummary, error)`**
A single statement that self-joins `app_telemetry` (latest row per app joined to the aggregate over the window) is possible but hard to read. Use two queries and merge the results in Go instead; it is simpler and more readable.
First, the averages and peaks over the window:
```sql
-- Get 7d averages/peaks
SELECT app_name, MAX(display_name), AVG(memory_avg_mb), MAX(memory_peak_mb),
MAX(catalog_limit), SUM(log_errors), SUM(log_warnings)
FROM app_telemetry WHERE customer_id = ? AND reported_at > ?
GROUP BY app_name ORDER BY AVG(memory_avg_mb) DESC
```
Then the current memory from the latest record per app. The outer table needs an alias: without one, `app_telemetry.app_name` inside the subquery resolves to the inner table and the comparison is always true.
```sql
SELECT t.app_name, t.memory_current_mb FROM app_telemetry t
WHERE t.customer_id = ? AND t.reported_at = (
SELECT MAX(reported_at) FROM app_telemetry
WHERE customer_id = ? AND app_name = t.app_name
)
```
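The merge step in Go is small. A sketch, assuming the first query has been scanned into a slice and the second into a map keyed by app name (`customerAppSummary` is a trimmed stand-in for `CustomerAppSummary`, and `mergeCurrent` is a hypothetical name):

```go
package main

// customerAppSummary is a trimmed stand-in for the store's CustomerAppSummary.
type customerAppSummary struct {
	AppName         string
	MemoryCurrentMB float64
	MemoryAvgMB     float64
	MemoryPeakMB    float64
}

// mergeCurrent copies the latest memory reading into each summary row.
// Apps without a current reading keep MemoryCurrentMB at 0.
func mergeCurrent(summaries []customerAppSummary, current map[string]float64) {
	for i := range summaries {
		if v, ok := current[summaries[i].AppName]; ok {
			summaries[i].MemoryCurrentMB = v
		}
	}
}
```

The slice keeps the ordering from the aggregate query (average memory, descending), so no re-sort is needed after the merge.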
**`GetAppIssues(appName string, limit int) ([]AppIssue, error)`**
```sql
SELECT id, app_name, fingerprint, severity, message, first_seen, last_seen,
occurrence_count, affected_customers
FROM app_log_issues WHERE app_name = ?
ORDER BY last_seen DESC LIMIT ?
```
Parse `affected_customers` from JSON string to `[]string` in Go.
**`GetRecentIssuesAllApps(limit int) ([]AppIssue, error)`**
```sql
SELECT id, app_name, fingerprint, severity, message, first_seen, last_seen,
occurrence_count, affected_customers
FROM app_log_issues ORDER BY last_seen DESC LIMIT ?
```
**`PruneAppTelemetry(before time.Time) (int64, error)`**
```sql
DELETE FROM app_telemetry WHERE reported_at < ?
```
**`PruneStaleIssues(notSeenSince time.Time) (int64, error)`**
```sql
DELETE FROM app_log_issues WHERE last_seen < ?
```
---
## Phase 5: Hub — API Handler Changes
### File: `hub/internal/api/handler.go` (MODIFY)
In the report handler (`POST /api/v1/report`), AFTER `store.SaveReport(customerID, body)`:
```go
// Parse and save app telemetry (backward-compatible: old controllers won't have this field)
var telemetryPayload struct {
	AppTelemetry []store.AppTelemetryRecord `json:"app_telemetry"`
}
if err := json.Unmarshal(body, &telemetryPayload); err == nil && len(telemetryPayload.AppTelemetry) > 0 {
	if err := h.store.SaveAppTelemetry(customerID, time.Now(), telemetryPayload.AppTelemetry); err != nil {
		h.logger.Printf("[WARN] Failed to save app telemetry for %s: %v", customerID, err)
	}
}
```
This is non-breaking: if `app_telemetry` is absent or null, the slice will be empty and nothing is stored.
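The backward-compatibility claim rests on how `encoding/json` treats missing and `null` fields: both leave the slice nil, so `len(...) > 0` is false. A small self-contained check, with `appRecord` and `extractTelemetry` as illustrative names:

```go
package main

import "encoding/json"

// appRecord is a trimmed stand-in for store.AppTelemetryRecord.
type appRecord struct {
	AppName string `json:"app_name"`
}

// extractTelemetry pulls only the app_telemetry field out of a report body.
// Unknown fields in the body are ignored by encoding/json.
func extractTelemetry(body []byte) ([]appRecord, error) {
	var payload struct {
		AppTelemetry []appRecord `json:"app_telemetry"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	return payload.AppTelemetry, nil
}
```

A report from an older controller (no `app_telemetry` key) and a report with `"app_telemetry": null` both yield an empty result without an error, so the handler skips the save in either case.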
---
## Phase 6: Hub — Prune Updates
### File: `hub/cmd/hub/main.go` (MODIFY)
In the prune goroutine (find where `store.Prune(maxDays)` is called), add after it:
```go
if n, err := st.PruneAppTelemetry(time.Now().Add(-90 * 24 * time.Hour)); err != nil {
	logger.Printf("[ERROR] Prune app telemetry: %v", err)
} else if n > 0 {
	logger.Printf("[INFO] Pruned %d old app telemetry rows", n)
}
if n, err := st.PruneStaleIssues(time.Now().Add(-30 * 24 * time.Hour)); err != nil {
	logger.Printf("[ERROR] Prune stale issues: %v", err)
} else if n > 0 {
	logger.Printf("[INFO] Pruned %d stale app issues", n)
}
```
---
## Phase 7: Hub — Web Dashboard Pages
### 7a. Add Chart.js to hub
**Copy** `controller/internal/web/static/chart.min.js` to `hub/internal/web/static/chart.min.js`.
**Modify** `hub/internal/web/embed.go`. The hub already has this file with `//go:embed templates/*` and `var templateFS embed.FS`; leave that directive as it is and add the chart.min.js embed alongside it:
```go
package web

import "embed"

//go:embed templates/*
var templateFS embed.FS

//go:embed static/chart.min.js
var chartJS []byte
```
**Add route** in `server.go` ServeHTTP:
```go
case path == "/static/chart.min.js":
	w.Header().Set("Content-Type", "application/javascript")
	w.Header().Set("Cache-Control", "public, max-age=86400")
	w.Write(chartJS)
```
### 7b. Add routes in `server.go` ServeHTTP
Add BEFORE the `case strings.HasPrefix(path, "/customers/")` block (to avoid the prefix match catching `/apps`):
```go
case path == "/apps" || path == "/apps/":
	s.handleApps(w, r)
case strings.HasPrefix(path, "/apps/"):
	appName := strings.TrimPrefix(path, "/apps/")
	s.handleAppDetail(w, r, appName)
```
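Case order matters in this switch because Go evaluates cases top to bottom and takes the first match. The sketch below isolates the two cases in a hypothetical `routeApps` function that returns the handler name instead of calling it, which makes the behaviour easy to check:

```go
package main

import "strings"

// routeApps reports which handler a path would reach and, for detail pages,
// the app name extracted from the path. An empty handler means "not ours".
func routeApps(path string) (handler, appName string) {
	switch {
	case path == "/apps" || path == "/apps/":
		// Exact matches first: "/apps/" also has the "/apps/" prefix.
		return "handleApps", ""
	case strings.HasPrefix(path, "/apps/"):
		return "handleAppDetail", strings.TrimPrefix(path, "/apps/")
	default:
		return "", ""
	}
}
```

If the two cases were swapped, `/apps/` would reach the detail handler with an empty app name.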
### 7c. Navigation update
In **`hub/internal/web/templates/dashboard.html`** (or wherever the header nav is defined), add between "Dashboard" and "Customers" links:
```html
<a href="/apps" class="nav-link">Alkalmazások</a>
```
If the nav is in a shared layout/header included in all templates, update it there. If each template has its own nav block, update all of them.
### 7d. Handler: `handleApps`
**File: `hub/internal/web/server.go`** (or a new `hub/internal/web/apps.go` for cleaner organization)
```go
func (s *Server) handleApps(w http.ResponseWriter, r *http.Request) {
	// Parse time range from query param: ?period=24h|7d|30d (default 7d)
	period := r.URL.Query().Get("period")
	since := parsePeriod(period, 7*24*time.Hour) // helper: returns time.Now().Add(-duration)

	// Sort param: ?sort=memory|deployments|errors&order=asc|desc
	sortBy := r.URL.Query().Get("sort")
	order := r.URL.Query().Get("order")

	summary, err := s.store.GetFleetAppSummary(since)
	if err != nil {
		// ... (respond the same way the hub's existing handlers do)
		return
	}

	// Sort in Go based on query params (default: by deployment count DESC)
	sortFleetSummary(summary, sortBy, order)

	// Summary cards
	totalApps := len(summary)
	totalDeployments := 0
	appsWithErrors := 0
	for _, app := range summary { // "app", not "s", to avoid shadowing the receiver
		totalDeployments += app.DeploymentCount
		if app.TotalErrors > 0 {
			appsWithErrors++
		}
	}

	data := map[string]interface{}{
		"Apps": summary, "Period": period,
		"TotalApps": totalApps, "TotalDeployments": totalDeployments,
		"AppsWithErrors": appsWithErrors,
		"Sort": sortBy, "Order": order,
		"CSRFToken": csrfToken, // obtain csrfToken as the existing handlers do
	}
	s.templates.ExecuteTemplate(w, "apps.html", data)
}
```
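The handler calls `sortFleetSummary`, which the plan does not spell out. One possible shape, assuming unknown or empty parameters fall back to deployment count descending (`fleetApp` is a trimmed stand-in for `FleetAppSummary`):

```go
package main

import "sort"

// fleetApp is a trimmed stand-in for store.FleetAppSummary.
type fleetApp struct {
	AppName         string
	DeploymentCount int
	AvgMemoryMB     float64
	TotalErrors     int
}

// sortFleetSummary sorts in place. sortBy is "memory", "errors" or anything
// else for deployments; order is "asc" or anything else for descending.
func sortFleetSummary(apps []fleetApp, sortBy, order string) {
	var less func(a, b fleetApp) bool
	switch sortBy {
	case "memory":
		less = func(a, b fleetApp) bool { return a.AvgMemoryMB < b.AvgMemoryMB }
	case "errors":
		less = func(a, b fleetApp) bool { return a.TotalErrors < b.TotalErrors }
	default:
		less = func(a, b fleetApp) bool { return a.DeploymentCount < b.DeploymentCount }
	}
	asc := order == "asc"
	// Stable sort keeps the SQL ordering as the tie-breaker.
	sort.SliceStable(apps, func(i, j int) bool {
		if asc {
			return less(apps[i], apps[j])
		}
		return less(apps[j], apps[i])
	})
}
```

Treating every unrecognized value as the default means a hand-edited URL such as `?sort=foo` still renders the page instead of failing.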
Add helper `parsePeriod(s string, defaultDur time.Duration) time.Time`:
```go
func parsePeriod(s string, defaultDur time.Duration) time.Time {
	switch s {
	case "24h":
		return time.Now().Add(-24 * time.Hour)
	case "7d":
		return time.Now().Add(-7 * 24 * time.Hour)
	case "30d":
		return time.Now().Add(-30 * 24 * time.Hour)
	default:
		return time.Now().Add(-defaultDur)
	}
}
```
### 7e. Handler: `handleAppDetail`
```go
func (s *Server) handleAppDetail(w http.ResponseWriter, r *http.Request, appName string) {
	period := r.URL.Query().Get("period")
	since := parsePeriod(period, 7*24*time.Hour)

	// Query errors are ignored here, so a failed query renders an empty section.
	// Get customer breakdown
	customers, _ := s.store.GetAppCustomerBreakdown(appName, since)
	// Get telemetry history for chart
	history, _ := s.store.GetAppTelemetryHistory(appName, since)
	// Get issues
	issues, _ := s.store.GetAppIssues(appName, 20)

	// Compute fleet summary for this single app (for overview card)
	fleetAll, _ := s.store.GetFleetAppSummary(since)
	var appSummary *store.FleetAppSummary // type lives in the store package
	for i := range fleetAll {
		if fleetAll[i].AppName == appName {
			appSummary = &fleetAll[i]
			break
		}
	}

	// Suggested mem_limit: ceil(P95 * 1.2), rounded up to the next multiple of 32 MB
	var suggestedLimit int
	if appSummary != nil && appSummary.P95MemoryMB > 0 {
		raw := int(math.Ceil(appSummary.P95MemoryMB * 1.2)) // needs "math"; int(x) alone would truncate
		suggestedLimit = ((raw + 31) / 32) * 32
	}

	// Prepare chart data (aggregate by time bucket for fleet-wide view)
	// Group history points by reported_at hour, compute avg of avgs and max of peaks
	chartData := aggregateHistoryForChart(history)

	data := map[string]interface{}{
		"AppName": appName, "Summary": appSummary,
		"Customers": customers, "Issues": issues,
		"ChartData": chartData, "SuggestedLimit": suggestedLimit,
		"Period": period, "CSRFToken": csrfToken, // obtain csrfToken as the existing handlers do
	}
	s.templates.ExecuteTemplate(w, "app_detail.html", data)
}
```
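The suggested-limit arithmetic is easy to get subtly wrong: converting with `int(x)` truncates, so the fractional part has to be rounded up before snapping to a multiple of 32. A standalone sketch with the hypothetical name `suggestedLimitMB`:

```go
package main

import "math"

// suggestedLimitMB returns ceil(p95MB * 1.2) rounded up to the next multiple
// of 32, or 0 when there is no P95 value to base a suggestion on.
func suggestedLimitMB(p95MB float64) int {
	if p95MB <= 0 {
		return 0
	}
	raw := int(math.Ceil(p95MB * 1.2)) // 20% headroom, rounded up to whole MB
	return ((raw + 31) / 32) * 32      // integer round-up to a multiple of 32
}
```

For a P95 of 53.5 MB the headroom value is 64.2 MB. Truncating would give 64 and leave the suggestion at 64 MB, below the target; rounding up first gives 65 and therefore 96 MB.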
`aggregateHistoryForChart` groups data points into hourly buckets, returns `{Labels []string, AvgMemory []float64, PeakMemory []float64, CatalogLimit []float64}` for Chart.js.
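A possible shape for the hourly bucketing, using trimmed stand-in types (`historyPoint`, `chartSeries`) and omitting the `CatalogLimit` series. The label format and the use of UTC for bucket boundaries are assumptions; `sampleHistory` exists only to illustrate the input.

```go
package main

import (
	"sort"
	"time"
)

type historyPoint struct {
	ReportedAt   time.Time
	MemoryAvgMB  float64
	MemoryPeakMB float64
}

type chartSeries struct {
	Labels     []string
	AvgMemory  []float64
	PeakMemory []float64
}

// aggregateHourly groups points by hour, averaging the averages and taking
// the maximum of the peaks within each bucket.
func aggregateHourly(history []historyPoint) chartSeries {
	type bucket struct {
		sum  float64
		n    int
		peak float64
	}
	buckets := map[int64]*bucket{} // key: Unix time of the hour start
	for _, p := range history {
		key := p.ReportedAt.Truncate(time.Hour).Unix()
		b := buckets[key]
		if b == nil {
			b = &bucket{}
			buckets[key] = b
		}
		b.sum += p.MemoryAvgMB
		b.n++
		if p.MemoryPeakMB > b.peak {
			b.peak = p.MemoryPeakMB
		}
	}
	keys := make([]int64, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	out := chartSeries{Labels: []string{}, AvgMemory: []float64{}, PeakMemory: []float64{}}
	for _, k := range keys {
		b := buckets[k]
		out.Labels = append(out.Labels, time.Unix(k, 0).UTC().Format("2006-01-02 15:04"))
		out.AvgMemory = append(out.AvgMemory, b.sum/float64(b.n))
		out.PeakMemory = append(out.PeakMemory, b.peak)
	}
	return out
}

// sampleHistory returns three illustrative points spanning two hours.
func sampleHistory() []historyPoint {
	day := func(h, m int) time.Time { return time.Date(2026, 2, 23, h, m, 0, 0, time.UTC) }
	return []historyPoint{
		{ReportedAt: day(10, 5), MemoryAvgMB: 100, MemoryPeakMB: 150},
		{ReportedAt: day(10, 40), MemoryAvgMB: 200, MemoryPeakMB: 180},
		{ReportedAt: day(11, 10), MemoryAvgMB: 50, MemoryPeakMB: 60},
	}
}
```

The three series are parallel slices initialized to empty (not nil) values, so they serialize as `[]` rather than `null` when there is no history, which keeps the Chart.js side free of null checks.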
### 7f. Extend `handleCustomerUnified`
In **`hub/internal/web/configs.go`** where `handleCustomerUnified` builds its template data, add:
```go
// App telemetry section
appTelemetry, _ := s.store.GetCustomerAppSummary(customerID, time.Now().Add(-7*24*time.Hour))
// Only show section if data exists
data["AppTelemetry"] = appTelemetry
data["HasAppTelemetry"] = len(appTelemetry) > 0
```
### 7g. Template: `hub/internal/web/templates/apps.html` (NEW)
Fleet-wide app list page. Follow existing hub template patterns:
- Same dark theme, same table styles
- Header nav with "Alkalmazások" active
- Summary cards row at top (3 cards: Total apps, Total deployments, Apps with errors)
- Time range selector buttons (24h / 7d / 30d) — links to `?period=...`
- Main table with columns: App name (link to /apps/{name}), Deployments, Avg Memory, P95 Memory, Catalog Estimate, Catalog Limit, Estimate Accuracy (icon), Errors (24h badge), Warnings (24h badge)
- Sortable column headers (links to `?sort=...&order=...`)
- Estimate accuracy: green dot if P95 is at or below 50% of limit, yellow if P95 is above 50% of limit but not above the limit, red if P95 > limit
- All text in Hungarian
### 7h. Template: `hub/internal/web/templates/app_detail.html` (NEW)
Per-app detail page with:
1. **Overview card**: App name, display name, catalog estimates, deployment count, suggested mem_limit
2. **Memory trend chart**: `<canvas id="memoryChart">`, Chart.js line chart with:
- Lines: Avg Memory (blue), Peak Memory (red)
- Dashed horizontal line: Catalog mem_limit (green)
- Data from `chartData` (injected via `{{ json .ChartData }}`)
3. **Customer breakdown table**: Customer link, Avg Memory, Peak Memory, CPU, Errors, Last Report
4. **Common issues table**: Severity badge, Message, Occurrences, Affected Customers, First/Last Seen
Include Chart.js: `<script src="/static/chart.min.js"></script>`
### 7i. Template: `hub/internal/web/templates/customer_unified.html` (MODIFY)
Add new section "Alkalmazás telemetria" ("App telemetry"), conditionally shown:
```html
{{ if .HasAppTelemetry }}
<div class="section">
<h2>Alkalmazás telemetria</h2>
<table class="data-table">
<thead>
<tr>
<th>Alkalmazás</th>
<th>Memória (jelenlegi)</th>
<th>Memória (átlag 7d)</th>
<th>Memória (csúcs 7d)</th>
<th>Katalógus limit</th>
<th>Hibák (24ó)</th>
<th>Figyelmeztetések (24ó)</th>
</tr>
</thead>
<tbody>
{{ range .AppTelemetry }}
<tr>
<td><a href="/apps/{{ .AppName }}">{{ .DisplayName }}</a></td>
<td>{{ formatFloat .MemoryCurrentMB }} MB</td>
<td>{{ formatFloat .MemoryAvgMB }} MB</td>
<td>{{ formatFloat .MemoryPeakMB }} MB</td>
<td>{{ .CatalogLimit }}</td>
<td>{{ if gt .LogErrors 0 }}<span class="badge badge-error">{{ .LogErrors }}</span>{{ else }}0{{ end }}</td>
<td>{{ if gt .LogWarnings 0 }}<span class="badge badge-warn">{{ .LogWarnings }}</span>{{ else }}0{{ end }}</td>
</tr>
{{ end }}
</tbody>
</table>
</div>
{{ end }}
```
Memory cell coloring: use a CSS class based on the percentage of catalog_limit. This requires a template function to parse the limit string and compare, so add a `memoryColor(currentMB float64, limitStr string) string` template function that returns one of the classes `mem-ok`, `mem-warn`, or `mem-danger`.
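A sketch of such a template function follows. Two things here are assumptions the plan does not settle: the limit string format (taken to be a number with an `M` or `G` suffix, such as `512M` or `1G`) and the thresholds (above 50% is a warning, above 100% is danger). The helper `parseLimitMB` is hypothetical.

```go
package main

import (
	"strconv"
	"strings"
)

// parseLimitMB converts a limit such as "512M", "256m" or "1G" to megabytes.
// ASSUMPTION: only M and G suffixes occur; anything else is unparseable.
func parseLimitMB(s string) (float64, bool) {
	s = strings.ToUpper(strings.TrimSpace(s))
	mult := 1.0
	switch {
	case strings.HasSuffix(s, "G"):
		mult = 1024
		s = strings.TrimSuffix(s, "G")
	case strings.HasSuffix(s, "M"):
		s = strings.TrimSuffix(s, "M")
	default:
		return 0, false
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil || v <= 0 {
		return 0, false
	}
	return v * mult, true
}

// memoryColor returns a CSS class for a memory cell, or "" when the limit is
// missing or unparseable (the cell then keeps its default colour).
func memoryColor(currentMB float64, limitStr string) string {
	limit, ok := parseLimitMB(limitStr)
	if !ok {
		return ""
	}
	ratio := currentMB / limit
	switch {
	case ratio > 1:
		return "mem-danger"
	case ratio > 0.5: // ASSUMPTION: warning threshold at 50% of the limit
		return "mem-warn"
	default:
		return "mem-ok"
	}
}
```

In the template this would be used as `<td class="{{ memoryColor .MemoryCurrentMB .CatalogLimit }}">`, after registering the function in the `funcMap`.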
### 7j. CSS additions in `hub/internal/web/templates/style.css` (MODIFY)
Add styles for:
- `.badge`, `.badge-error` (red), `.badge-warn` (yellow)
- `.summary-cards` (flex row of 3 cards)
- `.summary-card` (dark card with number + label)
- `.chart-container` (responsive canvas wrapper)
- `.period-selector` (button group for time ranges)
- `.accuracy-dot` (small colored circle for estimate accuracy)
- Memory cell colors: `.mem-ok` (green text), `.mem-warn` (yellow), `.mem-danger` (red)
Follow existing hub dark theme color variables.
---
## Phase 8: Version Bumps & Changelog
### Controller
- Update version constant to `v0.28.0` in the appropriate file (likely `cmd/controller/main.go` or a `version.go` file)
- Add CHANGELOG.md entry
### Hub
- Update version constant to `v0.4.0`
- Add CHANGELOG.md entry (if hub has one)
---
## Implementation Checklist
### Controller (can be deployed and tested independently)
- [ ] `internal/metrics/telemetry.go` — GetContainerTelemetry method
- [ ] `internal/metrics/logscanner.go` — ScanContainerLogs + types
- [ ] `internal/report/types.go` — Add AppTelemetry field + type definitions
- [ ] `internal/report/telemetry.go` — buildAppTelemetry + buildAppTelemetrySection
- [ ] `internal/report/builder.go` — Call buildAppTelemetrySection in BuildReport
- [ ] Test: build, deploy to demo, verify `app_telemetry` in report JSON
### Hub (deploy after controller verified)
- [ ] `internal/store/store.go` — Tables + types + all store methods
- [ ] `internal/api/handler.go` — Parse & save telemetry from reports
- [ ] `cmd/hub/main.go` — Add prune calls
- [ ] `internal/web/static/chart.min.js` — Copy from controller
- [ ] `internal/web/embed.go` — Add chart.min.js embed
- [ ] `internal/web/server.go` — Routes + handlers + chart.js serving + parsePeriod helper
- [ ] `internal/web/templates/apps.html` — Fleet app list page
- [ ] `internal/web/templates/app_detail.html` — App detail page with chart
- [ ] `internal/web/templates/customer_unified.html` — Add telemetry section
- [ ] `internal/web/templates/style.css` — New styles
- [ ] Navigation update in all templates (add "Alkalmazások" link)
- [ ] Test: deploy hub, verify pages after a few report cycles
---
## Notes for Implementation
1. **Run `go build ./...` after each phase** to catch compile errors early
2. **The report package already imports `metrics` and `stacks`** — no new import cycles
3. **Hub templates use `template.Must(template.New("").Funcs(funcMap).ParseFS(templateFS, "templates/*.html"))`** — new .html files are automatically picked up
4. **Hub already has `embed.go`** at `hub/internal/web/embed.go` — add chart.min.js embed directive there. Create `hub/internal/web/static/` directory
5. **All hub template functions** are defined in `server.go` New() constructor — add any new functions (like `memoryColor`) there
6. **CSRF tokens**: All POST forms need `<input type="hidden" name="_csrf" value="{{ .CSRFToken }}">` — but the apps pages are read-only (GET only), so no CSRF needed
7. **Hub's ServeHTTP uses hasSuffix/hasPrefix routing**: put exact matches (`/apps`, `/apps/`) BEFORE prefix matches. The exact cases must precede the `strings.HasPrefix(path, "/apps/")` case; otherwise `/apps/` would reach `handleAppDetail` with an empty app name