TASK: Add Telemetry Debug Section to Controller Debug Page

Scope: Controller only — add a "Telemetria teszt" section to the existing Debug page

Overview

The App Telemetry feature (metrics collection, log scanning, hub reporting, hub dashboard) is fully implemented and deployed. The remaining task is to add a debug section that lets the operator run telemetry collection on-demand and see the results — useful for verifying container→stack mapping, checking metrics DB values, and testing log scanner pattern matching without waiting for the 15-minute report cycle.


What to Implement

1. New debug endpoint: GET /api/debug/telemetry

File: controller/internal/web/handler_debug.go

Add a new route case in handleDebugAPI() (between the existing "Section 5: Hub" and "Section 6: Self-Update" blocks, around line 83):

// Section: Telemetry testing
case subpath == "telemetry" && r.Method == http.MethodGet:
    s.debugTelemetry(w, r)
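
The route case can be tried in isolation before touching the real dispatcher. The following is a minimal sketch, assuming the dispatcher derives subpath by trimming an /api/debug/ prefix; the handler body, the 405 branch and the statusFor helper are hypothetical stand-ins, not the actual handleDebugAPI code.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// handleDebugAPI is a hypothetical stand-in for the real dispatcher.
// It only models the new "telemetry" case and a generic fallback.
func handleDebugAPI(w http.ResponseWriter, r *http.Request) {
	// Assumption: the real code computes subpath in a similar way.
	subpath := strings.TrimPrefix(r.URL.Path, "/api/debug/")
	switch {
	case subpath == "telemetry" && r.Method == http.MethodGet:
		// The real case would call s.debugTelemetry(w, r) here.
		w.WriteHeader(http.StatusOK)
	case subpath == "telemetry":
		// Same path, wrong method: tell the client which method is allowed.
		w.Header().Set("Allow", http.MethodGet)
		w.WriteHeader(http.StatusMethodNotAllowed)
	default:
		http.NotFound(w, r)
	}
}

// statusFor sends one in-memory request through the dispatcher
// and returns the HTTP status code it produced.
func statusFor(method, path string) int {
	rec := httptest.NewRecorder()
	handleDebugAPI(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}
```

Because the match requires both the subpath and the GET method, a POST to the same path falls through to a different branch instead of running the telemetry scan.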

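Add the handler function. A first draft that calls the report package directly looks like this, but it does not compile as written because Server has no metricsStore field (see the problem noted after the code):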

func (s *Server) debugTelemetry(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    telemetry := report.BuildAppTelemetryForDebug(s.stackMgr, s.metricsStore, s.logger)
    latency := time.Since(start).Milliseconds()

    writeDebugJSON(w, http.StatusOK, true, fmt.Sprintf("Telemetria összegyűjtve: %d alkalmazás (%dms)", len(telemetry), latency), map[string]interface{}{
        "latency_ms":    latency,
        "app_count":     len(telemetry),
        "app_telemetry": telemetry,
    })
}

Problem: The Server struct doesn't have metricsStore. There are two approaches:

Approach A (preferred): Add a GetTelemetryPreview callback to DebugCallbacks

This follows the existing pattern for operations needing main.go wiring. The Server struct lacks metricsStore but main.go has it.

In handler_debug.go, add to the DebugCallbacks struct:

type DebugCallbacks struct {
    TriggerHubReportPush   func() error
    TriggerHubInfraPush    func() error
    TriggerLocalInfraWrite func() error
    TriggerSetupMode       func() error
    HubConnectivityTest    func() (statusCode int, latencyMs int64, err error)
    GiteaConnectivityTest  func() (statusCode int, latencyMs int64, err error)
    GetTelemetryPreview    func() ([]report.AppTelemetry, error)  // NEW
}

Then the handler becomes:

func (s *Server) debugTelemetry(w http.ResponseWriter, r *http.Request) {
    if s.debugCallbacks == nil || s.debugCallbacks.GetTelemetryPreview == nil {
        writeDebugJSON(w, http.StatusNotImplemented, false, "Nem bekötött", nil)
        return
    }
    start := time.Now()
    telemetry, err := s.debugCallbacks.GetTelemetryPreview()
    latency := time.Since(start).Milliseconds()
    if err != nil {
        writeDebugJSON(w, http.StatusOK, false, err.Error(), map[string]interface{}{"latency_ms": latency})
        return
    }

    // Compute summary stats
    totalErrors := 0
    totalWarnings := 0
    for _, app := range telemetry {
        totalErrors += app.LogErrors
        totalWarnings += app.LogWarnings
    }

    writeDebugJSON(w, http.StatusOK, true,
        fmt.Sprintf("Telemetria összegyűjtve: %d app, %d hiba, %d figyelmeztetés (%dms)",
            len(telemetry), totalErrors, totalWarnings, latency),
        map[string]interface{}{
            "latency_ms":     latency,
            "app_count":      len(telemetry),
            "total_errors":   totalErrors,
            "total_warnings": totalWarnings,
            "app_telemetry":  telemetry,
        })
}
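
The guard and the summary loop are the two parts of this handler most worth checking on their own. Below is a minimal sketch, assuming a trimmed AppTelemetry with only the fields used for the totals; previewSummary and the English message text are hypothetical and only mirror the handler's control flow.

```go
package main

import (
	"fmt"
	"net/http"
)

// AppTelemetry is a trimmed, hypothetical stand-in for report.AppTelemetry.
type AppTelemetry struct {
	AppName     string
	LogErrors   int
	LogWarnings int
}

// DebugCallbacks mirrors the real struct with only the new field.
type DebugCallbacks struct {
	GetTelemetryPreview func() ([]AppTelemetry, error)
}

// previewSummary returns the HTTP status and message the handler would send.
func previewSummary(cb *DebugCallbacks) (int, string) {
	// Guard both the struct and the field: either may be unset
	// when main.go has not wired the callback.
	if cb == nil || cb.GetTelemetryPreview == nil {
		return http.StatusNotImplemented, "not wired"
	}
	apps, err := cb.GetTelemetryPreview()
	if err != nil {
		// As in the handler above, a collection error is reported in the body.
		return http.StatusOK, err.Error()
	}
	totalErrors, totalWarnings := 0, 0
	for _, app := range apps {
		totalErrors += app.LogErrors
		totalWarnings += app.LogWarnings
	}
	return http.StatusOK, fmt.Sprintf("%d apps, %d errors, %d warnings",
		len(apps), totalErrors, totalWarnings)
}
```

Keeping the callback behind a nil check means the web package compiles and runs even when the controller starts without debug wiring.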

Approach B: Export buildAppTelemetrySection from the report package

In controller/internal/report/telemetry.go, the function buildAppTelemetrySection is private (lowercase). Create a public wrapper:

// BuildAppTelemetryForDebug runs the telemetry collection pipeline (metrics + log scan)
// and returns the result. Used by the debug endpoint.
func BuildAppTelemetryForDebug(stackMgr *stacks.Manager, metricsStore *metrics.MetricsStore, logger *log.Logger) []AppTelemetry {
    return buildAppTelemetrySection(stackMgr, metricsStore, logger)
}

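Then wire the callback in main.go using this exported function. Approach A still needs this wrapper for the wiring in step 2, so in practice the two are combined: the callback from Approach A calls the wrapper from Approach B (step 5 repeats the wrapper).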

2. Wire the callback in cmd/controller/main.go

In the DebugCallbacks setup block (around line 615), add:

dc.GetTelemetryPreview = func() ([]report.AppTelemetry, error) {
    return report.BuildAppTelemetryForDebug(stackMgr, metricsStore, logger), nil
}

This goes inside the existing if cfg.Logging.Level == "debug" block, alongside the other callback assignments. It does NOT need to be inside the if hubPusher != nil guard — telemetry works regardless of hub config.
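
The placement rule (inside the debug-level block, outside the hub guard) can be sketched as a small function. This is an illustration under assumptions: wireDebugCallbacks, its parameters and the no-op hub callback are hypothetical; the real main.go assigns fields on an existing dc value rather than calling a helper like this.

```go
package main

// AppTelemetry is a trimmed, hypothetical stand-in for report.AppTelemetry.
type AppTelemetry struct{ AppName string }

// DebugCallbacks models two of the real fields: one hub-dependent, one not.
type DebugCallbacks struct {
	TriggerHubReportPush func() error
	GetTelemetryPreview  func() ([]AppTelemetry, error)
}

// wireDebugCallbacks returns nil unless the log level is "debug".
// build stands in for report.BuildAppTelemetryForDebug with its
// dependencies already captured by the caller.
func wireDebugCallbacks(logLevel string, hubConfigured bool, build func() []AppTelemetry) *DebugCallbacks {
	if logLevel != "debug" {
		return nil
	}
	dc := &DebugCallbacks{}

	// Telemetry preview is wired regardless of hub configuration.
	dc.GetTelemetryPreview = func() ([]AppTelemetry, error) {
		return build(), nil
	}

	// Hub-only callbacks stay behind the hub guard.
	if hubConfigured {
		dc.TriggerHubReportPush = func() error { return nil }
	}
	return dc
}
```

The closure is what lets the web package call into the metrics store without ever holding a reference to it.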

3. Add the debug page section in debug.html

File: controller/internal/web/templates/debug.html

Add a new section between "Hub & Kapcsolatok" (section-hub) and "Önfrissítés teszt" (section-selfupdate). Insert after the closing </div> of section-hub (line 128) and before the <!-- Section 6: Self-Update Testing --> comment (line 130):

<!-- Section: Telemetry Testing -->
<div class="card debug-section" id="section-telemetry">
    <div class="card-header debug-section-header" onclick="toggleSection('telemetry')">
        <h3>Telemetria teszt</h3>
        <span class="section-toggle"></span>
    </div>
    <div class="card-body debug-section-body" style="display:none">
        <div id="telemetry-status"><span class="text-muted">Kattintson a gombra a telemetria futtatásához.</span></div>
        <div class="debug-actions">
            <button class="btn btn-primary btn-sm" id="btn-telemetry-run" data-label="Telemetria futtatása" onclick="runTelemetryTest()">Telemetria futtatása</button>
            <span class="debug-result" id="btn-telemetry-run-result"></span>
        </div>
        <div id="telemetry-detail" style="display:none; margin-top:1rem;"></div>
    </div>
</div>

4. Add JavaScript for the telemetry section in debug.html

Add the following in the <script> block, alongside the other section functions. Also add a telemetry case to loadSectionData; it is a no-op because this section does not auto-load and the user triggers it with the button.

In loadSectionData, add:

case 'telemetry': break; // no auto-load, user triggers manually

Add the runTelemetryTest() function:

// ── Telemetry test ──
function runTelemetryTest() {
    var btn = document.getElementById('btn-telemetry-run');
    var result = document.getElementById('btn-telemetry-run-result');
    var detail = document.getElementById('telemetry-detail');
    btn.disabled = true;
    btn.textContent = 'Folyamatban...';
    result.className = 'debug-result';
    result.textContent = '';
    detail.style.display = 'none';

    fetch('/api/debug/telemetry', {headers: csrfHeaders()}).then(function(r){return r.json()}).then(function(data) {
        if (data.ok) {
            result.className = 'debug-result debug-result-ok';
            result.textContent = data.message;
            if (data.data && data.data.app_telemetry) {
                renderTelemetryDetail(data.data);
            }
        } else {
            result.className = 'debug-result debug-result-error';
            result.textContent = data.error || data.message || 'Hiba';
        }
    }).catch(function(e) {
        result.className = 'debug-result debug-result-error';
        result.textContent = 'Hálózati hiba: ' + e.message;
    }).finally(function() {
        btn.disabled = false;
        btn.textContent = btn.dataset.label;
    });
}

function renderTelemetryDetail(data) {
    var detail = document.getElementById('telemetry-detail');
    var apps = data.app_telemetry || [];
    if (apps.length === 0) {
        detail.innerHTML = '<span class="text-muted">Nincs telepített alkalmazás vagy nincs mérési adat.</span>';
        detail.style.display = 'block';
        return;
    }

    var html = '<table class="debug-table" style="width:100%;font-size:.85rem">';
    html += '<thead><tr><th>Alkalmazás</th><th>Konténerek</th><th>Memória (jelen.)</th><th>Memória (átlag)</th><th>Memória (csúcs)</th><th>CPU (átlag)</th><th>Katalógus limit</th><th>Hibák</th><th>Figyelmeztetések</th></tr></thead><tbody>';

    for (var i = 0; i < apps.length; i++) {
        var a = apps[i];
        var errorClass = a.log_errors > 0 ? ' style="color:var(--red);font-weight:600"' : '';
        var warnClass = a.log_warnings > 0 ? ' style="color:var(--yellow);font-weight:600"' : '';
        html += '<tr>';
        html += '<td><strong>' + escHtml(a.display_name || a.app_name) + '</strong></td>';
        html += '<td class="mono" style="font-size:.8rem">' + escHtml((a.containers||[]).join(', ')) + '</td>';
        html += '<td>' + a.memory_current_mb.toFixed(1) + ' MB</td>';
        html += '<td>' + a.memory_avg_mb.toFixed(1) + ' MB</td>';
        html += '<td>' + a.memory_peak_mb.toFixed(1) + ' MB</td>';
        html += '<td>' + a.cpu_avg_percent.toFixed(1) + '%</td>';
        html += '<td class="mono">' + escHtml(a.catalog_limit || '-') + '</td>';
        html += '<td' + errorClass + '>' + a.log_errors + '</td>';
        html += '<td' + warnClass + '>' + a.log_warnings + '</td>';
        html += '</tr>';

        // Show issues as sub-rows if any
        if (a.issues && a.issues.length > 0) {
            for (var j = 0; j < a.issues.length; j++) {
                var issue = a.issues[j];
                var sevColor = issue.severity === 'error' ? 'var(--red)' : 'var(--yellow)';
                html += '<tr style="font-size:.8rem;opacity:.85"><td></td>';
                html += '<td colspan="6" style="padding-left:1.5rem"><span style="color:' + sevColor + '">' + escHtml(String(issue.severity).toUpperCase()) + '</span> ' + escHtml(issue.message) + '</td>';
                html += '<td colspan="2">×' + issue.count + '</td></tr>';
            }
        }
    }
    html += '</tbody></table>';

    // Also show raw JSON in a collapsible detail
    html += '<details style="margin-top:1rem"><summary class="text-muted" style="cursor:pointer;font-size:.85rem">Nyers JSON</summary>';
    html += '<pre class="mono" style="font-size:.75rem;max-height:400px;overflow:auto;padding:.5rem;background:rgba(0,0,0,.3);border-radius:.25rem;margin-top:.5rem">' + escHtml(JSON.stringify(apps, null, 2)) + '</pre>';
    html += '</details>';

    detail.innerHTML = html;
    detail.style.display = 'block';
}
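
The rendering code reads a fixed set of JSON keys, so the Go side has to emit exactly those names. The struct below is a sketch inferred from the keys the JavaScript accesses; the Go field names and types are assumptions, and the real report.AppTelemetry may differ.

```go
package main

import "encoding/json"

// TelemetryIssue is a hypothetical shape for one entry of "issues".
type TelemetryIssue struct {
	Severity string `json:"severity"`
	Message  string `json:"message"`
	Count    int    `json:"count"`
}

// AppTelemetry lists the keys renderTelemetryDetail reads.
// Field names and types are assumptions; only the JSON keys
// come from the JavaScript above.
type AppTelemetry struct {
	AppName         string           `json:"app_name"`
	DisplayName     string           `json:"display_name"`
	Containers      []string         `json:"containers"`
	MemoryCurrentMB float64          `json:"memory_current_mb"`
	MemoryAvgMB     float64          `json:"memory_avg_mb"`
	MemoryPeakMB    float64          `json:"memory_peak_mb"`
	CPUAvgPercent   float64          `json:"cpu_avg_percent"`
	CatalogLimit    string           `json:"catalog_limit"`
	LogErrors       int              `json:"log_errors"`
	LogWarnings     int              `json:"log_warnings"`
	Issues          []TelemetryIssue `json:"issues"`
}

// encodeApps marshals the slice the way a JSON response body would carry it.
func encodeApps(apps []AppTelemetry) (string, error) {
	b, err := json.Marshal(apps)
	return string(b), err
}
```

One design point worth checking against the real struct: the numeric fields carry no omitempty here, because the JavaScript calls toFixed on them directly and a missing key would raise a TypeError in the browser. A nil containers or issues slice encodes as null, which the rendering code already tolerates.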

Note: escHtml is probably already defined in the debug page JS. Check first; if it is missing, add a simple version:

function escHtml(s) {
    var d = document.createElement('div');
    d.textContent = s;
    return d.innerHTML;
}

5. Export the telemetry builder from the report package

File: controller/internal/report/telemetry.go

Add this exported function at the end of the file:

// BuildAppTelemetryForDebug runs the full telemetry collection pipeline
// (metrics query + log scan) and returns per-app telemetry data.
// Used by the debug endpoint to preview telemetry without pushing to hub.
func BuildAppTelemetryForDebug(stackMgr *stacks.Manager, metricsStore *metrics.MetricsStore, logger *log.Logger) []AppTelemetry {
	return buildAppTelemetrySection(stackMgr, metricsStore, logger)
}

Files Changed Summary

File                                           Change
---------------------------------------------  ------------------------------------------------------------
controller/internal/report/telemetry.go        MODIFY: add BuildAppTelemetryForDebug() public wrapper
controller/internal/web/handler_debug.go       MODIFY: add GetTelemetryPreview to DebugCallbacks, add debugTelemetry() handler, add route case
controller/cmd/controller/main.go              MODIFY: wire GetTelemetryPreview callback in debug callbacks block
controller/internal/web/templates/debug.html   MODIFY: add "Telemetria teszt" section + JS rendering

Key Implementation Details

Existing patterns to follow exactly:

  1. DebugCallbacks pattern — See existing callbacks like TriggerHubReportPush at handler_debug.go:24-30. The new GetTelemetryPreview follows the same nil-check guard pattern used by debugHubPush() at line 460.

  2. writeDebugJSON — Standard response helper at handler_debug.go:102-120. Always use this for debug endpoint responses.

  3. Button pattern — Button needs id, data-label (original text, restored after action), and onclick calling the JS function. Result span has id="btn-{id}-result".

  4. JS function pattern — Disable button → change text to "Folyamatban..." → fetch endpoint → show result → restore button. See triggerAction() at approximately line 220 in debug.html.

  5. Section HTML pattern: div.card.debug-section > div.card-header.debug-section-header + div.card-body.debug-section-body. The body starts with style="display:none".

  6. loadSectionData switch — At approximately line 262 in debug.html. Add the 'telemetry' case to the switch (even though it's a no-op, for consistency).

  7. All UI text in Hungarian — "Telemetria teszt", "Telemetria futtatása", "Folyamatban...", "Nincs telepített alkalmazás", "Nyers JSON", "Hibák", "Figyelmeztetések", etc.
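
To make the writeDebugJSON pattern from item 2 concrete, here is a sketch of an envelope that matches what the page JavaScript reads (ok, message, error, data). This is not the real helper: the debugResponse type, the choice to put the text in error on failure, and the render helper are all assumptions; only the call signature is taken from the handler code in this document.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
)

// debugResponse is a hypothetical envelope; the real one may differ.
type debugResponse struct {
	OK      bool                   `json:"ok"`
	Message string                 `json:"message,omitempty"`
	Error   string                 `json:"error,omitempty"`
	Data    map[string]interface{} `json:"data,omitempty"`
}

// writeDebugJSON is a stand-in with the same parameter list as the real helper.
func writeDebugJSON(w http.ResponseWriter, status int, ok bool, msg string, data map[string]interface{}) {
	resp := debugResponse{OK: ok, Data: data}
	if ok {
		resp.Message = msg
	} else {
		// Assumption: failures are surfaced under "error".
		resp.Error = msg
	}
	// Headers must be set before WriteHeader sends the status line.
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(resp)
}

// render runs the helper against an in-memory recorder and
// returns the status code and raw body.
func render(status int, ok bool, msg string) (int, string) {
	rec := httptest.NewRecorder()
	writeDebugJSON(rec, status, ok, msg, nil)
	return rec.Code, rec.Body.String()
}
```

If the real helper always fills message, including on failure, the page would need to read that field in the error branch as well, so it is worth confirming which key the existing sections rely on.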

Import needed in handler_debug.go:

The GetTelemetryPreview field in DebugCallbacks returns []report.AppTelemetry, so handler_debug.go must import the report package:

import "gitea.dooplex.hu/admin/felhom-controller/internal/report"

Check if this import already exists in handler_debug.go — it likely doesn't (current imports are for backup, monitor, stacks, system). Add it.

CSS already available:

The debug page already has styles for debug-table, debug-result, debug-result-ok, debug-result-error, debug-actions, debug-kv-grid, mono, text-muted, btn, btn-primary, btn-sm, etc. No new CSS needed.

CSS variables available:

  • var(--red) — for errors
  • var(--yellow) — for warnings
  • var(--green) — for success/ok

Testing

After deployment:

  1. Go to Debug page on the controller
  2. Expand "Telemetria teszt" section
  3. Click "Telemetria futtatása"
  4. Verify: table shows all deployed non-protected stacks
  5. Verify: memory values match what you see on the Rendszermonitor page
  6. Verify: log errors/warnings appear (intentionally stop a dependent container to generate errors)
  7. Verify: "Nyers JSON" collapsible shows the exact payload that would go to the hub
  8. Verify: the scan completes in reasonable time (shown in the result message as "Xms")
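
Part of the manual checklist above can be backed by a small consistency check on the response body. The sketch below assumes the envelope keys ok and data that the page JavaScript reads, plus the app_count, latency_ms and app_telemetry keys set by the handler; checkTelemetryResponse and telemetryEnvelope are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// telemetryEnvelope decodes only the fields needed for the check.
type telemetryEnvelope struct {
	OK   bool `json:"ok"`
	Data struct {
		AppCount     int               `json:"app_count"`
		LatencyMs    int64             `json:"latency_ms"`
		AppTelemetry []json.RawMessage `json:"app_telemetry"`
	} `json:"data"`
}

// checkTelemetryResponse verifies that a response body is valid JSON,
// reports success, and that app_count matches the number of entries.
func checkTelemetryResponse(body []byte) error {
	var env telemetryEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if !env.OK {
		return errors.New("endpoint reported failure")
	}
	if env.Data.AppCount != len(env.Data.AppTelemetry) {
		return fmt.Errorf("app_count=%d but app_telemetry has %d entries",
			env.Data.AppCount, len(env.Data.AppTelemetry))
	}
	if env.Data.LatencyMs < 0 {
		return errors.New("latency_ms must not be negative")
	}
	return nil
}
```

A check like this could be fed the body copied from the browser's network tab, or wrapped in a test that calls the endpoint; it does not replace comparing the memory values against the Rendszermonitor page by eye.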

Build & Deploy

SSH=/c/Windows/System32/OpenSSH/ssh.exe
# After committing and pushing:
$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-controller && git -C ~/git/deploy-felhom-compose pull && ./build.sh v0.28.0 --push"
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:v0.28.0 && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:v0.28.0|' docker-compose.yml && sudo docker compose up -d"

Note: If v0.28.0 is already built/deployed with the telemetry feature, bump to v0.28.1 for this debug addition.