Files
deploy-felhom-compose/BUGHUNT.md
T

32 KiB
Raw Blame History

Bug Hunt Report — DR Implementation (TASK.md + TASK2.md)

Date: 2026-02-19 Scope: All code implemented in TASK.md (docker-setup.sh v5.0) and TASK2.md (DR phases 1-4) Purpose: Detailed fix instructions for Sonnet to implement


Priority Legend

Priority Meaning
P0 Must fix before production — crashes, data loss, security
P1 Should fix before production — logic errors, race conditions
P2 Fix soon — defensive improvements, consistency
P3 Nice to have — code quality, edge cases

AREA 1: Controller — handler_restore.go (Race Conditions)

All restore handlers read s.restorePlan without holding s.restoreMu, causing nil pointer panics and double-restore races. This is the single most critical area.

BUG 1.1 — P0: restorePageHandler reads restorePlan without lock

File: controller/internal/web/handler_restore.go lines 14-28 Impact: Nil pointer dereference panic if clearRestoreMode() runs between nil-check (line 14) and field access (lines 24-28).

Current code:

func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
    if s.restorePlan == nil {           // line 14: UNLOCKED
        http.Redirect(w, r, "/", http.StatusFound)
        return
    }
    data := map[string]interface{}{
        "CustomerID":   s.restorePlan.CustomerID,  // line 24: UNLOCKED
        "Timestamp":    s.restorePlan.Timestamp,    // line 25
        "Apps":         s.restorePlan.GetApps(),    // line 26
        "Drives":       s.restorePlan.Drives,       // line 27: direct ref
        "PlanStatus":   s.restorePlan.Status,       // line 28: UNLOCKED
    }

Fix: Hold restoreMu.RLock() from the nil-check through all field reads. Copy everything under lock, then release before rendering:

func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
    s.restoreMu.RLock()
    plan := s.restorePlan
    if plan == nil {
        s.restoreMu.RUnlock()
        http.Redirect(w, r, "/", http.StatusFound)
        return
    }
    // Snapshot all needed fields under lock
    customerID := plan.CustomerID
    timestamp := plan.Timestamp
    apps := plan.GetApps()
    drives := make([]backup.DriveInfo, len(plan.Drives))
    copy(drives, plan.Drives)
    status := plan.Status
    s.restoreMu.RUnlock()

    data := map[string]interface{}{
        "Title":        "Katasztrófa utáni visszaállítás",
        "CustomerName": s.cfg.Customer.Name,
        "Domain":       s.cfg.Customer.Domain,
        "Version":      s.version,
        "CustomerID":   customerID,
        "Timestamp":    timestamp,
        "Apps":         apps,
        "Drives":       drives,
        "PlanStatus":   status,
    }
    s.render(w, "restore", data)
}

Note: plan.GetApps() already holds its own internal rp.mu.RLock() — this is fine, nested RLocks are safe. However, plan.Status and plan.Drives need the outer restoreMu to prevent nil dereference.


BUG 1.2 — P0: apiRestoreStatus reads restorePlan without lock

File: controller/internal/web/handler_restore.go lines 35-42 Impact: Nil pointer panic if clearRestoreMode() sets s.restorePlan = nil between lines 36 and 42.

Fix: Same pattern — hold restoreMu.RLock:

func (s *Server) apiRestoreStatus(w http.ResponseWriter, r *http.Request) {
    s.restoreMu.RLock()
    plan := s.restorePlan
    if plan == nil {
        s.restoreMu.RUnlock()
        jsonError(w, "not in restore mode", http.StatusBadRequest)
        return
    }
    snapshot := plan.Snapshot()
    s.restoreMu.RUnlock()

    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    json.NewEncoder(w).Encode(snapshot)
}

BUG 1.3 — P0: apiRestoreAll has double-restore race + unlocked writes

File: controller/internal/web/handler_restore.go lines 46-63 Impact: Two concurrent POST requests both pass the Status == "restoring" check and spawn two parallel goroutines restoring the same apps, causing data corruption.

Lines 51 and 56 read/write s.restorePlan.Status without any lock.

Fix: Hold restoreMu.RLock for the nil check, then use the plan's internal mu.Lock for the status check-and-set:

func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
    s.restoreMu.RLock()
    plan := s.restorePlan
    s.restoreMu.RUnlock()
    if plan == nil {
        jsonError(w, "not in restore mode", http.StatusBadRequest)
        return
    }

    // Atomic check-and-set under plan's internal lock
    plan.mu.Lock()
    if plan.Status == "restoring" {
        plan.mu.Unlock()
        jsonError(w, "restore already in progress", http.StatusConflict)
        return
    }
    plan.Status = "restoring"
    plan.mu.Unlock()

    go s.executeAllRestores()

    jsonResponse(w, map[string]interface{}{
        "ok":      true,
        "message": "Visszaállítás elindítva",
    })
}

Important: This requires RestorePlan.mu to be exported OR adding a helper method. The simplest approach: add a SetStatus(status string) bool method to RestorePlan that returns false if already "restoring":

// Add to restore_scan.go:
// TryStartRestore atomically sets status to "restoring" if currently "pending".
// Returns false if already restoring.
func (rp *RestorePlan) TryStartRestore() bool {
    rp.mu.Lock()
    defer rp.mu.Unlock()
    if rp.Status == "restoring" {
        return false
    }
    rp.Status = "restoring"
    return true
}

// SetStatus sets the overall plan status.
func (rp *RestorePlan) SetStatus(status string) {
    rp.mu.Lock()
    defer rp.mu.Unlock()
    rp.Status = status
}

// GetStatus returns the current plan status.
func (rp *RestorePlan) GetStatus() string {
    rp.mu.RLock()
    defer rp.mu.RUnlock()
    return rp.Status
}

Then apiRestoreAll becomes:

func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
    s.restoreMu.RLock()
    plan := s.restorePlan
    s.restoreMu.RUnlock()
    if plan == nil {
        jsonError(w, "not in restore mode", http.StatusBadRequest)
        return
    }
    if !plan.TryStartRestore() {
        jsonError(w, "restore already in progress", http.StatusConflict)
        return
    }
    go s.executeAllRestores()
    jsonResponse(w, map[string]interface{}{
        "ok":      true,
        "message": "Visszaállítás elindítva",
    })
}

BUG 1.4 — P0: apiRestoreSkip reads restorePlan without lock

File: controller/internal/web/handler_restore.go lines 66-70 Impact: Same nil pointer race as other handlers.

Fix:

func (s *Server) apiRestoreSkip(w http.ResponseWriter, r *http.Request) {
    s.restoreMu.RLock()
    plan := s.restorePlan
    s.restoreMu.RUnlock()
    if plan == nil {
        jsonError(w, "not in restore mode", http.StatusBadRequest)
        return
    }

    s.logger.Println("[INFO] User skipped DR restore — entering normal mode")
    s.clearRestoreMode()

    jsonResponse(w, map[string]interface{}{
        "ok":      true,
        "message": "Visszaállítás kihagyva",
    })
}

BUG 1.5 — P1: executeAllRestores writes Status/Apps without lock

File: controller/internal/web/handler_restore.go lines 82-125 Impact:

  • Line 85: s.restorePlan.Apps accessed without lock (could be nil if cleared)
  • Line 86: &s.restorePlan.Apps[i] — pointer into slice that could be invalidated
  • Line 107: s.restorePlan.Status = "done" — unlocked write

Fix: Use the helper methods from BUG 1.3. Also guard against restorePlan being nil:

func (s *Server) executeAllRestores() {
    s.logger.Println("[INFO] Starting DR restore for all apps")

    s.restoreMu.RLock()
    plan := s.restorePlan
    s.restoreMu.RUnlock()
    if plan == nil {
        s.logger.Println("[WARN] Restore plan cleared before execution could start")
        return
    }

    for i := range plan.Apps {
        app := &plan.Apps[i]
        if app.Status != "pending" {
            continue
        }

        plan.UpdateApp(app.Name, "restoring", "")
        s.logger.Printf("[INFO] Restoring app %s (%s)", app.Name, app.DisplayName)

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
        err := backup.RestoreAppFromBackup(ctx, app, s.cfg.Paths.StacksDir, s.logger)
        cancel()

        if err != nil {
            plan.UpdateApp(app.Name, "failed", err.Error())
            s.logger.Printf("[ERROR] Restore failed for %s: %v", app.Name, err)
        } else {
            plan.UpdateApp(app.Name, "done", "")
            s.logger.Printf("[INFO] Restore completed for %s", app.Name)
        }
    }

    plan.SetStatus("done")   // <-- use new helper instead of direct write
    s.logger.Println("[INFO] All app restores completed")

    // Re-scan stacks so dashboard picks up restored apps
    if s.stackMgr != nil {
        if err := s.stackMgr.ScanStacks(); err != nil {
            s.logger.Printf("[WARN] Post-restore stack scan failed: %v", err)
        }
    }
}

Also delete the no-op goroutine at lines 117-124 (see BUG 1.6).


BUG 1.6 — P2: No-op auto-clear goroutine (dead code)

File: controller/internal/web/handler_restore.go lines 117-124 Impact: Spawns a goroutine that sleeps 5s and does nothing. Confusing code.

Fix: Delete lines 117-124 entirely. The user clicks "continue to dashboard" to clear restore mode — this is already implemented and correct.


AREA 2: Controller — restore_scan.go

BUG 2.1 — P0: dirIsEmpty() returns true on read errors

File: controller/internal/backup/restore_scan.go lines 234-237 Impact: If os.ReadDir() fails (permission denied, I/O error), dirIsEmpty() returns true because err != nil satisfies the condition. This causes the scan to skip valid DB dump directories that are merely unreadable, silently losing backup data.

Current code:

func dirIsEmpty(path string) bool {
    entries, err := os.ReadDir(path)
    return err != nil || len(entries) == 0
}

Fix: Only return true for genuinely empty directories, return false on errors (assume not empty):

func dirIsEmpty(path string) bool {
    entries, err := os.ReadDir(path)
    if err != nil {
        return false // assume non-empty on error — safer for backup detection
    }
    return len(entries) == 0
}

BUG 2.2 — P1: Snapshot() returns direct slice references (shallow copy)

File: controller/internal/backup/restore_scan.go lines 71-80 Impact: The returned map holds direct references to rp.Apps and rp.Drives. While the RLock is held during json.Encode, if the encoding is ever separated from the lock (e.g., by future code changes), modifications to Apps/Drives would corrupt the snapshot. Inconsistent with GetApps() which properly copies.

Fix: Make defensive copies like GetApps() does:

func (rp *RestorePlan) Snapshot() map[string]interface{} {
    rp.mu.RLock()
    defer rp.mu.RUnlock()

    apps := make([]RestorableApp, len(rp.Apps))
    copy(apps, rp.Apps)

    drives := make([]DriveInfo, len(rp.Drives))
    copy(drives, rp.Drives)

    return map[string]interface{}{
        "ok":     true,
        "status": rp.Status,
        "apps":   apps,
        "drives": drives,
    }
}

AREA 3: Controller — infra_backup.go (Silent Failures)

BUG 3.1 — P0: BuildInfraBackup silently drops file read errors

File: controller/internal/report/infra_backup.go lines 55-67 Impact: If controller.yaml, settings.json, or the restic password file can't be read (permissions, moved, disk error), the if err == nil pattern silently stores an empty string. The Hub receives an infra backup with missing critical data, and disaster recovery will fail silently (no config, no settings, no password → broken restore).

Current code:

if data, err := os.ReadFile(controllerYAMLPath); err == nil {
    ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)
}
// same for settings.json and restic password

Fix: Log warnings for missing files so the operator knows the backup is incomplete. Return error for controller.yaml (truly critical):

// Read and encode controller.yaml (critical — fail if unreadable)
data, err := os.ReadFile(controllerYAMLPath)
if err != nil {
    return nil, fmt.Errorf("reading controller config %s: %w", controllerYAMLPath, err)
}
ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)

// Read and encode settings.json (important but non-fatal)
if data, err := os.ReadFile(settingsPath); err == nil {
    ib.SettingsJSONB64 = base64.StdEncoding.EncodeToString(data)
} else if !os.IsNotExist(err) {
    // Log warning: file exists but unreadable
    // (Note: BuildInfraBackup doesn't have a logger param — either add one,
    //  or add a Warnings []string field to InfraBackup, or wrap the error)
}

// Read primary restic password (important but non-fatal)
if data, err := os.ReadFile(resticPasswordFile); err == nil {
    ib.ResticPassword = base64.StdEncoding.EncodeToString(data)
} else if !os.IsNotExist(err) {
    // Same: log or accumulate warning
}

Simplest approach: Add a logger *log.Logger parameter to BuildInfraBackup (already passed in the caller pushInfraBackup in main.go, so just thread it through). Then log warnings:

func BuildInfraBackup(
    customerID, domain, version string,
    controllerYAMLPath string,
    settingsPath string,
    resticPasswordFile string,
    systemDataPath string,
    sett *settings.Settings,
    stackProvider backup.StackDataProvider,
    logger *log.Logger,       // <-- ADD THIS
) (*InfraBackup, error) {

In pushInfraBackup in main.go, update the call to pass logger.


BUG 3.2 — P2: CrossDrivePassword encoding inconsistency

File: controller/internal/report/infra_backup.go lines 70-72 Impact: ResticPassword is base64-encoded (line 66) but CrossDrivePassword is stored raw (line 71). While the restore code in main.go handles both correctly (decodes ResticPassword from base64, uses CrossDrivePassword as-is), this inconsistency could cause bugs if someone adds base64 decoding to CrossDrivePassword in the future.

Fix (optional — low priority): Add a comment explaining the asymmetry:

// Cross-drive password is stored as plain text (not base64) because it's
// already a string in settings, unlike ResticPassword which comes from a file.
if pw := sett.GetCrossDriveResticPassword(); pw != "" {
    ib.CrossDrivePassword = pw
}

AREA 4: Controller — main.go DR Wiring

BUG 4.1 — P1: Cross-drive password may be lost on settings reload

File: controller/cmd/controller/main.go lines 80-89 Impact: restorePasswordsFromHub() sets the cross-drive password on sett (line 80), which also saves it to settings.json. Then restoreSettingsFromHub() (line 83) overwrites settings.json with the Hub backup version. Then sett is replaced with a fresh load (line 87). The password is preserved IF the Hub backup's settings.json already contains it. But if the explicit ib.CrossDrivePassword field was updated after the settings.json was last saved, the password is lost.

Fix: Move password restoration AFTER settings reload:

// Restore settings.json from Hub backup (do this FIRST)
restoreSettingsFromHub(ib, cfg, logger)

// Re-load settings (now from restored file)
if restoredSett, loadErr := settings.Load(settingsPath, logger); loadErr == nil {
    sett = restoredSett
    logger.Println("[INFO] Settings reloaded after Hub restore")
}

// Restore restic passwords (AFTER settings reload so cross-drive password persists)
restorePasswordsFromHub(ib, cfg, sett, logger)

BUG 4.2 — P1: No nil check after ScanDrivesForBackups

File: controller/cmd/controller/main.go lines 124-127 Impact: If ScanDrivesForBackups() were to return nil (currently it doesn't, but it could in a future change), lines 125-127 would panic with nil pointer dereference.

Fix: Add nil guard:

restorePlan = backup.ScanDrivesForBackups(drivePaths, infraStacks, logger)
if restorePlan != nil {
    restorePlan.CustomerID = ib.CustomerID
    restorePlan.Domain = ib.Domain
    restorePlan.Timestamp = ib.Timestamp
    logger.Printf("[INFO] DR restore plan ready: %d apps to restore", len(restorePlan.Apps))
} else {
    logger.Println("[WARN] ScanDrivesForBackups returned nil — no restore plan created")
}

BUG 4.3 — P2: os.MkdirAll error ignored for restic password dir

File: controller/cmd/controller/main.go line 624 Impact: If directory creation fails, the subsequent WriteFile fails with a confusing "no such file or directory" error.

Fix:

dir := filepath.Dir(cfg.Backup.ResticPasswordFile)
if err := os.MkdirAll(dir, 0700); err != nil {
    logger.Printf("[WARN] Failed to create restic password directory %s: %v", dir, err)
} else if err := os.WriteFile(cfg.Backup.ResticPasswordFile, decoded, 0600); err == nil {
    logger.Println("[INFO] Primary restic password restored from Hub")
} else {
    logger.Printf("[WARN] Failed to write restic password file: %v", err)
}

BUG 4.4 — P3: restoreSettingsFromHub doesn't ensure data directory exists

File: controller/cmd/controller/main.go lines 652-653 Impact: On a truly fresh deployment, if cfg.Paths.DataDir doesn't exist yet, WriteFile will fail.

Fix: Add os.MkdirAll before write:

func restoreSettingsFromHub(ib *report.InfraBackup, cfg *config.Config, logger *log.Logger) {
    if ib.SettingsJSONB64 == "" {
        return
    }
    decoded, err := base64.StdEncoding.DecodeString(ib.SettingsJSONB64)
    if err != nil {
        logger.Printf("[WARN] Failed to decode settings from Hub: %v", err)
        return
    }
    if err := os.MkdirAll(cfg.Paths.DataDir, 0755); err != nil {
        logger.Printf("[WARN] Failed to create data directory: %v", err)
        return
    }
    settingsPath := filepath.Join(cfg.Paths.DataDir, "settings.json")
    // ... rest unchanged

AREA 5: Hub — store.go and handler.go

BUG 5.1 — P2: Unchecked json.Unmarshal errors throughout store.go

File: hub/internal/store/store.go lines 126, 217, 286, 336, 502 Impact: If stored JSON is corrupt, json.Unmarshal errors are silently ignored. Functions proceed with zero-valued structs, returning incorrect data.

Affected locations:

  • Line 126: json.Unmarshal([]byte(eventsJSON), &events) in GetNotificationPrefs
  • Line 217: json.Unmarshal(reportJSON, &parsed) in SaveReport
  • Line 286: json.Unmarshal([]byte(c.ReportJSON), &report) in GetCustomers
  • Line 336: json.Unmarshal([]byte(c.ReportJSON), &report) in GetCustomer
  • Line 502: json.Unmarshal([]byte(reportJSON), &report) in parseDiskSummary

Fix for each: Add error logging (not fatal — corrupted data shouldn't crash the hub):

Line 126:

if err := json.Unmarshal([]byte(eventsJSON), &events); err != nil {
    s.logger.Printf("[WARN] Corrupt enabled_events JSON for %s: %v", customerID, err)
}

Line 217:

if err := json.Unmarshal(reportJSON, &parsed); err != nil {
    s.logger.Printf("[WARN] Cannot parse report fields for denormalization: %v", err)
}

Lines 286, 336, 502: Same pattern — log warning, continue with defaults.

Note for Sonnet: parseDiskSummary (line 502) doesn't have access to a logger. Either add a logger parameter or just add // ignore parse errors — show "" comment. The function already returns "" on failure.


BUG 5.2 — P2: GetInfraBackupMeta swallows unmarshal error

File: hub/internal/store/store.go line 450 Impact: If the stored infra backup JSON is corrupt, the function returns a meta struct with StackCount=0, DiskCount=0 — misleading the dashboard into showing "no stacks, no disks" instead of an error.

Current code:

if json.Unmarshal([]byte(backupJSON), &parsed) == nil {

Fix: Log on unmarshal failure:

if err := json.Unmarshal([]byte(backupJSON), &parsed); err != nil {
    s.logger.Printf("[WARN] Failed to parse infra backup metadata for %s: %v", customerID, err)
} else {
    meta.StackCount = len(parsed.DeployedStacks)
    meta.DiskCount = len(parsed.DiskLayout.Mounts)
}

BUG 5.3 — P3: No customer_id content validation on ingest endpoints

File: hub/internal/api/handler.go lines 88, 226, 313, 348 Impact: Customer IDs with special characters, extremely long strings, or SQL injection attempts are accepted and stored. While Go's SQLite driver uses parameterized queries (safe from SQL injection), excessively long IDs waste storage and could be used in denial-of-service.

Fix (optional): Add a simple validation helper and use it in all handlers:

func isValidCustomerID(id string) bool {
    if len(id) > 128 || len(id) == 0 {
        return false
    }
    for _, c := range id {
        if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.') {
            return false
        }
    }
    return true
}

AREA 6: docker-setup.sh

BUG 6.1 — P0: YAML syntax error — networks: on same line as last volume

File: scripts/docker-setup.sh line ~1297 (in install_filebrowser) Impact: When volume_lines is populated (drives discovered), the networks: key ends up on the same line as the last volume entry, producing invalid YAML. Docker Compose fails → FileBrowser doesn't deploy.

The heredoc line likely looks like:

${volume_lines}    networks:

When volume_lines has content ending with a newline, the output is correct. However, if volume_lines is empty (no drives found), the output becomes:

    volumes:
      - filebrowser_data:/home/filebrowser/data
    networks:

which IS correct. And when volume_lines has entries, each ends with $'\n', so it also works.

Verification needed: Read the actual heredoc to confirm whether this is a real bug or false alarm. The agent may have been wrong about the indentation.

Action: Sonnet should read the install_filebrowser function and verify the heredoc YAML output is correct by checking that volume_lines entries end with proper newlines and the networks: key starts on a new line. If it doesn't, add $'\n' before networks:.


BUG 6.2 — P0: Wizard prompts user even in --dry-run mode

File: scripts/docker-setup.shrun_config_wizard() function (around line 1456) Impact: The DRY_RUN check is at the END of the wizard, after all interactive read prompts. User runs --dry-run expecting a non-interactive preview but gets prompted for customer ID, domain, password, etc.

Also, after the wizard returns in dry-run, WIZ_* variables are empty, so downstream functions (install_cloudflare_tunnel, install_filebrowser, install_controller) see empty values and skip their dry-run output.

Fix: Move the if [[ "$DRY_RUN" == true ]] check to the TOP of run_config_wizard(), right after the step log line and the banner. Set dummy wizard values for downstream functions:

run_config_wizard() {
    local step_num=5
    [[ "$SELF_SIGNED_CERT" == true ]] && ((step_num++))
    log_step "${step_num}/$(get_total_steps) - Running configuration wizard..."

    echo ""
    echo -e "${BOLD}${CYAN}===========================================================${NC}"
    echo -e "${BOLD}${CYAN}  Felhom Controller Configuration Wizard${NC}"
    echo -e "${BOLD}${CYAN}===========================================================${NC}"
    echo ""

    if [[ "$DRY_RUN" == true ]]; then
        echo -e "${CYAN}[DRY-RUN]${NC} Would run interactive wizard and generate controller.yaml"
        # Set dummy values so downstream functions produce correct dry-run output
        WIZ_CUSTOMER_ID="${CUSTOMER_ID:-demo-felhom}"
        WIZ_CUSTOMER_NAME="${WIZ_CUSTOMER_ID}"
        WIZ_DOMAIN="${BASE_DOMAIN:-homeserver.local}"
        WIZ_EMAIL="${ACME_EMAIL:-admin@example.com}"
        WIZ_CF_TUNNEL_TOKEN=""  # Empty = no tunnel in dry-run
        WIZ_CF_API_TOKEN="${CF_DNS_API_TOKEN:-}"
        WIZ_SYSTEM_DATA_PATH="/mnt/sys_drive"
        WIZ_PASSWORD_HASH='<would-be-generated>'
        WIZ_SESSION_SECRET='<would-be-generated>'
        return
    fi

    # ... rest of interactive prompts unchanged

Also: Remove the old DRY_RUN check at the end of the function (around line 1553-1557).


BUG 6.3 — P1: Unquoted Cloudflare Tunnel token in docker-compose env

File: scripts/docker-setup.sh line ~1055 (in install_cloudflare_tunnel) Impact: Tokens containing $, backticks, or other shell/Docker special characters will be misinterpreted.

Fix: Quote the value:

# Change:
      - TUNNEL_TOKEN=${WIZ_CF_TUNNEL_TOKEN}
# To:
      - TUNNEL_TOKEN="${WIZ_CF_TUNNEL_TOKEN}"

BUG 6.4 — P1: htpasswd hash extraction with tr -d ':\n' may include junk

File: scripts/docker-setup.sh line ~1514 Impact: htpasswd -bnBC 10 "" "$password" outputs :<hash>\n. The tr -d ':\n' removes ALL colons and newlines, which works. But if the htpasswd command fails silently, the hash is empty and no error is raised.

Fix: Use cut -d: -f2 instead of tr -d for cleaner extraction, and validate the result:

if command -v htpasswd &>/dev/null; then
    WIZ_PASSWORD_HASH=$(htpasswd -bnBC 10 "" "$wiz_password" 2>/dev/null | cut -d: -f2)
    if [[ ! "$WIZ_PASSWORD_HASH" =~ ^\$2[aby]\$ ]]; then
        log_warn "htpasswd failed — trying Python fallback"
        WIZ_PASSWORD_HASH=""
    fi
fi
if [[ -z "$WIZ_PASSWORD_HASH" ]] && command -v python3 &>/dev/null; then
    WIZ_PASSWORD_HASH=$(python3 -c "import bcrypt; print(bcrypt.hashpw(b'${wiz_password}', bcrypt.gensalt(10)).decode())" 2>/dev/null || echo "")
fi
if [[ -z "$WIZ_PASSWORD_HASH" ]]; then
    log_warn "Could not hash password — dashboard will prompt on first visit"
fi

BUG 6.5 — P1: grep without -F for literal path matching

File: scripts/docker-setup.sh line ~1255 (in install_filebrowser) Impact: Paths like /mnt/data[1] or /mnt/test.dir are treated as regex patterns in grep, causing false matches or missed matches.

Fix:

# Change:
if ! echo "$volume_lines" | grep -q "${WIZ_SYSTEM_DATA_PATH}"; then
# To:
if ! echo "$volume_lines" | grep -qF "${WIZ_SYSTEM_DATA_PATH}"; then

BUG 6.6 — P2: Spaces in mount point names not quoted in YAML

File: scripts/docker-setup.sh lines ~1238-1248 (in install_filebrowser) Impact: Mount points with spaces (e.g., /mnt/my drive/) produce invalid YAML volume mappings.

Fix: Quote the path in volume_lines:

# Change:
volume_lines+="      - ${mp%/}:/srv/${name}"$'\n'
# To:
volume_lines+="      - \"${mp%/}:/srv/${name}\""$'\n'

BUG 6.7 — P2: No post-wizard validation of required fields

File: scripts/docker-setup.sh (end of run_config_wizard(), before YAML generation) Impact: User pressing Enter through all prompts without CLI flags generates config with dummy defaults like demo-felhom and homeserver.local. These produce a non-functional deployment with no clear error.

Fix: Add validation before writing controller.yaml:

# After all read prompts, before writing YAML:
if [[ -z "$WIZ_CUSTOMER_ID" ]] || [[ "$WIZ_CUSTOMER_ID" == "demo-felhom" ]]; then
    log_error "Customer ID is required (not the default 'demo-felhom')"
    exit 1
fi
if [[ -z "$WIZ_DOMAIN" ]] || [[ "$WIZ_DOMAIN" == "homeserver.local" ]]; then
    log_error "A real domain is required (not 'homeserver.local')"
    exit 1
fi

AREA 7: Template — restore.html (Lower Priority)

BUG 7.1 — P2: Error messages inserted via innerHTML without escaping

File: controller/internal/web/templates/restore.html line ~286 Impact: Error messages from RestoreAppFromBackup are concatenated into HTML via innerHTML. While these errors come from Go code (not user input), it's a bad practice. An error containing < would break the HTML rendering.

Fix: Use textContent for the error portion:

// Instead of:
html += ' <span style="font-size:.8rem;color:var(--danger)">(' + app.error.substring(0, 60) + ')</span>';

// Do:
// After setting cell.innerHTML for the safe part, create the error span with textContent:
if (app.error) {
    var errSpan = document.createElement('span');
    errSpan.style.cssText = 'font-size:.8rem;color:var(--danger)';
    errSpan.textContent = ' (' + app.error.substring(0, 60) + ')';
    cell.querySelector('.status-' + app.status).appendChild(errSpan);
}

BUG 7.2 — P3: Polling continues silently on network errors

File: controller/internal/web/templates/restore.html lines ~258-274 Impact: If the server restarts during a restore, the .catch(function() {}) silently swallows all errors and polling continues every 2 seconds forever. User sees a frozen UI with no error indication.

Fix: Add error handling and retry limit:

var pollErrors = 0;
function pollStatus() {
    fetch('/api/restore/status')
        .then(function(resp) {
            if (!resp.ok) throw new Error('HTTP ' + resp.status);
            return resp.json();
        })
        .then(function(data) {
            pollErrors = 0;  // reset on success
            if (!data.ok) return;
            updateTable(data.apps || []);
            updateProgress(data.apps || []);
            if (data.status === 'done') {
                clearInterval(polling);
                polling = null;
                planStatus = 'done';
                updateActions();
            }
        })
        .catch(function(err) {
            pollErrors++;
            console.error('Poll error:', err);
            if (pollErrors >= 10) {
                clearInterval(polling);
                polling = null;
                // Show error to user
                var actions = document.getElementById('dr-actions');
                if (actions) {
                    actions.innerHTML = '<p style="color:var(--danger)">Kapcsolat megszakadt. <a href="/restore">Oldal frissítése</a></p>';
                }
            }
        });
}

Summary Table

ID Priority Area File Line(s) Description
1.1 P0 Handler handler_restore.go 14-28 restorePageHandler unlocked reads → nil panic
1.2 P0 Handler handler_restore.go 35-42 apiRestoreStatus unlocked reads → nil panic
1.3 P0 Handler handler_restore.go 46-63 apiRestoreAll double-restore race + unlocked writes
1.4 P0 Handler handler_restore.go 66-70 apiRestoreSkip unlocked read → nil panic
1.5 P1 Handler handler_restore.go 82-125 executeAllRestores unlocked reads/writes
1.6 P2 Handler handler_restore.go 117-124 No-op goroutine (dead code)
2.1 P0 Scan restore_scan.go 234-237 dirIsEmpty returns true on read error
2.2 P1 Scan restore_scan.go 71-80 Snapshot() shallow copy of slices
3.1 P0 Infra infra_backup.go 55-67 Silent file read failures in BuildInfraBackup
3.2 P2 Infra infra_backup.go 70-72 CrossDrivePassword encoding inconsistency
4.1 P1 Main main.go 80-89 Cross-drive password lost on settings reload
4.2 P1 Main main.go 124-127 No nil check after ScanDrivesForBackups
4.3 P2 Main main.go 624 os.MkdirAll error ignored
4.4 P3 Main main.go 652-653 Missing directory creation for settings
5.1 P2 Hub store.go 126+ Unchecked json.Unmarshal errors
5.2 P2 Hub store.go 450 GetInfraBackupMeta swallows error
5.3 P3 Hub handler.go 88+ No customer_id validation
6.1 P0 Script docker-setup.sh ~1297 YAML networks: indentation (verify)
6.2 P0 Script docker-setup.sh ~1456 Wizard prompts in --dry-run mode
6.3 P1 Script docker-setup.sh ~1055 Unquoted CF tunnel token
6.4 P1 Script docker-setup.sh ~1514 htpasswd hash extraction fragile
6.5 P1 Script docker-setup.sh ~1255 grep without -F for paths
6.6 P2 Script docker-setup.sh ~1238 Spaces in mount paths unquoted
6.7 P2 Script docker-setup.sh wizard No post-wizard field validation
7.1 P2 Template restore.html ~286 innerHTML with unescaped error text
7.2 P3 Template restore.html ~258 Silent polling failure

New methods needed in restore_scan.go: TryStartRestore() bool, SetStatus(string), GetStatus() string

Recommended fix order:

  1. Bugs 1.1-1.4, 2.1 (P0 Go — crash/data-loss prevention)
  2. Bugs 3.1, 6.2 (P0 — silent data loss, broken dry-run)
  3. Bug 6.1 (P0 — verify YAML, fix if needed)
  4. Bugs 1.5, 2.2, 4.1, 4.2 (P1 Go — race conditions, logic)
  5. Bugs 6.3-6.5 (P1 Script — security, correctness)
  6. Everything else (P2-P3)