32 KiB
Bug Hunt Report — DR Implementation (TASK.md + TASK2.md)
Date: 2026-02-19 Scope: All code implemented in TASK.md (docker-setup.sh v5.0) and TASK2.md (DR phases 1-4) Purpose: Detailed fix instructions for Sonnet to implement
Priority Legend
| Priority | Meaning |
|---|---|
| P0 | Must fix before production — crashes, data loss, security |
| P1 | Should fix before production — logic errors, race conditions |
| P2 | Fix soon — defensive improvements, consistency |
| P3 | Nice to have — code quality, edge cases |
AREA 1: Controller — handler_restore.go (Race Conditions)
All restore handlers read s.restorePlan without holding s.restoreMu, causing nil pointer panics and double-restore races. This is the single most critical area.
BUG 1.1 — P0: restorePageHandler reads restorePlan without lock
File: controller/internal/web/handler_restore.go lines 14-28
Impact: Nil pointer dereference panic if clearRestoreMode() runs between nil-check (line 14) and field access (lines 24-28).
Current code:
func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
if s.restorePlan == nil { // line 14: UNLOCKED
http.Redirect(w, r, "/", http.StatusFound)
return
}
data := map[string]interface{}{
"CustomerID": s.restorePlan.CustomerID, // line 24: UNLOCKED
"Timestamp": s.restorePlan.Timestamp, // line 25
"Apps": s.restorePlan.GetApps(), // line 26
"Drives": s.restorePlan.Drives, // line 27: direct ref
"PlanStatus": s.restorePlan.Status, // line 28: UNLOCKED
}
Fix: Hold restoreMu.RLock() from the nil-check through all field reads. Copy everything under lock, then release before rendering:
func (s *Server) restorePageHandler(w http.ResponseWriter, r *http.Request) {
s.restoreMu.RLock()
plan := s.restorePlan
if plan == nil {
s.restoreMu.RUnlock()
http.Redirect(w, r, "/", http.StatusFound)
return
}
// Snapshot all needed fields under lock
customerID := plan.CustomerID
timestamp := plan.Timestamp
apps := plan.GetApps()
drives := make([]backup.DriveInfo, len(plan.Drives))
copy(drives, plan.Drives)
status := plan.Status
s.restoreMu.RUnlock()
data := map[string]interface{}{
"Title": "Katasztrófa utáni visszaállítás",
"CustomerName": s.cfg.Customer.Name,
"Domain": s.cfg.Customer.Domain,
"Version": s.version,
"CustomerID": customerID,
"Timestamp": timestamp,
"Apps": apps,
"Drives": drives,
"PlanStatus": status,
}
s.render(w, "restore", data)
}
Note: plan.GetApps() already holds its own internal rp.mu.RLock() — this is fine, nested RLocks are safe. However, plan.Status and plan.Drives need the outer restoreMu to prevent nil dereference.
BUG 1.2 — P0: apiRestoreStatus reads restorePlan without lock
File: controller/internal/web/handler_restore.go lines 35-42
Impact: Nil pointer panic if clearRestoreMode() sets s.restorePlan = nil between lines 36 and 42.
Fix: Same pattern — hold restoreMu.RLock:
func (s *Server) apiRestoreStatus(w http.ResponseWriter, r *http.Request) {
s.restoreMu.RLock()
plan := s.restorePlan
if plan == nil {
s.restoreMu.RUnlock()
jsonError(w, "not in restore mode", http.StatusBadRequest)
return
}
snapshot := plan.Snapshot()
s.restoreMu.RUnlock()
w.Header().Set("Content-Type", "application/json; charset=utf-8")
json.NewEncoder(w).Encode(snapshot)
}
BUG 1.3 — P0: apiRestoreAll has double-restore race + unlocked writes
File: controller/internal/web/handler_restore.go lines 46-63
Impact: Two concurrent POST requests both pass the Status == "restoring" check and spawn two parallel goroutines restoring the same apps, causing data corruption.
Lines 51 and 56 read/write s.restorePlan.Status without any lock.
Fix: Hold restoreMu.RLock for the nil check, then use the plan's internal mu.Lock for the status check-and-set:
func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
s.restoreMu.RLock()
plan := s.restorePlan
s.restoreMu.RUnlock()
if plan == nil {
jsonError(w, "not in restore mode", http.StatusBadRequest)
return
}
// Atomic check-and-set under plan's internal lock
plan.mu.Lock()
if plan.Status == "restoring" {
plan.mu.Unlock()
jsonError(w, "restore already in progress", http.StatusConflict)
return
}
plan.Status = "restoring"
plan.mu.Unlock()
go s.executeAllRestores()
jsonResponse(w, map[string]interface{}{
"ok": true,
"message": "Visszaállítás elindítva",
})
}
Important: This requires RestorePlan.mu to be exported OR adding a helper method. The simplest approach: add a SetStatus(status string) bool method to RestorePlan that returns false if already "restoring":
// Add to restore_scan.go:
// TryStartRestore atomically sets status to "restoring" if currently "pending".
// Returns false if already restoring.
func (rp *RestorePlan) TryStartRestore() bool {
rp.mu.Lock()
defer rp.mu.Unlock()
if rp.Status == "restoring" {
return false
}
rp.Status = "restoring"
return true
}
// SetStatus sets the overall plan status.
func (rp *RestorePlan) SetStatus(status string) {
rp.mu.Lock()
defer rp.mu.Unlock()
rp.Status = status
}
// GetStatus returns the current plan status.
func (rp *RestorePlan) GetStatus() string {
rp.mu.RLock()
defer rp.mu.RUnlock()
return rp.Status
}
Then apiRestoreAll becomes:
func (s *Server) apiRestoreAll(w http.ResponseWriter, r *http.Request) {
s.restoreMu.RLock()
plan := s.restorePlan
s.restoreMu.RUnlock()
if plan == nil {
jsonError(w, "not in restore mode", http.StatusBadRequest)
return
}
if !plan.TryStartRestore() {
jsonError(w, "restore already in progress", http.StatusConflict)
return
}
go s.executeAllRestores()
jsonResponse(w, map[string]interface{}{
"ok": true,
"message": "Visszaállítás elindítva",
})
}
BUG 1.4 — P0: apiRestoreSkip reads restorePlan without lock
File: controller/internal/web/handler_restore.go lines 66-70
Impact: Same nil pointer race as other handlers.
Fix:
func (s *Server) apiRestoreSkip(w http.ResponseWriter, r *http.Request) {
s.restoreMu.RLock()
plan := s.restorePlan
s.restoreMu.RUnlock()
if plan == nil {
jsonError(w, "not in restore mode", http.StatusBadRequest)
return
}
s.logger.Println("[INFO] User skipped DR restore — entering normal mode")
s.clearRestoreMode()
jsonResponse(w, map[string]interface{}{
"ok": true,
"message": "Visszaállítás kihagyva",
})
}
BUG 1.5 — P1: executeAllRestores writes Status/Apps without lock
File: controller/internal/web/handler_restore.go lines 82-125
Impact:
- Line 85:
s.restorePlan.Appsaccessed without lock (could be nil if cleared) - Line 86:
&s.restorePlan.Apps[i]— pointer into slice that could be invalidated - Line 107:
s.restorePlan.Status = "done"— unlocked write
Fix: Use the helper methods from BUG 1.3. Also guard against restorePlan being nil:
func (s *Server) executeAllRestores() {
s.logger.Println("[INFO] Starting DR restore for all apps")
s.restoreMu.RLock()
plan := s.restorePlan
s.restoreMu.RUnlock()
if plan == nil {
s.logger.Println("[WARN] Restore plan cleared before execution could start")
return
}
for i := range plan.Apps {
app := &plan.Apps[i]
if app.Status != "pending" {
continue
}
plan.UpdateApp(app.Name, "restoring", "")
s.logger.Printf("[INFO] Restoring app %s (%s)", app.Name, app.DisplayName)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
err := backup.RestoreAppFromBackup(ctx, app, s.cfg.Paths.StacksDir, s.logger)
cancel()
if err != nil {
plan.UpdateApp(app.Name, "failed", err.Error())
s.logger.Printf("[ERROR] Restore failed for %s: %v", app.Name, err)
} else {
plan.UpdateApp(app.Name, "done", "")
s.logger.Printf("[INFO] Restore completed for %s", app.Name)
}
}
plan.SetStatus("done") // <-- use new helper instead of direct write
s.logger.Println("[INFO] All app restores completed")
// Re-scan stacks so dashboard picks up restored apps
if s.stackMgr != nil {
if err := s.stackMgr.ScanStacks(); err != nil {
s.logger.Printf("[WARN] Post-restore stack scan failed: %v", err)
}
}
}
Also delete the no-op goroutine at lines 117-124 (see BUG 1.6).
BUG 1.6 — P2: No-op auto-clear goroutine (dead code)
File: controller/internal/web/handler_restore.go lines 117-124
Impact: Spawns a goroutine that sleeps 5s and does nothing. Confusing code.
Fix: Delete lines 117-124 entirely. The user clicks "continue to dashboard" to clear restore mode — this is already implemented and correct.
AREA 2: Controller — restore_scan.go
BUG 2.1 — P0: dirIsEmpty() returns true on read errors
File: controller/internal/backup/restore_scan.go lines 234-237
Impact: If os.ReadDir() fails (permission denied, I/O error), dirIsEmpty() returns true because err != nil satisfies the condition. This causes the scan to skip valid DB dump directories that are merely unreadable, silently losing backup data.
Current code:
func dirIsEmpty(path string) bool {
entries, err := os.ReadDir(path)
return err != nil || len(entries) == 0
}
Fix: Only return true for genuinely empty directories, return false on errors (assume not empty):
func dirIsEmpty(path string) bool {
entries, err := os.ReadDir(path)
if err != nil {
return false // assume non-empty on error — safer for backup detection
}
return len(entries) == 0
}
BUG 2.2 — P1: Snapshot() returns direct slice references (shallow copy)
File: controller/internal/backup/restore_scan.go lines 71-80
Impact: The returned map holds direct references to rp.Apps and rp.Drives. While the RLock is held during json.Encode, if the encoding is ever separated from the lock (e.g., by future code changes), modifications to Apps/Drives would corrupt the snapshot. Inconsistent with GetApps() which properly copies.
Fix: Make defensive copies like GetApps() does:
func (rp *RestorePlan) Snapshot() map[string]interface{} {
rp.mu.RLock()
defer rp.mu.RUnlock()
apps := make([]RestorableApp, len(rp.Apps))
copy(apps, rp.Apps)
drives := make([]DriveInfo, len(rp.Drives))
copy(drives, rp.Drives)
return map[string]interface{}{
"ok": true,
"status": rp.Status,
"apps": apps,
"drives": drives,
}
}
AREA 3: Controller — infra_backup.go (Silent Failures)
BUG 3.1 — P0: BuildInfraBackup silently drops file read errors
File: controller/internal/report/infra_backup.go lines 55-67
Impact: If controller.yaml, settings.json, or the restic password file can't be read (permissions, moved, disk error), the if err == nil pattern silently stores an empty string. The Hub receives an infra backup with missing critical data, and disaster recovery will fail silently (no config, no settings, no password → broken restore).
Current code:
if data, err := os.ReadFile(controllerYAMLPath); err == nil {
ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)
}
// same for settings.json and restic password
Fix: Log warnings for missing files so the operator knows the backup is incomplete. Return error for controller.yaml (truly critical):
// Read and encode controller.yaml (critical — fail if unreadable)
data, err := os.ReadFile(controllerYAMLPath)
if err != nil {
return nil, fmt.Errorf("reading controller config %s: %w", controllerYAMLPath, err)
}
ib.ControllerConfigB64 = base64.StdEncoding.EncodeToString(data)
// Read and encode settings.json (important but non-fatal)
if data, err := os.ReadFile(settingsPath); err == nil {
ib.SettingsJSONB64 = base64.StdEncoding.EncodeToString(data)
} else if !os.IsNotExist(err) {
// Log warning: file exists but unreadable
// (Note: BuildInfraBackup doesn't have a logger param — either add one,
// or add a Warnings []string field to InfraBackup, or wrap the error)
}
// Read primary restic password (important but non-fatal)
if data, err := os.ReadFile(resticPasswordFile); err == nil {
ib.ResticPassword = base64.StdEncoding.EncodeToString(data)
} else if !os.IsNotExist(err) {
// Same: log or accumulate warning
}
Simplest approach: Add a logger *log.Logger parameter to BuildInfraBackup (already passed in the caller pushInfraBackup in main.go, so just thread it through). Then log warnings:
func BuildInfraBackup(
customerID, domain, version string,
controllerYAMLPath string,
settingsPath string,
resticPasswordFile string,
systemDataPath string,
sett *settings.Settings,
stackProvider backup.StackDataProvider,
logger *log.Logger, // <-- ADD THIS
) (*InfraBackup, error) {
In pushInfraBackup in main.go, update the call to pass logger.
BUG 3.2 — P2: CrossDrivePassword encoding inconsistency
File: controller/internal/report/infra_backup.go lines 70-72
Impact: ResticPassword is base64-encoded (line 66) but CrossDrivePassword is stored raw (line 71). While the restore code in main.go handles both correctly (decodes ResticPassword from base64, uses CrossDrivePassword as-is), this inconsistency could cause bugs if someone adds base64 decoding to CrossDrivePassword in the future.
Fix (optional — low priority): Add a comment explaining the asymmetry:
// Cross-drive password is stored as plain text (not base64) because it's
// already a string in settings, unlike ResticPassword which comes from a file.
if pw := sett.GetCrossDriveResticPassword(); pw != "" {
ib.CrossDrivePassword = pw
}
AREA 4: Controller — main.go DR Wiring
BUG 4.1 — P1: Cross-drive password may be lost on settings reload
File: controller/cmd/controller/main.go lines 80-89
Impact: restorePasswordsFromHub() sets the cross-drive password on sett (line 80), which also saves it to settings.json. Then restoreSettingsFromHub() (line 83) overwrites settings.json with the Hub backup version. Then sett is replaced with a fresh load (line 87). The password is preserved IF the Hub backup's settings.json already contains it. But if the explicit ib.CrossDrivePassword field was updated after the settings.json was last saved, the password is lost.
Fix: Move password restoration AFTER settings reload:
// Restore settings.json from Hub backup (do this FIRST)
restoreSettingsFromHub(ib, cfg, logger)
// Re-load settings (now from restored file)
if restoredSett, loadErr := settings.Load(settingsPath, logger); loadErr == nil {
sett = restoredSett
logger.Println("[INFO] Settings reloaded after Hub restore")
}
// Restore restic passwords (AFTER settings reload so cross-drive password persists)
restorePasswordsFromHub(ib, cfg, sett, logger)
BUG 4.2 — P1: No nil check after ScanDrivesForBackups
File: controller/cmd/controller/main.go lines 124-127
Impact: If ScanDrivesForBackups() were to return nil (currently it doesn't, but it could in a future change), lines 125-127 would panic with nil pointer dereference.
Fix: Add nil guard:
restorePlan = backup.ScanDrivesForBackups(drivePaths, infraStacks, logger)
if restorePlan != nil {
restorePlan.CustomerID = ib.CustomerID
restorePlan.Domain = ib.Domain
restorePlan.Timestamp = ib.Timestamp
logger.Printf("[INFO] DR restore plan ready: %d apps to restore", len(restorePlan.Apps))
} else {
logger.Println("[WARN] ScanDrivesForBackups returned nil — no restore plan created")
}
BUG 4.3 — P2: os.MkdirAll error ignored for restic password dir
File: controller/cmd/controller/main.go line 624
Impact: If directory creation fails, the subsequent WriteFile fails with a confusing "no such file or directory" error.
Fix:
dir := filepath.Dir(cfg.Backup.ResticPasswordFile)
if err := os.MkdirAll(dir, 0700); err != nil {
logger.Printf("[WARN] Failed to create restic password directory %s: %v", dir, err)
} else if err := os.WriteFile(cfg.Backup.ResticPasswordFile, decoded, 0600); err == nil {
logger.Println("[INFO] Primary restic password restored from Hub")
} else {
logger.Printf("[WARN] Failed to write restic password file: %v", err)
}
BUG 4.4 — P3: restoreSettingsFromHub doesn't ensure data directory exists
File: controller/cmd/controller/main.go lines 652-653
Impact: On a truly fresh deployment, if cfg.Paths.DataDir doesn't exist yet, WriteFile will fail.
Fix: Add os.MkdirAll before write:
func restoreSettingsFromHub(ib *report.InfraBackup, cfg *config.Config, logger *log.Logger) {
if ib.SettingsJSONB64 == "" {
return
}
decoded, err := base64.StdEncoding.DecodeString(ib.SettingsJSONB64)
if err != nil {
logger.Printf("[WARN] Failed to decode settings from Hub: %v", err)
return
}
if err := os.MkdirAll(cfg.Paths.DataDir, 0755); err != nil {
logger.Printf("[WARN] Failed to create data directory: %v", err)
return
}
settingsPath := filepath.Join(cfg.Paths.DataDir, "settings.json")
// ... rest unchanged
AREA 5: Hub — store.go and handler.go
BUG 5.1 — P2: Unchecked json.Unmarshal errors throughout store.go
File: hub/internal/store/store.go lines 126, 217, 286, 336, 502
Impact: If stored JSON is corrupt, json.Unmarshal errors are silently ignored. Functions proceed with zero-valued structs, returning incorrect data.
Affected locations:
- Line 126:
json.Unmarshal([]byte(eventsJSON), &events)in GetNotificationPrefs - Line 217:
json.Unmarshal(reportJSON, &parsed)in SaveReport - Line 286:
json.Unmarshal([]byte(c.ReportJSON), &report)in GetCustomers - Line 336:
json.Unmarshal([]byte(c.ReportJSON), &report)in GetCustomer - Line 502:
json.Unmarshal([]byte(reportJSON), &report)in parseDiskSummary
Fix for each: Add error logging (not fatal — corrupted data shouldn't crash the hub):
Line 126:
if err := json.Unmarshal([]byte(eventsJSON), &events); err != nil {
s.logger.Printf("[WARN] Corrupt enabled_events JSON for %s: %v", customerID, err)
}
Line 217:
if err := json.Unmarshal(reportJSON, &parsed); err != nil {
s.logger.Printf("[WARN] Cannot parse report fields for denormalization: %v", err)
}
Lines 286, 336, 502: Same pattern — log warning, continue with defaults.
Note for Sonnet: parseDiskSummary (line 502) doesn't have access to a logger. Either add a logger parameter or just add // ignore parse errors — show "–" comment. The function already returns "–" on failure.
BUG 5.2 — P2: GetInfraBackupMeta swallows unmarshal error
File: hub/internal/store/store.go line 450
Impact: If the stored infra backup JSON is corrupt, the function returns a meta struct with StackCount=0, DiskCount=0 — misleading the dashboard into showing "no stacks, no disks" instead of an error.
Current code:
if json.Unmarshal([]byte(backupJSON), &parsed) == nil {
Fix: Log on unmarshal failure:
if err := json.Unmarshal([]byte(backupJSON), &parsed); err != nil {
s.logger.Printf("[WARN] Failed to parse infra backup metadata for %s: %v", customerID, err)
} else {
meta.StackCount = len(parsed.DeployedStacks)
meta.DiskCount = len(parsed.DiskLayout.Mounts)
}
BUG 5.3 — P3: No customer_id content validation on ingest endpoints
File: hub/internal/api/handler.go lines 88, 226, 313, 348
Impact: Customer IDs with special characters, extremely long strings, or SQL injection attempts are accepted and stored. While Go's SQLite driver uses parameterized queries (safe from SQL injection), excessively long IDs waste storage and could be used in denial-of-service.
Fix (optional): Add a simple validation helper and use it in all handlers:
func isValidCustomerID(id string) bool {
if len(id) > 128 || len(id) == 0 {
return false
}
for _, c := range id {
if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.') {
return false
}
}
return true
}
AREA 6: docker-setup.sh
BUG 6.1 — P0: YAML syntax error — networks: on same line as last volume
File: scripts/docker-setup.sh line ~1297 (in install_filebrowser)
Impact: When volume_lines is populated (drives discovered), the networks: key ends up on the same line as the last volume entry, producing invalid YAML. Docker Compose fails → FileBrowser doesn't deploy.
The heredoc line likely looks like:
${volume_lines} networks:
When volume_lines has content ending with a newline, the output is correct. However, if volume_lines is empty (no drives found), the output becomes:
volumes:
- filebrowser_data:/home/filebrowser/data
networks:
which IS correct. And when volume_lines has entries, each ends with $'\n', so it also works.
Verification needed: Read the actual heredoc to confirm whether this is a real bug or false alarm. The agent may have been wrong about the indentation.
Action: Sonnet should read the install_filebrowser function and verify the heredoc YAML output is correct by checking that volume_lines entries end with proper newlines and the networks: key starts on a new line. If it doesn't, add $'\n' before networks:.
BUG 6.2 — P0: Wizard prompts user even in --dry-run mode
File: scripts/docker-setup.sh — run_config_wizard() function (around line 1456)
Impact: The DRY_RUN check is at the END of the wizard, after all interactive read prompts. User runs --dry-run expecting a non-interactive preview but gets prompted for customer ID, domain, password, etc.
Also, after the wizard returns in dry-run, WIZ_* variables are empty, so downstream functions (install_cloudflare_tunnel, install_filebrowser, install_controller) see empty values and skip their dry-run output.
Fix: Move the if [[ "$DRY_RUN" == true ]] check to the TOP of run_config_wizard(), right after the step log line and the banner. Set dummy wizard values for downstream functions:
run_config_wizard() {
local step_num=5
[[ "$SELF_SIGNED_CERT" == true ]] && ((step_num++))
log_step "${step_num}/$(get_total_steps) - Running configuration wizard..."
echo ""
echo -e "${BOLD}${CYAN}===========================================================${NC}"
echo -e "${BOLD}${CYAN} Felhom Controller Configuration Wizard${NC}"
echo -e "${BOLD}${CYAN}===========================================================${NC}"
echo ""
if [[ "$DRY_RUN" == true ]]; then
echo -e "${CYAN}[DRY-RUN]${NC} Would run interactive wizard and generate controller.yaml"
# Set dummy values so downstream functions produce correct dry-run output
WIZ_CUSTOMER_ID="${CUSTOMER_ID:-demo-felhom}"
WIZ_CUSTOMER_NAME="${WIZ_CUSTOMER_ID}"
WIZ_DOMAIN="${BASE_DOMAIN:-homeserver.local}"
WIZ_EMAIL="${ACME_EMAIL:-admin@example.com}"
WIZ_CF_TUNNEL_TOKEN="" # Empty = no tunnel in dry-run
WIZ_CF_API_TOKEN="${CF_DNS_API_TOKEN:-}"
WIZ_SYSTEM_DATA_PATH="/mnt/sys_drive"
WIZ_PASSWORD_HASH='<would-be-generated>'
WIZ_SESSION_SECRET='<would-be-generated>'
return
fi
# ... rest of interactive prompts unchanged
Also: Remove the old DRY_RUN check at the end of the function (around line 1553-1557).
BUG 6.3 — P1: Unquoted Cloudflare Tunnel token in docker-compose env
File: scripts/docker-setup.sh line ~1055 (in install_cloudflare_tunnel)
Impact: Tokens containing $, backticks, or other shell/Docker special characters will be misinterpreted.
Fix: Quote the value:
# Change:
- TUNNEL_TOKEN=${WIZ_CF_TUNNEL_TOKEN}
# To:
- TUNNEL_TOKEN="${WIZ_CF_TUNNEL_TOKEN}"
BUG 6.4 — P1: htpasswd hash extraction with tr -d ':\n' may include junk
File: scripts/docker-setup.sh line ~1514
Impact: htpasswd -bnBC 10 "" "$password" outputs :<hash>\n. The tr -d ':\n' removes ALL colons and newlines, which works. But if the htpasswd command fails silently, the hash is empty and no error is raised.
Fix: Use cut -d: -f2 instead of tr -d for cleaner extraction, and validate the result:
if command -v htpasswd &>/dev/null; then
WIZ_PASSWORD_HASH=$(htpasswd -bnBC 10 "" "$wiz_password" 2>/dev/null | cut -d: -f2)
if [[ ! "$WIZ_PASSWORD_HASH" =~ ^\$2[aby]\$ ]]; then
log_warn "htpasswd failed — trying Python fallback"
WIZ_PASSWORD_HASH=""
fi
fi
if [[ -z "$WIZ_PASSWORD_HASH" ]] && command -v python3 &>/dev/null; then
WIZ_PASSWORD_HASH=$(python3 -c "import bcrypt; print(bcrypt.hashpw(b'${wiz_password}', bcrypt.gensalt(10)).decode())" 2>/dev/null || echo "")
fi
if [[ -z "$WIZ_PASSWORD_HASH" ]]; then
log_warn "Could not hash password — dashboard will prompt on first visit"
fi
BUG 6.5 — P1: grep without -F for literal path matching
File: scripts/docker-setup.sh line ~1255 (in install_filebrowser)
Impact: Paths like /mnt/data[1] or /mnt/test.dir are treated as regex patterns in grep, causing false matches or missed matches.
Fix:
# Change:
if ! echo "$volume_lines" | grep -q "${WIZ_SYSTEM_DATA_PATH}"; then
# To:
if ! echo "$volume_lines" | grep -qF "${WIZ_SYSTEM_DATA_PATH}"; then
BUG 6.6 — P2: Spaces in mount point names not quoted in YAML
File: scripts/docker-setup.sh lines ~1238-1248 (in install_filebrowser)
Impact: Mount points with spaces (e.g., /mnt/my drive/) produce invalid YAML volume mappings.
Fix: Quote the path in volume_lines:
# Change:
volume_lines+=" - ${mp%/}:/srv/${name}"$'\n'
# To:
volume_lines+=" - \"${mp%/}:/srv/${name}\""$'\n'
BUG 6.7 — P2: No post-wizard validation of required fields
File: scripts/docker-setup.sh (end of run_config_wizard(), before YAML generation)
Impact: User pressing Enter through all prompts without CLI flags generates config with dummy defaults like demo-felhom and homeserver.local. These produce a non-functional deployment with no clear error.
Fix: Add validation before writing controller.yaml:
# After all read prompts, before writing YAML:
if [[ -z "$WIZ_CUSTOMER_ID" ]] || [[ "$WIZ_CUSTOMER_ID" == "demo-felhom" ]]; then
log_error "Customer ID is required (not the default 'demo-felhom')"
exit 1
fi
if [[ -z "$WIZ_DOMAIN" ]] || [[ "$WIZ_DOMAIN" == "homeserver.local" ]]; then
log_error "A real domain is required (not 'homeserver.local')"
exit 1
fi
AREA 7: Template — restore.html (Lower Priority)
BUG 7.1 — P2: Error messages inserted via innerHTML without escaping
File: controller/internal/web/templates/restore.html line ~286
Impact: Error messages from RestoreAppFromBackup are concatenated into HTML via innerHTML. While these errors come from Go code (not user input), it's a bad practice. An error containing < would break the HTML rendering.
Fix: Use textContent for the error portion:
// Instead of:
html += ' <span style="font-size:.8rem;color:var(--danger)">(' + app.error.substring(0, 60) + ')</span>';
// Do:
// After setting cell.innerHTML for the safe part, create the error span with textContent:
if (app.error) {
var errSpan = document.createElement('span');
errSpan.style.cssText = 'font-size:.8rem;color:var(--danger)';
errSpan.textContent = ' (' + app.error.substring(0, 60) + ')';
cell.querySelector('.status-' + app.status).appendChild(errSpan);
}
BUG 7.2 — P3: Polling continues silently on network errors
File: controller/internal/web/templates/restore.html lines ~258-274
Impact: If the server restarts during a restore, the .catch(function() {}) silently swallows all errors and polling continues every 2 seconds forever. User sees a frozen UI with no error indication.
Fix: Add error handling and retry limit:
var pollErrors = 0;
function pollStatus() {
fetch('/api/restore/status')
.then(function(resp) {
if (!resp.ok) throw new Error('HTTP ' + resp.status);
return resp.json();
})
.then(function(data) {
pollErrors = 0; // reset on success
if (!data.ok) return;
updateTable(data.apps || []);
updateProgress(data.apps || []);
if (data.status === 'done') {
clearInterval(polling);
polling = null;
planStatus = 'done';
updateActions();
}
})
.catch(function(err) {
pollErrors++;
console.error('Poll error:', err);
if (pollErrors >= 10) {
clearInterval(polling);
polling = null;
// Show error to user
var actions = document.getElementById('dr-actions');
if (actions) {
actions.innerHTML = '<p style="color:var(--danger)">Kapcsolat megszakadt. <a href="/restore">Oldal frissítése</a></p>';
}
}
});
}
Summary Table
| ID | Priority | Area | File | Line(s) | Description |
|---|---|---|---|---|---|
| 1.1 | P0 | Handler | handler_restore.go | 14-28 | restorePageHandler unlocked reads → nil panic |
| 1.2 | P0 | Handler | handler_restore.go | 35-42 | apiRestoreStatus unlocked reads → nil panic |
| 1.3 | P0 | Handler | handler_restore.go | 46-63 | apiRestoreAll double-restore race + unlocked writes |
| 1.4 | P0 | Handler | handler_restore.go | 66-70 | apiRestoreSkip unlocked read → nil panic |
| 1.5 | P1 | Handler | handler_restore.go | 82-125 | executeAllRestores unlocked reads/writes |
| 1.6 | P2 | Handler | handler_restore.go | 117-124 | No-op goroutine (dead code) |
| 2.1 | P0 | Scan | restore_scan.go | 234-237 | dirIsEmpty returns true on read error |
| 2.2 | P1 | Scan | restore_scan.go | 71-80 | Snapshot() shallow copy of slices |
| 3.1 | P0 | Infra | infra_backup.go | 55-67 | Silent file read failures in BuildInfraBackup |
| 3.2 | P2 | Infra | infra_backup.go | 70-72 | CrossDrivePassword encoding inconsistency |
| 4.1 | P1 | Main | main.go | 80-89 | Cross-drive password lost on settings reload |
| 4.2 | P1 | Main | main.go | 124-127 | No nil check after ScanDrivesForBackups |
| 4.3 | P2 | Main | main.go | 624 | os.MkdirAll error ignored |
| 4.4 | P3 | Main | main.go | 652-653 | Missing directory creation for settings |
| 5.1 | P2 | Hub | store.go | 126+ | Unchecked json.Unmarshal errors |
| 5.2 | P2 | Hub | store.go | 450 | GetInfraBackupMeta swallows error |
| 5.3 | P3 | Hub | handler.go | 88+ | No customer_id validation |
| 6.1 | P0 | Script | docker-setup.sh | ~1297 | YAML networks: indentation (verify) |
| 6.2 | P0 | Script | docker-setup.sh | ~1456 | Wizard prompts in --dry-run mode |
| 6.3 | P1 | Script | docker-setup.sh | ~1055 | Unquoted CF tunnel token |
| 6.4 | P1 | Script | docker-setup.sh | ~1514 | htpasswd hash extraction fragile |
| 6.5 | P1 | Script | docker-setup.sh | ~1255 | grep without -F for paths |
| 6.6 | P2 | Script | docker-setup.sh | ~1238 | Spaces in mount paths unquoted |
| 6.7 | P2 | Script | docker-setup.sh | wizard | No post-wizard field validation |
| 7.1 | P2 | Template | restore.html | ~286 | innerHTML with unescaped error text |
| 7.2 | P3 | Template | restore.html | ~258 | Silent polling failure |
New methods needed in restore_scan.go: TryStartRestore() bool, SetStatus(string), GetStatus() string
Recommended fix order:
- Bugs 1.1-1.4, 2.1 (P0 Go — crash/data-loss prevention)
- Bugs 3.1, 6.2 (P0 — silent data loss, broken dry-run)
- Bug 6.1 (P0 — verify YAML, fix if needed)
- Bugs 1.5, 2.2, 4.1, 4.2 (P1 Go — race conditions, logic)
- Bugs 6.3-6.5 (P1 Script — security, correctness)
- Everything else (P2-P3)