updated TASK.md

2026-02-15 09:39:50 +01:00
parent c4fa121932
commit a5c6899e2c
+458 -112
@@ -1,140 +1,486 @@
# TASK.md — Debug: App Info Page — YAML Parsing Failure
# TASK: Infrastructure FileBrowser + Orphan Stack Handling + Catalog Fixes
> Read CLAUDE.md first for project context, workspace layout, and build instructions.
**Priority order:** Task 1 → Task 2 → Task 3 (Task 3 is independent, can be done anytime)
## Problem
**Repositories involved:**
- `deploy-felhom-compose` — controller Go code, docker-setup.sh, hdd-setup.sh
- `app-catalog-felhom.eu` — template catalog
The `/apps/romm` info page shows NO app info content (no tagline, no use cases, no optional
config form). More importantly, even basic metadata is missing:
---
- `~ RAM` badge shows no value (should be "300M")
- Category badge is empty (should be "media")
- Tagline `<p>` is empty
## Task 1: FileBrowser Quantum as Infrastructure Service
These fields have been in `.felhom.yml` since BEFORE the `app_info` section was added. If they're
empty, it means `LoadMetadata()` is silently failing and returning defaults.
### Context
## Root cause analysis
FileBrowser Quantum (`gtstef/filebrowser:latest`) becomes a **mandatory infrastructure service**
deployed alongside Traefik, Cloudflared, and felhom-controller. It provides the customer with
permanent web-based access to their HDD data — this is critical for the orphan deletion workflow
(Task 2), where users need to retrieve data from deleted apps.
`LoadMetadata()` in `internal/stacks/metadata.go` swallows YAML parse errors:
FileBrowser is **removed from the app catalog** and instead deployed by `docker-setup.sh` during
initial server setup, just like Traefik.
### 1.1 — HDD Folder Structure Update (hdd-setup.sh)
Add new user-facing folders to the HDD folder structure arrays in `hdd-setup.sh`:
**Current structure:**
```
${HDD_PATH}/
├── media/
│   ├── downloads/complete/
│   ├── downloads/incomplete/
│   ├── movies/
│   ├── series/
│   ├── music/
│   └── books/
├── storage/
│   ├── immich/
│   ├── nextcloud/
│   ├── filebrowser/        ← REMOVE (filebrowser is now infra, doesn't need storage/)
│   ├── backups/local/
│   └── backups/appdata/
└── appdata/
```
**Updated structure:**
```
${HDD_PATH}/
├── Dokumentumok/           ← NEW: user documents (OnlyOffice, general files)
├── media/
│   ├── downloads/complete/
│   ├── downloads/incomplete/
│   ├── movies/
│   ├── series/
│   ├── music/
│   └── books/
├── storage/
│   ├── immich/
│   ├── nextcloud/
│   ├── backups/local/
│   └── backups/appdata/
└── appdata/
```
Changes to `hdd-setup.sh`:
- Remove `"storage/filebrowser"` from `STORAGE_DIRS` array
- Add `"Dokumentumok"` as a new top-level entry (either in a new `USER_DIRS` array or in an existing one)
- Ownership: `1000:1000`, same as the other directories
### 1.2 — FileBrowser Docker Compose (Infrastructure)
Create `/opt/docker/stacks/filebrowser/docker-compose.yml` during `docker-setup.sh` execution.
**Mount strategy — three tiers with different permissions:**
| HDD Path | Container Mount | Access | Rationale |
|---|---|---|---|
| `${HDD_PATH}/storage/` | `/srv/storage` | **read-only** | App data, prevent accidental deletion |
| `${HDD_PATH}/media/` | `/srv/media` | **read-write** | User adds movies, music, books |
| `${HDD_PATH}/Dokumentumok/` | `/srv/Dokumentumok` | **read-write** | User documents (docx, xlsx, etc.) |
**Docker Compose template:**
```yaml
# FileBrowser Quantum — Infrastructure file manager
# Domain: files.${DOMAIN}
# Deployed by docker-setup.sh — do NOT remove
#
# Mount permissions:
#   /srv/storage/       → HDD storage/       (READ-ONLY — app data)
#   /srv/media/         → HDD media/         (read-write — user media)
#   /srv/Dokumentumok/  → HDD Dokumentumok/  (read-write — user documents)
services:
  filebrowser:
    image: gtstef/filebrowser:latest
    container_name: filebrowser
    restart: unless-stopped
    environment:
      - TZ=Europe/Budapest
    volumes:
      - filebrowser_data:/home/filebrowser/data
      - ${HDD_PATH}/storage:/srv/storage:ro
      - ${HDD_PATH}/media:/srv/media
      - ${HDD_PATH}/Dokumentumok:/srv/Dokumentumok
    networks:
      - traefik-public
    deploy:
      resources:
        limits:
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:80/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.filebrowser.rule=Host(`files.${DOMAIN}`)"
      - "traefik.http.routers.filebrowser.entrypoints=websecure"
      - "traefik.http.routers.filebrowser.tls=true"
      - "traefik.http.routers.filebrowser.tls.certresolver=letsencrypt"
      - "traefik.http.services.filebrowser.loadbalancer.server.port=80"
      - "traefik.docker.network=traefik-public"
volumes:
  filebrowser_data:
networks:
  traefik-public:
    external: true
```
**Default credentials:** admin / admin (user should change on first login).
**NOTE:** The healthcheck endpoint `/health` should be verified against FileBrowser Quantum docs.
If it doesn't exist, fall back to `wget --spider -q http://localhost:80/`.
### 1.3 — Deploy in docker-setup.sh
Add a new step in `docker-setup.sh` that deploys FileBrowser **after** Traefik is running
and **after** HDD is mounted (HDD_PATH must be known).
**Implementation notes:**
- The step should come after `install_traefik()` and after HDD detection/mount
- Requires `HDD_PATH` to be set — if no HDD is configured, **skip FileBrowser deployment**
and log a warning: "FileBrowser skipped — no HDD path configured. Deploy manually after HDD setup."
- Create the compose file from template (substitute `${DOMAIN}` and `${HDD_PATH}`)
- Create a `.env` file in the filebrowser stack dir with `DOMAIN` and `HDD_PATH`
- `docker compose up -d`
- Verify container is running
**New function:** `install_filebrowser()`
### 1.4 — Add to Protected Stacks
Update `controller.yaml.example` to include filebrowser in the protected list:
```yaml
stacks:
  protected:
    - "traefik"
    - "cloudflared"
    - "felhom-controller"
    - "filebrowser" # ← ADD
```
Also update any hardcoded protected stack references in documentation/README.
### 1.5 — Remove FileBrowser from App Catalog
In `app-catalog-felhom.eu` repository:
- **Delete** `templates/filebrowser/` directory (docker-compose.yml + .felhom.yml)
- **Delete** `existing-appinfo/filebrowser-appinfo.yml` if it exists
- FileBrowser should no longer appear in the "Alkalmazások" catalog on the dashboard
### 1.6 — Dashboard UI for Infrastructure Services
Currently the dashboard shows catalog apps. Infrastructure services (traefik, cloudflared,
controller, filebrowser) are hidden. Consider adding a small "Rendszer" (System) section
at the bottom of the sidebar or dashboard that shows infrastructure service status.
**This is optional / future work** — not blocking. The controller already knows about
protected stacks via `IsProtectedStack()`. The UI just needs to render them differently
if this section is added.
---
## Task 2: Orphan Stack Detection and Deletion
### Context
When an app is removed from the catalog (e.g., Stirling-PDF replaced by BentoPDF), its stack
directory may still exist in `/opt/docker/stacks/` with containers still deployed. These
"orphaned" stacks need to be visible on the dashboard with a clear state and deletable by the user.
Because FileBrowser (Task 1) gives users permanent access to their HDD data, the delete flow
can safely remove Docker volumes while informing users their HDD files are still accessible.
### 2.1 — New Stack State: `orphaned`
Add a new constant in the stacks package:
```go
if err := yaml.Unmarshal(data, &meta); err != nil {
	// Parse error — still return defaults <-- NO LOG OUTPUT!
	dirName := filepath.Base(stackDir)
	meta.DisplayName = toTitleCase(strings.ReplaceAll(dirName, "-", " "))
	meta.Slug = dirName
	return meta
StateOrphaned ContainerState = "orphaned"
```
An orphaned stack is defined as:
- Has a `docker-compose.yml` in `/opt/docker/stacks/<name>/`
- Has `app.yaml` with `deployed: true`
- Does **NOT** have a matching template in the synced catalog
### 2.2 — Orphan Detection in ScanStacks()
After the existing scan loop in `ScanStacks()`, add orphan detection:
```go
// After scanning all stack dirs, check which deployed stacks have no catalog template
catalogTemplates := m.getCatalogTemplateSlugs() // returns set of slugs from synced catalog
// NOTE: this assumes m.stacks holds *Stack values. With a map of structs,
// write the modified copy back (m.stacks[name] = stack) or the flag is lost.
for name, stack := range m.stacks {
	if stack.Protected {
		continue // infrastructure stacks are never orphaned
	}
	if !stack.Deployed {
		continue // not deployed = just an available template, not orphaned
	}
	if !catalogTemplates[name] {
		stack.Orphaned = true
	}
}
```
The YAML parse is failing for some reason, and the error is silently ignored.
## Step 1: Diagnose — check what the controller has in memory
Run this from your workstation to see the raw API data for romm:
```bash
# Get a session first, then check the stack data
ssh kisfenyo@192.168.0.162 "curl -s http://localhost:8080/api/stacks" 2>/dev/null | head -200
```
Wait — API needs auth. Easier approach: add a temporary debug log.
## Step 2: Add error logging to LoadMetadata
In `controller/internal/stacks/metadata.go`, the `LoadMetadata` function currently swallows
parse errors silently. **Add logging** so we can see what's failing:
**Add `Orphaned` field to Stack struct:**
```go
func LoadMetadata(stackDir string) Metadata {
	meta := Metadata{}
	path := filepath.Join(stackDir, ".felhom.yml")
	data, err := os.ReadFile(path)
	if err != nil {
		dirName := filepath.Base(stackDir)
		meta.DisplayName = toTitleCase(strings.ReplaceAll(dirName, "-", " "))
		meta.Slug = dirName
		meta.Category = "tools"
		return meta
	}
	if err := yaml.Unmarshal(data, &meta); err != nil {
		// ADD THIS LOG LINE — this is critical for debugging
		fmt.Fprintf(os.Stderr, "[ERROR] Failed to parse .felhom.yml in %s: %v\n", stackDir, err)
		dirName := filepath.Base(stackDir)
		meta.DisplayName = toTitleCase(strings.ReplaceAll(dirName, "-", " "))
		meta.Slug = dirName
		return meta
	}
	// ADD THIS DEBUG LINE — confirms successful parse
	fmt.Fprintf(os.Stderr, "[DEBUG] Loaded metadata for %s: tagline=%q, useCases=%d, optConfig=%d\n",
		filepath.Base(stackDir), meta.AppInfo.Tagline, len(meta.AppInfo.UseCases), len(meta.OptionalConfig))
	// ... rest of function unchanged ...
type Stack struct {
	Name        string          `json:"name"`
	Meta        Metadata        `json:"meta"`
	ComposePath string          `json:"compose_path"`
	State       ContainerState  `json:"state"`
	Deployed    bool            `json:"deployed"`
	Protected   bool            `json:"protected"`
	Orphaned    bool            `json:"orphaned"` // ← ADD
	Containers  []ContainerInfo `json:"containers"`
	AppConfig   *AppConfig      `json:"app_config,omitempty"`
	LastUpdated time.Time       `json:"last_updated"`
}
```
You'll need to add `"fmt"` to the imports if not already there.
**`getCatalogTemplateSlugs()`** needs to read the synced catalog directory and return
a `map[string]bool` of all template slugs that have a `docker-compose.yml`. The synced
catalog lives at the path configured in `git.local_path` or wherever the catalog sync
stores templates after git pull. Check the existing `catalogsync` package for the exact path.
## Step 3: Build, deploy, check logs
### 2.3 — Dashboard UI for Orphaned Stacks
Follow the CLAUDE.md build workflow. After deploying, immediately check logs:
Orphaned stacks appear in the deployed apps list with distinct visual treatment:
```bash
ssh kisfenyo@192.168.0.162 "docker logs felhom-controller 2>&1 | grep -E 'ERROR.*felhom.yml|DEBUG.*Loaded metadata'"
```
**Visual styling:**
- Left border: amber/yellow (instead of green for running or gray for stopped)
- Badge: `Elavult` (Deprecated) — amber background, dark text
- App name still shown from `.felhom.yml` metadata (if available) or directory name
- Show current state (running/stopped) alongside the orphan badge
**Available actions for orphaned stacks:**
- ✅ Start / Stop (normal controls — user may need to run it briefly)
- ✅ View logs
- ✅ **Törlés** (Delete) button — NEW, only shown for orphaned stacks
- ❌ No "Frissítés" (Update) — no catalog template to update from
- ❌ No "Beállítások" (Settings) — no deploy_fields to configure
### 2.4 — Delete API Endpoint
**Endpoint:** `DELETE /api/stacks/{name}`
**Request body:**
```json
{
  "remove_hdd_data": false
}
```
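Because `DELETE` requests are often sent without a body, the handler probably should treat an empty body as `remove_hdd_data: false` rather than as an error. The following is a sketch of that decoding step only; the type and function names are hypothetical and not taken from the controller code.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// deleteRequest mirrors the request body shown above.
type deleteRequest struct {
	RemoveHDDData bool `json:"remove_hdd_data"`
}

// parseDeleteRequest decodes the raw request body. An empty or
// whitespace-only body yields the safe default (keep HDD data).
// Malformed JSON is reported as an error so the handler can return 400.
func parseDeleteRequest(body []byte) (deleteRequest, error) {
	var req deleteRequest
	if len(bytes.TrimSpace(body)) == 0 {
		return req, nil
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return deleteRequest{}, err
	}
	return req, nil
}
```

Keeping the zero value as the safe choice means a client bug that drops the body can never delete user data by accident.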
This will show either:
- `[ERROR] Failed to parse .felhom.yml in /opt/docker/stacks/romm: <SPECIFIC ERROR>` — tells us exactly what's wrong
- `[DEBUG] Loaded metadata for romm: tagline="Retró...", useCases=5, optConfig=1` — parsing works, problem is elsewhere
**Preconditions (return 409 Conflict if violated):**
- Stack must be **stopped** (State != running). Force the user to stop first.
- Stack must be **orphaned** (for now — catalog apps cannot be deleted, only stopped).
In the future this could be relaxed, but for safety, start with orphan-only deletion.
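The precondition check can be sketched as a pure function that the handler maps to `409 Conflict`. The types below are trimmed stand-ins for the controller's `Stack` and `ContainerState`, and the value of `StateRunning` is an assumption; use the real constant from the stacks package.

```go
package main

import "errors"

// ContainerState and Stack are reduced stand-ins for the controller's
// types; only the fields needed for the check are included.
type ContainerState string

// StateRunning's string value is assumed here, not taken from the source.
const StateRunning ContainerState = "running"

type Stack struct {
	Name      string
	State     ContainerState
	Protected bool
	Orphaned  bool
}

var (
	ErrProtected   = errors.New("protected stack cannot be deleted")
	ErrRunning     = errors.New("stack is running; stop it first")
	ErrNotOrphaned = errors.New("only orphaned stacks can be deleted")
)

// checkDeletable returns nil when the stack may be deleted, or the first
// violated precondition otherwise. The protected check comes first so an
// infrastructure stack is rejected regardless of its other flags.
func checkDeletable(s Stack) error {
	switch {
	case s.Protected:
		return ErrProtected
	case s.State == StateRunning:
		return ErrRunning
	case !s.Orphaned:
		return ErrNotOrphaned
	}
	return nil
}
```

Returning distinct errors lets the API include a specific message in the 409 response, which the UI can show next to the disabled delete button.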
## Step 4: Fix based on diagnosis
**Execution steps:**
### If YAML parse error:
The error message will tell us exactly what's wrong. Common causes:
- **Special Unicode characters** in quoted strings (Hungarian quotes `„"`, em-dash `—`)
- **Encoding issue** (BOM character at start of file)
- **Indentation** mismatch
Fix the `.felhom.yml` content accordingly in the app-catalog repo, commit + push, trigger sync.
### If parsing succeeds but data still missing:
The problem is in how `ScanStacks()` stores/retrieves metadata. Check:
- Does `GetStacks()` return the full `Meta` field?
- Is there any code that creates a new `Metadata{}` after `LoadMetadata()`?
## Step 5: After fixing, clean up debug logging
Once the issue is resolved:
1. Keep the `[ERROR]` log line (it should have been there from the start — silent failures are bad)
2. Remove or gate the `[DEBUG]` line behind `isDebug()` check (or just remove it)
3. Build + deploy the final version
## Build workflow fix for CLAUDE.md
Also update the deploy command in CLAUDE.md. The `sed` approach for updating the image tag is
fragile — it matched the service name line too and broke the YAML last time.
Replace Step 3 in CLAUDE.md with this safer approach:
```bash
# Deploy on demo node — use targeted sed that only matches the 'image:' line
ssh kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION> && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION>|' docker-compose.yml && sudo docker compose up -d"
```

```
1. Verify preconditions (stopped + orphaned)
2. Read docker-compose.yml to identify:
   a. Named Docker volumes (from `volumes:` top-level section)
   b. HDD bind mounts (paths starting with ${HDD_PATH})
3. Run: docker compose down --rmi local --volumes
   - This removes containers, local images, AND named Docker volumes (SSD data)
   - Named volumes (configs, databases, caches) are always removed — they're useless
     without the app and are the #1 cause of "Docker ate my disk space"
4. If remove_hdd_data == true:
   a. For each HDD bind mount found in step 2:
      - Calculate size: du -sh <path>
      - Remove: rm -rf <path>
      - Log: "[INFO] Removed HDD data: <path> (<size>)"
   b. WARNING: Never rm -rf ${HDD_PATH} itself or ${HDD_PATH}/media/ or
      ${HDD_PATH}/Dokumentumok/ — only remove app-specific subdirectories
      like ${HDD_PATH}/storage/paperless/ or ${HDD_PATH}/storage/immich/
5. Remove stack directory: rm -rf /opt/docker/stacks/<name>/
6. Log the complete delete action with timestamp
7. Trigger ScanStacks() to refresh dashboard
8. Return 200 OK with summary
```
Key differences from previous command:
- `sudo` for both sed and docker compose (the directory is root-owned)
- sed pattern matches `image: gitea.dooplex.hu/admin/felhom-controller:.*` — only the image line, not the service name
- Single SSH command to avoid partial failures
**Response body:**
```json
{
  "deleted": "stirling-pdf",
  "volumes_removed": ["stirling_pdf_data"],
  "hdd_paths_removed": [],
  "hdd_paths_preserved": ["/mnt/hdd_1/storage/stirling-pdf (245 MB)"]
}
```
Update the CLAUDE.md file with this corrected deploy command.
**Safety guards:**
- Protected stacks can never be deleted (check `IsProtectedStack()`)
- Running stacks can never be deleted (must stop first)
- Only orphaned stacks can be deleted (for now)
- HDD data deletion is opt-in (default false)
- Never delete top-level HDD directories (media/, storage/, Dokumentumok/)
- Log every delete action with full details
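The top-level directory guard is the one most worth getting right, since a mistake there removes user data. One way to express it, as a sketch only: allow deletion solely for paths strictly below `${HDD_PATH}/storage/`. The function name `isDeletableHDDPath` is hypothetical.

```go
package main

import (
	"path/filepath"
	"strings"
)

// isDeletableHDDPath reports whether target is an app-specific directory
// strictly below hddRoot/storage (for example /mnt/hdd_1/storage/paperless).
// It rejects hddRoot itself, storage/ itself, media/, Dokumentumok/ and
// anything outside hddRoot. An empty or relative hddRoot is always rejected,
// which protects against an unset HDD_PATH.
func isDeletableHDDPath(hddRoot, target string) bool {
	root := filepath.Clean(hddRoot)
	t := filepath.Clean(target) // also collapses "storage/../media"
	if !filepath.IsAbs(root) || !filepath.IsAbs(t) {
		return false
	}
	rel, err := filepath.Rel(filepath.Join(root, "storage"), t)
	if err != nil {
		return false
	}
	if rel == "." || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false
	}
	return true
}
```

This check is purely lexical: it does not resolve symlinks, and it does not know about shared directories such as `storage/backups/`, which may need an explicit deny list if a template ever bind-mounts them.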
## Summary
### 2.5 — HDD Data Discovery for Delete Dialog
The core issue is that `LoadMetadata()` silently swallows YAML parse errors. Even if the fix
turns out to be a simple YAML syntax issue, the error logging should be added permanently —
silent failures make debugging impossible.
The delete confirmation dialog needs to show what HDD data exists and its size.
**New endpoint:** `GET /api/stacks/{name}/hdd-data`
Parses the stack's `docker-compose.yml` to find HDD bind mounts, checks if paths exist
on disk, and returns size info:
```json
{
  "stack": "stirling-pdf",
  "hdd_paths": [
    {
      "path": "/mnt/hdd_1/storage/stirling-pdf",
      "size_bytes": 256901120,
      "size_human": "245 MB",
      "exists": true
    }
  ],
  "has_hdd_data": true
}
```
If no HDD bind mounts exist (SSD-only app like Vaultwarden, Mealie), return:
```json
{
  "stack": "vaultwarden",
  "hdd_paths": [],
  "has_hdd_data": false
}
```
### 2.6 — Delete Confirmation Dialog (UI)
**Full dialog (when HDD data exists):**
```
┌──────────────────────────────────────────────────────┐
│ Stirling-PDF törlése │
│ │
│ ⚠ Ez az alkalmazás már nem érhető el a │
│ katalógusban. │
│ │
│ Az alkalmazás eltávolítása magában foglalja a │
│ konténereket, beállításokat és belső adatbázist. │
│ │
│ ☐ Felhasználói adatok törlése │
│ 📁 /srv/storage/stirling-pdf (245 MB) │
│ Ha nem törli, a Fájlkezelőben továbbra is │
│ elérheti ezeket a fájlokat. │
│ │
│ [Mégse] [Törlés] │
└──────────────────────────────────────────────────────┘
```
Notes:
- Show the FileBrowser-relative path (`/srv/storage/...`) not the system path — this is
what the user sees in FileBrowser
- The "Felhasználói adatok törlése" checkbox is **unchecked by default**
- The info hint reminds users about FileBrowser access (Task 1 must be completed first)
- "Törlés" button should be red/destructive styling
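Translating the system path into the path the user sees in FileBrowser follows directly from the mounts in 1.2 (`storage/`, `media/` and `Dokumentumok/` under `/srv`). A small sketch, with a hypothetical function name:

```go
package main

import (
	"path/filepath"
	"strings"
)

// fileBrowserPath maps a host path below hddRoot to the path shown in
// FileBrowser. The second return value is false for paths that are not
// below one of the three directories mounted into the container.
func fileBrowserPath(hddRoot, hostPath string) (string, bool) {
	rel, err := filepath.Rel(filepath.Clean(hddRoot), filepath.Clean(hostPath))
	if err != nil {
		return "", false
	}
	rel = filepath.ToSlash(rel)
	for _, top := range []string{"storage", "media", "Dokumentumok"} {
		if rel == top || strings.HasPrefix(rel, top+"/") {
			return "/srv/" + rel, true
		}
	}
	return "", false
}
```

When the function returns false (for example for a path under `appdata/`), the dialog could fall back to showing the system path, since FileBrowser cannot reach that location anyway.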
**Simple dialog (no HDD data — SSD-only apps):**
```
┌──────────────────────────────────────────────────────┐
│ Vaultwarden törlése │
│ │
│ ⚠ Ez az alkalmazás már nem érhető el a │
│ katalógusban. │
│ │
│ Az alkalmazás és minden adata véglegesen törlődik. │
│ │
│ [Mégse] [Törlés] │
└──────────────────────────────────────────────────────┘
```
No checkbox needed — there's nothing optional to preserve.
### 2.7 — Router Registration
Add to the API router:
```go
r.HandleFunc("/api/stacks/{name}/hdd-data", r.getStackHDDData).Methods("GET")
r.HandleFunc("/api/stacks/{name}", r.deleteStack).Methods("DELETE")
```
Both require authentication (same as existing stack endpoints).
---
## Task 3: App Catalog Fixes
These are independent template fixes in the `app-catalog-felhom.eu` repository.
### 3.1 — BentoPDF: Change Subdomain to pdf.*
<should be done already, verify>
### 3.2 — Calibre-Web → Calibre-Web-Automated
<should be done already, verify>
### 3.3 — FileBrowser → FileBrowser Quantum (Catalog Removal)
Since FileBrowser is now infrastructure (Task 1), **remove it from the catalog entirely:**
- **Delete** `templates/filebrowser/` directory
- **Delete** `existing-appinfo/filebrowser-appinfo.yml`
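- Any customer stack deployed from the old `filebrowser` catalog entry becomes
orphaned after the next catalog sync (Task 2 handles this)
**Note:** Customers who already run the old catalog-based FileBrowser can delete it via the
orphan workflow; the infrastructure FileBrowser (Task 1) will already be serving `files.${DOMAIN}`.
This assumes the old stack is distinguishable from the new one. If both live in
`/opt/docker/stacks/filebrowser/`, the protected-stack check in 2.2 skips it and it never shows
as orphaned, so `install_filebrowser()` likely needs to detect and replace the existing
directory. Verify this before rollout.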
---
## Implementation Checklist
### deploy-felhom-compose repository
- [ ] **hdd-setup.sh**: Add `Dokumentumok/` to folder structure, remove `storage/filebrowser`
- [ ] **docker-setup.sh**: Add `install_filebrowser()` function
- [ ] **controller.yaml.example**: Add `filebrowser` to `stacks.protected` list
- [ ] **stacks/manager.go** (or equivalent):
- [ ] Add `Orphaned` field to `Stack` struct
- [ ] Add `StateOrphaned` constant
- [ ] Add orphan detection in `ScanStacks()`
- [ ] Add `getCatalogTemplateSlugs()` helper
- [ ] **stacks/delete.go** (new file or add to manager):
- [ ] `DeleteStack()` method with volume + HDD cleanup
- [ ] `GetStackHDDData()` method for size discovery
- [ ] HDD path parsing from docker-compose.yml
- [ ] Safety guards (protected, running, top-level dir protection)
- [ ] **api/router.go**:
- [ ] `DELETE /api/stacks/{name}` endpoint
- [ ] `GET /api/stacks/{name}/hdd-data` endpoint
- [ ] **templates/dashboard.html** (or relevant UI template):
- [ ] Orphan badge styling (amber)
- [ ] Delete button for orphaned stacks
- [ ] Delete confirmation dialog with HDD data info
- [ ] FileBrowser hint in delete dialog
- [ ] **README.md**: Update protected stacks list, document delete flow
### app-catalog-felhom.eu repository
- [ ] Delete `templates/stirling-pdf/` (if exists)
- [ ] Delete `templates/filebrowser/` (moved to infra)
- [ ] Delete `existing-appinfo/filebrowser-appinfo.yml`
- [ ] Update `templates/bentopdf/` — subdomain `bento.*` → `pdf.*`
- [ ] Replace `templates/calibre-web/` with calibre-web-automated version
- [ ] Verify all YAML files parse without errors
- [ ] **README.md**: Update accordingly