# CLAUDE.md — Project Instructions for Claude Code

> This file is read automatically by Claude Code at the start of every session. It replaces the "Instructions" panel from the claude.ai Project. Keep it updated as the project evolves.

!!! IMPORTANT !!!
- Always update CHANGELOG.md whenever you modify the code and push to git!!
- If a controller feature changes (added/modified/removed), always update the relevant part of controller/README.md with the architectural change!!

## Project overview

Felhom is a business that deploys home servers for Hungarian customers. This repository (`felhom-controller`) contains the felhom-controller — a Go application that manages Docker Compose stacks on customer hardware via a Hungarian-language web dashboard.

See `controller/README.md` for full architecture and status (update after each session; keep track of how the different functions/features operate, such as backup, monitoring, storage handling, app management, user settings, update workflow, notification system, etc.).
See `CHANGELOG.md` for recent work (update after each session — see "Working with CHANGELOG.md" below).
See `CONTEXT.md` for current project state, decisions and roadmap (update after each session).
See `TASK.md` for the current task to implement (if it exists).

Claude in Chrome extension is available — can be used to test web UI on demo-felhom.eu or verify dashboard deployments in browser.

## System context — the Proxmox re-platform (READ THIS FIRST)

The project has **re-platformed onto Proxmox**, with a locked **three-component model**:
- **Hub** (`felhom.eu/hub/`) — operator backend on k3s.
- **Host agent** (`felhom-agent/`, formerly `proxmox-controller`) — one per Proxmox host; operator-tier; owns ALL Proxmox interaction.
- **In-guest controller** (THIS repo) — one per customer LXC; **Docker-only; holds NO Proxmox credentials**.

**This repo is being de-privileged.** In the target model, host/disk/Proxmox/Cloudflare responsibilities move OUT of the controller into the **host agent**: System info, Storage (disk scan/format/mount/migrate), the disk-tier Backup (restic, cross-drive, drive-restore, infra-backup), and the Cloudflare-API geo enforcement. The controller keeps the **app domain**: stack/deploy management, the Hungarian web UI, app-data backup (DB dumps + Docker-volume tars), metrics/telemetry, integrations, git-sync, notifications.

> **Authoritative map:** `felhom.eu/documentation/architecture/02-controller-module-map.md` — the per-package **KEEP / PORT / DELETE(→agent) / DELETE(obsolete) / MODIFY** classification. Read it before touching `backup/`, `storage/`, `cloudflare/`, `system/`, or `config/`. Also doc 01 (topology/trust) and doc 03 (the host agent).

**⚠️ Status — do NOT assume the target state is implemented.** The de-privileging has only *started*: the recent `internal/appbackup/` extraction split the keep-side app-data-backup primitives from the delete-side disk/host code (groundwork, no behaviour change). The **bulk strip has NOT happened** — the current code STILL contains the full privileged storage / restic / cross-drive / disk / Cloudflare stack. The strip + the agent-local-API client land at **~slice 8**. So the code you see is the **pre-strip, still-privileged** controller; match the code, not the target, unless a TASK says otherwise.

**Don't confuse the two ex-"controllers":** `felhom-agent` (host, operator-tier, was `proxmox-controller`) vs this `felhom-controller` (in-guest, was `deploy-felhom-compose`).

## Cross-repo & artifacts

- Workspace orientation (the felhom system, shared conventions, access) lives in the workspace-root `e:\git\CLAUDE.md`. Sibling per-repo files: `felhom-agent/CLAUDE.md`, `felhom.eu/CLAUDE.md`.
- **Artifact taxonomy:** `TASK.md` / `TASK-*.md` = a spec for YOU to implement (then push + update CHANGELOG + CONTEXT + README).
- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has the access and capability to perform, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks; mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data, but demo scratch guests are fair game.)

> **In every repository where you make a change, update both files in that repo:**
> - **`CHANGELOG.md`** — a cumulative log of **all** changes; newest entry on top.
> - **`REPORT.md`** — **overwrite** with a summary of the **most recent** implementation (or significant validation/operational run) only; not cumulative.
>
> **Never write secrets** — tokens, passwords, private keys, API keys — into `CHANGELOG.md`, `REPORT.md`, or any committed file. Reference them as "stored out-of-band" instead.

## Code quality rules

- Always double-check generated code for bugs, logic issues, syntax errors
- Handle edge cases without overcomplicating the script/program
- Add debug capabilities (logging, verbose output) for easier troubleshooting
- If you need more input or troubleshooting command output, ask first — don't guess

## Environment

| Machine | OS | IP | Purpose |
|---------|----|----|---------|
| **Local (this machine)** | Windows 11 | — | Development, Claude Code runs here. Repos in `E:\git\` |
| **Build server (k3s, infra)** | Debian 13 | 192.168.0.180 | Build + push container images, k3s cluster |
| **Demo node** | Debian 13 | 192.168.0.162 | Test deployment (demo-felhom.eu) |
| **Demo node 2** | Debian 13 | router.abonet.hu:33022 | Remote test deployment |

## Workspace layout

Claude Code runs on Windows 11. The working directory is `E:\git\` (mapped as `/e/git/` in Git Bash). This repo is at:

```
E:\git\felhom-controller\          (or /e/git/felhom-controller/ in Git Bash)
├── controller/                    # Go application (main codebase)
│   ├── cmd/controller/            # Entry point (main.go)
│   ├── internal/
│   │   ├── config/                # YAML config loading
│   │   ├── settings/              # settings.json persistence (password hash, DB cache)
│   │   ├── stacks/                # Docker Compose operations, deploy flow
│   │   ├── sync/                  # Git sync — periodic pull of app catalog repo
│   │   ├── api/                   # REST API endpoints
│   │   ├── system/                # System info (memory, disk)
│   │   └── web/                   # Dashboard UI
│   │       ├── server.go          # Server struct, routing, static serving
│   │       ├── auth.go            # Session auth, login/logout handlers
│   │       ├── handlers.go        # Page handlers (dashboard, stacks, deploy, etc.)
│   │       ├── funcmap.go         # Template function map
│   │       ├── embed.go           # go:embed directive for templates
│   │       ├── templates.go       # Felhom logo SVG constant
│   │       └── templates/         # go:embed HTML/CSS files (Hungarian UI)
│   ├── Dockerfile
│   ├── Makefile
│   └── go.mod
├── scripts/                       # Setup scripts for customer nodes
├── CLAUDE.md                      # This file
├── CHANGELOG.md                   # Changelog
├── CONTEXT.md                     # Project memory / state / architectural state/decisions/roadmap
└── TASK.md                        # Current task (if exists)
```

Related repos (same parent directory):
```
E:\git\app-catalog-felhom.eu\      # Docker Compose templates + .felhom.yml metadata per app
E:\git\felhom.eu\                  # Website (HTML) + k3s manifests
E:\git\homelab-manifests\          # k3s cluster manifests (dooplex.hu services)
E:\git\misc-scripts\               # Helper scripts
```

All repos hosted at `gitea.dooplex.hu/admin/`. Git credentials are stored (`git config credential.helper store`).

## SSH access

SSH key-based authentication is configured and working. No password prompts.

**IMPORTANT — SSH binary:** Claude Code runs in Git Bash, which has its own SSH at `/usr/bin/ssh` (= `C:\Program Files\Git\usr\bin\ssh.exe`). This binary does NOT have access to the Windows SSH agent and will fail silently (exit 0/141 with no output). Always use the Windows native OpenSSH binary with the full path:

```
SSH=/c/Windows/System32/OpenSSH/ssh.exe
```

All SSH commands in this file use `$SSH` — set it at the start of your session or substitute the full path manually.

| Host | OS | IP | User | Role |
|------|----|----|------|------|
| Build server | Debian 13 | 192.168.0.180 | kisfenyo | Build + push container images |
| Demo Proxmox host | Proxmox VE | 192.168.0.162 | root@pam (SSH alias felhom-pve, root, no sudo) | pveum/pct + live Proxmox validation — available to CC |

## Test environments

| Node | OS | Hardware | Domain | IP | Notes |
|------|-----|----------|--------|----|-------|
| demo-felhom | Debian 13 | Acemagic N100, 16G RAM, 512G SSD + 1TB HDD | demo-felhom.eu | 192.168.0.162 | Primary test node, Cloudflare Tunnel |
| felhotest | Debian 13 | Proxmox VM (4-16G RAM, 8 vCPU, 200G + 100G SCSI) | — | router.abonet.hu:33022 | Remote test node |
| pi-customer-1 | Debian 13 | Raspberry Pi 3B+, 1G RAM, 32G SD | pi-customer-1.local | 192.168.0.161 | Secondary test, not yet active |

- Pi-hole DNS on local network forwards `*.demo-felhom.eu` → 192.168.0.162
- External access via Cloudflare Tunnel → Traefik reverse proxy

> **⚠️ Re-platform note:** per the host-agent work, `192.168.0.162` is now a **Proxmox host** (`demo-felhom`, PVE 9.2.2) — the demo-node tables above predate that. Confirm how/where the controller is currently deployed and tested post-re-platform before relying on the bare-metal `docker compose` deploy steps below; on the re-platformed node the controller may now run inside an LXC guest rather than directly on the host.

## Build & deploy workflow — MANDATORY

After making code changes to the controller, you **MUST** build, push, and deploy the new image. Do NOT leave code changes uncommitted or undeployed. The full cycle is:

### Step 1: Commit and push changes

```bash
cd /e/git/felhom-controller
git add -A && git commit -m "<descriptive message>" && git push
```

### Step 2: Build + push the container image on the build server

The build server (192.168.0.180) has the build toolchain. The version tag should be incremented from the current running version.

**Important:** use the `kisfenyo` user for SSH, as shown below.

First, set the SSH variable (required for every session — Git Bash's built-in ssh does NOT work):
```bash
SSH=/c/Windows/System32/OpenSSH/ssh.exe
```

Check the current running version:
```bash
$SSH kisfenyo@192.168.0.162 "docker ps --filter name=felhom-controller --format '{{.Image}}'"
```

Then build with the next version (e.g., if current is 0.2.10, use 0.2.11). **Important:** the build directory is `~/build/felhom-controller`.
```bash
$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-controller && git -C ~/git/felhom-controller pull && ./build.sh <NEW_VERSION> --push"
```

The build script:
- Pulls latest code from Gitea
- Builds a multi-arch Docker image (amd64 + arm64) with `--multiarch`, or a current-arch image with `--push`
- Pushes to `gitea.dooplex.hu/admin/felhom-controller:<VERSION>`
- Expects the version as first argument (e.g., `0.2.11`)

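The "increment from the current running version" rule can be sketched in Go as follows. This is only an illustration, not part of `build.sh`: the helper name `nextPatchVersion` is hypothetical, and it assumes the image reference printed by `docker ps` ends in a dotted numeric tag such as `0.2.10`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nextPatchVersion (hypothetical helper) takes an image reference such as
// "gitea.dooplex.hu/admin/felhom-controller:0.2.10" and returns the tag
// with its last numeric component incremented ("0.2.11").
func nextPatchVersion(image string) (string, error) {
	idx := strings.LastIndex(image, ":")
	if idx < 0 || idx == len(image)-1 {
		return "", fmt.Errorf("no tag in image reference %q", image)
	}
	tag := image[idx+1:]
	// A "/" after the last ":" means the colon belonged to a registry port.
	if strings.Contains(tag, "/") {
		return "", fmt.Errorf("no tag in image reference %q", image)
	}
	parts := strings.Split(tag, ".")
	last, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return "", fmt.Errorf("non-numeric last component in tag %q: %w", tag, err)
	}
	parts[len(parts)-1] = strconv.Itoa(last + 1)
	return strings.Join(parts, "."), nil
}

func main() {
	next, err := nextPatchVersion("gitea.dooplex.hu/admin/felhom-controller:0.2.10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next) // value to pass as <NEW_VERSION>
}
```

The result is what would be passed to `./build.sh` as `<NEW_VERSION>`; tags that are not purely numeric in their last component (for example `latest`) are rejected rather than guessed.
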
### Step 3: Deploy on demo nodes

```bash
# Demo node 1 (local)
$SSH kisfenyo@192.168.0.162 "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION> && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION>|' docker-compose.yml && sudo docker compose up -d"

# Demo node 2 (remote)
$SSH -p 33022 kisfenyo@router.abonet.hu "cd /opt/docker/felhom-controller && sudo docker pull gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION> && sudo sed -i 's|image: gitea.dooplex.hu/admin/felhom-controller:.*|image: gitea.dooplex.hu/admin/felhom-controller:<NEW_VERSION>|' docker-compose.yml && sudo docker compose up -d"
```

### Step 4: Verify the deployment

```bash
$SSH kisfenyo@192.168.0.162 "docker ps --filter name=felhom-controller --format '{{.Image}} {{.Status}}'"
$SSH -p 33022 kisfenyo@router.abonet.hu "docker ps --filter name=felhom-controller --format '{{.Image}} {{.Status}}'"
```

Both commands should show the new version and an "Up" status. Also check the logs for startup errors:
```bash
$SSH kisfenyo@192.168.0.162 "docker logs felhom-controller --tail 20"
$SSH -p 33022 kisfenyo@router.abonet.hu "docker logs felhom-controller --tail 20"
```

### Build workflow summary

| Step | Command | Where |
|------|---------|-------|
| 0. Set SSH var | `SSH=/c/Windows/System32/OpenSSH/ssh.exe` | Local (once per session) |
| 1. Commit + push | `git add -A && git commit -m "..." && git push` | Local (this repo) |
| 2. Build + push image | `$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-controller... ./build.sh <VER> --push"` | Build server |
| 3. Deploy (node 1) | `$SSH kisfenyo@192.168.0.162 "... docker compose up -d"` | Demo node |
| 3b. Deploy (node 2) | `$SSH -p 33022 kisfenyo@router.abonet.hu "... docker compose up -d"` | Demo node 2 |
| 4. Verify | `$SSH kisfenyo@192.168.0.162 "docker ps ..."` + same for router.abonet.hu | Both nodes |

### Build & deploy workflow — Hub (felhom-hub)

The central hub (`hub.felhom.eu`) is a separate Go app in the `E:\git\felhom.eu\hub\` repo. The controller pushes periodic reports to it (when `hub.enabled: true` in `controller.yaml`).

| Step | Command | Where |
|------|---------|-------|
| 1. Commit + push | `cd /e/git/felhom.eu && git add -A && git commit -m "..." && git push` | Local |
| 2. Build + push image | `$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <VER> --push"` | Build server |
| 3. Deploy to k3s | `$SSH kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:<VER>"` | Build server |
| 4. Verify | `$SSH kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && sudo kubectl logs -n felhom-system -l app=hub --tail 10"` | Build server |

See `E:\git\felhom.eu\CLAUDE.md` for full hub details.

**IMPORTANT:** If you make changes to the app-catalog-felhom.eu repo, commit and push those too:
```bash
cd /e/git/app-catalog-felhom.eu
git add -A && git commit -m "<message>" && git push
```
The controller's git sync will pick up catalog changes within 15 minutes, or you can trigger it manually via the dashboard "Sablonok frissítése" button.

## Tech stack

- **Language:** Go 1.22+
- **Web framework:** stdlib `net/http` + `html/template` (no frameworks)
- **Templates:** go:embed HTML files in `internal/web/templates/` (Hungarian UI)
- **CSS:** go:embed CSS file in `internal/web/templates/style.css`
- **Auth:** bcrypt password hash + session cookies
- **Container orchestration:** Docker Compose via CLI (`docker compose up -d`)
- **Reverse proxy:** Traefik (separate stack, managed by controller)
- **Tunnel:** Cloudflare Tunnel (cloudflared, separate stack)

## Key patterns

- All UI text is in Hungarian (Budapest timezone, Hungarian locale)
- Templates use Go template functions: `stateColor`, `stateLabel`, `stateIcon`, `stateStr`, `isOperational`, `logoURL`, `logoPNGURL`, `appPageURL`
- Container states: `running`, `starting`, `unhealthy`, `stopped`, `exited`, `restarting`, `paused`, `not_deployed`
- Docker `.State` field is combined with `.Status` field to detect health substatus
- Stacks are sorted alphabetically by DisplayName
- Protected stacks (traefik, cloudflared, felhom-controller) can't be stopped from UI
- `app.yaml` persists deploy config; `deployed: true` flag controls UI state
- In-memory `Deployed` flag is set BEFORE `docker compose up -d` (avoids race condition with slow image pulls); reverted on failure
- Password fields require explicit user input or generation (no silent auto-fill)
- App cards on dashboard and stacks pages are clickable via `data-href` attribute (skip protected stacks)
- Logs page uses AJAX polling (`?raw=1` query param returns plain text) with auto-scroll and pause/resume
- Memory bar on deploy page uses two-segment stacked bar (committed = solid green, new = translucent green)
- Deploy flow shows 3-step progress panel (config → containers → health), polls `GET /api/stacks/{name}` every 3s until running/unhealthy/timeout(120s)
- Telepítés buttons have `checkBeforeDeploy()` onclick guard — fetches live state from API before navigating to deploy page
- App info pages at `/apps/{slug}` — detail view with use cases, setup guide, screenshots, optional config
- Optional config saves to `app.yaml` and restarts deployed apps via `docker compose up -d`
- `optional_config` fields in `.felhom.yml` define post-deploy configurable env vars (e.g., API keys)
- `app_info` in `.felhom.yml` provides tagline, use_cases, first_steps, prerequisites, default_creds, docs_url

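As an illustration of the `.State` + `.Status` pattern listed above, here is a minimal Go sketch. The function name `effectiveState` is hypothetical and not taken from the codebase; it assumes the status strings follow the form printed by `docker ps`, for example `Up 2 minutes (unhealthy)` or `Up 3 seconds (health: starting)`.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveState (hypothetical name) refines Docker's coarse State using
// the human-readable Status string. Assumption: health information appears
// in Status as "(unhealthy)" or "(health: starting)", as `docker ps` prints it.
func effectiveState(state, status string) string {
	if state != "running" {
		return state // exited, paused, restarting, ... pass through unchanged
	}
	s := strings.ToLower(status)
	switch {
	case strings.Contains(s, "(unhealthy)"):
		return "unhealthy"
	case strings.Contains(s, "(health: starting)"):
		return "starting"
	default:
		return "running"
	}
}

func main() {
	fmt.Println(effectiveState("running", "Up 2 minutes (unhealthy)"))
	fmt.Println(effectiveState("running", "Up 3 seconds (health: starting)"))
	fmt.Println(effectiveState("running", "Up 5 minutes (healthy)"))
	fmt.Println(effectiveState("exited", "Exited (1) 10 seconds ago"))
}
```

Containers without a healthcheck carry no parenthesised suffix in their status, so they fall through to plain `running` in this sketch.
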
## Git sync module (internal/sync)

- Uses `os/exec` to call `git` CLI — no Go git library dependency
- On startup: clones repo to `{data_dir}/catalog-cache/` (shallow clone, `--depth 1`)
- Periodically: `git fetch --depth 1` + `git reset --hard origin/{branch}`
- Copies only `docker-compose.yml` and `.felhom.yml` to stacks dir
- **Never overwrites** `app.yaml` — this contains deployed secrets
- Content-hash comparison (SHA-256) — only writes if file actually changed
- After sync, triggers `ScanStacks()` rescan for dashboard update
- `POST /api/sync` triggers immediate sync (30s debounce)
- "Sablonok frissítése" button on Alkalmazások page
- Sync status exposed in `/api/system/info` response

## Debug logging

The controller has two-tier logging controlled by `logging.level` in `controller.yaml` (or `FELHOM_LOGGING_LEVEL` env var):

- **`info`** (default): Operation success/failure with elapsed time, post-start container states, scan counts
- **`debug`**: All of above plus env var keys per compose command, local image availability checks, compose command completion times, log fetch byte counts

Key patterns used in `internal/stacks/`:
- `time.Since(start)` for operation timing — always logged at INFO level
- `m.isDebug()` gates verbose output (env var keys, image checks)
- `truncateStr(s, 500)` caps stdout/stderr in error logs
- `logPostStartStatus()` runs async (goroutine + 3s sleep) after start/restart/update/deploy — never blocks or fails the operation
- `checkLocalImages()` parses compose YAML for `image:` lines, runs `docker image inspect` per image
- Env var **keys** are logged, never values (secrets safety)

## Important lessons learned

1. `PAPERLESS_OCR_LANGUAGES` (plural, with S) **installs** tesseract packs; `PAPERLESS_OCR_LANGUAGE` (singular) **selects** which to use
2. `docker compose restart` does NOT pick up new images — always use `docker compose up -d`
3. Go map iteration order is random — always sort before displaying in UI
4. Docker's `.State` field says "running" even for unhealthy containers — must parse `.Status` for health info
5. In-memory `Deployed` flag must be set BEFORE `docker compose up -d` (not after) — compose can take 30-60s for image pulls; revert both in-memory and disk on failure
6. `docker compose up -d` returns exit 0 even when containers crash-loop — post-start status check is essential for detecting failures
7. Mealie image has no wget/curl — use Python TCP socket check for healthcheck; set `start_period: 60s` for DB migration time
8. Always verify container images have the healthcheck tool (`wget`, `curl`, etc.) before using it — Alpine has BusyBox wget, Python images have `python3`

## Working with CHANGELOG.md

**DO NOT read the full file** — it is large (29K+ tokens) and will waste context or fail.

- **At session start:** Do NOT read CHANGELOG.md. Use `CONTEXT.md` and `controller/README.md` for current state.
- **To add a new entry:** Read only the top ~30 lines (`limit: 30`) to see the format and insertion point, then use Edit to insert the new entry after line 1 (`## Changelog`).
- **To check history:** Use Grep to search for specific topics instead of reading the file.

## End-of-session checklist

Before ending a session, always:

1. **Commit and push** all code changes
2. **Build, push, and deploy** the new controller image (if controller code changed)
3. **Update CHANGELOG.md** with what was done
4. **Update CONTEXT.md** with decisions made, update architectural state and what's next
5. **Update controller/README.md** if architecture or features changed
6. **Verify** the deployment is working (check `docker ps` and logs)