# CLAUDE.md — Project Instructions for Claude Code (`felhom.eu`)

> Read automatically by Claude Code when it works in this repo. Keep it updated as the project evolves. Cross-repo orientation (the felhom system, artifact taxonomy, access) lives in the workspace-root `e:\git\CLAUDE.md`; this file is `felhom.eu`-specific.

## Project overview

This repo (`felhom.eu`) contains:

- **Website** (`website/`) — static HTML at felhom.eu, served via k3s nginx + git-sync sidecar.
- **Hub** (`hub/`) — Go application (felhom-hub) — the **operator backend**, on k3s at `hub.felhom.eu`.
- **K8s manifests** (`manifests/`) — k3s deployment manifests for felhom-system services.
- **Architecture docs** (`documentation/`) — the **authoritative design home for the whole Felhom system**: `architecture/01..05-*.md` (topology/trust, controller module map, host-agent, signing, hub), `proxmox-platform.md`, and `tests/phase{0,1-2,3,4}-findings.md`. Read these before designing.

See `README.md` for full architecture/DNS/email/SEO docs. See `TASK.md` for the current task (if any).

## The Felhom system (so the hub's role is in context)

Felhom is **Proxmox-based**, with a locked **three-component model**:

- **Hub** (this repo, `hub/`) — operator backend. Authors operator *intent*; mirrors box *reality*; holds **no data-plane role** and never connects inbound to a box.
- **Host agent** (repo `felhom-agent/`) — one per Proxmox host; owns all Proxmox interaction.
- **In-guest controller** (repo `felhom-controller/`) — one per customer LXC; Docker-only.

The hub is **not** just controller monitoring anymore. As of slice 3 it ingests **two report streams**: the agent's host-domain report (`POST /api/v1/host-report`, the heartbeat) and the legacy controller report (`POST /api/v1/report`). The controller path is **frozen and retires at the slice-10 cutover** — do not modify it until then.

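To make the two ingest paths concrete, here is a minimal sketch of how both report endpoints could be registered on stdlib `net/http` behind a Bearer check. It is not the hub's code: `newMux`, `ingest`, `bearerToken`, `post`, the `202 Accepted` response, and the key-validation callback are all hypothetical names and choices made for illustration. Only the two paths and the use of Bearer auth come from this file.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerToken extracts the token from an "Authorization: Bearer <token>" header.
func bearerToken(r *http.Request) (string, bool) {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, prefix) || len(h) == len(prefix) {
		return "", false
	}
	return h[len(prefix):], true
}

// ingest returns a handler that accepts authenticated POSTs only.
// The 202 response is an assumption of this sketch.
func ingest(stream string, validKey func(string) bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		tok, ok := bearerToken(r)
		if !ok || !validKey(tok) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		// A real handler would decode and persist the report body here.
		w.WriteHeader(http.StatusAccepted)
		fmt.Fprintf(w, "%s accepted\n", stream)
	}
}

// newMux wires both report streams onto one mux.
func newMux(validKey func(string) bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/host-report", ingest("host-report", validKey)) // agent heartbeat
	mux.HandleFunc("/api/v1/report", ingest("report", validKey))           // legacy controller path
	return mux
}

// post sends an empty JSON body in-process and returns the status code.
func post(h http.Handler, path, auth string) int {
	req := httptest.NewRequest(http.MethodPost, path, strings.NewReader("{}"))
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux(func(k string) bool { return k == "demo-key" }) // placeholder key
	fmt.Println(post(mux, "/api/v1/host-report", "Bearer demo-key")) // 202
	fmt.Println(post(mux, "/api/v1/report", "Bearer wrong"))         // 401
}
```

The in-process `httptest` recorder keeps the sketch runnable without a listening server; the real hub's key lookup (global, per-customer, per-host) is not modelled here.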
## Hub — current state (v0.7.x)

- **Tables:** `customer_configs`, `events`, `app_telemetry`/`app_log_issues`, the legacy `reports`, and the slice-3 host-domain additions `hosts` / `guests` / `host_reports` (additive; columns marked inert exist for the slice-10 cutover but are unused now).
- **Auth:** Bearer — global key, per-customer key (legacy), and per-host key (`GetHostByAPIKey`, slice 3). Provisional global-key host mint at `POST /api/v1/admin/hosts`.
- **Monitoring:** the controller `StalenessChecker` (over `reports`) AND a sibling `HostStalenessChecker` (over `host_reports`, emitting `host_stale`/`host_down`/`host_recovered`).
- Two-tier notifications (operator English / customer Hungarian, Resend, cooldowns); `events` audit.

## Code quality rules

- Always double-check generated code for bugs, logic issues, syntax errors.
- Handle edge cases without overcomplicating.
- Add debug capabilities (logging, verbose output).
- If you need more input or troubleshooting output, **ask first — don't guess**.

## Workflow & artifacts

The planning/architecture assistant ("project Claude", in claude.ai) writes specs and validates pushes; **you (Claude Code) implement**. A file being open in the editor is NOT an instruction.

- **`TASK.md` / `TASK-*.md`** — a spec for you to implement. Then push and update this repo's changelog (`hub/CHANGELOG.md`) and root `REPORT.md` per the convention below.
- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
- Validation of a push against a spec's criteria is project Claude's job, not yours, unless asked.

> **In every repository where you make a change, update both files in that repo:**
> - **`CHANGELOG.md`** — a cumulative log of **all** changes; newest entry on top.
> - **`REPORT.md`** — **overwrite** with a summary of the **most recent** implementation (or significant validation/operational run) only; not cumulative.
>
> **Never write secrets** — tokens, passwords, private keys, API keys — into `CHANGELOG.md`, `REPORT.md`, or any committed file. Reference them as "stored out-of-band" instead.

## Tech stack (Hub)

- **Language:** Go 1.24+ (build server is go1.26.0).
- **Web:** stdlib `net/http` + `html/template`. **DB:** SQLite via `modernc.org/sqlite` (pure Go).
- **Auth:** bcrypt + Bearer tokens. **Deploy:** Docker on k3s (felhom-system ns).
- **Storage:** Longhorn PVC at `/data/` (SQLite DB). **Config:** YAML via ConfigMap at `/etc/felhom-hub/hub.yaml`.

## SSH access

Use the Windows OpenSSH binary (Git Bash's `/usr/bin/ssh` can't reach the Windows agent and fails silently): `SSH=/c/Windows/System32/OpenSSH/ssh.exe`. All SSH commands below use `$SSH`.

| Host | IP | User | Role |
|------|----|------|------|
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl (needs `sudo`) |
| Demo Proxmox host | 192.168.0.162 | root@pam (SSH alias felhom-pve, root, no sudo) | pveum/pct + live Proxmox validation — available to CC |

## Build & deploy — Hub (GitOps via ArgoCD)

The whole k3s cluster is GitOps via a **single ArgoCD app named `felhom`** (`argocd.dooplex.hu`) that syncs this repo's **`manifests/`** to the **`felhom-system`** namespace. **There is no separate `hub` ArgoCD app** — the hub is one `Deployment` (`manifests/hub.yaml`) *inside* the `felhom` app. **Auto-sync is OFF**: deploys are a deliberate manual sync.

ArgoCD's source of truth is the **manifest**, so:

- **A code change + CHANGELOG version bump does NOT deploy anything.** The running image only changes when `manifests/hub.yaml`'s `image:` tag changes in git and the app is synced.
- **Pin explicit versions, never `:latest`.** A `:latest` re-push wouldn't change the manifest, so ArgoCD wouldn't redeploy, and Synced / History / Rollback would all misreport what's actually live.

After a code change to `hub/`, to deploy:

1. **Commit + push the code:**
   `cd /e/git/felhom.eu && git add -A && git commit -m "" && git push`
2. **Build + push the image** (build script lives on the build server, not in this repo):
   `$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh --push"`
   (pulls latest from Gitea, builds version into `main.Version` via ldflags, pushes `gitea.dooplex.hu/admin/felhom-hub:`). Pin ``; don't rely on `:latest`.
3. **Bump the manifest:** set the `image:` tag in `manifests/hub.yaml` to `:`, commit to `main`, push. The `felhom` app now shows **OutOfSync**.
4. **Sync** (auto-sync is off, so this is required). Easiest is the ArgoCD UI → app `felhom` → **Sync**.
   From the shell, the `argocd` CLI on 180 is **not logged in** (no server session) and `--core` looks in the wrong namespace under `sudo` (env is stripped) — so the reliable scripted path is to drive the Application CR with `kubectl`:

   ```bash
   # a) hard-refresh so ArgoCD picks up the new commit, then confirm OutOfSync:
   $SSH kisfenyo@192.168.0.180 "sudo kubectl -n argocd annotate application felhom argocd.argoproj.io/refresh=hard --overwrite; sleep 8; sudo kubectl -n argocd get application felhom -o jsonpath='{.status.sync.status} {.status.sync.revision}{\"\n\"}'"
   # b) trigger the sync via the .operation field (the app controller runs it):
   $SSH kisfenyo@192.168.0.180 "sudo kubectl -n argocd patch application felhom --type merge -p '{\"operation\":{\"initiatedBy\":{\"username\":\"cc\"},\"sync\":{\"syncStrategy\":{\"apply\":{}}}}}'"
   ```

   (If you do log the CLI in: `argocd app sync felhom` is the one-liner equivalent.)
5. **Verify:**
   `$SSH kisfenyo@192.168.0.180 "sudo kubectl -n argocd get application felhom -o jsonpath='sync={.status.sync.status} health={.status.health.status}{\"\n\"}'; sudo kubectl -n felhom-system rollout status deploy/hub --timeout=90s; sudo kubectl -n felhom-system get deploy hub -o jsonpath='{.spec.template.spec.containers[0].image}'; echo; sudo kubectl -n felhom-system logs -l app=hub --tail 10"`
   (expect Synced/Healthy + the new tag + `[INFO] felhom-hub starting`).

> A bare `kubectl set image` would be reverted on the next sync (the manifest is the truth) — always go through `manifests/hub.yaml`. **The live image can lag the CHANGELOG** when version bumps were committed but step 3/4 was never done; reconcile via the manifest, not by assuming the changelog reflects what's running.

## Build & deploy — Website / Manifests

- **Website** auto-deploys via git-sync; just push to `main` (live in 1–2 min). Emergency edits: FileBrowser at `https://files.felhom.eu`.
- **Manifests** (`manifests/`) are GitOps via the `felhom` ArgoCD app — commit to `main`, then sync (auto-sync is off): UI Sync or `argocd app sync felhom`. Do **not** `kubectl apply` them directly (a later sync reverts drift; the manifest in git is the truth).

## Key patterns

- Hub ingests **host-reports from agents** (`POST /api/v1/host-report`, Bearer per-host) and legacy **controller reports** (`POST /api/v1/report`). The host-report `received_at` is the dead-man's-switch liveness signal.
- Status logic: OK (report < 30m), WARN (30m–1h or health=warn), DOWN (> 1h or health=fail).
- SQLite timestamps vary in format — use `parseSQLiteTime()`.
- Dashboard/detail auto-refresh every 60s via ``. Geo-restricted to Hungary via nginx ingress annotation.

## File encoding

All `website/` HTML is **UTF-8 with BOM** — preserve it. Hub Go source is standard UTF-8 (no BOM).
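
As a reference for the status thresholds and the timestamp note under Key patterns, the following is a minimal sketch of how the two could fit together. It is an illustration under stated assumptions, not the hub's implementation: `statusFor`, `parseTimestampLenient`, and the layout list are hypothetical names and values, `parseTimestampLenient` only stands in for the real `parseSQLiteTime()`, and the precedence (DOWN over WARN over OK) plus the treatment of exactly 1h as WARN are one reading of the rule as written.

```go
package main

import (
	"fmt"
	"time"
)

// Status is the coarse liveness state shown for a reporter.
type Status string

const (
	StatusOK   Status = "OK"
	StatusWarn Status = "WARN"
	StatusDown Status = "DOWN"
)

// statusFor applies the documented thresholds to the age of the last report
// and the reported health string. Precedence and boundary handling are
// assumptions of this sketch.
func statusFor(age time.Duration, health string) Status {
	switch {
	case age > time.Hour || health == "fail":
		return StatusDown
	case age >= 30*time.Minute || health == "warn":
		return StatusWarn
	default:
		return StatusOK
	}
}

// layouts are tried in order; the list is illustrative, not the hub's actual set.
var layouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05.999999999-07:00",
	"2006-01-02 15:04:05",
}

// parseTimestampLenient is a hypothetical stand-in for parseSQLiteTime():
// it returns the first successful parse, normalized to UTC.
func parseTimestampLenient(s string) (time.Time, error) {
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UTC(), nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized timestamp %q", s)
}

func main() {
	// Fixed clock so the run is repeatable.
	now := time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)
	samples := []string{"2024-05-01 11:45:00", "2024-05-01T11:15:00Z", "2024-05-01 10:30:00"}
	for _, raw := range samples {
		t, err := parseTimestampLenient(raw)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(raw, "->", statusFor(now.Sub(t), "ok"))
	}
}
```

With the fixed clock above, the three samples are 15, 45, and 90 minutes old, so they classify as OK, WARN, and DOWN respectively. Passing the age in as a `time.Duration` keeps the threshold logic free of clock access, which makes it straightforward to unit-test.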