admin 9347fcd3a5 docs(CLAUDE): correct hub/manifests deploy to GitOps via the 'felhom' ArgoCD app
No separate hub app; manifests/ synced by app 'felhom' (auto-sync off). Deploy =
build+push pinned image -> bump manifests/hub.yaml tag + commit -> manual sync.
Never :latest (manifest is ArgoCD's truth). Replaces the stale kubectl apply/set image steps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 10:19:23 +02:00


CLAUDE.md — Project Instructions for Claude Code (felhom.eu)

Read automatically by Claude Code when it works in this repo. Keep it updated as the project evolves. Cross-repo orientation (the felhom system, artifact taxonomy, access) lives in the workspace-root e:\git\CLAUDE.md; this file is felhom.eu-specific.

Project overview

This repo (felhom.eu) contains:

  • Website (website/) — static HTML at felhom.eu, served via k3s nginx + git-sync sidecar.
  • Hub (hub/) — Go application (felhom-hub) — the operator backend, on k3s at hub.felhom.eu.
  • K8s manifests (manifests/) — k3s deployment manifests for felhom-system services.
  • Architecture docs (documentation/) — the authoritative design home for the whole Felhom system: architecture/01..05-*.md (topology/trust, controller module map, host-agent, signing, hub), proxmox-platform.md, and tests/phase{0,1-2,3,4}-findings.md. Read these before designing.

See README.md for full architecture/DNS/email/SEO docs. See TASK.md for the current task (if any).

The Felhom system (so the hub's role is in context)

Felhom is Proxmox-based, with a locked three-component model:

  • Hub (this repo, hub/) — operator backend. Authors operator intent; mirrors box reality; holds no data-plane role and never connects inbound to a box.
  • Host agent (repo felhom-agent/) — one per Proxmox host; owns all Proxmox interaction.
  • In-guest controller (repo felhom-controller/) — one per customer LXC; Docker-only.

The hub is not just controller monitoring anymore. As of slice 3 it ingests two report streams: the agent's host-domain report (POST /api/v1/host-report, the heartbeat) and the legacy controller report (POST /api/v1/report). The controller path is frozen and retires at the slice-10 cutover — do not modify it until then.
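As a rough illustration of the host-report ingest path, the sketch below shows a Bearer-protected POST handler built on stdlib net/http. It is not the hub's actual code: `hostReportHandler`, `lookupHost`, `postStatus`, the `demo-key` value, and the 202/401 response codes are all hypothetical, and `lookupHost` merely stands in for whatever per-host key lookup the hub really performs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerToken extracts the token from an "Authorization: Bearer <token>" header.
func bearerToken(r *http.Request) (string, bool) {
	h := r.Header.Get("Authorization")
	const prefix = "Bearer "
	if len(h) <= len(prefix) || !strings.EqualFold(h[:len(prefix)], prefix) {
		return "", false
	}
	tok := strings.TrimSpace(h[len(prefix):])
	return tok, tok != ""
}

// hostReportHandler is a hypothetical handler for POST /api/v1/host-report.
// lookupHost is an assumed stand-in for the real per-host key lookup.
func hostReportHandler(lookupHost func(key string) (hostID string, ok bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		tok, ok := bearerToken(r)
		if !ok {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		if _, ok := lookupHost(tok); !ok {
			http.Error(w, "unknown host key", http.StatusUnauthorized)
			return
		}
		// A real handler would decode and persist the report here.
		w.WriteHeader(http.StatusAccepted)
	}
}

// postStatus sends an in-process POST to h and returns the response status code.
func postStatus(h http.Handler, path, token string) int {
	req := httptest.NewRequest(http.MethodPost, path, strings.NewReader("{}"))
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/api/v1/host-report", hostReportHandler(func(key string) (string, bool) {
		return "host-1", key == "demo-key" // demo-only key table
	}))
	fmt.Println(postStatus(mux, "/api/v1/host-report", "demo-key")) // 202
	fmt.Println(postStatus(mux, "/api/v1/host-report", ""))         // 401
}
```

The handler only ever answers inbound requests, which matches the rule that the hub never connects inbound to a box.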

Hub — current state (v0.7.x)

  • Tables: customer_configs, events, app_telemetry/app_log_issues, the legacy reports, and the slice-3 host-domain additions hosts / guests / host_reports (additive; columns marked inert exist for the slice-10 cutover but are unused now).
  • Auth: Bearer — global key, per-customer key (legacy), and per-host key (GetHostByAPIKey, slice 3). Provisional global-key host mint at POST /api/v1/admin/hosts.
  • Monitoring: the controller StalenessChecker (over reports) AND a sibling HostStalenessChecker (over host_reports, emitting host_stale/host_down/host_recovered).
  • Two-tier notifications (operator English / customer Hungarian, Resend, cooldowns); events audit.
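The host_stale / host_down / host_recovered events suggest a checker that emits only on state changes. The following is a minimal sketch of that idea, not the real HostStalenessChecker: the type and function names are hypothetical, and the 30-minute and 1-hour thresholds in `main` are illustrative values only.

```go
package main

import (
	"fmt"
	"time"
)

// hostState is the liveness state derived from the age of the last host report.
type hostState int

const (
	hostOK hostState = iota
	hostStale
	hostDown
)

// classify maps a report age to a state. Thresholds are parameters because
// the real checker's values are not stated in this document.
func classify(age, staleAfter, downAfter time.Duration) hostState {
	switch {
	case age > downAfter:
		return hostDown
	case age > staleAfter:
		return hostStale
	default:
		return hostOK
	}
}

// transitionEvent returns the event name for a move from prev to next,
// or "" when the state did not change and nothing should be emitted.
func transitionEvent(prev, next hostState) string {
	if prev == next {
		return ""
	}
	switch next {
	case hostStale:
		return "host_stale"
	case hostDown:
		return "host_down"
	default:
		return "host_recovered"
	}
}

func main() {
	// Illustrative thresholds only (assumption).
	const staleAfter, downAfter = 30 * time.Minute, time.Hour

	prev := hostOK
	ages := []time.Duration{5 * time.Minute, 45 * time.Minute, 2 * time.Hour, time.Minute}
	for _, age := range ages {
		next := classify(age, staleAfter, downAfter)
		if ev := transitionEvent(prev, next); ev != "" {
			fmt.Printf("age=%v event=%s\n", age, ev)
		}
		prev = next
	}
}
```

Emitting on transitions rather than on every tick keeps the events table to one row per change, which also plays well with notification cooldowns.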

Code quality rules

  • Always double-check generated code for bugs, logic issues, syntax errors.
  • Handle edge cases without overcomplicating.
  • Add debug capabilities (logging, verbose output).
  • If you need more input or troubleshooting output, ask first — don't guess.

Workflow & artifacts

The planning/architecture assistant ("project Claude", in claude.ai) writes specs and validates pushes; you (Claude Code) implement. A file being open in the editor is NOT an instruction.

  • TASK.md / TASK-*.md — a spec for you to implement. Then push and update this repo's changelog (hub/CHANGELOG.md) and root REPORT.md per the convention below.
  • RUNBOOK-*.md — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
  • Validation of a push against a spec's criteria is project Claude's job, not yours, unless asked.

In every repository where you make a change, update both files in that repo:

  • CHANGELOG.md — a cumulative log of all changes; newest entry on top.
  • REPORT.md — overwrite with a summary of the most recent implementation (or significant validation/operational run) only; not cumulative.

Never write secrets — tokens, passwords, private keys, API keys — into CHANGELOG.md, REPORT.md, or any committed file. Reference them as "stored out-of-band" instead.

Tech stack (Hub)

  • Language: Go 1.24+ (build server is go1.26.0).
  • Web: stdlib net/http + html/template. DB: SQLite via modernc.org/sqlite (pure Go).
  • Auth: bcrypt + Bearer tokens. Deploy: Docker on k3s (felhom-system ns).
  • Storage: Longhorn PVC at /data/ (SQLite DB). Config: YAML via ConfigMap at /etc/felhom-hub/hub.yaml.

SSH access

Use the Windows OpenSSH binary (Git Bash's /usr/bin/ssh can't reach the Windows agent and fails silently): SSH=/c/Windows/System32/OpenSSH/ssh.exe. All SSH commands below use $SSH.

| Host | IP | User | Role |
| --- | --- | --- | --- |
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl (needs sudo) |
| Demo Proxmox host | 192.168.0.162 | root@pam (SSH alias felhom-pve, root, no sudo) | pveum/pct + live Proxmox validation — available to CC |

Build & deploy — Hub (GitOps via ArgoCD)

The whole k3s cluster is GitOps via a single ArgoCD app named felhom (argocd.dooplex.hu) that syncs this repo's manifests/ to the felhom-system namespace. There is no separate hub ArgoCD app — the hub is one Deployment (manifests/hub.yaml) inside the felhom app. Auto-sync is OFF: deploys are a deliberate manual sync. ArgoCD's source of truth is the manifest, so:

  • A code change + CHANGELOG version bump does NOT deploy anything. The running image only changes when manifests/hub.yaml's image: tag changes in git and the app is synced.
  • Pin explicit versions, never :latest. A :latest re-push wouldn't change the manifest, so ArgoCD wouldn't redeploy, and Synced / History / Rollback would all misreport what's actually live.

After a code change to hub/, to deploy:

  1. Commit + push the code: cd /e/git/felhom.eu && git add -A && git commit -m "<msg>" && git push
  2. Build + push the image (build script lives on the build server, not in this repo): $SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <NEW_VERSION> --push" (pulls latest from Gitea, builds version into main.Version via ldflags, pushes gitea.dooplex.hu/admin/felhom-hub:<VER>). Pin <VER>; don't rely on :latest.
  3. Bump the manifest: set the image: tag in manifests/hub.yaml to :<NEW_VERSION>, commit to main, push. The felhom app now shows OutOfSync.
  4. Sync: ArgoCD UI → app felhom → Sync, or $SSH kisfenyo@192.168.0.180 "argocd app sync felhom" (argocd CLI v3.2.1 at /usr/local/bin).
  5. Verify: $SSH kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'; echo; sudo kubectl logs -n felhom-system -l app=hub --tail 10" (expect the new tag + [INFO] felhom-hub <VERSION> starting).

A bare kubectl set image would be reverted on the next sync (the manifest is the truth) — always go through manifests/hub.yaml. The live image can lag the CHANGELOG when version bumps were committed but step 3/4 was never done; reconcile via the manifest, not by assuming the changelog reflects what's running.
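The manifest bump in step 3 and the "never :latest" rule can be sketched as a small helper. This is an illustration under assumptions, not tooling that exists in the repo: `bumpImageTag` is a hypothetical name, the sample manifest fragment and the 0.7.1 / 0.7.2 tags are placeholders, and the regular expression assumes the image line has the form `image: gitea.dooplex.hu/admin/felhom-hub:<VER>`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// imageLine matches the hub's "image:" line and captures everything before
// the tag (indent, key, repository) plus the current tag.
var imageLine = regexp.MustCompile(
	`(?m)^([ \t-]*image:[ \t]*gitea\.dooplex\.hu/admin/felhom-hub):(\S+)[ \t]*$`)

// bumpImageTag returns the manifest with the hub image tag replaced by newTag.
// It refuses empty or "latest" tags so the manifest always pins a version.
func bumpImageTag(manifest, newTag string) (string, error) {
	if newTag == "" || strings.EqualFold(newTag, "latest") {
		return "", errors.New("refusing unpinned tag: use an explicit version")
	}
	if !imageLine.MatchString(manifest) {
		return "", errors.New("no pinned felhom-hub image line found")
	}
	out := imageLine.ReplaceAllStringFunc(manifest, func(line string) string {
		m := imageLine.FindStringSubmatch(line)
		return m[1] + ":" + newTag
	})
	return out, nil
}

func main() {
	// Placeholder fragment; the real manifests/hub.yaml is not reproduced here.
	manifest := "    containers:\n" +
		"      - name: hub\n" +
		"        image: gitea.dooplex.hu/admin/felhom-hub:0.7.1\n"

	out, err := bumpImageTag(manifest, "0.7.2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)

	_, err = bumpImageTag(manifest, "latest")
	fmt.Println("latest rejected:", err != nil)
}
```

Only the text of the manifest changes; the commit, push, and manual sync of the felhom app still have to follow before anything is deployed.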

Build & deploy — Website / Manifests

  • Website auto-deploys via git-sync; just push to main (live in 1–2 min). Emergency edits: FileBrowser at https://files.felhom.eu.
  • Manifests (manifests/) are GitOps via the felhom ArgoCD app — commit to main, then sync (auto-sync is off): UI Sync or argocd app sync felhom. Do not kubectl apply them directly (a later sync reverts drift; the manifest in git is the truth).

Key patterns

  • Hub ingests host-reports from agents (POST /api/v1/host-report, Bearer per-host) and legacy controller reports (POST /api/v1/report). The host-report received_at is the dead-man's-switch liveness signal.
  • Status logic: OK (report < 30m), WARN (30m–1h or health=warn), DOWN (> 1h or health=fail).
  • SQLite timestamps vary in format — use parseSQLiteTime().
  • Dashboard/detail auto-refresh every 60s via <meta http-equiv="refresh">. Geo-restricted to Hungary via nginx ingress annotation.
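The status thresholds and the mixed timestamp formats can be illustrated together. This is a sketch, not the hub's parseSQLiteTime(): `parseTimestamp` is a hypothetical stand-in, the list of accepted layouts is an assumption, and letting DOWN take precedence over WARN when both conditions hold is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// layouts lists timestamp formats commonly found in SQLite columns.
// Which formats the real parser accepts is an assumption.
var layouts = []string{
	time.RFC3339Nano,            // 2026-06-09T10:45:00Z
	"2006-01-02 15:04:05-07:00", // space separator with zone offset
	"2006-01-02 15:04:05",       // zoneless, treated as UTC
	"2006-01-02T15:04:05",       // zoneless with "T" separator
}

// parseTimestamp tries each layout in turn and returns the time in UTC.
func parseTimestamp(s string) (time.Time, error) {
	for _, l := range layouts {
		if t, err := time.Parse(l, s); err == nil {
			return t.UTC(), nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized timestamp %q", s)
}

// status applies the documented rule: OK below 30m, WARN from 30m to 1h or
// health=warn, DOWN above 1h or health=fail.
func status(age time.Duration, health string) string {
	switch {
	case age > time.Hour || health == "fail":
		return "DOWN"
	case age >= 30*time.Minute || health == "warn":
		return "WARN"
	default:
		return "OK"
	}
}

func main() {
	received, err := parseTimestamp("2026-06-09 10:00:00")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	now, err := parseTimestamp("2026-06-09T10:45:00Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(status(now.Sub(received), "ok")) // WARN (report is 45m old)
}
```

Normalizing to UTC before subtracting avoids off-by-offset ages when one column carries a zone and another does not.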

File encoding

All website/ HTML is UTF-8 with BOM — preserve it. Hub Go source is standard UTF-8 (no BOM).
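A quick way to check or restore the encoding rule is to look at the first three bytes of a file. The helpers below are a minimal sketch with hypothetical names; they operate on byte slices and leave file I/O to the caller.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// hasBOM reports whether data starts with a UTF-8 BOM.
func hasBOM(data []byte) bool {
	return bytes.HasPrefix(data, utf8BOM)
}

// ensureBOM returns data with a leading BOM, adding one only if missing
// (intended for website/ HTML).
func ensureBOM(data []byte) []byte {
	if hasBOM(data) {
		return data
	}
	return append(append([]byte{}, utf8BOM...), data...)
}

// stripBOM returns data without a leading BOM (intended for Go source).
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, utf8BOM)
}

func main() {
	html := []byte("<!doctype html>")
	withBOM := ensureBOM(html)
	fmt.Println(hasBOM(html), hasBOM(withBOM))        // false true
	fmt.Println(len(withBOM)-len(html), len(stripBOM(withBOM)) == len(html)) // 3 true
}
```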