felhom.eu/CLAUDE.md
2026-06-08 20:06:11 +02:00

# CLAUDE.md — Project Instructions for Claude Code (`felhom.eu`)
> Read automatically by Claude Code when it works in this repo. Keep it updated as the project
> evolves. Cross-repo orientation (the felhom system, artifact taxonomy, access) lives in the
> workspace-root `e:\git\CLAUDE.md`; this file is `felhom.eu`-specific.
## Project overview
This repo (`felhom.eu`) contains:
- **Website** (`website/`) — static HTML at felhom.eu, served via k3s nginx + git-sync sidecar.
- **Hub** (`hub/`) — Go application (felhom-hub) — the **operator backend**, on k3s at `hub.felhom.eu`.
- **K8s manifests** (`manifests/`) — k3s deployment manifests for felhom-system services.
- **Architecture docs** (`documentation/`) — the **authoritative design home for the whole Felhom
system**: `architecture/01..05-*.md` (topology/trust, controller module map, host-agent, signing,
hub), `proxmox-platform.md`, and `tests/phase{0,1-2,3,4}-findings.md`. Read these before designing.
See `README.md` for full architecture/DNS/email/SEO docs. See `TASK.md` for the current task (if any).
## The Felhom system (so the hub's role is in context)
Felhom is **Proxmox-based**, with a locked **three-component model**:
- **Hub** (this repo, `hub/`) — operator backend. Authors operator *intent*; mirrors box *reality*;
holds **no data-plane role** and never connects inbound to a box.
- **Host agent** (repo `felhom-agent/`) — one per Proxmox host; owns all Proxmox interaction.
- **In-guest controller** (repo `felhom-controller/`) — one per customer LXC; Docker-only.
The hub is **not** just controller monitoring anymore. As of slice 3 it ingests **two report
streams**: the agent's host-domain report (`POST /api/v1/host-report`, the heartbeat) and the
legacy controller report (`POST /api/v1/report`). The controller path is **frozen and retires at the
slice-10 cutover** — do not modify it until then.
## Hub — current state (v0.7.x)
- **Tables:** `customer_configs`, `events`, `app_telemetry`/`app_log_issues`, the legacy `reports`,
and the slice-3 host-domain additions `hosts` / `guests` / `host_reports` (additive; columns
marked inert exist for the slice-10 cutover but are unused now).
- **Auth:** Bearer — global key, per-customer key (legacy), and per-host key (`GetHostByAPIKey`,
slice 3). Provisional global-key host mint at `POST /api/v1/admin/hosts`.
- **Monitoring:** the controller `StalenessChecker` (over `reports`) AND a sibling
`HostStalenessChecker` (over `host_reports`, emitting `host_stale`/`host_down`/`host_recovered`).
- Two-tier notifications (operator English / customer Hungarian, Resend, cooldowns); `events` audit.
## Code quality rules
- Always double-check generated code for bugs, logic issues, syntax errors.
- Handle edge cases without overcomplicating.
- Add debug capabilities (logging, verbose output).
- If you need more input or troubleshooting output, **ask first — don't guess**.
## Workflow & artifacts
The planning/architecture assistant ("project Claude", in claude.ai) writes specs and validates
pushes; **you (Claude Code) implement**. A file being open in the editor is NOT an instruction.
- **`TASK.md` / `TASK-*.md`** — a spec for you to implement. Then push, update `hub/CHANGELOG.md`,
and **append** a section to this repo's root `REPORT.md` (this repo appends; newest section last).
- **`RUNBOOK-*.md`** — an operational procedure. CC executes the steps it has access and capability for, including live validation on the demo nodes and the demo Proxmox host (CC has root@felhom-pve SSH + the felhom-agent token). A step is human-only only when it genuinely needs physical presence, a real-world decision, or credentials CC truly lacks — mark those steps HUMAN. Do not decline a whole procedure because it touches a live host or a privileged token. (Judgment still applies: confirm before irreversible ops on real customer data — but demo scratch guests are fair game.)
- Validation of a push against a spec's criteria is project Claude's job, not yours, unless asked.
## Tech stack (Hub)
- **Language:** Go 1.24+ (build server is go1.26.0).
- **Web:** stdlib `net/http` + `html/template`. **DB:** SQLite via `modernc.org/sqlite` (pure Go).
- **Auth:** bcrypt + Bearer tokens. **Deploy:** Docker on k3s (felhom-system ns).
- **Storage:** Longhorn PVC at `/data/` (SQLite DB). **Config:** YAML via ConfigMap at `/etc/felhom-hub/hub.yaml`.
## SSH access
Use the Windows OpenSSH binary (Git Bash's `/usr/bin/ssh` can't reach the Windows agent and fails
silently): `SSH=/c/Windows/System32/OpenSSH/ssh.exe`. All SSH commands below use `$SSH`.
| Host | IP | User | Role |
|------|----|------|------|
| Build server (k3s node) | 192.168.0.180 | kisfenyo | Build + push images, kubectl (needs `sudo`) |
| Demo Proxmox host | 192.168.0.162 | root@pam (SSH alias felhom-pve, root, no sudo) | pveum/pct + live Proxmox validation — available to CC |
## Build & deploy — Hub
After code changes to `hub/`, you **MUST** build, push, and deploy.
1. **Commit + push:** `cd /e/git/felhom.eu && git add -A && git commit -m "<msg>" && git push`
2. **Check running version:**
`$SSH kisfenyo@192.168.0.180 "sudo kubectl get deploy -n felhom-system hub -o jsonpath='{.spec.template.spec.containers[0].image}'"`
3. **Build + push image** (next version; build script lives on the build server, not in this repo):
`$SSH kisfenyo@192.168.0.180 "cd ~/build/felhom-hub && ./build.sh <NEW_VERSION> --push"`
(pulls latest from Gitea, builds with version+build-time ldflags into `main.Version`, pushes
`gitea.dooplex.hu/admin/felhom-hub:<VER>` and `:latest`.)
4. **Deploy:**
`$SSH kisfenyo@192.168.0.180 "sudo kubectl set image -n felhom-system deploy/hub hub=gitea.dooplex.hu/admin/felhom-hub:<NEW_VERSION>"`
5. **Verify:**
`$SSH kisfenyo@192.168.0.180 "sudo kubectl get pods -n felhom-system -l app=hub && sudo kubectl logs -n felhom-system -l app=hub --tail 10"`
(expect Running + `[INFO] felhom-hub <VERSION> starting`.)
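
Step 3 stamps the version into `main.Version` through ldflags, and step 5 looks for it in the startup log line. A minimal sketch of the receiving side, assuming a `"dev"` default for local builds; `startupLine` is a hypothetical helper, and the hub's real logger and build-time variable are not shown in this file:

```go
package main

import "fmt"

// Version is overwritten at link time, for example:
//
//	go build -ldflags "-X main.Version=<VER>" .
//
// "dev" is an assumed default for builds that skip the flag.
var Version = "dev"

// startupLine formats the log line that step 5 expects to see.
func startupLine() string {
	return fmt.Sprintf("[INFO] felhom-hub %s starting", Version)
}
```

If the verify step shows the default instead of `<NEW_VERSION>`, the image was most likely built without the ldflags.
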
> If the hub deployment is ArgoCD-managed (auto-sync), a manual `kubectl set image` may be reverted
> by ArgoCD drift-correction — confirm the deploy path before relying on step 4.
## Build & deploy — Website / Manifests
- **Website** auto-deploys via git-sync; just push to `main` (live in 1-2 min). Emergency edits:
FileBrowser at `https://files.felhom.eu`.
- **Manifests** are applied manually (git pull on the build server first if you pushed):
`$SSH kisfenyo@192.168.0.180 "sudo kubectl apply -f /home/kisfenyo/git/felhom.eu/manifests/<manifest>.yaml"`
## Key patterns
- Hub ingests **host-reports from agents** (`POST /api/v1/host-report`, Bearer per-host) and legacy
**controller reports** (`POST /api/v1/report`). The host-report `received_at` is the
dead-man's-switch liveness signal.
- Status logic: OK (report < 30m), WARN (30m-1h or health=warn), DOWN (> 1h or health=fail).
- SQLite timestamps vary in format — use `parseSQLiteTime()`.
- Dashboard/detail auto-refresh every 60s via `<meta http-equiv="refresh">`. Geo-restricted to
Hungary via nginx ingress annotation.
## File encoding
All `website/` HTML is **UTF-8 with BOM** — preserve it. Hub Go source is standard UTF-8 (no BOM).
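
A BOM is easy to lose when a tool rewrites a file. A minimal sketch of a check and repair, assuming the file content is already in memory; `hasBOM` and `withBOM` are hypothetical helpers, not functions from this repo:

```go
package main

import "bytes"

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// hasBOM reports whether data starts with the UTF-8 BOM.
func hasBOM(data []byte) bool {
	return bytes.HasPrefix(data, utf8BOM)
}

// withBOM returns data with exactly one leading BOM. Input that already
// has one is returned unchanged; otherwise a new slice is allocated.
func withBOM(data []byte) []byte {
	if hasBOM(data) {
		return data
	}
	out := make([]byte, 0, len(utf8BOM)+len(data))
	out = append(out, utf8BOM...)
	return append(out, data...)
}
```

Run the check on `website/` HTML after editing. For hub Go source the expectation is the opposite: `hasBOM` should be false.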