fix: group title on first ingredient + multi-site parser registry

- Fix ingredient groups creating empty entries in Mealie: set the title
  field on the first ingredient after the group marker instead of
  adding a separate empty entry
- Refactor scraper with @_register decorator for URL-based site dispatch
- Update README with structured ingredients, groups, MEALIE_INTERNAL_URL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 08:51:14 +01:00
parent c235d5caa7
commit a27b322409
3 changed files with 52 additions and 22 deletions
+14 -4
@@ -48,7 +48,8 @@ Extracts data from the Angular-rendered HTML:
- **Title**: `og:title` meta tag, with ` | Mindmegette.hu` suffix stripped
- **Description**: `og:description` meta tag
- **Image**: `og:image` meta tag
- **Ingredients**: `div.ingredients` → `div.ingredients-meta` rows, each containing `span.quantity`, `span.unit`, `span.name`, `span.extra`
- **Ingredients**: `div.ingredients` → `div.ingredients-meta` rows, each containing `<strong>` (qty), `<span>` (unit), `<a class="ingredients-link">` (food), `<small>` (extra)
- **Ingredient groups**: Multiple `div.ingredients` containers; group title via `<strong class="ingredients-group">`
- **Instructions**: `mindmegette-wysiwyg-box` → `ol > li` elements
### Generic Fallback Parser
@@ -57,14 +58,22 @@ For unsupported sites, attempts extraction via:
1. Schema.org JSON-LD `@type: Recipe` blocks (`recipeIngredient`, `recipeInstructions`)
2. OpenGraph meta tags for title, description, image
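
The JSON-LD step can be sketched roughly as follows. This is a minimal illustration using only the standard library rather than BeautifulSoup; the names `find_recipe`, `_JsonLdCollector` and `_walk` are hypothetical and do not come from `app/scraper.py`, and the handling of `@graph` wrappers is an assumption about what pages commonly emit.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, List, Optional


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _walk(node: Any) -> Iterator[dict]:
    """Yield every dict in a JSON-LD document, descending into lists and @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _walk(item)
    elif isinstance(node, dict):
        yield node
        yield from _walk(node.get("@graph", []))


def find_recipe(html: str) -> Optional[dict]:
    """Return the first JSON-LD node whose @type includes Recipe, or None."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the import
        for node in _walk(data):
            types = node.get("@type", [])
            if "Recipe" in (types if isinstance(types, list) else [types]):
                return node
    return None
```

The returned node would then supply `recipeIngredient` and `recipeInstructions`, with the OpenGraph tags filling in title, description and image when the block is missing.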
### Adding a New Site Parser
1. Create a parser function in `app/scraper.py` with the `@_register("hostname")` decorator
2. The function receives `(soup: BeautifulSoup, url: str)` and returns the standard recipe dict
3. The hostname substring is matched against the URL: the first match wins, and unmatched URLs use the generic fallback
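
As a rough sketch of how such a registry might work: only the `@_register("hostname")` decorator and the `(soup, url)` signature come from the steps above. The `_PARSERS` dict, `find_parser`, `parse_generic`, the example hostname and the keys of the returned dict are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry; the real app/scraper.py may store parsers differently.
Parser = Callable[[object, str], dict]
_PARSERS: Dict[str, Parser] = {}


def _register(hostname: str) -> Callable[[Parser], Parser]:
    """Register the decorated function for URLs containing `hostname`."""
    def decorator(func: Parser) -> Parser:
        _PARSERS[hostname] = func
        return func
    return decorator


@_register("example-recipes.test")  # hypothetical site
def parse_example(soup, url: str) -> dict:
    # A real parser would read fields from `soup`; this stub returns fixed data.
    return {"title": "Example", "source": url, "ingredients": [], "instructions": []}


def parse_generic(soup, url: str) -> dict:
    # Stand-in for the generic fallback parser.
    return {"title": "", "source": url, "ingredients": [], "instructions": []}


def find_parser(url: str) -> Parser:
    # Dicts preserve insertion order, so the first registered match wins.
    for hostname, parser in _PARSERS.items():
        if hostname in url:
            return parser
    return parse_generic
```

With this shape, `find_parser(url)(soup, url)` dispatches to the site parser when one matches and to the fallback otherwise.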
## Mealie API Integration
The importer uses the Mealie REST API:
1. **POST** `/api/recipes` — create a stub recipe (returns slug)
2. **PATCH** `/api/recipes/{slug}` — populate ingredients, instructions, description, orgURL
2. **PATCH** `/api/recipes/{slug}` — populate structured ingredients (with unit/food IDs), instructions, description, orgURL
3. **PUT** `/api/recipes/{slug}/image` — upload the recipe image
**Structured ingredients**: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the `title` field on the first ingredient of each group.
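
A minimal sketch of the group handling, assuming the scraper yields groups shaped like `{"title": ..., "ingredients": [...]}`. The function name `flatten_groups` and the input shape are hypothetical; only the rule that the `title` field goes on the first ingredient of each group comes from the description above.

```python
from typing import Dict, List


def flatten_groups(groups: List[Dict]) -> List[Dict]:
    """Flatten scraped ingredient groups into one list of ingredient entries.

    Assumed input shape (hypothetical):
    [{"title": "Sauce", "ingredients": [{"food": "cream", "unit": "dl"}]}]
    """
    flat: List[Dict] = []
    for group in groups:
        for index, ingredient in enumerate(group.get("ingredients", [])):
            entry = dict(ingredient)  # copy so the scraped data is not mutated
            # Only the first ingredient of a group carries the group title;
            # no separate title-only entry is emitted.
            entry["title"] = group.get("title", "") if index == 0 else ""
            flat.append(entry)
    return flat
```

Because the title rides on a real ingredient, the resulting list contains exactly as many entries as there are ingredients, which avoids the empty rows mentioned in the commit message.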
Authentication uses a long-lived API token (Bearer header), created in Mealie at *Profile → API Tokens*.
## Configuration
@@ -83,7 +92,7 @@ All settings are persisted to `/data/config.json` (mounted as a Docker volume).
```yaml
services:
recipe-importer:
image: gitea.dooplex.hu/admin/recipe-importer:0.1.0
image: gitea.dooplex.hu/admin/recipe-importer:0.1.7
container_name: recipe-importer
restart: unless-stopped
ports:
@@ -104,6 +113,7 @@ volumes:
| `SECRET_KEY` | `recipe-importer-dev-key` | Flask session secret |
| `DATA_DIR` | `/data` | Persistent storage path |
| `VERSION` | `dev` | Shown in the UI navbar |
| `MEALIE_INTERNAL_URL` | *(empty)* | Docker-internal Mealie URL (e.g. `http://mealie:9000`), used to avoid hairpinning requests through Cloudflare |
## Building
@@ -120,7 +130,7 @@ The UI is in Hungarian and uses a dark theme. The workflow is:
1. **Settings** (`/settings`) — Enter Mealie URL and API key, test connection
2. **Import** (`/import`) — Paste a recipe URL, click "Beolvasás" (Scrape)
3. **Review** — Edit the title, description, ingredients, instructions in the preview
3. **Review** — Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions
4. **Send** — Click "Importálás Mealie-be" to push to Mealie
## Tech Stack