admin 0ec9ce0c6d v0.8.0: gastrohobbi.hu parser, fix ingredient fraction parsing
Add gastrohobbi.hu parser (WPBakery page builder layout): ingredients
with groups, instructions with embedded lists, tags from JSON-LD
articleSection, prep time extraction.

Fix ingredient line parser: fractions like "1/2" no longer split due to
regex backtracking, en-dash ranges normalized, unicode fractions (½¼¾)
recognized as quantity start across all parsers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:17:13 +01:00
2026-02-24 07:42:36 +01:00

Recipe Importer

Docker container for importing recipes from Hungarian websites into Mealie and Tandoor Recipes.

Problem: Mealie's and Tandoor's built-in URL import cannot parse ingredients and instructions from Hungarian recipe sites like mindmegette.hu.

Solution: This container provides a web UI that scrapes Hungarian recipe pages with site-specific parsers, lets you review and edit the extracted data, then pushes it to Mealie and/or Tandoor via their REST APIs. Supports both single recipe import and bulk import of multiple URLs.

Architecture

┌──────────────────────────────────────────────────────┐
│  recipe-importer container (:8000)                   │
│                                                      │
│  Flask + Gunicorn                                    │
│  ├── /settings       → Configure Mealie & Tandoor    │
│  ├── /import         → Single or bulk import         │
│  ├── /scrape         → AJAX: parse recipe HTML       │
│  ├── /send           → AJAX: push to Mealie API      │
│  ├── /send-tandoor   → AJAX: push to Tandoor API     │
│  ├── /tags           → AJAX: list tags from both     │
│  └── /health         → Health check                  │
│                                                      │
│  Modules:                                            │
│  ├── app/config.py   → JSON config persistence       │
│  ├── app/scraper.py  → Site-specific parsers         │
│  ├── app/mealie.py   → Mealie REST API client        │
│  └── app/tandoor.py  → Tandoor REST API client       │
└───────────────────┬──────────────┬───────────────────┘
                    │ HTTP         │ HTTP
                    ▼              ▼
         ┌───────────────┐  ┌───────────────┐
         │  Mealie       │  │  Tandoor      │
         │  POST /api/.. │  │  POST /api/.. │
         │  PUT /api/..  │  │  PUT /api/..  │
         └───────────────┘  └───────────────┘

Supported Sites

| Site | Ingredients | Instructions | Image | Tags |
|---|---|---|---|---|
| mindmegette.hu | Yes | Yes | Yes | Yes |
| streetkitchen.hu | Yes (with groups) | Yes (ol/ul/paragraph) | Yes | Yes (from JSON-LD categories) |
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| sobors.hu | Yes (with groups) | Yes (with section headers, follows linked recipes) | Yes | Yes |
| kiskegyed.hu | Yes (with groups, dual measurements) | Yes (follows sobors.hu links) | Yes | Yes |
| gastrohobbi.hu | Yes (with groups) | Yes (with embedded lists) | Yes | Yes (from JSON-LD categories) |
| Other sites | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |

Mindmegette.hu Parser

Extracts data from the Angular-rendered HTML:

  • Title: og:title meta tag, with | Mindmegette.hu suffix stripped
  • Description: og:description meta tag
  • Image: og:image meta tag
  • Ingredients: div.ingredients-meta rows inside div.ingredients, each containing <strong> (qty), <span> (unit), <a class="ingredients-link"> (food), <small> (extra)
  • Ingredient groups: Multiple div.ingredients containers; group title via <strong class="ingredients-group">
  • Instructions: ol > li elements inside mindmegette-wysiwyg-box
  • Tags: <a class="tag"> elements inside div.desktop-wrapper

Streetkitchen.hu Parser

Extracts data from the Next.js-rendered HTML:

  • Title: og:title meta tag, with | Street Kitchen suffix stripped
  • Description: og:description meta tag
  • Image: og:image meta tag (CDN URL)
  • Ingredients: div.grid.grid-cols-1 container → div.my-2.flex rows; quantity+unit merged in first <div> (split via regex), food in <div class="font-bold">, optional extra in parenthesised <div>
  • Ingredient groups: <h5> headers inside section divs (e.g. "Az előfőzéshez", "A sütéshez")
  • Instructions: Three formats handled — <ol> ordered list, <ul> unordered list, or plain <p> paragraphs (with optional <strong> section headers)
  • Tags: recipeCategory field of the Recipe object in the JSON-LD @graph (comma-separated)
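
The quantity/unit split mentioned in the Ingredients bullet above could look roughly like this. It is a minimal sketch, not the project's actual regex: the pattern and the name split_quantity_unit are assumptions made for illustration.

```python
import re

# Hypothetical quantity pattern: integer, decimal (comma or dot),
# "1/2"-style fraction, or a single unicode fraction character.
_QTY = r"\d+(?:[.,]\d+)?(?:\s*/\s*\d+)?|[½¼¾]"

# A quantity, optionally followed by a hyphen/en-dash range, then the unit.
_QTY_UNIT_RE = re.compile(
    rf"^\s*(?P<qty>(?:{_QTY})(?:\s*[-–]\s*(?:{_QTY}))?)\s*(?P<unit>.*?)\s*$"
)


def split_quantity_unit(text: str) -> tuple[str, str]:
    """Split a merged 'quantity unit' string; returns ('', text) if no quantity."""
    match = _QTY_UNIT_RE.match(text)
    if not match:
        return "", text.strip()
    qty = match.group("qty").replace("–", "-")  # normalize en-dash ranges
    qty = re.sub(r"\s*([/-])\s*", r"\1", qty)   # drop spaces around "/" and "-"
    return qty, match.group("unit")
```

Under these assumptions, split_quantity_unit("2–3 ek") yields the quantity "2-3" and the unit "ek", while a line without a leading quantity is returned unchanged as the unit part.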

Nosalty.hu Parser

Extracts data from the nosalty.hu recipe pages:

  • Title: og:title meta tag
  • Description: Story text from div#recipe-story > p (nosalty has no dedicated description field)
  • Image: og:image meta tag
  • Ingredients: Scoped to div#ingredients to avoid per-serving/nutrition duplicates; ul.m-list__list > li.m-list__item rows with <span> (qty+unit), <a class="a-link"> (food), optional trailing <span> (extra notes in parentheses)
  • Ingredient groups: <h3 class="m-list__title"> headers between <ul> lists
  • Instructions: ol.m-list__list > li.m-list__item steps inside div#select; optional <h4 class="m-list__title"> section headers
  • Tags: <a class="m-tags__tagItem"> inside div.p-recipe__attributeList

Sobors.hu Parser

Extracts data from the sobors.hu recipe pages:

  • Title: h3.recept_nev
  • Description: og:description meta tag
  • Image: og:image meta tag
  • Ingredients: section elements inside div.hozzavalok-container, each with ul > li items containing span.mennyiseg (qty), span.mertekegyseg (unit), span.hozzavalo (food)
  • Ingredient groups: section > h4 headers (e.g., "A szószhoz:", "A húsgolyókhoz:")
  • Instructions: <p> tags inside div.recept_leiras, with <h3><strong> section headers
  • Linked recipes: Some pages link to another site (e.g. kiskegyed.hu) instead of showing full instructions. The parser detects external links in the instruction area and follows them to scrape the real recipe content.
  • Article-style ingredient fallback: Pages without the structured div.hozzavalok-container are parsed from article-body h4 + ul > li plain text
  • Tags: div.cikk-cimkek > ul.cikk-cimkek-list > li > a (skips generic "Receptek" category)
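
The linked-recipe detection described above can be sketched with the standard library alone. This is an illustration under assumptions: the function name find_external_recipe_link is hypothetical, and the real parser may apply stricter rules (for example, only following links to sites it has a parser for).

```python
from typing import Iterable, Optional
from urllib.parse import urljoin, urlparse


def _host(url: str) -> str:
    """Return the hostname without a leading 'www.' (empty string if absent)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_external_recipe_link(page_url: str, hrefs: Iterable[str]) -> Optional[str]:
    """Return the first http(s) link that points to a different host than the page."""
    page_host = _host(page_url)
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if _host(absolute) not in ("", page_host):
            return absolute
    return None
```

The hrefs would come from the anchors found in the instruction area; the returned URL can then be fetched and passed to whichever parser matches it.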

Kiskegyed.hu Parser

Extracts data from kiskegyed.hu recipe pages:

  • Title: h2 element (with - Kiskegyed suffix stripped)
  • Description: section#leadText > p
  • Image: og:image meta tag
  • Ingredients: ul.list > li items inside div.recipe_ingredients; group headers from <p> or <p><em> elements
  • Ingredient groups: <p>Name:</p> or <p><em>A ...hez</em></p> format
  • Dual measurements: "3 ek (70 g) búzafinomliszt" → qty: 3, unit: ek, food: búzafinomliszt, extra: 70 g
  • Instructions: div.recipe_preparation > ol > li > div
  • Cross-site links: Pages linking to sobors.hu are followed to get the full recipe
  • Tags: section.tags > a > span (# prefix stripped, "recept" filtered)
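
The dual-measurement example above ("3 ek (70 g) búzafinomliszt") can be reproduced with a small regex. This is a sketch under assumptions: the pattern and the name parse_dual_measurement are illustrative, and the project's own line parser likely handles more cases.

```python
import re
from typing import Optional

# Hypothetical pattern for lines shaped like "<qty> <unit> (<extra>) <food>".
_DUAL_RE = re.compile(
    r"^(?P<qty>\d+(?:[.,]\d+)?)\s+(?P<unit>\S+)\s+"
    r"\((?P<extra>[^)]*)\)\s+(?P<food>.+)$"
)


def parse_dual_measurement(line: str) -> Optional[dict]:
    """Return qty/unit/extra/food for a dual-measurement line, else None."""
    match = _DUAL_RE.match(line.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}
```

Lines that do not follow this shape return None, so a caller can fall back to the regular ingredient parser.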

GastroHobbi.hu Parser

Extracts data from gastrohobbi.hu recipe pages (WPBakery page builder layout):

  • Title: h1.mpcth-post-title > span.mpcth-color-main-border
  • Description: First <p> in the first wpb_text_column before the recipe columns; falls back to og:description
  • Image: og:image meta tag
  • Ingredients: Finds h3 containing "Hozzávalók:", then walks sibling <ul> elements; items from li > p or li directly
  • Ingredient groups: Plain <h3> elements between ingredient lists (e.g. "A csipetkéhez:")
  • Instructions: <p> elements following the "Elkészítés:" h3; embedded <ul> items rendered as bullet points
  • Prep time: Extracted from "Elkészítési idő:" h3, appended to description
  • Tags: JSON-LD Article.articleSection array (site uses Article schema, not Recipe)

Generic Fallback Parser

For unsupported sites, attempts extraction via:

  1. Schema.org JSON-LD @type: Recipe blocks (recipeIngredient, recipeInstructions, keywords)
  2. OpenGraph meta tags for title, description, image
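
A minimal sketch of the JSON-LD part of this fallback follows. It assumes the raw contents of the page's application/ld+json script blocks have already been collected; the function names and the returned dict keys are hypothetical, not the project's actual recipe dict.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def _iter_nodes(data: Any) -> Iterator[dict]:
    """Yield every dict in a JSON-LD payload, descending into lists and @graph."""
    if isinstance(data, list):
        for item in data:
            yield from _iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from _iter_nodes(data.get("@graph", []))


def _is_recipe(node: dict) -> bool:
    """@type may be a single string or a list of strings."""
    node_type = node.get("@type", [])
    types = node_type if isinstance(node_type, list) else [node_type]
    return "Recipe" in types


def extract_recipe(json_ld_blocks: Iterable[str]) -> Optional[dict]:
    """Return ingredients, instructions and tags from the first Recipe node found."""
    for raw in json_ld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the import
        for node in _iter_nodes(data):
            if not _is_recipe(node):
                continue
            steps = node.get("recipeInstructions", [])
            if isinstance(steps, str):
                steps = [steps]
            keywords = node.get("keywords", [])
            if isinstance(keywords, str):
                keywords = keywords.split(",")
            return {
                "ingredients": [str(i).strip() for i in node.get("recipeIngredient", [])],
                # Steps may be plain strings or HowToStep objects with a "text" field.
                "instructions": [
                    (s.get("text", "") if isinstance(s, dict) else str(s)).strip()
                    for s in steps
                ],
                "tags": [k.strip() for k in keywords if k.strip()],
            }
    return None
```

A None result signals that no Recipe block was found, in which case only the OpenGraph title, description, and image are available.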

Adding a New Site Parser

  1. Create a parser function in app/scraper.py with the @_register("hostname") decorator
  2. The function receives (soup: BeautifulSoup, url: str) and returns the standard recipe dict
  3. The hostname substring is matched against the URL — first match wins, unmatched URLs use the generic fallback
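
The registration and lookup steps above can be sketched as follows. Only the @_register decorator name and the (soup, url) signature come from this README; the registry list, pick_parser, parse_generic, and the example hostname are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# A parser takes (soup, url) and returns the recipe dict.
Parser = Callable[[Any, str], Dict[str, Any]]

# Registration order matters: the first matching hostname wins.
_PARSERS: List[Tuple[str, Parser]] = []


def _register(hostname: str) -> Callable[[Parser], Parser]:
    """Decorator that records a parser under a hostname substring."""
    def decorator(func: Parser) -> Parser:
        _PARSERS.append((hostname, func))
        return func
    return decorator


def parse_generic(soup: Any, url: str) -> Dict[str, Any]:
    """Stand-in for the JSON-LD / OpenGraph fallback."""
    return {"parser": "generic", "url": url}


@_register("example-recipes.hu")  # hypothetical site
def parse_example(soup: Any, url: str) -> Dict[str, Any]:
    """Stand-in for a site-specific parser."""
    return {"parser": "example", "url": url}


def pick_parser(url: str) -> Parser:
    """Return the first registered parser whose hostname occurs in the URL."""
    for hostname, func in _PARSERS:
        if hostname in url:
            return func
    return parse_generic
```

Because matching is by substring, a more specific hostname should be registered before a shorter one that it contains.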

Bulk Import

The "Tömeges importálás" (Bulk Import) tab allows importing multiple recipes at once:

  1. Paste one URL per line in the textarea
  2. Choose a mode:
    • Review mode — edit each recipe before importing, with option to switch to auto mid-way
    • Auto mode — scrape and import all recipes without manual review (with tag option: import all tags or none)
  3. Select target: Mealie, Tandoor, or both
  4. Progress table tracks per-recipe status (pending, scraping, importing, done, error, skipped, duplicate)

All processing is done client-side: the browser calls the existing /scrape, /send, and /send-tandoor endpoints sequentially, one recipe at a time.
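
The real loop runs in the browser, but the control flow of auto mode can be illustrated in Python. In this sketch, run_bulk_import and its callables are hypothetical: scrape stands in for a call to /scrape, and send stands in for /send or /send-tandoor and is assumed to return a final status such as "done" or "duplicate".

```python
from typing import Callable, Dict, Iterable, List


def run_bulk_import(
    urls: Iterable[str],
    scrape: Callable[[str], dict],
    send: Callable[[dict], str],
) -> List[Dict[str, str]]:
    """Process URLs one at a time and record a final status per URL."""
    results: List[Dict[str, str]] = []
    for line in urls:
        url = line.strip()
        if not url:
            continue  # ignore blank lines from the textarea
        try:
            recipe = scrape(url)
            status = send(recipe)
        except Exception as exc:  # one failure must not stop the batch
            results.append({"url": url, "status": "error", "detail": str(exc)})
            continue
        results.append({"url": url, "status": status, "detail": ""})
    return results
```

Processing sequentially keeps the load on the scraped sites low and makes the per-recipe status easy to display in order.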

Mealie API Integration

The importer uses the Mealie REST API:

  1. POST /api/recipes — create a stub recipe (returns slug)
  2. PATCH /api/recipes/{slug} — populate structured ingredients (with unit/food IDs), instructions, description, orgURL
  3. PUT /api/recipes/{slug}/image — upload the recipe image

Structured ingredients: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the title field on the first ingredient of each group.
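
The ID resolution and group-title handling described above might be structured like this. It is a sketch under assumptions: the helper names, the input shape (groups with a title and items), and the exact payload field names are illustrative and should be checked against the Mealie API documentation for your version.

```python
from typing import Callable, Dict, List, Optional


def resolve_id(name: str, known: Dict[str, str], create: Callable[[str], str]) -> Optional[str]:
    """Look up an ID by case-insensitive name, creating the entry if missing."""
    key = name.strip().lower()
    if not key:
        return None
    if key not in known:
        # `create` stands in for the API call that adds a unit or food.
        known[key] = create(name.strip())
    return known[key]


def build_ingredients(
    groups: List[dict],
    units: Dict[str, str],
    foods: Dict[str, str],
    create_unit: Callable[[str], str],
    create_food: Callable[[str], str],
) -> List[dict]:
    """Flatten groups into one list; the group title goes on the first item only."""
    payload = []
    for group in groups:
        for index, item in enumerate(group["items"]):
            unit_id = resolve_id(item.get("unit", ""), units, create_unit)
            food_id = resolve_id(item.get("food", ""), foods, create_food)
            payload.append({
                "title": group.get("title", "") if index == 0 else "",
                "quantity": item.get("qty"),
                "unit": {"id": unit_id} if unit_id else None,
                "food": {"id": food_id} if food_id else None,
                "note": item.get("note", ""),
            })
    return payload
```

Caching resolved names in the units and foods dicts avoids creating the same entry twice within one import.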

Authentication uses a long-lived API token (Bearer header), created in Mealie at Profile → API Tokens.

Tandoor API Integration

The importer uses the Tandoor REST API:

  1. POST /api/recipe/ — create the full recipe in one call (name, description, source_url, steps with nested ingredients)
  2. PUT /api/recipe/{id}/image/ — upload the recipe image

Step-based ingredients: Tandoor nests ingredients inside steps. All ingredients are attached to the first step. Units and foods are auto-created by name (no separate resolution needed). Ingredient groups use is_header: true on a header entry.
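
A request body following this description could be assembled as below. The fields name, description, source_url, steps, and is_header come from this README; the remaining field names (instruction, amount, note) and the function build_tandoor_recipe are assumptions to verify against the Tandoor API schema.

```python
from typing import List


def build_tandoor_recipe(
    name: str,
    description: str,
    source_url: str,
    groups: List[dict],
    instructions: List[str],
) -> dict:
    """Build a recipe body with every ingredient attached to the first step."""
    ingredients: List[dict] = []
    for group in groups:
        if group.get("title"):
            # A header entry marks the start of an ingredient group.
            ingredients.append({
                "is_header": True, "note": group["title"],
                "food": None, "unit": None, "amount": 0,
            })
        for item in group["items"]:
            ingredients.append({
                "is_header": False,
                "amount": item.get("qty") or 0,
                # Units and foods are passed by name so Tandoor can create them.
                "unit": {"name": item["unit"]} if item.get("unit") else None,
                "food": {"name": item["food"]},
                "note": item.get("note", ""),
            })
    steps = [{"instruction": text, "ingredients": []} for text in instructions]
    if not steps:
        steps = [{"instruction": "", "ingredients": []}]  # ingredients need a step
    steps[0]["ingredients"] = ingredients
    return {"name": name, "description": description,
            "source_url": source_url, "steps": steps}
```

Attaching everything to the first step mirrors the behaviour described above and avoids guessing which ingredient belongs to which step.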

Duplicate detection: Before import, the client searches Tandoor by title and compares the source_url field of each match against the URL being imported to detect already-imported recipes.
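
The comparison step could be as small as the following sketch. It assumes the title search has already returned recipe records that include source_url; the function name and the trailing-slash normalization are illustrative choices, not confirmed behaviour.

```python
from typing import Iterable, Optional


def _normalize(url: str) -> str:
    """Trim whitespace and a trailing slash so trivially different URLs match."""
    return url.strip().rstrip("/")


def find_duplicate(candidates: Iterable[dict], source_url: str) -> Optional[dict]:
    """Return the first candidate whose source_url matches, or None."""
    wanted = _normalize(source_url)
    if not wanted:
        return None  # never treat an empty URL as a match
    for recipe in candidates:
        if _normalize(recipe.get("source_url") or "") == wanted:
            return recipe
    return None
```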

Authentication uses an API token (Bearer header), created in Tandoor at Settings → API Browser → Auth Token.

Tag Management

Tags are scraped from recipe pages and shown as editable chips in the UI. Users can:

  • Remove scraped tags that are irrelevant
  • Search existing tags from Mealie and Tandoor (fetched via the importer's GET /tags endpoint)
  • Add custom tags by typing and pressing Enter

Tags are sent to both services on import:

  • Mealie: Tags are created via POST /api/organizers/tags if they don't exist, then attached to the recipe in the PATCH payload
  • Tandoor: Keywords are auto-created by including keywords: [{"name": "..."}] in the recipe POST
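
The two strategies can be sketched as pure functions. The names plan_mealie_tags and tandoor_keywords are hypothetical, and the case-insensitive matching is an assumption; only the keywords: [{"name": "..."}] shape is taken from the text above.

```python
from typing import Iterable, List, Tuple


def plan_mealie_tags(wanted: Iterable[str], existing: List[dict]) -> Tuple[List[dict], List[str]]:
    """Split wanted tag names into existing tag objects and names still to create."""
    by_name = {tag["name"].strip().lower(): tag for tag in existing}
    found: List[dict] = []
    missing: List[str] = []
    seen = set()
    for name in wanted:
        key = name.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and repeated names
        seen.add(key)
        if key in by_name:
            found.append(by_name[key])
        else:
            missing.append(name.strip())
    return found, missing


def tandoor_keywords(tags: Iterable[str]) -> List[dict]:
    """Build the keyword list for the Tandoor recipe POST."""
    return [{"name": tag.strip()} for tag in tags if tag.strip()]
```

For Mealie, each name in the second list would be created first, and the resulting tag objects attached together with the found ones in the PATCH payload.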

Configuration

All settings are persisted to /data/config.json (mounted as a Docker volume).

| Setting | Description |
|---|---|
| mealie_url | Full URL to Mealie instance (e.g. https://mealie.example.com) |
| mealie_api_key | Mealie API token |
| tandoor_url | Full URL to Tandoor instance (e.g. https://recipes.example.com) |
| tandoor_api_key | Tandoor API token |
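
JSON persistence for these four settings can be sketched as below. This is not the contents of app/config.py: the function names, the defaults, and the atomic-write approach are assumptions; only the file location and the setting names come from this README.

```python
import json
import os
from pathlib import Path

DEFAULTS = {
    "mealie_url": "",
    "mealie_api_key": "",
    "tandoor_url": "",
    "tandoor_api_key": "",
}


def config_path() -> Path:
    """Resolve config.json under DATA_DIR (defaults to /data)."""
    return Path(os.environ.get("DATA_DIR", "/data")) / "config.json"


def load_config() -> dict:
    """Read the stored settings, falling back to defaults for anything missing."""
    try:
        stored = json.loads(config_path().read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **{k: v for k, v in stored.items() if k in DEFAULTS}}


def save_config(values: dict) -> None:
    """Write settings to a temp file, then swap it into place."""
    path = config_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({**DEFAULTS, **values}, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # avoids leaving a half-written config.json
```

Merging with the defaults on load means a config file written by an older version still yields every expected key.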

Deployment

Docker Compose

services:
  recipe-importer:
    image: gitea.dooplex.hu/admin/recipe-importer:0.2.0
    container_name: recipe-importer
    restart: unless-stopped
    ports:
      - "8011:8000"
    volumes:
      - recipe-data:/data
    environment:
      - SECRET_KEY=change-me-in-production
      - MEALIE_INTERNAL_URL=http://mealie:9000
      - TANDOOR_INTERNAL_URL=http://tandoor:8080

volumes:
  recipe-data:

Environment Variables

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | recipe-importer-dev-key | Flask session secret |
| DATA_DIR | /data | Persistent storage path |
| VERSION | dev | Shown in the UI navbar |
| MEALIE_INTERNAL_URL | (empty) | Docker-internal Mealie URL (e.g. http://mealie:9000) to avoid Cloudflare hairpin |
| TANDOOR_INTERNAL_URL | (empty) | Docker-internal Tandoor URL (e.g. http://tandoor:8080) to avoid Cloudflare hairpin |
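
One plausible way the internal-URL variables interact with the configured URLs is shown below. The precedence rule (internal URL wins when set) and the name effective_base_url are assumptions; the README only states that the internal URL exists to avoid the Cloudflare hairpin.

```python
import os


def effective_base_url(configured_url: str, env_var: str) -> str:
    """Prefer the Docker-internal URL from the environment, else the configured one."""
    internal = os.environ.get(env_var, "").strip()
    return (internal or configured_url).rstrip("/")
```

With MEALIE_INTERNAL_URL set, API calls would go straight to the Mealie container over the Docker network, while the public mealie_url remains available for links shown in the UI.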

Building

On the build server (kisfenyo@192.168.0.180):

cd ~/build/recipe-importer
./build.sh X.X.X --push

Web UI

The UI is in Hungarian and uses a dark theme. The workflow is:

  1. Settings (/settings) — Configure Mealie and/or Tandoor connection (URL + API key), test each connection
  2. Import (/import) — Paste a recipe URL, click "Beolvasás" (Scrape)
  3. Review — Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions, manage tags (add/remove/search existing)
  4. Send — Click "Importálás Mealie-be" and/or "Importálás Tandoor-ba" to push to your configured services

Tech Stack

  • Runtime: Python 3.12 (slim)
  • Web framework: Flask 3.1 + Gunicorn
  • HTML parsing: BeautifulSoup 4 + lxml
  • HTTP client: requests
  • Container: ~60 MB image