
Recipe Importer

Docker container for importing recipes from Hungarian websites into Mealie and Tandoor Recipes.

Problem: Mealie's and Tandoor's built-in URL import cannot parse ingredients and instructions from Hungarian recipe sites like mindmegette.hu.

Solution: This container provides a web UI that scrapes Hungarian recipe pages with site-specific parsers, lets you review and edit the extracted data, then pushes it to Mealie and/or Tandoor via their REST APIs.

Architecture

┌──────────────────────────────────────────────────────┐
│  recipe-importer container (:8000)                   │
│                                                      │
│  Flask + Gunicorn                                    │
│  ├── /settings       → Configure Mealie & Tandoor    │
│  ├── /import         → Paste URL, scrape, review     │
│  ├── /scrape         → AJAX: parse recipe HTML       │
│  ├── /send           → AJAX: push to Mealie API      │
│  ├── /send-tandoor   → AJAX: push to Tandoor API     │
│  ├── /tags           → AJAX: list tags from both     │
│  └── /health         → Health check                  │
│                                                      │
│  Modules:                                            │
│  ├── app/config.py   → JSON config persistence       │
│  ├── app/scraper.py  → Site-specific parsers         │
│  ├── app/mealie.py   → Mealie REST API client        │
│  └── app/tandoor.py  → Tandoor REST API client       │
└───────────────────┬──────────────┬───────────────────┘
                    │ HTTP         │ HTTP
                    ▼              ▼
         ┌──────────────┐  ┌───────────────┐
         │  Mealie       │  │  Tandoor      │
         │  POST /api/.. │  │  POST /api/.. │
         │  PUT /api/..  │  │  PUT /api/..  │
         └──────────────┘  └───────────────┘

Supported Sites

| Site | Ingredients | Instructions | Image | Tags |
|---|---|---|---|---|
| mindmegette.hu | Yes | Yes | Yes | Yes |
| streetkitchen.hu | Yes (with groups) | Yes (ol/ul/paragraph) | Yes | Yes (from JSON-LD categories) |
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| Other sites | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |

Mindmegette.hu Parser

Extracts data from the Angular-rendered HTML:

  • Title: og:title meta tag, with | Mindmegette.hu suffix stripped
  • Description: og:description meta tag
  • Image: og:image meta tag
  • Ingredients: div.ingredients div.ingredients-meta rows, each containing <strong> (qty), <span> (unit), <a class="ingredients-link"> (food), <small> (extra)
  • Ingredient groups: Multiple div.ingredients containers; group title via <strong class="ingredients-group">
  • Instructions: mindmegette-wysiwyg-box ol > li elements
  • Tags: <a class="tag"> elements inside div.desktop-wrapper
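The OpenGraph part of this extraction can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the real parser works on a BeautifulSoup tree, while this sketch swaps in html.parser to stay dependency-free, and the names OpenGraphParser and extract_og are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first occurrence of each property.
            self.og.setdefault(prop[3:], attr["content"])


def extract_og(html, title_suffix="| Mindmegette.hu"):
    """Return title, description and image, with the site suffix stripped from the title."""
    parser = OpenGraphParser()
    parser.feed(html)
    title = parser.og.get("title", "").strip()
    if title.endswith(title_suffix):
        title = title[: -len(title_suffix)].rstrip()
    return {
        "title": title,
        "description": parser.og.get("description", ""),
        "image": parser.og.get("image", ""),
    }
```

Passing a different title_suffix (for example "| Street Kitchen") would cover the same step for the other site-specific parsers.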

Streetkitchen.hu Parser

Extracts data from the Next.js-rendered HTML:

  • Title: og:title meta tag, with | Street Kitchen suffix stripped
  • Description: og:description meta tag
  • Image: og:image meta tag (CDN URL)
  • Ingredients: div.grid.grid-cols-1 container → div.my-2.flex rows; quantity+unit merged in first <div> (split via regex), food in <div class="font-bold">, optional extra in parenthesised <div>
  • Ingredient groups: <h5> headers inside section divs (e.g. "Az előfőzéshez", "A sütéshez")
  • Instructions: Three formats handled — <ol> ordered list, <ul> unordered list, or plain <p> paragraphs (with optional <strong> section headers)
  • Tags: recipeCategory field from the JSON-LD @graph → Recipe object (comma-separated)
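Splitting the merged quantity+unit cell might look like the sketch below. The README does not show the actual regex, so the pattern here is an assumption covering integers, decimals with "." or ",", simple fractions and ranges; split_qty_unit is a hypothetical name.

```python
import re

# Leading number (integer, decimal, fraction like 1/2, or range like 2-3),
# followed by whatever remains as the unit.
_QTY_UNIT = re.compile(
    r"^\s*(\d+(?:[.,]\d+)?(?:\s*[-/]\s*\d+(?:[.,]\d+)?)?)\s*(.*)$"
)


def split_qty_unit(text):
    """Split a merged cell such as '1,5 dl' into (quantity, unit) strings."""
    match = _QTY_UNIT.match(text)
    if not match:
        # No leading number: treat the whole cell as the unit (e.g. 'csipet').
        return "", text.strip()
    quantity = re.sub(r"\s+", "", match.group(1))
    return quantity, match.group(2).strip()
```

The quantity is kept as a string so that ranges and fractions survive until the review step, where the user can correct them.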

Nosalty.hu Parser

Extracts data from the nosalty.hu recipe pages:

  • Title: og:title meta tag
  • Description: Story text from div#recipe-story > p (nosalty has no dedicated description field)
  • Image: og:image meta tag
  • Ingredients: Scoped to div#ingredients to avoid per-serving/nutrition duplicates; ul.m-list__list > li.m-list__item rows with <span> (qty+unit), <a class="a-link"> (food), optional trailing <span> (extra notes in parentheses)
  • Ingredient groups: <h3 class="m-list__title"> headers between <ul> lists
  • Instructions: div#select ol.m-list__list > li.m-list__item steps; optional <h4 class="m-list__title"> section headers
  • Tags: <a class="m-tags__tagItem"> inside div.p-recipe__attributeList

Generic Fallback Parser

For unsupported sites, attempts extraction via:

  1. Schema.org JSON-LD @type: Recipe blocks (recipeIngredient, recipeInstructions, keywords)
  2. OpenGraph meta tags for title, description, image
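A rough sketch of the JSON-LD step, using only the standard library. The function names are hypothetical, and the regex-based script extraction is a simplification of what a BeautifulSoup-based parser would do. It assumes that recipeInstructions may be a string, a list of strings, or a list of HowToStep objects, and that keywords may be a comma-separated string or a list.

```python
import json
import re

_LD_SCRIPT = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def _is_recipe(node):
    node_type = node.get("@type")
    return node_type == "Recipe" or (
        isinstance(node_type, list) and "Recipe" in node_type
    )


def _iter_nodes(data):
    """Yield dict nodes from a JSON-LD value, descending into lists and @graph."""
    if isinstance(data, list):
        for item in data:
            yield from _iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from _iter_nodes(data.get("@graph", []))


def find_recipe(html):
    """Return the first JSON-LD node typed as Recipe, or None."""
    for raw in _LD_SCRIPT.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the import.
        for node in _iter_nodes(data):
            if _is_recipe(node):
                return node
    return None


def instructions_text(recipe):
    """Flatten recipeInstructions into a list of step strings."""
    steps = recipe.get("recipeInstructions", [])
    if isinstance(steps, str):
        return [steps]
    texts = []
    for step in steps:
        if isinstance(step, str):
            texts.append(step)
        elif isinstance(step, dict) and step.get("text"):
            texts.append(step["text"])
    return texts


def keywords(recipe):
    """Return keywords as a clean list of strings."""
    value = recipe.get("keywords", [])
    if isinstance(value, str):
        value = value.split(",")
    return [k.strip() for k in value if isinstance(k, str) and k.strip()]
```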

Adding a New Site Parser

  1. Create a parser function in app/scraper.py with the @_register("hostname") decorator
  2. The function receives (soup: BeautifulSoup, url: str) and returns the standard recipe dict
  3. The hostname substring is matched against the URL — first match wins, unmatched URLs use the generic fallback
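The registration mechanism described above could be implemented roughly as follows. Only the @_register("hostname") decorator and the (soup, url) signature come from this README; the _PARSERS list, pick_parser, the example hostname, and matching against the host part of the URL are assumptions made for illustration.

```python
from urllib.parse import urlparse

# (hostname substring, parser function) pairs in registration order.
_PARSERS = []


def _register(hostname):
    """Decorator that registers a parser for URLs whose host contains `hostname`."""
    def decorator(func):
        _PARSERS.append((hostname, func))
        return func
    return decorator


def parse_generic(soup, url):
    # Placeholder for the generic JSON-LD / OpenGraph fallback.
    return {"source": "generic", "url": url}


@_register("example-recipes.hu")  # hypothetical site
def parse_example(soup, url):
    # A real parser would read title, ingredients, etc. from `soup`.
    return {"source": "example-recipes.hu", "url": url}


def pick_parser(url):
    """Return the first registered parser whose hostname matches, else the fallback."""
    host = urlparse(url).netloc.lower()
    for hostname, func in _PARSERS:
        if hostname in host:
            return func
    return parse_generic
```

Because the first match wins, a more specific hostname should be registered before a broader one that also matches it.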

Mealie API Integration

The importer uses the Mealie REST API:

  1. POST /api/recipes — create a stub recipe (returns slug)
  2. PATCH /api/recipes/{slug} — populate structured ingredients (with unit/food IDs), instructions, description, orgURL
  3. PUT /api/recipes/{slug}/image — upload the recipe image
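The first two calls can be sketched as below. This is an illustration under stated assumptions rather than the contents of app/mealie.py: it uses urllib instead of requests to stay dependency-free, the helper names are hypothetical, the POST response is assumed to be the new slug as a JSON string, and the PATCH field names (description, orgURL, recipeInstructions) should be checked against your Mealie version. The image upload is left out.

```python
import json
import urllib.request


def http_json(method, url, token, payload):
    """Send a JSON request with a Bearer token and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def import_to_mealie(base_url, token, recipe, call=http_json):
    """Create a stub recipe, then populate it; returns the slug."""
    base = base_url.rstrip("/")
    # Step 1: create the stub (response assumed to be the slug).
    slug = call("POST", f"{base}/api/recipes", token, {"name": recipe["name"]})
    # Step 2: fill in the details on the stub.
    call("PATCH", f"{base}/api/recipes/{slug}", token, {
        "description": recipe.get("description", ""),
        "orgURL": recipe["url"],
        "recipeInstructions": [
            {"text": text} for text in recipe.get("instructions", [])
        ],
    })
    # Step 3 (PUT .../image) is omitted from this sketch.
    return slug
```

The call parameter exists so the flow can be exercised without a running Mealie instance by passing a stub function.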

Structured ingredients: The client resolves unit and food names to Mealie database IDs. Missing units/foods are created automatically via the API. Ingredient groups are supported via the title field on the first ingredient of each group.
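As an illustration of the resolve-or-create step and the group title placement, consider the sketch below. The function names and the input shape (a list of groups, each with a title and items) are hypothetical, and the create callbacks stand in for the API calls that create a missing unit or food and return its ID.

```python
def resolve_id(name, known, create):
    """Return the ID for `name`, calling `create(name)` when it is not yet known."""
    key = name.strip().lower()
    if key not in known:
        known[key] = create(name.strip())
    return known[key]


def to_mealie_ingredients(groups, units, foods, create_unit, create_food):
    """Flatten ingredient groups into Mealie-style ingredient rows."""
    rows = []
    for group in groups:
        for index, item in enumerate(group["items"]):
            unit = item.get("unit")
            rows.append({
                # Only the first row of a group carries the group title.
                "title": group.get("title", "") if index == 0 else "",
                "quantity": item.get("quantity"),
                "unit": (
                    {"id": resolve_id(unit, units, create_unit), "name": unit}
                    if unit else None
                ),
                "food": {
                    "id": resolve_id(item["food"], foods, create_food),
                    "name": item["food"],
                },
                "note": item.get("note", ""),
            })
    return rows
```

The units and foods dictionaries act as a lowercase name-to-ID cache, so each missing entry is created only once per import.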

Authentication uses a long-lived API token (Bearer header), created in Mealie at Profile → API Tokens.

Tandoor API Integration

The importer uses the Tandoor REST API:

  1. POST /api/recipe/ — create the full recipe in one call (name, description, source_url, steps with nested ingredients)
  2. PUT /api/recipe/{id}/image/ — upload the recipe image

Step-based ingredients: Tandoor nests ingredients inside steps. All ingredients are attached to the first step. Units and foods are auto-created by name (no separate resolution needed). Ingredient groups use is_header: true on a header entry.
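A payload built along these lines might look like the following sketch. The function name and the input shape are hypothetical. The ingredient field names (amount, no_amount, note) and the use of note for the header text are assumptions about the Tandoor schema that should be verified against your Tandoor version; only name, description, source_url, steps, keywords and is_header are named in this README.

```python
def to_tandoor_recipe(recipe):
    """Build a single-call Tandoor payload with all ingredients on the first step."""
    ingredients = []
    for group in recipe.get("groups", []):
        if group.get("title"):
            # Header entry marking the start of an ingredient group.
            ingredients.append({
                "is_header": True,
                "note": group["title"],
                "food": None,
                "unit": None,
                "amount": 0,
                "no_amount": True,
            })
        for item in group["items"]:
            ingredients.append({
                "is_header": False,
                # Units and foods are referenced by name only.
                "food": {"name": item["food"]},
                "unit": {"name": item["unit"]} if item.get("unit") else None,
                "amount": item.get("quantity") or 0,
                "no_amount": not item.get("quantity"),
                "note": item.get("note", ""),
            })

    # Keep at least one step so the ingredients have somewhere to live.
    texts = recipe.get("instructions") or [""]
    steps = [
        {"instruction": text, "ingredients": ingredients if index == 0 else []}
        for index, text in enumerate(texts)
    ]
    return {
        "name": recipe["name"],
        "description": recipe.get("description", ""),
        "source_url": recipe["url"],
        "steps": steps,
        "keywords": [{"name": tag} for tag in recipe.get("tags", [])],
    }
```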

Duplicate detection: Before import, searches Tandoor by title and checks the source_url field to detect already-imported recipes.
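The comparison step of this check can be sketched as a pure function over the search results. The names are hypothetical, and the URL normalisation (trimming whitespace and a trailing slash) is an assumption; the README only states that the source_url field is compared.

```python
def normalize_url(url):
    """Trim whitespace and a trailing slash so trivially different URLs compare equal."""
    return url.strip().rstrip("/")


def find_duplicate(search_results, source_url):
    """Return the first search result whose source_url matches, or None."""
    wanted = normalize_url(source_url)
    for result in search_results:
        if normalize_url(result.get("source_url") or "") == wanted:
            return result
    return None
```

Matching on source_url rather than on the title alone avoids flagging two different recipes that happen to share a name.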

Authentication uses an API token (Bearer header), created in Tandoor at Settings → API Browser → Auth Token.

Tag Management

Tags are scraped from recipe pages and shown as editable chips in the UI. Users can:

  • Remove scraped tags that are irrelevant
  • Search existing tags from Mealie and Tandoor (fetched via the importer's GET /tags endpoint)
  • Add custom tags by typing and pressing Enter

Tags are sent to both services on import:

  • Mealie: Tags are created via POST /api/organizers/tags if they don't exist, then attached to the recipe in the PATCH payload
  • Tandoor: Keywords are auto-created by including keywords: [{"name": "..."}] in the recipe POST
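For the Mealie side, deciding which tags to create and which to reuse could be done as in this sketch. The function name is hypothetical, the case-insensitive comparison is an assumption, and the existing tags are assumed to be dictionaries with at least a name key.

```python
def plan_mealie_tags(wanted, existing):
    """Split wanted tag names into existing tag objects to attach and names to create."""
    by_name = {tag["name"].strip().lower(): tag for tag in existing}
    attach, create = [], []
    seen = set()
    for name in wanted:
        key = name.strip().lower()
        if not key or key in seen:
            continue  # Skip blanks and duplicates from the chip editor.
        seen.add(key)
        if key in by_name:
            attach.append(by_name[key])
        else:
            create.append(name.strip())
    return attach, create
```

Each name in the second list would be created via POST /api/organizers/tags, and the resulting tag objects appended to the first list before it goes into the PATCH payload.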

Configuration

All settings are persisted to /data/config.json (mounted as a Docker volume).

| Setting | Description |
|---|---|
| mealie_url | Full URL to Mealie instance (e.g. https://mealie.example.com) |
| mealie_api_key | Mealie API token |
| tandoor_url | Full URL to Tandoor instance (e.g. https://recipes.example.com) |
| tandoor_api_key | Tandoor API token |
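JSON persistence for these four settings might be implemented as below. This is a sketch, not the contents of app/config.py: the function names, the empty-string defaults, and the write-to-temp-then-rename strategy are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

DEFAULTS = {
    "mealie_url": "",
    "mealie_api_key": "",
    "tandoor_url": "",
    "tandoor_api_key": "",
}


def load_config(path):
    """Load settings, falling back to defaults for a missing or unreadable file."""
    file = Path(path)
    if not file.exists():
        return dict(DEFAULTS)
    try:
        stored = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return dict(DEFAULTS)
    if not isinstance(stored, dict):
        return dict(DEFAULTS)
    # Ignore unknown keys and fill in any that are missing.
    return {**DEFAULTS, **{k: v for k, v in stored.items() if k in DEFAULTS}}


def save_config(path, config):
    """Write settings via a temporary file so a crash cannot leave a half-written config."""
    file = Path(path)
    file.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=file.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    os.replace(tmp_name, file)
```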

Deployment

Docker Compose

services:
  recipe-importer:
    image: gitea.dooplex.hu/admin/recipe-importer:0.2.0
    container_name: recipe-importer
    restart: unless-stopped
    ports:
      - "8011:8000"
    volumes:
      - recipe-data:/data
    environment:
      - SECRET_KEY=change-me-in-production
      - MEALIE_INTERNAL_URL=http://mealie:9000
      - TANDOOR_INTERNAL_URL=http://tandoor:8080

volumes:
  recipe-data:

Environment Variables

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | recipe-importer-dev-key | Flask session secret |
| DATA_DIR | /data | Persistent storage path |
| VERSION | dev | Shown in the UI navbar |
| MEALIE_INTERNAL_URL | (empty) | Docker-internal Mealie URL (e.g. http://mealie:9000) to avoid Cloudflare hairpin |
| TANDOOR_INTERNAL_URL | (empty) | Docker-internal Tandoor URL (e.g. http://tandoor:8080) to avoid Cloudflare hairpin |
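One way to read these variables and apply the internal-URL override is sketched below. The function names are hypothetical, and the rule that a non-empty internal URL takes precedence over the URL saved in the settings is an assumption based on the descriptions above.

```python
import os


def load_env(environ=os.environ):
    """Read the container's environment variables with their documented defaults."""
    return {
        "secret_key": environ.get("SECRET_KEY", "recipe-importer-dev-key"),
        "data_dir": environ.get("DATA_DIR", "/data"),
        "version": environ.get("VERSION", "dev"),
        "mealie_internal_url": environ.get("MEALIE_INTERNAL_URL", ""),
        "tandoor_internal_url": environ.get("TANDOOR_INTERNAL_URL", ""),
    }


def api_base(configured_url, internal_url):
    """Prefer the Docker-internal URL for API calls when one is set."""
    return (internal_url or configured_url).rstrip("/")
```

With MEALIE_INTERNAL_URL=http://mealie:9000 set, API calls stay on the Docker network, while the public URL from the settings can still be used for links shown to the user.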

Building

On the build server (kisfenyo@192.168.0.180):

cd ~/build/recipe-importer
./build.sh X.X.X --push

Web UI

The UI is in Hungarian and uses a dark theme. The workflow is:

  1. Settings (/settings) — Configure Mealie and/or Tandoor connection (URL + API key), test each connection
  2. Import (/import) — Paste a recipe URL, click "Beolvasás" (Scrape)
  3. Review — Edit structured ingredients (4-column: quantity, unit, food, note), add/remove ingredient groups, edit instructions, manage tags (add/remove/search existing)
  4. Send — Click "Importálás Mealie-be" and/or "Importálás Tandoor-ba" to push to your configured services

Tech Stack

  • Runtime: Python 3.12 (slim)
  • Web framework: Flask 3.1 + Gunicorn
  • HTML parsing: BeautifulSoup 4 + lxml
  • HTTP client: requests
  • Container: ~60 MB image