v0.8.0: gastrohobbi.hu parser, fix ingredient fraction parsing
Add gastrohobbi.hu parser (WPBakery page builder layout): ingredients with groups, instructions with embedded lists, tags from JSON-LD articleSection, prep time extraction.

Fix ingredient line parser: fractions like "1/2" no longer split due to regex backtracking, en-dash ranges normalized, unicode fractions (½¼¾) recognized as quantity start across all parsers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
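The regex change itself is not shown in this diff. As a rough illustration of the behavior the message describes, here is a minimal sketch of one way to split a leading quantity from an ingredient line. The names `QUANTITY_RE` and `split_quantity` are hypothetical and are not taken from the project; the pattern is an assumption, not the committed fix.

```python
import re

# Hypothetical pattern, not the project's actual regex.
# Alternatives are ordered most specific first, so "1/2" is consumed as a
# whole fraction before the plain-number branch can claim the leading "1".
QUANTITY_RE = re.compile(
    r"""^\s*(?P<qty>
        \d+\s*-\s*\d+                 # range, after en-dash normalization
      | \d+\s+\d+/\d+                 # mixed number such as 1 1/2
      | \d+/\d+                       # simple fraction such as 1/2
      | \d+(?:[.,]\d+)?\s*[½¼¾]?      # integer or decimal, optional unicode fraction
      | [½¼¾]                         # bare unicode fraction
    )\s*(?P<rest>.*)$""",
    re.VERBOSE,
)


def split_quantity(line: str):
    """Return (quantity_text, rest); quantity_text is None without a leading quantity."""
    # Normalize en dashes to hyphens so ranges like 2–3 match the range branch.
    # For simplicity this replaces en dashes anywhere in the line.
    normalized = line.replace("\u2013", "-")
    match = QUANTITY_RE.match(normalized)
    if match is None:
        return None, normalized.strip()
    return match.group("qty").strip(), match.group("rest").strip()
```

For example, `split_quantity("1/2 kg liszt")` keeps the fraction together as the quantity, and a line with no leading number or fraction comes back with `None` as its quantity.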
@@ -45,6 +45,7 @@ Docker container for importing recipes from Hungarian websites into [Mealie](htt
| nosalty.hu | Yes (with groups) | Yes (with section headers) | Yes | Yes |
| sobors.hu | Yes (with groups) | Yes (with section headers, follows linked recipes) | Yes | Yes |
| kiskegyed.hu | Yes (with groups, dual measurements) | Yes (follows sobors.hu links) | Yes | Yes |
| gastrohobbi.hu | Yes (with groups) | Yes (with embedded lists) | Yes | Yes (from JSON-LD categories) |
| *Other sites* | Fallback (schema.org JSON-LD) | Fallback (schema.org JSON-LD) | Yes (og:image) | Fallback (schema.org keywords) |

### Mindmegette.hu Parser
@@ -111,6 +112,19 @@ Extracts data from kiskegyed.hu recipe pages:
- **Cross-site links**: Pages linking to sobors.hu are followed to get the full recipe
- **Tags**: `section.tags > a > span` (# prefix stripped, "recept" filtered)

### GastroHobbi.hu Parser

Extracts data from gastrohobbi.hu recipe pages (WPBakery page builder layout):

- **Title**: `h1.mpcth-post-title > span.mpcth-color-main-border`
- **Description**: First `<p>` in the first `wpb_text_column` before the recipe columns; falls back to `og:description`
- **Image**: `og:image` meta tag
- **Ingredients**: Finds `h3` containing "Hozzávalók:", then walks sibling `<ul>` elements; items from `li > p` or `li` directly
- **Ingredient groups**: Plain `<h3>` elements between ingredient lists (e.g. "A csipetkéhez:")
- **Instructions**: `<p>` elements following the "Elkészítés:" `h3`; embedded `<ul>` items rendered as bullet points
- **Prep time**: Extracted from "Elkészítési idő:" `h3`, appended to description
- **Tags**: JSON-LD `Article.articleSection` array (site uses Article schema, not Recipe)

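The tag rule above can be illustrated with a short sketch. It assumes the page embeds a `<script type="application/ld+json">` block with an `Article` node, as the list describes. The function name `extract_article_sections` is hypothetical, and the regex-based script lookup is only there to keep the example dependency-free; the project's parser may well use a DOM library instead.

```python
import json
import re

# Captures the body of each JSON-LD <script> element (sketch only; a real
# parser would normally read this from a parsed DOM).
JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_article_sections(html: str) -> list[str]:
    """Collect articleSection values from Article nodes in JSON-LD blocks."""
    tags: list[str] = []
    for raw in JSON_LD_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        # A block may hold one object, a list of objects, or an @graph wrapper.
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        else:
            nodes = data
        if not isinstance(nodes, list):
            nodes = [nodes]
        for node in nodes:
            # Only plain "Article" types are handled in this sketch.
            if not isinstance(node, dict) or node.get("@type") != "Article":
                continue
            sections = node.get("articleSection", [])
            if isinstance(sections, str):
                sections = [sections]  # a single section may be a bare string
            tags.extend(
                s.strip() for s in sections if isinstance(s, str) and s.strip()
            )
    return tags
```

Because the site publishes `Article` rather than `Recipe` markup, a lookup like this reads `articleSection` instead of the `keywords` field used by the generic fallback.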
### Generic Fallback Parser

For unsupported sites, attempts extraction via:
