# German Medical/Government Site Extraction via Camofox

Session-specific notes from researching cardiac catheter ablation (Herzablation) for the user's father, 16.06.2026.

## Sites that resist automated extraction

| Site | Blocker | jina.ai | Camofox /click | Notes |
|------|---------|---------|----------------|-------|
| herzstiftung.de | Cookiebot consent wall | 403 | Timeout | Banner dominates screenshots |
| praktischarzt.de | Cookiebot + ad overlay | 403 | Timeout | Page renders but content hidden behind consent |
| aerzteblatt.de | Consent wall / paywall | 403 | Renders consent page only | Requires "Einwilligen und weiter" |
| dgk.org | Consent/tracking wall | 403 | Homepage loads, deep content blocked | Guidelines section not directly reachable |
| herzmedizin.de | Heavy cookie wall | 403 | Empty links | |
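
The table can be encoded as a small lookup so an agent recognizes a known-blocked site before spending time on it. The following is a minimal sketch: `KNOWN_BLOCKERS` and `known_blocker` are hypothetical names, and the entries only mirror what was observed in this one session, so they may go stale.

```python
from typing import Optional
from urllib.parse import urlparse

# Observations copied from the table above; they reflect a single session only.
KNOWN_BLOCKERS = {
    "herzstiftung.de": "Cookiebot consent wall",
    "praktischarzt.de": "Cookiebot + ad overlay",
    "aerzteblatt.de": "Consent wall / paywall",
    "dgk.org": "Consent/tracking wall",
    "herzmedizin.de": "Heavy cookie wall",
}


def known_blocker(url: str) -> Optional[str]:
    """Return the recorded blocker for the URL's site, or None if none is recorded."""
    host = (urlparse(url).hostname or "").lower()
    for domain, blocker in KNOWN_BLOCKERS.items():
        # Match the bare domain and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return blocker
    return None
```

A hit does not prove the page is unreadable today; it only signals that the agent should be ready to report the block to the user.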

## What works

1. **Wikipedia (DE/EN)** loads cleanly in Camofox and gives readable screenshots.
   - `https://de.wikipedia.org/wiki/Katheterablation`
   - Cookie banner is present but article text is still visible above/below it.
   - Vision extraction of screenshots is reliable enough for general patient education.

2. **DuckDuckGo search results** via Camofox `/links` are the fastest way to discover target URLs without fighting Google CAPTCHA.

3. **Fallback chain for blocked medical sites:**
   - Try `r.jina.ai/http://URL` first
   - If 403, open URL in Camofox
   - Wait for the page to load, then try to dismiss the cookie banner by clicking common consent button labels (`Zustimmen`, `Alle akzeptieren`, `Cookies zulassen`, `Einwilligen und weiter`)
   - If dismissal times out or fails, take screenshot anyway
   - Use `vision_analyze` to extract whatever article text is visible
   - If the banner dominates the screenshot, the site is effectively unreadable by automation. Tell the user explicitly and recommend official patient info sources instead of guessing.
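
The fallback chain can be sketched as a single function. This is an illustration under stated assumptions, not the Camofox or jina.ai API: the four callables (`fetch_reader`, `open_page`, `click_label`, `screenshot_text`) are hypothetical wrappers the caller has to supply around the real tools, and the cap of three dismissal attempts follows the Pitfalls section.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Consent button labels from the list above.
CONSENT_LABELS = [
    "Zustimmen",
    "Alle akzeptieren",
    "Cookies zulassen",
    "Einwilligen und weiter",
]


@dataclass
class Extraction:
    status: str            # "reader", "vision", or "blocked"
    text: Optional[str]    # extracted text, or None when blocked
    note: str              # what to tell the user about reliability


def extract(
    url: str,
    fetch_reader: Callable[[str], Optional[str]],  # hypothetical: text, or None on 403
    open_page: Callable[[str], None],              # hypothetical: open URL in the browser
    click_label: Callable[[str], bool],            # hypothetical: True if the click worked
    screenshot_text: Callable[[], Optional[str]],  # hypothetical: screenshot + vision text
    max_attempts: int = 3,
) -> Extraction:
    # Step 1: try the reader proxy first.
    text = fetch_reader(url)
    if text:
        return Extraction("reader", text, "fetched via reader proxy")

    # Step 2: open the page in the browser and try a few consent labels.
    open_page(url)
    dismissed = False
    for label in CONSENT_LABELS[:max_attempts]:
        if click_label(label):
            dismissed = True
            break

    # Step 3: take the screenshot regardless of whether dismissal worked.
    visible = screenshot_text()
    if visible:
        state = "consent dismissed" if dismissed else "consent banner still present"
        return Extraction("vision", visible, state + "; vision text is approximate")

    # Step 4: nothing readable, so report the block instead of guessing.
    return Extraction("blocked", None, "content exists but is not accessible to automation")
```

Passing the tool calls in as parameters keeps the control flow testable with stubs and avoids hard-coding any endpoint or request format that these notes do not document.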

## Pitfalls

- Do not trust vision-extracted medical text for precise numbers or legal/medical decisions. The vision model paraphrases and can hallucinate exact statistics. Use its output only for general orientation and always state the uncertainty.
- Do not keep retrying cookie dismissal. If three common consent labels fail, treat the site as actively resisting automation and move on.
- Do not present blocked-site failures as "the article does not exist." The content exists behind the wall; it is simply not accessible to the agent.
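
One way to act on the first pitfall is to mark every sentence of vision-extracted text that contains a number, so those sentences are reported as unverified. The sketch below is a rough heuristic, not a validated method: `flag_numeric_sentences` is a hypothetical helper, and the naive sentence split will also break after abbreviations such as "ca.".

```python
import re
from typing import List, Tuple

# Integers or decimals (with "." or ","), optionally followed by a percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?\s*%?")


def flag_numeric_sentences(text: str) -> List[Tuple[str, bool]]:
    """Split text into sentences and flag those that contain a number."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(s, bool(NUMBER.search(s))) for s in sentences if s]
```

Flagged sentences can then be passed on with an explicit note that the figure came from a screenshot and needs checking against the original source.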

## Recommended approach for medical research

1. Start with **Wikipedia** for baseline terminology and risks.
2. Use **patient education portals** (Herzstiftung, BZgA, Gesundheit.gv.at) but be ready to tell the user when the site blocks automation.
3. For authoritative clinical guidance, direct the user to ask the treating clinic for written patient information or to consult the official DGK/ESC guideline PDFs.
4. Always end with a practical checklist the user can take to the doctor, rather than a vague literature summary.
