# Multi-Source Web Research via Camofox

Pattern for researching a topic comprehensively across multiple sites simultaneously — used for gift ideas, product comparisons, travel planning, etc.

## Workflow

### Phase 1: Parallel Search Tabs
Open 4–6 Google search tabs simultaneously with different query angles (different keywords, languages, facets):

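```bash
# Each entry is "label|URL". URLs must already be URL-encoded: the URL is
# pasted into the JSON body as-is, so a raw quote would break the request.
for name_url in "query1|url1" "query2|url2"; do
  name="${name_url%%|*}"   # text before the first "|"
  url="${name_url#*|}"     # text after the first "|"
  echo "== $name"          # label each response so its tab ID can be recorded
  curl -sS -X POST http://127.0.0.1:9377/tabs \
    -H "Content-Type: application/json" \
    -d "{\"userId\":\"kiwi\",\"sessionKey\":\"default\",\"url\":\"$url\"}"
  echo                     # newline after each JSON response
done
```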

Save the tab ID from each response to a temp file (`/tmp/hermes_gifts/tabs.json`); every later phase addresses tabs by ID.
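One way to record the IDs, sketched under the assumption that each `POST /tabs` response contains the new tab's UUID somewhere in its body. The `record_tab` and `tab_id` helpers are hypothetical: `record_tab` greps out the first UUID and appends it with a label to a tab-separated file (a plain-text stand-in for `tabs.json` that avoids a `jq` dependency).

```bash
# Hypothetical tab-ID bookkeeping; TABS_FILE is a plain "label<TAB>uuid" list.
TABS_FILE=${TABS_FILE:-/tmp/hermes_gifts/tabs.tsv}

# record_tab LABEL < response: assumes the tab ID appears as a standard UUID.
record_tab() {
  rt_label=$1
  rt_id=$(grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' | head -n 1)
  if [ -z "$rt_id" ]; then
    echo "no tab ID found for $rt_label" >&2
    return 1
  fi
  mkdir -p "$(dirname "$TABS_FILE")"
  printf '%s\t%s\n' "$rt_label" "$rt_id" >> "$TABS_FILE"
}

# tab_id LABEL: print the first saved tab ID for that label.
tab_id() {
  awk -F "$(printf '\t')" -v want="$1" '$1 == want { print $2; exit }' "$TABS_FILE"
}
```

Inside the Phase 1 loop, pipe each response into it (`curl ... | record_tab "$name"`), then look tabs up later with `TAB=$(tab_id query1)`.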

### Phase 2: Dismiss Cookie Banners
Google and many EU sites show cookie consent overlays. Dismiss them by clicking the accept button; on German pages it reads "Alle akzeptieren" ("Accept all"):

```bash
curl -sS -X POST "http://127.0.0.1:9377/tabs/$TAB/click" \
  -H "Content-Type: application/json" \
  -d '{"userId":"kiwi","sessionKey":"default","selector":"button:has-text(\"Alle akzeptieren\")"}'
```

Wait 2–3 seconds after clicking so the overlay can close, then take a screenshot to confirm it is gone.
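A sketch that tries more than one consent label and applies the pause. Assumptions: `consent_click_body` and `dismiss_consent` are hypothetical helpers, the English label `Accept all` is assumed for non-German pages, and Camofox is assumed to answer a failed or timed-out click with an HTTP error status so that `curl -f` exits non-zero. The `--max-time 10` bound keeps a hanging Cookiebot click from stalling the run.

```bash
# consent_click_body LABEL: JSON body for /click, same selector style as above.
consent_click_body() {
  printf '{"userId":"kiwi","sessionKey":"default","selector":"button:has-text(\\"%s\\")"}' "$1"
}

# dismiss_consent TAB_ID: try each label in turn; pause after the first success.
dismiss_consent() {
  for dc_label in "Alle akzeptieren" "Accept all"; do
    if curl -sS -f --max-time 10 -X POST "http://127.0.0.1:9377/tabs/$1/click" \
        -H "Content-Type: application/json" \
        -d "$(consent_click_body "$dc_label")" >/dev/null; then
      sleep 3   # give the overlay time to close before the screenshot
      return 0
    fi
  done
  return 1      # no label worked; fall back to screenshot + vision_analyze
}
```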

### Phase 3: Extract Search Results
- **Google results pages**: Take a screenshot and run vision_analyze on it; the links endpoint returns only ad and shopping links, not organic results.
- **Known-good content sites**: Use Jina.ai Reader directly (faster, cleaner markdown, no browser needed).
- **Shopping-heavy queries**: Google increasingly places ads above organic results. If the vision analysis shows mostly shopping ads, switch to Jina on known gift-guide sites instead.
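The routing rule above can be written as a small dispatcher. `pick_extractor` is hypothetical: its domain list is copied from the source table at the end of this page, and sending unknown sites to the browser is an assumption rather than a rule stated here.

```bash
# pick_extractor URL: print "vision", "jina", or "browser".
pick_extractor() {
  case $1 in
    *://www.google.*/search*|*://google.*/search*)
      echo vision ;;    # screenshot + vision_analyze
    *theknot.com*|*geschenkidee.de*|*cartida.de*|*librio.com*)
      echo jina ;;      # known-good content site: Jina Reader
    *)
      echo browser ;;   # unknown site: open a tab and inspect it first
  esac
}
```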

### Phase 4: Jina.ai for Deep Content
For known content sites (The Knot, Vogue, Geschenkidee.de, etc.), skip the browser entirely:

```bash
curl -sL --max-time 20 "https://r.jina.ai/https://target-site.com/article" \
  -H "Accept: text/plain" -o /tmp/content.txt
```
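A hedged sketch that adds the failure checks from the Pitfalls section to this call: it treats a non-200 status, an empty body, or a Cloudflare "Just a moment..." page as a failed fetch. The helper names are hypothetical, and the string match is a heuristic, not a reliable challenge detector.

```bash
# jina_output_ok FILE: usable if non-empty and not a Cloudflare challenge page.
jina_output_ok() {
  [ -s "$1" ] && ! grep -q 'Just a moment' "$1"
}

# jina_fetch URL OUTFILE: returns 0 only if Jina produced usable content.
jina_fetch() {
  jf_status=$(curl -sL --max-time 20 -H "Accept: text/plain" \
    -o "$2" -w '%{http_code}' "https://r.jina.ai/$1") || return 1
  if [ "$jf_status" != 200 ]; then
    echo "jina: HTTP $jf_status for $1" >&2   # e.g. the 404/403 cases
    return 1
  fi
  if ! jina_output_ok "$2"; then
    echo "jina: empty or challenge page for $1" >&2
    return 1
  fi
}
```

Usage: `jina_fetch https://target-site.com/article /tmp/content.txt || echo "fall back to browser + screenshot"`.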

### Phase 5: Aggregate & Curate
Read the Jina outputs with read_file, cross-reference ideas across sources, and produce a curated list.
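Cross-referencing can start with counting how many sources mention each idea. The sketch assumes each source has already been reduced to a hypothetical one-idea-per-line text file; `rank_ideas` lowercases each line, counts an idea at most once per source, and lists the most widely cited ideas first. Exact-match counting misses paraphrases, so treat the ranking as input to curation, not a replacement for it.

```bash
# rank_ideas FILE...: each FILE holds one idea per line for a single source.
# Prints "number_of_sources<TAB>idea", most widely cited first.
rank_ideas() {
  awk '{ key = tolower($0) }                            # case-insensitive match
       NF && !seen[FILENAME, key]++ { count[key]++ }    # once per source file
       END { for (k in count) print count[k] "\t" k }' "$@" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```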

## Pitfalls

- **Jina 404/empty responses**: Some sites return 404 through Jina even though the page exists. Try alternate URLs or fall back to browser + screenshot.
- **Google ad dominance**: "Gesponsert" (sponsored) results drown out organic results. Try niche queries, or skip Google for known content sites.
- **Paywalls/blockers**: Brides.com, Vogue, and some other sites may return "Just a moment..." (Cloudflare challenge) via Jina. Skip those, try alternatives.
- **EU medical/legal sites (Cookiebot)**: German medical sites (herzstiftung.de, aerzteblatt.de, dgk.org) and many .de domains block automated access behind Cookiebot consent dialogs. Jina returns 403, and Camofox `/click` on the consent buttons usually times out. **Fallback:** screenshot + vision_analyze. See `references/research-patterns.md` Pattern 5 for the full chain.
- **Parallel tabs timeout**: Opening too many tabs at once is slow and can lead to timeouts; 5–6 tabs is the sweet spot.
- **Tab ID tracking**: Always save tab IDs. Camofox doesn't have named tabs, only UUIDs.
- **Don't guess when blocked**: If all fallbacks fail and a source cannot be read, state that clearly. Never pivot to guessing about the topic from metadata, a different domain, or general knowledge presented as if from the source. "I couldn't access this source" is always better than an invented answer.
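The "don't guess" rule can be enforced mechanically: run the fallbacks in a fixed order and emit an explicit failure message when none succeeds. `read_with_fallbacks` and the step functions passed to it are hypothetical; each step must return 0 only when it really read the content.

```bash
# read_with_fallbacks SOURCE STEP...: run each STEP with SOURCE until one succeeds.
read_with_fallbacks() {
  rf_src=$1; shift
  for rf_step in "$@"; do
    "$rf_step" "$rf_src" && return 0
  done
  # Every method failed: report it plainly instead of inventing content.
  printf 'Could not access %s; reporting it as unreadable.\n' "$rf_src" >&2
  return 1
}
```

Usage: `read_with_fallbacks "$url" try_jina try_alternate_url try_screenshot`, where the three step names stand for functions you define yourself.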

## Proven Source Sites for Gift Research

| Site | Language | Quality | Notes |
|------|----------|---------|-------|
| theknot.com | EN | ★★★★★ | 45+ curated gifts, editor picks |
| geschenkidee.de | DE | ★★★★☆ | Experience gifts, German market |
| vogue.com | EN | ★★★★☆ | Luxury focus, may paywall |
| uncommongoods.com | EN | ★★★★☆ | Unique/personalized items |
| cartida.de | DE | ★★★☆☆ | Star maps, personalized prints |
| librio.com | DE | ★★★☆☆ | Personalized books |
