# Camofox Research Patterns

Condensed recipes for common web-research tasks using the local Camofox container plus lightweight fallbacks.

## Pattern 1: YouTube playlist / channel extraction

Goal: find the latest episode, video list, or metadata without logging in.

1. Create a tab with the playlist URL:
   ```bash
   curl -s -X POST http://127.0.0.1:9377/tabs \
     -H "Content-Type: application/json" \
     -d '{"userId":"kiwi","sessionKey":"yt-research","url":"https://www.youtube.com/playlist?list=PLAYLIST_ID"}'
   ```
2. After 3–4 s, call the `/links` endpoint, with `$TAB` set to the tab ID returned by step 1. YouTube renders playlist entries as links with index and title.
   ```bash
   curl -s "http://127.0.0.1:9377/tabs/$TAB/links?userId=kiwi&sessionKey=yt-research" | jq
   ```
3. Look for the highest `index=N` link. The title usually includes episode number and duration.
4. Navigate to that video's URL to confirm length, title, etc.
5. YouTube hides exact upload dates behind a login wall. If the date is required, check the video page from a logged-in session, or cross-reference social media / fandom wikis.
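
Step 3 can be sketched as a small filter. This assumes the link URLs have already been pulled out of the `/links` response, one per line. The response's JSON shape is not specified here, so the `jq` path in the usage comment is a guess, and `highest_index_link` is a hypothetical helper name.

```bash
# Hypothetical helper: print the URL carrying the highest index=N parameter.
# Reads one URL per line on stdin; lines without index=N are ignored.
highest_index_link() {
  grep -E '[?&]index=[0-9]+' \
    | sed -E 's/^(.*[?&]index=([0-9]+).*)$/\2 \1/' \
    | sort -n -k1,1 \
    | tail -n 1 \
    | cut -d' ' -f2-
}

# Usage (ASSUMPTION: the response exposes .links[].url; adjust to the real shape):
#   curl -s "http://127.0.0.1:9377/tabs/$TAB/links?userId=kiwi&sessionKey=yt-research" \
#     | jq -r '.links[].url' | highest_index_link
```

The numeric sort matters: a plain lexical sort would rank `index=9` above `index=10`.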

## Pattern 2: Product page research (IKEA, Amazon, etc.)

Goal: extract product specs, care instructions, and official claims.

1. Navigate to the product URL with Camofox.
2. Take a screenshot for visual verification.
3. Use `r.jina.ai/<URL>` to pull clean text from the rendered SPA:
   ```bash
   curl -sL "https://r.jina.ai/<URL>" -H "Accept: text/plain"
   ```
   This works even when `/links` returns only navigation links, which is typical when most of the page is SPA-rendered.
4. Quote the official text directly in the answer and note the source.
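
A minimal sketch of step 3 that surfaces HTTP failures instead of silently saving an error page. The `r.jina.ai/<URL>` prefix form and the `Accept: text/plain` header come from the command above; `jina_url` and `fetch_clean_text` are hypothetical helper names.

```bash
# Build the reader URL by prefixing the target, as in the command above.
jina_url() {
  printf 'https://r.jina.ai/%s\n' "$1"
}

# Print the clean text on stdout; return non-zero on any non-200 response.
fetch_clean_text() {
  local target="$1" tmp code
  tmp="$(mktemp)" || return 1
  code="$(curl -sL -o "$tmp" -w '%{http_code}' \
    -H 'Accept: text/plain' "$(jina_url "$target")")"
  if [ "$code" = "200" ]; then
    cat "$tmp"
    rm -f "$tmp"
  else
    echo "reader returned HTTP $code for $target" >&2
    rm -f "$tmp"
    return 1
  fi
}

# Usage: fetch_clean_text "https://example.com/product/123" > product.txt
```

Checking the status code keeps a 403 or 429 body from being quoted as if it were the product text.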

## Pattern 3: Help-center / about-page fact extraction

Goal: answer "Where does X get its data?"-style questions authoritatively.

1. Search the target site for an about/help page (e.g. `/help/woher-stammen-die-daten`).
2. Use `r.jina.ai/<URL>` to get the full article text.
3. Cite the exact phrasing and source URL.
4. If the site has no direct help article, fall back to `/links` + screenshot analysis.

## Pattern 4: SPA fallback chain

When `curl` returns only an empty `<div id="root">`:

1. Try `r.jina.ai/<URL>` first — it often renders JS and returns markdown.
2. If jina.ai is blocked or insufficient, open the page in Camofox, wait, and use `/links`.
3. If links are too sparse, take a screenshot and analyze with `vision_analyze`.
4. Only use `/evaluate` if `CAMOFOX_API_KEY` is set and the endpoint is not loopback-only; otherwise prefer `/type` + `/click` or jina.ai.
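
The trigger condition for this chain can be checked mechanically. The sketch below is a hypothetical helper (`is_empty_spa_shell` is not part of Camofox) that only recognises the double-quoted `id="root"` form mentioned above; other root element names or quoting styles would need their own pattern.

```bash
# Hypothetical check: exit 0 when the HTML on stdin contains an empty
# <div id="root">, i.e. the page needs a JS-capable fetcher.
is_empty_spa_shell() {
  # Join lines first so a root div split across lines still matches.
  tr -d '\n\r' | grep -Eq '<div id="root">[[:space:]]*</div>'
}

# Usage:
#   if curl -sL "$URL" | is_empty_spa_shell; then
#     curl -sL "https://r.jina.ai/$URL" -H "Accept: text/plain"
#   fi
```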

## Pattern 5: Cookie-walled EU sites (medical, legal, news)

Goal: extract content from sites that block access behind Cookiebot / consent dialogs (common on .de medical, legal, and news sites).

1. Navigate to the target URL with Camofox.
2. Try clicking the consent button. Common selectors that may work:
   - `button:has-text('Alle akzeptieren')`
   - `button:has-text('Zustimmen')`
   - `button:has-text('Cookies zulassen')`
   - `#CybotCookiebotDialogBodyButtonAccept`

   **Pitfall:** Many of these time out: the click endpoint has a short timeout, and consent dialogs often use JS event handlers that don't fire on programmatic clicks. Don't spend more than 2–3 attempts on this.
3. Take a screenshot immediately (even with the banner visible — sometimes content renders behind it).
4. Use `vision_analyze` on the screenshot. The vision model can often read text that renders behind semi-transparent or partially-obscuring banners.
5. If the banner fully blocks content, try jina.ai next (faster, cleaner output). If that also fails, fall back to a Camofox screenshot + `vision_analyze`.
6. **Do not try `/evaluate`** unless `CAMOFOX_API_KEY` is set — it's blocked without an API key.
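
The capped retry in step 2 might look like the sketch below. The request body for `/click` is an assumption (a JSON object with a `selector` field), as are the `TAB` and `SESSION` variables holding the tab ID and session key; check the actual Camofox request shape before relying on it. `click_selector` and `try_consent` are hypothetical names.

```bash
# ASSUMPTION: /click accepts a JSON body with a "selector" field, and
# TAB / SESSION hold the tab ID and session key.
click_selector() {
  curl -sf --max-time 10 -X POST "http://127.0.0.1:9377/tabs/$TAB/click" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg s "$1" --arg k "$SESSION" \
      '{userId: "kiwi", sessionKey: $k, selector: $s}')" >/dev/null
}

# Try at most $1 selectors (default 3), stopping at the first successful click.
try_consent() {
  local max="${1:-3}" attempts=0 selector
  # Selectors from step 2, tried in order.
  set -- "button:has-text('Alle akzeptieren')" \
         "button:has-text('Zustimmen')" \
         "button:has-text('Cookies zulassen')" \
         "#CybotCookiebotDialogBodyButtonAccept"
  for selector in "$@"; do
    [ "$attempts" -ge "$max" ] && break
    attempts=$((attempts + 1))
    if click_selector "$selector"; then
      echo "clicked: $selector"
      return 0
    fi
  done
  echo "no consent button clicked after $attempts attempt(s)" >&2
  return 1
}
```

A non-zero exit from `try_consent` is the signal to stop clicking and move on to the screenshot.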

**Proven fallback chain for EU medical sites:**
1. jina.ai → often 403 on .de medical sites
2. Camofox navigate + screenshot → banner may block
3. Camofox click consent (1-2 attempts max) + screenshot
4. `vision_analyze` on best screenshot → most reliable final fallback
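
One way to run such a chain is a small driver that tries each step in order and stops at the first success. This is a sketch only: `run_chain` is a hypothetical helper, and the step names in the usage comment stand in for functions you would write around jina.ai, Camofox, and `vision_analyze`.

```bash
# Run each step (a function or command name) with the target URL as its
# argument; stop at the first step that exits 0. Progress goes to stderr
# so stdout carries only the extracted content.
run_chain() {
  local url="$1" step
  shift
  for step in "$@"; do
    if "$step" "$url"; then
      echo "succeeded: $step" >&2
      return 0
    fi
    echo "failed: $step" >&2
  done
  echo "I couldn't access this source: $url" >&2
  return 1
}

# Usage (step functions are placeholders to be supplied by the caller):
#   run_chain "$URL" fetch_via_jina screenshot_plain screenshot_after_consent
```

When every step fails, the driver reports the failure rather than producing any substitute content.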

## Pitfalls

- YouTube exact dates require login; quote only what is public.
- Some SPAs (e.g. YouTube watch page without login) return bot-check walls. Use playlists and jina.ai to bypass where possible.
- `/links` returns `<a>` tags only; buttons inside forms (submit, load more) are not listed.
- **Cookie consent dialogs (Cookiebot, etc.) on EU sites rarely respond to programmatic `/click`.** Don't burn time retrying; screenshot + `vision_analyze` is the reliable fallback.
- **If a source cannot be read after exhausting the fallback chain, state that clearly.** Never pivot to guessing about the topic from metadata or a different domain. "I couldn't access this source" is always better than an invented answer.
- **When the user gives you a link and asks you to read it, do NOT guess the content from the URL, title, or surrounding context.** If the link is unreadable (bot protection, paywall, client-side rendering), say so explicitly and ask the user to paste the content. Inventing a summary from metadata is worse than admitting you can't access it — the user may have the actual text and will catch the fabrication immediately.
