# Document Chain Analysis

Many official documents (insurance policies, contracts, terms of service) reference external documents by name, e.g. "gemäß Art. 6.6.2 ARB 2023" ("in accordance with Art. 6.6.2 of the ARB 2023") or "laut §5 AGB" ("per §5 of the general terms and conditions"). The primary document often contains only deviations from, or supplements to, those references, not their full text.

## Workflow

1. **Extract the primary document** (pymupdf or web_extract)
2. **Scan for external references** — look for patterns like:
   - "gemäß Art. X.Y.Z [ABBREVIATION]"
   - "Art. [Number] [ABBREVIATION] [Year]"
   - "laut § [Number] [Law/Regulation]"
   - "siehe Anhang / see appendix"
3. **Identify the referenced document** — map abbreviations to full names (e.g., ARB = Allgemeine Bedingungen für die Rechtsschutz-Versicherung, ERB = Ergänzende Bedingungen)
4. **Find the external document** — check:
   - Same website domain as primary document source
   - `/medien/pdf/` or `/downloads/` paths
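   - Direct search: `curl -sL <domain> | grep -oiP 'https?://[^"<> ]*\.pdf'` (GNU grep; matches absolute URLs only, so relative links such as `/medien/pdf/...` must be resolved against the page URL)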
   - A web search (Google/DuckDuckGo) for the document's full name if it is not hosted on the same domain
5. **Extract and analyze** — pull the relevant articles/clauses, not the full document
   - Use targeted extraction: `python3 -c "import sys, pymupdf; doc = pymupdf.open(sys.argv[1]); [print(p.number + 1, p.get_text()) for p in doc if 'keyword' in p.get_text()]" file.pdf`
   - Fully extract only the pages that contain the target reference
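
The scanning, mapping and page-selection steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the regex covers only the `Art. X.Y.Z ABBR [Year]` and `§ N ABBR` shapes (not "siehe Anhang"), the `ABBREVIATIONS` table is illustrative, and `scan_references` / `pages_mentioning` are hypothetical helpers that work on already-extracted page text rather than on PDFs directly.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative abbreviation table for step 3 (an assumption; extend per issuer).
ABBREVIATIONS = {
    "ARB": "Allgemeine Bedingungen für die Rechtsschutz-Versicherung",
    "ERB": "Ergänzende Bedingungen",
    "AGB": "Allgemeine Geschäftsbedingungen",
}

# Assumed citation shapes for step 2: "Art. 6.6.2 ARB 2023", "§5 AGB", "§ 5 AGB".
# Lead-in words such as "gemäß" or "laut" are not required for a match.
REF_PATTERN = re.compile(
    r"(?P<kind>Art\.|§)\s*"
    r"(?P<number>\d+(?:\.\d+)*)\s+"
    r"(?P<abbr>[A-ZÄÖÜ]{2,})\b"
    r"(?:\s+(?P<year>(?:19|20)\d{2})\b)?"
)


@dataclass(frozen=True)
class Reference:
    kind: str                 # "Art." or "§"
    number: str               # clause number, e.g. "6.6.2"
    abbr: str                 # referenced document, e.g. "ARB"
    year: Optional[str]       # edition year, if stated
    full_name: Optional[str]  # resolved via ABBREVIATIONS, if known


def scan_references(text: str) -> list[Reference]:
    """Steps 2 + 3: find external references in `text`, in order of appearance."""
    return [
        Reference(m["kind"], m["number"], m["abbr"], m["year"],
                  ABBREVIATIONS.get(m["abbr"]))
        for m in REF_PATTERN.finditer(text)
    ]


def pages_mentioning(page_texts: list[str], ref: Reference) -> list[int]:
    """Step 5: 0-based indices of pages whose text contains the clause number.

    "6.6.2" matches, but "16.6.2" and "6.6.20" do not; a trailing sentence
    period ("6.6.2.") is still accepted.
    """
    needle = re.compile(rf"(?<!\d)(?<!\d\.){re.escape(ref.number)}(?!\.?\d)")
    return [i for i, text in enumerate(page_texts) if needle.search(text)]
```

For example, `scan_references("gemäß Art. 6.6.2 ARB 2023")` yields one `Reference` with `abbr="ARB"` and `year="2023"`. Passing the page texts of the referenced PDF, e.g. `[p.get_text() for p in pymupdf.open("ARB_2023.pdf")]` (hypothetical filename), to `pages_mentioning` tells you which pages to extract in full.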

## Example: Insurance Policy (AT/DE)

- **Primary**: Polizze/Versicherungsschein (policy — summary of coverage)
- **Referenced**: ARB (Allgemeine Bedingungen), ERB (Ergänzende Bedingungen), specific Klauseln (KL-numbers)
- The policy states the coverage scope and any deviations from ARB/ERB; the detailed coverage limits, exclusions, and obligations are defined in the ARB/ERB
- ARAG AT publishes ARB/ERB PDFs at `https://www.arag.at/medien/pdf/`
- Individual Klauseln (KL-numbers) are usually in the policy's Beilage/Anhang pages
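
Locating the ARB/ERB files (step 4) usually means following links from the issuer's own pages. Below is a minimal stdlib-only sketch, assuming the links appear as ordinary `<a href>` tags in server-rendered HTML (links built by JavaScript will not be found). `pdf_links` and `fetch_pdf_links` are hypothetical helper names, and the browser-like User-Agent header is an assumption for sites that reject the default one. Unlike the `grep` one-liner, it also resolves relative links such as `/medien/pdf/...`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class _PdfLinkParser(HTMLParser):
    """Collects href values of <a> tags whose path ends in .pdf."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Ignore query strings, so "x.pdf?v=2" still counts as a PDF.
            if name == "href" and value and urlparse(value).path.lower().endswith(".pdf"):
                self.hrefs.append(value)


def pdf_links(html: str, base_url: str) -> list[str]:
    """Return absolute, de-duplicated PDF URLs found in `html`, in page order.

    Relative hrefs are resolved against `base_url`.
    """
    parser = _PdfLinkParser()
    parser.feed(html)
    seen: dict[str, None] = {}  # dict keeps insertion order for de-duplication
    for href in parser.hrefs:
        seen.setdefault(urljoin(base_url, href), None)
    return list(seen)


def fetch_pdf_links(page_url: str, timeout: float = 20.0) -> list[str]:
    """Fetch `page_url` and list the PDFs it links to (needs network access)."""
    # Assumption: some sites reject urllib's default User-Agent.
    req = Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return pdf_links(html, page_url)
```

A filter such as `[u for u in fetch_pdf_links(page_url) if "arb" in u.lower()]` narrows the result to candidate ARB files. Start from whichever page of the issuer's site links to its conditions.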

## Anti-Pattern

Do NOT answer detailed coverage questions from the policy alone if it references external terms. The policy is a summary — the ARB/ERB text is binding and contains limits, exclusions, and procedural rules not repeated in the policy.
