# Gmail Raw Extraction — Multipart/HTML/Attachment Recovery

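When `$GAPI gmail get MESSAGE_ID` returns `"body": ""`, the message is most likely a complex multipart/mixed email (HTML template, inline images, PDF attachments) whose content lives in nested MIME parts rather than in a single top-level body. In that case, fall back to raw extraction in Python: fetch the message with `format="raw"` and parse it with the standard `email` package.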
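Here is a minimal offline sketch of the structure this usually means. It is built with the standard library's `email.message.EmailMessage` instead of a real Gmail message, and the subject, bodies and `ticket.pdf` are made up. The root part is a `multipart/mixed` container with no payload of its own, and the readable text sits in nested leaves. That is plausibly why a wrapper that only reads a top-level body comes back empty.

```python
from email.message import EmailMessage

# Hypothetical booking mail: plain + HTML alternatives plus a PDF attachment
msg = EmailMessage()
msg["Subject"] = "Booking confirmation"
msg.set_content("Plain-text fallback")
msg.add_alternative("<html><body><p>Your booking</p></body></html>", subtype="html")
msg.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                   subtype="pdf", filename="ticket.pdf")

# walk() visits the root container first, then every nested part depth-first
tree = [part.get_content_type() for part in msg.walk()]
print(tree)
print(msg.is_multipart())
```

Only the leaves (`text/plain`, `text/html`, `application/pdf`) carry content, so extraction code has to walk the tree rather than read one body field.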

## Full pattern for multipart with PDF attachment

```python
import sys, base64
sys.path.insert(0, "/DATA/.hermes/skills/productivity/google-workspace/scripts")
from google_api import get_credentials
from googleapiclient.discovery import build
from email import message_from_bytes

creds = get_credentials()
service = build("gmail", "v1", credentials=creds)

MESSAGE_ID = "19e745905275daaa"  # Gotogate booking confirmation

msg = service.users().messages().get(
    userId="me", id=MESSAGE_ID, format="raw"
).execute()

raw_bytes = base64.urlsafe_b64decode(msg["raw"])
mime_msg = message_from_bytes(raw_bytes)

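# Walk every MIME part; multipart containers are visited too but match no branch
for part in mime_msg.walk():
    ct = part.get_content_type()
    if ct in ("text/plain", "text/html"):
        payload = part.get_payload(decode=True) or b""
        # Honor the part's declared charset; fall back to UTF-8 if none is set
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the header
            text = payload.decode("utf-8", errors="replace")
        # Gotogate quirk: the text/plain part (after quoted-printable decoding)
        # may actually contain HTML. Strip the tags to get the visible text,
        # e.g. with the stdlib html.parser.HTMLParser or with BeautifulSoup.
        print(text[:2000])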
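    elif ct == "application/pdf" or (part.get_filename() or "").lower().endswith(".pdf"):
        # Some senders label PDFs application/octet-stream, so match on the name too.
        # get_filename() may return None; replace "/" so the name can't escape /tmp.
        filename = (part.get_filename() or "attachment.pdf").replace("/", "_")
        pdf_data = part.get_payload(decode=True)
        pdf_path = f"/tmp/{filename}"
        with open(pdf_path, "wb") as f:
            f.write(pdf_data)
        # Extract text with PyMuPDF (imported lazily so text-only mails don't need it)
        import fitz
        doc = fitz.open(pdf_path)
        for page in doc:
            print(page.get_text())
        doc.close()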
```
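For the Gotogate case, where the `text/plain` part actually carries HTML, the visible-text step can be done with the standard library alone. Below is a sketch using `html.parser.HTMLParser`. The class and function names are hypothetical, and the skipped and line-breaking tag lists are assumptions you may need to extend for other templates. The sketch expects already-decoded text, meaning the output of `get_payload(decode=True)` after charset decoding, so quoted-printable has already been undone.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script>/<style>/<head> contents (hypothetical helper)."""

    SKIP = {"script", "style", "head", "title"}
    BREAKS = {"br", "p", "div", "tr", "li", "h1", "h2", "h3", "table"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities like &amp; arrive decoded
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAKS:
            self.chunks.append("\n")  # block-level tags start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace (including &nbsp;) inside each line and drop blank lines
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


sample = ("<html><head><style>p{color:red}</style></head><body>"
          "<p>Booking ref: ABC123</p><p>Total &amp; fees: 120&nbsp;EUR</p></body></html>")
print(html_to_text(sample))
```

In the walk loop, `print(html_to_text(text)[:2000])` would replace the raw `print(text[:2000])`. BeautifulSoup's `get_text()` is the heavier alternative when templates get messier.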

## Alternative: Use attachmentId from full format

```python
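# Continues from the snippet above: reuses `service`, `MESSAGE_ID` and `base64`.
msg = service.users().messages().get(
    userId="me", id=MESSAGE_ID, format="full"
).execute()


def iter_parts(part):
    """Yield a payload part and all of its descendants, depth-first."""
    yield part
    for child in part.get("parts", []):
        yield from iter_parts(child)


# Attachments may sit inside nested multipart/* parts, so walk the whole tree
for part in iter_parts(msg.get("payload", {})):
    filename = part.get("filename")  # empty or missing for body parts
    att_id = part.get("body", {}).get("attachmentId")
    if filename and att_id:
        att = service.users().messages().attachments().get(
            userId="me", messageId=MESSAGE_ID, id=att_id
        ).execute()
        data = base64.urlsafe_b64decode(att["data"])
        with open(f"/tmp/{filename.replace('/', '_')}", "wb") as f:
            f.write(data)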
```
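The loop above only follows `attachmentId`. According to the Gmail API's `MessagePartBody` reference, when no `attachmentId` is present the part's whole content is returned inline in `body.data`, so small parts may never get an ID. The sketch below walks a `format="full"` payload iteratively and reports which path each leaf needs. `plan_parts` and `b64url` are hypothetical helpers, and the payload is hand-written for illustration rather than taken from a real API response.

```python
import base64


def b64url(data: str) -> bytes:
    # Gmail uses URL-safe base64; re-pad defensively before decoding
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def plan_parts(payload: dict):
    """Yield (filename, mimeType, inline_bytes, attachment_id) for every leaf part.

    Parts with content get either inline_bytes (from body.data) or an
    attachment_id to fetch later via messages().attachments().get().
    """
    stack = [payload]
    while stack:
        part = stack.pop()
        children = part.get("parts")
        if children:
            stack.extend(reversed(children))  # reversed so pops keep document order
            continue
        body = part.get("body", {})
        inline = b64url(body["data"]) if "data" in body else None
        yield part.get("filename", ""), part.get("mimeType", ""), inline, body.get("attachmentId")


# Hand-written payload shaped like a format="full" response (all values made up)
payload = {
    "mimeType": "multipart/mixed",
    "parts": [
        {"mimeType": "multipart/alternative", "parts": [
            {"mimeType": "text/plain", "filename": "",
             "body": {"data": base64.urlsafe_b64encode(b"Hello").decode()}},
            {"mimeType": "text/html", "filename": "",
             "body": {"attachmentId": "ANGjdJ_html"}},
        ]},
        {"mimeType": "application/pdf", "filename": "ticket.pdf",
         "body": {"attachmentId": "ANGjdJ_pdf", "size": 48213}},
    ],
}

for name, mime, inline, att_id in plan_parts(payload):
    print(mime, name or "-", inline, att_id)
```

An explicit stack avoids recursion limits on deeply nested forwards, though real messages rarely nest deeply enough for that to matter.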

## Libraries needed

- `googleapiclient` (PyPI package `google-api-python-client`; already in the skill's deps)
- `fitz` (PyMuPDF, installed with `pip install PyMuPDF`) for PDF text extraction
- `pdfplumber` (`pip install pdfplumber`) as a fallback if PyMuPDF is unavailable
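The fallback order above can be sketched as a small helper that picks whichever backend is installed. The function names are hypothetical, and the PyMuPDF-then-pdfplumber order simply mirrors the list. `importlib.util.find_spec` only checks that a module is importable; it does not import it.

```python
import importlib.util


def available_backend():
    """Return the first installed PDF backend module name, or None."""
    for name in ("fitz", "pdfplumber"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def pdf_text(path: str) -> str:
    """Extract text from a PDF with PyMuPDF if present, else pdfplumber."""
    backend = available_backend()
    if backend == "fitz":
        import fitz  # PyMuPDF
        doc = fitz.open(path)
        try:
            return "\n".join(page.get_text() for page in doc)
        finally:
            doc.close()
    if backend == "pdfplumber":
        import pdfplumber
        with pdfplumber.open(path) as pdf:
            # extract_text() returns None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    raise RuntimeError("install PyMuPDF or pdfplumber to read PDF attachments")


print(available_backend())
```

Usage: `print(pdf_text("/tmp/ticket.pdf"))` after the attachment has been saved. Scanned PDFs with no text layer return empty strings from both backends and would need OCR instead.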