# Fetching Reddit Content via the JSON API

Reddit often blocks direct scraping requests with Cloudflare/JS challenges. Appending `.json` to a URL returns clean JSON and avoids many of these blocks.

## URL Pattern

Regular post URL:
```
https://www.reddit.com/r/SUBREDDIT/comments/POST_ID/TITLE/
```

JSON version:
```
https://www.reddit.com/r/SUBREDDIT/comments/POST_ID/TITLE.json
```
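The rewrite from a regular URL to its JSON variant can be automated. The following is a minimal sketch; the helper name `to_json_url` is hypothetical and not part of any Reddit library. It assumes the only change needed is dropping the trailing slash and appending `.json` to the path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(post_url: str) -> str:
    """Return the .json variant of a Reddit post or listing URL."""
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/")  # drop the trailing slash, if any
    if not path.endswith(".json"):
        path += ".json"
    # Keep scheme, host and query string; drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(to_json_url("https://www.reddit.com/r/SUBREDDIT/comments/POST_ID/TITLE/"))
# prints https://www.reddit.com/r/SUBREDDIT/comments/POST_ID/TITLE.json
```

Because the query string is preserved, a listing URL with parameters such as `?limit=10` keeps them after the `.json` suffix.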

## Example: Clawdmeter Post

```bash
curl -sL -A "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
  "https://www.reddit.com/r/ClaudeCode/comments/1takxpl/clawdmeter_a_small_esp32_usage_limit_monitor.json"
```

## Parsing in Python

```python
import urllib.request, json

url = "https://www.reddit.com/r/ClaudeCode/comments/1takxpl/POST.json"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

with urllib.request.urlopen(req, timeout=15) as resp:
    data = json.loads(resp.read().decode())

# The JSON response for a post is a two-element list: [post listing, comment listing]
post_data = data[0]["data"]["children"][0]["data"]

title = post_data["title"]
selftext = post_data.get("selftext", "")
author = post_data["author"]
url_overridden = post_data.get("url_overridden_by_dest", "")

print(f"Title: {title}")
print(f"Author: u/{author}")
print(f"Body: {selftext[:500]}")
print(f"External Link: {url_overridden}")
```

## Important Fields

| Field | Description |
|-------|-------------|
| `title` | Post title |
| `selftext` | Body of a self (text) post, in Markdown |
| `author` | Username of the poster |
| `url_overridden_by_dest` | External link (link posts only) |
| `permalink` | Path relative to `https://www.reddit.com` |
| `created_utc` | Creation time as a Unix timestamp (UTC) |
| `score` | Net score (upvotes minus downvotes) |
| `num_comments` | Number of comments |
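Two of these fields usually need a conversion before they are useful: `created_utc` is a Unix timestamp and `permalink` is a relative path. The sketch below shows one way to normalize them. The function name `summarize_post` and the sample values are made up for illustration; only the field names come from the table above.

```python
from datetime import datetime, timezone


def summarize_post(post_data: dict) -> dict:
    """Turn the raw post fields into a small, display-friendly dict."""
    created = datetime.fromtimestamp(post_data["created_utc"], tz=timezone.utc)
    return {
        "title": post_data["title"],
        "author": post_data["author"],
        "created": created.isoformat(),
        # permalink is relative, so prefix the host to get a full URL
        "link": "https://www.reddit.com" + post_data["permalink"],
        # only present on link posts; None otherwise
        "external_url": post_data.get("url_overridden_by_dest") or None,
        "score": post_data.get("score", 0),
        "num_comments": post_data.get("num_comments", 0),
    }


# Made-up sample data, shaped like data[0]["data"]["children"][0]["data"]
sample = {
    "title": "Example post",
    "author": "example_user",
    "created_utc": 1700000000.0,
    "permalink": "/r/SUBREDDIT/comments/POST_ID/TITLE/",
    "score": 42,
    "num_comments": 7,
}
print(summarize_post(sample)["created"])  # prints 2023-11-14T22:13:20+00:00
```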

## Extracting Comments

```python
comments_data = data[1]["data"]["children"]
for comment in comments_data:
    if comment["kind"] == "t1":  # t1 = Comment
        body = comment["data"]["body"]
        author = comment["data"]["author"]
        print(f"u/{author}: {body[:200]}")
```
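The loop above only visits top-level comments. Replies are nested under each comment's `replies` field, which is an empty string when there are none and otherwise a listing of the same shape. A recursive walk, sketched here with a hypothetical helper `walk_comments` and made-up sample data, reaches the whole tree:

```python
def walk_comments(children, depth=0):
    """Yield (depth, author, body) for every comment, depth-first."""
    for child in children:
        if child.get("kind") != "t1":  # skip "more" placeholders
            continue
        c = child["data"]
        yield depth, c.get("author"), c.get("body", "")
        replies = c.get("replies")
        # "replies" is "" for a leaf comment, otherwise a nested listing
        if isinstance(replies, dict):
            yield from walk_comments(replies["data"]["children"], depth + 1)


# Made-up sample, shaped like data[1]["data"]["children"]
sample_comments = [
    {"kind": "t1", "data": {
        "author": "alice", "body": "top-level",
        "replies": {"kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"author": "bob", "body": "reply", "replies": ""}},
        ]}},
    }},
    {"kind": "more", "data": {"children": ["abc123"]}},
]

for depth, author, body in walk_comments(sample_comments):
    print("  " * depth + f"u/{author}: {body[:200]}")
```

Against a real response, the call would be `walk_comments(data[1]["data"]["children"])`. Entries of kind `more` stand for comments that were not included in the response; this sketch skips them.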

## Subreddit Listing

```bash
curl -sL -A "Mozilla/5.0" "https://www.reddit.com/r/ClaudeCode/hot.json?limit=10"
```

```python
import json
import urllib.request

url = "https://www.reddit.com/r/ClaudeCode/hot.json?limit=10"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req, timeout=15) as resp:
    data = json.loads(resp.read().decode())

# A listing response is a single object, not a list as with post URLs
posts = data["data"]["children"]
for post in posts:
    p = post["data"]
    print(f"{p['score']}↑ | {p['title'][:60]}...")
```
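A listing response also carries an `after` cursor in `data["after"]`; passing it back as the `after` query parameter fetches the next page. The following is a minimal pagination sketch. The names `fetch_json` and `iter_listing` and the `max_pages` cap are assumptions for illustration, and the `fetch` and `sleep` parameters exist only so the function can be exercised without network access.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def fetch_json(url):
    """Fetch a URL with a User-Agent header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.loads(resp.read().decode())


def iter_listing(subreddit, sort="hot", limit=25, max_pages=3,
                 fetch=fetch_json, sleep=time.sleep):
    """Yield post dicts from up to max_pages pages of a subreddit listing."""
    after = None
    for _ in range(max_pages):
        params = {"limit": limit}
        if after:
            params["after"] = after  # cursor returned by the previous page
        url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode(params)}"
        page = fetch(url)["data"]
        for child in page["children"]:
            yield child["data"]
        after = page.get("after")
        if not after:  # no further pages
            break
        sleep(1.0)  # stay at or below roughly 1 request per second
```

A call such as `for p in iter_listing("ClaudeCode", limit=10): print(p["title"])` would then walk several pages of the hot listing.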

## User-Agent

**Always set one!** Without a User-Agent header, Reddit blocks the request immediately.
```python
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}
```

## Rate Limiting

- Reddit has no documented hard limits for the JSON endpoint
- Even so, do not flood it: at most 1 request per second
- On 403/429, pause for 5-10 s, then retry
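The pause-and-retry rule can be wrapped in a small helper. This is a sketch under stated assumptions: the name `fetch_json_with_retry`, the default of 3 retries, and the fixed 5-second pause are illustrative choices, and the `opener` and `sleep` parameters exist only to make the function testable offline.

```python
import json
import time
import urllib.error
import urllib.request

RETRY_STATUSES = {403, 429}  # status codes treated as "back off and retry"


def fetch_json_with_retry(url, retries=3, pause=5.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch JSON, pausing and retrying when Reddit answers 403 or 429."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    for attempt in range(retries + 1):
        try:
            with opener(req, timeout=15) as resp:
                return json.loads(resp.read().decode())
        except urllib.error.HTTPError as err:
            # Re-raise other HTTP errors, and give up after the last attempt
            if err.code not in RETRY_STATUSES or attempt == retries:
                raise
            sleep(pause)
```

A fixed pause keeps the sketch short; increasing the pause on each attempt would be a reasonable variation if blocks persist.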

## Alternative: teddit / libreddit

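If Reddit blocks requests entirely, an alternative front end may still work (public instances come and go, so check that the host is online first):
- `https://teddit.net/r/SUBREDDIT/comments/POST_ID` (scraping-friendly)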
- `https://libreddit.de/r/SUBREDDIT/comments/POST_ID`
