# Ollama Cloud Pro as Custom Provider

How to configure Hermes to use Ahmed's Ollama Cloud Pro ($20/mo) as a custom provider — direct config or via CCR proxy.

## Provider Details

- **Base URL:** `https://ollama.com/v1`
- **Auth:** API key (set in `api_key` field or `.env`)
- **Active 3-model setup (June 2026):** `deepseek-v4-pro` (main/default), `kimi-k2.7-code` (coding via `delegation.model`), `gemini-3-flash-preview` (ALL `auxiliary.*` roles incl. vision — replaced kimi-k2.6 + gemma3:4b)
  - Chosen after live testing: qwen3.5:397b returned empty content (never finished reasoning, 6.7s); gemini-3-flash gave best quality+vision at ~3s. Constraint: keep total distinct models at 3.
- **Other vision-capable cloud models available:** `gemma4:31b`, `qwen3.5:397b`, `minimax-m3`, `gemma3:*` (deepseek-v4-pro itself has NO vision, so the aux model must)
- **Concurrency limit:** 3 simultaneous requests (4th → 404)
- **Capabilities:** OpenAI-compatible `/v1/chat/completions`, `/v1/models`
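
Because a fourth simultaneous request is rejected, any client that fans out work can gate its calls behind a semaphore. The following is a minimal sketch of that idea, not Hermes's own mechanism: `MAX_CONCURRENT`, `call_with_limit`, and the stand-in `fake_request` are hypothetical names, and the real HTTP call is deliberately left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # concurrency limit stated above
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # highest number of simultaneous calls observed


def call_with_limit(fn, *args):
    """Run fn(*args) while holding one of the MAX_CONCURRENT slots."""
    global _in_flight, peak_in_flight
    with _slots:
        with _lock:
            _in_flight += 1
            peak_in_flight = max(peak_in_flight, _in_flight)
        try:
            return fn(*args)
        finally:
            with _lock:
                _in_flight -= 1


def fake_request(i):
    """Hypothetical stand-in for a real chat-completions call."""
    time.sleep(0.05)
    return i


# Ten workers compete, but at most three run fake_request at once.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda i: call_with_limit(fake_request, i), range(10)))
```

Callers beyond the third simply block until a slot frees up, instead of hitting the provider and failing.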

## Direct Custom Provider Config (Recommended)

Simplest, most reliable approach. Hermes talks directly to Ollama Cloud Pro with no proxy layer.

### config.yaml

```yaml
model:
  default: deepseek-v4-pro
  provider: custom
  base_url: https://ollama.com/v1
  api_key: <ollama-cloud-pro-key>
  context_length: 131072

auxiliary:
  vision:
    provider: custom
    model: gemini-3-flash-preview
    base_url: https://ollama.com/v1
    api_key: <ollama-cloud-pro-key>
  compression:
    provider: custom
    model: gemini-3-flash-preview
    base_url: https://ollama.com/v1
    api_key: <ollama-cloud-pro-key>

delegation:
  provider: custom
  model: kimi-k2.7-code
  base_url: https://ollama.com/v1
  api_key: <ollama-cloud-pro-key>
```

### Key points
- Set `provider: custom` — not `ollama` (that's for local Ollama)
- `base_url` ends at `/v1` — Hermes appends `/chat/completions` itself
- All 3 models use the same base URL and API key
- The concurrency limit (3) means delegation `max_children` should stay ≤ 2, leaving one slot for the main model
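
The key points above can be checked mechanically before starting Hermes. This is a rough sketch over an already-parsed config dictionary; `check_config` is a hypothetical helper, and the `delegation.max_children` key path is an assumption about where Hermes reads that setting.

```python
CONCURRENCY_LIMIT = 3  # Ollama Cloud Pro limit noted in Provider Details


def check_config(cfg):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    sections = {"model": cfg.get("model", {}),
                "delegation": cfg.get("delegation", {})}
    for name, aux in cfg.get("auxiliary", {}).items():
        sections[f"auxiliary.{name}"] = aux

    for name, sec in sections.items():
        if sec.get("provider") != "custom":
            problems.append(f"{name}: provider should be 'custom'")
        if not sec.get("base_url", "").rstrip("/").endswith("/v1"):
            problems.append(f"{name}: base_url should end at /v1")

    # Assumed key path: leave one slot free for the main model.
    children = cfg.get("delegation", {}).get("max_children", 0)
    if children > CONCURRENCY_LIMIT - 1:
        problems.append("delegation.max_children should be <= 2")
    return problems


# Deliberately broken example to show what gets flagged.
example = {
    "model": {"provider": "custom", "base_url": "https://ollama.com/v1"},
    "auxiliary": {"vision": {"provider": "ollama",
                             "base_url": "https://ollama.com/v1"}},
    "delegation": {"provider": "custom",
                   "base_url": "https://ollama.com/v1/chat/completions",
                   "max_children": 3},
}
problems = check_config(example)
```

An empty list means the config satisfies all three rules; the example above trips each of them once.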

## CCR (Claude Code Router) — Pitfalls & Status

CCR was attempted as a proxy layer to add model routing (e.g. auto-switch to kimi-k2.6 for long context). It proved unreliable.

### What CCR does
Sits between Hermes and Ollama Cloud Pro on `localhost:3456`. Routes requests based on model name or context length. Adds overhead but gives dynamic model switching without changing Hermes config.

### Known issues (as of CCR v2.0.0)
1. **Provider resolution bug:** `Provider 'undefined' not found` — CCR fails to resolve the provider name from the request, even when the provider is correctly registered in config. The error comes from `cli.js:582` and appears to be a race condition or config loading issue.
2. **Config format sensitivity:** v2.0.0 appends `/v1/chat/completions` to `api_base_url` automatically. Having the full path in `api_base_url` causes double-appending → 404.
3. **Systemd vs manual start:** Identical binary and config behave differently under systemd. Possible causes: CWD dependency, stale PID/lock files, or environment inheritance.
4. **Model name format:** Router entries require `"provider,model"` format (e.g. `"ollama,deepseek-v4-pro"`). The request model field mapping is inconsistent.
5. **Transformer block:** The `transformer` config section (deepseek max-token transform) was suspected of causing issues — removing it didn't help consistently.
6. **Crash on startup:** The process sometimes crashes silently after config load, with no logs and no port binding; `curl` then reports `Failed to connect`.
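
For the silent-crash case in issue 6, a quick way to tell "CCR is up but misrouting" from "CCR never bound its port" is a plain TCP probe. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper name, and the self-check runs against a throwaway local listener rather than CCR itself.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a throwaway listener on an ephemeral port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
free_port = listener.getsockname()[1]
open_while_listening = port_is_open("127.0.0.1", free_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", free_port)
```

Against CCR the call would be `port_is_open("127.0.0.1", 3456)`; a `False` result right after startup points at the crash described above rather than at a routing problem.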

### CCR verdict
**Do not use CCR for production.** Direct custom provider config is simpler, faster, and actually works. CCR adds a single point of failure for marginal routing benefit. If model routing is needed, use Hermes's built-in `delegation.model` and `auxiliary.*.model` config instead.

### CCR config reference (for future attempts)

Minimal v2.0.0 config that was tested:
```json
{
  "LOG": true,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "https://ollama.com/v1",
      "api_key": "<key>",
      "models": ["deepseek-v4-pro", "kimi-k2.6", "gemma3:4b"]
    }
  ],
  "Router": {
    "default": "ollama,deepseek-v4-pro",
    "think": "ollama,kimi-k2.6",
    "background": "ollama,gemma3:4b",
    "longContext": "ollama,kimi-k2.6",
    "longContextThreshold": 60000
  },
  "HOST": "0.0.0.0",
  "APIKEY": "kiwi-ccr-local"
}
```

- Installation: `npm install -g @musistudio/claude-code-router@2.0.0`
- Config file: `~/.claude-code-router/config.json`
- Logs: `~/.claude-code-router/logs/`
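
Since router entries must use the `"provider,model"` format, it can help to lint the config before starting CCR. The following is a sketch only: `check_router` is a hypothetical helper (CCR ships no such check as far as these notes know), and the inline config is a trimmed copy of the one above with a deliberately malformed `background` entry.

```python
import json

CONFIG_TEXT = """
{
  "Providers": [
    {"name": "ollama",
     "models": ["deepseek-v4-pro", "kimi-k2.6", "gemma3:4b"]}
  ],
  "Router": {
    "default": "ollama,deepseek-v4-pro",
    "think": "ollama,kimi-k2.6",
    "background": "gemma3:4b",
    "longContextThreshold": 60000
  }
}
"""


def check_router(config):
    """Return router keys whose value is not a known 'provider,model' pair."""
    known = {(p["name"], m)
             for p in config.get("Providers", [])
             for m in p.get("models", [])}
    bad = []
    for key, value in config.get("Router", {}).items():
        if not isinstance(value, str):
            continue  # e.g. longContextThreshold is a number
        provider, sep, model = value.partition(",")
        if not sep or (provider, model) not in known:
            bad.append(key)
    return bad


bad_routes = check_router(json.loads(CONFIG_TEXT))
```

To lint the real file, load `~/.claude-code-router/config.json` with `json.load` and pass the result to `check_router`; any returned key names a route that CCR is unlikely to resolve.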

## Pricing Comparison: Ollama Cloud Pro vs OpenRouter

Ollama Cloud Pro is **$20/month flat** for 3 concurrent models. Here's what the same models cost on OpenRouter (per-token, prices as of June 2026):

| Model | Role | OpenRouter Prompt | OpenRouter Completion |
|-------|------|-------------------|----------------------|
| deepseek-v4-pro | Main | $0.43/M tok | $0.87/M tok |
| kimi-k2.7-code | Coding | $0.74/M tok | $3.50/M tok |
| gemini-3-flash-preview | Aux ×9 | $0.50/M tok | $3.00/M tok |

**Estimated monthly cost on OpenRouter** (moderate usage: ~30M prompt + ~6M completion):
- deepseek-v4-pro: ~$10.34
- kimi-k2.7-code: ~$17.90
- gemini-3-flash-preview: ~$4.50
- **Total: ~$33/month**
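
The per-model figures follow from price × tokens, but the notes do not say how the usage is split across the three models. The sketch below uses one assumed split that reproduces the listed amounts; treat `USAGE` as illustrative only (it sums to roughly 33M prompt tokens, slightly above the ~30M quoted).

```python
# (prompt $/M tok, completion $/M tok) from the table above
PRICES = {
    "deepseek-v4-pro": (0.43, 0.87),
    "kimi-k2.7-code": (0.74, 3.50),
    "gemini-3-flash-preview": (0.50, 3.00),
}

# ASSUMED split in millions of tokens (prompt, completion); not from the source.
USAGE = {
    "deepseek-v4-pro": (20, 2),
    "kimi-k2.7-code": (10, 3),
    "gemini-3-flash-preview": (3, 1),
}


def monthly_cost(model):
    """Cost in dollars for one model under the assumed usage."""
    p_price, c_price = PRICES[model]
    p_tok, c_tok = USAGE[model]
    return p_tok * p_price + c_tok * c_price


costs = {m: round(monthly_cost(m), 2) for m in PRICES}
total = round(sum(costs.values()), 2)
```

Changing `USAGE` to match real traffic gives a quick break-even check against the $20 flat fee.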

**Optimized OpenRouter alternatives:**
- Replace gemini-3-flash-preview with **gemini-2.5-flash-lite** ($0.10/$0.40) → ~85% cheaper for aux
- Replace gemini-3-flash-preview with **deepseek-v4-flash** ($0.09/$0.18) → ~90% cheaper for aux
- Optimized total: ~$21/month (but loses Kimi K2.7 for coding)

**Verdict:** Ollama Cloud Pro at $20/month is a **good deal**. You get Kimi K2.7 for coding (which alone would cost ~$18/month on OpenRouter) plus unlimited usage with no surprise bills. The only downside is the 3-model lock — can't test Claude/GPT without dropping a slot.

### Direct API verification

Always test the upstream directly before blaming the proxy:
```bash
curl -s https://ollama.com/v1/chat/completions \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-pro","messages":[{"role":"user","content":"Say OK"}],"max_tokens":10}'
```
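
The same check can be scripted without `curl`. This is a standard-library sketch: `build_chat_request` and `send` are hypothetical helper names, and the response is assumed to follow the usual OpenAI-compatible shape.

```python
import json
import urllib.request

BASE_URL = "https://ollama.com/v1"


def build_chat_request(api_key, model, prompt, max_tokens=10):
    """Build (but do not send) a POST to /chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Building the request performs no network I/O; call send(req) to execute it.
req = build_chat_request("<key>", "deepseek-v4-pro", "Say OK")
```

With a real key in place of `<key>`, `send(req)` should return a JSON object; if the upstream answers correctly here while requests through the proxy still fail, the proxy is the likely culprit.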
