---
name: local-transcription-workflow
description: |
  Complete local video/audio transcription workflow on ZimaOS — faster-whisper,
  SSD-based temp storage, auto-cleanup of source media, and Notion archival.
  Replaces scattered memory entries with a reusable procedural skill.
---

# Local Transcription Workflow on ZimaOS

## Overview

This skill documents the proven workflow for transcribing video/audio content
entirely offline using `faster-whisper`, while avoiding the ZimaOS RAM-disk
(`/tmp`) trap and keeping storage clean.

## Prerequisites

- `faster-whisper` installed in the Hermes venv
- `ffmpeg` available (system package on ZimaOS)
- `yt-dlp` available (system package on ZimaOS)
- Notion API key in `/DATA/AppData/hermes/.env`

Verify faster-whisper:
```bash
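# Use the venv interpreter, since that is where faster-whisper is installed
/DATA/AppData/hermes/venv/bin/python3 -c "from faster_whisper import WhisperModel; print('OK')"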
```

## Disk Layout (ZimaOS-specific)

| Mount            | Type   | Size   | Free    | Use Case                        |
|------------------|--------|--------|---------|---------------------------------|
| `/`              | rootfs | 1.2 GB | 0%      | **Never use for temp files**    |
| `/tmp`           | tmpfs  | 3.9 GB | ~3.5 GB | RAM-disk, only for tiny temps   |
| `/DATA`          | SSD    | 222 GB | ~165 GB | **All work, downloads, models** |
| `/media/HDD_1TB` | HDD    | 932 GB | ~806 GB | Long-term archives              |

**Critical rule**: Downloads >100 MB or `pip install` operations MUST use
`/DATA` as temp space, not `/tmp`.
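
The rule can be expressed as a small guard that scripts call before writing anything large. This is a minimal sketch: the function names, the 1 GiB reserve, and the default paths passed as arguments are illustrative assumptions, not part of the existing Hermes tooling.

```python
import shutil

# Threshold from the rule above: anything larger than 100 MB goes to the SSD.
LARGE_FILE_THRESHOLD = 100 * 1024**2


def choose_temp_dir(expected_bytes: int,
                    ssd_tmp: str = "/DATA/AppData/hermes/tmp_downloads",
                    ram_tmp: str = "/tmp") -> str:
    """Pick the temp location for a file of the given expected size."""
    return ssd_tmp if expected_bytes > LARGE_FILE_THRESHOLD else ram_tmp


def has_free_space(path: str, required_bytes: int,
                   reserve_bytes: int = 1024**3) -> bool:
    """Return True if the filesystem holding `path` can take `required_bytes`
    and still keep `reserve_bytes` free (the reserve is an assumed safety margin)."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes + reserve_bytes
```

A caller would typically check `has_free_space(choose_temp_dir(size), size)` and abort early instead of letting a download fill the RAM-disk.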

## Environment Setup

Add to `/DATA/AppData/hermes/.env`:
```
HF_HOME=/DATA/AppData/hermes/.cache/huggingface
WHISPER_CACHE_DIR=/DATA/AppData/hermes/.cache/whisper
XDG_CACHE_HOME=/DATA/AppData/hermes/.cache
TMPDIR=/DATA/AppData/hermes/tmp_downloads
```

Create directories:
```bash
mkdir -p /DATA/AppData/hermes/work
mkdir -p /DATA/AppData/hermes/tmp_downloads
mkdir -p /DATA/AppData/hermes/.cache/huggingface
mkdir -p /DATA/AppData/hermes/.cache/whisper
```
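
A `.env` file is not exported into the process environment on its own, so a script that relies on these variables has to load them first. The following sketch assumes a plain `KEY=VALUE` format with optional `#` comments; `load_env_file` and `apply_env` are hypothetical helper names, and a dedicated library could be used instead.

```python
import os
from pathlib import Path


def load_env_file(env_path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict,
              keys=("HF_HOME", "WHISPER_CACHE_DIR", "XDG_CACHE_HOME", "TMPDIR")) -> None:
    """Export the selected directory variables and make sure each directory exists."""
    for key in keys:
        if key in values:
            os.environ[key] = values[key]
            Path(values[key]).mkdir(parents=True, exist_ok=True)
```

Calling `apply_env(load_env_file("/DATA/AppData/hermes/.env"))` at the top of a script would cover both the export and the `mkdir -p` step for the cache and temp directories.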

## Standard Workflow

### 1. Download Video (to SSD, never /tmp)

```bash
cd /DATA/AppData/hermes/work
# Remux to .mp4 so the later steps can rely on the file name video.mp4
yt-dlp --remux-video mp4 -o "video.%(ext)s" "<URL>"
```

### 2. Extract Audio

```bash
ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
```
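
The same extraction can be driven from Python, which also makes it easy to estimate how large the WAV will be. This is a sketch under assumptions: the helper names are hypothetical, and `-y` (overwrite an existing output file) is an addition that the shell command above does not use.

```python
import subprocess


def build_ffmpeg_cmd(src: str, dst: str) -> list:
    """Build the ffmpeg call: drop video, 16-bit PCM, 16 kHz, mono."""
    return ["ffmpeg", "-y", "-i", src,
            "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", dst]


def estimate_wav_bytes(duration_seconds: float) -> int:
    """Payload size of the WAV: 16000 samples/s * 2 bytes * 1 channel."""
    return int(duration_seconds * 16000 * 2 * 1)


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

At 32,000 bytes per second, one hour of audio comes to about 115 MB, which is already above the 100 MB threshold and is why the WAV belongs on `/DATA` as well.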

### 3. Transcribe with faster-whisper

```python
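import os

# Point the Hugging Face cache at the SSD before the library is imported.
os.environ["HF_HOME"] = "/DATA/AppData/hermes/.cache/huggingface"
# Kept for consistency with .env; the model location is set via download_root below.
os.environ["WHISPER_CACHE_DIR"] = "/DATA/AppData/hermes/.cache/whisper"

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8",
                     download_root="/DATA/AppData/hermes/.cache/whisper")

# Set language=None to auto-detect (e.g. for DE or TR sources).
segments, info = model.transcribe("/DATA/AppData/hermes/work/audio.wav",
                                   beam_size=5, language="en")

# `segments` is a lazy generator: decoding happens while iterating over it.
text = "\n".join(f"[{s.start:.1f}s] {s.text.strip()}" for s in segments)
# UTF-8 keeps umlauts and Turkish characters intact regardless of locale.
with open("/DATA/AppData/hermes/work/transcript.txt", "w", encoding="utf-8") as f:
    f.write(text)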
```
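
For longer recordings, timestamps in raw seconds become hard to scan. The sketch below shows one way to render segments with `HH:MM:SS` markers instead; it assumes only that each segment exposes `.start` (seconds) and `.text`, as the faster-whisper segments in the script above do. The function names are illustrative.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a second offset into HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments) -> str:
    """Join segments into one line per segment, prefixed with its start time."""
    return "\n".join(
        f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments
    )
```

`render_transcript(segments)` can replace the `"\n".join(...)` expression in the transcription script without any other change.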

### 4. Immediate Cleanup (MANDATORY)

**Delete source media immediately after transcription.**
Only the `.txt` transcript may be kept.

```bash
rm /DATA/AppData/hermes/work/video.mp4
rm /DATA/AppData/hermes/work/audio.wav
```

This is a **user requirement** — never leave video/audio clutter.
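
The cleanup can be made safer by refusing to delete anything until a non-empty transcript exists. This is a minimal sketch: `cleanup_media`, the suffix list, and the transcript check are assumptions layered on top of the `rm` commands above, not an existing tool.

```python
from pathlib import Path

# Assumed set of media extensions that may appear in the work directory.
MEDIA_SUFFIXES = {".mp4", ".mkv", ".webm", ".m4a", ".mp3", ".wav"}


def cleanup_media(work_dir, transcript_name: str = "transcript.txt") -> list:
    """Delete media files in work_dir, but only once the transcript is present.

    Returns the sorted names of the files that were removed.
    """
    work = Path(work_dir)
    transcript = work / transcript_name
    if not transcript.is_file() or transcript.stat().st_size == 0:
        raise FileNotFoundError(f"no usable transcript at {transcript}; nothing deleted")

    removed = []
    for item in sorted(work.iterdir()):
        if item.is_file() and item.suffix.lower() in MEDIA_SUFFIXES:
            item.unlink()
            removed.append(item.name)
    return removed
```

Running `cleanup_media("/DATA/AppData/hermes/work")` right after the transcript is written keeps the deletion mandatory while avoiding the case where a failed transcription also loses the source media.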

### 5. Optional: Archive to Notion

See the `notion` skill for API usage. The transcript can be chunked and
written to a Notion page if the user requests archival.
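
Chunking is needed because the Notion API caps the text content of a single rich-text object at 2000 characters. The sketch below splits on line boundaries and wraps each chunk in a paragraph block; the function names are hypothetical, and the actual upload call is left to the `notion` skill.

```python
def chunk_transcript(text: str, limit: int = 2000) -> list:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks, current = [], ""
    for line in text.splitlines():
        # A single over-long line is hard-split so no chunk exceeds the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_paragraph_blocks(chunks: list) -> list:
    """Wrap each chunk in a Notion paragraph block payload."""
    return [
        {
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [{"type": "text", "text": {"content": c}}]},
        }
        for c in chunks
    ]
```

Because the timestamps start each line, splitting on line breaks keeps every `[..s]` marker together with its text. Long transcripts may also need to be sent in several requests, since Notion limits how many blocks one append call accepts.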

## Pitfalls

1. **Don't install `openai-whisper`.** It pulls in PyTorch (>500 MB wheel),
   and the install fails when pip stages it on `/tmp`. Use `faster-whisper`
   (already installed).
2. **Don't use `/tmp` for pip.** Set `TMPDIR=/DATA/...` before installing.
3. **Verify before claiming a package is missing.** Run
   `python3 -c "import faster_whisper"` before attempting any transcription
   or reinstall. It's already there.
4. **RAM-disk fills fast.** The 3.9 GB `/tmp` is shared with all system
   processes, and one video download can exhaust it.

## Model Choice

- `base` — sweet spot for DE/EN/TR on CPU, fast enough
- `small` — better accuracy, ~2× slower
- `medium` / `large-v3` — only if GPU available

## Files and Paths

| File/Dir | Path |
|----------|------|
| Work directory | `/DATA/AppData/hermes/work/` |
| Temp downloads | `/DATA/AppData/hermes/tmp_downloads/` |
| Whisper cache | `/DATA/AppData/hermes/.cache/whisper/` |
| HF cache | `/DATA/AppData/hermes/.cache/huggingface/` |
| venv Python | `/DATA/AppData/hermes/venv/bin/python3` |
| Environment file (Notion API key, cache paths) | `/DATA/AppData/hermes/.env` |