# Melody Extraction & Notation Rendering Pipeline

Full pipeline: audio → spectral peak melody → ABC notation → LilyPond → PNG sheet music.

Use this when the user asks for "Noten extrahieren" (extract notes), "Melodie aus Audio holen" (get the melody from audio), "transcribe melody", or when key/chord analysis (audio-analysis-librosa.md) isn't enough and they want actual notation.

## Overview

```
yt-dlp (WAV) → librosa HPSS → spectral peaks → median filter → phrase grouping → ABC → music21 → LilyPond → PNG
```

## Step 1: Download Audio

```bash
yt-dlp -x --audio-format wav --no-part -o /tmp/song.wav "<url>"
```

## Step 2: Melody Extraction (Spectral Peak Method)

**Why not pYIN?** `librosa.pyin` works for clean monophonic vocals but fails badly on melismatic singing (Turkish makam, Sufi music) and drone-heavy arrangements, where it detects only 2-8 note changes across an entire song. The spectral peak method is more robust on this material.

```python
import librosa
import numpy as np
from scipy.ndimage import median_filter
from collections import Counter

# Load, then use HPSS to keep the harmonic part (melody, sustained tones) and drop the
# percussive part (drums, transients). Note: a drone is harmonic, so it stays in y_harm.
y_full, sr = librosa.load("/tmp/song.wav", sr=22050)
y_harm, _ = librosa.effects.hpss(y_full)

# STFT with good frequency resolution
S = np.abs(librosa.stft(y_harm, n_fft=4096, hop_length=512))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)
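# librosa.stft is centered by default, so frame i sits at i * hop_length;
# passing n_fft here would add an n_fft // 2 offset meant for non-centered STFTs
times_spec = librosa.frames_to_time(np.arange(S.shape[1]), sr=sr, hop_length=512)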

# Focus on vocal range (C3-C6)
c3_bin = np.searchsorted(freqs, librosa.note_to_hz('C3'))
c6_bin = np.searchsorted(freqs, librosa.note_to_hz('C6'))

# Find strongest peak per frame, only if significantly above noise floor
peak_midis = []
for i in range(S.shape[1]):
    frame = S[c3_bin:c6_bin, i]
    if frame.max() > np.median(frame) * 2:  # peak must be 2x median
        peak_bin = c3_bin + np.argmax(frame)
        peak_hz = freqs[peak_bin]
        peak_midis.append(int(round(librosa.hz_to_midi(peak_hz))))
    else:
        peak_midis.append(-1)  # rest marker

# Median filter smoothing (window ~0.2-0.25s)
# At hop_length=512 and sr=22050, ~11 frames ≈ 0.25s
smoothed = median_filter(np.array(peak_midis), size=11)
```
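
The STFT parameters above fix both the time and the pitch resolution of the extraction. The following sketch works through that arithmetic with plain Python; the variable and function names are illustrative and not part of the pipeline, and A4 = 440 Hz tuning is assumed.

```python
sr = 22050          # sample rate used in librosa.load above
n_fft = 4096        # FFT size used in librosa.stft above
hop_length = 512    # hop size used in librosa.stft above
filter_frames = 11  # median filter size used above

# Time resolution: one frame every hop_length samples.
frame_dur = hop_length / sr              # seconds per frame
filter_dur = filter_frames * frame_dur   # seconds spanned by the median filter

# Frequency resolution: FFT bins are spaced sr / n_fft apart.
bin_width = sr / n_fft                   # Hz per bin

def semitone_gap_hz(midi):
    """Distance in Hz from a MIDI note to the next semitone up (assumes A4 = 440 Hz)."""
    hz = 440.0 * 2 ** ((midi - 69) / 12)
    return hz * (2 ** (1 / 12) - 1)

gap_c3 = semitone_gap_hz(48)   # C3, bottom of the search range
gap_c6 = semitone_gap_hz(84)   # C6, top of the search range

print(f"frame: {frame_dur * 1000:.1f} ms, median window: {filter_dur:.3f} s")
print(f"bin width: {bin_width:.2f} Hz")
print(f"semitone gap at C3: {gap_c3:.2f} Hz, at C6: {gap_c6:.2f} Hz")
```

Near C3 a semitone spans only about 1.4 bins, so low notes may round to a neighbouring semitone. A larger `n_fft` would sharpen pitch resolution at the cost of time resolution.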

## Step 3: Phrase Grouping

Raw frame-by-frame notes are too noisy. Group into ~2-second phrases and pick the dominant note:

```python
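# ABC writes sharps with a '^' prefix (e.g. ^C), not a '#' suffix.
# An ABC accidental carries to the end of the bar, so a natural that follows
# a sharp on the same pitch within one bar would need an explicit '='.
note_names = ['C', '^C', 'D', '^D', 'E', 'F', '^F', 'G', '^G', 'A', '^A', 'B']

def midi_to_abc(m):
    """MIDI to ABC notation. C4 (MIDI 60) = 'C'. Below = commas, above = lowercase+apostrophes."""
    if m < 0:
        return "z"  # rest
    note = note_names[m % 12]
    octave = (m // 12) - 1  # MIDI 60 -> octave 4
    if octave < 4:
        return note + "," * (4 - octave)
    elif octave == 4:
        return note
    else:
        return note.lower() + "'" * (octave - 5)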

phrase_dur = 2.0  # seconds
total_dur = len(y_full) / sr
num_phrases = int(total_dur / phrase_dur)

melody_phrases = []
for p in range(num_phrases):
    start_t = p * phrase_dur
    end_t = (p + 1) * phrase_dur
    mask = (times_spec >= start_t) & (times_spec < end_t)
    notes_in_phrase = smoothed[mask]
    valid = notes_in_phrase[notes_in_phrase >= 0]
    
    if len(valid) == 0:
        melody_phrases.append(("z", -1, start_t))
        continue
    
    dominant_midi = Counter(valid).most_common(1)[0][0]
    melody_phrases.append((midi_to_abc(dominant_midi), dominant_midi, start_t))
```
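
To see how the dominant-note selection behaves, here is a minimal sketch on synthetic data. It is not the pipeline code: the helper name `dominant_notes` is hypothetical, windows are counted in frames rather than seconds, and a trailing partial window is kept rather than dropped.

```python
from collections import Counter

def dominant_notes(frame_midis, frames_per_window):
    """Most common non-rest MIDI value per window, or -1 if the window is all rests."""
    result = []
    for start in range(0, len(frame_midis), frames_per_window):
        window = frame_midis[start:start + frames_per_window]
        voiced = [m for m in window if m >= 0]   # drop rest markers (-1)
        if not voiced:
            result.append(-1)
            continue
        result.append(Counter(voiced).most_common(1)[0][0])
    return result

# Synthetic frame-level pitches: mostly A4 (69) with a stray B4 (71),
# then a silent window, then mostly C5 (72) with a stray D5 (74).
frames = [69, 69, 71, 69, -1, 69,
          -1, -1, -1, -1, -1, -1,
          72, 72, -1, 72, 74, 72]

print(dominant_notes(frames, frames_per_window=6))  # [69, -1, 72]
```

Stray frames and isolated rests are outvoted within each window. When two pitches tie, `Counter.most_common` returns the one seen first in the window.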

## Step 4: Generate ABC Notation

```python
abc_header = """X:1
T:{title}
C:{artist}
M:4/4
L:1/4
Q:1/4={tempo}
K:{key}
"""

# Group into bars: each phrase becomes one quarter note (L:1/4), 4 per bar in 4/4
bar_len = 4
lines = []
for i in range(0, len(melody_phrases), bar_len * 4):  # 4 bars per line
    line_phrases = melody_phrases[i:i + bar_len * 4]
    bars = []
    for j in range(0, len(line_phrases), bar_len):
        bar_notes = [p[0] for p in line_phrases[j:j+bar_len]]
        bars.append(" ".join(bar_notes))
    lines.append("| " + " | ".join(bars) + " |")

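# Fill in the header placeholders. The values here are illustrative defaults:
# replace them with the real title, artist, tempo, and key.
# (30 bpm matches one 2-second phrase per quarter note.)
abc_content = abc_header.format(
    title="Untitled", artist="Unknown", tempo=30, key="C"
) + "\n".join(lines)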

with open("/tmp/song.abc", "w") as f:
    f.write(abc_content)
```
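
For orientation, the sketch below shows what the generated ABC looks like for a short token list. The helper name `abc_body` and the header values are hypothetical; `Q:1/4=30` is an assumption that matches one 2-second phrase per quarter note.

```python
def abc_body(notes, notes_per_bar=4, bars_per_line=4):
    """Lay out ABC note tokens into bars and lines, mirroring the loop above."""
    lines = []
    per_line = notes_per_bar * bars_per_line
    for i in range(0, len(notes), per_line):
        chunk = notes[i:i + per_line]
        bars = [" ".join(chunk[j:j + notes_per_bar])
                for j in range(0, len(chunk), notes_per_bar)]
        lines.append("| " + " | ".join(bars) + " |")
    return "\n".join(lines)

# Illustrative header and tokens (ten phrases: two full bars plus a short one)
header = "X:1\nT:Example\nM:4/4\nL:1/4\nQ:1/4=30\nK:C\n"
tokens = ["A", "A", "z", "c", "c", "B", "A", "G", "A", "z"]

print(header + abc_body(tokens))
# The body line printed is:
# | A A z c | c B A G | A z |
```

The final bar is incomplete whenever the phrase count is not a multiple of four. If strict bar lengths matter, padding the token list with `z` rests before layout is one option.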

## Step 5: Render to PNG via LilyPond

music21 bridges ABC → LilyPond. If LilyPond isn't installed, download the static binary:

```bash
# Download LilyPond static binary (no install needed)
curl -L "https://gitlab.com/lilypond/lilypond/-/releases/v2.24.4/downloads/lilypond-2.24.4-linux-x86_64.tar.gz" -o /tmp/lilypond.tar.gz
cd /tmp && tar xzf lilypond.tar.gz
```

Then render:

```python
import os
os.environ['MPLCONFIGDIR'] = '/tmp/mplconfig'  # avoid permission errors on ZimaOS

from music21 import converter, environment

env = environment.Environment()
env['lilypondPath'] = '/tmp/lilypond-2.24.4/bin/lilypond'

score = converter.parse('/tmp/song.abc', format='abc')
score.write('lilypond', fp='/tmp/song.ly')

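import subprocess
result = subprocess.run(
    ['/tmp/lilypond-2.24.4/bin/lilypond', '--png', '-o', '/tmp/song', '/tmp/song.ly'],
    capture_output=True, text=True, timeout=60, cwd='/tmp'
)
if result.returncode != 0:
    print(result.stderr)  # LilyPond reports errors on stderr
# Output: /tmp/song.png for a single page,
# or /tmp/song-page1.png, /tmp/song-page2.png, ... for several pages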
```
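
Because the number of output files depends on the score length, it helps to check the exit code and collect whatever PNGs were written. The sketch below assumes LilyPond's usual naming (`<prefix>.png` for a single page, `<prefix>-pageN.png` for several); `render_png` and `collect_pngs` are hypothetical helper names.

```python
import glob
import os
import subprocess

def collect_pngs(out_prefix):
    """Find <prefix>.png (single page) and <prefix>-page*.png (multiple pages)."""
    single = out_prefix + ".png"
    pages = glob.glob(glob.escape(out_prefix) + "-page*.png")
    pages.sort(key=lambda p: (len(p), p))  # page2 before page10
    return ([single] if os.path.exists(single) else []) + pages

def render_png(lilypond_bin, ly_path, out_prefix, timeout=60):
    """Run LilyPond and return the PNG paths it produced; raise if rendering failed."""
    result = subprocess.run(
        [lilypond_bin, "--png", "-o", out_prefix, ly_path],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"LilyPond failed:\n{result.stderr}")
    return collect_pngs(out_prefix)

# Example call, using the paths from the steps above:
# pngs = render_png('/tmp/lilypond-2.24.4/bin/lilypond', '/tmp/song.ly', '/tmp/song')
```

Raising on a non-zero exit code surfaces LilyPond's error text instead of silently producing no images.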

Also write MusicXML for MuseScore compatibility:
```python
score.write('musicxml', fp='/tmp/song.musicxml')
```

## Output Files

| File | Description |
|------|-------------|
| `/tmp/song.png` or `/tmp/song-page*.png` | Rendered sheet music (a single PNG, or one PNG per page) |
| `/tmp/song.abc` | ABC notation (human-editable) |
| `/tmp/song.musicxml` | MusicXML (import into MuseScore, Sibelius, etc.) |
| `/tmp/song.ly` | LilyPond source (for fine-tuning the engraving) |

## Pitfalls

- **pYIN fails on melismatic/drone music**: Don't use `librosa.pyin` for Turkish makam, Sufi, or heavily ornamented vocals. Spectral peak + median filter is the more robust choice for this material.
- **LilyPond not on system**: ZimaOS and many containers lack it. The static binary from GitLab works anywhere (no dependencies beyond glibc).
- **music21 environment**: Set `MPLCONFIGDIR` to a writable tmp path on ZimaOS (rootfs is read-only, `/DATA/.config` may be permission-denied).
- **ABC octave notation**: C4 (middle C, MIDI 60) = `C`. Below: `C,` `C,,` `C,,,`. Above: `c` `c'` `c''`. The `midi_to_abc()` function above handles the octave mapping correctly. Sharps are written with a `^` prefix (`^C`), never as `C#`.
- **Phrase duration tradeoff**: 2s windows produce readable notation but lose micro-rhythm. For detailed rhythm, use 0.5s windows but expect more pages.
- **Accuracy disclaimer**: Always tell the user this is machine-extracted, not a perfect transcription. Melismatic ornaments and exact rhythms are simplified. The output captures harmonic structure and melodic direction.
- **librosa not in execute_code sandbox**: Always write the script to a file and run via `terminal` with `python3 /path/to/script.py`.
- **Tempo is a numpy array**: `librosa.beat.beat_track` returns tempo as an ndarray; call `.item()` to extract a scalar before formatting it into the `Q:` header line.
- **Full tracks time out**: A 6+ minute WAV at 48 kHz can make librosa hang. Load with `sr=22050` and consider processing in chunks if needed (the phrase-grouping approach handles full tracks fine at 22050 Hz).
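
The phrase-duration tradeoff can be estimated before rendering. This is a rough sketch with a hypothetical helper, `notation_size`, assuming the 4-notes-per-bar and 4-bars-per-line layout from Step 4; actual page counts depend on LilyPond's engraving.

```python
import math

def notation_size(track_seconds, phrase_dur, notes_per_bar=4, bars_per_line=4):
    """Estimate notes, bars, and ABC lines for a track at a given phrase window."""
    notes = int(track_seconds / phrase_dur)   # same truncation as num_phrases in Step 3
    bars = math.ceil(notes / notes_per_bar)
    lines = math.ceil(bars / bars_per_line)
    return notes, bars, lines

# Compare the two window sizes mentioned above for a 6-minute track
for dur in (2.0, 0.5):
    notes, bars, lines = notation_size(6 * 60, dur)
    print(f"{dur} s windows: {notes} notes, {bars} bars, {lines} lines")
```

For a 6-minute track this gives 180 notes in 45 bars at 2 s windows, versus 720 notes in 180 bars at 0.5 s windows.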

## Dependencies

```bash
pip install librosa music21 scipy yt-dlp
# yt-dlp needs ffmpeg on PATH for -x / --audio-format wav
# LilyPond: download static binary (no pip package)
```