# Melody Extraction with librosa (Fallback when songsee/Go unavailable)

When `songsee` CLI is not installed and Go is unavailable, use Python + librosa for full melody extraction from audio files.

## Prerequisites

```bash
python3 -m pip install librosa scipy
```

## Workflow

### Step 1: Download Audio

```bash
yt-dlp -x --audio-format wav --no-part -o "/tmp/song.wav" "URL"
```

### Step 2: Key & Tempo Analysis (60s segment)

```python
import librosa, numpy as np

y, sr = librosa.load("/tmp/song.wav", sr=22050, duration=60.0)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
tempo_val = tempo.item()  # librosa returns array

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
chroma_mean = chroma.mean(axis=1)

# Krumhansl-Schmuckler key estimation
major_profile = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
minor_profile = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
# Correlate rotated chroma against profiles to find best key
```
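
The correlation step left as a comment above can be sketched as follows. This is a minimal illustration, not the only way to do it: the function name `estimate_key` and the `NOTE_NAMES` table are assumptions introduced here, and it expects a 12-element chroma vector with index 0 = C (as `chroma_mean` above would be).

```python
import numpy as np

# Pitch-class names, index 0 = C (hypothetical helper table for this sketch)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma_mean):
    """Return (key_name, correlation) for the best-matching major or minor key."""
    best_name, best_r = "", -np.inf
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its tonic weight lines up with pitch class `tonic`
            rotated = np.roll(profile, tonic)
            r = np.corrcoef(chroma_mean, rotated)[0, 1]
            if r > best_r:
                best_name, best_r = f"{NOTE_NAMES[tonic]} {mode}", r
    return best_name, best_r
```

Calling `estimate_key(chroma_mean)` returns the key label together with its Pearson correlation, which is a rough indication of how confident the match is.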

### Step 3: Melody Extraction (spectral peaks + median smoothing)

**Pitfall:** pYIN (`librosa.pyin`) often finds very few notes on melismatic/drone-heavy music (e.g., Turkish İrfan Türküler). It picks up the drone tone (D2) 76% of the time and misses the vocal melody.

**Better approach:** Spectral peak tracking with median smoothing:

```python
from scipy.ndimage import median_filter

y_full, sr = librosa.load("/tmp/song.wav", sr=22050)
y_harm, _ = librosa.effects.hpss(y_full)

S = np.abs(librosa.stft(y_harm, n_fft=4096, hop_length=512))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)

c3_bin = np.searchsorted(freqs, librosa.note_to_hz('C3'))
c6_bin = np.searchsorted(freqs, librosa.note_to_hz('C6'))

peak_midis = []
for i in range(S.shape[1]):
    frame = S[c3_bin:c6_bin, i]
    if frame.max() > np.median(frame) * 2:  # only significant peaks
        peak_bin = c3_bin + np.argmax(frame)
        peak_hz = freqs[peak_bin]
        peak_midis.append(int(round(librosa.hz_to_midi(peak_hz))))
    else:
        peak_midis.append(-1)  # silence

# 11-frame median filter (~0.25 s at hop_length=512, sr=22050) smooths out noise
smoothed = median_filter(np.array(peak_midis), size=11)
```
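
The `size=11` window is tied to the hop length and sample rate used above. If either changes, the frame count can be recomputed from a target duration. A small sketch, where the helper name `median_window_frames` is an assumption introduced here:

```python
def median_window_frames(seconds: float, sr: int = 22050, hop_length: int = 512) -> int:
    """Return an odd frame count close to `seconds` of audio."""
    frames = int(round(seconds * sr / hop_length))
    if frames % 2 == 0:
        frames += 1  # an odd size keeps the median window centered on each frame
    return max(frames, 1)
```

With the defaults, `median_window_frames(0.25)` gives the 11 frames used in the block above; the result can be passed as `size=` to `median_filter`.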

### Step 4: Phrase Grouping (2s windows)

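For sheet music, take the dominant pitch in each 2-second window:

```python
from collections import Counter

# Time stamp of each STFT frame (same hop_length as in Step 3)
times_spec = librosa.frames_to_time(np.arange(len(smoothed)), sr=sr, hop_length=512)

phrase_dur = 2.0
num_phrases = int(times_spec[-1] // phrase_dur) + 1
phrase_notes = []  # dominant MIDI pitch per phrase, -1 if the phrase is silent
for p in range(num_phrases):
    start_t = p * phrase_dur
    end_t = start_t + phrase_dur
    mask = (times_spec >= start_t) & (times_spec < end_t)
    valid = smoothed[mask]
    valid = valid[valid >= 0]
    dominant_midi = Counter(valid.tolist()).most_common(1)[0][0] if len(valid) else -1
    phrase_notes.append(dominant_midi)
```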

### Step 5: ABC Notation Output

Convert MIDI to ABC notation:
- MIDI 60 (C4) = `C` in ABC; C4 to B4 are plain uppercase
- Below C4: uppercase with commas (e.g., `D,` for D3)
- C5 to B5: plain lowercase (e.g., `c` for C5)
- C6 and above: lowercase with apostrophes (e.g., `c'` for C6)
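
The octave rules can be sketched as a small per-note converter. The function name `midi_to_abc` and the choice to spell every black key as a sharp (`^`) are assumptions made for this illustration; negative values (the silence marker from Step 3) become rests.

```python
# Sharps are written with a leading "^" in ABC; this sketch never emits flats.
ABC_NAMES = ["C", "^C", "D", "^D", "E", "F", "^F", "G", "^G", "A", "^A", "B"]

def midi_to_abc(midi: int) -> str:
    """Return the ABC pitch for a MIDI note number, or 'z' (rest) for negatives."""
    if midi < 0:
        return "z"
    name = ABC_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI 60 -> octave 4
    if octave >= 5:
        # Lowercase from C5 upward, one apostrophe per octave above 5
        return name.lower() + "'" * (octave - 5)
    # Uppercase from B4 downward, one comma per octave below 4
    return name + "," * (4 - octave)
```

For example, `" ".join(midi_to_abc(m) for m in [60, 62, 64])` yields `C D E`. Note that within a bar an ABC accidental carries over to later notes of the same pitch, so a natural that follows a sharp in the same bar needs an explicit `=`; this sketch does not handle that.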

### Step 6: Render to Sheet Music PNG

Use music21 + Lilypond:

```bash
# Download Lilypond static binary (no apt-get on ZimaOS)
curl -sL "https://gitlab.com/lilypond/lilypond/-/releases/v2.24.4/downloads/lilypond-2.24.4-linux-x86_64.tar.gz" -o /tmp/lilypond.tar.gz
cd /tmp && tar xzf lilypond.tar.gz
```

Then, in Python, parse ABC → write LilyPond → render PNG:

```python
import subprocess
from music21 import converter, environment

env = environment.Environment()
env['lilypondPath'] = '/tmp/lilypond-2.24.4/bin/lilypond'

score = converter.parse('/tmp/song.abc', format='abc')
score.write('lilypond', fp='/tmp/song.ly')

# check=True raises if Lilypond exits with an error
subprocess.run(['/tmp/lilypond-2.24.4/bin/lilypond', '--png', '-o', '/tmp/song', '/tmp/song.ly'], check=True)
# Output: /tmp/song.png for a single page, or /tmp/song-page1.png through pageN.png
```

## Pitfalls

- **Long audio (>2 min) can make librosa time out:** Load with the `duration=60.0` parameter and analyze in segments (use `offset` to step through the file)
- **pYIN fails on melismatic/drone music:** Use spectral peak + median filter instead
- **`tempo` is a numpy array, not float:** Use `tempo.item()` to extract scalar
- **Lilypond not on system:** Download static binary from GitLab releases (no apt-get on ZimaOS)
- **ABC parse may fail on complex notation:** Fall back to building `music21.stream.Score` manually from note list
- **GitHub raw URLs may return HTML, not binary:** Use GitLab releases or SourceForge for fonts and binaries
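
For the long-audio pitfall, the segment boundaries can be computed up front. This is a rough sketch: the generator name `segment_offsets` is an assumption introduced here, and the total length is expected to be known already (for instance from `librosa.get_duration`).

```python
def segment_offsets(total_seconds: float, segment_seconds: float = 60.0):
    """Yield (offset, duration) pairs that cover the file without overlap."""
    offset = 0.0
    while offset < total_seconds:
        # The last segment is shortened so it does not run past the end
        yield offset, min(segment_seconds, total_seconds - offset)
        offset += segment_seconds
```

Each pair can then be passed to `librosa.load("/tmp/song.wav", sr=22050, offset=offset, duration=duration)` so that only one segment is held in memory at a time.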