
Songsee

Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.

Skill metadata

Source: Bundled (installed by default)
Path: skills/media/songsee
Version: 1.0.0
Author: community
License: MIT
Tags: Audio, Visualization, Spectrogram, Music, Analysis

Reference: full SKILL.md

The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.

songsee

Generate spectrograms and multi-panel audio feature visualizations from audio files.

Prerequisites

Requires Go:

go install github.com/steipete/songsee/cmd/songsee@latest

Optional: ffmpeg for formats beyond WAV/MP3.
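
Before relying on the commands below, it may help to confirm that both tools are reachable. The following is a minimal sketch, not part of songsee itself: `have` and `check_songsee_prereqs` are hypothetical helper names, and the hint about the install directory assumes Go's default behavior of placing binaries in `$(go env GOPATH)/bin` unless `GOBIN` is set.

```shell
# Return 0 if the named command can be found by the shell.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: fail if songsee is missing, warn if ffmpeg is missing.
check_songsee_prereqs() {
  if ! have songsee; then
    printf '%s\n' 'songsee not found on PATH; go install normally places binaries in $(go env GOPATH)/bin' >&2
    return 1
  fi
  if ! have ffmpeg; then
    # ffmpeg is optional: without it only WAV and MP3 input is expected to work.
    printf '%s\n' 'ffmpeg not found: only WAV and MP3 input will be decodable' >&2
  fi
  return 0
}
```

Calling `check_songsee_prereqs` at the top of a script gives an early, readable failure instead of a "command not found" error halfway through a batch.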

Quick Start

# Basic spectrogram
songsee track.mp3

# Save to specific file
songsee track.mp3 -o spectrogram.png

# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux

# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg

# From stdin
cat track.mp3 | songsee - --format png -o out.png
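
The single-file commands above extend naturally to a folder of tracks. This is a rough sketch using only the `--format` and `-o` flags shown in this document; `render_all` and its two arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: render one PNG spectrogram per MP3 in a directory.
# Usage: render_all <source-dir> <output-dir>
render_all() {
  src_dir=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.mp3; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name=$(basename "$f" .mp3)
    # Keep going if one file fails, but report it on stderr.
    songsee "$f" --format png -o "$out_dir/$name.png" || echo "failed: $f" >&2
  done
}
```

For example, `render_all ./stems ./spectrograms` would write `./spectrograms/<track>.png` for each `./stems/<track>.mp3`.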

Visualization Types

Use --viz with comma-separated values:

| Type | Description |
|------|-------------|
| spectrogram | Standard frequency spectrogram |
| mel | Mel-scaled spectrogram |
| chroma | Pitch class distribution |
| hpss | Harmonic/percussive separation |
| selfsim | Self-similarity matrix |
| loudness | Loudness over time |
| tempogram | Tempo estimation |
| mfcc | Mel-frequency cepstral coefficients |
| flux | Spectral flux (onset detection) |

Multiple --viz types render as a grid in a single image.
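
When the `--viz` list is assembled by a script or an agent, a typo in one type name is easy to miss. The sketch below checks a comma-separated list against the nine types listed above before songsee is invoked; `valid_viz` is a hypothetical helper, and how songsee itself reacts to an unknown type is not covered by this document.

```shell
# Hypothetical helper: return 0 only if every comma-separated entry
# is one of the visualization types listed in this document.
valid_viz() {
  old_ifs=$IFS
  IFS=,
  set -- $1                # split the list on commas into $1, $2, ...
  IFS=$old_ifs
  [ $# -gt 0 ] || return 1 # an empty list is rejected
  for t in "$@"; do
    case $t in
      spectrogram|mel|chroma|hpss|selfsim|loudness|tempogram|mfcc|flux) ;;
      *) echo "unknown viz type: $t" >&2; return 1 ;;
    esac
  done
}
```

A guarded call could then look like `valid_viz "$types" && songsee track.mp3 --viz "$types" -o grid.png`.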

Common Flags

| Flag | Description |
|------|-------------|
| --viz | Visualization types (comma-separated) |
| --style | Color palette: classic, magma, inferno, viridis, gray |
| --width / --height | Output image dimensions |
| --window / --hop | FFT window and hop size |
| --min-freq / --max-freq | Frequency range filter |
| --start / --duration | Time slice of the audio |
| --format | Output format: jpg or png |
| -o | Output file path |
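
The flags above can be combined in one call. The wrapper below is only a sketch: `render_detail` is a hypothetical name, and every numeric value is an illustrative assumption, since this document does not state the units or defaults for the size and frequency flags.

```shell
# Hypothetical preset: a zoomed-in mel view of one passage.
# Usage: render_detail <input-audio> <output-image>
render_detail() {
  input=$1
  output=$2
  # All numbers below are placeholder assumptions; adjust to taste.
  songsee "$input" \
    --viz mel \
    --style magma \
    --width 1600 --height 600 \
    --min-freq 100 --max-freq 8000 \
    --start 30 --duration 10 \
    --format png \
    -o "$output"
}
```

Keeping a preset like this in one function means repeated renders of different takes share the same palette, dimensions, and time window.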

Notes

  • WAV and MP3 are decoded natively; other formats require ffmpeg
  • Output images can be inspected with vision_analyze for automated audio analysis
  • Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines
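
The first note above can be turned into a quick pre-flight check. This sketch guesses from the file extension alone whether ffmpeg will be needed; `needs_ffmpeg` is a hypothetical helper, and the extension is only a heuristic for the actual container format.

```shell
# Hypothetical helper: return 0 if the file extension suggests a format
# other than WAV or MP3 (so ffmpeg is needed), 1 otherwise.
needs_ffmpeg() {
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    wav|mp3) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `needs_ffmpeg take.flac && command -v ffmpeg >/dev/null || echo "native decode or ffmpeg present"` lets a script warn before attempting a render that is likely to fail.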