Create

The Create tab is where you turn an MP3 into a finished video.

The Create tab

Two flows

The shape of the Create tab depends on one setting:

  • AI Lyric Generator off (default) — Create opens to a drop zone. Bring an MP3 and render. Described below. Walk-through: Make your first video.
  • AI Lyric Generator on — Create gains two steps in front of the drop zone: Lyrics (topic → scene → four candidates) and Song (the chosen lyrics plus a Suno-ready style prompt, with a Skip to Audio button to drop in the resulting MP3). Turn the feature on in Settings → Preferences. Walk-through: Make your first video with AI lyrics. Feature reference: AI Lyric Generator.

The rest of this page describes the MP3 → render portion, which is the same in both flows.

Dropping in a song

Drag an MP3 file onto the drop zone, or click Browse to pick one. The app reads:

  • Title and artist from the file’s metadata.
  • Lyrics from the lyrics-eng metadata tag if present.

You can edit any of these before generating.

Song analysis

Before rendering, the app splits the song into sections (verses, choruses, bridges) by reading the lyrics. Each section (“act”) gets its own background image — so a verse might have a quieter scene and the chorus a bigger, more dramatic one.
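The app's analysis is internal, but splitting lyrics into acts on section headers could be sketched roughly like this (the header format and function name are illustrative assumptions, not the app's actual code):

```python
import re

# Hypothetical sketch: split lyrics into acts on headers such as
# "[Verse 1]" or "Chorus". The app's real section detection is internal
# and also uses the audio, not just the text.
HEADER = re.compile(r"^\[?\s*(verse|chorus|bridge|intro|outro)\s*\d*\s*\]?\s*$", re.I)

def split_into_acts(lyrics: str):
    acts, current = [], None
    for line in lyrics.splitlines():
        m = HEADER.match(line.strip())
        if m:
            current = {"section": m.group(1).lower(), "lines": []}
            acts.append(current)
        elif line.strip():
            if current is None:  # lyric lines before any header
                current = {"section": "intro", "lines": []}
                acts.append(current)
            current["lines"].append(line.strip())
    return acts
```

Each act in the resulting list then maps to one background slot in the Create panel.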

AI prompt editor

Want to steer the look of a specific section? Click the prompt icon next to it in the Create panel. You can type a custom prompt like:

“Dark highway at night with red taillights, cinematic”

This overrides the auto-generated prompt for that one section only.

Custom prompt dialog

“Base model is not installed yet”

Auto-prompt generation needs the local AI prompt model. On a fresh install it downloads in the background — the Base model progress bar at the bottom of the screen shows its progress. If you click AI → Generate before that bar reaches 100%, the request is refused with a “Base model is not installed yet” message rather than wasting a render on a generic placeholder prompt.

You have two options:

  • Wait for the Base model bar to finish, then click Generate again.
  • Type your own prompt in the dialog before clicking Generate — custom prompts don’t depend on the model and will be accepted right away.

Disabling an act

Each background slot has an eye icon in the top-right corner. Click it to disable that act — the slot grays out (eye-slash), the render skips that section entirely, and the previous act’s background extends over the disabled slice of the song. Click the icon again to re-enable.

At least one act must stay enabled.
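The “previous act’s background extends over the disabled slice” rule can be sketched as a simple pass over the acts in song order (a hypothetical illustration of the behavior described above, not the app’s code):

```python
# Hypothetical sketch: a disabled act inherits the most recent
# enabled act's background, so its slice of the song stays covered.
def effective_backgrounds(acts):
    """acts: list of (background, enabled) tuples in song order."""
    result, last = [], None
    for background, enabled in acts:
        if enabled:
            last = background
        # disabled acts reuse whatever came before them;
        # None only if the very first acts are disabled
        result.append(last)
    return result
```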

While the app is auto-filling empty slots on first load, the Upload, AI, and eye toggles dim and the cursor flips to a not-allowed indicator so you can tell at a glance that they’re locked until generation finishes.

Previewing a background at full size

The top-left corner of each populated background slot has an expand icon. Click it to open a full-size overlay of that image. Click the dim backdrop, press Esc, or click the × button in the corner to close the overlay.

Per-act lyrics

Each enabled background slot shows the lyrics for that act inline beneath its action row — no click required. Lyrics appear under every slot (populated, empty, or generating) whenever the section has them, so you can see which part of the song each slot corresponds to while deciding what to upload or generate. Sections with no lyrics (purely instrumental acts, or acts whose lyrics fail to load) stay compact — the lyrics block is hidden entirely rather than showing a placeholder.

Active background generator chip

The Backgrounds label shows a small chip next to it with the name of the currently-active background generator (blue for cloud generators like Pollinations.ai, purple for local-GPU generators like SDXL-Lightning or RealVisXL V5). If no generator is active, the chip reads None — pick one in Settings → Backgrounds to enable auto-generation.

Per-song overrides

When you have more than one prompt model or more than one background generator installed, a Use for this song row appears under the Backgrounds chip with two dropdowns:

  • Prompt builder — pick a different prompt-builder LLM (Qwen 3B / 7B / 14B / Cydonia) for this song’s prompt generation only.
  • Background generator — pick a different image generator (Pollinations.ai / SDXL-Lightning / RealVisXL V5) for this song’s renders only.

Each dropdown’s current global default is marked with a star (★). Choosing the starred option clears the override and the song falls back to whatever the global default is at render time. Choosing anything else pins this song to that model.

The override is per-song and per-session — it persists while you keep the song open on Create but resets to the global defaults the next time the app starts. To change the default permanently, use Settings → App → Prompt Builder or Settings → Backgrounds.
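The starred-option rule amounts to a small fallback at render time, roughly like this (names are illustrative; the app stores and resolves overrides internally):

```python
# Hypothetical sketch of the override rule: picking the starred
# (global-default) option clears the override, so the song tracks
# whatever the global default is when the render starts.
def resolve_model(session_override, global_default):
    if session_override is None or session_override == global_default:
        return global_default  # no pin: follow the current default
    return session_override    # pinned to a specific model for this song
```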

If you only have one prompt model and one background generator installed, this row stays hidden — there’s nothing to choose between.

Cancelling auto-generated backgrounds

While the auto-generation queue is running across the empty acts, a red Cancel button appears in the Backgrounds header. Clicking it stops the queue at the next act boundary — the act currently being rendered finishes (SDXL inference is not interruptible mid-step), and any later acts are left untouched, flipped back to empty so you can re-trigger them individually with the per-slot AI button.

Refreshing a song’s analysis

The Refresh button next to the song title clears the cached analysis for that song and re-runs it. Use this if you edited the song’s lyrics externally or want to redo beat/section detection from scratch. (This replaces the older Clear cache wording; behavior is unchanged.)

Regenerating backgrounds and existing renders

Regenerating a background on Create changes what your next render uses — it does not change anything about renders already in the Library. Each render is locked to its own set of backgrounds, so you can safely iterate on the Create workspace without touching finished videos.

Generating the video

Click Generate to kick off the render. A new Library card is created for this render attempt. Progress is shown in the Queue tab, and you can render the same song repeatedly — each attempt produces its own independent card.

What happens under the hood

  1. Lyrics cleanup — section headers like “Verse 1” are stripped.
  2. Lyric transcription — each word is aligned to a timestamp in the audio.
  3. Prompt generation — the local AI prompt model writes a visual description for each enabled section.
  4. Background fetch — each prompt becomes a 4K image via the AI image service.
  5. Beat detection — the app finds the song’s BPM so the visualizer pulses on-beat.
  6. Color matching — the dominant color of the background tints the visualizer.
  7. Render — the renderer stitches everything into a 1080p MP4, saved to the render’s own folder.
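Step 6, for instance, might reduce to something as simple as averaging pixels (a rough sketch under that assumption; the app’s actual color matching is internal and likely uses a smarter dominant-color pass):

```python
# Rough sketch of step 6: derive a visualizer tint from a background
# by averaging its RGB pixels. A dominant-color pass (histogram mode
# or clustering) would give a punchier tint than a plain mean.
def average_tint(pixels):
    """pixels: iterable of (r, g, b) tuples; returns the mean color."""
    r = g = b = n = 0
    for pr, pg, pb in pixels:
        r, g, b, n = r + pr, g + pg, b + pb, n + 1
    return (r // n, g // n, b // n)
```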

You don’t need to understand any of this — it all just happens.