Music 2.6
Music 2.6 is MiniMax’s latest music generation model. Give it lyrics and a style description, and it generates a full-length song with vocals and instrumentation — or go instrumental-only with just a prompt.
What’s new in 2.6
Music 2.6 builds on Music 2.5 with major improvements:
- BPM and key control — specify a key and BPM in your prompt (e.g. “E minor, 90 BPM”) and the output matches 99%+ of the time
- Faster streaming — end-to-end chunk latency dropped from over 60 seconds to under 25 seconds
- Longer songs — up to 6 minutes per generation (most songs land between 2 and 4 minutes)
- Instrumental mode — generate music without vocals. Set is_instrumental to true and provide a prompt describing the style. No lyrics needed.
- Automatic lyrics — set lyrics_optimizer to true and the model generates lyrics from your prompt, so you can create a complete song from just a style description.
Plus everything from 2.5:
- Natural-sounding vocals with realistic timbre, breathing, and pitch transitions
- Expanded sound library including orchestral and traditional instruments
- 14+ section tags for precise structure control
- Style-aware mixing that adapts to genre
Inputs
lyrics — The lyrics for your song, up to 3,500 characters. Required for vocal tracks (unless lyrics_optimizer is enabled). Use structure tags to control the arrangement:
[Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro]
Use \n to separate lines and \n\n to add pauses between sections.
prompt — A description of the music style, mood, and scenario. For example: E minor, 90 BPM, acoustic guitar ballad, male vocal, emotional. Up to 2,000 characters. Required for instrumental mode.
is_instrumental — Set to true to generate instrumental music without vocals. When enabled, prompt is required and lyrics is not needed. Default: false.
lyrics_optimizer — Set to true to auto-generate lyrics from the prompt when lyrics are empty. Default: false.
sample_rate — Audio sample rate. Options: 16000, 24000, 32000, 44100 (default).
bitrate — Audio bitrate. Options: 32000, 64000, 128000, 256000 (default).
audio_format — Output format: mp3 (default), wav, or pcm.
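The limits above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official client: the helper name build_input and its error messages are hypothetical, and only the field names, defaults, and limits come from the list above.

```python
# Documented option sets and limits from the Inputs list.
ALLOWED_SAMPLE_RATES = {16000, 24000, 32000, 44100}
ALLOWED_BITRATES = {32000, 64000, 128000, 256000}
ALLOWED_FORMATS = {"mp3", "wav", "pcm"}
MAX_LYRICS_CHARS = 3500
MAX_PROMPT_CHARS = 2000


def build_input(prompt="", lyrics="", is_instrumental=False,
                lyrics_optimizer=False, sample_rate=44100,
                bitrate=256000, audio_format="mp3"):
    """Return an input dict, raising ValueError if a documented rule is broken.

    build_input is a hypothetical helper; it is not part of any MiniMax API.
    """
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 2,000 characters")
    if len(lyrics) > MAX_LYRICS_CHARS:
        raise ValueError("lyrics exceed 3,500 characters")
    if is_instrumental and not prompt:
        raise ValueError("instrumental mode requires a prompt")
    if not is_instrumental and not lyrics and not lyrics_optimizer:
        raise ValueError("vocal tracks need lyrics or lyrics_optimizer=True")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"unsupported bitrate: {bitrate}")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format}")

    payload = {
        "prompt": prompt,
        "is_instrumental": is_instrumental,
        "lyrics_optimizer": lyrics_optimizer,
        "sample_rate": sample_rate,
        "bitrate": bitrate,
        "audio_format": audio_format,
    }
    # Lyrics are not needed in instrumental mode, so they are left out there.
    if lyrics and not is_instrumental:
        payload["lyrics"] = lyrics
    return payload
```

Calling build_input(prompt="Cinematic orchestral, epic and dramatic", is_instrumental=True) returns a dict with the default audio settings and no lyrics key.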
Quick start examples
Song with lyrics and BPM/key control
{
  "prompt": "E minor, 90 BPM, acoustic guitar ballad, male vocal, emotional",
  "lyrics": "[Verse]\nWalking through the rain...\n[Chorus]\nBut I still remember you"
}
Instrumental track
{
  "prompt": "Cinematic orchestral, epic and dramatic, full symphony",
  "is_instrumental": true
}
Auto-generated lyrics
{
  "prompt": "Upbeat pop, summer vibes, feel-good, catchy melody",
  "lyrics_optimizer": true
}
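Payloads like the ones above are sent as the model's input. Here is one possible way to do that from Python, sketched under assumptions: it uses the third-party replicate package and its replicate.run(ref, input=...) call, the helper name generate is hypothetical, and the model identifier is left as an argument because this README does not state it.

```python
def generate(model_ref, payload, run=None):
    """Send payload to the model and return whatever the runner returns.

    model_ref: the model identifier on Replicate, supplied by the caller
               (an assumption; this README does not spell it out).
    run:       optional callable taking (model_ref, input=payload). When it
               is omitted, replicate.run from the third-party `replicate`
               package is used, which expects REPLICATE_API_TOKEN to be set.
    """
    if run is None:
        import replicate  # imported lazily so the helper loads without it
        run = replicate.run
    return run(model_ref, input=payload)


# The "Auto-generated lyrics" payload from above, as a Python dict.
AUTO_LYRICS_PAYLOAD = {
    "prompt": "Upbeat pop, summer vibes, feel-good, catchy melody",
    "lyrics_optimizer": True,
}

# Example call (needs network access and an API token):
# output = generate("<model identifier>", AUTO_LYRICS_PAYLOAD)
```

The shape of the returned output depends on the client version, so inspect it before assuming it is a URL or a file-like object.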
Prompt guide
The prompt field steers the overall sound of your track. Think of it as giving creative direction to a producer.
Prompt structure
A good prompt follows this general pattern:
[Key], [BPM], [Genre], [Mood/Emotion], [Vocal description], [Key instruments], [Production style]
You don’t need every element every time — pick the ones that matter for your track. Music 2.6 is especially good at matching exact BPM and key when you specify them.
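The pattern above can be assembled programmatically. This is a small sketch, assuming a hypothetical helper named compose_prompt and hypothetical keyword names for the seven elements; it joins whichever elements are supplied in the order the pattern lists them.

```python
# Element order taken from the pattern above; the keyword names are made up
# for this sketch.
PROMPT_ORDER = ("key", "bpm", "genre", "mood", "vocal", "instruments", "production")
MAX_PROMPT_CHARS = 2000  # documented limit for the prompt field


def compose_prompt(**elements):
    """Join the supplied elements into one comma-separated prompt string."""
    unknown = set(elements) - set(PROMPT_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt elements: {sorted(unknown)}")
    parts = []
    for name in PROMPT_ORDER:
        value = elements.get(name)
        if value is None or value == "":
            continue  # every element is optional
        if name == "bpm":
            value = f"{int(value)} BPM"  # e.g. 90 -> "90 BPM"
        parts.append(str(value))
    prompt = ", ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 2,000 characters")
    return prompt


print(compose_prompt(key="E minor", bpm=90, genre="acoustic guitar ballad",
                     mood="emotional", vocal="male vocal"))
# prints: E minor, 90 BPM, acoustic guitar ballad, emotional, male vocal
```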
What to include
| Element | What it does | Examples |
|---|---|---|
| Key | Sets the musical key | E minor, C major, Bb minor, A major |
| BPM | Controls the exact tempo | 75 BPM, 90 BPM, 120 BPM, 140 BPM |
| Genre | Sets the foundational sound | Pop, Indie folk, Jazz, Blues, EDM, Hip-hop, Rock, Classical, Country, R&B |
| Mood / Emotion | Shapes the emotional tone | Melancholic, uplifting, aggressive, dreamy, hopeful, introspective, confident |
| Vocal style | Guides the singer’s delivery | Male vocals, female vocals, breathy, powerful, soulful, clear, operatic, raspy |
| Instruments | Requests specific sounds | Acoustic guitar, piano, synth bass, 808 drums, orchestral strings, brass section |
| Production / Mixing | Shapes the sonic character | Lo-fi, warm reverb, wide soundstage, intimate studio feel, distorted, crisp |
Example prompts
Acoustic ballad with BPM/key:
E minor, 90 BPM, acoustic guitar ballad, male vocal, emotional
Bright pop:
C major, 120 BPM, bright pop, female vocal
Jazz piano:
Bb minor, 75 BPM, jazz piano, smooth vocal
Electronic / Dance:
Pop-Dance/Progressive House, uplifting, anthemic, 125 BPM, four-on-the-floor kick, synth bass, atmospheric pads
Cinematic instrumental:
Cinematic orchestral, epic, sweeping strings, building tension, heroic brass, 90 BPM, wide soundstage
Lo-fi:
Lo-fi hip-hop, chill, study vibes, vinyl texture, warm midrange
Structure tags
Structure tags let you design the emotional arc and arrangement of your song. Vocals and instruments evolve dynamically across sections — vocal emotion and technique shift from verse to chorus, instrumentation adjusts per section.
| Tag | Purpose | When to use it |
|---|---|---|
| [Intro] | Song opening | Setting mood before vocals kick in |
| [Verse] | Story / narrative sections | Main lyrical content |
| [Pre Chorus] | Build-up before chorus | Escalating tension |
| [Chorus] | Hook / memorable section | The repeating main idea |
| [Post Chorus] | After-hook section | Extended hook variation |
| [Hook] | Catchy standalone phrase | A memorable moment |
| [Drop] | Energy release (EDM) | After a build-up, the beat drops |
| [Bridge] | Contrast section | Breaking repetition, new perspective |
| [Solo] | Instrumental spotlight | Showcasing a specific instrument |
| [Inst] | Instrumental section | Music without vocals |
| [Build Up] | Intensity increase | Leading to a drop or climax |
| [Interlude] | Instrumental break | Breathing space between sections |
| [Break] | Rhythmic pause | Dynamic contrast |
| [Transition] | Section connector | Smooth flow between parts |
| [Outro] | Song ending | Graceful exit, fade-out |
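A lyrics string with structure tags can be built from a list of sections. The sketch below assumes a hypothetical helper named build_lyrics; it accepts only the tags listed in the table, separates lines with \n, and separates sections with \n\n, which by the Inputs description adds a pause between them.

```python
# Tag names copied from the table above.
KNOWN_TAGS = {
    "Intro", "Verse", "Pre Chorus", "Chorus", "Post Chorus", "Hook", "Drop",
    "Bridge", "Solo", "Inst", "Build Up", "Interlude", "Break", "Transition",
    "Outro",
}
MAX_LYRICS_CHARS = 3500  # documented limit for the lyrics field


def build_lyrics(sections):
    """Build a lyrics string from (tag, lines) pairs.

    Each section becomes "[Tag]" followed by its lines, one per line.
    Sections are joined with a blank line; use a single newline instead
    if no pause between sections is wanted.
    """
    blocks = []
    for tag, lines in sections:
        if tag not in KNOWN_TAGS:
            raise ValueError(f"unknown structure tag: {tag}")
        blocks.append("\n".join([f"[{tag}]", *lines]))
    lyrics = "\n\n".join(blocks)
    if len(lyrics) > MAX_LYRICS_CHARS:
        raise ValueError("lyrics exceed 3,500 characters")
    return lyrics


song = build_lyrics([
    ("Intro", []),
    ("Verse", ["Walking through the rain..."]),
    ("Chorus", ["But I still remember you"]),
])
```

Here a section with no lines, such as the [Intro] above, contributes only its tag.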
Good to know
- Max song length: Up to 6 minutes per generation. Most songs land between 2 and 4 minutes.
- BPM/key accuracy: Specify exact BPM and key in the prompt and the output matches 99%+ of the time.
- Language support: English and Mandarin Chinese have the strongest support. Other languages work but with less consistent pronunciation.
- Each generation is unique. Unless you set a seed, the same prompt and lyrics produce different arrangements each time.
- Reproducibility: Use a seed parameter (0–1,000,000) for deterministic output.
- For instrumental tracks, set is_instrumental to true — or use [Inst] tags and parenthetical instrument directions in the lyrics.
- Higher quality settings (44100 sample rate, 256000 bitrate) give the best audio quality. Use wav format for production work.
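The reproducibility and quality notes above can be combined into one step. This is a minimal sketch with a hypothetical helper named production_settings; it assumes the seed field is literally called seed, which the note implies but the Inputs list does not confirm.

```python
SEED_MIN, SEED_MAX = 0, 1_000_000  # documented seed range


def production_settings(payload, seed=None):
    """Return a copy of payload with the highest documented quality settings.

    The original dict is left untouched. A seed is added only when given,
    under the assumed field name "seed".
    """
    out = dict(payload)
    out.update(sample_rate=44100, bitrate=256000, audio_format="wav")
    if seed is not None:
        if not SEED_MIN <= seed <= SEED_MAX:
            raise ValueError("seed must be between 0 and 1,000,000")
        out["seed"] = seed
    return out


base = {"prompt": "Lo-fi hip-hop, chill, study vibes", "is_instrumental": True}
final = production_settings(base, seed=42)
```

Reusing the same seed with the same prompt and lyrics is what the reproducibility note describes; omitting it leaves each generation different.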
Also check out minimax/music-cover to reimagine existing songs in a different style while preserving the original melody.
Privacy policy
Data from this model is sent from Replicate to MiniMax.
Check their privacy policy for details:
https://www.minimax.io/platform/protocol/privacy-policy