// Honest Review · June 18, 2026

STABLE AUDIO

Name: Stable Audio Review: Research-Grade Sound Design, Not a Song Machine
Item: Stable Audio
Rating: 7.8
Author: Nova Reyes

No sponsorships, no spin. A straight look at what Stable Audio is genuinely good at, where it falls short, and who should actually pay for it.

Stable Audio

Best for sound design & open models

7.8/10

Research-grade generation with open-weights options and clean WAV, built for tinkerers more than songwriters.

Pricing: free tier; paid from ~$12/mo.

The breakdown

Strong on: open-weights options, 48kHz WAV export, solid sound-design and loop generation, and a research community.

Watch out for: no stem separation, no live voice steering, weaker full-song output, and plan-dependent ownership.

WHAT YOU GET

Feature	Stable Audio
Text-to-music	✓
Vocal generation	Instrumental focus
Stem separation	—
Sound design / foley	✓ Partial
Voice steering (live)	—
Open weights	✓
Own your outputs	Plan-dependent
48kHz WAV	✓
API access	✓
Starting paid price	~$12/mo

The specific case: a horror-adjacent indie game, a corridor level that needed rain on a corrugated metal roof. Ninety seconds, loopable, no melodic content, because the composer's cue was already parked in the same frequency range and two things fighting for 400Hz is how a mix dies. I typed that as a prompt and ran it. Out of eight renders, three were usable and one was better than the field recording I had been planning to license. That result — three in eight, one keeper — is where this Stable Audio review starts, because the same evening I asked the same tool for a two-minute synthwave track with a vocal hook and got something that sounded like a radio playing two rooms away.

Both of those outcomes are the tool working as designed. That is the whole story.

The short version

Stable Audio is built for sound, not for songs. Loops, textures, risers, foley-adjacent beds, the thing that goes under a scene rather than the thing anyone sings along to. We score it 7.8, and the honest one-liner is this: research-grade generation with open-weights options and clean 48kHz WAV, built for tinkerers more than songwriters.

If your job this week is a rain loop, an ambience bed under a product video, or a four-bar percussion cell you were going to chop anyway, it earns its slot in the toolchain. If your job is a finished song with a vocal on top, this is the wrong door.

Where it shines

It generates sound, not only tracks

The capability table marks sound design and foley as partial, and that qualifier is doing real work. It is not a replacement for a well-recorded library — you are not getting a clean, isolated, correctly-labelled door creak on demand. What you get is material: a texture with the right character and the wrong length, which you then trim, layer, and EQ into the thing you needed. That is a normal day for anyone who works in sound. The generation is the raw take, not the delivery.

Where that pays off is in the cues nobody wants to spend a day on. The stinger. The UI whoosh you need forty variations of. The two-bar loop that has to sit at the same tempo as the last one. Loop generation is one of the listed strengths, and in a game build — where you are cutting adaptive layers that must butt-join without a click — a model that thinks in loops is worth more than a model that thinks in three-minute arrangements.

48kHz WAV is the boring feature that decides everything

Video runs at 48kHz. Game engines run at 48kHz. Your session runs at 48kHz. A tool that hands back a lossy file at some other rate forces a sample-rate conversion on import and bakes codec artifacts into the asset before you have touched it, plus the low-grade dread of shipping those artifacts into a master. 48kHz WAV export means the render lands in the timeline and behaves like every other asset in it. It is unglamorous, and it is the difference between a toy and a tool.
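A quick way to confirm a render really matches the session is to read the WAV header before importing it. The sketch below uses only Python's standard `wave` module. The function names are my own, and it only covers PCM WAV files that `wave` can parse, so treat it as a sanity check rather than a full validator.

```python
import wave

EXPECTED_RATE_HZ = 48_000  # the session rate this review assumes


def wav_sample_rate(path):
    """Return the sample rate in Hz declared in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_48k_wav(path):
    """True if the WAV file at `path` declares a 48 kHz sample rate."""
    return wav_sample_rate(path) == EXPECTED_RATE_HZ
```

Running `is_48k_wav` over a folder of renders before they reach the timeline catches an off-rate file while it is still cheap to regenerate.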

Open weights change what the tool actually is

This is the genuinely unusual column in the table. Most of the category is a website with a login and a queue. Open-weights options mean a version of the model can live on your drive: no queue, no upload, no round trip. For anyone under an NDA — unreleased game, unannounced product film — that matters, because "my prompt describes the client's secret project" is a real problem with a boring solution.

It also buys longevity. Hosted tools sunset. A model on your local disk does not get deprecated out from under a project you have to patch in three years. There is a research community around it, which in practice means fine-tunes, notebooks, and people publishing what does and does not work — the kind of shared knowledge that closed products never accumulate in public.

The API makes it infrastructure

API access is listed and it changes the ceiling. If you need 200 variations of a footstep on gravel, or a build step that regenerates ambience whenever a level's mood tag changes, you are not clicking a button 200 times. You are writing a loop and going to bed. That is a genuinely different relationship with a generator than prompt-and-listen.
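What "writing a loop" looks like, roughly, is sketched below. The `generate` argument is a hypothetical stand-in for whatever API client call your plan exposes. I am assuming it takes a prompt and a seed and returns WAV bytes, which may not match the real request shape, so check the current API reference before wiring it up. The file naming is likewise just one reasonable choice.

```python
from pathlib import Path
from typing import Callable, List


def render_variations(
    generate: Callable[[str, int], bytes],
    prompt: str,
    count: int,
    out_dir: str,
) -> List[Path]:
    """Render `count` variations of one prompt, one seed per output file.

    `generate(prompt, seed)` is a HYPOTHETICAL callable returning WAV bytes;
    replace it with a wrapper around the API client you actually use.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for seed in range(count):
        audio_bytes = generate(prompt, seed)
        path = target / f"variation_{seed:03d}.wav"
        path.write_bytes(audio_bytes)  # keep every take; cull by ear later
        written.append(path)
    return written
```

Passing the generator in as an argument keeps the batch logic independent of any one vendor's client, and makes it easy to dry-run the loop with a fake generator before spending credits.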

Where it falls short

No stem separation, and it costs you

This is the gap that hurts most in practice. You get a stereo bounce, already mixed. You cannot pull the kick out, cannot mute the pad that is masking dialogue, cannot solo the top end and re-verb it. Every mix decision the model made is baked.

There is a workaround and it is the one experienced sound people land on anyway: stop asking for finished things. Prompt one element at a time — the low drone, then the metallic hits, then the air — and build your own stems by generating them separately. That works well for sound design, where layering is the craft. It works badly for music, where you wanted the arrangement to hang together in the first place.
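The layering step itself is simple arithmetic, and a minimal sketch makes the point. It assumes each generated element has already been decoded to a list of floats between -1.0 and 1.0 at the same sample rate. The function names and the per-layer gain in decibels are illustrative choices, and the hard clip is only a safety net, not a substitute for a real mix.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_layers(layers):
    """Sum (samples, gain_db) layers into one list of floats in [-1.0, 1.0].

    Each layer is a tuple of a sample list and a gain in dB. Shorter layers
    simply end early; nothing is looped or stretched.
    """
    length = max(len(samples) for samples, _ in layers)
    mixed = [0.0] * length
    for samples, gain_db in layers:
        gain = db_to_gain(gain_db)
        for i, sample in enumerate(samples):
            mixed[i] += sample * gain
    # Hard-clip anything that overshoots full scale.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

Because every element arrives as its own file, each one keeps its own fader, which is exactly the control a pre-mixed stereo bounce takes away.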

No live voice steering

There is no hum-the-melody, sing-the-idea path. You type, it answers in stereo, and if the answer is wrong your only lever is different words. Prompt-roulette is real here, and my three-in-eight hit rate on a simple, well-specified ambience is the honest shape of it. Budget renders the way you budget takes.
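To turn "budget renders" into a number, the sketch below treats each render as an independent attempt with the same three-in-eight chance of being usable. That is a loud assumption: it comes from one session on one prompt, and real hit rates will move with the prompt and the material. The function names are mine.

```python
import math

HIT_RATE = 3 / 8  # usable renders per attempt in the rain-loop session


def chance_of_at_least_one(renders, hit_rate=HIT_RATE):
    """Probability that a batch of `renders` contains at least one usable take."""
    return 1 - (1 - hit_rate) ** renders


def renders_needed(confidence, hit_rate=HIT_RATE):
    """Smallest batch whose chance of one usable take reaches `confidence`.

    `confidence` must be strictly between 0 and 1.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_rate))
```

Under that assumption, a batch of five renders gives roughly a 90% chance of at least one usable take, which is a more honest planning figure than expecting the first render to land.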

Vocals and full songs are the weak register

The table says instrumental focus, and the breakdown lists weaker full-song output among the things to watch out for. I will not dress that up: do not buy this for vocals. Vocals are the hardest problem in the category generally, and this tool is not positioned to solve them. My two-minute synthwave attempt was not a bad session; it was a tool being asked to do the job of a different tool.

Ownership is plan-dependent, so read the terms before you invoice

"Own your outputs" comes back as plan-dependent, and that is the line to read twice. The table does not spell out which tier grants what rights, and I am not going to guess on your behalf: licensing terms in this category get revised often enough that anything I pinned down here would be stale by the time you read it. Before a generated asset ships in something a client pays for, open the current terms for the exact tier you are on and confirm commercial use in writing. Free tiers across the category commonly restrict it.

The 12-month math

There is a free tier, and paid starts around $12/mo as of writing. Call it roughly $144 over a year at the entry rate, before whatever tier your actual rights and usage needs push you toward. That is a reasonable number against one commissioned ambience cue and an unreasonable number if you open it twice a quarter. Recurring costs are a subscription question, not a features question, and the free tier exists precisely so you can answer it before committing.
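The same math as a sketch you can rerun with your own numbers. The $12 figure is the approximate entry price quoted above and may have changed, and "sessions per year" is my own rough proxy for how often you actually open the tool.

```python
MONTHLY_PRICE_USD = 12  # approximate entry price; check current pricing


def annual_cost(monthly_price=MONTHLY_PRICE_USD):
    """Twelve months at a flat monthly price, ignoring tax and plan changes."""
    return monthly_price * 12


def cost_per_session(sessions_per_year, monthly_price=MONTHLY_PRICE_USD):
    """Annual spend divided by how many times the tool actually gets used."""
    return annual_cost(monthly_price) / sessions_per_year
```

At the entry rate that is $144 a year. Opened twice a quarter, eight sessions in all, it works out to $18 a session; opened weekly, it is under $3.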

Who should use it, who should skip it

Use it if you cut sound for games, film, or video — you need loops, beds, textures, and variations, and you already layer everything by hand. Use it if you work under NDA or offline and the open-weights path solves a problem no hosted tool can. Use it if you are a developer who wants generation inside a pipeline rather than inside a browser tab.

Skip it if you are writing songs with vocals. Skip it if your workflow depends on stems — remixers, anyone who needs to rebalance after the fact. Skip it if the way you work is humming an idea and wanting the machine to catch it, because there is no live voice steering to catch it with.

Either way: take the free tier and give it a real cue from an actual project, not a fun prompt. A tool's honest score is how it does on the thing that is due Friday.

Back to the rain

That corrugated-roof loop is still in the build. It sat under a corridor, it did not fight the composer, and nobody has ever asked where it came from — which is the highest compliment ambience gets. The synthwave track went in the bin, and it deserved to.

The myth I walked in with, the one the whole category keeps repeating, is that an AI audio tool is a machine that writes your song for you. The more accurate version: Stable Audio is a machine that renders your sound, one element at a time, and leaves the song where it has always been — with you.
