
Suno AI API Audio Export: What You Actually Get Back, and How to Store It



The generation call is the easy part. You send a prompt, you get a job ID, you poll or wait for a webhook, and eventually a track exists. Where integrations actually rot is the export step — the moment you have to decide what audio you keep, in what format, and for how long. Most teams building on a Suno AI API through a third-party gateway treat that step as an afterthought, and it comes back to bite them three weeks after launch when a user tries to download a track that no longer exists.

The verdict, in one sentence: treat every hosted audio URL from an aggregator as a temporary pointer, pull the highest-quality asset you can get into your own storage the moment the job completes, and never let a playback feature depend on a link you don't control.

Full disclosure before anything else: City of Punk builds generative audio tooling that competes with some of the services described here. I'm going to tell you where the export step is thin regardless. If this reads like a brochure, I've failed at my job.

What most people do

The first integration almost always looks the same. You call the endpoint, the response comes back with a field named something like audio_url, and it points at an MP3 sitting on the provider's CDN. You store that string in your database, drop it into an <audio> tag or an ffmpeg step, and ship the feature. It demos beautifully.

Here's the shape of the response most gateways hand you:

{
  "id": "gen_8f3a...",
  "status": "complete",
  "audio_url": "https://cdn.provider.example/tracks/8f3a.mp3",
  "image_url": "https://cdn.provider.example/art/8f3a.png",
  "duration": 122.4,
  "title": "midnight drive synthwave"
}

You save audio_url. Done. And for a week or a month, everything works.

Then one of three things happens. The link expires: a lot of these CDN URLs are signed and time-boxed, and nobody told you the expiry was measured in hours or days rather than forever. Or the provider rotates storage and the path changes. Or your app scales, a hundred users hit the same cached-but-now-dead URL, and your support inbox fills with "the audio won't play." The generation worked perfectly. The export strategy was an MP3 link you didn't own.

The second common shortcut is accepting whatever format shows up. Most gateways default to a lossy MP3 at a middling bitrate because it's cheap to serve. If your product is a background-music picker for social clips, fine. If anyone downstream is loading that file into a video editor, a game engine, or a mastering chain, a 128 kbps MP3 is a wall you'll hit later and can't undo.

What the evidence suggests

Run a few dozen jobs across the aggregators and the differences stop being about generation quality — the underlying model is largely the same — and start being about what leaves the building. These are the export dimensions that actually vary, in rough order of how often they burn people.

URL lifetime. Hosted audio links are frequently signed and expiring. Some persist for weeks, some for hours. The durable assumption, as of writing, is that any URL you didn't put in your own bucket is temporary. Don't store the link as your source of truth. Store your own copy and store the link only as a fallback for re-fetching.
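Here's a minimal sketch of that read path in Python. It assumes a local directory stands in for your object storage and that `fetch` is a hypothetical download helper you supply. Your own copy is always read first; the provider link is touched only when that copy is missing, and the result is written back so the next read stays local.

```python
from pathlib import Path
from typing import Callable


def read_audio(
    job_id: str,
    storage_dir: Path,
    source_url: str,
    fetch: Callable[[str], bytes],  # hypothetical downloader you supply
) -> bytes:
    """Serve our own copy; fall back to the provider URL only if it's gone."""
    local = storage_dir / f"{job_id}.audio"  # keyed by OUR job ID
    if local.exists():
        return local.read_bytes()
    # Fallback path. The provider link may already be dead, so let fetch()
    # raise rather than pretending we have audio.
    data = fetch(source_url)
    local.parent.mkdir(parents=True, exist_ok=True)
    local.write_bytes(data)  # restore our copy so the next read is local
    return data
```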

Format and fidelity. What you can pull varies by tier and by provider:

  • MP3: near-universal, usually the default, bitrate often unspecified or middling. Adequate for streaming previews, weak for editing.
  • WAV: sometimes available, sometimes gated behind a higher tier. This is what you want if the audio is a source asset, not an endpoint. A 48 kHz WAV is lossless, so you can pitch, time-stretch, and re-encode from a clean master; an MP3 picks up new compression artifacts every time it is decoded and re-encoded.
  • Stems: separated instrument or vocal tracks. Rarer, usually a premium feature, and worth confirming before you promise adaptive-mix features to your own users. Availability varies a great deal between gateways.
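If a gateway lists several downloadable assets per track, that preference order fits in a few lines. This is a sketch: `OfferedAsset` and its fields are hypothetical, since every gateway shapes this list differently. Stems are deliberately left out because they are extra assets, not a better copy of the mix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfferedAsset:  # hypothetical shape; gateways differ
    fmt: str  # e.g. "wav" or "mp3"
    url: str
    bitrate_kbps: Optional[int] = None  # often missing for MP3


def pick_best(assets: list[OfferedAsset]) -> Optional[OfferedAsset]:
    """Prefer lossless WAV; otherwise the MP3 with the highest stated bitrate."""
    wavs = [a for a in assets if a.fmt.lower() == "wav"]
    if wavs:
        return wavs[0]
    mp3s = [a for a in assets if a.fmt.lower() == "mp3"]
    if not mp3s:
        return None  # nothing we know how to rank; let the caller decide
    # An unknown bitrate ranks below any stated one.
    return max(mp3s, key=lambda a: a.bitrate_kbps or 0)
```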

Delivery model. Polling a status endpoint works until you have volume, at which point you're paying for HTTP requests to ask "done yet?" Webhook callbacks — the provider POSTs to you when a job finishes — scale far better and are the difference between a queue that hums and one that thrashes. If a gateway offers callbacks, use them; if it only offers polling, budget for the request overhead.
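When polling is all you get, exponential backoff is one common way to keep that request bill down. A sketch, assuming a hypothetical `get_status` callable that wraps your gateway's status endpoint and assuming in-progress states named `pending` and `processing`; swap in whatever your provider actually returns.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],  # hypothetical wrapper around the status endpoint
    timeout_s: float = 600.0,
    first_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves its in-progress states, backing off exponentially."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        status = get_status()
        if status not in ("pending", "processing"):  # assumed status names
            return status  # "complete", "failed", or anything unexpected
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # fewer "done yet?" requests over time
```

Injecting `sleep` keeps the function testable without real waits; in production you leave the default.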

Metadata you'll wish you kept. Duration, title, seed or generation parameters, and the prompt. Save all of it at export time. Six months in, when a user asks "can I get more like track 4102," the seed and prompt are the only things that let you answer.
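One way to make that metadata stick is a single table keyed by your own job ID. A minimal sketch using Python's built-in `sqlite3`; the table name, the column names, and the choice to store the seed as text are assumptions, not anything a gateway prescribes.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    job_id      TEXT PRIMARY KEY,  -- our ID, not the provider's
    prompt      TEXT NOT NULL,
    seed        TEXT,              -- text, in case a gateway's seed isn't an int
    params      TEXT,              -- JSON of any other generation parameters
    title       TEXT,
    duration_s  REAL,
    fetched_at  TEXT NOT NULL      -- UTC ISO-8601
)
"""


def save_metadata(
    conn: sqlite3.Connection,
    job_id: str,
    prompt: str,
    seed: Optional[Any] = None,
    params: Optional[dict] = None,
    title: Optional[str] = None,
    duration_s: Optional[float] = None,
) -> None:
    """Record everything needed to answer 'more like this' months later."""
    conn.execute(
        "INSERT OR REPLACE INTO tracks VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            job_id,
            prompt,
            None if seed is None else str(seed),
            json.dumps(params or {}),
            title,
            duration_s,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
```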

The uncomfortable honest note: none of these gateways is an official first-party product, so export guarantees are soft. Formats, expiry windows, and stem availability can change under you without a version bump. Build as if they will.
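Building as if the contract will change mostly means reading the response defensively. A sketch, assuming the field names from the example response earlier (`id`, `status`, `audio_url`, `duration`, `title`); a real gateway may name them differently. Missing optional fields get defaults, a completed job with no audio fails loudly, and the raw payload is kept in case an ignored field turns out to matter.

```python
from typing import Any, Optional


def parse_completion(resp: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Extract what we need from a job response, tolerating missing fields.

    Returns None unless the job reports status "complete".
    """
    if resp.get("status") != "complete":
        return None
    audio_url = resp.get("audio_url")
    if not isinstance(audio_url, str) or not audio_url:
        # A "complete" job with no audio is a contract change: fail loudly.
        raise ValueError(f"complete job {resp.get('id')!r} has no audio_url")
    duration = resp.get("duration")
    return {
        "provider_id": resp.get("id"),  # metadata only; never our storage key
        "audio_url": audio_url,
        "title": resp.get("title") or "untitled",
        "duration_s": float(duration) if isinstance(duration, (int, float)) else None,
        "raw": resp,  # keep the whole payload; an ignored field may matter later
    }
```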

What I actually do

My pipeline treats the provider as a source and my storage as the record. The sequence, every time:

  1. Fire the generation request with a callback URL if the gateway supports one; fall back to polling only when it doesn't.
  2. On completion, immediately fetch the highest-fidelity asset offered — WAV if it's on the table, otherwise the best MP3 — before doing anything else with the response.
  3. Write that file straight to my own object storage, keyed by my job ID, not the provider's.
  4. Persist the metadata block — prompt, seed, duration, title, format, source URL, fetched-at timestamp — in the database next to the storage key.
  5. Serve every download and preview from my storage. The provider URL exists only so I can re-fetch if my copy is ever lost.
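Steps 2 through 5 can be sketched as one ingest function plus a serve function. Plain dicts stand in for the bucket and the database here, `fetch` is a hypothetical downloader, and the storage key layout (`tracks/<job_id>.<fmt>`) is just one reasonable choice. The job ID is yours, assigned when you fired the request in step 1.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def ingest(
    job_id: str,  # OUR ID, assigned when the request was fired (step 1)
    result: dict[str, Any],  # completed job: audio_url, fmt, prompt, seed, ...
    fetch: Callable[[str], bytes],  # hypothetical downloader
    bucket: dict[str, bytes],  # stand-in for object storage
    db: dict[str, dict[str, Any]],  # stand-in for the tracks table
) -> str:
    data = fetch(result["audio_url"])  # step 2: fetch before anything else
    key = f"tracks/{job_id}.{result['fmt']}"
    bucket[key] = data  # step 3: write to storage we control
    db[job_id] = {  # step 4: metadata next to the storage key
        "storage_key": key,
        "source_url": result["audio_url"],  # kept only for re-fetching
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        **{k: result.get(k) for k in ("prompt", "seed", "duration", "title", "fmt")},
    }
    return key


def serve(job_id: str, bucket: dict[str, bytes], db: dict[str, dict[str, Any]]) -> bytes:
    """Step 5: every download and preview comes from our storage."""
    return bucket[db[job_id]["storage_key"]]
```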

That's it. It's not clever. It's the difference between a feature that survives the provider changing its CDN and one that doesn't.

For anything that's a genuine source asset — a track a user will edit or that feeds a video render — I insist on WAV and confirm the sample rate. For disposable previews in a scroll-through picker, MP3 is fine and cheaper to store. I let the job-to-be-done pick the format instead of accepting the default.
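Confirming the sample rate doesn't need a heavy dependency. A sketch using Python's standard-library `wave` module, assuming 48 kHz is the rate your pipeline expects; change `required_rate` to whatever your editors or engine actually use. Note that `wave` reads integer PCM only, so a 32-bit float WAV is rejected here even though it is a perfectly good source file.

```python
import io
import wave


def check_source_wav(data: bytes, required_rate: int = 48_000) -> tuple[int, int, float]:
    """Reject a 'source asset' that isn't a PCM WAV at the expected sample rate.

    Returns (sample_rate_hz, channels, duration_seconds).
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            rate = w.getframerate()
            channels = w.getnchannels()
            frames = w.getnframes()
    except (wave.Error, EOFError) as exc:  # not RIFF/WAVE, truncated, or non-PCM
        raise ValueError(f"not a readable PCM WAV: {exc}") from exc
    if rate != required_rate:
        raise ValueError(f"expected {required_rate} Hz, got {rate} Hz")
    return rate, channels, frames / rate
```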

Who this is for, who should skip it

Build this pipeline if you're putting generated audio into a product other people rely on — a content platform, a game with adaptive loops, a video tool where the track gets exported. Ownership of the file is not optional at that point.

You can skip most of it if you're prototyping, running an internal tool, or generating throwaway audio nobody re-downloads. Storing the URL and moving on is a reasonable trade when nothing depends on the file existing next week. Just know that's the trade you're making, and don't let a prototype's shortcut become production's foundation by accident.

Try this this week

Take one generation you ran three or more days ago and click the stored audio_url directly. If it 403s, expires, or 404s, you've found your bug before your users did — and now you know the export step, not the model, is what you need to fix first.
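If you'd rather not click through links by hand, a small script can run the same check across everything you've stored. A sketch using only `urllib` from the standard library; `probe` and `dead_links` are hypothetical names, and the `opener` parameter exists so you can test without the network. One caveat: some CDNs answer HEAD requests with 403 or 405 even when a GET would succeed, so confirm a flagged URL with a real download before concluding it's dead.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe(url: str, opener: Callable = urllib.request.urlopen, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the host is unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 403 / 404 / 410 land here
        return err.code
    except urllib.error.URLError:  # DNS failure, refused connection, etc.
        return None


def dead_links(urls: list[str], opener: Callable = urllib.request.urlopen) -> list[tuple[str, Optional[int]]]:
    """Every stored URL that no longer answers 200, with the status it gave."""
    results = [(u, probe(u, opener)) for u in urls]
    return [(u, s) for u, s in results if s != 200]
```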


Robert Halstead

The Signal · City of Punk