The problem with AI music generation is not that the music sounds bad. Plenty of it sounds fine. The problem is the ledger — the list of names whose work went into the machine, who never got asked, and who will never see a cent when the machine spits out something in their voice. That is the part the demo reels skip.
I make sound for a living. I have spent enough late nights auditioning generated stems to tell you the technology is genuinely useful for some jobs and genuinely dishonest about where it came from. Those two facts sit next to each other whether the vendors like it or not.
This got concrete when SZA posted publicly, furious, after learning that a stack of her songs, including material that had never been released, had been swept into a training dataset without her knowledge. She used harder words than I will. The count she cited ran into the low hundreds of songs. The detail that should stop you is the unreleased part: work that never left her hard drive turned up in a corpus somewhere. That is not a licensing gray area. That is a locked door someone walked through.
How does a song end up in an AI training dataset?
In plain terms: a dataset like this is built by scraping audio and metadata at scale. The sources include streaming rips, leaked files, fan uploads, unofficial reposts, and sometimes catalogs a company claimed rights to but did not actually hold. The audio gets converted into a mathematical representation the model learns from. Nobody signs a release. In most cases the artist is never notified, and there is no standard mechanism to find out after the fact. You learn your catalog was used the way SZA did: because someone told you, or because the output started sounding suspiciously like you.
The two words that do the heavy lifting in every vendor FAQ are "trained on." A model trained on ten thousand vocal performances does not store those files intact. What it stores is a statistical map of how those voices move — the rasp on a held note, the way a particular ad-lib lands behind the beat. So the companies say, accurately, that the model does not "copy" any single track. And that is a dodge. You do not need a copy of the master to reproduce the thing that made an artist worth listening to. The distinctiveness is the asset, and the distinctiveness is exactly what the model extracts.
Why the bill lands on Black artists first
Follow the genres. A staggering share of the vocal phrasing, rhythmic feel, and production language that models are trained to imitate comes out of R&B, hip-hop, gospel, soul, and their descendants. That is Black American music, which the wider industry has monetized for a century without properly crediting the people who made it. When a generator learns to answer a prompt like "smooth contemporary R&B vocal, 72 BPM, minor key," it is not learning from nowhere. It is learning from a specific lineage of specific people, most of whom were never in the room when the training decision got made.
Then there is the cover economy, which is where the money actually leaks. AI voice tools let anyone render a track "in the style of" a recognizable singer and upload it. The revenue that would have flowed to the original artist — or to the newer artist building a following by sounding adjacent to that style — gets diluted across a flood of synthetic near-copies. For a superstar with a catalog, that is an insult and a lawsuit. For an emerging artist three EPs deep, whose entire economic case is "I have a distinctive sound and I own it," a machine that reproduces that sound for free is not an insult. It is a foreclosure.
This is the asymmetry that matters. The artists with the leverage to fight — big names, big lawyers — can at least make noise. The artists whose sound seeds the models but who have no leverage get harvested twice: their tradition trains the machine, and then the machine competes with them for the same listeners.
The volume problem, and where the money routes
You do not have to trust one artist's anger to see the scale. Streaming platforms are absorbing enormous daily volumes of uploaded tracks, and the AI-generated share has been climbing fast. As of writing, figures reported across the industry over the past couple of years show that share, on some services, moving from a large minority toward roughly half of new uploads. The exact number varies by platform and by month, and any single figure is stale the moment it is published. The direction does not vary.
Here is the mechanism that should bother a working professional most. Under the pro-rata model that most large services use, streaming royalties come from a pool that is split by each rights holder's share of total plays. Every synthetic track that pulls a stream pulls a fraction of a cent that would otherwise have gone to a human catalog. Multiply by tens of thousands of uploads a day and you have a quiet, structural transfer of income. It flows away from the people whose work trained the generators and toward whoever operates them. No single theft. Just erosion, at industrial scale.
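The dilution is easiest to see with toy numbers. What follows is a minimal sketch that assumes a plain pro-rata split: one period's pool is divided by each rights holder's share of total streams. Real payout systems add per-market pools, label and distributor splits, and other rules, so treat this as arithmetic, not a payout model. The pool size, the stream counts, and the `pro_rata_payouts` helper are all hypothetical.

```python
# A minimal sketch of pro-rata royalty dilution. All figures are hypothetical,
# chosen only to make the arithmetic easy to follow.

def pro_rata_payouts(pool: float, streams: dict[str, int]) -> dict[str, float]:
    """Split one period's royalty pool by each rights holder's share of total streams."""
    total = sum(streams.values())
    return {holder: pool * count / total for holder, count in streams.items()}


POOL = 1_000_000.0  # hypothetical monthly pool, in dollars

human_only = {
    "artist_a": 1_000_000,               # one human artist's streams
    "other_human_catalog": 499_000_000,  # every other human rights holder
}
# Identical human listening, plus a hypothetical wave of synthetic-track streams.
with_synthetic = {**human_only, "synthetic_uploads": 50_000_000}

before = pro_rata_payouts(POOL, human_only)["artist_a"]
after = pro_rata_payouts(POOL, with_synthetic)["artist_a"]

print(f"artist_a before: ${before:,.2f}")  # $2,000.00
print(f"artist_a after:  ${after:,.2f}")   # $1,818.18
print(f"lost to dilution: {100 * (before - after) / before:.2f}%")  # 9.09%
```

Artist A's listeners did not go anywhere. The same million streams simply buy a smaller slice of the pool once synthetic plays inflate the denominator.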
I will not pretend a spokesperson would frame it that way; in the reporting on this, the companies mostly decline to frame it at all. That silence is part of the record too.
What you can actually check before you ship
If you make things for a living, moral clarity is nice but a provenance habit is better. Before you drop a generated track into a client build, ask the questions the vendor would rather you skipped.
- Where did the training data come from? If the tool cannot describe its corpus in a sentence, treat the output as unclearable.
- Is there a written indemnity? Some platforms now put commercial-use and indemnification terms in writing. "Royalty-free" is a marketing word; indemnification is a legal one. Read for the second.
- Does it let you generate "in the style of" a named artist? If yes, that feature is the liability. Do not use it, even as a joke, even for a temp track.
- Can you get stems and a clean 48 kHz WAV, not a lossy render? Ethics aside, mushy MP3 output is its own reason to walk.
- Who owns the output? Ownership and licensing are different clauses. Confirm both.
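Of the items on that list, the WAV question is the one you can partly automate. Here is a minimal sketch that assumes deliverables arrive as uncompressed integer PCM WAV files. It reads only the file header, using Python's standard `wave` module. It flags anything below 48 kHz, the figure from the checklist, or below 24-bit, which is an assumed house preference rather than a standard. The function name `check_wav_deliverable` is hypothetical.

```python
# A minimal sketch of a header-only delivery check for generated audio.
# Assumptions: files are integer PCM WAV; the 24-bit floor is a hypothetical
# house preference, not an industry requirement.
import wave


def check_wav_deliverable(path: str, min_rate: int = 48_000, min_bits: int = 24) -> list[str]:
    """Return a list of header problems; an empty list means the file passes."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            bits = wav.getsampwidth() * 8
    except (wave.Error, EOFError) as exc:
        # An MP3 renamed to .wav, a truncated file, or a format `wave` cannot read.
        return [f"not a readable PCM WAV: {exc}"]

    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if bits < min_bits:
        problems.append(f"bit depth {bits}-bit is below {min_bits}-bit")
    return problems
```

A header check is a floor, not a guarantee. A lossy render resampled into a 48 kHz, 24-bit WAV will still pass it, so your ears remain the final test. The `wave` module also reads only integer PCM, so it will report a 32-bit float file as unreadable rather than check it.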
This is also the honest case for using tools built on consented or original training data — City of Punk among them — not because it makes the ethics disappear, but because it moves the provenance question from "unanswerable" to "answered in the terms." That is the whole difference between a tool you can defend in a client meeting and one you cannot.
None of this means the instrument is illegitimate. Synths set off a theft-of-labor panic once too. The difference is that a synth did not train itself on a specific person's unreleased vocals and then sell the result back to the room she sang it in.
So here is the rule you can use tonight: if a track sounds like a person who never agreed to be in it, it is not yours to ship — no matter what the license says.