AI Music Detection Tools Promise to Clean Up Your Playlists. Here's What They Can Actually Tell You.

A detector once told me a track on my running playlist was machine-generated. I'd seen the band play it in a basement in Oakland two years earlier — three people, one of whom broke a string. The song was as human as a broken string gets. The tool was very confident. It was also wrong.

That gap — between confidence and correctness — is the whole story of AI music detection, the cluster of tools and platform features now promising to tell you which songs on your Spotify or Apple Music playlists were made by a machine. The promise is reassuring: scan a playlist, get a report, know what you're listening to. The reality is more interesting, and worth understanding before you trust the green checkmark.

Can a tool really tell if a song was made by AI?

Short answer: sometimes, partially, and with caveats it rarely shows you. A detector can flag signals that often accompany AI-generated audio — certain spectral patterns, the absence of provenance data, telltale artifacts in how a render handles reverb tails or vocal sibilance. What it cannot do is prove origin the way a birth certificate proves a birthday. It estimates a probability and rounds it into a yes or a no for your convenience. The rounding is where the trouble lives.
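To see what the rounding throws away, here is a minimal Python sketch. The 0.5 cutoff, the function name, and the two example scores are assumptions made up for illustration; no real detector's settings are implied.

```python
def round_to_verdict(probability: float, threshold: float = 0.5) -> str:
    """Collapse a probability estimate into the yes/no label a report shows.

    The threshold is a hypothetical default, not any tool's actual cutoff.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "AI-generated" if probability >= threshold else "human-made"


# Two hypothetical scores: one barely over the line, one close to certain.
borderline = round_to_verdict(0.51)
near_certain = round_to_verdict(0.99)

print(borderline, near_certain)  # prints: AI-generated AI-generated
```

Both tracks come out with the same label, even though one estimate is close to a coin flip. A report that shows only the label gives you no way to tell them apart.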

How the field talked itself into certainty

The belief that we can reliably detect AI music didn't appear from nowhere. It assembled itself out of three earlier, sturdier ideas — each true in its narrow domain, each stretched a little past its strength.

The first was provenance metadata. The push to embed origin information directly into media files is real and serious; image and audio standards now exist to carry a tamper-evident record of how a file was made. When that data is present and intact, you genuinely know something. The catch: it only works if the tool that made the file wrote the data, and if nothing downstream stripped it. Re-encode a track, run it through a mastering chain, upload it to a platform that flattens metadata, and the record is gone. Provenance proves origin when it survives. It usually doesn't survive a playlist.

The second idea was watermarking — a faint, deliberate signature baked into generated audio by the model that made it. Several generation tools do this. It can be robust, and when a detector finds a watermark it recognizes, that's strong evidence. But a watermark only catches tracks made by cooperating tools that chose to mark their output, and only until someone learns to scrub it. It's a fence, not a wall, and it only fences the polite.

The third, and the shakiest, was the statistical classifier: a model trained on examples of human and machine music, learning to spot the difference. This is the engine behind most "scan your playlist" tools, because it needs nothing from the file but the audio itself. It's also the kind of tool that flagged the basement band's song. Classifiers learn the texture of their training data, not the truth of the world. Lo-fi human recordings can look synthetic. Clean AI renders can look like a competent session player. The model sounds confident either way, because a confident score is what it was trained to produce.

So the lineage runs: metadata (strong, fragile), to watermarking (strong, narrow), to statistical detection (broad, soft). Somewhere along that chain the public conversation collapsed three different reliabilities into one word — detection — and inherited the confidence of the strongest link while mostly using the weakest.

The numbers that built the alarm

The urgency around all this got its fuel from scale. Platforms have reported enormous volumes of new uploads daily, and AI tracks now make up a meaningful and growing share of catalogs. Those figures are real and they matter — a flood of generated music is genuinely reshaping how playlists get filled and how royalties get split.

But a count of suspected AI tracks is not a measurement of detection accuracy. "We labeled millions of tracks" tells you how busy the classifier was, not how often it was right. The two numbers get reported together so often that the second borrows credibility from the first. A tool can label thirteen million tracks and still misjudge the one on your playlist, and nothing in the press release will tell you which.
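A little arithmetic shows why volume says nothing about accuracy. The sketch below applies Bayes' rule to three invented numbers: the share of a catalog that is AI-generated, the detector's hit rate, and its false-alarm rate. All three values and the function name are assumptions for illustration, not figures reported by any platform.

```python
def flag_precision(ai_share: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Fraction of flagged tracks that really are AI-generated (Bayes' rule)."""
    true_flags = ai_share * hit_rate                    # AI tracks correctly flagged
    false_flags = (1.0 - ai_share) * false_alarm_rate   # human tracks wrongly flagged
    total_flags = true_flags + false_flags
    if total_flags == 0:
        return 0.0  # nothing was flagged, so there is no precision to report
    return true_flags / total_flags


# Hypothetical scenario: 10% of the catalog is AI-generated, the detector
# catches 95% of those tracks and wrongly flags 5% of human-made ones.
precision = flag_precision(ai_share=0.10, hit_rate=0.95, false_alarm_rate=0.05)

print(f"{precision:.0%} of flags are correct")  # prints: 68% of flags are correct
```

Under those assumed rates, roughly one flag in three lands on a human-made track, and that ratio stays the same whether the tool labeled a thousand tracks or thirteen million.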

What a playlist scan actually gives you

If you run one of these reports (as of this writing, Deezer's playlist analyzer and a handful of independent web tools will scan Spotify and Apple Music libraries), here's how to read it honestly.

A flag is a guess, not a verdict. Treat it as "worth a second listen," not "confirmed synthetic."
No flag is not a clearance. Detection misses things, especially well-produced renders and hybrid tracks where a human wrote the song and a model filled the arrangement.
Watermark hits are your strongest signal. If a report distinguishes a detected watermark from a statistical guess, the watermark deserves more weight.
The interesting middle is invisible. A track co-written by a person and a model, or a human song mastered with AI tools, is neither and both. Most detectors have no honest category for it, so they pick one.
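Those reading rules can be sketched as a small triage function. The report fields here (a watermark flag and an optional classifier score) are a hypothetical format invented for this example; real tools may expose different information, or less of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackResult:
    """One row of a hypothetical playlist report."""
    title: str
    watermark_found: bool       # detector recognized a known watermark
    ai_score: Optional[float]   # statistical classifier score in [0, 1], if any


def how_to_read(result: TrackResult, flag_threshold: float = 0.5) -> str:
    """Translate a report row into how much weight it deserves."""
    if result.watermark_found:
        # A recognized watermark is the strongest signal a report can carry.
        return "strong evidence of AI generation"
    if result.ai_score is None:
        return "no information"
    if result.ai_score >= flag_threshold:
        # A statistical flag is a guess, not a verdict.
        return "worth a second listen"
    # No flag is not a clearance: polished renders and hybrids slip through.
    return "not flagged, but not cleared"


print(how_to_read(TrackResult("Basement Song", False, 0.81)))
# prints: worth a second listen
```

Note that none of the four outcomes is "confirmed human" or "confirmed hybrid". Neither input field, as assumed here, could support either claim.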

For a casual listener, the useful takeaway is modest and real: these tools are good at surfacing the obvious — the generic, fully-synthetic filler that streaming farms churn out by the thousand. That's most of what bothers people about AI on playlists, and catching it has value. They're far less reliable at the edges, which is exactly where the music you'd actually argue about lives.

None of this is an argument against the tools. It's an argument for reading them the way you'd read a smoke detector: a useful alarm, not a courtroom. The platforms building these features are responding to a genuine problem, and the better ones are honest about probabilities. The risk isn't the technology. It's the rounding.

The myth: a detector can tell you whether the song you're hearing was made by a machine. The more accurate version: a detector can tell you how much a song resembles the machine-made music it was trained on — and the distance between those two sentences is where your favorite basement band lives.