Here is the counterintuitive part, stated plainly: when you upload a music video to Spotify, the video is not the product. Your audio stream is. The clip is bait, and the catch is a listener who plays the track forty more times after the video ends. If you are an independent artist eyeing Spotify direct uploads as a way to compete with YouTube on video, you are pointed at the wrong target. The whole mechanism is built to convert a watch into a habit — and a habit is what actually pays.
The rest of this piece is me earning the right to have said that.
What the beta actually is
For most of streaming's history, the path onto Spotify ran through a distributor: DistroKid, TuneCore, CD Baby, a label, somebody who took a cut and delivered your WAV to the platform's ingestion pipeline. Spotify has periodically experimented with letting artists skip that step. There was a direct audio-upload beta around 2018 to 2019 that the company shut down, citing a need to focus elsewhere. So a direct-upload feature is not unprecedented territory for them; it is a thing they have tried, killed, and circled back to.
The current move extends that idea to video — music videos and recorded live performances, uploaded by the artist or their team rather than ingested as a finished package from a major. The clips sit alongside the audio version of a track. A listener browsing your artist page can hit play on the video or play on the track, and Spotify counts both. The video, when it qualifies, is royalty-bearing. That last word is the entire reason any of this matters to your release strategy.
Why a streaming platform wants your face on screen
Spotify makes money when people listen. Video does not obviously serve that — video is YouTube's house, and YouTube monetizes attention through ads, not per-stream payouts. So why would an audio-first business spend engineering hours on a video pipeline?
Because the data the platform has shared over the years points the same direction every time: people who watch tend to listen more, and listen longer, than people who only stream audio. Treat the specific percentages with caution — they come from Spotify's own decks, they shift between announcements, and the company has an obvious interest in making the lift look large. But the mechanism underneath is not controversial. A face, a room, a performance gives a listener a reason to attach to an artist instead of a track. Attachment is retention. Retention is repeat plays. Repeat plays are the only thing the royalty pool actually counts.
That is the conversion you are optimizing for. Not video views. Video-to-audio lift — the additional audio streams a release earns because a video existed.
This is not YouTube, and treating it like YouTube is the mistake
If you run releases for a living, your instinct after reading "video uploads" is to port your YouTube playbook over: maximize watch time, chase the algorithm, optimize thumbnails, post Shorts. Resist it. The two surfaces reward opposite behavior.
| | YouTube | Spotify video (beta) |
|---|---|---|
| Monetizes | Attention via ad impressions | Audio streams via royalty pool |
| You want the viewer to | Stay on the video, watch the next one | Leave the video, go play the track on repeat |
| Discovery driver | Watch-time signals, recommendations | Existing listener relationship, editorial, playlists |
| The asset is | The destination | The on-ramp |
On YouTube, a viewer who closes the tab after one watch is a failure. On Spotify, a viewer who closes the video and then streams your album twice is the entire point. So your editing instinct should change. A YouTube cut front-loads a hook to survive the first eight seconds against the skip button. A Spotify video has a softer job: it has already captured someone inside your ecosystem, and it needs to make them feel like staying. Less arms race, more atmosphere.
What's actually eligible, and what will break
The format constraints, as of writing, are narrow, and they will frustrate anyone used to YouTube's permissiveness.
- Aspect ratio leans 16:9. This is built for landscape music videos and performance footage, not vertical clips ripped from your phone.
- It expects a real video. Static visualizers, looping waveform animations, the audio-with-a-cover-art-still trick that fills YouTube — those are not what this surface is for. The platform wants moving footage tied to the track.
- Access is gated and gradual. Not every artist or team has the upload toggle yet. Rollout has been incremental, which means your release calendar cannot assume the feature is live for your account until you have confirmed it.
- The primary path is still distribution and labels. Direct upload is an option being layered on top, not a replacement for the ingestion pipeline that delivers the overwhelming majority of catalog. If your distributor handles your metadata, splits, and territories, that relationship does not vanish because a video toggle appeared.
Check the current eligibility and spec sheet inside Spotify for Artists before you build anything — these numbers and rules move, and a clip cut to last quarter's spec is a re-export you did not need.
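If you batch-export clips, a quick pre-flight check can catch a vertical or oddly cropped file before it reaches the upload form. The following is a minimal sketch, not Spotify's validator: the function name `is_landscape_16_9` and the 2% tolerance are assumptions made for illustration, and the real accepted dimensions are whatever the current spec sheet says.

```python
from math import isclose


def is_landscape_16_9(width: int, height: int, tolerance: float = 0.02) -> bool:
    """Return True if width:height is within `tolerance` of 16:9.

    The tolerance is a hypothetical allowance for slightly off exports;
    it is not a documented platform rule.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return isclose(width / height, 16 / 9, rel_tol=tolerance)


# A standard HD landscape frame versus a vertical phone clip.
print(is_landscape_16_9(1920, 1080))
print(is_landscape_16_9(1080, 1920))
```

Feed it the pixel dimensions your editor reports on export. A failed check means the clip needs a re-frame, not just a re-encode.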
A release strategist's checklist
You are not making a video. You are making an audio on-ramp that happens to have pictures. Run a release through this:
- Confirm the upload toggle is live on the account you actually control. Do this before you brief any editor. If it is not live, the video plan is a YouTube/Reels plan for now.
- Match the video to a track that benefits from repeat listening — a single you want streamed on loop, not your most experimental B-side. The video's job is to recruit a habit for that specific track.
- Cut for landscape and atmosphere, not for the skip button. You have a captive viewer. Let the song breathe.
- Keep the audio master identical to the streaming version. The whole conversion logic collapses if the video has a different mix; you want the listener to chase the exact sound they just heard.
- Watch the audio stream count, not the video count, for two to four weeks after upload. The number that proves the experiment worked is downstream streams per listener, not plays on the clip.
- Don't retire YouTube. That is still where discovery from outside your fanbase happens. Spotify video converts people who already found you; it is the bottom of the funnel, not the top.
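The measurement step in that checklist comes down to one comparison: audio streams per listener before the video went up against the same figure afterwards. Here is a rough sketch of that arithmetic. The `Window` class, the `audio_lift` function and every number in it are made up for illustration, and nothing here calls a Spotify API. You would fill the windows by hand from your own dashboard or distributor reports.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Audio totals for one measurement window (hypothetical structure)."""
    audio_streams: int
    listeners: int

    def streams_per_listener(self) -> float:
        # Guard against an empty window rather than dividing by zero.
        if self.listeners == 0:
            return 0.0
        return self.audio_streams / self.listeners


def audio_lift(before: Window, after: Window) -> float:
    """Relative change in audio streams per listener (0.25 means +25%)."""
    baseline = before.streams_per_listener()
    if baseline == 0:
        raise ValueError("baseline window has nothing to compare against")
    return after.streams_per_listener() / baseline - 1.0


# Invented example: equal-length windows either side of the upload.
before = Window(audio_streams=4000, listeners=1000)  # 4.0 streams per listener
after = Window(audio_streams=6000, listeners=1200)   # 5.0 streams per listener
print(f"{audio_lift(before, after):+.0%}")  # prints "+25%"
```

A before-and-after comparison like this cannot separate the video's effect from a playlist add or a press hit that landed in the same weeks, so read the result as a signal rather than proof.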
The honest gaps
I am not going to pretend the royalty math is settled, because it is not, and anyone telling you the exact per-stream value of a video play is guessing or selling something. How a royalty-bearing video stream is weighted against an audio stream, how it pools, how it interacts with the thresholds your distributor reports against: these are the parts the announcements gloss over, and the parts that decide whether the effort pays for an independent artist with no video budget. A performance clip shot on a borrowed camera in a rehearsal room costs you a weekend. A proper music video costs more than most singles will ever recoup in streams. The conversion lift has to clear that bill, and the public data does not yet tell you where the break-even sits for an artist your size.
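The break-even itself is simple division once you have a payout figure. The hard part is that nobody outside the platform has that figure for video. As a sketch only: `break_even_streams` is a hypothetical helper, and both arguments in the example are placeholders to be replaced with your own production cost and the effective per-stream rate from your own statements.

```python
import math


def break_even_streams(video_cost: float, payout_per_stream: float) -> int:
    """Extra streams needed for payouts to cover the video's cost.

    `payout_per_stream` is whatever effective rate your own statements
    show. No official or typical value is assumed here.
    """
    if video_cost < 0:
        raise ValueError("video_cost cannot be negative")
    if payout_per_stream <= 0:
        raise ValueError("payout_per_stream must be positive")
    # Round up: a fraction of a stream cannot be paid out.
    return math.ceil(video_cost / payout_per_stream)


# Placeholder inputs, not real figures: swap in your own numbers.
print(break_even_streams(video_cost=1500.0, payout_per_stream=0.25))
```

The result counts additional streams, the ones the video caused, not the track's total. That is why the lift has to be measured before this figure means anything.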
The other open question is durability. Spotify has built and dismantled a direct-upload feature before. Pouring your catalog and your workflow into a beta means accepting that the beta can change its specs, its eligibility, or its existence on a quarter's notice. Build the muscle, not the dependency.
What this piece did not answer: the precise royalty weighting of a video stream versus an audio one, the break-even budget for a self-funded artist, and whether the feature survives past beta in its current shape. For those, watch three things: the Spotify for Artists changelog for spec and eligibility shifts, your distributor's payout reports for how video streams actually land in your statements, and your own audio numbers for the two to four weeks after the first upload, which will tell you more about your specific audience than any platform deck.
Upload the video if you can. Then ignore it and go count the streams it sent downstream — that number is the only review that matters.