The advice everyone gives you is clean and old: clear the rights, take delivery of the assets, keep the paperwork, and you are covered. Get a signed license, get the high-res files, archive the contract in the shared drive, and your campaign is defensible. That rule has held for a long time, and for a lot of work it still does.
It stops holding the moment your customer experience can talk back.
The short version: traditional IP licensing describes a finished thing you receive and use; the new wave of AI-enabled experiences forces licensing to describe how a thing is allowed to behave, and most marketing teams are still budgeting, contracting, and approving as if they are buying files. If you are planning anything where a licensed character, voice, or brand persona responds to a user in real time, the contract that protects you is no longer the one that lists deliverables. It is the one that defines conduct.
A disclosure before we go further. City of Punk runs a neural sound foundry and licenses machine-made audio, so we sell into the exact category this piece is about. Read everything below as coming from someone with skin in the game. The argument holds anyway, because it cuts against the easy sale: the honest version of these terms is more restrictive and more work, not less.
Where the old rule is still basically right
If your deliverable is static, the standard advice is fine. A licensed photo, a track scored to picture, a typeface, a stock loop dropped under a thirty-second spot — these are finite. The contract can enumerate the thing, the territory, the term, the media, and the permitted uses, and that enumeration genuinely describes reality. Nothing about the asset changes after delivery. A song does not improvise a second verse about your competitor. A photograph does not answer a customer's question in a way that defames someone.
This is why the "clear it, file it, you're covered" model survived so long. For a closed, finished asset, the gap between what the license says and what the asset can actually do is essentially zero. You licensed a thing, you received the thing, the thing behaves like a thing.
So if your 2024-style campaign is a video, a banner set, a podcast intro, and a landing page, keep doing what you are doing. The old rule covers you, and most of this article is not yet your problem.
Where it breaks
It breaks the instant the asset becomes a system instead of an object.
Consider what a growing number of brands are actually scoping: a licensed character that holds a conversation inside a mobile app, a brand-voice assistant that answers support questions in a recognizable persona, an interactive installation where a mascot reacts to what visitors say. In each case you are no longer taking delivery of a finished output. You are taking delivery of a generator — something that will produce thousands of outputs you will never individually review, in response to inputs you cannot fully predict.
The old contract has no language for this. It can tell you that you licensed the character's likeness and the actor's recorded performance. It cannot tell you what happens when a user prompts that character into saying something cruel, off-brand, or legally radioactive, and the system, doing its job, generates a plausible reply. The asset can now produce an output that no human at your company or the rights holder's ever approved, in a voice the audience associates with a real performer and a beloved property.
That is the structural shift. The unit of risk used to be the asset. Now it is the asset's range of possible conduct — and conduct is harder to enumerate than a media list. A few concrete failure modes that the file-clearance mindset misses entirely:
- The voice persona answers a question outside its intended domain and gives wrong medical or financial guidance in a trusted, branded tone.
- A user steers the character into endorsing something — a product, a politician, a slur — and screenshots it.
- The performer behind the voice never agreed to have their vocal identity used in synthetic, unscripted dialogue, only in a defined recording session.
None of those are covered by a clean signature on a deliverables schedule.
What actually changes in the workflow
This is where it gets operational, and where the new licensing models are heading whether or not your current vendor has caught up.
The contract starts describing behavior, not just access. Expect terms that define what the licensed persona is permitted to discuss, where it must refuse, and what topics are walled off entirely. These are contractual constraints on output, enforced in the system itself, not creative-brief suggestions. Treat them like the technical spec they are.
Talent participation becomes ongoing, not one-and-done. When a recognizable voice can be synthesized into unscripted speech, the people who own those voices increasingly want continuing consent and continuing compensation — a stake in the model's use, not a buyout of a session. Your procurement and legal teams should assume that voice and likeness terms now include living people with leverage and a clear interest in how the synthetic version conducts itself.
Monitoring moves inside the deal. Static licensing ended at delivery. These arrangements often do not. Rights holders are building the ability to detect unauthorized or out-of-bounds use of their properties, which means your vendor relationship may include logging, audit rights, and a kill switch. Budget for the experience to be observed for its entire life, not approved once at launch.
Approval shifts from review to design. You cannot QA every output of a generative system by watching it. So the approval gate moves upstream, into how the constraints are configured before launch. Your sign-off is no longer "this looks good"; it is "we have defined and tested what this can and cannot say."
How I would evaluate a vendor right now
If you are scoping an interactive, AI-enabled experience with licensed material, here is what I would press a vendor on before signing anything.
- What does the license cover when the output is generated, not delivered? If the answer is only the underlying assets and not the system's conduct, you are exposed.
- What are the refusal and topic constraints, and can you see and test them? Vague assurances about "alignment" are not terms.
- Who is on the hook when the system says something it shouldn't? Get indemnity language that survives the existence of unpredictable output.
- What are the talent terms behind any voice or likeness? Confirm the real humans consented to synthetic, unscripted use, not only to a recording.
- What happens at end of term? When the deal ends, does the trained system get retired, or could your configured persona keep running somewhere you no longer control?
The price of these experiences over twelve months is rarely the contract value alone. It is the contract value plus the monitoring overhead plus the legal review of the conduct spec plus the talent's continuing stake. Price it honestly or it will surprise you in Q3.
Who needs this now, and who can wait
You need this now if you are planning anything conversational, responsive, or generative wearing a licensed face or voice — branded assistants, interactive characters, persona-driven support. The exposure is live and the standard contract does not address it.
You can wait if your pipeline is still static deliverables. A licensed track under a video, a generated bed for a podcast, stems you mix and ship — the file-clearance model still describes that reality accurately, and chasing behavioral terms you do not need is wasted legal spend.
So here is the more honest version of the old rule. It is no longer "clear the rights and file the contract." It is: clear the rights for static assets, and for anything that can respond, license the behavior, define what it is forbidden to do, name the humans whose identities are inside it, and plan to watch it for as long as it runs.
To put my own practice where my argument is: every audio brief that leaves my desk now carries a single line under the deliverables, and it reads, "If this voice can be made to say something we didn't write, tell me before we build it." It has killed two projects early. Both deserved to die.
Try it yourself, free
Generate your first royalty-free track in seconds. No card, no catch — type a prompt and hit render.