Most people think voice cloning is a solved problem. Throw money at ElevenLabs, get a decent clone, done. But here’s the catch: cross-language cloning is still a mess. Your Chinese voice speaking Japanese? It sounds like a foreigner trying too hard.
NetEase Youdao’s Ziyue 4 changes that. The model has 27B parameters and ranks near the top on math and science benchmarks. But its standout feature is TTS: three seconds of your audio is enough to clone your voice, and the clone can then speak 14 languages with no accent transfer. Your Chinese vocal timbre speaking Japanese sounds native, not like a Chinese speaker sounding out katakana. It’s a subtle difference, but a massive one for content creators.
The weight files are open-source. Download them and run the model locally: no API calls, no monthly fees, no data leaking to the cloud. This isn’t a demo behind a paywall; it’s a real tool you can deploy on your own machine. For YouTubers doing dubbing, for digital-human projects, for anyone tired of paying per character for decent voice synthesis, this is the inflection point.
Sure, there are trade-offs. Local deployment means you need decent hardware, and a 27B model isn’t lightweight. But the open-source community is likely to optimize it. The real question is: will you still pay for a proprietary service when the open alternative is this good? Probably not.
The article originally appeared on a Chinese tech WeChat public account, but the message translates: stop paying for voice cloning. Ziyue 4 just made it free, and better.