Prosody Modeling with 3D Visual Information for Expressive Video Dubbing

The automatic video dubbing task is proposed to meet personal and industrial demands for dubbing. Current methods mostly focus on duration matching and overlook the synchronization of prosody, so the dubbed speech lacks expressiveness. In this paper, we introduce visual prosody modeling to promote expressiveness in video dubbing. Visual prosody is defined as the facial expression and head pose in 3D space, and it has three advantages: 1) high relevance to the tone and stress of utterances; 2) higher accuracy than 2D images; 3) disentanglement from irrelevant factors such as speaker identity. We propose 3D-VD (3D Video Dubber), a system that incorporates visual prosody and uses a visual-text step-wise aligner to control the generated prosody. Experiments demonstrate that the proposed method outperforms previous methods that only consider 2D face images in terms of naturalness, lip-speech alignment, and synchronization of visual and auditory prosody. A case study further demonstrates the correlation between expression and pitch.
