Audio and video out of sync when streaming with the MuseTalk model #129

Closed
yni9ht opened this issue Jun 21, 2024 · 7 comments

Comments

@yni9ht
Contributor

yni9ht commented Jun 21, 2024

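I am currently running the MuseTalk model in a local environment and pushing the stream with rtcpush.
When the stream is pushed after inference completes, audio and video are out of sync: the video is visibly running faster than the audio.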

@lipku
Owner

lipku commented Jun 24, 2024

Please update to the latest code and try again.

@yni9ht
Contributor Author

yni9ht commented Jun 24, 2024

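First, thank you for the prompt update.
Testing on the new branch 98eeeb17, the result is much better, but there is still a slight delay.
One more question: I looked at museasr.py, which extracts feature information from batch_size * 2 audio frames at a time. Is there still room for optimization in this step? Analyzing features on a single slice versus a longer audio segment should give noticeably different results (audio continuity, context, boundary values, and similar factors). When I run inference on the same audio clip, the mouth shapes produced by this version still differ somewhat from those in the complete video that MuseTalk generates.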
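
To illustrate the boundary concern raised above, here is a minimal sketch in which a centered moving average stands in for the real audio feature extractor (an assumption made purely for illustration; `smooth` and `smooth_in_chunks` are hypothetical helpers, not code from museasr.py). It compares features computed over the whole signal with features computed chunk by chunk without neighbouring context.

```python
def smooth(samples, radius):
    """Centered moving average; windows are truncated at the edges."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - radius)
        hi = min(len(samples), i + radius + 1)
        window = samples[lo:hi]
        out.append(sum(window) / len(window))
    return out


def smooth_in_chunks(samples, radius, chunk_size):
    """Apply `smooth` to each chunk in isolation, with no neighbouring context."""
    out = []
    for start in range(0, len(samples), chunk_size):
        out.extend(smooth(samples[start:start + chunk_size], radius))
    return out


# Toy signal standing in for audio; the values are arbitrary.
signal = [float(i % 7) for i in range(40)]
full = smooth(signal, radius=2)
chunked = smooth_in_chunks(signal, radius=2, chunk_size=10)

# Indices where whole-signal and per-chunk results disagree.
mismatches = [i for i, (a, b) in enumerate(zip(full, chunked)) if abs(a - b) > 1e-9]
print(mismatches)
```

In this toy setup the disagreements can only occur within `radius` samples of a chunk boundary, which is the kind of edge effect that extra left and right context is meant to reduce.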

@lipku
Owner

lipku commented Jun 25, 2024

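This is a tradeoff. Extracting features offline from the entire audio file will certainly give the best quality, but longer audio increases latency, so you have to compromise between latency and quality. You can set -l and -r to increase the audio buffer length.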
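
The tradeoff can be sketched with simple arithmetic. The snippet below assumes that -l and -r count audio frames of left and right context and that each audio frame covers 20 ms; both are assumptions for illustration, so check them against your own configuration. `context_cost_ms` is a hypothetical helper, and the sample values are arbitrary.

```python
def context_cost_ms(left_frames, right_frames, frame_ms=20.0):
    """Return (look-ahead delay, total context span) in milliseconds.

    frame_ms is an assumed audio frame duration, not a value from this thread.
    """
    lookahead = right_frames * frame_ms  # future audio that must arrive first
    span = (left_frames + right_frames) * frame_ms  # total extra context seen
    return lookahead, span


for left, right in [(10, 10), (20, 20), (40, 40)]:
    lookahead, span = context_cost_ms(left, right)
    print(f"-l {left} -r {right}: waits ~{lookahead:.0f} ms for future audio, "
          f"context spans {span:.0f} ms")
```

Under these assumptions, only the right-hand (future) context adds waiting time, while both sides widen the window the feature extractor sees.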

@yni9ht
Contributor Author

yni9ht commented Jun 27, 2024

Streaming with rtcpush, the video and audio are still delayed, and there is also some audio/video desync.

@lipku
Owner

lipku commented Jun 30, 2024

What fps do you get when silent and when speaking?

@yni9ht
Contributor Author

yni9ht commented Jul 1, 2024

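It holds steady at 25 FPS when silent. When speaking, the frame rate is mostly around 22 FPS, and right after startup it can be in the teens.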

@lipku
Owner

lipku commented Jul 1, 2024

It has to reach 25 fps. The GPU is not powerful enough.
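
A rough sketch of why 22 fps is not enough, under a simplified model: the stream expects 25 video frames per second of audio, and frames are produced at the measured inference rate. Real buffering in the pipeline may behave differently, and `video_lag_s` is a hypothetical helper, not part of the project.

```python
def video_lag_s(render_fps, target_fps=25.0, speech_s=10.0):
    """Seconds by which frame generation trails the audio after `speech_s` of speech."""
    frames_needed = target_fps * speech_s      # frames the stream expects
    render_time = frames_needed / render_fps   # time needed to produce them
    return render_time - speech_s


print(round(video_lag_s(22.0), 2))  # 1.36
print(video_lag_s(25.0))            # 0.0
```

In this model, rendering at 22 fps falls about 1.36 seconds behind after 10 seconds of speech, and the gap keeps growing for as long as speech continues.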

@yni9ht yni9ht closed this as completed Jul 10, 2024