Getting an error when trying to generate audio from the tensor produced by an NVIDIA NeMo TTS model.
Here is the code:
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel
# Load the pretrained FastPitch spectrogram generator and HiFi-GAN vocoder
spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")
text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
# Text -> tokens -> mel spectrogram -> waveform
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
# Move the waveform tensor to the CPU and convert it to a NumPy array
audio = audio.to('cpu').detach().numpy()
# Write the waveform to a WAV file at a 22050 Hz sample rate
sf.write("speech.wav", audio, 22050)
This should produce the audio file speech.wav.
1 Answer
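Looking at the example, I found that the audio has shape (1, 173056). soundfile treats a 2D array as (frames, channels), so this shape is read as a single frame with 173056 channels rather than 173056 mono samples. Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056, and that works fine, using the code: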
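The conversion described in the answer can be sketched as follows. This is a minimal illustration, not the answer author's exact code: the helper name `to_soundfile_layout` is hypothetical, and it assumes the vocoder returns a batch of one mono waveform with shape (1, num_samples), as observed above. It uses only NumPy, so it runs without NeMo or soundfile installed.

```python
import numpy as np


def to_soundfile_layout(audio) -> np.ndarray:
    """Hypothetical helper: turn a vocoder output of shape (1, N) into the
    1-D (N,) array that soundfile.write accepts for a mono file."""
    audio = np.asarray(audio)
    if audio.ndim == 2 and audio.shape[0] == 1:
        # Drop the batch axis: (1, N) -> (N,)
        return audio[0]
    if audio.ndim == 1:
        # Already mono 1-D samples; nothing to do
        return audio
    # Anything else (e.g. a batch of several waveforms) is ambiguous here
    raise ValueError(f"expected shape (N,) or (1, N), got {audio.shape}")


if __name__ == "__main__":
    # Stand-in for the vocoder output: one waveform of 173056 samples
    fake = np.random.default_rng(0).uniform(-1, 1, size=(1, 173056)).astype(np.float32)
    print(fake.shape, "->", to_soundfile_layout(fake).shape)
    # prints: (1, 173056) -> (173056,)
```

In the question's code, the last line would then become `sf.write("speech.wav", to_soundfile_layout(audio), 22050)`. Indexing with `[0]` is used instead of `squeeze()` so that an unexpected shape raises an error rather than silently collapsing other size-1 axes.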
Regards,