python-3.x: Getting soundfile.LibsndfileError: Error opening 'speech.wav': Format not recognized when passing a 2D numpy array to soundfile

Posted by z9smfwbn on 2023-01-10 in Python


I ran into this error while trying to generate audio from a tensor produced by an NVIDIA NeMo TTS model.
Here is the code:

import soundfile as sf

from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")

text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()

sf.write("speech.wav", audio, 22050)

Expected result: an audio file named speech.wav.

python-3.x

Source: https://stackoverflow.com/questions/74821875/getting-soundfile-libsndfileerror-error-opening-speech-wav-format-not-recogn

1 Answer


lh80um4z1#

Looking at the example, I found that the audio has shape (1, 173056).
Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056, and it works.
Code used:

>>> import numpy as np
>>> sample_rate = 22050  # same rate as in the question
>>> sf.write("speech.wav", np.ravel(audio), sample_rate)
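Some background on why flattening helps: soundfile interprets a 2D array as (frames, channels), so an array of shape (1, 173056) is read as a single frame with 173056 channels, which is most likely what triggers the "Format not recognized" error. np.ravel works here because the batch holds exactly one utterance; with a larger batch it would silently concatenate all items into one signal. A minimal sketch of a stricter alternative follows. The helper name squeeze_batch is hypothetical, and it assumes the vocoder output is laid out as (batch, samples):

    import numpy as np

    def squeeze_batch(audio):
        """Hypothetical helper: return mono audio as a 1D array.

        Assumes `audio` is either already 1D or has shape (1, samples),
        i.e. a batch that contains exactly one utterance.
        """
        audio = np.asarray(audio)
        if audio.ndim == 1:
            # Already mono and 1D, nothing to do.
            return audio
        if audio.ndim == 2 and audio.shape[0] == 1:
            # Drop the leading batch axis: (1, samples) -> (samples,)
            return audio[0]
        # Refuse anything else instead of flattening it silently.
        raise ValueError(f"expected shape (samples,) or (1, samples), got {audio.shape}")

    # Stand-in for the vocoder output from the question (silence, same shape).
    batch = np.zeros((1, 173056), dtype=np.float32)
    mono = squeeze_batch(batch)

    # The result could then be written the same way as in the answer:
    # sf.write("speech.wav", mono, 22050)

For a batch with several utterances, one option would be to loop over the first axis and write each row to its own file.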

Regards,

2023-01-10
