For my ML project I use a model that takes a video and its audio track as input files to detect synthetic speech in the video.
However, it raises an error in the audio_processing() function:
Code of audio_processing():
import numpy as np
import scipy.io.wavfile as wav
import speechpy

# Module-level constant from the project; the value 20 is implied by the
# (x, 12) -> (x//20, 12, 20, 1) reshape comment below.
AUDIO_TIME_STEPS = 20

def audio_processing(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))
    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")
    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)
    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS
    if verbose:
        print("Number of audio clips:", number_of_audio_clips)
    # Don't consider the first MFCC feature, only consider the next 12 (checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]
    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)
    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features
Error:
1 Answer
From the look of it, your audio file contains two channels. You can check this by looking at the shape of the array returned by wav.read: sig.shape. The speechpy.feature.mfcc function expects a single-channel (mono) signal. What you can do is convert your audio to mono, for example by averaging the two channels. If you want the function to handle both mono and multi-channel input, compute the average only when the signal actually has more than one channel, as in the sketch below.
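A minimal sketch of that idea, assuming the same scipy.io.wavfile setup as in the question; the helper name load_mono and the sig.ndim check are my additions, not part of the original code:

import numpy as np
import scipy.io.wavfile as wav

def load_mono(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        # (n_samples,) for mono, (n_samples, n_channels) for stereo
        print("sig shape:", sig.shape)
    # Average the channels only when the signal is multi-channel
    if sig.ndim > 1:
        sig = sig.mean(axis=1)
    return rate, sig

# Hypothetical usage inside audio_processing(): replace
#     rate, sig = wav.read(wav_file)
# with
#     rate, sig = load_mono(wav_file, verbose)
# before calling speechpy.feature.mfcc(sig, sampling_frequency=rate, ...)

Note that averaging integer samples yields a float array, which should be fine for MFCC extraction; if your downstream code expects the original integer dtype, cast the result back explicitly.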