scipy AssertionError: Signal dimension should be of the format (N,) but it is (743424, 2) instead

hgncfbus, posted on 2022-11-23 in Other

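For my ML project I am using a model that takes a video file and an audio file as input in order to detect synthetic speech in the video.
However, it raises an error in the audio_processing() function: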

Code for audio_processing():

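# Imports assumed by this snippet (not shown in the original post)
import numpy as np
import scipy.io.wavfile as wav
import speechpy

# AUDIO_TIME_STEPS is a module-level constant defined elsewhere in the project
# (apparently 20, judging by the reshape comment further down).

def audio_processing(wav_file, verbose=True):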

    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))

    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")

    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)

    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS

    if verbose:
        print("Number of audio clips:", number_of_audio_clips)

    # Don't consider the first MFCC feature, only consider the next 12 (Checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]

    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)

    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features
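
The trimming and reshaping at the end of the function can be tried in isolation. The following is a minimal sketch, assuming AUDIO_TIME_STEPS is 20 (inferred from the "(x//20, 12, 20, 1)" comment, not stated in the post) and using a made-up array of 45 frames by 13 coefficients in place of real MFCC output:

```python
import numpy as np

AUDIO_TIME_STEPS = 20  # assumed value, inferred from the "(x//20, 12, 20, 1)" comment

# Stand-in for the MFCC output: 45 frames x 13 coefficients (made-up numbers)
mfcc_features = np.arange(45 * 13, dtype=float).reshape(45, 13)

number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS  # 45 // 20 == 2

# Drop coefficient 0 and the 5 leftover frames -> shape (40, 12)
trimmed = mfcc_features[:AUDIO_TIME_STEPS * number_of_audio_clips, 1:]

# Split into clips of 20 frames, swap the frame/coefficient axes, add a channel axis
clips = np.expand_dims(
    np.transpose(np.split(trimmed, number_of_audio_clips), (0, 2, 1)), axis=-1
)
print(clips.shape)  # (2, 12, 20, 1)
```

Note that a recording shorter than AUDIO_TIME_STEPS frames gives zero clips, and np.split cannot split an array into zero sections.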

Error:

AssertionError: Signal dimension should be of the format (N,) but it is (743424, 2) instead

Answer #1 (mzillmmw)

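By the looks of it, your audio file contains two channels. You can confirm this by checking the shape of the array returned by wav.read: sig.shape. A shape of (743424, 2) means 743424 samples in each of 2 channels.
The speechpy.feature.mfcc function expects a mono (single-channel) signal of shape (N,). I believe what you can do is convert your audio to mono, for example by averaging the two channels: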

sig = np.mean(sig, axis=1)

If you want the function to handle both single-channel and multi-channel data, compute the mean only when the audio signal is multi-channel:

if sig.ndim == 2:
    sig = np.mean(sig, axis=1)
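
To see the effect without a real recording, here is a small self-contained sketch. The helper name to_mono and the synthetic stereo array are illustrative assumptions, not part of the original code. The array stands in for what wav.read would return for a two-channel file:

```python
import numpy as np

def to_mono(sig):
    """Return a 1-D signal; average the channels if sig has shape (N, channels)."""
    sig = np.asarray(sig)
    if sig.ndim == 2:
        # Average across the channel axis; the result is float64
        sig = sig.mean(axis=1)
    return sig

# Synthetic stand-in for the stereo array returned by wav.read
stereo = np.zeros((743424, 2), dtype=np.int16)
stereo[:, 0] = 100  # left channel
stereo[:, 1] = 300  # right channel

mono = to_mono(stereo)
print(stereo.shape, "->", mono.shape)  # (743424, 2) -> (743424,)
print(mono[0])                         # 200.0
```

In audio_processing() the same check would go right after wav.read, before sig is passed to speechpy.feature.mfcc. Keep in mind that averaging integer samples yields a float array.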
