For my ML project I use a model that takes a video and its audio track as input files to detect synthetic speech in the video.
However, it raises an error in the audio_processing() function:
Code of audio_processing():
import numpy as np
import scipy.io.wavfile as wav
import speechpy

# Module-level constant from the project; the value 20 is implied by the
# (x, 12) -> (x//20, 12, 20, 1) reshape comment below.
AUDIO_TIME_STEPS = 20

def audio_processing(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))
    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")
    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)
    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS
    if verbose:
        print("Number of audio clips:", number_of_audio_clips)
    # Don't consider the first MFCC feature, only consider the next 12 (checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]
    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)
    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features
Error:
1 Answer
From the look of it, your audio file contains two channels. You can check this by looking at the shape of the array returned by wav.read: sig.shape. The speechpy.feature.mfcc function expects a single-channel (mono) signal. What you can do is convert your audio to mono, for example by averaging the two channels. If you want the function to handle both mono and multi-channel input, compute the average only when the signal actually has more than one channel, as in the sketch below.
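A minimal sketch of that idea, assuming the same scipy.io.wavfile setup as in the question; the helper name load_mono and the sig.ndim check are my additions, not part of the original code:

import numpy as np
import scipy.io.wavfile as wav

def load_mono(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        # (n_samples,) for mono, (n_samples, n_channels) for stereo
        print("sig shape:", sig.shape)
    # Average the channels only when the signal is multi-channel
    if sig.ndim > 1:
        sig = sig.mean(axis=1)
    return rate, sig

# Hypothetical usage inside audio_processing(): replace
#     rate, sig = wav.read(wav_file)
# with
#     rate, sig = load_mono(wav_file, verbose)
# before calling speechpy.feature.mfcc(sig, sampling_frequency=rate, ...)

Note that averaging integer samples yields a float array, which should be fine for MFCC extraction; if your downstream code expects the original integer dtype, cast the result back explicitly.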