I have a question about speech similarity checking. Suppose I have two audio files containing the same word, recorded by two different speakers, and I want to verify whether the two recordings are similar. I don't want to go through speech-to-text, because some of the audio files do not contain meaningful words.
After preprocessing the audio I extract MFCC vectors and apply DTW (dynamic time warping). For the same audio compared against itself (reference vs. reference) I get a score of 0, but when I apply it to two files of the same word recorded by two different speakers I get a high score (indicating they are not similar). Can anyone suggest a way to fix this? What is wrong with my approach? Below is the code, run after resampling the signal:
```python
from pydub import AudioSegment, silence
from scipy.signal import lfilter
from sklearn.preprocessing import StandardScaler
from dtw import dtw
import librosa
import numpy as np

# Load the audio file
audio_file = AudioSegment.from_wav('C://Users//10Rs6//Desktop//testapb.wav')

# Minimum length of a silent gap, in milliseconds
min_silence_len = 100
# Threshold for detecting silence, in dBFS
silence_thresh = -25

# Split the audio into non-silent segments
non_silent_segments = silence.split_on_silence(audio_file,
                                               min_silence_len=min_silence_len,
                                               silence_thresh=silence_thresh)

# Concatenate the non-silent segments into a new audio file
trimmed_audio = AudioSegment.empty()
for segment in non_silent_segments:
    trimmed_audio += segment

# Export the trimmed audio file
trimmed_audio.export('C://Users//10Rs6//Desktop//trimmed_audio5.wav', format='wav')

def preemphasis(signal, alpha=0.97):
    """
    Applies a pre-emphasis filter to the input signal.

    Parameters:
        signal (array-like): The input signal to filter.
        alpha (float): The pre-emphasis coefficient. Default is 0.97.

    Returns:
        The filtered signal.
    """
    return lfilter([1, -alpha], [1], signal)

# resampled_audio_test / resampled_audio_ref come from the earlier resampling step
pre_emphasised_test = preemphasis(resampled_audio_test)
pre_emphasised_ref = preemphasis(resampled_audio_ref)
normalized_test = librosa.util.normalize(pre_emphasised_test)
normalized_ref = librosa.util.normalize(pre_emphasised_ref)

# Extract MFCCs for the test signal
mfccsT = librosa.feature.mfcc(y=pre_emphasised_test, sr=41100, n_mfcc=13)
# Average over time, collapsing the frame axis into a single 13-dim vector
mfccsT = np.mean(mfccsT.T, axis=0)
print(mfccsT)
print(mfccsT.shape)

# Extract MFCCs for the reference signal
mfccsR = librosa.feature.mfcc(y=pre_emphasised_ref, sr=41100, n_mfcc=13)
# Average over time, collapsing the frame axis into a single 13-dim vector
mfccsR = np.mean(mfccsR.T, axis=0)
print(mfccsR)
print(mfccsR.shape)

# Reshape the test MFCC vector to 2D and standardize it
mfccsT_2d = np.reshape(mfccsT, (mfccsT.shape[0], -1))
scaler = StandardScaler()
scaler.fit(mfccsT_2d)
normalized_mfccsT_2d = scaler.transform(mfccsT_2d)
normalized_mfccsT = np.reshape(normalized_mfccsT_2d, mfccsT.shape)
print(normalized_mfccsT)

# Reshape the reference MFCC vector to 2D and standardize it
mfccsR_2d = np.reshape(mfccsR, (mfccsR.shape[0], -1))
scaler = StandardScaler()
scaler.fit(mfccsR_2d)
normalized_mfccsR_2d = scaler.transform(mfccsR_2d)
normalized_mfccsR = np.reshape(normalized_mfccsR_2d, mfccsR.shape)
print(normalized_mfccsR)

normalized_mfccsT = normalized_mfccsT.reshape(-1, 1)
normalized_mfccsR = normalized_mfccsR.reshape(-1, 1)

# Use the squared difference as the element-wise comparison distance
l2_norm = lambda a, b: (a - b) ** 2
dist, cost_matrix, acc_cost_matrix, path = dtw(normalized_mfccsT, normalized_mfccsR, dist=l2_norm)
print(dist)
```
Thanks.
1 Answer
MFCC values are not a good representation of speech *content* similarity, because a lot of "acoustic" information is still present. Two different people saying the same word will differ considerably, as will even the same speaker recorded with two different microphones or in two different rooms (reverberation in particular). What is needed here is a speaker-independent representation that is robust to changes in device, environment, and noise. A good automatic speech recognition (ASR) system has this property by design, and for some systems one can extract the learned vector representations rather than just the predicted text sequence.
On top of such a sequence of feature vectors you would then build the similarity measure. Possibly reduce the feature dimensionality first with a projection such as PCA, then try Dynamic Time Warping, as sketched below.
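A minimal sketch of that recipe, assuming you already have two frame-level feature sequences `feats_test` and `feats_ref` of shape `(frames, dims)` from such a model (the PCA size, the cosine metric, and the use of `librosa.sequence.dtw` here are illustrative choices, not the only option):

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

def dtw_distance(feats_test, feats_ref, n_components=32):
    """Compare two (frames, dims) feature sequences with PCA + DTW."""
    # Fit a single PCA on both sequences so they are projected into the same space
    pca = PCA(n_components=n_components)
    pca.fit(np.vstack([feats_test, feats_ref]))
    reduced_test = pca.transform(feats_test)
    reduced_ref = pca.transform(feats_ref)

    # librosa's DTW expects (dims, frames) matrices
    D, wp = librosa.sequence.dtw(X=reduced_test.T, Y=reduced_ref.T, metric='cosine')

    # Normalise the accumulated cost by the warping-path length
    return D[-1, -1] / len(wp)
```

Dividing the accumulated cost by the warping-path length makes scores from clips of different durations roughly comparable, so a single threshold can be applied.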
Wav2Vec
Wav2Vec is a self-supervised speech model. It is commonly used as a feature extractor for a range of speech and non-speech audio tasks. The Huggingface transformers library has a nice, easy-to-use implementation in Wav2Vec2FeatureExtractor.
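For instance, frame-level features could be pulled from a pretrained checkpoint roughly like this (a sketch only; the `facebook/wav2vec2-base-960h` checkpoint, the 16 kHz resampling, and the placeholder file paths are my assumptions):

```python
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed checkpoint; other Wav2Vec2 models from the Hugging Face hub work similarly
MODEL_NAME = "facebook/wav2vec2-base-960h"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
model.eval()

def wav2vec_features(path):
    # Wav2Vec2 checkpoints expect 16 kHz mono input
    audio, _ = librosa.load(path, sr=16000, mono=True)
    inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, hidden_dim)
    return hidden.squeeze(0).numpy()

# Placeholder paths for the two trimmed recordings
feats_test = wav2vec_features('trimmed_test.wav')
feats_ref = wav2vec_features('trimmed_ref.wav')
```

The important part is to keep the whole `(frames, hidden_dim)` sequence rather than averaging it over time, since DTW needs the frame-by-frame timing that a mean vector throws away.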
Allosaurus
Allosaurus is a pretrained universal phone recognizer. It outputs a vector representation of phones, should work for any language in the world, and may also work well for speech without meaningful text.
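A minimal usage sketch with the `allosaurus` package (the file paths are placeholders, and comparing the two phone sequences afterwards, e.g. with an edit distance, is left to you):

```python
# pip install allosaurus
from allosaurus.app import read_recognizer

# Load the default pretrained universal phone recognizer
model = read_recognizer()

# Each call returns a space-separated string of recognised phones
phones_test = model.recognize('trimmed_test.wav')  # placeholder path
phones_ref = model.recognize('trimmed_ref.wav')    # placeholder path
print(phones_test)
print(phones_ref)
```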