scipy: How to correlate two audio events (detect if they are similar) in Python

Posted by alen0pnh on 2022-11-09 in Python


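For my project I have to detect whether two audio files are similar, and when the first audio file is contained in the second one. My problem is that I tried to use librosa together with numpy.correlate, and I don't know whether I'm doing it the right way. How can I detect whether one audio file is contained in another?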

import librosa
import numpy
long_audio_series, long_audio_rate = librosa.load("C:\\Users\\Jerry\\Desktop\\long_file.mp3")
short_audio_series, short_audio_rate = librosa.load("C:\\Users\\Jerry\\Desktop\\short_file.mka")

for long_stream_id, long_stream in enumerate(long_audio_series):
    for short_stream_id, short_stream in enumerate(short_audio_series):
        print(numpy.correlate(long_stream, short_stream))


Source: https://stackoverflow.com/questions/57317733/how-to-correlate-two-audio-events-detect-if-they-are-similar-in-python

2 Answers


hwamh0ep1#

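Simply comparing the audio signals long_audio_series and short_audio_series probably won't work. What I'd recommend instead is audio fingerprinting, more precisely, essentially a poor man's version of what Shazam does. There is of course the patent and the paper, but you might want to start with this very readable description. The central idea of that article is the constellation map (CM).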

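If you don't need to scale to many songs, you can skip the whole hashing part and concentrate on peak finding.
So what you need to do is:
1. Create a power spectrogram (easy with librosa.core.stft).
2. Find local peaks in all your files (can be done with scipy.ndimage.maximum_filter) to create CMs, i.e. 2D images that contain only the peaks. The resulting CM is typically binary, i.e. it contains 0 for no peak and 1 for a peak.
3. Slide your query CM (based on short_audio_series) over each of your database CMs (based on long_audio_series). For each time step, count how many "stars" (i.e. 1s) align, and store the count along with the slide offset (essentially the position of the short audio in the long audio).
4. Pick the maximum count and return the corresponding short audio and its position in the long audio. You will need to convert frame numbers back to seconds.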
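Steps 1 and 2 can be sketched roughly as follows. This is only an illustration, not the answer's own code: it swaps in scipy.signal.stft for librosa so that it runs with NumPy and SciPy alone, and the function name constellation_map, the 15x15 peak neighborhood and the -60 dB floor are assumptions chosen for the example.

```python
import numpy as np
from scipy import ndimage, signal


def constellation_map(samples, sample_rate, n_fft=1024, hop=256,
                      neighborhood=(15, 15), min_db=-60.0):
    """Return a binary (time frame, frequency bin) map of spectrogram peaks.

    All parameter defaults are illustrative assumptions.
    """
    # Short-time Fourier transform; Zxx has shape (bins, frames)
    _, _, spec = signal.stft(samples, fs=sample_rate, nperseg=n_fft,
                             noverlap=n_fft - hop)
    # Power spectrogram, transposed so that dim 0 is the time frame
    power = np.abs(spec.T) ** 2
    power_db = 10.0 * np.log10(power + 1e-12)
    power_db -= power_db.max()  # 0 dB is now the loudest cell

    # A cell is a peak if it equals the maximum of its neighborhood
    local_max = ndimage.maximum_filter(power_db, size=neighborhood) == power_db
    # Drop "peaks" that are only numerical noise far below the loudest cell
    return (local_max & (power_db > min_db)).astype(np.uint8)
```

The returned array uses dim 0 for time frames and dim 1 for frequency bins, which matches the layout the sliding step expects.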
"Sliding" example (untested sample code):

import numpy as np

scores = {}
cm_short = ...  # 2D constellation map for the short audio
cm_long = ...   # 2D constellation map for the long audio

# We assume that dim 0 is the time frame and dim 1 is the frequency bin.
# Both CMs contain only 0 or 1.
frames_short = cm_short.shape[0]
frames_long = cm_long.shape[0]

# +1 so that the last possible alignment is scored as well
for offset in range(frames_long - frames_short + 1):
    cm_long_excerpt = cm_long[offset:offset + frames_short]
    # number of peaks ("stars") that line up at this offset
    score = np.sum(np.multiply(cm_long_excerpt, cm_short))
    scores[offset] = score

# TODO: find the highest score in "scores" and
# convert its offset back to seconds
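One way the remaining TODO could be filled in is sketched below. The helper names best_offset and frames_to_seconds are hypothetical, and the conversion assumes the spectrogram was computed with a known hop length (in samples) and sample rate.

```python
import numpy as np


def best_offset(cm_long, cm_short):
    """Slide cm_short over cm_long and return (best frame offset, score)."""
    frames_short = cm_short.shape[0]
    n_offsets = cm_long.shape[0] - frames_short + 1
    if n_offsets < 1:
        raise ValueError("the short map must not be longer than the long map")
    # Count aligned peaks for every possible offset
    scores = np.array([
        np.sum(cm_long[offset:offset + frames_short] * cm_short)
        for offset in range(n_offsets)
    ])
    best = int(np.argmax(scores))
    return best, int(scores[best])


def frames_to_seconds(frame, hop_length, sample_rate):
    """Convert a frame index to seconds (frame * hop / rate)."""
    return frame * hop_length / sample_rate
```

A low best score relative to the number of peaks in the short map suggests that the short audio is probably not contained in the long one, so comparing the two numbers gives a rough confidence measure.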

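Now, if your database is large, this will lead to far too many comparisons, and you will also have to implement a hashing scheme, which is described in the article linked above as well.
Note that the described procedure only matches identical recordings, while tolerating noise and slight distortion. If that is not what you want, please define similarity a little better, because it could refer to all kinds of things (drum patterns, chord sequences, instrumentation, ...). A classic DSP-based way to find similarity in such features is the following: extract appropriate features for short frames (e.g. 256 samples), then compute the similarity. For example, if harmonic content is of interest to you, you could extract chroma vectors and then compute a distance between them, e.g. the cosine distance. When you compute the similarity of every frame in your database signal with every frame in your query signal, you end up with something similar to a self similarity matrix (SSM) or recurrence matrix (RM). Diagonal lines in the SSM/RM usually indicate similar sections.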
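The frame-by-frame comparison can be sketched as below. The sketch assumes the features (for instance chroma vectors) have already been extracted into arrays of shape (frames, dimensions); the function name cosine_similarity_matrix is hypothetical, and it computes cosine similarity, i.e. one minus the cosine distance.

```python
import numpy as np


def cosine_similarity_matrix(query, database, eps=1e-12):
    """Return a (query frames, database frames) matrix of cosine similarities."""
    # Scale every feature vector to unit length; eps guards against silent frames
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    d = database / (np.linalg.norm(database, axis=1, keepdims=True) + eps)
    # The dot product of unit vectors is the cosine of the angle between them
    return q @ d.T
```

If the query is contained in the database signal, the matrix shows a diagonal stripe of values close to 1 that starts at the matching database frame.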


qij5mzcb2#

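I guess you only need to find the offset, but either way you first need to measure the similarity and then find the offset of the short file within the long file.

Measuring similarity

First you need to decode both files to PCM and make sure they share a specific sample rate, which you can choose beforehand (e.g. 16 kHz). Songs with a different sample rate have to be resampled. A high sample rate is not required, since you need a fuzzy comparison anyway, but a sample rate that is too low loses too much detail.
You can use the following commands: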

ffmpeg -i audio1.mkv -c:a pcm_s24le output1.wav
ffmpeg -i audio2.mkv -c:a pcm_s24le output2.wav
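Note that the two commands above keep each file's original sample rate. A minimal sketch of driving the conversion from Python while also forcing a common rate is shown below; the helper names and the choice of 16 kHz mono are assumptions, while -ar, -ac and -y are ffmpeg's standard options for sample rate, channel count and overwriting the output.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that decodes src to PCM WAV at a fixed rate."""
    return [
        "ffmpeg", "-y",           # overwrite dst without prompting
        "-i", src,
        "-ac", str(channels),     # number of audio channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-c:a", "pcm_s24le",      # same PCM codec as in the commands above
        dst,
    ]


def convert_to_wav(src, dst, **kwargs):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, **kwargs), check=True)
```

Calling convert_to_wav("audio1.mkv", "output1.wav") and then the same for the second file would give two WAV files that can be compared sample by sample.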

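Below is some Python code that produces a similarity number from 0 to 100 for two audio files. It works by generating fingerprints from the audio files and comparing them using cross-correlation.
It requires Chromaprint and FFmpeg to be installed, and it doesn't work for short audio files. If that is a problem, you can always slow down the audio as described in this guide, but be aware that this adds a little noise.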


# correlation.py

import subprocess
import sys

# seconds to sample audio file for
sample_time = 500
# number of points to scan cross correlation over
span = 150
# step size (in points) of cross correlation
step = 1
# minimum number of points that must overlap in cross correlation;
# offsets with less overlap are scored as 0
min_overlap = 20
# report match when cross correlation has a peak exceeding threshold
threshold = 0.5


# calculate fingerprint with Chromaprint's fpcalc
def calculate_fingerprints(filename):
    # passing a list avoids shell quoting problems with spaces in file names
    fpcalc_out = subprocess.run(
        ['fpcalc', '-raw', '-length', str(sample_time), filename],
        capture_output=True, text=True, check=True).stdout.strip()
    fingerprint_index = fpcalc_out.find('FINGERPRINT=') + 12
    # convert fingerprint to list of integers
    return list(map(int, fpcalc_out[fingerprint_index:].split(',')))


# returns correlation between lists: the fraction of matching bits
def correlation(listx, listy):
    if len(listx) == 0 or len(listy) == 0:
        # Error checking in main program should prevent us from ever being
        # able to get here.
        raise Exception('Empty lists cannot be correlated.')
    # truncate both lists to the shorter length
    length = min(len(listx), len(listy))
    listx = listx[:length]
    listy = listy[:length]

    covariance = 0
    for i in range(length):
        # each fingerprint item is a 32-bit integer; count the matching bits
        covariance += 32 - bin(listx[i] ^ listy[i]).count("1")
    covariance = covariance / float(length)
    return covariance / 32


# return cross correlation, with listy offset from listx
def cross_correlation(listx, listy, offset):
    if offset > 0:
        listx = listx[offset:]
        listy = listy[:len(listx)]
    elif offset < 0:
        offset = -offset
        listy = listy[offset:]
        listx = listx[:len(listy)]
    if min(len(listx), len(listy)) < min_overlap:
        # Overlap too small to be meaningful. Score it as 0.0 (returning
        # None here would break the comparison in max_index on Python 3).
        return 0.0
    return correlation(listx, listy)


# cross correlate listx and listy with offsets from -span to span
def compare(listx, listy, span, step):
    if span > min(len(listx), len(listy)):
        # Error checking in main program should prevent us from ever being
        # able to get here.
        raise Exception('span >= sample size: %i >= %i\n' % (span, min(len(listx), len(listy))) + 'Reduce span, reduce crop or increase sample_time.')

    corr_xy = []
    for offset in range(-span, span + 1, step):
        corr_xy.append(cross_correlation(listx, listy, offset))
    return corr_xy


# return index of maximum value in list
def max_index(listx):
    max_index = 0
    max_value = listx[0]
    for i, value in enumerate(listx):
        if value > max_value:
            max_value = value
            max_index = i
    return max_index


def get_max_corr(corr, source, target):
    max_corr_index = max_index(corr)
    max_corr_offset = -span + max_corr_index * step
    print("max_corr_index = ", max_corr_index, "max_corr_offset = ", max_corr_offset)
    # report matches
    if corr[max_corr_index] > threshold:
        print('%s and %s match with correlation of %.4f at offset %i' % (source, target, corr[max_corr_index], max_corr_offset))
    return max_corr_offset


def correlate(source, target):
    fingerprint_source = calculate_fingerprints(source)
    fingerprint_target = calculate_fingerprints(target)
    corr = compare(fingerprint_source, fingerprint_target, span, step)
    return get_max_corr(corr, source, target)


if __name__ == "__main__":
    # usage: python correlation.py SOURCE_FILE TARGET_FILE
    correlate(sys.argv[1], sys.argv[2])

The code, converted to Python 3, comes from: https://shivama205.medium.com/audio-signals-comparison-23e431ed2207
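To make the scoring concrete, here is a small self-contained illustration of the bit-agreement measure that the correlation function above computes. The function name bit_similarity and the toy fingerprints are made up for the example; real fingerprints would come from fpcalc.

```python
def bit_similarity(fp_a, fp_b):
    """Fraction of equal bits between two lists of 32-bit fingerprint items."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("empty fingerprints cannot be compared")
    # XOR sets a bit wherever the two items differ, so 32 minus the number
    # of set bits is the number of matching bits for that pair of items
    matching = sum(32 - bin(a ^ b).count("1")
                   for a, b in zip(fp_a[:n], fp_b[:n]))
    return matching / (32 * n)
```

Identical fingerprints score 1.0 and unrelated ones tend to score around 0.5, because about half of the bits agree by chance. That is why the threshold of 0.5 in the script sits at the chance level, and a stricter value may be worth trying.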

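Finding the offset

As before, you need to decode the files to PCM and make sure they have a specific sample rate.
Again, you can use the following commands for this: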

ffmpeg -i audio1.mkv -c:a pcm_s24le output1.wav
ffmpeg -i audio2.mkv -c:a pcm_s24le output2.wav

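Then you can use the following code. It normalizes the PCM data (i.e. it finds the maximum sample value and rescales all samples so that the sample with the largest amplitude uses the full dynamic range of the data format), converts it to the frequency domain (FFT), uses cross-correlation to find the peak, and finally returns the offset in seconds.
Depending on your case, you may want to avoid normalizing the PCM data, in which case you would need to change the code below slightly.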

import argparse

import librosa
import numpy as np
from scipy import signal

def find_offset(within_file, find_file, window):
    y_within, sr_within = librosa.load(within_file, sr=None)
    y_find, _ = librosa.load(find_file, sr=sr_within)

    c = signal.correlate(y_within, y_find[:sr_within*window], mode='valid', method='fft')
    peak = np.argmax(c)
    offset = round(peak / sr_within, 2)

    return offset

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--find-offset-of', metavar='audio file', type=str, required=True, help='Find the offset of file')
    parser.add_argument('--within', metavar='audio file', type=str, required=True, help='Within file')
    parser.add_argument('--window', metavar='seconds', type=int, default=10, help='Only use first n seconds of a target audio')
    args = parser.parse_args()
    offset = find_offset(args.within, args.find_offset_of, args.window)
    print(f"Offset: {offset}s" )

if __name__ == '__main__':
    main()

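Source and further explanation: https://dev.to/hiisi13/find-an-audio-within-another-audio-in-10-lines-of-python-1866
Finally, you need to combine these two pieces of code depending on your case: you may want to look for the offset only when the audio files are similar, or the other way around.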
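One possible way to combine the two ideas is sketched here: find the best offset with scipy.signal.correlate and also report a normalized score at that offset, so a caller can reject weak matches. The function name find_offset_with_score and the normalization are assumptions and are not taken from either linked article; the inputs are plain NumPy arrays of samples at the same sample rate.

```python
import numpy as np
from scipy import signal


def find_offset_with_score(y_within, y_find):
    """Return (offset in samples, normalized correlation at that offset)."""
    # 'valid' only evaluates positions where y_find fits entirely in y_within
    c = signal.correlate(y_within, y_find, mode="valid", method="fft")
    peak = int(np.argmax(c))
    # Normalize by the energy of both segments: 1.0 means identical shape
    segment = y_within[peak:peak + len(y_find)]
    denom = np.linalg.norm(segment) * np.linalg.norm(y_find)
    score = float(c[peak] / denom) if denom > 0 else 0.0
    return peak, score
```

Dividing the returned offset by the sample rate gives the position in seconds. A score close to 1 suggests that the short clip really occurs at that position, while a low score suggests that the peak is just the least bad alignment.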
