Python: converting moviepy.audio to a log-mel spectrogram gives a result of length 0

35g0bw71  posted on 2023-05-21  in Python

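I have a long video and want to isolate just one part of it and extract the log spectrogram for that part. I load the original file, which is an mp4, with moviepy, then use subclip to extract the relevant part and .audio to refer only to the file's audio.
There is an option to extract the audio data and sample rate, much like when loading with librosa. The full code for extracting the audio data and sample rate is below: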

# moviepy 1.x import path (assumed; the question does not show its imports)
from moviepy.editor import VideoFileClip

with VideoFileClip(input_file) as video:

    # Use the first three seconds of the video
    clip = video.subclip(0, 3)

    # Get the audio data, sample rate, and duration
    y = clip.audio.to_soundarray()
    sr = clip.audio.fps
    l = clip.audio.duration

print(f'y:{y.shape}, sr:{sr}, length:{l}')

The result is:

>>>  y:(132300, 2), sr:44100, length:3

Next, I want to convert the data above into a spectrogram. When I try the following, either my machine crashes or I get an error.

with VideoFileClip(input_file) as video:
        
    # Trim video
    clip = video.subclip(start_time_sec, end_time_sec)
    
    # Get length of the trimmed video
    length = end_time_sec-start_time_sec

    # Get the audio data and sample rate
    y = clip.audio.to_soundarray()
    sr = clip.audio.fps
    l = clip.audio.duration

    # Do something with the audio data
    spectrogram = librosa.feature.melspectrogram(y=y, n_fft=2048, hop_length=512)
    librosa.display.specshow(spectrogram, sr=sr)

------->

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[33], line 16
     14 # Do something with the audio data
     15 spectrogram = librosa.feature.melspectrogram(y=y, n_fft=2048, hop_length=512)
---> 16 librosa.display.specshow(spectrogram, sr=sr)

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/librosa/display.py:1215, in specshow(data, x_coords, y_coords, x_axis, y_axis, sr, hop_length, n_fft, win_length, fmin, fmax, tuning, bins_per_octave, key, Sa, mela, thaat, auto_aspect, htk, unicode, intervals, unison, ax, **kwargs)
   1211 x_coords = __mesh_coords(x_axis, x_coords, data.shape[1], **all_params)
   1213 axes = __check_axes(ax)
-> 1215 out = axes.pcolormesh(x_coords, y_coords, data, **kwargs)
   1217 __set_current_image(ax, out)
   1219 # Set up axis scaling

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/__init__.py:1442, in _preprocess_data..inner(ax, data, *args, **kwargs)
   1439 @functools.wraps(func)
   1440 def inner(ax, *args, data=None, **kwargs):
   1441     if data is None:
-> 1442         return func(ax, *map(sanitize_sequence, args), **kwargs)
   1444     bound = new_sig.bind(ax, *args, **kwargs)
   1445     auto_label = (bound.arguments.get(label_namer)
   1446                   or bound.kwargs.get(label_namer))

File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/axes/_axes.py:6229, in Axes.pcolormesh(self, alpha, norm, cmap, vmin, vmax, shading, antialiased, *args, **kwargs)
   6225     C = C.ravel()
...
   1984             f"shading, A should have shape "
   1985             f"{' or '.join(map(str, ok_shapes))}, not {A.shape}")
   1986 return super().set_array(A)

ValueError: For X (129) and Y (132301) with flat shading, A should have shape (132300, 128, 3) or (132300, 128, 4) or (132300, 128) or (16934400,), not (132300, 128, 1)

Finally, when I use power_to_db and plt.imshow as in the code below:

ps = librosa.feature.melspectrogram(y=y, sr=sr)
ps_db= librosa.power_to_db(ps)
# librosa.display.specshow(ps_db, x_axis='s', y_axis='log')
plt.imshow(ps_db, origin="lower", cmap=plt.get_cmap("magma"))

I get the following undesired result:

Is it because of overlapping dimensions or something?

6jygbczu  #1

librosa's multichannel format is channels-first, while your audio appears to be channels-last. Try y = clip.audio.to_soundarray().T to transpose it.
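To see why the orientation matters, here is a minimal sketch of the shape arithmetic. It assumes librosa treats the last axis as time and uses its default centered framing, which yields 1 + n_samples // hop_length frames. The helper expected_mel_shape is hypothetical and only predicts shapes; it does not call librosa.

```python
def expected_mel_shape(y_shape, n_mels=128, hop_length=512):
    """Predict the melspectrogram output shape for an input of shape y_shape.

    Assumes librosa's channels-first layout (last axis = time) and the
    default center=True framing, giving 1 + n_samples // hop_length frames.
    """
    *leading, n_samples = y_shape
    n_frames = 1 + n_samples // hop_length
    return (*leading, n_mels, n_frames)


# Shape returned by to_soundarray() in the question: (samples, channels)
channels_last = (132300, 2)
# Shape after transposing with .T: (channels, samples)
channels_first = (2, 132300)

print(expected_mel_shape(channels_last))   # (132300, 128, 1)
print(expected_mel_shape(channels_first))  # (2, 128, 259)
```

The first result matches the (132300, 128, 1) array in the traceback: each of the 132300 samples is treated as its own channel containing only two time steps. That would also be consistent with the machine struggling and with the distorted imshow plot.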
Also, passing a stereo mel spectrogram to librosa.display.specshow may be a problem. If you can work in mono, convert the audio with y = librosa.to_mono(y) before processing.
