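I have a long video and want to isolate just one part of it and extract the log spectrogram of that part. I load the original file, which is an mp4, with moviepy, extract the relevant portion with subclip, and use .audio to access only the audio track.
The audio data and sample rate can then be extracted, much as when loading with librosa. The full code for extracting the audio data and sample rate is: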
from moviepy.editor import VideoFileClip

with VideoFileClip(input_file) as video:
    # Use the first three seconds of the video
    clip = video.subclip(0, 3)
    # Get the audio data and sample rate
    y = clip.audio.to_soundarray()
    sr = clip.audio.fps
    l = clip.audio.duration
    print(f'y:{y.shape}, sr:{sr}, length:{l}')
This prints:
>>> y:(132300, 2), sr:44100, length:3
Next, I want to convert this data into a spectrogram. When I try the following, my machine either crashes or I get an error.
with VideoFileClip(input_file) as video:
    # Trim video
    clip = video.subclip(start_time_sec, end_time_sec)
    # Get length of the trimmed video
    length = end_time_sec-start_time_sec
    # Get the audio data and sample rate
    y = clip.audio.to_soundarray()
    sr = clip.audio.fps
    l = clip.audio.duration
    # Do something with the audio data
    spectrogram = librosa.feature.melspectrogram(y=y, n_fft=2048, hop_length=512)
    librosa.display.specshow(spectrogram, sr=sr)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[33], line 16
14 # Do something with the audio data
15 spectrogram = librosa.feature.melspectrogram(y=y, n_fft=2048, hop_length=512)
---> 16 librosa.display.specshow(spectrogram, sr=sr)
File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/librosa/display.py:1215, in specshow(data, x_coords, y_coords, x_axis, y_axis, sr, hop_length, n_fft, win_length, fmin, fmax, tuning, bins_per_octave, key, Sa, mela, thaat, auto_aspect, htk, unicode, intervals, unison, ax, **kwargs)
1211 x_coords = __mesh_coords(x_axis, x_coords, data.shape[1], **all_params)
1213 axes = __check_axes(ax)
-> 1215 out = axes.pcolormesh(x_coords, y_coords, data, **kwargs)
1217 __set_current_image(ax, out)
1219 # Set up axis scaling
File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/__init__.py:1442, in _preprocess_data..inner(ax, data, *args, **kwargs)
1439 @functools.wraps(func)
1440 def inner(ax, *args, data=None, **kwargs):
1441 if data is None:
-> 1442 return func(ax, *map(sanitize_sequence, args), **kwargs)
1444 bound = new_sig.bind(ax, *args, **kwargs)
1445 auto_label = (bound.arguments.get(label_namer)
1446 or bound.kwargs.get(label_namer))
File /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/matplotlib/axes/_axes.py:6229, in Axes.pcolormesh(self, alpha, norm, cmap, vmin, vmax, shading, antialiased, *args, **kwargs)
6225 C = C.ravel()
...
1984 f"shading, A should have shape "
1985 f"{' or '.join(map(str, ok_shapes))}, not {A.shape}")
1986 return super().set_array(A)
ValueError: For X (129) and Y (132301) with flat shading, A should have shape (132300, 128, 3) or (132300, 128, 4) or (132300, 128) or (16934400,), not (132300, 128, 1)
Finally, when I use power_to_db and plt.imshow as in the code below:
ps = librosa.feature.melspectrogram(y=y, sr=sr)
ps_db= librosa.power_to_db(ps)
# librosa.display.specshow(ps_db, x_axis='s', y_axis='log')
plt.imshow(ps_db, origin="lower", cmap=plt.get_cmap("magma"))
I get the following undesired result:
Is this because of overlapping dimensions, or something else?
1 Answer
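librosa's multichannel format is channels-first, whereas your audio appears to be channels-last. Try y = clip.audio.to_soundarray().T to transpose it.
Passing a stereo mel spectrogram to librosa.display.specshow may also be a problem. If you can work in mono, convert the audio with y = librosa.to_mono(y) before doing any processing.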
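The shape in the error message fits this explanation: with a (132300, 2) array, librosa treats the last axis as time, so it sees 132300 "channels" of 2 samples each and returns a single frame per channel, hence (132300, 128, 1). Below is a minimal sketch of the fix, assuming the array comes from to_soundarray() in channels-last layout. The helper names to_mono_channels_first, expected_frames and log_mel_spectrogram are made up for illustration and are not part of moviepy or librosa; the channel average is meant to mirror what librosa.to_mono does.

```python
import numpy as np


def to_mono_channels_first(y_channels_last):
    """Convert (n_samples, n_channels) audio into a mono (n_samples,) signal."""
    # Transpose to librosa's channels-first layout: (n_channels, n_samples)
    y = np.asarray(y_channels_last, dtype=np.float32).T
    # Average over the channel axis, as librosa.to_mono does
    return y.mean(axis=0)


def expected_frames(n_samples, hop_length=512):
    """Frame count of a centered STFT (librosa's default, center=True)."""
    return 1 + n_samples // hop_length


def log_mel_spectrogram(y_mono, sr, n_fft=2048, hop_length=512):
    """Mel power spectrogram in dB, with shape (n_mels, n_frames)."""
    # Imported here so the two helpers above can be used without librosa
    import librosa

    ps = librosa.feature.melspectrogram(
        y=y_mono, sr=sr, n_fft=n_fft, hop_length=hop_length
    )
    return librosa.power_to_db(ps, ref=np.max)


if __name__ == "__main__":
    # Stand-in for clip.audio.to_soundarray(): 3 s of stereo audio at 44.1 kHz
    sr = 44100
    y = np.zeros((3 * sr, 2))
    y_mono = to_mono_channels_first(y)
    print(y_mono.shape)                   # (132300,)
    print(expected_frames(len(y_mono)))   # 259 frames along the time axis
    print(expected_frames(2))             # 1 frame: the channels-last mistake
```

With a real clip, something like y_mono = to_mono_channels_first(clip.audio.to_soundarray()) followed by librosa.display.specshow(log_mel_spectrogram(y_mono, sr), sr=sr, hop_length=512, x_axis='time', y_axis='mel') should give a plot with time on the x axis. Passing the same sr and hop_length to specshow that were used for the spectrogram keeps the time axis labels correct.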