I am capturing real-time audio data and have it as a float32 array. How do I convert it to an Azure AudioStreamInput?
import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position+to_read]
        self.position += to_read
        return to_read
I tried this, but got the error 'NumpyAudioStream' object has no attribute '_handle'.
How do I create an Azure AudioInputStream?
@socketio.on('audio_data')
def handle_audio_data(audioData):
    float32_array = struct.unpack('f' * (len(audioData) // 4), audioData)
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('AZURE_SUBSCRIPTION_KEY'), region=os.environ.get('AZURE_REGION'))
    stream = speechsdk.audio.PullAudioInputStream(NumpyAudioStream(float32_array))
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    print(recognizer.recognize_once())
When I do this, I get the following error:
Exception ignored on calling ctypes callback function: <function PullAudioInputStream.__read_callback at 0x000001809EAB2F20>
Traceback (most recent call last):
  File "\py312env\Lib\site-packages\azure\cognitiveservices\speech\audio.py", line 213, in __read_callback
    return obj.read(view)
           ^^^^^^^^^^^^^^
TypeError: NumpyAudioStream.read() missing 2 required positional arguments: 'offset' and 'count'
Why does this error occur?
1 Answer
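Based on the information provided, it seems the data needs to be formatted before it can be used as an input stream.
The following changes should help resolve the issue.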
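1. In the __init__ method, the float32 audio array is converted to int16, because the Azure Speech SDK expects audio data in int16 format.
2. In the read method, the buffer size is divided by 2 to account for the fact that each int16 value occupies 2 bytes. This ensures that the correct number of samples is read from the audio array.
3. The audio data is written to the buffer using the tobytes() method, which converts the int16 audio array slice to a byte array. This is necessary because the buffer expects byte data.
Below is the modified code for reference: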
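A minimal sketch of a callback that applies the three changes above might look like the following. It is an illustration rather than the answerer's exact listing. The traceback in the question shows the SDK calling obj.read(view) with a single argument, so read takes only the buffer here. The [-1.0, 1.0] sample range, the 16000 Hz default sample rate, and the helper name make_audio_config are assumptions, and the ImportError fallback exists only so the conversion logic can be tried without the SDK installed.

```python
import numpy as np

try:
    import azure.cognitiveservices.speech as speechsdk
    CallbackBase = speechsdk.audio.PullAudioInputStreamCallback
except ImportError:
    # Fallback so the conversion logic can be exercised without the SDK.
    speechsdk = None
    CallbackBase = object


class NumpyAudioStream(CallbackBase):
    """Serves a float32 sample array to the SDK as 16-bit PCM bytes."""

    def __init__(self, audio_array):
        super().__init__()
        # Assumption: samples lie in [-1.0, 1.0]; clip, then scale to int16.
        samples = np.clip(np.asarray(audio_array, dtype=np.float32), -1.0, 1.0)
        self.audio_array = (samples * 32767).astype(np.int16)
        self.position = 0

    def read(self, buffer: memoryview) -> int:
        # buffer.nbytes is a byte count; each int16 sample takes 2 bytes.
        max_samples = buffer.nbytes // 2
        chunk = self.audio_array[self.position:self.position + max_samples]
        data = chunk.tobytes()
        buffer[:len(data)] = data
        self.position += len(chunk)
        # Number of bytes written; 0 signals the end of the stream.
        return len(data)

    def close(self):
        # Nothing to release for an in-memory array.
        pass


def make_audio_config(float32_array, sample_rate=16000):
    """Hypothetical helper: wrap the callback in a pull stream.

    sample_rate is an assumed value and must match the captured audio.
    """
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate, bits_per_sample=16, channels=1)
    stream = speechsdk.audio.PullAudioInputStream(
        NumpyAudioStream(float32_array), fmt)
    return speechsdk.audio.AudioConfig(stream=stream)
```

In the question's handler, the result of make_audio_config(float32_array) would be passed as audio_config to speechsdk.SpeechRecognizer. If the captured audio is not 16 kHz mono, the sample_rate argument would need to be adjusted to match.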
With the above changes, I was able to process a float32 audio data file and get results.