How to create an Azure AudioInputStream

y53ybaqx · asked 2023-11-21

I am capturing real-time audio data as a float32 array. How do I convert it into an Azure AudioInputStream?

import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position+to_read]
        self.position += to_read
        return to_read

I tried this, but got a 'NumpyAudioStream' object has no attribute '_handle' error.
How do I create an Azure AudioInputStream correctly?

@socketio.on('audio_data')
def handle_audio_data(audioData):
    float32_array = struct.unpack('f' * (len(audioData) // 4), audioData)
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('AZURE_SUBSCRIPTION_KEY'), region=os.environ.get('AZURE_REGION'))
    stream = speechsdk.audio.PullAudioInputStream(NumpyAudioStream(float32_array))
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    print(recognizer.recognize_once())


When I do this, I get the following error:

Exception ignored on calling ctypes callback function: <function PullAudioInputStream.__read_callback at 0x000001809EAB2F20>
Traceback (most recent call last):
  File "\py312env\Lib\site-packages\azure\cognitiveservices\speech\audio.py", line 213, in __read_callback
    return obj.read(view)
           ^^^^^^^^^^^^^^
TypeError: NumpyAudioStream.read() missing 2 required positional arguments: 'offset' and 'count'


Why does this error occur?


yrwegjxp1#

Based on the information provided, it looks like the audio data needs to be reformatted before it can be used as an input stream.
The following changes should resolve the problem:
1. In the __init__ method, the float32 audio array is converted to int16, because the Azure Speech SDK expects 16-bit PCM audio data.
2. In the read method, the buffer size is divided by 2 to account for the fact that each int16 sample occupies 2 bytes. This ensures the correct number of samples is read from the audio array.
3. The audio data is written into the buffer with tobytes(), which converts the int16 array slice into raw bytes. This is necessary because the buffer expects byte data.

Here is the modified code for reference:
class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        super().__init__()  # required so the SDK can attach its internal handle
        # Convert float32 samples in [-1.0, 1.0] to 16-bit PCM, which the SDK expects
        self.audio_array = (audio_array * np.iinfo(np.int16).max).astype(np.int16)
        self.position = 0

    def read(self, buffer: memoryview) -> int:
        # The SDK requests buffer.nbytes bytes; each int16 sample is 2 bytes
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, buffer.nbytes // 2)
        buffer[:to_read * 2] = self.audio_array[self.position:self.position+to_read].tobytes()
        self.position += to_read
        # Return the number of bytes written; 0 signals end of stream
        return to_read * 2

    def close(self) -> None:
        pass

With these changes I was able to run the float32 audio data and get a recognition result.
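
For completeness, here is a minimal sketch (not part of the original answer) of how the fixed callback could be wired into a recognizer. It assumes 16 kHz, 16-bit, mono audio, which matches the SDK's default input format; adjust the AudioStreamFormat arguments to whatever your client actually sends. The dummy samples array and the environment-variable names are placeholders, and the NumpyAudioStream class is the one defined above.

import os
import numpy as np
import azure.cognitiveservices.speech as speechsdk

# Dummy one-second float32 signal standing in for the real-time audio from the question
samples = np.random.uniform(-1.0, 1.0, 16000).astype(np.float32)

# Declare the PCM format the callback produces (assumed: 16 kHz, 16-bit, mono)
stream_format = speechsdk.audio.AudioStreamFormat(
    samples_per_second=16000, bits_per_sample=16, channels=1)
pull_stream = speechsdk.audio.PullAudioInputStream(
    pull_stream_callback=NumpyAudioStream(samples),
    stream_format=stream_format)

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('AZURE_SUBSCRIPTION_KEY'),
    region=os.environ.get('AZURE_REGION'))
audio_config = speechsdk.audio.AudioConfig(stream=pull_stream)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
print(recognizer.recognize_once())

Passing the AudioStreamFormat explicitly avoids a silent mismatch if the incoming audio is sampled at a different rate than the default.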
