azure: Trying to build an application that converts real-time speech to text

omhiaaxx  posted 12 months ago in Other

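I have been trying to build a real-time speech-to-text application with React JS as the frontend and Python Flask as the backend, using a socket connection between the two to send the data in real time. I have tried many approaches, but either the data is not converted correctly or no result is printed. The Flask backend continuously receives the audio data as bytes, uses the PushAudioInputStream of the AudioInputStream class in the Azure Python Speech SDK to create a stream, and passes that stream to the configuration of the SDK's speechRecognizer / conversation_transcriber. The results are not satisfactory; please help with a suitable solution.
I need text as the output for speech input that is captured by the ReactJS frontend, with Flask as the backend.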

brqmpdu1  #1

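Trying to build an application that converts real-time speech to text
The code below is for a speech-to-text application that uses React as the frontend, Flask as the backend, and Socket.IO.

  • This example is an implementation that transcribes audio with Azure Speech to Text.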
from flask import Flask
from flask_socketio import SocketIO
from azure.cognitiveservices.speech import SpeechConfig, SpeechRecognizer, ResultReason
from azure.cognitiveservices.speech.audio import (
    AudioConfig, AudioStreamFormat, PullAudioInputStream, PullAudioInputStreamCallback)
import io

app = Flask(__name__)
socketio = SocketIO(app)

# Set up your Speech Config
speech_config = SpeechConfig(subscription="AzureSpeechKey", region="AzureSpeechregion")

# Assumption: the frontend sends raw 16 kHz, 16-bit, mono PCM; adjust to match your audio
stream_format = AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)

class StreamBuffer(PullAudioInputStreamCallback):
    """Lets the Speech SDK pull audio bytes from a file-like object."""
    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def read(self, buffer: memoryview) -> int:
        # Copy up to buffer.nbytes into the SDK's buffer; returning 0 signals end of stream
        data = self.stream.read(buffer.nbytes)
        buffer[:len(data)] = data
        return len(data)

    def close(self):
        self.stream.close()

@socketio.on('audio')
def handle_audio(audio_data):
    # Wrap the received bytes so the SDK can pull them through the callback
    audio_stream = io.BytesIO(audio_data)
    stream_buffer = StreamBuffer(audio_stream)

    # Configure your speech_recognizer
    pull_stream = PullAudioInputStream(stream_buffer, stream_format)
    audio_config = AudioConfig(stream=pull_stream)
    speech_recognizer = SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Process audio stream (recognizes a single utterance from this chunk)
    result = speech_recognizer.recognize_once()

    # Emit the result back to the frontend
    if result.reason == ResultReason.RecognizedSpeech:
        socketio.emit('transcription', result.text)
    elif result.reason == ResultReason.NoMatch:
        socketio.emit('transcription', "No speech could be recognized")
    elif result.reason == ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        socketio.emit('transcription', "Speech Recognition canceled: {}".format(cancellation_details.reason))

if __name__ == '__main__':
    socketio.run(app, debug=True)
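
Since the question mentions PushAudioInputStream, here is a rough sketch of how a single long-lived push stream with continuous recognition might be wired up, instead of creating a recognizer for every chunk. The names TranscriptionSession and create_azure_session are hypothetical helpers, not part of the SDK, and the 16 kHz, 16-bit, mono PCM format is an assumption about what the frontend sends.

```python
from typing import Callable


class TranscriptionSession:
    """Hypothetical helper: feeds audio chunks into a push stream and forwards text."""

    def __init__(self, push_stream, recognizer, emit: Callable[[str, str], None]):
        self.push_stream = push_stream  # any object with write(bytes) and close()
        self.recognizer = recognizer    # an Azure SpeechRecognizer, or None when testing
        self.emit = emit                # e.g. socketio.emit

    def on_recognized(self, evt):
        # Invoked by the SDK once per finalized utterance.
        text = evt.result.text
        if text:
            self.emit('transcription', text)

    def feed(self, chunk: bytes):
        # Call this from the Socket.IO 'audio' handler for every incoming chunk.
        if chunk:
            self.push_stream.write(chunk)

    def stop(self):
        # Closing the stream tells the recognizer that no more audio will arrive.
        self.push_stream.close()
        if self.recognizer is not None:
            self.recognizer.stop_continuous_recognition()


def create_azure_session(key: str, region: str, emit, sample_rate: int = 16000):
    # Imported lazily so the class above can be exercised without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # Assumption: raw 16-bit mono PCM at sample_rate Hz.
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate, bits_per_sample=16, channels=1)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    session = TranscriptionSession(push_stream, recognizer, emit)
    recognizer.recognized.connect(session.on_recognized)
    recognizer.start_continuous_recognition()
    return session
```

In the Flask app this would probably be used as session = create_azure_session("AzureSpeechKey", "AzureSpeechregion", socketio.emit), with the 'audio' handler reduced to session.feed(audio_data) and session.stop() called when the client disconnects. Note that the recognized callback runs on an SDK thread, so whether socketio.emit works from there may depend on the async mode Flask-SocketIO is running in.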

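One possible reason for "data not converted correctly" is a format mismatch: the Web Audio API hands out Float32 samples, while the stream format used here is 16-bit PCM. If your React frontend sends raw Float32 buffers (an assumption; check what your client actually emits), a conversion along these lines might help before the bytes are passed to the recognizer. float32_to_pcm16 is a hypothetical helper name.

```python
import struct


def float32_to_pcm16(raw: bytes) -> bytes:
    """Convert little-endian float32 samples in [-1.0, 1.0] to 16-bit PCM bytes."""
    count = len(raw) // 4  # ignore any trailing partial sample
    samples = struct.unpack('<%df' % count, raw[:count * 4])
    # Clip to the valid range, then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack('<%dh' % count, *ints)
```

This only changes the sample format. The sample rate is left untouched, so the rate declared in AudioStreamFormat still has to match the rate the browser captured at, or the audio has to be resampled separately.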