Azure Cognitive Services Speech SDK Python: choppy sound when using the synthesizing callback

cgvd09ve  posted on 2023-08-02  in  Python

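Using the synthesizing callback, how do we correctly stream the audio data to a file? I want to write to a file as soon as audio data arrives. This is not my final goal, but if it works I can build further capabilities on top of it.
I have to use the synthesizing callback.
In the code below, server_bad_audio has a choppy sound, while server_audio is fine.
What is wrong here? Any hints?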

audio_queue = asyncio.Queue()

async def send_audio(self, queue):
    with wave.open("server_bad_audio.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            audio_data = await queue.get()
            if audio_data is None:
                break
            self.logger.info('Sending audio chunk of length {}'.format(len(audio_data)))
            wav_file.writeframes(audio_data)

def synthesize_callback(evt: SpeechSynthesisEventArgs):
    audio = evt.result.audio_data
    self.logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)
...
audio_config = AudioOutputConfig(filename="server_audio.wav")
synthesizer = SpeechSynthesizer(speech_config=self.speech_config, audio_config=audio_config)

synthesizer.synthesizing.connect(synthesize_callback)
result = synthesizer.speak_ssml_async(ssml_text).get()
...
audio_queue.put_nowait(None)
await send_audio_task


x759pob2  #1

The problem is that the WAV file format requires a header with the correct audio properties to be written before the audio data itself.
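As a small illustration of that point: Python's standard `wave` module writes the header itself from the parameters set before the first `writeframes()` call, so those parameters have to match the format the synthesizer actually produces. The following is a minimal sketch, with silence standing in for synthesized audio; `write_chunks`, `read_summary`, and the temporary file name are illustrative and not part of the original answer.

```python
import os
import tempfile
import wave

SAMPLE_WIDTH = 2    # bytes per sample (16-bit PCM)
FRAME_RATE = 16000  # Hz; assumed to match the synthesizer's output format
CHANNELS = 1


def write_chunks(path, chunks):
    """Write raw PCM chunks to a WAV file; `wave` emits the header itself."""
    with wave.open(path, "wb") as wav_file:
        # These three values end up in the header and must be set
        # before the first writeframes() call.
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        for chunk in chunks:
            wav_file.writeframes(chunk)


def read_summary(path):
    """Read back the first four bytes and the header fields of a WAV file."""
    with open(path, "rb") as raw_file:
        magic = raw_file.read(4)
    with wave.open(path, "rb") as wav_file:
        return {
            "magic": magic,
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "frame_rate": wav_file.getframerate(),
            "frames": wav_file.getnframes(),
        }


# Two 100 ms chunks of silence stand in for synthesized audio.
chunk = b"\x00" * (FRAME_RATE // 10 * SAMPLE_WIDTH)
demo_path = os.path.join(tempfile.mkdtemp(), "header_demo.wav")
write_chunks(demo_path, [chunk, chunk])
summary = read_summary(demo_path)
print(summary)
```

If the frame rate or sample width read back from the file differs from what the synthesizer was configured to emit, the file will play at the wrong speed or as noise even though every byte was written.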

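  • Modify the send_audio function so that it writes the WAV file header before writing any audio data. Call send_audio with audio_queue; the audio data then arrives through the callback.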
import asyncio
import wave
import logging
import azure.cognitiveservices.speech as speechsdk

# Replace these with your Azure Speech Service credentials
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"
REGION = "YOUR_REGION"

# Global variables for audio properties
SAMPLE_WIDTH = 2  # 2 bytes per sample (16-bit audio)
FRAME_RATE = 16000  # 16 kHz sample rate

# Create a logger
logger = logging.getLogger("audio_logger")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Audio queue to hold audio data
audio_queue = asyncio.Queue()

async def send_audio(queue):
    with wave.open("generated_audio.wav", "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)

        while True:
            audio_data = await queue.get()
            if audio_data is None:
                # Break the loop when None is received to stop writing to the file.
                break
            logger.info('Writing audio chunk of length {}'.format(len(audio_data)))

            # Write the audio data to the file.
            wav_file.writeframes(audio_data)

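def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    # The SDK calls this as a plain function, so it must not be `async def`:
    # a coroutine function would only return a coroutine that is never awaited.
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    audio_queue.put_nowait(audio)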

async def main():
    # Create an instance of the SpeechConfig with your subscription key and region
    speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)

    # Create an instance of the SpeechSynthesizer with the SpeechConfig
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    # Connect the callback
    synthesizer.synthesizing.connect(synthesize_callback)

    # SSML text to be synthesized
    ssml_text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'> \
                    <voice name='en-US-JennyNeural'> \
                        Butta bomma, Butta bomma, nannu suttukuntiveyyy, Zindagi ke atta bommaiey. \
                        Janta kattu kuntiveyyy. \
                    </voice> \
                </speak>"

    # Schedule send_audio(); the task starts running the next time main() awaits.
    audio_task = asyncio.create_task(send_audio(audio_queue))

    # Run the synthesis. Note that .get() blocks the event loop until synthesis
    # completes, so the queued chunks are written to the file afterwards.
    result = synthesizer.speak_ssml_async(ssml_text).get()

    # Signal the audio_queue to stop writing to the file
    audio_queue.put_nowait(None)

    # Wait for the send_audio() task to complete
    await audio_task

if __name__ == "__main__":
    asyncio.run(main())


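  • The speech synthesizer is connected to the synthesizing callback, and synthesis of the SSML text is started.
  • The synthesize_callback() function receives the audio chunks, and the send_audio() function streams the audio data to the WAV file.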
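The flow described in the two points above can be sketched without the SDK. The assumption here, which the original answer does not state, is that the Speech SDK invokes event callbacks on its own worker thread; `asyncio.Queue` is not thread-safe, so the sketch hands each chunk to the event loop with `loop.call_soon_threadsafe`. `fake_synthesis` is a hypothetical stand-in for the synthesizer, and `stream_to_file` and the output path are illustrative names.

```python
import asyncio
import os
import tempfile
import threading
import wave

SAMPLE_WIDTH = 2    # bytes per sample (16-bit PCM)
FRAME_RATE = 16000  # Hz; assumed output format


async def send_audio(queue, path):
    """Drain the queue into a WAV file until a None sentinel arrives."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(FRAME_RATE)
        while True:
            chunk = await queue.get()
            if chunk is None:
                break
            wav_file.writeframes(chunk)


def fake_synthesis(on_chunk, on_done, n_chunks=5):
    """Hypothetical stand-in for the SDK: delivers chunks from another thread."""
    for _ in range(n_chunks):
        on_chunk(b"\x00" * 3200)  # 100 ms of 16 kHz, 16-bit mono silence
    on_done()


async def stream_to_file(path):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def on_chunk(chunk):
        # Called on the worker thread: schedule the put on the loop's thread.
        loop.call_soon_threadsafe(queue.put_nowait, chunk)

    def on_done():
        # The sentinel is queued after every chunk, so ordering is preserved.
        loop.call_soon_threadsafe(queue.put_nowait, None)

    writer = asyncio.create_task(send_audio(queue, path))
    worker = threading.Thread(target=fake_synthesis, args=(on_chunk, on_done))
    worker.start()
    await writer  # the loop stays free, so chunks are written as they arrive
    worker.join()


out_path = os.path.join(tempfile.mkdtemp(), "streamed_audio.wav")
asyncio.run(stream_to_file(out_path))
with wave.open(out_path, "rb") as wav_file:
    frames_written = wav_file.getnframes()
print(frames_written)
```

In the real program, `on_chunk` would presumably wrap `evt.result.audio_data`, and the blocking `.get()` call would need to run off the event loop (for example via `asyncio.to_thread`) for the writer to keep pace with synthesis.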

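The following debug statement will help you determine whether the problem lies in the received audio data or in how the WAV file is created.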

def synthesize_callback(evt: speechsdk.SpeechSynthesisEventArgs):
    # Plain function, not `async def`: the SDK calls it directly and would
    # never await a coroutine.
    audio = evt.result.audio_data
    logger.info('Audio chunk received of length {}, duration {}'.format(len(audio), evt.result.audio_duration))
    # Debug statement: save the received audio to a file for inspection (optional).
    # Append mode keeps every chunk; "wb" would keep only the last one.
    with open("received_audio.wav", "ab") as f:
        f.write(audio)
    audio_queue.put_nowait(audio)
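Beyond dumping the bytes to disk, each chunk can be checked for two things that would produce clicks once it is passed to `writeframes()`: an embedded RIFF header, and a length that is not a whole number of frames. Whether the SDK's chunks show either symptom is not something the original answer establishes; this is only a diagnostic sketch, and `describe_chunk` and the sample data are hypothetical.

```python
def describe_chunk(chunk: bytes, sample_width: int = 2, channels: int = 1) -> dict:
    """Report properties of a chunk that matter when appending raw PCM."""
    frame_size = sample_width * channels
    return {
        "length": len(chunk),
        # A RIFF header inside the stream would be played back as audio.
        "starts_with_riff": chunk[:4] == b"RIFF",
        # A partial frame shifts the byte alignment of everything after it.
        "frame_aligned": len(chunk) % frame_size == 0,
    }


# Illustrative inputs: a clean PCM chunk and a chunk that begins with a header.
pcm_chunk = b"\x00\x01" * 800
riff_like_chunk = b"RIFF" + b"\x00" * 41

pcm_report = describe_chunk(pcm_chunk)
riff_report = describe_chunk(riff_like_chunk)
print(pcm_report)
print(riff_report)
```

Logging the result of `describe_chunk(audio)` inside the callback, next to the existing length and duration message, would show at a glance whether any chunk needs trimming before it is written.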


Check whether the WAV file has been generated in the application's directory.

