Using VOSK speech recognition with a microphone in Java

hrirmatl, posted on 2023-01-11 in Java

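I'm trying to add real-time speech recognition (preferably offline) to a Java project. After searching and trying other solutions, I settled on VOSK. The main problem I've run into is that VOSK has very little documentation and only one Java example file, which extracts text from a pre-recorded WAV file, shown below.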

public static void main(String[] argv) throws IOException, UnsupportedAudioFileException {
        LibVosk.setLogLevel(LogLevel.DEBUG);

        try (Model model = new Model("src\\main\\resources\\model");
                    InputStream ais = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream("src\\main\\resources\\python_example_test.wav")));
                    Recognizer recognizer = new Recognizer(model, 16000)) {

            int nbytes;
            byte[] b = new byte[4096];
            while ((nbytes = ais.read(b)) >= 0) {
                System.out.println(nbytes);
                if (recognizer.acceptWaveForm(b, nbytes)) {
                    System.out.println(recognizer.getResult());
                } else {
                    System.out.println(recognizer.getPartialResult());
                }
            }

            System.out.println(recognizer.getFinalResult());
        }
    }

I tried to convert this into something that accepts microphone audio, as follows:

public static void main(String[] args) {
        LibVosk.setLogLevel(LogLevel.DEBUG);
        AudioFormat format = new AudioFormat(8000.0f, 16, 1, true, true);
        TargetDataLine microphone;
        SourceDataLine speakers;

        try (Model model = new Model("src\\main\\resources\\model");
                Recognizer recognizer = new Recognizer(model, 16000)) {
            try {
                microphone = AudioSystem.getTargetDataLine(format);

                DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
                microphone = (TargetDataLine) AudioSystem.getLine(info);
                microphone.open(format);
                microphone.start();
                
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int numBytesRead;
                int CHUNK_SIZE = 1024;
                int bytesRead = 0;
                
                DataLine.Info dataLineInfo = new DataLine.Info(SourceDataLine.class, format);
                speakers = (SourceDataLine) AudioSystem.getLine(dataLineInfo);
                speakers.open(format);
                speakers.start();
                byte[] b = new byte[4096];

                while (bytesRead <= 100000) {
                    numBytesRead = microphone.read(b, 0, CHUNK_SIZE);
                    bytesRead += numBytesRead;
                    
                    out.write(b, 0, numBytesRead); 

                    speakers.write(b, 0, numBytesRead);

                    if (recognizer.acceptWaveForm(b, numBytesRead)) {
                        System.out.println(recognizer.getResult());
                    } else {
                        System.out.println(recognizer.getPartialResult());
                    }
                }
                System.out.println(recognizer.getFinalResult());
                speakers.drain();
                speakers.close();
                microphone.close();
            } catch (Exception e) {
                e.printStackTrace();
            }

        }

    }

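This seems to capture the microphone data correctly (it is also played back through the speakers), but VOSK behaves as if there is no input and keeps printing empty-string results. What am I doing wrong? Is what I'm attempting even possible? Should I look for a different speech recognition library?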

mzaanser #1

This code works correctly for me; you can use it:

public static void main(String[] args) {
    
    LibVosk.setLogLevel(LogLevel.DEBUG);
    
    AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 60000, 16, 2, 4, 44100, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    TargetDataLine microphone;
    SourceDataLine speakers;

    try (Model model = new Model("model");
         Recognizer recognizer = new Recognizer(model, 120000)) {
        try {

            microphone = (TargetDataLine) AudioSystem.getLine(info);
            microphone.open(format);
            microphone.start();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int numBytesRead;
            int CHUNK_SIZE = 1024;
            int bytesRead = 0;

            DataLine.Info dataLineInfo = new DataLine.Info(SourceDataLine.class, format);
            speakers = (SourceDataLine) AudioSystem.getLine(dataLineInfo);
            speakers.open(format);
            speakers.start();
            byte[] b = new byte[4096];

            while (bytesRead <= 100000000) {
                numBytesRead = microphone.read(b, 0, CHUNK_SIZE);
                bytesRead += numBytesRead;

                out.write(b, 0, numBytesRead);

                speakers.write(b, 0, numBytesRead);

                if (recognizer.acceptWaveForm(b, numBytesRead)) {
                    System.out.println(recognizer.getResult());
                } else {
                    System.out.println(recognizer.getPartialResult());
                }
            }
            System.out.println(recognizer.getFinalResult());
            speakers.drain();
            speakers.close();
            microphone.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
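
The format in this snippet is two-channel, while the other answer on this page recommends mono input. One possible way to feed stereo capture data to a mono recognizer is to average the two channels before calling acceptWaveForm. The following is only a minimal sketch, assuming 16-bit signed little-endian interleaved stereo; the class name StereoDownmix and the method toMono are hypothetical and not part of Vosk.

```java
final class StereoDownmix {

    /**
     * Averages left and right samples of 16-bit signed little-endian
     * interleaved stereo PCM and returns mono PCM in the same sample format.
     * Only complete 4-byte frames within the first `length` bytes are used.
     */
    static byte[] toMono(byte[] stereo, int length) {
        int frames = length / 4;              // 2 bytes left + 2 bytes right
        byte[] mono = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            int o = i * 4;
            short left = (short) ((stereo[o] & 0xFF) | (stereo[o + 1] << 8));
            short right = (short) ((stereo[o + 2] & 0xFF) | (stereo[o + 3] << 8));
            int avg = (left + right) / 2;     // int arithmetic avoids overflow
            mono[i * 2] = (byte) avg;         // low byte first (little-endian)
            mono[i * 2 + 1] = (byte) (avg >> 8);
        }
        return mono;
    }

    public static void main(String[] args) {
        // One stereo frame: left = 1000 (0x03E8), right = 3000 (0x0BB8).
        byte[] stereo = {(byte) 0xE8, 0x03, (byte) 0xB8, 0x0B};
        byte[] mono = toMono(stereo, stereo.length);
        int sample = (short) ((mono[0] & 0xFF) | (mono[1] << 8));
        System.out.println(sample); // prints 2000
    }
}
```

In the loop above, the result of StereoDownmix.toMono(b, numBytesRead) and its length would be passed to acceptWaveForm instead of b and numBytesRead, with the Recognizer created at the line's own sample rate. This has not been tested against Vosk here.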

toiithl6 #2

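I could not get either of the code snippets from Squalmals or user 11370465 to work.
When using your own audio file, make sure it has the right format: PCM, 16 kHz, 16-bit, mono.
The code below runs on my system with Linux Mint 20 and OpenJDK 11.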

public static void main(String[] argv) throws Exception{
    LibVosk.setLogLevel(LogLevel.DEBUG);

    AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    TargetDataLine microphone;

    Model model = new Model("my-path/vosk-model-small-en-us-0.15");
    Recognizer recognizer = new Recognizer(model, 16000);

    microphone = (TargetDataLine)AudioSystem.getLine(info);
    microphone.open(format);
    microphone.start();

    int numBytesRead;
    int CHUNK_SIZE = 4096;
    int bytesRead = 0;

    byte[] b = new byte[4096];

    while(bytesRead<=100000000){
        numBytesRead = microphone.read(b, 0, CHUNK_SIZE);

        bytesRead += numBytesRead;

        if(recognizer.acceptWaveForm(b, numBytesRead)){
            System.out.println(recognizer.getResult());
        }else{
            System.out.println(recognizer.getPartialResult());
        }
    }

    System.out.println(recognizer.getFinalResult());
    
    microphone.close();
}
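
The format requirement mentioned above (PCM, 16 kHz, 16-bit, mono) can be checked in code before any audio is handed to the recognizer. The sketch below is not from the Vosk documentation; the class VoskFormatCheck and the method findProblems are hypothetical helpers, and the little-endian check is an assumption based on the AudioFormat used in the code above. With no argument it checks the format from the question (8000 Hz, big-endian) against a 16000 Hz recognizer; with a file path it checks that file.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class VoskFormatCheck {

    /** Lists the ways a format differs from 16-bit signed little-endian mono PCM at recognizerRate. */
    static List<String> findProblems(AudioFormat f, float recognizerRate) {
        List<String> problems = new ArrayList<>();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            problems.add("encoding is " + f.getEncoding() + ", expected PCM_SIGNED");
        }
        if (f.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + f.getSampleSizeInBits() + " bits, expected 16");
        }
        if (f.getChannels() != 1) {
            problems.add("channel count is " + f.getChannels() + ", expected 1");
        }
        if (f.isBigEndian()) {
            problems.add("byte order is big-endian, expected little-endian");
        }
        if (f.getSampleRate() != recognizerRate) {
            problems.add("sample rate is " + f.getSampleRate()
                    + " Hz, but the recognizer was created with " + recognizerRate + " Hz");
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        AudioFormat format;
        if (args.length > 0) {
            // Read only the header of the given audio file.
            try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
                format = in.getFormat();
            }
        } else {
            // The microphone format used in the question.
            format = new AudioFormat(8000.0f, 16, 1, true, true);
        }
        for (String problem : findProblems(format, 16000f)) {
            System.out.println(problem);
        }
    }
}
```

For the question's format this reports the big-endian byte order and the 8000 Hz versus 16000 Hz mismatch. Whether those two differences fully explain the empty results is an assumption, but they are the points where the question's format differs from the one used in this answer.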

Also, the JNA Vosk wrapper did not work out of the box for me. In LibVosk.java I had to change

Native.register(LibVosk.class, "vosk");

to

Native.register(LibVosk.class, "my-path/lib/python3.8/site-packages/vosk/libvosk.so");
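
An alternative that avoids editing LibVosk.java may be to tell JNA where to look for the library. JNA consults the jna.library.path system property when it resolves a library name such as "vosk". The sketch below only sets that property; the class VoskLibraryPath and the method useLibraryDirectory are hypothetical names, the directory is the one from the path above, and whether this is sufficient for a given Vosk version has not been verified here.

```java
final class VoskLibraryPath {

    /**
     * Points JNA at the directory containing libvosk.so.
     * This must run before the LibVosk class is first loaded,
     * because the native library is registered at class-load time.
     */
    static void useLibraryDirectory(String directory) {
        System.setProperty("jna.library.path", directory);
    }

    public static void main(String[] args) {
        // Directory taken from the answer's path; replace with your own.
        String dir = args.length > 0 ? args[0] : "my-path/lib/python3.8/site-packages/vosk";
        useLibraryDirectory(dir);
        System.out.println("jna.library.path = " + System.getProperty("jna.library.path"));
        // From here on, Vosk classes (LibVosk, Model, Recognizer) can be used.
    }
}
```

The same property can be passed on the command line with -Djna.library.path=... instead of being set in code.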

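Overall, the Vosk speech recognition toolkit seems to work well compared with the other offline speech recognition tools I have tried. (I have not tried CMUSphinx yet.) Vosk does need better documentation and/or code comments, though.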
