Windows: errors when writing audio using a custom video writer library

hjzp0vay  posted on 2023-01-31  in  Windows

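I'm trying to wrap a small, handy piece of C code intended to generate video+audio on Windows using VFW. The C library lives here and is described as follows:
Uses Video for Windows (so it's not portable). Handy if you want to quickly record a video somewhere and don't want to wade through the VfW docs yourself.
I'd like to use that C++ library from Python, so I decided to wrap it with SWIG.
The thing is, I'm having some issues when it comes to encoding the audio. I'm trying to understand why the generated video is broken; it seems the audio is not being written properly to the video file. That is, if I try to open the video with VLC or any similar video player, I get a message saying the player can't identify the audio or video codec. The video images are fine, so there must be something wrong with the way I'm writing the audio to the file.
I'm attaching the SWIG interface and a little Python test that is intended to be a port of the original C++ test.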

aviwriter.i:
%module aviwriter

%{
#include "aviwriter.h"
%}

%typemap(in) (const unsigned char* buffer) (char* buffer, Py_ssize_t length) %{
  if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
    SWIG_fail;
  $1 = (unsigned char*)buffer;
%}

%typemap(in) (const void* buffer) (char* buffer, Py_ssize_t length) %{
  if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
    SWIG_fail;
  $1 = (void*)buffer;
%}

%include "aviwriter.h"
test.py:
import argparse
import sys
import struct
from distutils.util import strtobool

from aviwriter import AVIWriter

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-audio", action="store", default="1")
    parser.add_argument('-width', action="store",
                        dest="width", type=int, default=400)
    parser.add_argument('-height', action="store",
                        dest="height", type=int, default=300)
    parser.add_argument('-numframes', action="store",
                        dest="numframes", type=int, default=256)
    parser.add_argument('-framerate', action="store",
                        dest="framerate", type=int, default=60)
    parser.add_argument('-output', action="store",
                        dest="output", type=str, default="checker.avi")

    args = parser.parse_args()

    audio = strtobool(args.audio)
    framerate = args.framerate
    num_frames = args.numframes
    width = args.width
    height = args.height
    output = args.output

    writer = AVIWriter()

    if not writer.Init(output, framerate):
        print("Couldn't open video file!")
        sys.exit(1)

    writer.SetSize(width, height)

    data = [0]*width*height
    sampleRate = 44100
    samples_per_frame = 44100 / framerate
    samples = [0]*int(samples_per_frame)

    c1, s1, f1 = 24000.0, 0.0, 0.03
    c2, s2, f2 = 1.0, 0.0, 0.0013

    for frame in range(num_frames):
        print(f"frame {frame}")

        i = 0
        for y in range(height):
            for x in range(width):
                on = ((x + frame) & 32) ^ ((y+frame) & 32)
                data[i] = 0xffffffff if on else 0xff000000
                i += 1
        writer.WriteFrame(
            struct.pack(f'{len(data)}L', *data),
            width*4
        )

        if audio:
            for i in range(int(samples_per_frame)):
                c1 -= f1*s1
                s1 += f1*c1
                c2 += f2*s2
                s2 -= f2*c2

                val = s1 * (0.75 + 0.25 * c2)
                if(frame == num_frames - 1):
                    val *= 1.0 * (samples_per_frame - 1 - i) / \
                        samples_per_frame
                samples[i] = int(val)

                if frame==0:
                    print(f"i={i} val={int(val)}")

            writer.WriteAudioFrame(
                struct.pack(f'{len(samples)}i', *samples),
                int(samples_per_frame)
            )

    writer.Exit()

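I don't think the samples are being generated incorrectly, because I've compared the values generated on the Python side with those generated on the C side, though only for the packet written for frame 0.
My suspicions about the error are the way I've created the typemaps in SWIG, maybe that's not good... or maybe the problem is in the line writer.WriteAudioFrame(struct.pack(f'{len(samples)}i', *samples), int(samples_per_frame)). I don't know what it could be; surely the way I'm sending the audio buffer from Python to the C wrapper isn't right.
So, do you know how to fix the attached code so that test.py will be able to generate a video with the right audio, similarly to the C test?
When generated correctly, the video will show a magical scrolling checkerboard with a hypnotic sine wave as the audio backdrop :D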

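Additional notes:

1. It looks like the code above isn't using writer.SetAudioFormat, which is needed for the AVIFileCreateStreamA and AVIStreamSetFormat functions. The thing is, I don't know how to export this structure in SWIG so that I can use it from Python the same way test.cpp does. From Mmreg.h, I see the struct looks like this:
typedef struct tWAVEFORMATEX
{
    WORD    wFormatTag;        /* format type */
    WORD    nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD   nSamplesPerSec;    /* sample rate */
    DWORD   nAvgBytesPerSec;   /* for buffer estimation */
    WORD    nBlockAlign;       /* block size of data */
    WORD    wBitsPerSample;    /* Number of bits per sample of mono data */
    WORD    cbSize;            /* The count in bytes of the size of
                                extra information (after cbSize) */
} WAVEFORMATEX;
Unfortunately, I don't know how to wrap that in aviwriter.i. I've tried using %include windows.i and including the declarations directly in a %{ ... %} block, but all I got was a bunch of errors :/
2. I'd prefer not to modify aviwriter.h and aviwriter.cpp, as they're basically external working code.
3. Assuming I can wrap WAVEFORMATEX so that I can use it from Python, how would you use memset the way test.cpp does? memset(&wfx,0,sizeof(wfx));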

3z6pesqy #1

Two suggestions:

  • First, pack the audio data as short instead of int, as in the C++ test. The audio data is 16-bit, not 32-bit. Use the 'h' format character when packing, for example struct.pack(f'{len(samples)}h', *samples) .
  • Second, see the code modification below. Expose WAVEFORMATEX via SWIG by editing aviwriter.i , then call writer.SetAudioFormat(wfx) from Python.
  • In my tests, the memset() was not necessary. From Python you can manually set the cbSize field to zero; that should be enough. The other six fields are mandatory, so you'll be setting them anyway. It looks like this struct isn't meant to be revised in the future, because it has no struct size field, and the semantics of cbSize (appending arbitrary data to the end of the struct) conflict with an extension anyway.
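
To make the first suggestion concrete, here is a minimal sketch of packing samples as 16-bit PCM. The helper name pack_pcm16 is hypothetical (it is not part of aviwriter), and the clamping step is an added assumption: struct.pack raises struct.error when a value does not fit in 'h', so clamping keeps an occasional overshoot from aborting the run.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def pack_pcm16(samples):
    """Pack numeric samples as little-endian signed 16-bit PCM.

    Hypothetical helper: values outside the int16 range are clamped
    instead of letting struct.pack raise struct.error.
    """
    clamped = [max(INT16_MIN, min(INT16_MAX, int(s))) for s in samples]
    return struct.pack(f"<{len(clamped)}h", *clamped)


frame = pack_pcm16([0, 1200.7, -24000, 40000])
print(len(frame))  # 2 bytes per sample, versus 4 with the 'i' format
```

In test.py, the result could presumably be passed as the first argument to writer.WriteAudioFrame in place of the 'i'-packed buffer.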

aviwriter.i:

%inline %{
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef struct tWAVEFORMATEX
{
    WORD    wFormatTag;        /* format type */
    WORD    nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD   nSamplesPerSec;    /* sample rate */
    DWORD   nAvgBytesPerSec;   /* for buffer estimation */
    WORD    nBlockAlign;       /* block size of data */
    WORD    wBitsPerSample;    /* Number of bits per sample of mono data */    
    WORD    cbSize;            /* The count in bytes of the size of
                                extra information (after cbSize) */
} WAVEFORMATEX;
%}

test.py:

from aviwriter import WAVEFORMATEX

later in test.py:

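    wfx = WAVEFORMATEX()
    wfx.wFormatTag = 1  # WAVE_FORMAT_PCM
    wfx.nChannels = 1
    wfx.nSamplesPerSec = sampleRate
    wfx.nAvgBytesPerSec = sampleRate * 2  # nSamplesPerSec * nBlockAlign
    wfx.nBlockAlign = 2                   # nChannels * wBitsPerSample / 8
    wfx.wBitsPerSample = 16
    wfx.cbSize = 0                        # no extra format bytes follow the struct
    writer.SetAudioFormat(wfx)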
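
The hard-coded 2 and sampleRate * 2 above follow from the usual PCM relationships between the fields. As a rough sketch, the helper below (pcm_format_fields, a hypothetical name that is not part of aviwriter or SWIG) derives them from the sample rate, channel count, and bit depth, so the values stay consistent if any of those change.

```python
def pcm_format_fields(sample_rate, channels=1, bits_per_sample=16):
    """Return WAVEFORMATEX field values for uncompressed PCM audio.

    Hypothetical helper for illustration only.
    """
    # One block holds one sample for every channel.
    block_align = channels * bits_per_sample // 8
    return {
        "wFormatTag": 1,  # WAVE_FORMAT_PCM
        "nChannels": channels,
        "nSamplesPerSec": sample_rate,
        "nAvgBytesPerSec": sample_rate * block_align,
        "nBlockAlign": block_align,
        "wBitsPerSample": bits_per_sample,
        "cbSize": 0,  # no extra bytes after the struct
    }


fields = pcm_format_fields(44100)
print(fields["nBlockAlign"], fields["nAvgBytesPerSec"])
```

Assuming the SWIG-generated WAVEFORMATEX exposes these members as attributes, the values could be copied over with setattr(wfx, name, value) for each entry in fields.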

Notes on SWIG: Since aviwriter.h only provides a forward declaration of tWAVEFORMATEX , no other information is provided to SWIG, preventing get/set wrappers from being generated. You could ask SWIG to wrap a Windows header declaring the struct ... and open a can of worms because those headers are too large and complex, exposing further problems. Instead, you can individually define WAVEFORMATEX as done above. The C++ types WORD and DWORD still are not declared, though. Including the SWIG file windows.i only creates wrappers which, for example, allow string "WORD" in a Python script file to be understood as indicating 16-bit data in memory. But that doesn't declare the WORD type from a C++ perspective. To resolve this, adding typedefs for WORD and DWORD in this %inline statement in aviwriter.i forces SWIG to copy that code directly inlined into the wrapper C++ file, making the declarations available. This also triggers get/set wrappers to be generated. Alternately, you could include that inlined code inside aviwriter.h if you're willing to edit it.

In short, the idea here is to fully enclose all types in standalone headers or declaration blocks. Remember that .i and .h files have separate roles (wrappers and data conversion, versus the functionality being wrapped). Similarly, notice how aviwriter.h is included twice in aviwriter.i : once to trigger the generation of the wrappers needed for Python, and once to declare types in the generated wrapper code needed for C++.

u5i3ibmn #2

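From what I can see in the code, you don't initialize the audio format. In the original test.cpp code this is done by calling writer.SetAudioFormat(&wfx); on line 44, which sets it to mono 44.1 kHz PCM. I believe that, since you didn't initialize it, a blank header is written and video players can't open the unknown format.

Update:

Since you only need to pass a binary header structure, you don't need to declare this struct in aviwriter.i; you can build it directly from Python with the code below: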

import struct
from collections import namedtuple

WAVEFORMATEX = namedtuple('WAVEFORMATEX', 'wFormatTag nChannels nSamplesPerSec nAvgBytesPerSec nBlockAlign wBitsPerSample cbSize ')
wfx = WAVEFORMATEX(    
    wFormatTag = 1,
    nChannels = 1,
    nSamplesPerSec = sampleRate,
    nAvgBytesPerSec = sampleRate * 2,
    nBlockAlign = 2,
    wBitsPerSample = 16,
    cbSize = 0)

audio_format_obj = struct.pack('<HHIIHHH', *list(wfx))
writer.SetAudioFormat(audio_format_obj)

This automatically solves your second and third questions.
As for memset(&wfx,0,sizeof(wfx));, that's just an ugly old C way of zeroing all the variables in a struct.
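
As a quick sanity check of the packed header, the sketch below only inspects the byte layout and does not call the wrapper (whether SetAudioFormat accepts a bytes object depends on the typemaps in aviwriter.i). The constant name WAVEFORMATEX_LAYOUT is made up for illustration. It shows that the '<HHIIHHH' layout is 18 bytes with no padding, and that the Python counterpart of the memset call is simply a zero-filled bytes object of that size.

```python
import struct

# Little-endian, no padding: three WORDs and two DWORDs, then two more WORDs.
WAVEFORMATEX_LAYOUT = "<HHIIHHH"

size = struct.calcsize(WAVEFORMATEX_LAYOUT)

# Equivalent of memset(&wfx, 0, sizeof(wfx)): every field is zero.
zeroed = bytes(size)
all_zero_fields = struct.pack(WAVEFORMATEX_LAYOUT, 0, 0, 0, 0, 0, 0, 0)

# Mono, 44.1 kHz, 16-bit PCM header, as in the answer above.
header = struct.pack(WAVEFORMATEX_LAYOUT, 1, 1, 44100, 88200, 2, 16, 0)

print(size, zeroed == all_zero_fields, len(header))
```

Because every field is set explicitly when packing, there are no uninitialized bytes left to clear, which is why no memset step is needed on the Python side.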
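As @MichaelsonBritt mentioned, your audio data format must match what is declared in the header. However, instead of converting to 16-bit shorts, you could declare 2 channels, so you'd get stereo with one channel silent.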
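
To see what the stereo alternative does to the bytes, the sketch below reinterprets a single 32-bit sample as a pair of 16-bit samples. It assumes little-endian packing (which is what the native 'i' format produces on Windows) and sample values that fit in 16 bits; as_stereo16 is a hypothetical helper name.

```python
import struct


def as_stereo16(value):
    """Reinterpret one little-endian 32-bit int as two 16-bit samples."""
    return struct.unpack("<hh", struct.pack("<i", value))


print(as_stereo16(1200))   # (1200, 0)
print(as_stereo16(-1200))  # (-1200, -1)
```

Under those assumptions the first channel carries the signal, while the second holds only the sign extension (0 for non-negative samples, -1 for negative ones), so it is practically, though not exactly, silent.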
