Converting HTML pages with equations to docx

Posted by mrwjdhj3 on 2022-11-20 in Other


I am trying to convert an HTML document to docx using pandoc.
The conversion to docx goes smoothly except for the equations. In the HTML file, an equation looks like this:

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
    <div class="jp-Cell-inputWrapper">
    <div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
    </div>
    <div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
    \begin{equation}
    \log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391}
    \end{equation}
    </div>
    </div>
    </div>
    </div>

After running the pandoc command, the equations are not rendered as math in the docx document.
Do you know how I can overcome this problem?
Thanks


Source: https://stackoverflow.com/questions/74490547/converting-html-with-equations-pages-to-docx

1 answer


nmpmafwu1#

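A Lua filter can help here. The code below looks for div elements with a data-mime-type="text/markdown" attribute and parses their content as LaTeX. The original div is then replaced with the result of the parse.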

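local stringify = pandoc.utils.stringify

-- Replace every div carrying data-mime-type="text/markdown" with its
-- text content re-read as LaTeX. pandoc's HTML reader drops the "data-"
-- prefix, so the attribute key is plain 'mime-type'.
function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    return pandoc.read(stringify(div), 'latex').blocks
  end
end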

Save the code to a file named parse-math.lua and tell pandoc to use it with the --lua-filter/-L option:

pandoc --lua-filter parse-math.lua ...
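To check that the filter actually fires before converting a full page, it can help to run it on a tiny input and inspect pandoc's native output. The following is only a sketch under stated assumptions: the file names sample.html, sample.native and sample.docx are hypothetical, the sample div is a trimmed-down version of the one in the question, and the script assumes a pandoc version with Lua filter support on the PATH.

```shell
# Minimal sample input (hypothetical file name) with one equation div.
cat > sample.html <<'EOF'
<div data-mime-type="text/markdown">
\begin{equation}
\log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391}
\end{equation}
</div>
EOF

# The first filter from this answer, written to parse-math.lua.
cat > parse-math.lua <<'EOF'
local stringify = pandoc.utils.stringify
function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    return pandoc.read(stringify(div), 'latex').blocks
  end
end
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Dump pandoc's native AST; a Math element means the filter matched.
  pandoc --lua-filter parse-math.lua sample.html -t native > sample.native
  grep 'Math' sample.native || echo "no Math element found"
  # Produce the docx itself.
  pandoc --lua-filter parse-math.lua sample.html -o sample.docx
else
  echo "pandoc not found on PATH"
fi
```

If the grep prints nothing but the fallback message, the div was probably not matched, and the attribute name or value in the real page is worth a second look.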

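As mentioned in the comments, this gets slightly more complicated if other HTML elements also have the text/markdown media type. In that case, we check whether the parse result contains only math and keep the original content otherwise.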

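local stringify = pandoc.utils.stringify
function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    local result = pandoc.read(stringify(div), 'latex').blocks
    -- Content of the first parsed block (empty if there is none).
    local first = result[1] and result[1].content or {}
    -- Replace the div only if that block holds a single Math element;
    -- returning nil leaves the original div untouched.
    return (#first == 1 and first[1].t == 'Math')
      and result
      or nil
  end
end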

