Apache POI: How to set/define different styles for the same paragraph

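I am trying to convert HTML text to generate a Word table. It works fine and the created Word file is correct, except for the character styles.
This is my first attempt at using Apache POI.
So far I am able to detect the line-break tag (<br>) (see the code below). But I would also like to check for some other tags and set the right run values for each part.
For instance:

This is my text <i>which now is in italic <b>but also in bold</b> depending on its <u>importance</u></i>.

I guess I should parse the text and apply a different run to each part, but I don't know how to do it.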

private static  XWPFParagraph getTableParagraph(XWPFTableCell  cell,  String text)
{   
    int fontsize= 11; 
    XWPFParagraph paragraph = cell.addParagraph();
    cell.removeParagraph(0);
    paragraph.setSpacingAfterLines(0);
    paragraph.setSpacingAfter(0);
    XWPFRun myRun1 = paragraph.createRun();
    if (text==null) text="";
    else
    {
        while (true)
        {
            int x = text.indexOf("<br>"); 
            if (x <0) break;
            String work = text.substring(0,x );
            text= text.substring(x+4);
            myRun1.setText(work);
            myRun1.addBreak();
        }
    }
    myRun1.setText(text);
    myRun1.setFontSize(fontsize);
    return paragraph;
}


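When converting HTML text, never rely on string methods alone. XML and HTML are markup languages: their content is markup, not just plain text. The markup has to be traversed to get all the individual nodes and their meaning. That traversal is never trivial, which is why dedicated libraries exist. Deep inside, those libraries also use string methods, but wrapped into useful methods for traversing the markup.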
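To illustrate how fragile plain string matching is, here is a minimal sketch that reuses the literal search for "<br>" from the question's loop. The class and method names are made up for this illustration. Equivalent spellings of the same tag, such as the self-closing or upper-case form, are simply not found.

```java
// Hypothetical demo class; only String.indexOf from the JDK is used.
final class BrSearchDemo {

    // Same literal search as in the question's loop.
    static int findBreak(String html) {
        return html.indexOf("<br>");
    }

    public static void main(String[] args) {
        System.out.println(findBreak("one<br>two"));   // prints 3
        System.out.println(findBreak("one<br/>two"));  // prints -1: self-closing form is missed
        System.out.println(findBreak("one<BR>two"));   // prints -1: the search is case-sensitive
    }
}
```

An HTML parser reports all three spellings as the same kind of node, so the calling code does not have to enumerate the variants.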
For traversing HTML, jsoup can be used, for example. In particular, NodeTraversor together with a NodeVisitor is very useful for traversing HTML.
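My example creates a ParagraphNodeVisitor that implements NodeVisitor. This interface declares the method public void head(Node node, int depth), which is called whenever the NodeTraversor is at the head of a node, and public void tail(Node node, int depth), which is called whenever the NodeTraversor is at the tail of a node. The handling of the individual nodes can be implemented in those methods. In our case, the main part of that handling is deciding whether a new XWPFRun is needed and which settings that run needs.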
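The order of the head and tail calls can be sketched without any library. The classes below (SimpleNode, SimpleVisitor, HeadTailDemo) are hypothetical, simplified stand-ins written for this illustration and are not jsoup's own types. They only mimic the depth-first walk: head is called before a node's children are visited, tail after them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a parsed HTML node (not jsoup's Node).
final class SimpleNode {
    final String name;                               // tag name, or "#text" for text
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name, SimpleNode... kids) {
        this.name = name;
        for (SimpleNode kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical visitor with the two callbacks described above.
interface SimpleVisitor {
    void head(SimpleNode node, int depth);           // node is entered
    void tail(SimpleNode node, int depth);           // node is left
}

final class HeadTailDemo {

    // Depth-first walk: head() before the children, tail() after them.
    static void traverse(SimpleVisitor visitor, SimpleNode node, int depth) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Records the callback order for a given tree.
    static List<String> trace(SimpleNode root) {
        List<String> events = new ArrayList<>();
        traverse(new SimpleVisitor() {
            @Override public void head(SimpleNode n, int d) { events.add("head " + n.name); }
            @Override public void tail(SimpleNode n, int d) { events.add("tail " + n.name); }
        }, root, 0);
        return events;
    }

    public static void main(String[] args) {
        // Models <p>text<i>text</i></p>
        SimpleNode p = new SimpleNode("p",
                new SimpleNode("#text"),
                new SimpleNode("i", new SimpleNode("#text")));
        trace(p).forEach(System.out::println);
    }
}
```

For the modeled paragraph the recorded order is: head p, head #text, tail #text, head i, head #text, tail #text, tail i, tail p. A formatting flag that is switched on in head and off in tail is therefore active for exactly the text nodes in between.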
Example:

import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.jsoup.select.NodeVisitor;
import org.jsoup.select.NodeTraversor;
public class HTMLtoDOCX {
 private XWPFDocument document;
 public HTMLtoDOCX(String html, String docxPath) throws Exception {
  this.document = new XWPFDocument();
  XWPFParagraph paragraph = null;
  Document htmlDocument = Jsoup.parse(html);
  Elements htmlParagraphs = htmlDocument.select("p");
  for(Element htmlParagraph : htmlParagraphs) {
   System.out.println(htmlParagraph); // debug output
   paragraph = document.createParagraph();
   createParagraphFromHTML(paragraph, htmlParagraph);
  }
  // try-with-resources closes the stream even if write() throws
  try (FileOutputStream out = new FileOutputStream(docxPath)) {
   document.write(out);
  }
  document.close();
 }
 void createParagraphFromHTML(XWPFParagraph paragraph, Element htmlParagraph) {
  ParagraphNodeVisitor nodeVisitor = new ParagraphNodeVisitor(paragraph);
  NodeTraversor.traverse(nodeVisitor, htmlParagraph);
 }
 private class ParagraphNodeVisitor implements NodeVisitor {
  String nodeName;
  boolean needNewRun;
  boolean isItalic;
  boolean isBold;
  boolean isUnderlined;
  int fontSize;
  String fontColor;
  XWPFParagraph paragraph;
  XWPFRun run;
  ParagraphNodeVisitor(XWPFParagraph paragraph) {
   this.paragraph = paragraph;
   this.run = paragraph.createRun();
   this.nodeName = "";
   this.isItalic = false;
   this.isBold = false;
   this.isUnderlined = false;
   this.fontSize = 11;
   this.fontColor = "000000";
  }
  @Override
  public void head(Node node, int depth) {
   nodeName = node.nodeName();
   System.out.println("Start "+nodeName+": " + node); // debug output
   if ("#text".equals(nodeName)) {
    run.setText(((TextNode)node).text());
   } else if ("i".equals(nodeName)) {
    isItalic = true;
   } else if ("b".equals(nodeName)) {
    isBold = true;
   } else if ("u".equals(nodeName)) {
    isUnderlined = true;
   } else if ("br".equals(nodeName)) {
    run.addBreak();
   } else if ("font".equals(nodeName)) {
    // POI expects the hex digits without '#'; replace() also tolerates a color given without '#'
    fontColor = (!"".equals(node.attr("color")))?node.attr("color").replace("#", ""):"000000";
    fontSize = (!"".equals(node.attr("size")))?Integer.parseInt(node.attr("size")):11;
   } 
   run.setItalic(isItalic);
   run.setBold(isBold);
   if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
   run.setColor(fontColor); run.setFontSize(fontSize);
  }
  @Override
  public void tail(Node node, int depth) {
   nodeName = node.nodeName();
   System.out.println("End "+nodeName); // debug output
   if ("#text".equals(nodeName)) {
    run = paragraph.createRun(); //after setting the text in the run a new run is needed  
   } else if ("i".equals(nodeName)) {
    isItalic = false;
   } else if ("b".equals(nodeName)) {
    isBold = false;
   } else if ("u".equals(nodeName)) {
    isUnderlined = false;
   } else if ("br".equals(nodeName)) {
    run = paragraph.createRun(); //after setting a break a new run is needed
   } else if ("font".equals(nodeName)) {
    fontColor = "000000";
    fontSize = 11;
   }
   run.setItalic(isItalic);
   run.setBold(isBold);
   if (isUnderlined) run.setUnderline(UnderlinePatterns.SINGLE); else run.setUnderline(UnderlinePatterns.NONE);
   run.setColor(fontColor); run.setFontSize(fontSize);
  }
 }
 public static void main(String[] args) throws Exception {
  String html = 
   "<p>Text without tags. <b> Then bold <br/> having break.</b> Then without tags again.</p>"
  +"<p><font size='32' color='#0000FF'><b>First paragraph.</font></b><br/>Just like a heading</p>"
  +"<p>This is my text <i>which now is in italic <b>but also in bold</b> depending on its <u>importance</u></i>.<br/>Now a <b><i><u>new</u></i></b> line starts <i>within <b>the same</b> paragraph</i>.</p>"
  +"<p><b>Last <u>paragraph <i>comes</u> here</b> finally</i>.</p>"
  +"<p>But yet <u><i><b>another</i></u></b> paragraph having <i><font size='22' color='#FF0000'>special <u>font</u> settings</font></i>. Now default font again.</p>"
  ;
  HTMLtoDOCX htmlToDOCX = new HTMLtoDOCX(html, "./CreateWordParagraphFromHTML.docx");
 }
}

Result: (image of the generated Word document)
Disclaimer: This is a working draft that shows the principle. It is neither complete nor ready for use in production.
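One gap of the draft is worth naming: the boolean flags are reset as soon as any matching end tag is reached, so a nested tag of the same kind (a font inside a font, for example) would switch the outer setting off too early. A possible way to harden this, sketched here as an assumption and not as part of the original answer, is to keep a stack of style snapshots. RunStyle and StyleStack are hypothetical helper classes that use only the JDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical immutable snapshot of the run settings in effect at one point.
final class RunStyle {
    final boolean bold;
    final boolean italic;
    final boolean underlined;
    final int fontSize;
    final String fontColor;

    RunStyle(boolean bold, boolean italic, boolean underlined, int fontSize, String fontColor) {
        this.bold = bold;
        this.italic = italic;
        this.underlined = underlined;
        this.fontSize = fontSize;
        this.fontColor = fontColor;
    }
}

// Hypothetical helper: one stack entry per open element.
final class StyleStack {
    private final Deque<RunStyle> stack = new ArrayDeque<>();

    StyleStack() {
        // Same defaults as in the example: size 11, black, no emphasis.
        stack.push(new RunStyle(false, false, false, 11, "000000"));
    }

    // For head() of an element: push the style that applies inside it.
    void open(String tag) {
        RunStyle s = stack.peek();
        switch (tag) {
            case "b": stack.push(new RunStyle(true, s.italic, s.underlined, s.fontSize, s.fontColor)); break;
            case "i": stack.push(new RunStyle(s.bold, true, s.underlined, s.fontSize, s.fontColor)); break;
            case "u": stack.push(new RunStyle(s.bold, s.italic, true, s.fontSize, s.fontColor)); break;
            default:  stack.push(s); break; // other elements inherit; keeps open/close balanced
        }
    }

    // For head() of a font element, with its size and color already parsed.
    void openFont(int fontSize, String fontColor) {
        RunStyle s = stack.peek();
        stack.push(new RunStyle(s.bold, s.italic, s.underlined, fontSize, fontColor));
    }

    // For tail() of an element: restore the enclosing style.
    void close() {
        if (stack.size() > 1) {
            stack.pop(); // the defaults at the bottom are never removed
        }
    }

    RunStyle current() {
        return stack.peek();
    }
}

final class StyleStackDemo {
    public static void main(String[] args) {
        StyleStack styles = new StyleStack();
        styles.open("b");                           // <b>
        styles.open("b");                           // nested <b>
        styles.close();                             // inner </b>
        System.out.println(styles.current().bold);  // true: still inside the outer <b>
        styles.close();                             // outer </b>
        System.out.println(styles.current().bold);  // false
    }
}
```

In the visitor, head would call open or openFont for element nodes and copy current() onto a fresh XWPFRun for each #text node, while tail would call close for element nodes only. Text nodes would not touch the stack.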
