C++: How do I parse code-definition text into an XML structure using the latest Boost Spirit?

Posted by 9o685dep on 2023-06-07 in Other

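I'm new to C++, and this is my first time using Boost Spirit; I've taken on the task of learning and using C++ for my team (coming from a web developer background :)). Searching the internet, I found some really good examples in this community (especially from Sehe), but because of the complexity of the XML structure I can't quite piece everything together to complete the task.
This parser will act as the middleman: it converts structural code definitions (written by other teams) into XML, which multiple integration teams then consume to generate code in the language of their choice.
Below is a small sample of the code-structure definition text (from an external file). The file can be very large, depending on the task.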

Class Simple caption;
Class Simple columns "Column Name";

Class Container CONTAINER_NAME ( 
  Complex OBJECT_NAME ( 
    Simple obj_id 
    Simple obj_property1
    Simple obj_attribute enumeration(EnumOption1, EnumOption2,EnumOption3,EnumOption4)
    Container OBJECT_ITEMS (
      Complex OBJECT_ITEM (
        Simple obj_item_name
        Container set_value (
          Simple obj_item_value
        )
      )
    )
  )
);

The parser should evaluate the input and generate XML in this format:

<task>
  <class>
    <simple>
      <identifier>caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>caption</identifier>
      <literal>"Column Name"</literal>
    </simple>
  </class>
  <class>
    <container>
      <identifier>CONTAINER_NAME:CONTAINER_NAME</identifier>
      <literal>" "</literal>
      <complex>
        <identifier>CONTAINER_NAME:OBJECT_NAME</identifier>
        <literal>" "</literal>
        <simple>
          <identifier>CONTAINER_NAME:obj_id</identifier>
          <literal>" "</literal>
        </simple>
        <simple>
          <identifier>CONTAINER_NAME:obj_property1</identifier>
          <literal>" "</literal>
        </simple>
        <simple>
          <identifier>CONTAINER_NAME:obj_attribute</identifier>
          <literal>" "</literal>
          <enumeration>
            <word>EnumOption1</word>
            <word>EnumOption2</word>
            <word>EnumOption3</word>
            <word>EnumOption4</word>
          </enumeration>
        </simple>
        <container>
          <identifier>CONTAINER_NAME:OBJECT_ITEMS</identifier>
          <literal>" "</literal>
          <complex>
            <identifier>CONTAINER_NAME:OBJECT_ITEM</identifier>
            <literal>" "</literal>
            <simple>
              <identifier>CONTAINER_NAME:obj_item_name</identifier>
              <literal>" "</literal>
            </simple>
            <container>
              <identifier>CONTAINER_NAME:set_value</identifier>
              <literal>" "</literal>
              <simple>
                <identifier>CONTAINER_NAME:obj_item_value</identifier>
                <literal>" "</literal>
              </simple>
            </container>
          </complex>
        </container>
      </complex>
    </container>
  </class>
</task>

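From what I've read, I'll need the following (just my thought process, with very basic knowledge):
1. A grammar definition with rules for Class, Container, Complex, and Simple to parse the code-definition text (my biggest challenge);
2. Some kind of semantic action/function that creates the XML nodes for each group (Simple, Complex, Container, Class, etc.). I see that I could use msxml6.dll as the XML generator here, but I don't know how to hook the two together.
I've seen some examples that build an AST and then build XML from it, but the XML structure in my case does not follow any strict standard, since a Container can contain a Complex, and a Complex can in turn contain a Container.
Any help, guidance, or examples pointing me to where to start would be much appreciated.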

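Update

1. A semicolon marks the end of a CLASS block.
2. Comments exist, but they are always on separate lines. There are no inline comments.
3. There is no literal tag in the code definition. The literal content is inside double quotes. See line 2 of the updated code-definition block.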

Answer #1 (enxuqcxy)

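OK, those explanations helped me understand the correspondence between the input and the XML. Some parts of the spec are still not quite clear, but let's proceed.

Parsing

1. AST
As always, I start with the AST. This time it's based not on the sample input but on the output XML: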

namespace Ast {
    using boost::recursive_wrapper;

    using Id      = std::string;
    using Literal = std::string;
    using Enum    = std::vector<Id>;

    struct Base {
        Id      id;
        Literal literal;
    };

    struct Simple : Base {
        Enum enumeration;
    };

    struct Complex;
    struct Container;

    using Class = boost::variant<   
        Simple,                     
        recursive_wrapper<Complex>, 
        recursive_wrapper<Container>
    >;

    using Classes = std::vector<Class>;
    struct Container : Base { Class   element; };
    struct Complex   : Base { Classes members; };

    using Task = std::vector<Class>;
} // namespace Ast

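So far so good. No surprises. The main thing is the use of a recursive variant to allow nesting of the Complex/Container types. As a side note, I've factored the common part of all types out into Base. Let's adapt these for use as Fusion sequences: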

BOOST_FUSION_ADAPT_STRUCT(Ast::Simple,    id, literal, enumeration);
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex,   id, literal, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)

Now Spirit knows how to propagate the attributes without further help.
2. Grammar
The skeleton is simple: just map the AST nodes to rules:

template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
    Task() : Task::base_type(start) {
        start = skip(space)[task_];
        // ...
    }

  private:
    qi::rule<It, Ast::Task()> start;

    using Skipper = qi::space_type;
    qi::rule<It, Ast::Task(), Skipper>      task_;
    qi::rule<It, Ast::Class(), Skipper>     class_;
    qi::rule<It, Ast::Simple(), Skipper>    simple_;
    qi::rule<It, Ast::Complex(), Skipper>   complex_;
    qi::rule<It, Ast::Container(), Skipper> container_;

    // lexemes:
    qi::rule<It, Ast::Id()>      id_;
    qi::rule<It, Ast::Literal()> literal_;
};

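Note that I grouped the lexemes (i.e. the rules that do not allow a skipper) and encapsulated the space skipper in the start rule.
Because "classes" can appear explicitly, but also without the Class keyword, I'll introduce an extra rule type_ so we can say: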

task_  = *class_ > eoi;
type_  = simple_ | complex_ | container_;
class_ = "Class" > type_ > ';';

type_ can then be used wherever a Simple/Complex/Container is acceptable.
For the rest, there aren't many surprises, so let me show the whole constructor block:

Task() : Task::base_type(start) {
    using namespace qi;

    start = skip(space)[task_];

    // lexemes:
    id_      = raw[alpha >> *('_' | alnum)];
    literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';

    auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay

    task_      = *class_ > eoi;
    type_      = simple_ | complex_ | container_;
    class_     = lit("Class") > type_ > ';';
    simple_    = lit("Simple") >> id_ >> optlit >> enum_;
    complex_   = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
    container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
    enum_      = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');

    BOOST_SPIRIT_DEBUG_NODES(
        (task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(id_)(literal_))
}
  • Note another "extra" rule (enum_). Of course, I could have kept it all inside the simple_ rule.
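
To make the two lexeme rules concrete, here is a rough hand-written equivalent that uses only the standard library (no Spirit). The names parse_id and parse_literal are made up for this sketch. Unlike the expectation points (>) in the grammar, which throw qi::expectation_failure, the sketch simply returns std::nullopt when the closing quote is missing:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

namespace sketch {
    // Roughly mirrors id_ = raw[alpha >> *('_' | alnum)]:
    // one letter, followed by any number of letters, digits or underscores.
    // On success the identifier is consumed from the front of `in`.
    inline std::optional<std::string> parse_id(std::string_view& in) {
        auto is_alpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
        auto is_tail  = [](char c) { return c == '_' || std::isalnum(static_cast<unsigned char>(c)) != 0; };

        if (in.empty() || !is_alpha(in.front()))
            return std::nullopt;

        std::size_t n = 1;
        while (n < in.size() && is_tail(in[n]))
            ++n;

        std::string id(in.substr(0, n));
        in.remove_prefix(n);
        return id;
    }

    // Roughly mirrors literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"':
    // a double-quoted string in which a backslash escapes the next character.
    // The result holds the content without the quotes and without the backslashes.
    inline std::optional<std::string> parse_literal(std::string_view& in) {
        if (in.empty() || in.front() != '"')
            return std::nullopt;

        std::string text;
        std::size_t i = 1;
        while (i < in.size() && in[i] != '"') {
            if (in[i] == '\\' && i + 1 < in.size())
                ++i; // drop the backslash, keep the escaped character
            text += in[i++];
        }
        if (i == in.size())
            return std::nullopt; // no closing quote

        in.remove_prefix(i + 1); // also consume the closing quote
        return text;
    }
} // namespace sketch
```

Neither function skips whitespace, which matches the point of declaring id_ and literal_ without a skipper: spaces inside a quoted literal are kept, and an identifier ends at the first space.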

Here's a **Live Demo** that prints the raw AST for the sample input:

- (caption " " {})
 - (columns "Column Name" {})
 - (CONTAINER_NAME " " (OBJECT_NAME " " {(obj_id " " {}), (obj_property1 " " {}), (obj_attribute " " {EnumOption1, EnumOption2, EnumOption3, EnumOption4}), (OBJECT_ITEMS " " (OBJECT_ITEM " " {(obj_item_name " " {}), (set_value " " (obj_item_value " " {}))}))}))

Sadly, none of my pretty error-handling code was triggered :) The output is obviously ugly, so let's fix that.

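Generating XML

I'm no fan of Microsoft, and I prefer other XML libraries anyway (see What XML parser should I use in C++?).
So I'll choose PugiXML here.
1. Generator
Simply put, we have to teach the computer how to convert any Ast node into XML: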

#include <pugixml.hpp>
namespace Generate {
    using namespace Ast;

    struct XML {
        using Node = pugi::xml_node;

        // callable for variant visiting:
        template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }

      private:
        void apply(Node parent, Ast::Class const& c) const {
            using std::placeholders::_1;
            boost::apply_visitor(std::bind(*this, parent, _1), c);
        }

        // Id and Literal are both std::string, so two overloads on those types
        // would collide; a single overload takes the element name instead.
        void apply(Node parent, std::string const& s, char const* kind) const {
            named_child(parent, kind).text().set(s.c_str());
        }

        void apply(Node parent, Simple const& s) const {
            auto simple = named_child(parent, "simple");
            apply(simple, s.id, "identifier");
            apply(simple, s.literal, "literal");
            apply(simple, s.enumeration);
        }

        void apply(Node parent, Enum const& e) const {
            if (!e.empty()) {
                auto enum_ = named_child(parent, "enumeration");
                for (auto& v : e)
                    named_child(enum_, "word").text().set(v.c_str());
            }
        }

        void apply(Node parent, Complex const& c) const {
            auto complex_ = named_child(parent, "complex");
            apply(complex_, c.id, "identifier");
            apply(complex_, c.literal, "literal");
            for (auto& m : c.members)
                apply(complex_, m);
        }

        void apply(Node parent, Container const& c) const {
            auto cont = named_child(parent, "container");
            apply(cont, c.id, "identifier");
            apply(cont, c.literal, "literal");
            apply(cont, c.element);
        }

        void apply(Node parent, Task const& t) const {
            auto task = named_child(parent, "task");
            for (auto& c : t)
                apply(task.append_child("class"), c); // wrap each top-level entry in <class>
        }

      private:
        Node named_child(Node parent, std::string const& name) const {
            auto child = parent.append_child();
            child.set_name(name.c_str());
            return child;
        }
    };
} // namespace Generate

I won't claim I typed this up in a jiffy, but you'll recognize the pattern: it follows the Ast 1:1, to great success.
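
The one non-obvious trick in the generator is std::bind(*this, parent, _1): it turns the two-argument call operator into a one-argument visitor by fixing the parent node. Here is a minimal sketch of the same pattern with std::variant/std::visit and plain string output instead of boost::variant and PugiXML. The cut-down AST and the name ToXml are assumptions made purely for illustration:

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

namespace sketch {
    struct Node; // forward declaration; std::vector<Node> provides the recursion

    struct Simple  { std::string id; };
    struct Complex { std::string id; std::vector<Node> members; };
    struct Node    { std::variant<Simple, Complex> value; };

    // Hypothetical string-based emitter. The answer's generator appends
    // pugi::xml_node children instead of concatenating text.
    struct ToXml {
        // Binary call operator: (output, AST node). std::bind below fixes the
        // first argument, leaving a unary callable suitable for std::visit.
        template <typename T> void operator()(std::string* out, T const& n) const {
            apply(*out, n);
        }

      private:
        void apply(std::string& out, Node const& n) const {
            using std::placeholders::_1;
            std::visit(std::bind(*this, &out, _1), n.value); // dispatch on the active alternative
        }

        void apply(std::string& out, Simple const& s) const {
            out += "<simple><identifier>" + s.id + "</identifier></simple>";
        }

        void apply(std::string& out, Complex const& c) const {
            out += "<complex><identifier>" + c.id + "</identifier>";
            for (auto const& m : c.members)
                apply(out, m); // recurse into nested nodes
            out += "</complex>";
        }
    };
} // namespace sketch
```

Calling sketch::ToXml{}(&out, tree) walks the tree and appends to out. The sketch passes a pointer where the real generator passes a pugi::xml_node by value, and it does no XML escaping, which PugiXML handles for you.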

Full Demo

Integrating all of the above and printing the XML output:

Live On Compiler Explorer

// #define BOOST_SPIRIT_DEBUG 1
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
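#include <algorithm>
#include <functional>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>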
namespace qi = boost::spirit::qi;

namespace Ast {
    using boost::recursive_wrapper;

    using Id      = std::string;
    using Literal = std::string;
    using Enum    = std::vector<Id>;

    struct Base {
        Id      id;
        Literal literal;
    };

    struct Simple : Base {
        Enum enumeration;
    };

    struct Complex;
    struct Container;

    using Class = boost::variant<    //
        Simple,                      //
        recursive_wrapper<Complex>,  //
        recursive_wrapper<Container> //
    >;

    using Classes = std::vector<Class>;
    struct Container : Base { Class   element; };
    struct Complex   : Base { Classes members; };

    using Task = std::vector<Class>;
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::Simple,    id, literal, enumeration);
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex,   id, literal, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)

namespace Parser {
    template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
        Task() : Task::base_type(start) {
            using namespace qi;

            start = skip(space)[task_];

            // lexemes:
            id_      = raw[alpha >> *('_' | alnum)];
            literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';

            auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay

            task_      = *class_ > eoi;
            type_      = simple_ | complex_ | container_;
            class_     = lit("Class") > type_ > ';';
            simple_    = lit("Simple") >> id_ >> optlit >> enum_;
            complex_   = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
            container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
            enum_      = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');

            BOOST_SPIRIT_DEBUG_NODES(
                (task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(id_)(literal_))
        }

      private:
        qi::rule<It, Ast::Task()> start;

        using Skipper = qi::space_type;
        qi::rule<It, Ast::Task(), Skipper>      task_;
        qi::rule<It, Ast::Class(), Skipper>     class_, type_;
        qi::rule<It, Ast::Simple(), Skipper>    simple_;
        qi::rule<It, Ast::Complex(), Skipper>   complex_;
        qi::rule<It, Ast::Container(), Skipper> container_;
        qi::rule<It, Ast::Enum(), Skipper>      enum_;

        // lexemes:
        qi::rule<It, Ast::Id()>      id_;
        qi::rule<It, Ast::Literal()> literal_;
    };
}

#include <pugixml.hpp>
namespace Generate {
    using namespace Ast;

    struct XML {
        using Node = pugi::xml_node;

        // callable for variant visiting:
        template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }

      private:
        void apply(Node parent, Ast::Class const& c) const {
            using std::placeholders::_1;
            boost::apply_visitor(std::bind(*this, parent, _1), c);
        }

        void apply(Node parent, std::string const& s, char const* kind) const {
            named_child(parent, kind).text().set(s.c_str());
        }

        void apply(Node parent, Simple const& s) const {
            auto simple = named_child(parent, "simple");
            apply(simple, s.id, "identifier");
            apply(simple, s.literal, "literal");
            apply(simple, s.enumeration);
        }

        void apply(Node parent, Enum const& e) const {
            if (!e.empty()) {
                auto enum_ = named_child(parent, "enumeration");
                for (auto& v : e)
                    named_child(enum_, "word").text().set(v.c_str());
            }
        }

        void apply(Node parent, Complex const& c) const {
            auto complex_ = named_child(parent, "complex");
            apply(complex_, c.id, "identifier");
            apply(complex_, c.literal, "literal");
            for (auto& m : c.members)
                apply(complex_, m);
        }

        void apply(Node parent, Container const& c) const {
            auto cont = named_child(parent, "container");
            apply(cont, c.id, "identifier");
            apply(cont, c.literal, "literal");
            apply(cont, c.element);
        }

        void apply(Node parent, Task const& t) const {
            auto task = named_child(parent, "task");
            for (auto& c : t)
                apply(task.append_child("class"), c);
        }

      private:
        Node named_child(Node parent, std::string const& name) const {
            auto child = parent.append_child();
            child.set_name(name.c_str());
            return child;
        }
    };
} // namespace Generate

int main() { 
    using It = std::string_view::const_iterator;
    static const Parser::Task<It> p;
    static const Generate::XML to_xml;

    for (std::string_view input :
         {
             R"(Class Simple caption;
                Class Simple columns "Column Name";

                Class Container CONTAINER_NAME ( 
                  Complex OBJECT_NAME ( 
                    Simple obj_id 
                    Simple obj_property1
                    Simple obj_attribute enumeration(EnumOption1, EnumOption2,EnumOption3,EnumOption4)
                    Container OBJECT_ITEMS (
                      Complex OBJECT_ITEM (
                        Simple obj_item_name
                        Container set_value (
                          Simple obj_item_value
                        )
                      )
                    )
                  )
                );)",
         }) //
    {
        try {
            Ast::Task t;

            if (qi::parse(begin(input), end(input), p, t)) {
                pugi::xml_document doc;
                to_xml(doc.root(), t);
                doc.print(std::cout, "  ", pugi::format_default);
                std::cout << std::endl;
            } else {
                std::cout << " -> INVALID" << std::endl;
            }
        } catch (qi::expectation_failure<It> const& ef) {
            auto f    = begin(input);
            auto p    = ef.first - input.begin();
            auto bol  = input.find_last_of("\r\n", p) + 1;
            auto line = std::count(f, f + bol, '\n') + 1;
            auto eol  = input.find_first_of("\r\n", p);

            std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
                << input.substr(bol, eol - bol) << "\n"
                << std::setw(p - bol) << ""
                << "^--- here" << std::endl;
        }
    }
}

Prints the coveted output:

<task>
  <class>
    <simple>
      <identifier>caption</identifier>
      <literal> </literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns</identifier>
      <literal>Column Name</literal>
    </simple>
  </class>
  <class>
    <container>
      <identifier>CONTAINER_NAME</identifier>
      <literal> </literal>
      <complex>
        <identifier>OBJECT_NAME</identifier>
        <literal> </literal>
        <simple>
          <identifier>obj_id</identifier>
          <literal> </literal>
        </simple>
        <simple>
          <identifier>obj_property1</identifier>
          <literal> </literal>
        </simple>
        <simple>
          <identifier>obj_attribute</identifier>
          <literal> </literal>
          <enumeration>
            <word>EnumOption1</word>
            <word>EnumOption2</word>
            <word>EnumOption3</word>
            <word>EnumOption4</word>
          </enumeration>
        </simple>
        <container>
          <identifier>OBJECT_ITEMS</identifier>
          <literal> </literal>
          <complex>
            <identifier>OBJECT_ITEM</identifier>
            <literal> </literal>
            <simple>
              <identifier>obj_item_name</identifier>
              <literal> </literal>
            </simple>
            <container>
              <identifier>set_value</identifier>
              <literal> </literal>
              <simple>
                <identifier>obj_item_value</identifier>
                <literal> </literal>
              </simple>
            </container>
          </complex>
        </container>
      </complex>
    </container>
  </class>
</task>
  • I still don't understand how the CONTAINER_NAME: "namespace" is supposed to work, so I'll leave that to you.
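
One part of the demo that is easy to overlook is the catch block, which turns the iterator stored in the expectation_failure into a line number and a caret position. Below is a standalone sketch of that bookkeeping using only the standard library. The Location struct and the locate function are hypothetical names, and the sketch handles '\n' line endings only, whereas the demo also looks for '\r':

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

namespace sketch {
    struct Location {
        std::size_t      line;   // 1-based line number
        std::size_t      column; // 1-based column within that line
        std::string_view text;   // the whole line, without the newline
    };

    // Maps a character offset into `input` to a line, a column and the line's text.
    inline Location locate(std::string_view input, std::size_t offset) {
        offset = std::min(offset, input.size());

        std::string_view before = input.substr(0, offset);
        std::size_t nl  = before.find_last_of('\n');
        std::size_t bol = (nl == std::string_view::npos) ? 0 : nl + 1; // beginning of line
        std::size_t eol = input.find('\n', offset);                    // end of line
        std::size_t len = (eol == std::string_view::npos) ? std::string_view::npos : eol - bol;

        // Every newline before the offset starts a new line.
        std::size_t line = 1 + static_cast<std::size_t>(std::count(before.begin(), before.end(), '\n'));

        return {line, offset - bol + 1, input.substr(bol, len)};
    }
} // namespace sketch
```

With Spirit, the offset would be ef.first - input.begin(), as in the demo. Printing text followed by column - 1 spaces and a caret on the next line gives the same kind of "^--- here" marker.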

Answer #2 (rjzwgtxy)

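Thanks again for such a great lesson. To answer the question about the CONTAINER_NAME: namespace: it is simply used for grouping (not my rule; it's what the people who came up with the definition structure wanted).
So if we parse this line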

Class Simple caption;

then the result should be:

<task>
  <class>
    <simple>
      <identifier>caption:caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
</task>

The caption: namespace is added because this is the first child of the class. But if we parse

Class Container CONTAINER_NAME ( 
  Complex OBJECT_NAME ( 
    Simple obj_id 
    Container OBJECT_ITEMS (
      Complex OBJECT_ITEM (
        Simple obj_item_name
        Container set_value (
          Simple obj_item_value
        )
      )
    )
  )
);

then the **CONTAINER_NAME:** namespace is prepended to the identifier names of all children.

<class>
    <container>
      <identifier>CONTAINER_NAME:CONTAINER_NAME</identifier>
      <literal> </literal>
      <complex>
        <identifier>CONTAINER_NAME:OBJECT_NAME</identifier>
        <literal> </literal>
        <simple>
          <identifier>CONTAINER_NAME:obj_id</identifier>
          <literal> </literal>
        </simple>
        <container>
          <identifier>CONTAINER_NAME:OBJECT_ITEMS</identifier>
          <literal> </literal>
          <complex>
            <identifier>CONTAINER_NAME:OBJECT_ITEM</identifier>
            <literal> </literal>
            <simple>
              <identifier>CONTAINER_NAME:obj_item_name</identifier>
              <literal> </literal>
            </simple>
            <container>
              <identifier>CONTAINER_NAME:set_value</identifier>
              <literal> </literal>
              <simple>
                <identifier>CONTAINER_NAME:obj_item_value</identifier>
                <literal> </literal>
              </simple>
            </container>
          </complex>
        </container>
      </complex>
    </container>
  </class>

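I added the following function to handle the namespace in the XML structure. It does the job, but I'm pretty sure you'd come up with a one-liner for this... :)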

std::string get_namespace(Node parent, std::string const& ident) const {
  auto parent_name = std::string(parent.name());
  // Default: a direct child of <class> uses its own identifier as the namespace.
  std::string ns = ident + ":" + ident;
  if (parent_name != "class") {
    // Nested node: reuse the "NAMESPACE:" prefix (including the colon)
    // from the identifier of the parent node.
    std::string parent_id = parent.child("identifier").text().as_string();
    ns = parent_id.substr(0, parent_id.find(":") + 1) + ident;
  }
  return ns;
}

Then call this function when generating the XML for Simple, Complex, and Container:

void apply(Node parent, Simple const& s) const {
  auto simple = named_child(parent, "simple");
  apply(simple, get_namespace(parent, s.id), "identifier");
  apply(simple, s.literal, "literal");
  apply(simple, s.enumeration);
}
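
An alternative that avoids reading the prefix back out of the XML tree would be to pass the namespace down the recursion as a plain string. This is only a sketch of that idea: the helper names child_scope and qualify are made up, and an empty string is assumed to mean "directly under a class":

```cpp
#include <string>

namespace sketch {
    // Namespace to hand down to a node's children: the inherited namespace,
    // or the node's own identifier when it sits at the top level (empty ns).
    inline std::string child_scope(std::string const& ns, std::string const& ident) {
        return ns.empty() ? ident : ns;
    }

    // Qualified identifier of the form "NAMESPACE:ident".
    inline std::string qualify(std::string const& ns, std::string const& ident) {
        return child_scope(ns, ident) + ":" + ident;
    }
} // namespace sketch
```

Under these assumptions, each apply overload would take an extra ns parameter, write qualify(ns, s.id) as the identifier, and pass child_scope(ns, c.id) on to nested members, with the top-level call starting from an empty string.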

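Anyway, I still have a lot to do, since I also need to parse if-else and case statements, but this gives me a great starting point. Thanks again for taking the time to share your knowledge with me.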
