Using C++ ranges to create a hierarchical tree structure from a flattened table

Posted by xam8gpfp on 2023-06-25 in Other


I have a simple struct with three fields and a spaceship operator <=>, as shown below:

using MyStruct = struct MyStruct {
    unsigned id;
    std::filesystem::path sourcePath;
    std::string functionName;
    auto operator<=>(const MyStruct&) const = default;
};

The data is initialized as follows:

std::vector<MyStruct> structs{
    {1, "c:/temp/file3.c", "chimp()"},
    {2, "c:/temp/file3.c", "ape()"},
    {3, "c:/temp/file1.c", "foo()"},
    {4, "c:/temp/file1.c", "bar()"},
    {5, "c:/temp/file1.c", "baz()"},
    {6, "c:/temp/sub/file3.c", "file3Fn2()"},
    {7, "c:/temp/sub/file3.c", "file3Fn1()"},
    {8, "c:/temp/file2.c", "file2Fn2()"},
    {9, "c:/temp/file2.c", "file2Fn3()"},
    {10, "c:/temp/file2.c", "file2Fn1()"},
};

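I am trying to create a very simple sorted tree from two of the fields, sourcePath and functionName. I don't care about the id field.
To create the tree structure, I need to filter out the unique file names and then add the function names as leaves under those file names.
I would like to do this with the new C++20 ranges or range-v3, but I am having trouble chaining the right ranges/views together. I need to break the flattened data into two loops: an outer one over the file names and an inner one over the function names associated with each file name.
The tree I am after is: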

├───root
    ├───file1.c
    │   ├───bar()
    │   ├───baz()
    │   └───foo()
    ├───file2.c
    │   ├───file2Fn1()
    │   ├───file2Fn2()
    │   └───file2Fn3()
    ├───file3.c
    │   ├───ape()
    │   └───chimp()
    └───sub
        └───file3.c
            ├───file3Fn1()
            └───file3Fn2()

Here is my code so far, together with its output (Visual Studio 2022 Preview 2):

#include <algorithm>
#include <filesystem>
#include <format>
#include <iostream>
#include <string>
#include <string_view>
#include <tuple>
#include <vector>
using MyStruct = struct MyStruct {
    unsigned id;
    std::filesystem::path sourcePath;
    std::string functionName;
    auto operator<=>(const MyStruct&) const = default;
};
//! Explicit (full) specialization of std::formatter for MyStruct
template<>
struct std::formatter<MyStruct> : std::formatter<std::string_view> {
    // parse inherited from std::formatter<std::string_view>
    template <typename FormatContext>
    auto format(const MyStruct& arg, FormatContext& ctx) const {
        return formatter<string_view>::format(std::format(
            "{}\t{}\t{}"
            , arg.id
            , arg.sourcePath.generic_string()
            , arg.functionName), ctx);
    }
};
void
print(std::string_view intro, const std::vector<MyStruct>& container) {
    std::cout << intro << '\n';
    for (const auto& next : container) {
        std::cout << std::format("{}\n", next);
    }
}
std::vector<MyStruct> structs{
    {1, "c:/temp/file3.c", "chimp()"},
    {2, "c:/temp/file3.c", "ape()"},
    {3, "c:/temp/file1.c", "foo()"},
    {4, "c:/temp/file1.c", "bar()"},
    {5, "c:/temp/file1.c", "baz()"},
    {6, "c:/temp/sub/file3.c", "file3Fn2()"},
    {7, "c:/temp/sub/file3.c", "file3Fn1()"},
    {8, "c:/temp/file2.c", "file2Fn2()"},
    {9, "c:/temp/file2.c", "file2Fn3()"},
    {10, "c:/temp/file2.c", "file2Fn1()"},
};
// comparator: order by sourcePath first, then by functionName
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.sourcePath, lhs.functionName) <
        std::tie(rhs.sourcePath, rhs.functionName);
    };
int
main() {
    auto copy = structs;
    std::ranges::sort(copy, {}, &MyStruct::sourcePath);
    print("after sorting by sourcePath", copy);
    std::ranges::sort(copy, customComp);
    print("after sorting by customComp", copy);
}

The program output is shown below. Note that sorting with the customComp comparator appears to produce the correctly sorted result.

after sorting by sourcePath
3       c:/temp/file1.c foo()
4       c:/temp/file1.c bar()
5       c:/temp/file1.c baz()
8       c:/temp/file2.c file2Fn2()
9       c:/temp/file2.c file2Fn3()
10      c:/temp/file2.c file2Fn1()
1       c:/temp/file3.c chimp()
2       c:/temp/file3.c ape()
6       c:/temp/sub/file3.c     file3Fn2()
7       c:/temp/sub/file3.c     file3Fn1()
after sorting by customComp
4       c:/temp/file1.c bar()
5       c:/temp/file1.c baz()
3       c:/temp/file1.c foo()
10      c:/temp/file2.c file2Fn1()
8       c:/temp/file2.c file2Fn2()
9       c:/temp/file2.c file2Fn3()
2       c:/temp/file3.c ape()
1       c:/temp/file3.c chimp()
7       c:/temp/sub/file3.c     file3Fn1()
6       c:/temp/sub/file3.c     file3Fn2()

c++

Source: https://stackoverflow.com/questions/76511234/using-c-ranges-to-create-hierarchical-tree-structure-from-a-flattened-table

2 answers


Answer #1 (p3rjfoxz)

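This is a lengthy answer because it covers four different approaches, each with its own pros and cons.

Using `std::map` instead of ranges

As far as I understand the problem, a simple std::map is enough: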

#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using MyStruct = struct MyStruct {
    unsigned id;
    std::filesystem::path sourcePath;
    std::string functionName;
    auto operator<=>(const MyStruct&) const = default;
};
std::vector<MyStruct> structs{
    {1, "c:/temp/file3.c", "chimp()"},
    {2, "c:/temp/file3.c", "ape()"},
    {3, "c:/temp/file1.c", "foo()"},
    {4, "c:/temp/file1.c", "bar()"},
    {5, "c:/temp/file1.c", "baz()"},
    {6, "c:/temp/sub/file3.c", "file3Fn2()"},
    {7, "c:/temp/sub/file3.c", "file3Fn1()"},
    {8, "c:/temp/file2.c", "file2Fn2()"},
    {9, "c:/temp/file2.c", "file2Fn3()"},
    {10, "c:/temp/file2.c", "file2Fn1()"},
};
using Map = std::map<std::filesystem::path, std::vector<std::string>>;
int
main() {
    auto copy = structs;
    
    // sort by functionName
    std::ranges::sort(copy, std::less{}, [](auto const& s){ return s.functionName; });
    Map map;
    for (auto const& s : copy) {
        map[s.sourcePath].push_back(s.functionName);
    }
    for (auto const& [file, functions] : map) {
        // generic_string() avoids the quotes that operator<< adds around a path
        std::cout << file.generic_string() << ":\t";
        std::size_t n{0};
        for (auto const& fun : functions) {
            std::cout << fun << (++n == functions.size() ? "\n" : ", ");
        }
    }
}

Output:

c:/temp/file1.c:    bar(), baz(), foo()
c:/temp/file2.c:    file2Fn1(), file2Fn2(), file2Fn3()
c:/temp/file3.c:    ape(), chimp()
c:/temp/sub/file3.c:    file3Fn1(), file3Fn2()

https://godbolt.org/z/6P498sTGM

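Using ranges

Suppose you want to use ranges anyway, for composability and lazy evaluation. Then you have a few options.

Using `std::ranges::unique`

After sorting the input range, you can use std::ranges::unique to keep only one entry per sourcePath. You can then use std::ranges::equal_range to transform each entry of the unique range into the range of entries (and thus function names) that share its sourcePath: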

// sort first by sourcePath, then by functionName
auto copy = structs;
std::ranges::sort(copy, std::ranges::less{}, [](auto const& s){ return std::tie(s.sourcePath, s.functionName); });
// create a second copy which contains only the first element for each sourcePath
auto sorted = copy;
auto const ret = std::ranges::unique(sorted, std::ranges::equal_to{}, &MyStruct::sourcePath);
sorted.erase(ret.begin(), ret.end());
print("unique", sorted);
// transform each entry in the unique range to the list of functions
auto outer = sorted | 
    std::views::transform(
        [&copy](auto const& s){ 
            return std::ranges::equal_range(
                copy, 
                s, 
                [](auto const& l, auto const& r){ return l.sourcePath < r.sourcePath; }
            ); 
        }
    );
for (auto const& inner : outer)
    print("", inner);

unique
4   c:/temp/file1.c bar()
10  c:/temp/file2.c file2Fn1()
2   c:/temp/file3.c ape()
7   c:/temp/sub/file3.c file3Fn1()
4   c:/temp/file1.c bar()
5   c:/temp/file1.c baz()
3   c:/temp/file1.c foo()
10  c:/temp/file2.c file2Fn1()
8   c:/temp/file2.c file2Fn2()
9   c:/temp/file2.c file2Fn3()
2   c:/temp/file3.c ape()
1   c:/temp/file3.c chimp()
7   c:/temp/sub/file3.c file3Fn1()
6   c:/temp/sub/file3.c file3Fn2()

https://godbolt.org/z/888734r79
This gives you composability. The inner loop is lazy, but std::ranges::unique is not: it eagerly modifies the container it is applied to.

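Storing ranges in a `std::map`

Alternatively, you can build a map keyed by sourcePath. Instead of storing a container of function names, store a lazy view created with std::views::filter as the map's value: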

auto sorted_tree(auto const& rng)
{
    // this lambda returns a lambda, that checks if a MyStruct instance's sourcePath is equal to the input argument
    auto has_source_path = [](std::filesystem::path const& path){ 
        return [&](auto const& s){ 
            return s.sourcePath == path; 
        }; 
    };
    // store results in a map that maps strings to a std::ranges::filter_view
    using SubRangeType = std::ranges::filter_view<
        std::ranges::ref_view<std::remove_reference_t<decltype(rng)>>, 
        std::invoke_result_t<decltype(has_source_path), std::filesystem::path>
    >;
    using Map = std::map<std::filesystem::path, SubRangeType>;
    Map map;
    for (auto const& s : rng) {
        if (!map.contains(s.sourcePath)) {
            map.emplace(
                std::make_pair(
                    s.sourcePath, 
                    std::views::filter(rng, has_source_path(s.sourcePath))
                )
            );
        }
    }
    return map;
}

You can use it like this:

auto copy = structs;
// optional: Only if you want the function names to be alphabetical
std::ranges::sort(copy, std::ranges::less{}, &MyStruct::functionName);
for (auto& [key, range] : sorted_tree(copy))
    print(key.generic_string(), range);

Output:

c:/temp/file1.c
4   c:/temp/file1.c bar()
5   c:/temp/file1.c baz()
3   c:/temp/file1.c foo()
c:/temp/file2.c
10  c:/temp/file2.c file2Fn1()
8   c:/temp/file2.c file2Fn2()
9   c:/temp/file2.c file2Fn3()
c:/temp/file3.c
2   c:/temp/file3.c ape()
1   c:/temp/file3.c chimp()
c:/temp/sub/file3.c
7   c:/temp/sub/file3.c file3Fn1()
6   c:/temp/sub/file3.c file3Fn2()

https://godbolt.org/z/G38WeG5hT

Using a `std::map` for a lazy version of `std::ranges::unique`

You can get a lazy version of std::ranges::unique by using std::views::filter with a predicate that stores a map of the sourcePaths it has already visited.

auto lazy_unique() {
    struct unique_predicate
    {
        bool operator()(MyStruct const& s) {
            if (visited.contains(s.sourcePath)) {
                return false;
            }
            visited[s.sourcePath] = {};
            return true;
        }
        struct Empty {};
        std::unordered_map<std::filesystem::path, Empty> visited;
    };
    return std::views::filter(unique_predicate{});
}

You can use it like this:

// sort first by sourcePath, then by functionName
auto copy = structs;
std::ranges::sort(copy, std::ranges::less{}, [](auto const& s){ return std::tie(s.sourcePath, s.functionName); });   
// transform each entry in the unique range to the list of functions
auto outer = copy | lazy_unique() | 
    std::views::transform(
        [&copy](auto const& s){ 
            return std::ranges::equal_range(
                copy, 
                s, 
                [](auto const& l, auto const& r){ return l.sourcePath < r.sourcePath; }
            ); 
        }
    );
for (auto const& inner : outer) {
    print(inner.begin()->sourcePath.generic_string(), inner);
}

Output:

"c:/temp/file1.c"
4   c:/temp/file1.c bar()
5   c:/temp/file1.c baz()
3   c:/temp/file1.c foo()
"c:/temp/file2.c"
10  c:/temp/file2.c file2Fn1()
8   c:/temp/file2.c file2Fn2()
9   c:/temp/file2.c file2Fn3()
"c:/temp/file3.c"
2   c:/temp/file3.c ape()
1   c:/temp/file3.c chimp()
"c:/temp/sub/file3.c"
7   c:/temp/sub/file3.c file3Fn1()
6   c:/temp/sub/file3.c file3Fn2()

https://godbolt.org/z/8ds96dG7G
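However, this approach has a problem: the predicate inside the view adaptor returned by lazy_unique mutates its map every time it is called. As a result, you cannot reuse the range: iterating over it twice gives incorrect results on the second pass. You may also run into unintuitive compiler errors when you pass the range by const reference (for example to the print function), because a const reference to the range cannot be iterated: iteration needs to mutate it.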
https://godbolt.org/z/edr1PGfPc


Answered 2023-06-25

Answer #2 (lf3rwulv)

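Your use of the projection is off: you sort by path only. A projection takes an element and returns the value the comparison is based on, e.g. to sort by function name: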

std::ranges::sort(copy, {},  // same as &MyStruct::functionName
                  [](MyStruct& s) ->decltype(s.functionName)& {
                       return s.functionName;
                  } );

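Obviously, the spaceship operator has no effect on the sort unless std::identity is used as the projection.
You can use a projection that returns a std::tuple: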

std::ranges::sort( copy, {}, [](const MyStruct& s) { 
                        return std::tie(s.sourcePath, s.functionName); 
                   } );

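This uses the default operator<=> of std::tuple, not the one of your class. Obviously, if you don't want to run into reference/copy issues as in the example above, and don't want to worry about getting the sort syntax right, you have to write a custom operator. In that case you can use the simpler form of the sort function.
To print the result as a tree after sorting, the file path should itself be treated as a range (fs::path works, because it provides iterators over its lexical elements, or you can use your own equivalent). This adds one more level of loop nesting to the print() function.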

Answered 2023-06-25
