Sorting an indented block of text while keeping the parent/child structure

Posted by anhgbhbe on 2021-08-25 in Java

I have a block of text that needs to be rearranged (using Python). It looks like this:

foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y

The output of the sorting function must look like this:

foo
    another
        stuff a
        stuff b
        stuff c
    bar
        inner 1
        inner 2
        inner 3
    more
        items x
        items y
        items z

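Important details:
As the example above shows, each new "depth" is indicated by 4 spaces. This is consistent throughout the text.
At each depth, items should be sorted alphabetically. However, the tree structure must stay intact after sorting. So "stuff a/b/c" must always have "another" as its parent, and "items x/y/z" must always have "more" as its parent.

Here is an attempt that comes close to working, but not quite.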

import re
import textwrap

_EXPECTED_INDENTATION = "    "
_PARSER = re.compile(r"(?P<indentation>\s*)(?P<words>.+)")

def _iter_lists(item):
    if not isinstance(item, list):
        return

    yield item

    for group in item:
        for inner in _iter_lists(group):
            yield inner

def _group_by_depth(names):
    previous_depth = -1
    all_groups = []
    inner_group = []

    for depth, name in names:
        if previous_depth != -1 and depth != previous_depth:
            all_groups.append(inner_group)
            inner_group = []

        inner_group.append((depth, name))
        previous_depth = depth

    if inner_group:
        # Add the last group, just in case it was missed
        all_groups.append(inner_group)

    return all_groups

def _parse_by_depth(text):
    output = []

    for line in text.split("\n"):
        if not line.strip():
            continue

        match = _PARSER.match(line)
        count = int(match.group("indentation").count(_EXPECTED_INDENTATION))
        word = match.group("words")
        output.append((count, word))

    return output

def _sort_all(all_groups):
    for group in all_groups:
        for inner in _iter_lists(group):
            inner.sort()

def flatten_sequence(sequence):
    if not sequence:
        return sequence

    if isinstance(sequence[0], list):
        return flatten_sequence(sequence[0]) + flatten_sequence(sequence[1:])

    return sequence[:1] + flatten_sequence(sequence[1:])

def main():
    """Run the main execution of the current script."""
    text = textwrap.dedent(
        """\
        foo
            bar
                inner 1
                inner 3
                inner 2
            another
                stuff c
                stuff b
                stuff a
            more
                items z
                items x
                items y
        """
    )

    names = _parse_by_depth(text)

    # `_parse_by_depth` should generate
    # names = [
    #     (0, 'foo'),
    #         (1, 'bar'),
    #             (2, 'inner 1'),
    #             (2, 'inner 3'),
    #             (2, 'inner 2'),
    #         (1, 'another'),
    #             (2, 'stuff c'),
    #             (2, 'stuff b'),
    #             (2, 'stuff a'),
    #         (1, 'more'),
    #             (2, 'items z'),
    #             (2, 'items x'),
    #             (2, 'items y'),
    # ]

    all_groups = _group_by_depth(names)
    _sort_all(all_groups)

    flattened = flatten_sequence(all_groups)

    for depth, name in flattened:
        print("{indentation}{name}".format(indentation=_EXPECTED_INDENTATION * depth, name=name))

if __name__ == "__main__":
    main()

But it doesn't work. It prints:

foo
    bar
        inner 1
        inner 2
        inner 3
    another
        stuff a
        stuff b
        stuff c
    more
        items x
        items y
        items z

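This happens because _sort_all only sorts contiguous blocks correctly. For example, "inner 1/2/3" and "stuff a/b/c" are sorted correctly, but their parents (bar, another, and so on) are still in the wrong order. How can I modify _group_by_depth and/or _sort_all to get the expected order?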

Answer #1, by 8xiog9wr

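I would suggest this approach:
We can interpret the input as a table with several columns, where each level of indentation corresponds to moving one column to the right. The skipped columns are assumed to hold the same values as in the "parent" row. In other words, we can picture the input as a table from which those repeated values (shown in parentheses) were removed: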
Column 1  Column 2   Column 3
foo
(foo)     bar
(foo)     (bar)      inner 1
(foo)     (bar)      inner 3
(foo)     (bar)      inner 2
(foo)     another
(foo)     (another)  stuff c
(foo)     (another)  stuff b
(foo)     (another)  stuff a
(foo)     more
(foo)     (more)     items z
(foo)     (more)     items x
(foo)     (more)     items y
The idea is to build this 2D list (including the repeated values), sort it, and then convert it back to the original format.
Here is the code:

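def sort_indented_text(text, spacing):
    """Sort siblings alphabetically at every depth, keeping children under their parent."""
    data = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Keep the ancestors up to this line's depth, then append its own name.
        row = row[0:(len(line) - len(stripped)) // spacing] + [stripped]
        data.append(row)

    # Sorting the full paths keeps each parent directly ahead of its children;
    # only the last element of each path is printed, re-indented by its depth.
    return "\n".join(
        " " * (spacing * (len(row) - 1)) + row[-1] for row in sorted(data)
    )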
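
To make the intermediate table concrete, here is a minimal sketch that only builds the rows, with the repeated parent values filled in, and does not sort them. The helper name `expand_rows` is made up for illustration and is not part of the function above. It assumes every level is indented by exactly `spacing` spaces and that no level is skipped.

```python
def expand_rows(text, spacing=4):
    """Return one full path (ancestors plus own name) per non-blank line.

    Hypothetical helper for illustration only; assumes consistent
    indentation of `spacing` spaces per level and no skipped levels.
    """
    rows = []
    path = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // spacing
        # Keep the ancestors up to this depth, then append the current name.
        path = path[:depth] + [name]
        rows.append(path)
    return rows


sample = "foo\n    bar\n        inner 3\n        inner 1\n    another"
for row in expand_rows(sample):
    print(row)
# ['foo']
# ['foo', 'bar']
# ['foo', 'bar', 'inner 3']
# ['foo', 'bar', 'inner 1']
# ['foo', 'another']
```

Python compares lists element by element, and a list that is a prefix of another sorts first. That is why sorting these rows moves each parent together with its children while keeping the parent directly ahead of them.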

You can use it like this:

text = """foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y"""

print(sort_indented_text(text, 4))
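
If you would rather keep an explicit tree, which is closer in spirit to the grouping attempted in the question, something along these lines might also work. This is a different technique from sorting full paths: it builds nested (name, children) nodes and sorts the siblings recursively. The name `sort_indented_tree` and its internals are hypothetical, and the sketch assumes `spacing` spaces per level with no skipped levels.

```python
def sort_indented_tree(text, spacing=4):
    """Sort siblings alphabetically at every depth using an explicit tree.

    Hypothetical alternative for illustration; assumes `spacing` spaces
    per level and that indentation never skips a level.
    """
    root = []        # list of (name, children) nodes at depth 0
    stack = [root]   # stack[d] is the child list that receives depth-d nodes
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // spacing
        del stack[depth + 1:]      # forget child lists of finished branches
        children = []
        stack[depth].append((name, children))
        stack.append(children)     # this node's children live one level deeper

    def render(nodes, depth):
        # Sort siblings by name only, then emit each node before its subtree.
        for name, children in sorted(nodes, key=lambda node: node[0]):
            yield " " * (spacing * depth) + name
            yield from render(children, depth + 1)

    return "\n".join(render(root, 0))


text = "foo\n    bar\n        inner 3\n        inner 1\n    another\n        stuff b\n        stuff a"
print(sort_indented_tree(text))
# foo
#     another
#         stuff a
#         stuff b
#     bar
#         inner 1
#         inner 3
```

The tree version is longer, but the sort key is applied per node, so swapping in a different ordering (for example, case-insensitive) only requires changing the `key` function.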
