Using Python to remove unordered list items that have no unordered sublist in Markdown

Posted by vmjh9lq9 on 2023-03-16 in Python

First, here is what it looks like in Markdown:

# Known

* Languages with Latin alphabet:
  * English
  * French
  * Portuguese

* Languages with Greek alphabet:
  * Greek

* Languages with Armenian alphabet:

* Languages with Ethiopic script:

* Languages with Tamil script:

* Languages with Hiragana characters:
  * Japanese

* Languages with Klingon script:

# Wanted

**These wanted languages:**

* Languages with Sumero-Akkadian script:

* Languages with Hangul characters:
  * Korean

* Languages with Linear A script:

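Note that the list items with no sublist (those followed directly by a blank line) should be removed, leaving a single blank line between the remaining items.
I tried to remove those lines using a regular expression with ^$\n, and left comments so you can follow it more easily: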

import re

file_name = "Language support"

def remove_empty_lists():
  # It will read the file
  with open("{}.md".format(file_name), "r") as f:
    # It will search all the lines of the whole text
    lines = f.readlines()
    for line in lines:

      # Find the lines that contain "* Languages with ..."
      if re.search(r"\* Languages with", line):
        # Then check that the line ends with ":"
        if re.search(r":$", line):
          # Look for the empty line that follows
          if re.search(r"^$\n", line):
            # Finally, remove the list item and the blank line
            lines.remove(line)
  # It will rewrite the same file
  with open("{}.md".format(file_name), "w") as f:
    for line in lines:
      f.write(line)

I would like it to look like this:

# Known

* Languages with Latin alphabet:
  * English
  * French
  * Portuguese

* Languages with Greek alphabet:
  * Greek

* Languages with Hiragana characters:
  * Japanese

# Wanted

**These wanted languages:**

* Languages with Hangul characters:
  * Korean

That is what I tried.

nmpmafwu1#

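This does it the other way around: you write a regular expression that matches what you want to keep rather than what you want to remove.
You may want to use this site to build regular expressions easily: https://coding.tools/regex-replace. If you also want to keep the blank lines or other parts of the file, you can make the regex match those as well.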

import re

file_name = "md.md"

# Regex for the blocks we want to keep: a top-level bullet followed by
# one or more indented lines. The group is non-capturing so that
# findall() returns the whole match, not just the last sub-item.
regex = re.compile(r"^\*\s.*\n(?:[ \t]+.*\n)+", re.MULTILINE)

with open(file_name, "r") as f:
    # Read the whole file at once, not line by line
    text = f.read()

# Concatenate every block that matched
clean_file = "".join(regex.findall(text))

print(clean_file)

The output is

* Languages with Latin alphabet:
  * English
  * French
  * Portuguese
* Languages with Greek alphabet:
  * Greek
* Languages with Hiragana characters:
  * Japanese
* Languages with Hangul characters:
  * Korean
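
Matching only the list blocks drops the headings, the bold line and the blank lines, so the result differs from the layout requested in the question. If that layout matters, one possible alternative is to match the items that should go and delete them with re.sub. The sketch below is only an illustration: the names EMPTY_ITEM and remove_empty_lists are made up here, and it assumes that every top-level item starts with "* " in the first column and that sub-items are indented with spaces or tabs.

```python
import re

# Hypothetical pattern: a top-level "* ..." line whose next line is NOT an
# indented, non-empty line. The trailing \n* also swallows the blank
# line(s) after the removed item.
EMPTY_ITEM = re.compile(r"^\*[ \t].*(?:\n|\Z)(?![ \t]+\S)\n*", re.MULTILINE)


def remove_empty_lists(text: str) -> str:
    """Return text with top-level bullets that have no sub-item removed."""
    return EMPTY_ITEM.sub("", text)


if __name__ == "__main__":
    sample = (
        "# Known\n\n"
        "* Languages with Greek alphabet:\n  * Greek\n\n"
        "* Languages with Tamil script:\n\n"
        "# Wanted\n\n"
        "* Languages with Linear A script:\n"
    )
    print(remove_empty_lists(sample))
```

To apply it to a file, read the whole file with f.read(), pass the text through remove_empty_lists, and write the result back. Note that a line such as "* * *" (a Markdown horizontal rule) would also match this pattern, so the sketch is only safe for files shaped like the one in the question.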
