Python string pattern into JSON file [closed]

pu3pd22g  posted on 2023-10-21  in  Python

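This question was closed as not reproducible or caused by typos. It is not currently accepting answers.

This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 hours ago.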
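I have code that tries to find matching patterns in a string. I want to store these patterns in a JSON file, so I converted the Python strings to JSON format. However, it does not work: I get wrong matches that do not contain the correct number of values.
Here is my Python list of patterns: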

patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",       #Vol. 12,  no. 3, (2009), p. 118-120 
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.\[(\d+)-(\d+)\]$", #vol.1,no.7(2022),p.[1-6]
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+-\d+)-(\d+-\d+)$", #vol.10,no.3(2018),p.03015-1-03015-4
    r"^vol\.(\d+),no\.([\d-]+)\((\d+)\)p\.(\d+)-(\d+)$",
    r"^vol(\d+),no\.([\d-]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+[a-z]?),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$", #Vol. 35A, no. 3 (2004), p. 751-759
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$", #Vol. 17, no. 7,8,9,10(2020), p. 573 - 582
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$" # vol.7,no.6C(2019),p.38-45
    ]

This is my version of them in JSON:

{
    "patterns": [
        "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
        "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.\\[(\\d+)-(\\d+)\\]$",
        "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+-\\d+)-(\\d+-\\d+)$",
        "^vol\\.(\\d+),no\\.[\\d-]+\\((\\d+)\\)p\\.(\\d+)-(\\d+)$",
        "^vol(\\d+),no\\.[\\d-]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
        "^vol\\.(\\d+[a-z]?),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
        "^vol\\.(\\d+),no\\.[\\d,]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
        "^vol\\.(\\d+),no\\.[\\w]+\\((\\d+)\\),p\\.(\\d+)-(\\d+)$"
    ]
}

Here is an example of how I use them, with print statements to help show the error:

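import re

# load_config() and row are defined elsewhere in my code (not shown here).
# Raw string so the backslash in the Windows path is not read as an escape.
config_filename = r"configuration\most_used_patterns_Vol_Iss_Year_Page.json"
config = load_config(config_filename)
patterns = []
if config is not None:
    patterns = config.get('patterns', [])

for pattern in patterns:
    match = re.match(pattern, row)
    if match:
        print(pattern)
        print(match)
        groups = match.groups()
        print(groups[0])
        print(groups[1])
        print(groups[2])
        print(groups[3])
        # Fails when the pattern has fewer than five capture groups.
        volume, issue, year, start_page, end_page = match.groups()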

Output:

^vol\.(\d+),no\.[\d,]+\((\d+)\),p\.(\d+)-(\d+)$
<re.Match object; span=(0, 29), match='vol.37,no.6,7(1987),p.370-377'>
37
1987
370
377
ValueError: not enough values to unpack (expected 5, got 4)

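When I do not read them from the JSON file and instead keep them in Python as shown above, it works. Can someone explain what is wrong with this JSON file? Or would a different file type be a better choice for storing these pattern strings?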

62lalag4  #1

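Your JSON serialization appears to have gone wrong. In the JSON file, several patterns lost the parentheses around the issue part (for example [\\d,]+ instead of ([\\d,]+)), so they have only four capture groups instead of five.
It is best to let Python create the JSON from the input data: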

patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",       #Vol. 12,  no. 3, (2009), p. 118-120 
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.\[(\d+)-(\d+)\]$", #vol.1,no.7(2022),p.[1-6]
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+-\d+)-(\d+-\d+)$", #vol.10,no.3(2018),p.03015-1-03015-4
    r"^vol\.(\d+),no\.([\d-]+)\((\d+)\)p\.(\d+)-(\d+)$",
    r"^vol(\d+),no\.([\d-]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+[a-z]?),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$", #Vol. 35A, no. 3 (2004), p. 751-759
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$", #Vol. 17, no. 7,8,9,10(2020), p. 573 - 582
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$" # vol.7,no.6C(2019),p.38-45
    ]

import json
with open("patterns.json", "w") as f:
    json.dump({"patterns": patterns}, f, indent=2)

This should give you a correct JSON file with the following contents:

{
  "patterns": [
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.\\[(\\d+)-(\\d+)\\]$",
    "^vol\\.(\\d+),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+-\\d+)-(\\d+-\\d+)$",
    "^vol\\.(\\d+),no\\.([\\d-]+)\\((\\d+)\\)p\\.(\\d+)-(\\d+)$",
    "^vol(\\d+),no\\.([\\d-]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+[a-z]?),no\\.(\\d+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.([\\d,]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$",
    "^vol\\.(\\d+),no\\.([\\w]+)\\((\\d+)\\),p\\.(\\d+)-(\\d+)$"
  ]
}
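
To see why the hand-written JSON in the question fails, it may help to compare one of its patterns with the matching pattern from the Python list. The following is only a minimal sketch: the variable names (hand_written, original, row) are made up for illustration, and the sample row is taken from the match shown in the question's output.

```python
import re

# Pattern as hand-written in the question's JSON file (after JSON decoding):
# the issue part "[\d,]+" has no parentheses, so it is not a capture group.
hand_written = r"^vol\.(\d+),no\.[\d,]+\((\d+)\),p\.(\d+)-(\d+)$"

# Same pattern from the original Python list: the issue part is captured.
original = r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$"

# Sample row, copied from the match object printed in the question.
row = "vol.37,no.6,7(1987),p.370-377"

hand_written_groups = re.match(hand_written, row).groups()
original_groups = re.match(original, row).groups()

# Pattern.groups is the number of capture groups in the compiled pattern.
print(re.compile(hand_written).groups, hand_written_groups)
print(re.compile(original).groups, original_groups)
```

Both patterns match the row, but the first yields four values and the second five, which is consistent with the "expected 5, got 4" ValueError in the question.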

Reading it back into Python and comparing it with the original data should confirm that it was stored correctly.
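
A round-trip check along these lines could look like the sketch below. It is an illustration under some assumptions: it serializes to an in-memory string with json.dumps/json.loads instead of writing patterns.json, it uses only three of the patterns to stay short, and the names restored, round_trip_ok and group_counts are hypothetical.

```python
import json
import re

# A subset of the original Python patterns (raw strings, five groups each).
patterns = [
    r"^vol\.(\d+),no\.(\d+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+),no\.([\d,]+)\((\d+)\),p\.(\d+)-(\d+)$",
    r"^vol\.(\d+),no\.([\w]+)\((\d+)\),p\.(\d+)-(\d+)$",
]

# Serialize to JSON text and parse it back; json takes care of doubling
# the backslashes on the way out and undoing that on the way in.
encoded = json.dumps({"patterns": patterns}, indent=2)
restored = json.loads(encoded)["patterns"]

# The restored strings should be identical to the originals.
round_trip_ok = restored == patterns

# Every pattern should still have five capture groups
# (volume, issue, year, start page, end page).
group_counts = [re.compile(p).groups for p in restored]

print(round_trip_ok)
print(group_counts)
```

Checking the group count when the configuration is loaded is one way to catch a hand-edited pattern with a missing pair of parentheses before it reaches the unpacking line.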
