regex: Searching for substrings in a list of dictionaries

Posted by 8aqjt8rx on 2022-11-26 in Other

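I have a list of dictionaries.
I need to search the "Receiver" key and output only those dictionaries whose Receiver value shares its last X characters with the Receiver value of any other dictionary.
In this example, the last 3 characters of each Receiver value are checked against all other Receiver values.
This is what I have so far: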

import re

transactions = [
{"Receiver":"alice111","Amount":50},
{"Receiver":"alice222","Amount":60},
{"Receiver":"alice111","Amount":70},
{"Receiver":"bob111","Amount":50},
{"Receiver":"bob222","Amount":150},
{"Receiver":"bob333","Amount":100},
{"Receiver":"kyle444","Amount":260},
{"Receiver":"richard555","Amount":260}
]
new_list=[]

for value in transactions:
    receiver = value["Receiver"]
    last_3 = receiver[-3:]
    #print(receiver)
    #print(last_3)
    for substring in transactions:
        if re.search(last_3 + r"$",substring["Receiver"]):
            #print("MATCH" + str(substring))
            new_list.append(substring)

print(new_list)
#[{'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice222', 'Amount': 60}, {'Receiver': 'bob222', 'Amount': 150}, {'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 50}, {'Receiver': 'alice111', 'Amount': 70}, {'Receiver': 'bob111', 'Amount': 50}, {'Receiver': 'alice222', 'Amount': 60}, {'Receiver': 'bob222', 'Amount': 150}, {'Receiver': 'bob333', 'Amount': 100}, {'Receiver': 'kyle444', 'Amount': 260}, {'Receiver': 'richard555', 'Amount': 260}]

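Unfortunately, this is completely wrong, because it repeats the same entries multiple times. With a longer list, this would be a disaster.
Expected output: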
[{"Receiver":"alice111","Amount":50},{"Receiver":"alice222","Amount":60},{"Receiver":"alice111","Amount":70},{"Receiver":"bob111","Amount":50},{"Receiver":"bob222","Amount":150}]
The following should be omitted:

[{"Receiver":"bob333","Amount":100},{"Receiver":"kyle444","Amount":260},{"Receiver":"richard555","Amount":260}
]

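As you can see, no other Receiver value ends in "333", "444" or "555", so those entries are omitted; I am not interested in outputting receivers whose suffix is unique.
Update:
Suppose I want to match only entries that do not share the same name prefix (the part before the 3-character suffix), given this input: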
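
The repeated values in the attempt above come from the inner loop appending every match on each pass of the outer loop. A minimal sketch of one way to keep the regex approach while adding each entry at most once, assuming entries are compared by index so that an entry never matches itself (the `re.escape` call is an added precaution, not part of the original code):

```python
import re

transactions = [
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice222", "Amount": 60},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "bob111", "Amount": 50},
    {"Receiver": "bob222", "Amount": 150},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "kyle444", "Amount": 260},
    {"Receiver": "richard555", "Amount": 260},
]

new_list = []
for i, value in enumerate(transactions):
    # Anchor the escaped 3-character suffix to the end of the string.
    pattern = re.escape(value["Receiver"][-3:]) + r"$"
    # Append the current entry once if any *other* entry ends with the same suffix.
    if any(
        re.search(pattern, other["Receiver"])
        for j, other in enumerate(transactions)
        if j != i
    ):
        new_list.append(value)

print(new_list)
```

This still compares every entry with every other entry, so it is quadratic in the length of the list.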

transactions1 = [
{"Receiver":"alice111","Amount":50},
{"Receiver":"alice111","Amount":70},
{"Receiver":"bob222","Amount":50},
{"Receiver":"bob222","Amount":150},
{"Receiver":"bob222","Amount":100},
{"Receiver":"richard111","Amount":260},
{"Receiver":"bob333","Amount":100},
{"Receiver":"alice333","Amount":300},

]

New expected output:
[{"Receiver":"alice111","Amount":50}, {"Receiver":"alice111","Amount":70},{"Receiver":"richard111","Amount":260},{"Receiver":"bob333","Amount":100},{"Receiver":"alice333","Amount":300}]
We only match when:

  • the last 3 characters (the suffix) match, and a different name prefix exists

Hope that is clear.

atmip9wb #1

I hope I have understood your question correctly. Using the updated input from your question:

transactions1 = [
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "bob222", "Amount": 50},
    {"Receiver": "bob222", "Amount": 150},
    {"Receiver": "bob222", "Amount": 100},
    {"Receiver": "richard111", "Amount": 260},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]

tmp = {}
for t in transactions1:
    suffix = t["Receiver"][-3:]
    tmp.setdefault(suffix, set()).add(t["Receiver"])

out = [t for t in transactions1 if len(tmp[t["Receiver"][-3:]]) > 1]
print(out)

Prints:

[
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "richard111", "Amount": 260},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]
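
Since the question speaks of the last X characters in general, the same idea can be wrapped in a function. This is a minimal sketch, not part of the answer above: the helper name `shared_suffix_distinct_names` and the parameters `n` and `key` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shared_suffix_distinct_names(transactions, n=3, key="Receiver"):
    """Keep entries whose last-n-character suffix is shared by
    at least two distinct values of `key`."""
    # Map each suffix to the set of distinct full values that end with it.
    names_by_suffix = defaultdict(set)
    for t in transactions:
        names_by_suffix[t[key][-n:]].add(t[key])
    # An entry qualifies only if some *different* value shares its suffix.
    return [t for t in transactions if len(names_by_suffix[t[key][-n:]]) > 1]


sample = [
    {"Receiver": "alice111", "Amount": 50},
    {"Receiver": "alice111", "Amount": 70},
    {"Receiver": "bob222", "Amount": 50},
    {"Receiver": "richard111", "Amount": 260},
]
result = shared_suffix_distinct_names(sample)
print(result)
```

Because the sets hold full Receiver values, two entries with the same suffix count as different only when their prefixes differ, which is the condition stated in the update.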

noj0wjuj #2

You can count how often each suffix occurs first, then filter the list based on those counts.

from collections import Counter

transactions = [
    {"Receiver":"alice111","Amount":50},
    {"Receiver":"alice222","Amount":60},
    {"Receiver":"alice111","Amount":70},
    {"Receiver":"bob111","Amount":50},
    {"Receiver":"bob222","Amount":150},
    {"Receiver":"bob333","Amount":100},
    {"Receiver":"kyle444","Amount":260},
    {"Receiver":"richard555","Amount":260}
]

counter = Counter(transaction['Receiver'][-3:] for transaction in transactions)
output = [transaction for transaction in transactions if counter[transaction['Receiver'][-3:]] > 1]

print(output)
# [{'Receiver': 'alice111', 'Amount': 50},
#  {'Receiver': 'alice222', 'Amount': 60},
#  {'Receiver': 'alice111', 'Amount': 70},
#  {'Receiver': 'bob111', 'Amount': 50},
#  {'Receiver': 'bob222', 'Amount': 150}]
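
Note that this counts entries, so two entries with the same Receiver match each other. For the requirement in the question's update, the count would need to be over distinct Receiver values instead. A minimal sketch of both variants, assuming a hypothetical helper `filter_by_suffix_count` with a `distinct` flag (neither is from the answer above):

```python
from collections import Counter


def filter_by_suffix_count(transactions, n=3, distinct=False):
    """Keep entries whose last-n-character suffix occurs more than once.

    With distinct=True, each Receiver value is counted only once, so
    repeated entries for the same receiver do not match each other.
    """
    receivers = [t["Receiver"] for t in transactions]
    if distinct:
        receivers = set(receivers)
    counts = Counter(r[-n:] for r in receivers)
    return [t for t in transactions if counts[t["Receiver"][-n:]] > 1]


sample = [
    {"Receiver": "bob222", "Amount": 50},
    {"Receiver": "bob222", "Amount": 150},
    {"Receiver": "bob333", "Amount": 100},
    {"Receiver": "alice333", "Amount": 300},
]
per_entry = filter_by_suffix_count(sample)
per_name = filter_by_suffix_count(sample, distinct=True)
print(len(per_entry))  # 4: the two bob222 entries match each other
print(len(per_name))   # 2: only bob333 and alice333 remain
```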
