I have a dataset with two columns: ID and SEQUENCEL_RESULT.
The DataFrame looks like this; the list_of_sequencies column has already been evaluated with literal_eval:
id list_of_sequencies
2 [(74, [1-1]), (51, [1-1, 0-47]), (23, [1-2]), (18, [1-2, 0-46]), (10, [0-1, 1-1]), (9, [0-1, 1-1, 0-46]), (9, [1-1, 0-46]), (6, [1-3]), (5, [0-2, 1-1]), (5, [1-1, 0-45])]
3 [(61, [1-1]), (24, [1-2]), (18, [0-1, 1-1]), (14, [1-8]), (14, [1-8, 0-40]), (12, [1-3]), (12, [1-6]), (11, [1-1, 0-47]), (10, [0-2, 1-1]), (10, [1-2, 0-46]), (2, [0-1, 1-1, 0-46])]
4 [(frequency,[pattern-A,pattern-B,pattern-C,...]),(...),...]
...
Each list of sequences looks like this; every tuple contains a frequency and a list of patterns.
[
(269, [1 - 5]),
(260, [1 - 5, 0 - 40]),
(171, [0 - 3, 1 - 5]),
(167, [0 - 3, 1 - 5, 0 - 40]),
(162, [1 - 1]),
(105, [1 - 1, 0 - 40]),
(105, [1 - 6]),
(86, [1 - 1, 1 - 5]),
(84, [1 - 1, 1 - 5, 0 - 40]),
(83, [1 - 6, 0 - 39]),
]
or
[
(178, ["1-9"]),
(140, ["1-9", "0-39"]),
(102, ["1-10"]),
(87, ["1-10", "0-38"]),
(75, ["1-1"]),
(53, ["1-8"]),
(50, ["0-1", "1-1"]),
(35, ["1-8", "0-40"]),
(32, ["1-9", "1-1"]),
(30, ["1-1", "0-36"]),
]
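How can I write a function that makes it easy to rank the ids by the frequency attached to the inner lists? For example, if I input a sequence such as [0-1, 1-1, 0-46], the function should find every match for my input and rank the matches by frequency. The result should then look like [2,3], because [0-1, 1-1, 0-46] occurs 9 times in id=2 and 2 times in id=3.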
As requested by @mozway, here is the raw data:
{'id': ['1', '2', '3', '4', '5'],
'list_of_sequencies': ["[(8, ['1-1']), (4, ['0-3', '1-1']), (2, ['0-4', '1-1']), (2, ['1-2']), (1, ['1-1', '0-3']), (1, ['1-1', '0-41']), (1, ['1-1', '0-42']), (1, ['1-1', '0-43']), (1, ['1-1', '0-44']), (1, ['1-1', '0-45'])]",
"[(15, ['1-1']), (5, ['0-1', '1-1']), (4, ['0-2', '1-1']), (4, ['1-1', '1-1']), (3, ['0-4', '1-1']), (3, ['1-1', '0-4']), (3, ['1-1', '0-4', '1-1']), (3, ['1-1', '0-40']), (3, ['1-1', '0-46']), (3, ['1-3'])]",
"[(16, ['1-1']), (7, ['1-2']), (4, ['0-1', '1-1']), (4, ['1-2', '0-46']), (3, ['1-1', '0-42']), (3, ['1-3']), (2, ['1-1', '0-40']), (2, ['1-1', '0-41']), (2, ['1-1', '0-47']), (2, ['1-1', '1-1'])]",
"[(74, ['1-1']), (51, ['1-1', '0-47']), (23, ['1-2']), (18, ['1-2', '0-46']), (10, ['0-1', '1-1']), (9, ['0-1', '1-1', '0-46']), (9, ['1-1', '0-46']), (6, ['1-3']), (5, ['0-2', '1-1']), (5, ['1-1', '0-45'])]",
"[(178, ['1-9']), (140, ['1-9', '0-39']), (102, ['1-10']), (87, ['1-10', '0-38']), (75, ['1-1']), (53, ['1-8']), (50, ['0-1', '1-1']), (35, ['1-8', '0-40']), (32, ['1-9', '1-1']), (30, ['1-1', '0-36'])]"]}
If my input is ['0-1', '1-1'], the result should look like the following, in exactly this order:
ID 5 contains: (50, ['0-1', '1-1'])
ID 4: (10, ['0-1', '1-1'])
ID 2: (5, ['0-1', '1-1'])
ID 3: (4, ['0-1', '1-1'])
{'id': ['5', '4', '2', '3'], 'list_of_sequencies': [...]} (the ids in this order together with their list_of_sequencies, which I don't want to copy here)
1 Answer
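You can use a list comprehension to filter the desired items and sum their frequencies, then sort the data:
Output:
Taking the frequency into account:
Output: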
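The answer's original code and output are not preserved here. As a minimal sketch of the approach it describes (filter the tuples whose pattern list equals the input, sum their frequencies, then sort), something like the following could work. The function name `rank_ids` is hypothetical, the sample is trimmed from the question's raw data, and plain Python is used instead of a DataFrame to keep the example self-contained:

```python
from ast import literal_eval

# Sample trimmed from the question's raw data (only a few tuples per id are kept).
data = {
    "id": ["1", "2", "3", "4", "5"],
    "list_of_sequencies": [
        "[(8, ['1-1']), (4, ['0-3', '1-1']), (2, ['1-2'])]",
        "[(15, ['1-1']), (5, ['0-1', '1-1']), (3, ['1-1', '0-46'])]",
        "[(16, ['1-1']), (7, ['1-2']), (4, ['0-1', '1-1'])]",
        "[(74, ['1-1']), (10, ['0-1', '1-1']), (9, ['0-1', '1-1', '0-46'])]",
        "[(178, ['1-9']), (75, ['1-1']), (50, ['0-1', '1-1'])]",
    ],
}


def rank_ids(data, target):
    """Return (id, frequency) pairs for exact matches of `target`, highest first.

    `rank_ids` is a hypothetical helper, not the answer's original code.
    """
    ranked = []
    for id_, raw in zip(data["id"], data["list_of_sequencies"]):
        # Parse the stringified list only if it has not been evaluated yet.
        pairs = literal_eval(raw) if isinstance(raw, str) else raw
        # Sum the frequencies of every tuple whose pattern list equals the target.
        freq = sum(f for f, patterns in pairs if patterns == target)
        if freq > 0:
            ranked.append((id_, freq))
    # sorted() is stable, so ids with equal frequency keep their original order.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


print(rank_ids(data, ["0-1", "1-1"]))
# [('5', 50), ('4', 10), ('2', 5), ('3', 4)]
```

The match is exact, so ['0-1', '1-1', '0-46'] does not count toward ['0-1', '1-1']. With a pandas DataFrame, the same per-row sum could presumably be stored as a helper column and ordered with `sort_values(ascending=False)` after dropping the rows where it is zero.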