Fetching column values from another table to create a new column in the main table: Pandas merge with square brackets

Posted by tkclm6bt on 2023-01-01 in Other


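As you can see below, I have two tables: a main table and a reference table. The main table has a column "Subject" that contains tr_id values inside "[]", separated by ",". I want to match those tr_id values against the reference table and bring the corresponding "test_no" into the main table as "Linked_Test_No".
Main table: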

my_id    Name            Subject
12       Ash             The test [101 , 105]
15       Brock           The testing of the subject [101,102]
16       Misty           Subject Test [102,106]
18       Tracy           Subject Testing [101]
10       Oak             Test 
19       Paul            Testing []
21       Gary            Testing :  [107]
44       Selena          Subject : [104]

Reference table:

tr_id   latest_em                                         test_no
101     pichu@icloud.com; paul@gmail.com                  120
102     ash@yahoo.com                                     130
103     squirtle@gmail.com                                160
104     charmander@gmail.com                              180
105     ash@yahoo.com;misty@yahoo.com                     100

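Currently I use str.extract() to get the tr_id values, then pd.merge() to join the two tables, and then collate the test_no values into a single column "Linked_Test_No". That is a lot of steps. Can this be done in fewer lines of code? My programming skills are basic.
Expected output: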

my_id    Name            Subject                                   Linked_Test_No
12       Ash             The test [101 , 105]                      [120,100]
15       Brock           The testing of the subject [101,102]      [120,130]
16       Misty           Subject Test [102,106]                    [130]
18       Tracy           Subject Testing [101]                     [120]
10       Oak             Test                                       
19       Paul            Testing []                           
21       Gary            Testing :  [107]                          
44       Selena          Subject : [104]                           [180]
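
For reference, the multi-step approach described above (extract the ids, merge, then collate) might look roughly like the sketch below. The question does not show its code, so every step here is an assumption about its shape. The frame names df_m and df_s are borrowed from the answer, the helper names inside, ids, long and linked are made up for illustration, and the latest_em column is left out because it is not needed for the lookup.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df_m = pd.DataFrame({
    "my_id": [12, 15, 16, 18, 10, 19, 21, 44],
    "Name": ["Ash", "Brock", "Misty", "Tracy", "Oak", "Paul", "Gary", "Selena"],
    "Subject": [
        "The test [101 , 105]",
        "The testing of the subject [101,102]",
        "Subject Test [102,106]",
        "Subject Testing [101]",
        "Test ",
        "Testing []",
        "Testing :  [107]",
        "Subject : [104]",
    ],
})
df_s = pd.DataFrame({
    "tr_id": [101, 102, 103, 104, 105],
    "test_no": [120, 130, 160, 180, 100],
})

# Step 1: take the text inside the first [...] and split it into id strings.
# Rows without brackets stay NaN; "[]" becomes an empty list.
inside = df_m["Subject"].str.extract(r"\[(.*?)\]", expand=False)
ids = inside.str.findall(r"\d+")

# Step 2: one row per (my_id, tr_id), then an inner merge with the reference table.
# Ids that are absent from df_s (106, 107) drop out at the merge.
long = (
    df_m[["my_id"]]
    .assign(tr_id=ids)
    .explode("tr_id")
    .dropna(subset=["tr_id"])
    .astype({"tr_id": "int64"})
    .merge(df_s[["tr_id", "test_no"]], on="tr_id", how="inner")
)

# Step 3: collate test_no back into one list per my_id and attach it.
linked = long.groupby("my_id")["test_no"].agg(list)
df_m["Linked_Test_No"] = df_m["my_id"].map(linked)

print(df_m)
```

Rows with no matching tr_id end up as NaN in Linked_Test_No, which corresponds to the blank cells in the expected output.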

pandas

Source: https://stackoverflow.com/questions/74947797/fetching-the-column-values-from-another-table-to-create-a-new-column-in-the-main

1 Answer


qncylg1j1#

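The code below produces two columns of linked test numbers, where df_m is the main table and df_s is the reference table. One column ignores tr_id values that do not exist in df_s; the other puts None in place of a missing tr_id. Use whichever column fits your use case.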

import ast
import pandas as pd

# dict mapping tr_id -> test_no
s_map = df_s.set_index('tr_id')['test_no'].to_dict()

# extract the first bracketed id list and parse it into a Python list;
# rows without brackets become []. ast.literal_eval only parses literals,
# so it is safer than eval on free-text input.
df_m['Subcode'] = df_m['Subject'].str.extract(r'(\[.*?\])', expand=False).apply(lambda x: [] if pd.isnull(x) else ast.literal_eval(x))

# look up each id in s_map, skipping ids that are not in the reference table
df_m['Linked_Test_No'] = df_m['Subcode'].apply(lambda x: [s_map[xi] for xi in x if xi in s_map])

# same as above, but ids missing from s_map are represented by None instead of being skipped
df_m['Linked_Test_No_alt'] = df_m['Subcode'].apply(lambda x: [s_map.get(xi, None) for xi in x])
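
To try the lookup end to end, here is a minimal self-contained sketch that rebuilds the two tables from the question and applies the same dict-based mapping. The sample frames are typed in by hand from the question, tr_id is assumed to be stored as integers, and the latest_em column is omitted because the lookup does not use it. The intermediate name bracketed is made up for illustration.

```python
import ast

import pandas as pd

# Sample data copied from the tables in the question.
df_m = pd.DataFrame({
    "my_id": [12, 15, 16, 18, 10, 19, 21, 44],
    "Name": ["Ash", "Brock", "Misty", "Tracy", "Oak", "Paul", "Gary", "Selena"],
    "Subject": [
        "The test [101 , 105]",
        "The testing of the subject [101,102]",
        "Subject Test [102,106]",
        "Subject Testing [101]",
        "Test ",
        "Testing []",
        "Testing :  [107]",
        "Subject : [104]",
    ],
})
df_s = pd.DataFrame({
    "tr_id": [101, 102, 103, 104, 105],
    "test_no": [120, 130, 160, 180, 100],
})

# tr_id -> test_no lookup table.
s_map = df_s.set_index("tr_id")["test_no"].to_dict()

# First "[...]" group per row (NaN when there is none), parsed into a list of ints.
bracketed = df_m["Subject"].str.extract(r"(\[.*?\])", expand=False)
df_m["Subcode"] = bracketed.apply(
    lambda x: [] if pd.isnull(x) else ast.literal_eval(x)
)

# Variant 1: silently skip ids that are not in the reference table.
df_m["Linked_Test_No"] = df_m["Subcode"].apply(
    lambda ids: [s_map[i] for i in ids if i in s_map]
)

# Variant 2: keep a None placeholder for ids that are not in the reference table.
df_m["Linked_Test_No_alt"] = df_m["Subcode"].apply(
    lambda ids: [s_map.get(i) for i in ids]
)

print(df_m[["my_id", "Subject", "Linked_Test_No", "Linked_Test_No_alt"]])
```

With this data, rows that have no brackets, empty brackets, or only unknown ids get an empty list in Linked_Test_No rather than a blank cell. If a blank is preferred, the empty lists can be replaced afterwards.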

Answered 2023-01-01
