Python / pandas: compare several columns and create a new column each time

Posted by ybzsozfc on 2024-01-04 in Python


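I have a huge table with hundreds of columns, and I have to compare them to find the differences between pairs of columns. The columns to compare are easy to identify by their suffixes "..._x" and "..._y": the base name is the same and only the suffix changes.
For example: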
| cost_x | cost_y | amount_x | amount_y | type_x | type_y |
| -- | -- | -- | -- | -- | -- |
| 1 | 1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 1 | 1 | 0 | 1 |
Each time I make a comparison, a new column with the suffix "..._change" should be created.
| cost_x | cost_y | amount_x | amount_y | type_x | type_y | cost_change | amount_change | type_change |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 1 | 1 | 0 | A | A | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | B | C | 0 | 1 | 0 |
So far I have tried to do this with function definitions in pandas:

def label_check1(row):
    if row['cost_x'] == row['cost_y']:
        return 1
    return 0

def label_check2(row):
    if row['amount_x'] == row['amount_y']:
        return 1
    return 0

def label_check3(row):
    if row['type_x'] == row['type_y']:
        return 1
    return 0

result_df['cost_change'] = result_df.apply(label_check1, axis=1)
result_df['amount_change'] = result_df.apply(label_check2, axis=1)
result_df['type_change'] = result_df.apply(label_check3, axis=1)

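Done this way, comparing 6 columns is feasible, but I have to compare approx. 85 columns, and I would like to do the comparison in some kind of loop or in a UDF. The column names are all known, and I also have a list of the columns.
Does anyone know how to do this better, with less code?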

pandas

Source: https://stackoverflow.com/questions/77654258/python-pandas-compare-several-columns-and-create-a-new-column-each-time

2 Answers


hi3rlvi2 #1

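Given the naming pattern of the columns, you can simply extract all the unique features (the part before the underscore) and use a for loop to create the new columns: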

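import pandas as pd

# Feature names are the part of each column name before the underscore
features = pd.Series(df.columns).apply(lambda s: s.split("_")[0]).unique()
# array(['cost', 'amount', 'type'], dtype=object)

# Compare each _x/_y pair and store 1 (equal) or 0 (different)
for v in features:
    df[v+"_change"] = (df[v+"_x"] == df[v+"_y"]).astype(int)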

This gives:

   cost_x  cost_y  amount_x  amount_y  type_x  type_y  cost_change  amount_change  type_change
0       1       1         1         0       1       1            1              0            1
1       1       0         1         1       0       1            0              1            0
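
The loop above assumes that every column belongs to an `_x`/`_y` pair and that base names contain no underscore. As a minimal sketch of a more defensive variant (not the answerer's code), the example below only picks base names whose `_x` and `_y` columns both exist, and builds all comparison columns before attaching them in one step. The `id` column and the sample values are assumptions added for illustration.

```python
import pandas as pd

# Sample data modelled on the question; "id" is a hypothetical extra column
# with no _x/_y partner, added to show that it is skipped.
df = pd.DataFrame({
    "id": [10, 11],
    "cost_x": [1, 1], "cost_y": [1, 0],
    "amount_x": [1, 1], "amount_y": [0, 1],
    "type_x": [1, 0], "type_y": [1, 1],
})

# Base names: columns ending in "_x" whose "_y" partner also exists.
bases = [c[:-2] for c in df.columns
         if c.endswith("_x") and c[:-2] + "_y" in df.columns]

# Build every comparison first, then attach them with a single concat
# instead of inserting columns one at a time.
changes = pd.concat(
    {b + "_change": df[b + "_x"].eq(df[b + "_y"]).astype(int) for b in bases},
    axis=1,
)
out = pd.concat([df, changes], axis=1)
print(out.filter(like="_change"))
```

Stripping only the trailing `_x` keeps base names such as `unit_cost` intact, and a single `concat` avoids growing the frame column by column when there are many pairs.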


Posted 2024-01-04

xxhby3vn #2

You can rebuild the columns as a MultiIndex, slice out x and y separately, compare them, and join the result to the original DataFrame:

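# Split "cost_x" into ("cost", "x") and move the x/y suffix to the top level
tmp = df.set_axis(df.columns.str.split('_', expand=True), axis=1).swaplevel(axis=1)

# Compare the x and y blocks column by column, then attach the results to df
out = df.join(tmp['x'].eq(tmp['y']).astype(int).add_suffix('_change'))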

Output:

   cost_x  cost_y  amount_x  amount_y  type_x  type_y  cost_change  amount_change  type_change
0       1       1         1         0       1       1            1              0            1
1       1       0         1         1       0       1            0              1            0
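
For reference, here is a self-contained sketch of the same MultiIndex idea that can be run as-is. It differs from the answer in one respect: it splits on the last underscore only (`str.rsplit` with `n=1`), so a base name that itself contains an underscore still works. The column name `unit_cost` and the sample values are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data; "unit_cost" has an underscore in its base name.
df = pd.DataFrame({
    "unit_cost_x": [1, 1], "unit_cost_y": [1, 0],
    "type_x": ["A", "B"], "type_y": ["A", "C"],
})

# Split on the last underscore only: "unit_cost_x" -> ("unit_cost", "x").
cols = df.columns.str.rsplit("_", n=1, expand=True)

# Put the x/y suffix on the top level so tmp["x"] and tmp["y"] can be sliced.
tmp = df.set_axis(cols, axis=1).swaplevel(axis=1)

# tmp["x"] and tmp["y"] share the same column labels, so eq() lines them up.
flags = tmp["x"].eq(tmp["y"]).astype(int).add_suffix("_change")
out = df.join(flags)
print(out)
```

This variant still expects every column to end in `_x` or `_y`; columns without such a suffix would need to be set aside before the split.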


Posted 2024-01-04
