pandas: How do I drop rows based on multiple conditions?

dl5txlt9  posted on 2022-12-21  in Other

I have the following **DataFrame setup**:

import pandas as pd

dic = {'customer_id': [102, 102, 105, 105, 110, 110, 111], 
       'product':['skateboard', 'skateboard', 'skateboard', 'skateboard', 'shoes', 'skateboard', 'skateboard'],
       'brand': ['Vans', 'Converse', 'Vans', 'Converse', 'Converse','Converse', 'Vans'],
      'membership': ['member', 'not-member', 'not-member', 'not-member', 'member','not-member', 'not-member']}
df = pd.DataFrame(dic)

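**Requirement:** If a customer is a "member" of any brand, I need to drop the rows whose membership is "not-member", at the customer_id and product level of granularity.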

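For example, in the DataFrame above we drop the "not-member" row for customer 102 and product "skateboard", because that customer is already a member of one brand (Vans) for that product. We do not drop anything for 105, because they are not a member of any brand. We do not drop the "not-member" row for 110, because it belongs to a different product than the member row.
So the output should contain every row of the original DataFrame except the (102, skateboard, Converse, not-member) row.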

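**My approach:** First build a list of unique customer_id + product keys (for example, 102_skateboard). Then loop over that list, filter the DataFrame down to each unique customer-product pair, and check whether the filtered DataFrame contains a member; if it does, drop the not-member rows for that pair. This gives me the expected output, but I would like to know whether there is a better way to do it.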

df['customer_product'] = df['customer_id'].astype(str) + '_' + df['product']
unique_customer_product = df['customer_product'].unique()
for pair in unique_customer_product:
    filtered_df = df[df['customer_product'] == pair]
    if 'member' in filtered_df['membership'].values:
        df = df.drop(df[(df.customer_product == pair) & (df.membership == 'not-member')].index)

0dxa2lsx  #1

Create a helper boolean column is_member, so that the condition can be expressed very directly.

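# Helper column: True for rows whose membership is 'member'.
df['is_member'] = df.membership.eq('member')
# Drop a row when it is not a member row itself, but its
# (customer_id, product) group contains at least one member row.
drop_mask = (
    ~df['is_member']
    & df.groupby(['customer_id', 'product'])['is_member'].transform('any')
)
df = df.loc[~drop_mask].reset_index(drop=True)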
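
The same mask can also be computed without leaving an is_member column in the result. The block below is only a minimal sketch of that idea on the sample data from the question; the function name `drop_nonmembers` is hypothetical and not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [102, 102, 105, 105, 110, 110, 111],
    "product": ["skateboard", "skateboard", "skateboard", "skateboard",
                "shoes", "skateboard", "skateboard"],
    "brand": ["Vans", "Converse", "Vans", "Converse",
              "Converse", "Converse", "Vans"],
    "membership": ["member", "not-member", "not-member", "not-member",
                   "member", "not-member", "not-member"],
})


def drop_nonmembers(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop 'not-member' rows from every
    (customer_id, product) group that has at least one 'member' row."""
    is_member = frame["membership"].eq("member")
    # True on every row of a group that contains at least one member row.
    group_has_member = is_member.groupby(
        [frame["customer_id"], frame["product"]]
    ).transform("any")
    # Keep member rows, plus every row of a group without any member.
    return frame.loc[is_member | ~group_has_member].reset_index(drop=True)


result = drop_nonmembers(df)
print(result)
```

Because the boolean Series is grouped directly, the input frame is left unmodified and no helper column has to be dropped afterwards.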

e3bfsja2  #2

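Here is one approach. Note that it relies on the sort in the first line: keep="last" retains, for each product, the row with the alphabetically last brand, which in the sample data happens to be the member row.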

df = df.sort_values(by=["customer_id", "product", "brand", "membership"])

df = pd.concat(
    [
        df.loc[df["customer_id"] == idx, :].drop_duplicates(["product"], keep="last")
        if not df.loc[df["customer_id"] == idx, :]
        .drop_duplicates(["product"], keep="last")
        .pipe(lambda df_: df_.loc[df_["membership"] == "member", :])
        .empty
        else df.loc[df["customer_id"] == idx, :]
        for idx in df["customer_id"].unique()
    ],
).sort_values(["customer_id", "brand"], ascending=[True, False], ignore_index=True)

Then:

print(df)
# Output
   customer_id     product     brand  membership
0          102  skateboard      Vans      member
1          105  skateboard      Vans  not-member
2          105  skateboard  Converse  not-member
3          110       shoes  Converse      member
4          110  skateboard  Converse  not-member
5          111  skateboard      Vans  not-member

c0vxltue  #3

df['row'] = True

def my_func(x):
    aaa = x[x['membership'] == 'not-member']
    if len(x[x['membership'] == 'member']) > 0 and len(aaa) > 0:
        df.loc[aaa.index, 'row'] = False

df.groupby(['customer_id', 'product']).apply(my_func)

df = df[df['row']].reset_index().drop(['index', 'row'], axis=1)

print(df)

Output

   customer_id     product     brand  membership
0          102  skateboard      Vans      member
1          105  skateboard      Vans  not-member
2          105  skateboard  Converse  not-member
3          110       shoes  Converse      member
4          110  skateboard  Converse  not-member
5          111  skateboard      Vans  not-member

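Here a helper column 'row' is created. The rows are grouped by the columns 'customer_id' and 'product', and each group is passed through apply to the my_func function. The function checks that the group contains both 'not-member' and 'member' values. If it does, loc is used to set 'row' to False for the 'not-member' rows of that group.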
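
The same two-condition check (the pair has a member, and the row itself is 'not-member') can also be written without modifying df from inside apply. The following is a rough sketch of one alternative, not the method used in this answer: it swaps in a left merge with indicator=True, and the names `keys`, `member_pairs`, `flagged` and `to_drop` are illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [102, 102, 105, 105, 110, 110, 111],
    "product": ["skateboard", "skateboard", "skateboard", "skateboard",
                "shoes", "skateboard", "skateboard"],
    "brand": ["Vans", "Converse", "Vans", "Converse",
              "Converse", "Converse", "Vans"],
    "membership": ["member", "not-member", "not-member", "not-member",
                   "member", "not-member", "not-member"],
})

keys = ["customer_id", "product"]

# Unique (customer_id, product) pairs that have at least one 'member' row.
member_pairs = df.loc[df["membership"] == "member", keys].drop_duplicates()

# Left merge: one output row per row of df; '_merge' is 'both' when the
# row's pair appears in member_pairs, 'left_only' otherwise.
flagged = df.merge(member_pairs, on=keys, how="left", indicator=True)

# Drop a row only when both conditions hold.
to_drop = (flagged["_merge"] == "both") & (flagged["membership"] == "not-member")
result = flagged.loc[~to_drop].drop(columns="_merge").reset_index(drop=True)
print(result)
```

Since member_pairs is de-duplicated on the key columns, the merge does not multiply rows, and the two conditions stay visible as two separate boolean terms.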
