How to save edited values back into a DataFrame

tyg4sfes  posted on 2021-09-08  in  Java

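I have a DataFrame that looks like this (it contains dummy data):

In each cell, I want to remove the text that appears after the "____________" identifier. I wrote the code below (logic: add a new column filled with NaN, then save the edited values in that column):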

import pandas as pd
import numpy as np

df = pd.read_excel(r'Desktop\Trial.xlsx')

NaN = np.nan
df["Body2"] = NaN

substring = "____________"

for index, row in df.iterrows():
    if substring in row["Body"]:
        split_string = row["Body"].split(substring,1)
        row["Body2"] = split_string[0]

print(df)

However, the Body2 column still shows NaN instead of the edited values.

Any help would be greatly appreciated!

vs91vp4v1#

for index, row in df.iterrows():
    if substring in row["Body"]:
        split_string = row["Body"].split(substring, 1)
        # row["Body2"] = split_string[0]  # no effect: row is a copy, use the line below instead
        df.at[index, 'Body2'] = split_string[0]

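Use at to modify the value. Here iterrows() yields each row as a separate copy, so assigning to row never reaches df, whereas df.at[index, column] writes to the DataFrame itself.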
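As a self-contained illustration of the fix, here is a minimal sketch that does not need the Excel file. The sample strings are made up, and creating Body2 with object dtype is my own assumption: recent pandas versions may warn or raise when a string is written into a float (NaN-only) column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet in the question.
df = pd.DataFrame({"Body": ["keep this____________drop this", "no separator"]})
substring = "____________"

# Object dtype so the column can hold both NaN and strings.
df["Body2"] = pd.Series(np.nan, index=df.index, dtype=object)

for index, row in df.iterrows():
    if substring in row["Body"]:
        # Write through the DataFrame; assigning to `row` would only
        # change a temporary copy of the row.
        df.at[index, "Body2"] = row["Body"].split(substring, 1)[0]

print(df["Body2"].tolist())
```

Rows without the separator keep NaN in Body2, which matches the logic described in the question.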

vs91vp4v2#

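Don't iterate over the rows; apply the operation to all rows at once. You can use expand to split the values into multiple columns, which I think is what you want.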

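import pandas as pd

substring = "____________"
df = pd.DataFrame({'Body': ['a____________b', 'c____________d', 'e____________f', 'gh']})
# n=1 splits only at the first separator, so there are never more than two columns
df[['Body1', 'Body2']] = df['Body'].str.split(substring, n=1, expand=True)
print(df)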

#              Body Body1 Body2
# 0  a____________b     a     b
# 1  c____________d     c     d
# 2  e____________f     e     f
# 3              gh    gh  None
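If only the text before the separator is needed, as in the question, a variation on the same vectorized idea might look like the sketch below. The sample data is hypothetical, and leaving NaN in rows that have no separator is an assumption taken from the question's loop, not something this answer states.

```python
import pandas as pd

substring = "____________"
# Hypothetical sample data; the second value contains the separator twice.
df = pd.DataFrame({"Body": ["a____________b", "c____________d____________e", "gh"]})

# Mark rows that contain the separator (regex=False treats it as literal text).
has_sep = df["Body"].str.contains(substring, regex=False)

# Split once, keep the part before the separator, and leave NaN where there is none.
df["Body2"] = df["Body"].str.split(substring, n=1).str[0].where(has_sep)

print(df)
```

Dropping the .where(has_sep) call would instead copy the full original text into Body2 for rows without a separator.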
