pandas: assigning new values to rows with iloc and loc produces different results. How can I avoid the SettingWithCopyWarning while keeping the iloc behavior?

Posted by pgvzfuti on 2022-11-20 in Other



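I currently have a DataFrame with shape (16280, 13). I want to assign values to specific rows in a single column. I was originally doing this with the following code: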

for idx, row in enumerate(df.to_dict('records')):
    instances = row['instances']
    labels = row['labels'].split('|')

    for instance in instances:
        if instance not in relevant_labels:
            labels = ['O' if instance in l else l for l in labels]

        df.iloc[idx]['labels'] = '|'.join(labels)

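But it kept returning a SettingWithCopyWarning because of the last line. I tried changing it to df.loc[idx, 'labels'] = '|'.join(labels), which no longer raises the warning but causes errors later in my code.
I noticed that the DataFrame has shape (16280, 13) when using iloc and (16751, 13) when using loc.
How can I prevent the warning from being printed and get the same behavior as with iloc?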

pandas

Source: https://stackoverflow.com/questions/74383862/assigning-new-values-to-rows-with-iloc-and-loc-produce-different-results-how-do

2 Answers


g6ll5ycj1#

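There are several things that can be improved here.
First, try not to loop over a DataFrame; use the tools the pandas package provides instead. If a loop cannot be avoided, iterating over the rows is better done with the .iterrows() method than with .to_dict(). Keep in mind that with iterrows you should not modify the DataFrame while iterating over it.
Next, about iloc versus loc. loc works with key names (like a dictionary), whereas iloc works with integer positions (like an array). Here idx is a position, not a key name, so df.loc[idx, 'labels'] will misbehave whenever the key names differ from the positions. The two can be chained like this: df.iloc[idx, :].loc['labels'].
To illustrate the difference between loc and iloc: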

import pandas as pd

df_example = pd.DataFrame({"a": [1, 2, 3, 4],
                           "b": ['a', 'b', 'a', 'b']},
                          index=[0, 1, 3, 5])

print(df_example.loc[0] == df_example.iloc[0])  # label 0 is at position 0: same row
print(df_example.loc[1] == df_example.iloc[1])  # label 1 is at position 1: same row
try:
    print(df_example.loc[2] == df_example.iloc[2])  # 2 is not a label: loc raises KeyError
except KeyError:
    pass
print(df_example.loc[3] == df_example.iloc[3])  # label 3 is at position 2, position 3 holds label 5: different rows
try:
    print(df_example.loc[5] == df_example.iloc[5])  # label 5 exists, but there is no position 5: iloc raises IndexError
except IndexError:
    pass

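Remember that chained indexing may return a copy of the data rather than a view, which is why both df.iloc[idx]['labels'] and df.iloc[idx, :].loc['labels'] trigger the warning when assigned to. If labels is the i-th column, df.iloc[idx, i] does not trigger the warning.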
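The advice to avoid per-row assignment can be sketched as follows. This is only an illustration under assumptions: the sample data, the contents of relevant_labels, and the helper name mask_labels are hypothetical and not taken from the question. The idea is to compute every new value first and then assign the whole column once, so neither chained indexing nor row labels are involved.

```python
import pandas as pd

# Hypothetical data mirroring the question's columns. The index labels are
# deliberately not 0..n-1, as can happen after a train/test split.
df = pd.DataFrame(
    {
        "instances": [["cat"], ["dog", "car"], []],
        "labels": ["B-cat|I-cat", "B-dog|B-car", "B-tree"],
    },
    index=[10, 42, 7],
)
relevant_labels = {"dog"}  # assumed: instances whose labels should be kept


def mask_labels(instances, labels):
    """Replace every label that mentions a non-relevant instance with 'O'."""
    parts = labels.split("|")
    for instance in instances:
        if instance not in relevant_labels:
            parts = ["O" if instance in part else part for part in parts]
    return "|".join(parts)


# Build the full list of new values, then assign the column in one step.
df["labels"] = [
    mask_labels(instances, labels)
    for instances, labels in zip(df["instances"], df["labels"])
]
print(df)
```

Because the assignment replaces the column as a whole, the number of rows cannot change, whatever the index looks like.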

Posted 2022-11-20

xkrw2x1b2#

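Note that in your case the SettingWithCopyWarning is a valid warning, because the chained assignment does not work as expected. df.iloc[idx] returns a copy of the row rather than a view into the original object, so df.iloc[idx]['labels'] = '|'.join(labels) modifies that copy and not the row of the original df. This seems to happen when the DataFrame has mixed dtypes.
As for the different results of .loc and .iloc: the row labels differ from the row integer positions (possibly because of a train/test split). .loc selects rows (and/or columns) by label, while .iloc selects them by integer position. When a row label does not exist, .loc cannot find it among the existing rows, so the assignment creates a new row.
Examples are given after the solutions.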

Solution

The basic idea: avoid chained assignment, and use the correct labels or integer positions.

Solution 1: `reset_index` and `.loc`

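If you do not need to keep the row index, one solution is to call reset_index before your code and then use df.loc[idx, 'labels'] = '|'.join(labels).
This makes the DataFrame's row labels identical to the row integer positions, so .loc[n, 'labels'] refers to the same cell as .iloc[n, col_idx], where col_idx is the integer position of the labels column.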
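A minimal sketch of Solution 1, using a small made-up DataFrame. Passing drop=True is an assumption about what is wanted here: without it, reset_index keeps the old labels as an extra column, which would change the number of columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": [1, 2, 3, 4]},
    index=[0, 2, 4, 5],
)

# drop=True discards the old labels instead of storing them in a new column.
df = df.reset_index(drop=True)

# The labels are now 0..3, so a position from enumerate() is also a valid label.
df.loc[3, "labels"] = 100
print(df)
```

After the reset, .loc updates the existing fourth row and no new row is appended.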

Solution 2: use the integer position of the `labels` column with `.iloc`

Example: update labels in the 4th row to 100

col_idx = df.columns.get_loc("labels")  # get the column integer locations of 'labels'
df.iloc[3, col_idx] = 100
df

    instances   labels
0           a        1
2           b        2
4           c        3
5           d      100
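Applied to a loop shaped like the one in the question, Solution 2 might look like the sketch below. The data and the replace("x", "O") transformation are placeholders chosen for illustration, not the asker's actual logic.

```python
import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": ["x|y", "y", "x", "z|x"]},
    index=[0, 2, 4, 5],
)

col_idx = df.columns.get_loc("labels")  # integer position of the column

# to_dict("records") builds a separate list up front, so writing to df
# inside the loop does not disturb the iteration.
for idx, row in enumerate(df.to_dict("records")):
    new_value = row["labels"].replace("x", "O")  # placeholder transformation
    # A single indexing call with (row position, column position): no chaining.
    df.iloc[idx, col_idx] = new_value

print(df)
```

Since both coordinates are integer positions, the row labels (0, 2, 4, 5) play no role and the shape stays the same.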

More examples

Example of a valid `SettingWithCopyWarning`

import pandas as pd

df = pd.DataFrame({'instances': ["a", "b", "c", "d"],
                   'labels': [1, 2, 3, 4]},
                   index=[0, 2, 4, 5])
df

    instances   labels
0           a        1
2           b        2
4           c        3
5           d        4

Suppose I want to update labels in the first row to 100.

df.iloc[0]['labels'] = 100
df

It returns the warning and fails to update the value.

/usr/local/lib/python3.7/dist-packages/pandas/core/series.py:1056: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cacher_needs_updating = self._check_is_chained_assignment_possible()

    instances   labels
0           a        1
2           b        2
4           c        3
5           d        4

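If all columns share the same dtype (for example all str or all int), the chained iloc assignment works and does not return a SettingWithCopyWarning. Apparently pandas handles mixed-dtype and single-dtype DataFrames differently when doing chained assignment; this is an implementation detail and should not be relied on. See this post, which refers to this GitHub issue.
You can also read this post or the pandas documentation to better understand chained assignment.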

Example of `.loc` adding a row

df

    instances   labels
0           a        1
2           b        2
4           c        3
5           d        4

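In our example the row labels are (0, 2, 4, 5), while the row integer positions are (0, 1, 2, 3). When you assign through .loc with a label that does not exist, it creates a new row.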

df.loc[1, 'labels'] = 100
df

    instances   labels
0           a        1
2           b        2
4           c        3
5           d        4
1         NaN      100
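One further way to avoid the extra row, not part of the answer above, is to translate the integer position into its row label before calling .loc. The sketch below assumes the index labels are unique; with duplicate labels, .loc would update every matching row.

```python
import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": [1, 2, 3, 4]},
    index=[0, 2, 4, 5],
)

idx = 1                    # integer position, e.g. from enumerate()
row_label = df.index[idx]  # the label stored at that position (here 2)

# Only safe when labels are unique; otherwise .loc would hit several rows.
assert df.index.is_unique
df.loc[row_label, "labels"] = 100
print(df)
```

Here the second row is updated in place and the DataFrame keeps its four rows.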

Posted 2022-11-20
