Python pandas fills a DataFrame column with NaN even though the input is a column from another DataFrame

Posted by dgiusagp on 2024-01-04 in Python


The code is as follows:

import pandas as pd
text =  pd.DataFrame(["it", "never", "forget", "it", "hello", "listener's", "books", "at", "cya", "in", "the", "village", "deliberate", "mistake", "hello", "again", "i'd", "seen", "the", "thing", "and", "i'd", "love", "to", "check"])

c_mask = text[0] == "i'd"
v_mask = c_mask.shift(fill_value=False)

check_c = pd.DataFrame()
check_c["contractions"] = text[c_mask]
check_c["followup"] = text[v_mask]
print(check_c)

Out[46]
   contractions followup
16          i'd      NaN
21          i'd      NaN

I cannot figure this out at all!

check_c["contractions"] = text[c_mask]

check_c["followup"] = text[v_mask]

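As far as I can tell, these two lines are identical. Moreover, creating the "followup" column first and "contractions" second makes "followup" fill correctly and "contractions" come out as NaN. I thought it might be an index problem, but calling .reset_index() did not help, and neither did converting the second selection to a Series before adding it as a column. Can anyone explain what is happening and why?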

pandas

Source: https://stackoverflow.com/questions/77727271/python-pandas-module-fills-a-column-in-a-dataframe-with-nan-even-though-the-inp

3 Answers


6xfqseft1#

I managed to solve the problem by editing the second line:

check_c["followup"] = text.loc[v_mask,0].values

I think this is an index problem, but I am still not sure. If anyone can explain what is actually happening there, I would be very grateful.
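A minimal sketch of why `.values` changes the outcome, assuming a shortened word list invented for illustration: `.values` returns a plain NumPy array that carries no index, so pandas has nothing to align on and places the values by position.

```python
import pandas as pd

# Shortened, made-up word list (an assumption for illustration only)
text = pd.DataFrame(["a", "i'd", "seen", "b", "i'd", "love"])
c_mask = text[0] == "i'd"
v_mask = c_mask.shift(fill_value=False)

check_c = pd.DataFrame()
check_c["contractions"] = text.loc[c_mask, 0]   # frame index becomes 1, 4

followup = text.loc[v_mask, 0]                  # Series with index 2, 5
positional = followup.values                    # NumPy array, index discarded

# With no labels to align on, the values are written row by row
check_c["followup"] = positional
print(check_c)
```

Positional assignment only works when the array has exactly as many elements as the frame has rows.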

Answered 2024-01-04

kx7yvsdv2#

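The problem is mismatched indices. Because the mask is shifted, text[v_mask] has index 17 and 22, while check_c already has index 16 and 21 from the first assignment. Column assignment aligns on index labels, and 17 and 22 do not exist in check_c, so the new column is filled with NaN.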

print(text[v_mask])
       0
17  seen
22  love

print(text[c_mask])
      0
16  i'd
21  i'd
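The alignment described above can be reproduced on a smaller scale. This is a sketch under the assumption of a five-word Series made up for illustration; the names `left`, `right`, `frame` and `shared` are not from the question.

```python
import pandas as pd

# Made-up five-word example (an assumption, not the question's data)
words = pd.Series(["i'd", "seen", "and", "i'd", "love"])
c_mask = words == "i'd"
v_mask = c_mask.shift(fill_value=False)

left = words[c_mask]     # labels 0 and 3
right = words[v_mask]    # labels 1 and 4

frame = pd.DataFrame()
frame["contractions"] = left   # the first column assigned fixes the index: 0, 3
frame["followup"] = right      # aligned by label; 1 and 4 are not in the index

# The two selections share no labels, so nothing can be matched up
shared = left.index.intersection(right.index)
print(frame)
print(len(shared))
```

This also accounts for the order dependence noted in the question: whichever column is assigned first defines the index, and the other one fails to align against it.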

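Another problem is that text[v_mask] is a one-column DataFrame, so it does not produce a 1-D array. To assign values the way your solution does, you need a Series: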

print(text[v_mask].to_numpy())
[['seen']
 ['love']]

print(text.loc[v_mask, 0].to_numpy())
['seen' 'love']
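The difference is easiest to see in the array shapes. A short sketch, assuming a four-word frame and a hand-written mask chosen only for illustration:

```python
import pandas as pd

# Made-up data and mask (assumptions for illustration)
text = pd.DataFrame(["i'd", "seen", "i'd", "love"])
mask = pd.Series([False, True, False, True])

as_frame = text[mask].to_numpy()          # rows filtered, column axis kept
as_series = text.loc[mask, 0].to_numpy()  # one column selected, 1-D result

print(as_frame.shape)   # (2, 1)
print(as_series.shape)  # (2,)
```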

If "i'd" is the last value in the column, your solution fails, because the array then contains only one element and a ValueError is raised:

text =  pd.DataFrame(["it", "never", "forget", "it", "hello", "listener's",
                      "books", "at", "cya", "in", "the", "village", "deliberate", 
                      "mistake", "hello", "again", "i'd", "seen", "the", "thing",
                      "and", "i'd"])

c_mask = text[0] == "i'd"
v_mask = c_mask.shift(fill_value=False)

print (text.loc[v_mask,0].values)
['seen']

check_c = pd.DataFrame()
check_c["contractions"] = text[c_mask]
check_c["followup"] = text.loc[v_mask,0].values

ValueError: Length of values (1) does not match length of index (2)
I suggest shifting first and then filtering:

c_mask = text[0] == "i'd"

check_c = pd.DataFrame()
check_c["contractions"] = text.loc[c_mask, 0]
check_c["followup"] = text.shift(-1).loc[c_mask, 0]
print(check_c)
   contractions followup
16          i'd     seen
21          i'd     None
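The shift-then-filter idea can be wrapped in a small helper. This is only a sketch: `following_words` is a hypothetical name, not part of pandas or of the answer, and the sample words are made up.

```python
import pandas as pd

def following_words(words: pd.Series, target: str) -> pd.DataFrame:
    """Pair each occurrence of `target` with the word right after it.

    Hypothetical helper for illustration; not a pandas API.
    """
    mask = words == target
    # shift(-1) moves every word up one row, so row i holds the word from
    # row i + 1; the last row has no successor and becomes a missing value.
    nxt = words.shift(-1)
    # Both columns are filtered with the same mask, so their indices match.
    return pd.DataFrame({"contractions": words[mask], "followup": nxt[mask]})

# Made-up sample in which the target is also the last word
words = pd.Series(["again", "i'd", "seen", "and", "i'd"])
pairs = following_words(words, "i'd")
print(pairs)
```

Because both columns come from the same mask, the result has one row per match even when the target is the final word, which is the case that breaks the positional `.values` approach.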


Answered 2024-01-04

kmbjn2e33#

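The issue is that the shift method returns a Series, and the resulting Series has the same data type as the original. In your case, both c_mask and v_mask are pandas Series of boolean values.
So you can apply the shift directly to the new column and filter afterwards. See the corrected code below.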

import pandas as pd

text = pd.DataFrame(["it", "never", "forget", "it", "hello", "listener's", "books", "at", "cya", "in", "the", "village", "deliberate", "mistake", "hello", "again", "i'd", "seen", "the", "thing", "and", "i'd", "love", "to", "check"])

c_mask = text[0] == "i'd"
print(c_mask)

check_c = pd.DataFrame()
print(check_c)
check_c["contractions"] = text[0]
print(check_c)

# Shift the words up by one row so each row holds the word that follows it;
# note that fill_value=False puts False in the last row, which has no successor
check_c["followup"] = text[0].shift(periods=-1, fill_value=False)

# Filter rows where "i'd" is present in "contractions"
result_df = check_c.loc[c_mask]

print(result_df)


Answered 2024-01-04
