pandas: Why does changing the column types cause this error?

ryevplcw  posted on 2023-03-11  in Other
import pandas as pd
import numpy as np

df = pd.read_csv("dirtydata.csv")
dfn = df.convert_dtypes()
bike_sales_ds = dfn.copy()

# Create new age column with general age range groups
age_conditions = [
    (bike_sales_ds['Age'] <= 30),
    (bike_sales_ds['Age'] >= 31) & (bike_sales_ds['Age'] <= 40),
    (bike_sales_ds['Age'] >= 41) & (bike_sales_ds['Age'] <= 55),
    (bike_sales_ds['Age'] >= 56) & (bike_sales_ds['Age'] <= 69),
    (bike_sales_ds['Age'] >= 70)
]
age_choices = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']

bike_sales_ds['Age_Range'] = np.select(age_conditions, age_choices, default='error')

The dataset I'm working from
I didn't create this dataset. I got it a while ago from a YouTube video, and the video was not about pandas.

Error:

Traceback (most recent call last):
  File "C:\Users\dmcfa\PycharmProjects\自行车销售数据清理01\main.py", line 43, in <module>
    bike_sales_ds['Age_Range'] = np.select(age_conditions, age_choices, default=0)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: invalid entry 0 in condlist: should be boolean ndarray

This avoids the error:

df.convert_dtypes(convert_integer=False)

But what causes this in the first place? df.info() says that the column is an Int64 whether or not I use df.convert_dtypes().

9o685dep  #1

Your code works fine with my input DataFrame. However, you can try pd.cut to check whether the problem persists:

age_conditions = [0, 30, 40, 55, 69, np.inf]
age_choices = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']

bike_sales_ds['Age_Range'] = pd.cut(bike_sales_ds['Age'],
                                    bins=age_conditions,
                                    labels=age_choices)

Output:

>>> bike_sales_ds
    Age    Age_Range
0    87  70 or Older
1    25   30 or Less
2    70  70 or Older
3    55     41 to 55
4    33     31 to 40
..  ...          ...
95   89  70 or Older
96   79  70 or Older
97   67     56 to 69
98   71  70 or Older
99   78  70 or Older

[100 rows x 2 columns]
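
One caveat about the bin edges: pd.cut builds right-closed intervals by default, so with bins starting at 0 the first interval is (0, 30] and an age of exactly 0 gets no label. The sketch below uses a small made-up Series (not the asker's data) to show the difference that include_lowest=True makes.

```python
import numpy as np
import pandas as pd

# Made-up ages chosen to hit the bin edges; not taken from the asker's CSV.
ages = pd.Series([0, 30, 31, 70])

bins = [0, 30, 40, 55, 69, np.inf]
labels = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']

# Default: intervals are (0, 30], (30, 40], ... so 0 falls outside every bin.
default_cut = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is labelled.
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({'Age': ages,
                    'default': default_cut,
                    'include_lowest': inclusive_cut}))
```

If the real data never contains an age of 0, both calls give the same labels.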

Input:

import pandas as pd
import numpy as np

np.random.seed(2023)
bike_sales_ds = pd.DataFrame({'Age': np.random.randint(0, 100, 100)})
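
Note that this input is a plain NumPy int64 column, so it never goes through convert_dtypes(). A likely explanation for the original error, which the thread itself does not confirm, is that convert_dtypes() turns Age into the nullable Int64 extension dtype, that comparisons on such a column return the nullable boolean dtype rather than NumPy bool, and that np.select rejects any condition whose converted array is not of NumPy bool dtype. Whether the conversion actually fails seems to depend on the installed pandas and NumPy versions. The following is a minimal sketch under those assumptions, using a small made-up DataFrame in place of the CSV, that inspects the dtypes and converts each mask explicitly before calling np.select.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV data; the real file is not available here.
df = pd.DataFrame({'Age': [25, 33, 55, 67, 87]}).convert_dtypes()
ages = df['Age']
print(ages.dtype)  # nullable integer extension dtype

masks = [
    ages <= 30,
    (ages >= 31) & (ages <= 40),
    (ages >= 41) & (ages <= 55),
    (ages >= 56) & (ages <= 69),
    ages >= 70,
]
print(masks[0].dtype)  # nullable boolean extension dtype, not NumPy bool

# Convert each mask to a plain NumPy bool array; missing ages become False
# and therefore fall through to the default label.
conditions = [m.to_numpy(dtype=bool, na_value=False) for m in masks]

labels = ['30 or Less', '31 to 40', '41 to 55', '56 to 69', '70 or Older']
df['Age_Range'] = np.select(conditions, labels, default='error')
print(df)
```

Under the same assumptions, this would also explain why convert_dtypes(convert_integer=False) avoids the error: the column stays NumPy int64, so the comparisons produce ordinary bool arrays.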
