I am trying to create a new column in a dataset (a CSV file) that combines the contents of existing columns.
import numpy as np
import pandas as pd
df = pd.read_csv('books.csv', encoding='unicode_escape', error_bad_lines=False)
#List of columns to keep
columns = ['title', 'authors', 'publisher']
#Function to combine the columns/features
def combine_features(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append(data['title'][i] + ' ' + data['authors'][i] + ' ' + data['publisher'][i])
        return features
#Column to store the combined features
df['combined_features'] = combine_features(df)
#Show data
df
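I expected the new column to be built from the title, authors, and publisher together, but instead I got the error "ValueError: Length of values (1) does not match length of index (11123)".
To fix this, I tried the command "df.reset_index(inplace=True, drop=True)", which was a suggested solution, but it did not work and I still get the same error.
Here is the full error message: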
ValueError Traceback (most recent call last)
<ipython-input-24-40cc76d3cd85> in <module>
1 #Create a column to store the combined features
----> 2 df['combined_features'] =combine_features(df)
3 df
3 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __setitem__(self, key, value)
3610 else:
3611 # set column
-> 3612 self._set_item(key, value)
3613
3614 def _setitem_slice(self, key: slice, value):
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _set_item(self, key, value)
3782 ensure homogeneity.
3783 """
-> 3784 value = self._sanitize_column(value)
3785
3786 if (
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _sanitize_column(self, value)
4507
4508 if is_list_like(value):
-> 4509 com.require_length_match(value, self.index)
4510 return sanitize_array(value, self.index, copy=True, allow_2d=True)
4511
/usr/local/lib/python3.8/dist-packages/pandas/core/common.py in require_length_match(data, index)
529 """
530 if len(data) != len(index):
--> 531 raise ValueError(
532 "Length of values "
533 f"({len(data)}) "
ValueError: Length of values (1) does not match length of index (11123)
1 Answer
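The cause is that the return statement in the function should not be inside the for loop. Because it is inside the loop, the function returns after the first iteration, so the list it returns has length 1 instead of 11123. Unindent the return statement by one level so that it runs only after the loop has finished.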
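As a minimal sketch of that fix, the function below has the return moved out of the loop. The three-row DataFrame is made-up sample data standing in for books.csv, which is not available here. The last line shows a vectorized alternative that is not part of the original answer; it assumes all three columns hold strings with no missing values.

```python
import pandas as pd

# Hypothetical sample rows standing in for books.csv
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "authors": ["Author One", "Author Two", "Author Three"],
    "publisher": ["Pub X", "Pub Y", "Pub Z"],
})

def combine_features(data):
    features = []
    for i in range(data.shape[0]):
        features.append(
            data["title"][i] + " " + data["authors"][i] + " " + data["publisher"][i]
        )
    # return is outside the loop, so it runs once, after every row is collected
    return features

# The list now has one entry per row, so the assignment succeeds
df["combined_features"] = combine_features(df)

# Alternative (an assumption, not from the answer): concatenate whole columns
df["combined_vectorized"] = df["title"] + " " + df["authors"] + " " + df["publisher"]
```

With the return at function level, the returned list has as many entries as the DataFrame has rows, which is what the column assignment requires. The column-wise version avoids the explicit loop, but it yields NaN for any row where one of the three columns is missing.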