python-3.x: How do I read a CSV file passed as a function argument into a pandas DataFrame?

Posted by oxiaedzo on 2023-11-20 in Python

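I wrote a function that computes log returns for a set of price series. It takes a CSV file name as its argument and should return the log returns of the data in that file. The CSV file already exists on my machine. Below is the code I am trying to run.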

import pandas as pd
import numpy as np

def portfolio_log_returns(portfolio):
    dataset = pd.read_csv(portfolio)
    log_returns = pd.DataFrame(columns=dataset.columns)

    for col in dataset.columns:
        log_returns[col] = np.log(dataset[col]/dataset[col].shift(1))
    
    log_returns = log_returns.dropna()
    return log_returns

log_returns_df = portfolio_log_returns('some_csv_file.csv')

When I try to run the code, I get the following error:

log_returns_df = portfolio_log_returns('some_csv_file.csv')
Traceback (most recent call last):

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:171 in _na_arithmetic_op
result = func(left, right)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\computation\expressions.py:239 in evaluate
return _evaluate(op, op_str, a, b)  # type: ignore[misc]

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\computation\expressions.py:128 in _evaluate_numexpr
result = _evaluate_standard(op, op_str, a, b)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\computation\expressions.py:70 in _evaluate_standard
return op(a, b)

TypeError: unsupported operand type(s) for /: 'str' and 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  Cell In[4], line 1
log_returns_df = portfolio_log_returns('some_csv_file.csv')

  Cell In[1], line 9 in portfolio_log_returns
log_returns[col] = np.log(dataset[col]/dataset[col].shift(1))

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\ops\common.py:81 in new_method
return method(self, other)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\arraylike.py:210 in __truediv__
return self._arith_method(other, operator.truediv)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\series.py:6112 in _arith_method
return base.IndexOpsMixin._arith_method(self, other, op)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\base.py:1348 in _arith_method
result = ops.arithmetic_op(lvalues, rvalues, op)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:232 in arithmetic_op
res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:178 in _na_arithmetic_op
result = _masked_arith_op(left, right, op)

  File D:\Users\Mahmoud\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:116 in _masked_arith_op
result[mask] = op(xrav[mask], yrav[mask])

TypeError: unsupported operand type(s) for /: 'str' and 'str'


What is the problem, and how can I fix it?

Answer 1 (3phpmpom)

I was able to reproduce your problem with a CSV containing the following data:

value1,value2,value3
1,2.2,3.3
4,5.5,"6.6"
8,"9.9a",10.1
28,"9.97",10.16
38,"19.9",106.1

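Because pandas cannot convert "9.9a" to a number, it reads the whole column as object dtype instead of int64 or float64. Dividing such a column by its shifted copy then divides strings, which raises the TypeError in your traceback. You can find out which column is affected by adding print(col, dataset[col].dtype) inside the loop.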
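To go one step beyond printing dtypes, the sketch below lists the non-numeric columns and the exact raw values that fail conversion. It is only an illustration: it reads the sample data from an in-memory string via io.StringIO instead of a real file, and the names csv_text, non_numeric_cols and bad_values are made up for this example.

```python
import io

import pandas as pd

# Sample data copied from the answer; "9.9a" is the value that cannot be parsed.
csv_text = '''value1,value2,value3
1,2.2,3.3
4,5.5,"6.6"
8,"9.9a",10.1
28,"9.97",10.16
38,"19.9",106.1
'''

# io.StringIO stands in for the CSV file on disk (assumption for this sketch).
dataset = pd.read_csv(io.StringIO(csv_text))

# Columns that pandas did not parse as a numeric dtype.
non_numeric_cols = [
    col for col in dataset.columns
    if not pd.api.types.is_numeric_dtype(dataset[col])
]

# For each such column, collect the raw values that fail numeric conversion:
# they become NaN under errors="coerce" although the original cell was not missing.
bad_values = {}
for col in non_numeric_cols:
    converted = pd.to_numeric(dataset[col], errors="coerce")
    failed = converted.isna() & dataset[col].notna()
    bad_values[col] = dataset.loc[failed, col].tolist()

print(non_numeric_cols)
print(bad_values)
```

Knowing the offending values up front lets you decide whether to fix the file at the source or to coerce them to NaN.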
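As suggested in this answer, if you just want to skip such values, you can apply pd.to_numeric to the offending column, or to every column, since you run the same operation on all of them anyway. With errors='coerce', any value that cannot be parsed becomes NaN, and the later dropna() call removes the affected rows. A working solution might look like this: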

import pandas as pd
import numpy as np

def portfolio_log_returns(portfolio):
    dataset = pd.read_csv(portfolio)
    log_returns = pd.DataFrame(columns=dataset.columns)

    for col in dataset.columns:
        # print(col, dataset[col].dtype)
        dataset[col] = pd.to_numeric(dataset[col], errors='coerce')
        log_returns[col] = np.log(dataset[col]/dataset[col].shift(1))
    
    log_returns = log_returns.dropna()
    return log_returns

log_returns_df = portfolio_log_returns('some_csv_file.csv')
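The same idea can also be written without the explicit loop. The following is a minimal sketch, not a replacement for the answer's code: portfolio_log_returns_vectorized is a hypothetical name, and the data is again read from an in-memory string rather than from some_csv_file.csv.

```python
import io

import numpy as np
import pandas as pd


def portfolio_log_returns_vectorized(portfolio):
    """Hypothetical loop-free variant of portfolio_log_returns."""
    dataset = pd.read_csv(portfolio)
    # Coerce every column to numeric; unparseable cells become NaN.
    numeric = dataset.apply(pd.to_numeric, errors="coerce")
    # Log return of each row relative to the previous row, for all columns at once.
    log_returns = np.log(numeric / numeric.shift(1))
    # Drop every row that contains at least one NaN.
    return log_returns.dropna()


# Sample data from the answer, used here in place of a file on disk.
csv_text = '''value1,value2,value3
1,2.2,3.3
4,5.5,"6.6"
8,"9.9a",10.1
28,"9.97",10.16
38,"19.9",106.1
'''

result = portfolio_log_returns_vectorized(io.StringIO(csv_text))
print(result)
```

Note that a single coerced value costs two rows of returns in addition to the first row: the row where the bad value is the numerator and the next row where it is the denominator. With the sample data only the rows at index 1 and 4 survive dropna(), so it can be worth checking how many cells were coerced before relying on the result.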
