Pandas to_gbq error caused by type inconsistency between macOS and Windows

brc7rcf0  posted on 2023-03-21  in Mac

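My Python/Pandas code runs fine on macOS, but after moving it to Windows it no longer works because of type differences, and I get an error when trying to write to gbq (Google BigQuery).
The code looks like this: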

import math


def formatNumber(x):
    # NaN becomes the float 0.0; any other value becomes a string rounded to 8 places
    if math.isnan(x):
        f_number = 0.0
    else:
        f_number = str(round(x, 8))

    return f_number

... <reading df from file> ...

print("A")
print(df.info())
df['Date'] = [x.date().strftime("%Y-%m-%d") for x in df['Date']]
df['A'] = [formatNumber(x) for x in df['A']]

# drop duplicates
print(df.shape)
df = df.drop_duplicates()
print(df.shape)

# upload to bigquery
print("B")
print(df.info())

table_schema = [{
    'name': 'Date',
    'type': 'date'
}, {
    'name': 'A',
    'type': 'numeric'
}, {
    'name': 'B',
    'type': 'string'
}]

df.to_gbq('tablename',
          'dbname',
          chunksize=None,
          if_exists='replace',
          table_schema=table_schema,
          credentials=credentials)

The output is:

A
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82624 entries, 0 to 9
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Date                     82624 non-null  datetime64[ns]
 1   A                        82624 non-null  float64
 2   B                        80769 non-null  object
 ...

dtypes: datetime64[ns](1), float64(6), object(6)
memory usage: 8.8+ MB
None
(82624, 13)
(82624, 13)

[5 rows x 13 columns]
B
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82624 entries, 0 to 9
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Date                     82624 non-null  datetime64[ns]
 1   A                        82624 non-null  object
 2   B                        80769 non-null  object
 ...

dtypes: datetime64[ns](1), float64(6), object(6)
memory usage: 8.8+ MB

Error message:

File "pyarrow\array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
  File "pyarrow\array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'datetime.time' object
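
One way to narrow down an error like this is to list which Python types actually sit inside each object-dtype column, since pyarrow is complaining about a `datetime.time` value somewhere in the frame. The following is only a minimal diagnostic sketch: the helper name `object_column_types` and the sample frame are hypothetical and are not taken from the original code.

```python
import datetime as dt

import pandas as pd


def object_column_types(df):
    """Map each object-dtype column to the set of Python type names it holds."""
    report = {}
    for col in df.columns:
        if df[col].dtype == object:
            # Ignore missing values so only real payload types are reported
            report[col] = {type(v).__name__ for v in df[col].dropna()}
    return report


# Hypothetical sample data standing in for the real DataFrame
sample = pd.DataFrame({
    "B": pd.Series(["x", None, "y"], dtype=object),
    "T": pd.Series([dt.time(9, 30), dt.time(10, 0), None], dtype=object),
    "A": [1.0, 2.0, 3.0],
})

types_found = object_column_types(sample)
print(types_found)
```

Running the same helper on the real `df` just before `to_gbq` should show which column carries `time` objects rather than strings.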

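Another difference I noticed between running it on macOS and on Windows is that the index changes on macOS, while nothing changes on Windows.
macOS:

  • A --> Int64Index: 82624 entries, 0 to 1015
  • B --> RangeIndex: 1016 entries, 0 to 1015

Windows:

  • A and B --> Int64Index: 82624 entries, 0 to 9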
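
An index that reports 82624 entries but runs only "0 to 9" usually means the row labels repeat, which is what happens when several smaller frames are stacked without renumbering. How `df` was actually built is elided in the question, so the sketch below is an assumption: it uses hypothetical 10-row parts to show the effect and how `reset_index(drop=True)` yields a fresh RangeIndex.

```python
import pandas as pd

# Hypothetical stand-in for several input chunks of 10 rows each
parts = [pd.DataFrame({"A": range(10)}) for _ in range(3)]

# Plain concat keeps each part's own labels 0..9, so labels repeat
stacked = pd.concat(parts)
print(len(stacked), stacked.index[0], stacked.index[-1])
print(stacked.index.is_unique)

# Renumber the rows 0..n-1 and discard the old labels
clean = stacked.reset_index(drop=True)
print(clean.index)
```

Passing `ignore_index=True` to `pd.concat` gives the same renumbering in one step.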
jckbn6z7  #1

Try changing

df['Date'] = [x.date().strftime("%Y-%m-%d") for x in df['Date']]

to

df['Date'] = [x.date().strftime("%Y-%m-%d %Z") for x in df['Date']]

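The error you are getting indicates a type incompatibility between a datetime.time object and the expected bytes type. This may be caused by differences in how the strftime() method of datetime objects behaves on macOS and Windows.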
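
To see what the two format strings produce, they can be tried on a single value using only the standard library. This is a small sketch with a hypothetical naive timestamp, not the data from the question. For a value with no time zone information, `%Z` expands to an empty string, so both calls return plain `str` objects.

```python
from datetime import datetime

# Hypothetical naive timestamp (no time zone attached)
stamp = datetime(2023, 3, 21, 14, 5)

plain = stamp.date().strftime("%Y-%m-%d")
with_zone = stamp.date().strftime("%Y-%m-%d %Z")

print(repr(plain))
print(repr(with_zone))
print(type(plain).__name__, type(with_zone).__name__)
```

If both results are strings on the Windows machine as well, the `datetime.time` object in the error most likely comes from a different column.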
