pandas python - Convert a pandas DataFrame string column to bigquery.SchemaField "TIMESTAMP"

x8diyxa7  posted on 2023-01-07  in  Python

I am trying to load a BigQuery table from a Python pandas DataFrame.
The CSV file contains the following:

t_time
2023-01-01 07:20:54.272000 UTC
2023-01-02 04:22:26.914000 UTC
2023-01-03 04:32:38.663000 UTC

The BigQuery table has a column t_time with data type TIMESTAMP.
Schema: bigquery.SchemaField("t_time", "TIMESTAMP", mode="NULLABLE")
Code snippet:

from google.cloud import bigquery
import pandas as pd
import ... 
client = bigquery.Client()

df = pd.read_csv("./my_times.csv", header=1, names=['t_time'])
print(f"> {df['t_time']}")
df.info()
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("t_time", "TIMESTAMP"),
    ],
    write_disposition="WRITE_TRUNCATE",
)
client.load_table_from_dataframe(df, "myproj.mydataset.mytable", job_config=job_config).result()

Output:

0     2022-08-03 07:20:54.272000 UTC
1     2022-08-04 04:22:26.914000 UTC
2     2022-08-03 04:32:38.663000 UTC
Name: t_time, dtype: object
Error object of type <class 'str'> cannot be converted to int

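The problem is in bigquery.SchemaField("insert_timestamp", "TIMESTAMP"), and I would like to know why, because I have other tables with timestamp columns in the <date> <time> UTC format.
I also tried converting the DataFrame column t_time to a timestamp, without success (I am not sure whether that format can be converted to a timestamp).
For the given CSV format (with UTC), what is the correct way to load it into a BigQuery table whose column has data type TIMESTAMP?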

mf98qq94  #1

Can you try this:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Read the single-column CSV; the timestamps stay as plain strings (dtype object).
df = pd.read_csv("./csv_t_time - Sheet1.csv", header=1, names=['t_time'])
print(f"> {df['t_time']}")

job_config = bigquery.job.LoadJobConfig(
    schema=[
        bigquery.SchemaField("t_time", "TIMESTAMP"),
    ],
    autodetect=False,
    # Send the DataFrame as CSV so BigQuery parses the timestamp strings itself.
    source_format=bigquery.SourceFormat.CSV,
    write_disposition="WRITE_TRUNCATE",
    allow_quoted_newlines=True,
)

client.load_table_from_dataframe(df, "myproj.mydataset.mytable", job_config=job_config).result()

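To avoid this error, set source_format=bigquery.SourceFormat.CSV in job_config. When loading CSV data with embedded newlines, you also need to specify allow_quoted_newlines=True (allowQuotedNewlines in the REST API).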
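
The question also mentions trying to convert t_time to a timestamp in pandas. Here is a minimal sketch of that conversion, assuming every value ends in the literal suffix " UTC" as in the sample rows. The helper name to_utc_timestamps and the in-memory CSV are made up for illustration, and I have not run this against the asker's table.

```python
import io

import pandas as pd

# Sample rows copied from the question; an in-memory file stands in for my_times.csv.
CSV_TEXT = """t_time
2023-01-01 07:20:54.272000 UTC
2023-01-02 04:22:26.914000 UTC
2023-01-03 04:32:38.663000 UTC
"""


def to_utc_timestamps(column: pd.Series) -> pd.Series:
    """Hypothetical helper: parse '<date> <time> UTC' strings into tz-aware datetimes."""
    # Assumption: every value ends with the literal suffix " UTC".
    stripped = column.str.replace(" UTC", "", regex=False)
    # utc=True marks the parsed (naive) values as UTC.
    return pd.to_datetime(stripped, format="%Y-%m-%d %H:%M:%S.%f", utc=True)


# header=0 treats the first line as the header, so no data row is skipped.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=0, names=["t_time"])
df["t_time"] = to_utc_timestamps(df["t_time"])
print(df["t_time"])
```

With the column stored as a timezone-aware datetime rather than as strings, the DataFrame could probably be passed to client.load_table_from_dataframe with the TIMESTAMP schema from the question and without setting source_format, but treat that as something to verify rather than a guarantee.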
