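I'm trying to read an Excel file from a Cloud Storage bucket in an Apache Beam Python pipeline, but it doesn't work. I tried reading it with Pandas, but then I can't use the data in a PCollection.
Do you know how to do this?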
import pandas as pd
from google.cloud import storage

def read_data_from_excel_file():
    bucket_name = "nidec-ga-transient"
    blob_name = "ConcessoesERestricoes.xlsx"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    # Download the whole workbook into memory, then parse one sheet with pandas
    data_bytes = blob.download_as_bytes()
    df = pd.read_excel(data_bytes, 'Lista de Gargalos')
    return df
Pipeline = (
    pipeline_load_data
    | "Importar Dados CloudStorage" >> read_data_from_excel_file()
    # | "Write_to_BQ" >> beam.io.WriteToBigQuery(
    #     tabela,
    #     schema=table_schema,
    #     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    #     create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    #     custom_gcs_temp_location = 'gs://ddc-test-262213-staging/henrique.klock@dojo.technology/temp' )
)
When I run the code, I get this error:
TypeError: unsupported operand type(s) for >>: 'str' and 'NoneType'
1 Answer
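You can read Excel data in an Apache Beam Python pipeline with the apache_beam.io.fileio module. It matches file patterns and hands each matched file to the pipeline as a readable file object, so files are read in a parallel, distributed way.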