我使用的是气流版本2.4.3 composer 版本2.1.11
# this function creates a dictionary of sql file names and file paths from google cloud
def _file_dict(*op_args):
bucket=op_args[0]
path=op_args[1]
from google.cloud import storage
client = storage.Client()
blobs = client.list_blobs(bucket, prefix=path)
my_dict={}
for blob in blobs:
if(blob.name.endswith('.sql')):
my_dict[blob.name]=blob.path
print(my_dict.items())
return my_dict
# and this function creates a bigquery task for each sql file
def _sql_file(file_nm:str, file_path:str, dag=None):
hook = GCSHook()
print(f'bucket name: {dag_bucket}')
print(f'sql file name: {file_nm}')
print(f'sql file path: {file_path}')
try:
object_name = f'{file_path}/{file_nm}'
resp_byte = hook.download_as_byte_array(
bucket_name = dag_bucket,
object_name = object_name,
)
except:
e = sys.exc_info()
print(e)
sys.exit(1)
resp_string = resp_byte.decode("utf-8")
return BigQueryInsertJobOperator(
task_id=f'create_view_{file_nm}',
configuration={
"query": {
"query": f'{resp_string}',
"useLegacySql": False,
}
},
location='US',
dag=dag
)
dag = DAG(dag_name,
default_args=default_args,
catchup=True,
max_active_runs=1,
schedule_interval=schedule_interval,
description=dag_description,
render_template_as_native_obj=True,
)
make_file_dict = PythonOperator(
task_id='make_file_dict',
provide_context=True,
python_callable=_file_dict,
op_args=[ (dag_bucket),('view_definitions') ],
dag=dag
)
file_dict = "{{ task_instance.xcom_pull(task_ids='make_file_dict') }}"
run_sql_from_file = [
_sql_file(file_nm=k, file_path=v, dag=dag)
for k,v in file_dict.items()
]
make_file_dict>>run_sql_from_file
我期望xcom_pull得到一个字典,因为render_template_as_native_obj是true。airflow/python的一些方面我还不理解(即装饰器和jinja模板),我显然做错了什么,但我不确定如何使其工作。
如果我创建一个硬编码值的字典,它工作得很好。我不想这样做,有两个原因:1)我希望作业自动拾取任何推送到存储库中的.sql文件,2)有很多文件。
任何帮助都很感激。
误差是
Broken DAG: [/home/airflow/gcs/dags/basic_view_dag.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/airflow/gcs/dags/basic_view_dag.py", line 165, in <module>
for k,v in file_dict.items()
AttributeError: 'str' object has no attribute 'items'
我尝试过以下变化,但都失败了
ast.literal_eval("{{ task_instance.xcom_pull(task_ids='make_file_dict') }}")
json.loads("{{ task_instance.xcom_pull(task_ids='make_file_dict') }}")
def str_to_dict(string):
string = string.strip('{}')
string = string.strip('[]')
pairs = string.split(', ')
return {key[1:-2]: int(value) for key, value in (pair.split(': ') for pair in pairs)}
1条答案
按热度按时间a7qyws3x1#
您没有正确使用
file_dict
。尝试将其作为op_args
或op_kwargs
传递给ptyhon_callable。这将使airflow呈现jinja模板并将其作为文本传递给函数。