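You can use union_many in a transform (more detailed example here) and list its inputs manually. Note that you can use Data Lineage to quickly copy-paste the paths of all the datasets (select datasets > top right > "view histogram" icon > "copy paths").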
from transforms.api import transform_df, Input, Output
from transforms.verbs import dataframes as D


@transform_df(
    Output("/path/to/dataset/unioned"),
    source_df_1=Input("/path/to/dataset/one"),
    source_df_2=Input("/path/to/dataset/two"),
    source_df_3=Input("/path/to/dataset/three"),
)
def compute(source_df_1, source_df_2, source_df_3):
    return D.union_many(
        source_df_1,
        source_df_2,
        source_df_3,
    )
The same approach, but easier to copy-paste: you can parameterize your transform to take an array of paths as input.
from transforms.verbs import dataframes as D
from transforms.api import transform_df, Input, Output

# List the paths of the datasets to union
list_datasets_paths = [
    "/path/to/dataset/one",
    "/path/to/dataset/two",
    "/path/to/dataset/three",
]

# Convert the list of paths into a dict of Input(), keyed by the last
# path segment (the last segments must therefore be unique)
input_dict = {}
for dataset_path in list_datasets_paths:
    input_dict[dataset_path.split("/")[-1]] = Input(dataset_path)


# Provide the dict of Input() to the transform
@transform_df(
    Output("/path/to/dataset/unioned"),
    **input_dict
)
def compute_2(**inputs_dataframes):
    # Create a list of dataframes from the input dict
    dataframes_list = inputs_dataframes.values()
    # Union the list of dataframes
    return D.union_many(*dataframes_list)
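One caveat with keying the inputs by the last path segment: two datasets with the same name in different folders would silently overwrite each other in the dict, and one of them would be dropped from the union. A minimal sketch of a guard against that, in plain Python so it can be checked outside Foundry; `build_input_names` is a hypothetical helper, not part of the transforms API:

```python
from collections import Counter


def build_input_names(dataset_paths):
    """Map a unique name (the last path segment) to each dataset path.

    Hypothetical helper for illustration: raises ValueError if two
    paths would end up with the same name.
    """
    names = [path.rstrip("/").split("/")[-1] for path in dataset_paths]
    duplicates = sorted(
        name for name, count in Counter(names).items() if count > 1
    )
    if duplicates:
        raise ValueError(f"Duplicate input names: {duplicates}")
    return dict(zip(names, dataset_paths))


paths = [
    "/path/to/dataset/one",
    "/path/to/dataset/two",
    "/path/to/dataset/three",
]
name_to_path = build_input_names(paths)
```

The resulting mapping could then be turned into the dict of inputs with something like `{name: Input(path) for name, path in name_to_path.items()}` before passing it to `@transform_df`.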
Note: you can also use other tools to build the pipeline, and thus the unioned dataset, such as Pipeline Builder (docs).