I am running an algorithm with GCP Composer, and at the end of the flow I want to run a task that performs several operations: copying and deleting files and folders from a volume to a bucket. I am trying to execute these copy and delete operations through a KubernetesPodOperator. I am having a hard time finding the right way to run several commands using "cmds"; I also tried combining "cmds" with "arguments". Below is the KubernetesPodOperator I tried, with the cmds and arguments combinations:
post_algo_run = kubernetes_pod_operator.KubernetesPodOperator(
    task_id="multi-coher-post-operations",
    name="multi-coher-post-operations",
    namespace="default",
    image="google/cloud-sdk:alpine",
    ### doesn't work ###
    cmds=["gsutil", "cp", "/data/splitter-output\*.csv", "gs://my_bucket/data", "&", "gsutil", "rm", "-r", "/input"],
    # Error:
    # [2022-01-27 09:31:38,407] {pod_manager.py:197} INFO - CommandException: Destination URL must name a directory, bucket, or bucket
    # [2022-01-27 09:31:38,408] {pod_manager.py:197} INFO - subdirectory for the multiple source form of the cp command.
    ####################
    ### doesn't work ###
    # cmds=["gsutil", "cp", "/data/splitter-output\*.csv", "gs://my_bucket/data ;", "gsutil", "rm", "-r", "/input"],
    # [2022-01-27 09:34:06,865] {pod_manager.py:197} INFO - CommandException: Destination URL must name a directory, bucket, or bucket
    # [2022-01-27 09:34:06,866] {pod_manager.py:197} INFO - subdirectory for the multiple source form of the cp command.
    ####################
    ### only performs the first command - only copying ###
    # cmds=["bash", "-cx"],
    # arguments=["gsutil cp /data/splitter-output\*.csv gs://my_bucket/data", "gsutil rm -r /input"],
    # [2022-01-27 09:36:09,164] {pod_manager.py:197} INFO - + gsutil cp '/data/splitter-output*.csv' gs://my_bucket/data
    # [2022-01-27 09:36:11,200] {pod_manager.py:197} INFO - Copying file:///data/splitter-output\Coherence Results-26-Jan-2022-1025Part1.csv [Content-Type=text/csv]...
    # [2022-01-27 09:36:11,300] {pod_manager.py:197} INFO - / [0 files][  0.0 B/ 93.0 KiB]
    # / [1 files][ 93.0 KiB/ 93.0 KiB]
    # [2022-01-27 09:36:11,302] {pod_manager.py:197} INFO - Operation completed over 1 objects/93.0 KiB.
    # [2022-01-27 09:36:12,317] {kubernetes_pod.py:459} INFO - Deleting pod: multi-coher-post-operations.d66b4c91c9024bd289171c4d3ce35fdd
    ####################
    volumes=[
        Volume(
            name="nfs-pvc",
            configs={
                "persistentVolumeClaim": {"claimName": "nfs-pvc"}
            },
        )
    ],
    volume_mounts=[
        VolumeMount(
            name="nfs-pvc",
            mount_path="/data/",
            sub_path=None,
            read_only=False,
        )
    ],
)
2 Answers

ffscu2ro1#
I found a technique for running multiple commands. First, I worked out how the KubernetesPodOperator's cmds and arguments attributes relate to Docker's ENTRYPOINT and CMD:
The KubernetesPodOperator's cmds overrides the Docker image's original ENTRYPOINT, and the KubernetesPodOperator's arguments is equivalent to Docker's CMD.
So, to run multiple commands from a KubernetesPodOperator, I used the following syntax. I set the KubernetesPodOperator's cmds to run bash with -c.
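A sketch of that fragment (assuming the standard bash -c invocation, where -c makes bash read the command line from the next argument; it is part of the full operator shown further below):

    # Inside the KubernetesPodOperator(...) call: replace the image ENTRYPOINT with a shell.
    cmds=["bash", "-c"],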
I set the KubernetesPodOperator's arguments to run two echo commands separated by &.
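A sketch of the matching arguments fragment (the echo commands are placeholders; with bash -c the whole string is parsed as one command line, and & starts the first command in the background before running the second, whereas && would run the second only if the first succeeds):

    # Inside the same KubernetesPodOperator(...) call: one string holding both commands.
    arguments=["echo 'first command' & echo 'second command'"],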
My KubernetesPodOperator looks like this.
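A sketch of what the full operator could look like when this technique is applied to the copy-and-delete commands from the question. The task name, bucket path and volume settings are taken from the question; the contrib-style imports are an assumption matching the question's kubernetes_pod_operator/Volume/VolumeMount usage (on Airflow 2 the cncf.kubernetes provider would be used instead), and chaining with && rather than & is my assumption, so that the delete only runs after a successful copy:

    # Imports assumed from the Airflow 1.10 / Composer 1 contrib modules that match the question's style.
    from airflow.contrib.operators import kubernetes_pod_operator
    from airflow.contrib.kubernetes.volume import Volume
    from airflow.contrib.kubernetes.volume_mount import VolumeMount

    post_algo_run = kubernetes_pod_operator.KubernetesPodOperator(
        task_id="multi-coher-post-operations",
        name="multi-coher-post-operations",
        namespace="default",
        image="google/cloud-sdk:alpine",
        # One shell, one command string: bash -c parses arguments[0] as a full command line,
        # so several gsutil calls can be chained in a single pod.
        cmds=["bash", "-c"],
        # Note: gsutil rm removes objects at gs:// URLs; deleting the local /input directory on the
        # mounted volume would normally be a plain rm -rf (see the second answer's remark about paths).
        arguments=[
            "gsutil cp /data/splitter-output*.csv gs://my_bucket/data "
            "&& gsutil rm -r /input"
        ],
        volumes=[
            Volume(
                name="nfs-pvc",
                configs={"persistentVolumeClaim": {"claimName": "nfs-pvc"}},
            )
        ],
        volume_mounts=[
            VolumeMount(
                name="nfs-pvc",
                mount_path="/data/",
                sub_path=None,
                read_only=False,
            )
        ],
    )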
omjgkv6w2#
For your first command, you need to make sure that, inside your Docker container, you can reach a working directory that lets you find the file /data/splitter-output\*.csv:

["gsutil", "cp", "/data/splitter-output*.csv", "gs://my_bucket/data"]

You can test your command against your Docker image with docker run, so you can verify that you are passing the command correctly. For the second statement, if you are referring to a path inside the Docker image, test it with docker run again; if you are referring to Google Storage, you must provide the full path:

["gsutil", "rm", "-r", "/input"]
It is worth mentioning that the ENTRYPOINT runs when the container starts, as described in "Understand how CMD and ENTRYPOINT interact". As mentioned in the comments, if you look at the code, cmds replaces the Docker image's ENTRYPOINT. It is also recommended to follow the guidance in "Define a Command and Arguments for a Container".
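To make that mapping concrete, here is a small illustrative sketch using the kubernetes Python client (the container name and command strings are hypothetical): the operator's cmds ends up as the container spec's command, which overrides the image ENTRYPOINT, and arguments ends up as args, which overrides CMD.

    from kubernetes.client import V1Container

    # Illustrative only: how the operator's fields correspond to the pod's container spec.
    container = V1Container(
        name="example",                        # hypothetical container name
        image="google/cloud-sdk:alpine",
        command=["bash", "-c"],                # <- KubernetesPodOperator cmds, overrides ENTRYPOINT
        args=["echo first && echo second"],    # <- KubernetesPodOperator arguments, overrides CMD
    )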