How to use hdfs dfs -cp with xargs to work around the Linux argument list limit?

v64noz0r  posted on 2022-12-09  in HDFS

I have a lot of files to copy on HDFS and I am hitting the operating system's maximum argument list limit. A workaround that currently works is to generate one command per file to process. However, that takes a long time.
I am trying to use xargs to get around the argument limit and reduce processing time, but I cannot make it work.
Here is the current situation.
I echo the file names (because I have read somewhere that echo is not subject to the argument limit) and pipe them to xargs.

echo "/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl" | xargs -I %  hdfs dfs -cp -p % /user/florian_castelain/test/xargs/

However this throws:
cp: `/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl': No such file or directory
Based on this example, I tried:

echo "/user/florian_castelain/test/yolo" "/user/florian_castelain/ignore_dtl" | xargs -0 -I %  hdfs dfs -cp -p % /user/florian_castelain/test/xargs/

Which prints:
cp: /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl
But no file has been copied at all.
How can I use xargs combined with the hdfs dfs -cp command to handle the copy of multiple files at once?

  • Hadoop 2.6.0-cdh5.13.0

Edit 1

With the verbose flag (-t) and this configuration, I get the following output:

echo "/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl" | xargs -I %  -t  hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
hdfs dfs -cp -p /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl /user/florian_castelain/test/xargs/

Which throws:
cp: `/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl': No such file or directory
Yet executing this command manually works fine. Why is that?

Edit 2

Based on jjo's answer, I tried the following:

printf "%s\n" /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl | xargs -0 -t -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/

Which prints:

hdfs dfs -cp -p /user/florian_castelain/test/yolo
/user/florian_castelain/ignore_dtl
 /user/florian_castelain/test/xargs/

And it does not copy anything.
So I tried removing the newline characters before passing the input to xargs:

printf "%s\n" /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl | tr -d "\n"  | xargs -0 -t -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/

Which prints:

hdfs dfs -cp -p /user/florian_castelain/test/yolo/user/florian_castelain/ignore_dtl /user/florian_castelain/test/xargs/

But nothing is copied either. :(

Answer #1 (eivgtgni)

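The problem I see you are facing is the whitespace after yolo (between the two paths), combined with the fact that xargs -I treats stdin entries as separated by newlines, not by spaces.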
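The effect can be reproduced without HDFS. The sketch below is only an illustration: it swaps hdfs dfs -cp for a small stand-in (the hypothetical count_args snippet) that prints each argument it receives in brackets, and it uses {} rather than % as the replacement string so that the % in the printf format is left alone.

```shell
# Stand-in for "hdfs dfs -cp" (hypothetical helper): print every argument in brackets.
count_args='printf "[%s]" "$@"; echo'

# Both paths on one line: -I substitutes the whole line, so the command
# receives a single argument that contains a space.
one_line=$(echo "/tmp/a /tmp/b" | xargs -I {} sh -c "$count_args" _ {})
echo "$one_line"    # prints [/tmp/a /tmp/b]

# One path per line: -I runs the command once for each path.
per_line=$(printf '%s\n' /tmp/a /tmp/b | xargs -I {} sh -c "$count_args" _ {})
echo "$per_line"    # prints [/tmp/a] and [/tmp/b] on separate lines
```

This would also explain why the command traced by -t in the question looks correct but fails: the trace shows no quoting, so one argument containing a space looks the same as two arguments, while retyping it by hand lets the shell split it into two.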
If your files are available on the local filesystem, you should rather take advantage of find -print0 | xargs -0, for example:

find /user/florian_castelain/foo/bar -type f -print0 | xargs -0 -I % hdfs dfs -cp -p % /some/dst

If you still need or want to feed xargs with space-separated file names, use printf "%s\n" (which, like echo, is also a bash builtin), so that every file name is output followed by a newline:

printf "%s\n" /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl | xargs -I %  hdfs dfs -cp -p % /some/dst
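The command above still starts one hdfs process per file. Since the question is about cutting processing time, a possible variation is to pass several sources to each invocation. This is a minimal sketch, assuming hdfs dfs -cp accepts multiple source paths followed by a destination directory, and that xargs supports -0 (GNU and BSD do). The function name copy_in_batches and the variables HDFS_CMD and BATCH_SIZE are hypothetical names introduced for this example.

```shell
# Hypothetical knobs for this sketch: the command to run and the sources per call.
HDFS_CMD="${HDFS_CMD:-hdfs}"
BATCH_SIZE="${BATCH_SIZE:-100}"

# Usage: copy_in_batches DEST < file_with_one_path_per_line
copy_in_batches() {
  # Turn newlines into NUL bytes so that xargs -0 keeps paths with spaces intact.
  # xargs -n caps the number of sources per call; the small sh -c wrapper puts
  # the sources before the destination, as "-cp SRC [SRC ...] DEST" expects.
  tr '\n' '\0' |
    xargs -0 -n "$BATCH_SIZE" sh -c 'cmd=$1; dest=$2; shift 2; "$cmd" dfs -cp -p "$@" "$dest"' _ "$HDFS_CMD" "$1"
}
```

For instance, copy_in_batches /user/florian_castelain/test/xargs/ < paths.txt, where paths.txt is a hypothetical file listing one HDFS path per line. Setting HDFS_CMD=echo beforehand gives a dry run that only prints the commands.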
