I have a lot of files to copy on HDFS and I am hitting the operating system's maximum argument list limit. A workaround that currently works is to generate one command per file. However, that takes a long time.
I am trying to work with xargs to get around the argument limit and reduce processing time. But I am not able to make it work.
Here is the current situation.
I echo the file names (because I have read somewhere that echo is not subject to the argument limit) and pipe them to xargs.
echo "/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl" | xargs -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
However this throws:
cp: `/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl': No such file or directory
Based on this example, I tried:
echo "/user/florian_castelain/test/yolo" "/user/florian_castelain/ignore_dtl" | xargs -0 -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
Which prints:
cp: /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl
But no file has been copied at all. How can I use `xargs` combined with the `hdfs dfs -cp` command to copy multiple files at once?
- Hadoop 2.6.0-cdh5.13.0
Edit 1
With the verbose flag and this config, I get the following output:
echo "/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl" | xargs -I % -t hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
hdfs dfs -cp -p /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl /user/florian_castelain/test/xargs/
Which throws:
cp: `/user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl': No such file or directory
Yet executing this command manually works fine. Why is that?
Edit 2
Based on jjo's answer, I tried the following:
printf "%s\n" /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl | xargs -0 -t -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
Which prints:
hdfs dfs -cp -p /user/florian_castelain/test/yolo
/user/florian_castelain/ignore_dtl
/user/florian_castelain/test/xargs/
And does not copy anything.
So I tried removing the newline characters before passing the input to xargs:
printf "%s\n" /user/florian_castelain/test/yolo /user/florian_castelain/ignore_dtl | tr -d "\n" | xargs -0 -t -I % hdfs dfs -cp -p % /user/florian_castelain/test/xargs/
Which prints:
hdfs dfs -cp -p /user/florian_castelain/test/yolo/user/florian_castelain/ignore_dtl /user/florian_castelain/test/xargs/
But nothing is copied either. :(
1 Answer
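The problem I see you facing is the space after `yolo` in the echoed string, combined with xargs (when used with `-I`) taking stdin entries as *separated by newlines*, so both paths arrive as one argument. Since your files are local, you should rather leverage `find -print0 | xargs -0`.
If you still need or want to feed xargs with "whitespace-separated filenames", use `printf "%s\n"` (it is also a bash builtin, just like `echo`), so that *each* filename is output followed by a newline.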
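As a rough sketch of the `printf` approach, the script below feeds one path per line to xargs. So that it runs without a cluster, it uses local `cp -p` and a temporary directory as stand-ins for `hdfs dfs -cp -p` and the HDFS paths in the question. The directory layout and the variable names (`work`, `src`, `dest_each`, `dest_batch`) are hypothetical. The second variant assumes the copy command accepts several sources followed by a destination directory, which matches the manual command that the question reports as working.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch layout standing in for the HDFS paths in the question.
work=$(mktemp -d)
src="$work/src"
dest_each="$work/dest_each"
dest_batch="$work/dest_batch"
mkdir -p "$src" "$dest_each" "$dest_batch"
touch "$src/yolo" "$src/ignore_dtl"

# Variant 1: one copy command per file.
# printf emits one path per line; with -I, xargs treats each line as one
# item and substitutes it for %. -t echoes each command to stderr.
# On a cluster, `cp -p` would become `hdfs dfs -cp -p` (assumption).
printf '%s\n' "$src/yolo" "$src/ignore_dtl" |
  xargs -t -I % cp -p % "$dest_each/"

# Variant 2: as many files as fit into one copy command.
# Without -I, xargs appends the items after the fixed arguments, so a small
# sh wrapper moves the destination (passed as $1) to the end.
# Caveat: without -I, xargs splits on any whitespace, so paths containing
# spaces would need -0 with NUL-separated input instead.
printf '%s\n' "$src/yolo" "$src/ignore_dtl" |
  xargs sh -c 'dest=$1; shift; cp -p "$@" "$dest"' sh "$dest_batch/"
```

Note that `-0` is deliberately absent: it makes xargs split on NUL bytes only, so newline-separated input would again arrive as a single item. The first variant still starts one process per file, so the second is the one more likely to reduce total run time.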