-query…” insert multiple rows when only one row should be inserted?

Posted by 0pizxfdo on 2021-05-29 in Hadoop


I am getting hands-on experience with Hadoop, Sqoop, Pig, and Flume.
In my local MySQL instance I have a table called Employee with the following structure:

`emp_id` int(11) NOT NULL AUTO_INCREMENT
`first_name` varchar(30) NOT NULL
`last_name` varchar(30) NOT NULL
`create_date` datetime NOT NULL

The Employee table has four rows.
I ran the following sqoop command:

sqoop --options-file import.txt \
--query "select 1 as emp_id, 'Barry' as first_name, 'Williams' as last_name, '2016-04-20 15:41:00' as create_date from test.Employee where \$CONDITIONS" \
--target-dir /user/<username>/Employee  \
--split-by emp_id \
-m 1

The select ... in the sqoop command contains only one row of data, so only one row should be inserted.
Result of the sqoop command:
When I run the following command:

hdfs dfs -cat /user/<username>/Employee/part-m-00000

I get:

1,Barry,Williams,2016-04-20 15:41:00
1,Barry,Williams,2016-04-20 15:41:00
1,Barry,Williams,2016-04-20 15:41:00
1,Barry,Williams,2016-04-20 15:41:00

Questions:

1) Why were four rows inserted instead of one?
2) Is it because there were four rows in the table when the `sqoop` command ran? 
3) Is this a bug?

Thanks in advance.

hadoop mysql hdfs sqoop

Source: https://stackoverflow.com/questions/36753657/why-sqoop-import-query-inserts-multiple-rows-when-only-one-row-should-be

4 Answers


wooyq4lh1#

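I am not sure whether this is a bug, but it is interesting; I have never tried running a sqoop command this way. The column given to --split-by (the primary key) is used by sqoop to split the work into units. -m 1 forces sqoop to use only 1 mapper.
You have a free-form query import, and based on the query sqoop should create only 1 row. My assumption is that, because both the --split-by and -m 1 options are passed to sqoop, --split-by may be taking precedence over -m. Sqoop normally runs the job with 4 mappers only when -m is not specified, so I guess each mapper created one row from the hard-coded fields in the SQL statement.
Try it without the --split-by argument.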

2021-05-30

t98cgbkg2#

No, it is not a bug. You are querying the wrong way.
You need to add a LIMIT to the SQL query. The updated query would look like this:

sqoop --options-file import.txt \
--query "select 1 as emp_id, 'Barry' as first_name, 'Williams' as last_name, '2016-04-20 15:41:00' as create_date from test.Employee where \$CONDITIONS LIMIT 1" \
--target-dir /user/<username>/Employee  \
--split-by emp_id \
-m 1

2021-05-30

gpnt7bae3#

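Sqoop is working fine. Try running this query directly against the database and you will see that the output has as many rows as the table does.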
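
That check can be sketched without a MySQL server. The following is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, the four Employee rows are made-up placeholders, and an always-true predicate (`1 = 1`) stands in for the `$CONDITIONS` token that Sqoop fills in at run time.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "emp_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "first_name TEXT NOT NULL, "
    "last_name TEXT NOT NULL, "
    "create_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Employee (first_name, last_name, create_date) VALUES (?, ?, ?)",
    [
        ("Ann", "Alpha", "2016-01-01 00:00:00"),
        ("Bob", "Bravo", "2016-01-02 00:00:00"),
        ("Cat", "Charlie", "2016-01-03 00:00:00"),
        ("Dan", "Delta", "2016-01-04 00:00:00"),
    ],
)

# Same shape as the --query text, with 1 = 1 in place of $CONDITIONS.
query = (
    "SELECT 1 AS emp_id, 'Barry' AS first_name, 'Williams' AS last_name, "
    "'2016-04-20 15:41:00' AS create_date "
    "FROM Employee WHERE 1 = 1"
)
rows = conn.execute(query).fetchall()

# The constant select list is evaluated once per row of Employee,
# so a four-row table yields four identical result rows.
for row in rows:
    print(",".join(str(value) for value in row))
```

The select list contains only constants, but the FROM clause still iterates over every row of the table, which is why the number of output lines follows the table's row count.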

2021-05-29

jogvjijk4#

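I don't know why you are getting 4 records. I get only one record on my system. Please add LIMIT 1 at the end of the select ... query, after where $CONDITIONS, and see. Hope this works.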
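
The placement of LIMIT matters here. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical four-row Employee table, and `1 = 1` in place of the `$CONDITIONS` token: LIMIT after the WHERE clause returns a single row, while LIMIT before WHERE is rejected by the SQL parser (MySQL's SELECT grammar likewise expects WHERE before LIMIT).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table contents are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO Employee (first_name) VALUES (?)",
    [("a",), ("b",), ("c",), ("d",)],
)

select = "SELECT 1 AS emp_id, 'Barry' AS first_name FROM Employee"

# LIMIT placed after WHERE: only one constant row comes back.
limited = conn.execute(select + " WHERE 1 = 1 LIMIT 1").fetchall()
print(limited)

# LIMIT placed before WHERE: the statement does not parse.
try:
    conn.execute(select + " LIMIT 1 WHERE 1 = 1")
    misplaced_ok = True
except sqlite3.OperationalError as exc:
    misplaced_ok = False
    print("rejected:", exc)
```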

2021-05-29