Can someone explain the exact difference between the append and lastmodified modes of Sqoop incremental import? When the same thing can be done with append mode (using a timestamp as the --check-column), what is lastmodified needed for? It appears to work the same way and to import both updated and inserted records.
ie3xauqp1#
Mode append can be used for a column whose last imported value you know. Mode lastmodified is meant for a timestamp column. It can be hard to remember the exact last-modified timestamp; if you do know it, you can use the first approach directly.

Mode: append
mysql> describe emp;
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id     | int(11)      | YES  |     | NULL    |       |
| name   | varchar(100) | YES  |     | NULL    |       |
| deg    | varchar(100) | YES  |     | NULL    |       |
| salary | int(11)      | YES  |     | NULL    |       |
| dep    | varchar(10)  | YES  |     | NULL    |       |
+--------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Import the data with the following Sqoop command.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
-m 3 \
--table emp \
--split-by id \
--columns id,name,deg \
--warehouse-dir /user/sqoop/ \
--delete-target-dir \
--as-textfile
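With `-m 3` and `--split-by id`, Sqoop runs a bounding query (`SELECT MIN(id), MAX(id) FROM emp`) and cuts that range into three contiguous slices, one per mapper. A minimal Python sketch of that split logic (an approximation for illustration, not Sqoop's actual splitter code):

```python
def compute_splits(min_val, max_val, num_mappers):
    """Approximate Sqoop's integer split calculation: divide
    [min_val, max_val] into num_mappers contiguous ranges, each of
    which becomes one mapper's WHERE clause."""
    size = (max_val - min_val + 1) / num_mappers
    splits = []
    lo = min_val
    for i in range(num_mappers):
        # last mapper always takes everything up to max_val
        hi = max_val if i == num_mappers - 1 else int(min_val + size * (i + 1)) - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# ids 1201..1210 across 3 mappers
print(compute_splits(1201, 1210, 3))
# → [(1201, 1203), (1204, 1206), (1207, 1210)]
```

This is consistent with the part-file sizes below: the third mapper gets one more row than the other two.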
HDFS output
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
drwxr-xr-x - cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      | 50000  | tp   |
| 1202 | manisha | Proof reader | 50000  | TP   |
| 1203 | php dev | TECH WRITER  | 50000  | AC   |
| 1204 | Nilesh  | Domino dev   | 70000  | AF   |
| 1205 | Vinayak | Java dev     | 50000  | IT   |
| 1206 | Amish   | Cog dev      | 60000  | IT   |
| 1207 | Jatin   | Oracel dev   | 40001  | IT   |
| 1208 | Viren   | Java dev     | 70004  | IT   |
| 1209 | Ashish  | Oracel dev   | 40001  | IT   |
| 1210 | Satish  | Java dev     | 70004  | IT   |
+------+---------+--------------+--------+------+
10 rows in set (0.00 sec)
Insert a new record into the table.
mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
Query OK, 1 row affected (0.03 sec)

mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      | 50000  | tp   |
| 1202 | manisha | Proof reader | 50000  | TP   |
| 1203 | php dev | TECH WRITER  | 50000  | AC   |
| 1204 | Nilesh  | Domino dev   | 70000  | AF   |
| 1205 | Vinayak | Java dev     | 50000  | IT   |
| 1206 | Amish   | Cog dev      | 60000  | IT   |
| 1207 | Jatin   | Oracel dev   | 40001  | IT   |
| 1208 | Viren   | Java dev     | 70004  | IT   |
| 1209 | Ashish  | Oracel dev   | 40001  | IT   |
| 1210 | Satish  | Java dev     | 70004  | IT   |
| 1211 | Jag     | be           | 20000  | IT   |
+------+---------+--------------+--------+------+
11 rows in set (0.00 sec)
Incremental import command
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1210 \
--warehouse-dir /user/sqoop/ \
--as-textfile
After the import
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
-rw-r--r-- 1 cloudera cloudera 21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003

[cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
1211,Jag,be,20000,IT
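Conceptually, `--incremental append --check-column id --last-value 1210` adds `WHERE id > 1210` to the import query, so only rows with a strictly greater check-column value are pulled and written to a new part file. A rough Python illustration of that filter, using the sample rows from the table above:

```python
# sample rows from the emp table above: (id, name, deg, salary, dep)
rows = [
    (1210, 'Satish', 'Java dev', 70004, 'IT'),
    (1211, 'Jag', 'be', 20000, 'IT'),
]

def incremental_append(rows, check_column_index, last_value):
    """Append mode: import only rows whose check column is strictly
    greater than the recorded --last-value; existing rows are never
    re-read, so updates to old rows are missed."""
    return [r for r in rows if r[check_column_index] > last_value]

print(incremental_append(rows, 0, 1210))
# → [(1211, 'Jag', 'be', 20000, 'IT')]
```

This is why append mode picks up inserts but not updates: an updated row keeps its old id, which is not greater than the last value.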
Mode: lastmodified
mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field             | Type        | Null | Key | Default | Extra          |
+-------------------+-------------+------+-----+---------+----------------+
| order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
| order_date        | datetime    | NO   |     | NULL    |                |
| order_customer_id | int(11)     | NO   |     | NULL    |                |
| order_status      | varchar(45) | NO   |     | NULL    |                |
+-------------------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Import orders into HDFS
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--target-dir /user/sqoop/orders \
--as-textfile
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
-rw-r--r-- 1 cloudera supergroup 0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
-rw-r--r-- 1 cloudera supergroup 753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
-rw-r--r-- 1 cloudera supergroup 752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
-rw-r--r-- 1 cloudera supergroup 752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003
Update the orders data
mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date          | order_customer_id | order_status    |
+----------+---------------------+-------------------+-----------------+
|       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
+----------+---------------------+-------------------+-----------------+
1 row in set (0.00 sec)

mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+--------------+
| order_id | order_date          | order_customer_id | order_status |
+----------+---------------------+-------------------+--------------+
|       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
+----------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)
Import the changed data
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--target-dir /user/sqoop/orders1 \
--as-textfile
Output
[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
-rw-r--r-- 1 cloudera cloudera 0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000
Note: if we use the same directory as before (orders), the merge job deletes the old files and creates new part files.
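Under `--incremental lastmodified`, Sqoop pulls rows whose `order_date` is at or after the last import time, and because `--merge-key order_id` is given it then runs a merge job that keeps only the newest record per key (hence the single reducer output `part-r-00000`). A rough Python sketch of that merge step; order 11 here is an invented sample row, not from the actual dataset:

```python
def merge_last_modified(old_rows, new_rows, key_index):
    """Merge step of lastmodified mode: a freshly imported row
    overwrites any existing row with the same merge key; all other
    existing rows are kept unchanged."""
    merged = {r[key_index]: r for r in old_rows}
    merged.update({r[key_index]: r for r in new_rows})
    return sorted(merged.values())

# order 10 from the example above; order 11 is a made-up placeholder
old = [(10, '2013-07-25 00:00:00', 5648, 'PENDING_PAYMENT'),
       (11, '2013-07-25 00:00:00', 918, 'PAID')]
new = [(10, '2017-12-02 16:19:23', 5648, 'CLOSED')]
print(merge_last_modified(old, new, 0))
```

This is the key difference from append mode: the merge reconciles updated rows instead of duplicating them.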