Sqoop incremental import append mode with a timestamp column

jum4pzuy asked on 2021-06-03 in Sqoop

Can someone explain the exact difference between the append and lastmodified modes of Sqoop incremental import?
What is lastmodified needed for when the same thing can be done with append mode (using a timestamp column as the --check-column)? It seems to work the same way and imports both updated and newly inserted records.


ie3xauqp1#

Mode append is for a column whose last imported value you can track (typically an increasing id).
Mode lastmodified is for a timestamp column. It can be hard to keep track of the last-modified timestamp yourself; if you do know it, you could simply use the first approach. As the walkthrough below shows, lastmodified combined with --merge-key also merges updated rows into the data already in HDFS rather than only appending new ones.
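
In short, only the mode-specific flags differ; a side-by-side sketch (the values are just illustrations taken from the walkthrough below):

  # append: the check column is an increasing value (here the id);
  # rows with id > --last-value are written as additional part files
  --incremental append --check-column id --last-value 1210

  # lastmodified: the check column is a timestamp; rows modified after
  # --last-value are fetched and, with --merge-key, merged by key into
  # the data already in HDFS instead of only being appended
  --incremental lastmodified --check-column order_date --merge-key order_id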
Mode: append

  mysql> describe emp;
  +--------+--------------+------+-----+---------+-------+
  | Field  | Type         | Null | Key | Default | Extra |
  +--------+--------------+------+-----+---------+-------+
  | id     | int(11)      | YES  |     | NULL    |       |
  | name   | varchar(100) | YES  |     | NULL    |       |
  | deg    | varchar(100) | YES  |     | NULL    |       |
  | salary | int(11)      | YES  |     | NULL    |       |
  | dep    | varchar(10)  | YES  |     | NULL    |       |
  +--------+--------------+------+-----+---------+-------+
  5 rows in set (0.00 sec)

Import the data with the following sqoop command.

  sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  -m 3 \
  --table emp \
  --split-by id \
  --columns id,name,deg \
  --warehouse-dir /user/sqoop/ \
  --delete-target-dir \
  --as-textfile

HDFS output

  [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
  drwxr-xr-x - cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp
  -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
  -rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
  -rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
  -rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002

  mysql> select * from emp;
  +------+---------+--------------+--------+------+
  | id   | name    | deg          | salary | dep  |
  +------+---------+--------------+--------+------+
  | 1201 | gopal   | manager      |  50000 | tp   |
  | 1202 | manisha | Proof reader |  50000 | TP   |
  | 1203 | php dev | TECH WRITER  |  50000 | AC   |
  | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
  | 1205 | Vinayak | Java dev     |  50000 | IT   |
  | 1206 | Amish   | Cog dev      |  60000 | IT   |
  | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
  | 1208 | Viren   | Java dev     |  70004 | IT   |
  | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
  | 1210 | Satish  | Java dev     |  70004 | IT   |
  +------+---------+--------------+--------+------+
  10 rows in set (0.00 sec)

Insert a new record into the table.

  mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
  Query OK, 1 row affected (0.03 sec)

  mysql> select * from emp;
  +------+---------+--------------+--------+------+
  | id   | name    | deg          | salary | dep  |
  +------+---------+--------------+--------+------+
  | 1201 | gopal   | manager      |  50000 | tp   |
  | 1202 | manisha | Proof reader |  50000 | TP   |
  | 1203 | php dev | TECH WRITER  |  50000 | AC   |
  | 1204 | Nilesh  | Domino dev   |  70000 | AF   |
  | 1205 | Vinayak | Java dev     |  50000 | IT   |
  | 1206 | Amish   | Cog dev      |  60000 | IT   |
  | 1207 | Jatin   | Oracel dev   |  40001 | IT   |
  | 1208 | Viren   | Java dev     |  70004 | IT   |
  | 1209 | Ashish  | Oracel dev   |  40001 | IT   |
  | 1210 | Satish  | Java dev     |  70004 | IT   |
  | 1211 | Jag     | be           |  20000 | IT   |
  +------+---------+--------------+--------+------+
  11 rows in set (0.00 sec)

Incremental import command

  sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table emp \
  --split-by id \
  --check-column id \
  --incremental append \
  --last-value 1210 \
  --warehouse-dir /user/sqoop/ \
  --as-textfile

After the import

  [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
  -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
  -rw-r--r-- 1 cloudera supergroup 70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
  -rw-r--r-- 1 cloudera supergroup 64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
  -rw-r--r-- 1 cloudera supergroup 86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
  -rw-r--r-- 1 cloudera cloudera 21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
  [cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
  1211,Jag,be,20000,IT
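
A practical note (not part of the run above): the main drawback of both modes is having to remember the last value or timestamp yourself. A saved Sqoop job keeps that state in Sqoop's metastore and updates it after every run; a minimal sketch, where the job name emp_incr is an arbitrary choice:

  # create a saved job; everything after the lone "--" is a normal import
  sqoop job --create emp_incr -- import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table emp \
  --split-by id \
  --check-column id \
  --incremental append \
  --last-value 0 \
  --warehouse-dir /user/sqoop/ \
  --as-textfile

  # each execution imports only rows with id greater than the stored value,
  # then writes the new maximum id back into the saved job
  # (sqoop may prompt for the password again unless the metastore
  #  is configured to store it)
  sqoop job --exec emp_incr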

Mode: lastmodified

  mysql> describe orders;
  +-------------------+-------------+------+-----+---------+----------------+
  | Field             | Type        | Null | Key | Default | Extra          |
  +-------------------+-------------+------+-----+---------+----------------+
  | order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
  | order_date        | datetime    | NO   |     | NULL    |                |
  | order_customer_id | int(11)     | NO   |     | NULL    |                |
  | order_status      | varchar(45) | NO   |     | NULL    |                |
  +-------------------+-------------+------+-----+---------+----------------+
  4 rows in set (0.00 sec)

Import the orders table into HDFS

  sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table orders \
  --split-by order_id \
  --target-dir /user/sqoop/orders \
  --as-textfile

After the import

  [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
  -rw-r--r-- 1 cloudera supergroup 0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
  -rw-r--r-- 1 cloudera supergroup 741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
  -rw-r--r-- 1 cloudera supergroup 753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
  -rw-r--r-- 1 cloudera supergroup 752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
  -rw-r--r-- 1 cloudera supergroup 752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003

Update the orders data

  mysql> select * from orders where order_id=10;
  +----------+---------------------+-------------------+-----------------+
  | order_id | order_date          | order_customer_id | order_status    |
  +----------+---------------------+-------------------+-----------------+
  |       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
  +----------+---------------------+-------------------+-----------------+
  1 row in set (0.00 sec)

  mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
  Query OK, 1 row affected (0.01 sec)
  Rows matched: 1  Changed: 1  Warnings: 0

  mysql> select * from orders where order_id=10;
  +----------+---------------------+-------------------+--------------+
  | order_id | order_date          | order_customer_id | order_status |
  +----------+---------------------+-------------------+--------------+
  |       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
  +----------+---------------------+-------------------+--------------+
  1 row in set (0.00 sec)

Import the incremental data

  sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table orders \
  --split-by order_id \
  --check-column order_date \
  --merge-key order_id \
  --incremental lastmodified \
  --target-dir /user/sqoop/orders1 \
  --as-textfile

Output

  [cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
  -rw-r--r-- 1 cloudera cloudera 0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
  -rw-r--r-- 1 cloudera cloudera 2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000

Note: if we use the same directory as before (orders) instead of a new one, the old files are deleted and new part files are created in their place by the merge step.
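
For reference, a re-run against the original directory would look roughly like this; the --last-value shown is only an illustration based on the time of the first orders import above (Sqoop prints the value to use for the next run at the end of every incremental import):

  sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table orders \
  --split-by order_id \
  --check-column order_date \
  --incremental lastmodified \
  --last-value "2017-12-02 16:01:00" \
  --merge-key order_id \
  --target-dir /user/sqoop/orders \
  --as-textfile
  # only rows with order_date newer than --last-value are fetched and merged
  # by order_id into /user/sqoop/orders; the previous part files are replaced
  # by the merge output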
