mapreduce.map.java.opts=“-duser.timezone=gmt”不工作

iklwldmw  于 2021-06-03  发布在  Sqoop
关注(0)|答案(1)|浏览(724)

该数据库的所有日期都以gmt为单位,而sqoop会自动获取本地(亚洲/加尔各答)的增量更新。
它可能是从jvm中获取的,但我需要它对某些作业使用gmt,对其他作业使用local。我该怎么办?
链接https://community.cloudera.com/t5/data-ingestion-integration/sqoop-s-metastore-timezone/td-p/16306
讨论同样的问题。有什么解决办法吗?线程中给出的解决方案对我来说并不真正有效。
以下是我在sqoop工作的经验:

sqoop job -D oracle.sessionTimeZone=GMT -D mapred.child.java.opts=" -Duser.timezone=GMT" --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME--incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir LOCATION_SPECIFIED --hive-database Magento --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'

日志:

5459 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5497 [uber-SubtaskRunner] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6-cdh5.7.0
5817 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
5832 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5859 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
5874 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
5874 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
6306 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6330 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6434 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce
9911 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/51c9a7f9e76b0547825eb7a852721bf9/sales_flat_order.jar
9928 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Incremental import based on column updated_at
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Lower bound value: '0'
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Upper bound value: '2016-06-30 11:40:36.0'
9943 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of sales_flat_order
9962 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
10007 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
10672 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
10674 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat  - BoundingValsQuery: SELECT MIN(entity_id), MAX(entity_id) FROM sales_flat_order WHERE ( updated_at >= '0' AND updated_at < '2016-06-30 11:40:36.0' )
11667 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.IntegerSplitter  - Split size: 86592; Num splits: 4 from: 1 to: 346372
Heart beat
42986 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 300.3027 MB in 32.9683 seconds (9.1088 MB/sec)
42995 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 339510 records.
43008 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Saving incremental import state to the metastore
43224 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Updated data for job: sales_flat_order
fbcarpbf

fbcarpbf1#

oracle方面可能的解决方法是:
向表中添加虚拟列
使用它来“显示”从sqoop本地时区开始的原始gmt日期时间,使用几个cast()和at时区转换
然后可选地在该虚拟列上创建一个索引,并检查它是否被sqoop导出实际使用(以及它是否产生足够大的差异来补偿在插入时更新索引的开销)

相关问题