The steps are:
- Using the mysql reader and hdfs writer: the number of records in MySQL is 258507, and DataX reports "读出记录总数:258507" (total records read: 258507).
- Then I checked the Hive table. It contains 259461 records, so I guess some records are duplicated.
- I ran this in Hive: "select fskuid, count(*) from tablenamexxxx group by fskuid having count(*) > 1". It listed some records, for example: 66 8 (the value of fskuid is 66, and it appears 8 times).
- Then I went to the DataX JSON config file, modified the mysql reader's querySql to add "where fskuid=66", and re-ran the job.
- This time I got only one record.
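The check in the steps above can be sketched in Python to quantify the gap. This is only an illustration on toy data: the helper names `duplicate_summary` and `surplus_rows` are hypothetical, and it assumes fskuid is unique in the MySQL source, which the post does not state. Only the two totals come from the report.

```python
from collections import Counter

def duplicate_summary(keys):
    """Return {key: occurrences} for every key that appears more than once."""
    counts = Counter(keys)
    return {k: n for k, n in counts.items() if n > 1}

def surplus_rows(keys):
    """Count rows beyond the first occurrence of each key."""
    return len(keys) - len(set(keys))

# Toy data (not real rows): fskuid 66 appears 8 times, as in the Hive query result.
sample = [66] * 8 + [1, 2, 3]
print(duplicate_summary(sample))  # {66: 8}
print(surplus_rows(sample))       # 7

# Totals taken from the report above.
source_total = 258507  # records read from MySQL according to DataX
hive_total = 259461    # records counted in the Hive table
print(hive_total - source_total)  # 954
```

If fskuid really is unique in MySQL, running `surplus_rows` over all fskuid values exported from Hive should give 954. A different number would suggest the extra rows are not all duplicates by fskuid.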