DataX MySQL Reader and HDFS Writer: some records are duplicated, but exporting only a single affected record does not reproduce it

oxcyiej7  posted on 2021-11-29  in  Java

The steps are:

  1. Using the MySQL reader and the HDFS writer, the number of records in MySQL is 258507, and DataX reports "读出记录总数:258507" (total records read: 258507).
  2. Then I check the Hive table: it contains 259461 records, so I guess some records are duplicated.
  3. I run this Hive SQL: "select fskuid, count(*) from tablenamexxxx group by fskuid having count(*) > 1". Some records are listed, for example: 66 8 (the value of fskuid is 66, and it appears 8 times).
  4. Then I edit the DataX JSON config file, modify the MySQL reader's querySql to add "where fskuid=66", and re-run the job.
  5. I get only one record, so the duplication does not reproduce for a single record.
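
To narrow this down, it may help to check whether the duplicated keys account for the whole gap between the two counts (259461 - 258507 = 954 extra rows). The following is only a minimal sketch in Python, assuming fskuid is unique in the MySQL table (the post does not say so); the helper names and the sample key list are made up for illustration and are not part of DataX or Hive.

```python
from collections import Counter


def surplus_by_key(keys):
    """Return {key: extra_copies} for every key that appears more than once."""
    counts = Counter(keys)
    return {k: n - 1 for k, n in counts.items() if n > 1}


def total_surplus(keys):
    """Total number of extra rows contributed by duplicated keys."""
    return sum(surplus_by_key(keys).values())


# Row counts reported in the question.
mysql_rows = 258507
hive_rows = 259461
expected_surplus = hive_rows - mysql_rows  # extra rows present in Hive

# Hypothetical sample of fskuid values as they might appear in Hive:
# key 66 appears 8 times (as in step 3), key 68 appears twice.
sample = [66] * 8 + [67, 68, 68]

print(surplus_by_key(sample))  # {66: 7, 68: 1}
print(total_surplus(sample))   # 8
print(expected_surplus)        # 954
```

If the total surplus computed over the real Hive keys equals 954, every extra row is a copy of an existing key. That would suggest rows are being written more than once (for example by a repeated or retried write) rather than the source containing extra rows, though this is only a hypothesis.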
