hive为json对象计算错误的和

dxxyhpgq  于 2021-06-04  发布在  Hadoop
关注(0)|答案(1)|浏览(552)

我有一个带有一列数据的外部表,其中数据是json对象
当我运行以下配置单元查询时

hive> select get_json_object(data, "$.ev") from data_table limit 3;     

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201212171824_0218, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201212171824_0218
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=master:8021 -kill job_201212171824_0218
 2013-01-24 10:41:37,271 Stage-1 map = 0%,  reduce = 0%
 ....
 2013-01-24 10:41:55,549 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201212171824_0218
 OK
 2
 2
 2
 Time taken: 21.449 seconds

但当我运行总和聚合时,结果很奇怪

hive> select sum(get_json_object(data, "$.ev")) from data_table limit 3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
 set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
 set mapred.reduce.tasks=<number>
Starting Job = job_201212171824_0217, Tracking URL =   http://master:50030/jobdetails.jsp?jobid=job_201212171824_0217
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=master:8021 -kill  job_201212171824_0217
2013-01-24 10:39:24,485 Stage-1 map = 0%,  reduce = 0%
.....
2013-01-24 10:41:00,760 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201212171824_0217
OK
9.4031522E7
Time taken: 100.416 seconds

有人能解释一下为什么吗?我该怎么做才能正常工作呢?

i1icjdpr

i1icjdpr1#

hive似乎将json中的值作为 float s而不是 int s、 看起来你的表很大,所以hive可能用“指数”表示大的浮点数,所以 9.4031522E7 可能意味着 94031522 .
如果你想确定你在做一个 sum 在int上,您可以将json的字段强制转换为int,并且sum应该能够返回int:

$ hive -e "select sum(get_json_object(dt, '$.ev')) from json_table"
8.806305E7
$ hive -e "select sum(cast(get_json_object(dt, '$.ev') as int)) from json_table"
88063050

相关问题