Hive query GROUP BY error: invalid table alias or column reference

kq4fsx7k  published 2021-06-03 in Hadoop

Hi all,
I'm trying to extend some working Hive queries, but I can't seem to get this right. I'm just trying to test a GROUP BY on a function, which is a common pattern for many of the queries I need to write. Here is the query I'm trying to execute:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp)) AS hourly
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hourly;

This produces the following error:
ERROR: Error while executing Hive script. Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:320 Invalid table alias or column reference 'hourly': (possible column names are: messagerowid, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds)
The intent is to create a time-bounded query (say, the last 24 hours) over sum() of payload_wattseconds and the like. To start building the summary tables, I began with a GROUP BY query that would anchor the derived SELECT query to the hour.
But the problem is the error above. I'd appreciate it if anyone could point out the issue here.. I can't seem to find it myself, although I'm still fairly new to Hive.
Thanks in advance..
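For reference, the Hive version in this setup does not appear to let a GROUP BY clause refer to a SELECT alias such as hourly. A minimal sketch of one workaround, computing the alias in a subquery and grouping on it in the outer query (the subquery alias t and the total_wattseconds column name are illustrative, not taken from the original query):

-- Compute the hour alias in an inner query, then group on it in the outer query.
SELECT t.hourly,
       sum(t.payload_wattseconds) AS total_wattseconds
FROM (
    SELECT payload_wattseconds,
           hour(from_unixtime(payload_timestamp)) AS hourly
    FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
    WHERE payload_timestamp > unix_timestamp() - 3024*60*60
) t
GROUP BY t.hourly;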
UPDATE: tried adjusting the query. Here is the query I just attempted to run:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp))
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));

.. however, this produces another error, namely:

ERROR: Error while executing Hive script.Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key 'messageRowID'

Thoughts?
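For context, that second error is Hive pointing out that every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. A minimal sketch that satisfies this rule by keeping only the grouped hour plus aggregates (the count(*) column and the samples/total_wattseconds names are illustrative):

-- Only the GROUP BY expression and aggregate functions appear in the SELECT list.
SELECT hour(from_unixtime(payload_timestamp)) AS hourly,
       count(*) AS samples,
       sum(payload_wattseconds) AS total_wattseconds
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));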
UPDATE #2) Below is a quick dump of some of the sample events as they are derived into the EVENT_KS CF in WSO2 BAM. The last column is the watt-seconds value calculated (in a perl daemon..), which the queries will use to compute aggregate sums totalling up as kWh; those sums are then dumped into a MySQL table for synchronization with the application holding the UI/UX layer..

[12:03:00] [jskogsta@enterprise ../Product Centric Opco Modelling]$ ~/local/apache-cassandra-2.0.8/bin/cqlsh localhost 9160 -u admin -p admin --cqlversion="3.0.5"
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 1.2.13 | CQL spec 3.0.5 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use "EVENT_KS";
cqlsh:EVENT_KS> select * from currentcost_stream limit 5;

 key                                       | Description               | Name               | Nick_Name            | StreamId                  | Timestamp     | Version | payload_sensor | payload_temp | payload_timestamp | payload_timestampmysql | payload_watt | payload_wattseconds
-------------------------------------------+---------------------------+--------------------+----------------------+---------------------------+---------------+---------+----------------+--------------+-------------------+------------------------+--------------+---------------------
  1403365575174::10.11.205.218::9443::9919 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403365575174 |  1.0.18 |              1 |        13.16 |        1403365575 |    2014-06-21 23:46:15 |         6631 |               19893
  1403354553932::10.11.205.218::9443::2663 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354553932 |  1.0.18 |              1 |         14.1 |        1403354553 |    2014-06-21 20:42:33 |        28475 |                   0
 1403374113341::10.11.205.218::9443::11852 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403374113341 |  1.0.18 |              1 |        10.18 |        1403374113 |    2014-06-22 02:08:33 |        17188 |              154692
  1403354501924::10.11.205.218::9443::1894 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354501924 |  1.0.18 |              1 |        10.17 |        1403354501 |    2014-06-21 20:41:41 |        26266 |                   0
 1403407054092::10.11.205.218::9443::15527 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403407054092 |  1.0.18 |              1 |        17.16 |        1403407054 |    2014-06-22 11:17:34 |         6332 |                6332

(5 rows)

cqlsh:EVENT_KS>

I will try to issue queries against this table (how many depends on the various presentation aggregations needed...) and render views based on hourly sums, 10-minute sums, daily sums, monthly sums and so on. Depending on the query, the GROUP BY is meant to provide that "index". Testing this now.. so we'll see how it turns out. Hope this makes sense?!
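As a sketch of what those other buckets might look like, assuming payload_timestamp is in epoch seconds (the bucket_start and kwh names are illustrative), a 10-minute bucket can be derived by rounding the timestamp down to a 600-second boundary:

-- 10-minute buckets: round the epoch timestamp down to a 600-second boundary.
SELECT floor(payload_timestamp / 600) * 600 AS bucket_start,
       sum(payload_wattseconds) / (60*60) / 1000 AS kwh
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY floor(payload_timestamp / 600) * 600;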
So I'd rather this not be deleted as a duplicate...
UPDATE 3) Had this all wrong.. Thought some more about the hints given below. Simply simplifying the whole query produces the correct result. The query below gives the total kWh per hour across the entire dataset. With that, I can create kWh iterations for different time periods, such as:
Hourly for the last 24 hours
Daily for the last year
Per minute for the last hour
.. and so on.
Here is the query:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select hour(from_unixtime(payload_timestamp)) AS hourly, (sum(payload_wattseconds)/(60*60)/1000)
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
GROUP BY hour(from_unixtime(payload_timestamp));

This query produces the following results based on the sample data:

hourly  _c1
0   16.91570472222222
1   16.363228888888887
2   15.446414166666667
3   11.151388055555556
4   18.10564666666667
5   2.2734924999999997
6   17.370668055555555
7   17.991484444444446
8   38.632728888888884
9   16.001440555555554
10  15.887023888888889
11  12.709341944444445
12  23.052629722222225
13  14.986092222222222
14  16.182284722222224
15  5.881564999999999
18  2.8149172222222223
19  17.484405
20  15.888274166666665
21  15.387210833333333
22  16.088641666666668
23  16.49990916666667

That is the total kWh per hour across the entire dataset..
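The time-boxed variants listed above should follow the same pattern by adding a WHERE window; a sketch of the "hourly for the last 24 hours" and "daily for the last year" cases (the window arithmetic and the kwh/dt aliases are illustrative):

-- Hourly kWh for the last 24 hours.
SELECT hour(from_unixtime(payload_timestamp)) AS hourly,
       sum(payload_wattseconds) / (60*60) / 1000 AS kwh
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 24*60*60
GROUP BY hour(from_unixtime(payload_timestamp));

-- Daily kWh for the last year, grouped on the calendar date.
SELECT to_date(from_unixtime(payload_timestamp)) AS dt,
       sum(payload_wattseconds) / (60*60) / 1000 AS kwh
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 365*24*60*60
GROUP BY to_date(from_unixtime(payload_timestamp));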
So, now on to the next issue.

No answers yet.
