最仁慈的,
我试图扩展一些工作配置单元查询,但似乎达不到要求。只是想按函数测试group,这对于我需要完成的许多查询来说是常见的。下面是我尝试执行的查询:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp)) AS hourly
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hourly;
这将产生以下错误:
错误:执行配置单元脚本时出错。查询返回非零代码:10,原因:失败:语义分析错误:行1:320无效的表别名或列引用“hourly”:(可能的列名为:messagerowid、payload\u sensor、messagetimestamp、payload\u temp、payload\u timestamp、payload\u timestampmysql、payload\u watt、payload\u wattseconds)
这样做的目的是在payload\u wattsecond等的sum()上创建一个有时限的查询(比如最近24小时)。为了开始创建摘要表,我开始构建一个group by查询,该查询将派生select查询的小时锚定。
但问题是上面的错误。如果有人能指出这里的问题,我将不胜感激。。我自己好像找不到,不过我还是个 hive 里的新手。
提前谢谢。。
更新:尝试更新查询。下面是我刚刚尝试运行的查询:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp))
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));
.. 但是,这会产生另一个错误,即:
ERROR: Error while executing Hive script.Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key 'messageRowID'
思想?
更新#2)下面是一些示例的快速转储,这些示例派生到wso2bam中的事件Šks cf中。最后一列是计算的(在perl守护进程中..)瓦特秒,它将在查询中用于计算总计为kwh的聚合总和,然后将这些总和转储到mysql表中,以便与包含ui/ux层的应用程序同步。。
[12:03:00] [jskogsta@enterprise ../Product Centric Opco Modelling]$ ~/local/apache-cassandra-2.0.8/bin/cqlsh localhost 9160 -u admin -p admin --cqlversion="3.0.5"
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 1.2.13 | CQL spec 3.0.5 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use "EVENT_KS";
cqlsh:EVENT_KS> select * from currentcost_stream limit 5;
key | Description | Name | Nick_Name | StreamId | Timestamp | Version | payload_sensor | payload_temp | payload_timestamp | payload_timestampmysql | payload_watt | payload_wattseconds
-------------------------------------------+---------------------------+--------------------+----------------------+---------------------------+---------------+---------+----------------+--------------+-------------------+------------------------+--------------+---------------------
1403365575174::10.11.205.218::9443::9919 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403365575174 | 1.0.18 | 1 | 13.16 | 1403365575 | 2014-06-21 23:46:15 | 6631 | 19893
1403354553932::10.11.205.218::9443::2663 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354553932 | 1.0.18 | 1 | 14.1 | 1403354553 | 2014-06-21 20:42:33 | 28475 | 0
1403374113341::10.11.205.218::9443::11852 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403374113341 | 1.0.18 | 1 | 10.18 | 1403374113 | 2014-06-22 02:08:33 | 17188 | 154692
1403354501924::10.11.205.218::9443::1894 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354501924 | 1.0.18 | 1 | 10.17 | 1403354501 | 2014-06-21 20:41:41 | 26266 | 0
1403407054092::10.11.205.218::9443::15527 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403407054092 | 1.0.18 | 1 | 17.16 | 1403407054 | 2014-06-22 11:17:34 | 6332 | 6332
(5 rows)
cqlsh:EVENT_KS>
我将尝试对这个表发出一个查询(实际倍数取决于所需的各种表示聚合…),并基于小时总和、10分钟总和、每日总和、每月总和等呈现一个视图。根据查询,groupby打算给出这个“索引”。现在正在测试这个。。所以我们会看看最后的结局。希望这有意义?!
所以不想删除重复的。。。
更新3)这一切都错了。。对下面给出的提示进行了更多的思考。因此,只要简化整个查询就可以得到正确的结果。下面的查询给出了整个数据集每小时的kwh总量。有了它,我可以创建不同时间段的kwh迭代,比如
过去24小时每小时
去年每天
过去一个小时的一分钟
.. 等等等等。
以下是查询:
DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;
CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
"cassandra.port" = "9160",
"cassandra.ks.name" = "EVENT_KS",
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "admin",
"cassandra.cf.name" = "currentcost_stream",
"cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );
select hour(from_unixtime(payload_timestamp)) AS hourly, (sum(payload_wattseconds)/(60*60)/1000)
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY hour(from_unixtime(payload_timestamp));
此查询基于示例数据生成以下结果:
hourly _c1
0 16.91570472222222
1 16.363228888888887
2 15.446414166666667
3 11.151388055555556
4 18.10564666666667
5 2.2734924999999997
6 17.370668055555555
7 17.991484444444446
8 38.632728888888884
9 16.001440555555554
10 15.887023888888889
11 12.709341944444445
12 23.052629722222225
13 14.986092222222222
14 16.182284722222224
15 5.881564999999999
18 2.8149172222222223
19 17.484405
20 15.888274166666665
21 15.387210833333333
22 16.088641666666668
23 16.49990916666667
它是整个数据集上每小时的总千瓦时。。
所以,现在开始下一个问题
暂无答案!
目前还没有任何答案,快来回答吧!