这真的是一个多问题,我为问题超载道歉,但我只是需要这些优雅地完成。我可以处理mysql中的简单查询,但是那些复杂的表格通常很适合我,而且我还不熟悉动态sql。寻找简单的解决方案(但不是硬编码的):。我不确定这是不是太多的问题,所以,如果它确实是太多,请回答一个或两个聚合,给我的工具,使我可以建立这些类型的聚合自己。
我的数据结构如下:
+-----------------------------------------------------------------+
| timestamp group url metric columns here |
+-----------------------------------------------------------------+
| 2018-05-01 14:30:00 6732 abc.com -0.3673 -0.0914 4.0183 |
| 2018-05-01 14:30:00 6732 xyz.com 4.2187 0.3407 12.3832 |
| 2018-05-01 14:30:00 6732 pqr.org -2.3875 -0.4064 5.8743 |
| 2018-05-01 14:30:00 6732 many.com -4.4194 -1.0665 4.144 |
| 2018-05-01 14:00:00 7174 abc.com -6.4021 -1.419 4.5117 |
| 2018-05-01 14:00:00 7174 xyz.com -1.7971 -1.0396 1.7286 |
| 2018-05-01 14:00:00 7174 many.com 0.5276 0.2621 2.013 |
| 2018-05-01 13:30:00 7174 many.com -0.4941 -0.1098 4.4982 |
| 2018-05-01 13:30:00 7184 diff.com -0.6783 -0.1384 4.9013 |
| 2018-05-01 13:30:00 7184 sites.com -0.1293 -0.0246 5.2608 |
| 2018-05-01 13:30:00 7184 here.com -0.2703 -0.0669 4.0377 |
+-----------------------------------------------------------------+
基本上,对于每个时间戳,我们有来自不同组的数据,对于每个组,我们有url,对于每个url,我们有被捕获的度量。url和adgroups具有多对多关系。
我必须根据案例需求,以多种方式提取和汇总这些数据。通常,我选择所需的任何度量,并按timestamp、group和url中的一个或多个进行分组。然而,有时我想在一个组中查看数据/聚合,但最终我会为它们运行一个单独的查询。例如,在一个特定的时间窗口中,某个度量值下降或上升,我想分别挖掘每个时间窗口,我必须分别重复这个过程,因为在一个时间窗口中,某些组可以上下挖掘,以获得url阶段需要一个单独的查询。我需要的是一种在顶层进行聚合的方法—时间戳和组,但也要显示来自底层的聚合。举个例子:
像这样的事情真的很有帮助:
+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
| timestamp | aggregate_1 | aggregate_2 | window_top_group | window_top_group_aggregate_1 | window_top_group_aggregate_2 | window_top_group_2 | window_top_group_2_aggregate_1 | window_top_group_2_aggregate_2 | window_loss_group_1 | window_loss_group_1_aggregate_1 | window_loss_group_1_aggregate_2 |
+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
| 2018-05-01 14:30:00 | -0.3673 | -0.0914 | 6732 | -0.3673 | -0.3673 | 7174 | -0.3673 | -0.3673 | 7184 | -0.3673 | -0.3673 |
| 2018-05-01 14:00:00 | 4.2187 | 0.3407 | 6732 | 4.2187 | 4.2187 | 7174 | 4.2187 | 4.2187 | 7184 | 4.2187 | 4.2187 |
| 2018-05-01 13:30:00 | -2.3875 | -0.4064 | 6732 | -2.3875 | -2.3875 | 7174 | -2.3875 | -2.3875 | 7184 | -2.3875 | -2.3875 |
| 2018-05-01 13:00:00 | -4.4194 | -1.0665 | 6732 | -4.4194 | -4.4194 | 7174 | -4.4194 | -4.4194 | 7184 | -4.4194 | -4.4194 |
| 2018-05-01 12:30:00 | -6.4021 | -1.419 | 7174 | -6.4021 | -6.4021 | 7184 | -6.4021 | -6.4021 | 6732 | -6.4021 | -6.4021 |
| 2018-05-01 12:00:00 | -1.7971 | -1.0396 | 7174 | -1.7971 | -1.7971 | 7184 | -1.7971 | -1.7971 | 6732 | -1.7971 | -1.7971 |
| 2018-05-01 11:30:00 | 0.5276 | 0.2621 | 7174 | 0.5276 | 0.5276 | 7184 | 0.5276 | 0.5276 | 6732 | 0.5276 | 0.5276 |
| 2018-05-01 11:00:00 | -0.4941 | -0.1098 | 7174 | -0.4941 | -0.4941 | 6732 | -0.4941 | -0.4941 | 7184 | -0.4941 | -0.4941 |
| 2018-05-01 10:30:00 | -0.6783 | -0.1384 | 7184 | -0.6783 | -0.6783 | 6732 | -0.6783 | -0.6783 | 7174 | -0.6783 | -0.6783 |
| 2018-05-01 10:00:00 | -0.1293 | -0.0246 | 7184 | -0.1293 | -0.1293 | 6732 | -0.1293 | -0.1293 | 7174 | -0.1293 | -0.1293 |
| 2018-05-01 9:30:00 | -0.2703 | -0.0669 | 7184 | -0.2703 | -0.2703 | 6732 | -0.2703 | -0.2703 | 7174 | -0.2703 | -0.2703 |
+---------------------+-------------+-------------+------------------+------------------------------+------------------------------+--------------------+--------------------------------+--------------------------------+---------------------+---------------------------------+---------------------------------+
也许我们能再深入一层?比如说,在聚合时间戳时,获取顶级组的顶级url或者顶级组url组合?
很少有其他聚合真正有帮助:
1) 例如,对于特定的时间范围,例如一个完整的月份:按URL聚合,显示整个范围内的最佳/最差时间和值,但也可以在整个月份的某个时间段对它们进行平均,并在那里获取聚合,如图所示:
+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
| url | aggregate_1 | aggregate_2 | best performance timestamp overall | worst performance timestamp overall | peak time of average day | trough time of average day | mean_at_peak | standard_deviation_at_peak | mean_at_trough | standard_deviation_at_trough |
+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
| abc.com | -0.3673 | -0.3673 | 2018-05-01 14:30:00 | 2018-05-01 14:30:00 | 2018-05-01 9:30:00 | 2018-05-01 9:30:00 | 0.5276 | 0.5276 | 0.5276 | 0.5276 |
| xyz.com | 4.2187 | 4.2187 | 2018-05-01 14:00:00 | 2018-05-01 14:00:00 | 2018-05-01 10:00:00 | 2018-05-01 10:00:00 | 0.5276 | 0.5276 | 0.5276 | 0.5276 |
| pqr.org | -2.3875 | -2.3875 | 2018-05-01 13:30:00 | 2018-05-01 13:30:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 4.2187 | 4.2187 | 4.2187 | 4.2187 |
| many.com | -4.4194 | -4.4194 | 2018-05-01 13:00:00 | 2018-05-01 13:00:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 |
| abc.com | -6.4021 | -6.4021 | 2018-05-01 12:30:00 | 2018-05-01 10:30:00 | 2018-05-01 12:00:00 | 2018-05-01 12:00:00 | 4.2187 | 4.2187 | 4.2187 | 4.2187 |
| xyz.com | -1.7971 | -1.7971 | 2018-05-01 12:00:00 | 2018-05-01 12:00:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 0.5276 | 0.5276 | 0.5276 | 0.5276 |
| pqr.org | 0.5276 | 0.5276 | 2018-05-01 11:30:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 |
| many.com | -0.4941 | -0.4941 | 2018-05-01 11:00:00 | 2018-05-01 11:00:00 | 2018-05-01 11:00:00 | 2018-05-01 11:00:00 | 4.2187 | 4.2187 | 4.2187 | 4.2187 |
| many.com | -0.6783 | -0.6783 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 2018-05-01 9:30:00 | 2018-05-01 9:30:00 | 0.5276 | 0.5276 | 0.5276 | 0.5276 |
| sites.com | -0.1293 | -0.1293 | 2018-05-01 10:00:00 | 2018-05-01 10:00:00 | 2018-05-01 10:30:00 | 2018-05-01 10:30:00 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 |
| here.com | -0.2703 | -0.2703 | 2018-05-01 9:30:00 | 2018-05-01 9:30:00 | 2018-05-01 10:00:00 | 2018-05-01 10:00:00 | 4.2187 | 4.2187 | 4.2187 | 4.2187 |
+-----------+-------------+-------------+------------------------------------+-------------------------------------+--------------------------+----------------------------+--------------+----------------------------+----------------+------------------------------+
2) 对于指定的URL列表或让查询本身构建URL列表(例如,与模式匹配的URL列表或在每个窗口中使用度量1的前3个URL列表),显示所提供或所需度量的百分比贡献:
+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
| timestamp | metric_1 | contribution_percentage_url_1 | contribution_percentage_url_2 | contribution_percentage_url_3 | metric_2 | contribution_percentage_url_1 | contribution_percentage_url_2 | contribution_percentage_url_3 |
+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
| 2018-05-01 14:30:00 | -0.3673 | 33 | 26 | 18 | -0.3673 | 53 | 30 | 11 |
| 2018-05-01 14:00:00 | 4.2187 | 33 | 29 | 12 | 4.2187 | 30 | 32 | 20 |
| 2018-05-01 13:30:00 | -2.3875 | 53 | 29 | 17 | -2.3875 | 37 | 32 | 11 |
| 2018-05-01 13:00:00 | -4.4194 | 39 | 27 | 19 | -4.4194 | 31 | 34 | 10 |
| 2018-05-01 10:30:00 | -6.4021 | 41 | 25 | 15 | -6.4021 | 31 | 30 | 16 |
| 2018-05-01 12:00:00 | -1.7971 | 45 | 27 | 12 | -1.7971 | 32 | 30 | 12 |
| 2018-05-01 10:30:00 | 0.5276 | 50 | 35 | 18 | 0.5276 | 41 | 25 | 13 |
| 2018-05-01 11:00:00 | -0.4941 | 33 | 33 | 16 | -0.4941 | 44 | 34 | 13 |
| 2018-05-01 10:30:00 | -0.6783 | 53 | 33 | 18 | -0.6783 | 54 | 33 | 16 |
| 2018-05-01 10:00:00 | -0.1293 | 38 | 31 | 14 | -0.1293 | 42 | 31 | 17 |
| 2018-05-01 9:30:00 | -0.2703 | 30 | 35 | 11 | -0.2703 | 30 | 35 | 16 |
+---------------------+----------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-------------------------------+-------------------------------+
3) 数据透视:对于提供的日期列表或从提供的日期到特定关键度量值的+-5天:跨天比较度量值:
+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
| time of day | date-5 | date-4 | date-3 | date-2 | date-1 | date | date+1 | date+2 | date+3 | date+4 | date+5 |
+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
| 14:30:00 | -0.3673 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.3673 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.3673 |
| 14:00:00 | 4.2187 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | 4.2187 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | 4.2187 |
| 13:30:00 | -2.3875 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -2.3875 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -2.3875 |
| 13:00:00 | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 | -4.4194 |
| 12:30:00 | -6.4021 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -6.4021 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -6.4021 |
| 12:00:00 | -1.7971 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -1.7971 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -1.7971 |
| 11:30:00 | 0.5276 | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 | 0.5276 | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 | 0.5276 |
| 11:00:00 | -0.4941 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.4941 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.4941 |
| 10:30:00 | -0.6783 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.6783 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.6783 |
| 10:00:00 | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 | -0.1293 |
| 9:30:00 | -0.2703 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.2703 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.2703 |
+-------------+---------+-------------+-------------+-------------+-------------+---------+-------------+-------------+-------------+-------------+---------+
4) 有一个称为metric_lg的度量,它表示基于url计数的url的寿命或生命周期。因此,假设从指定的日期或组的第一个时间戳开始,根据其计数计算某些度量聚合,即,对于单个url,范围将是1-5、5-10、10-20、20-50、50-80、80-200、200-1000、1000-10000、10000+:让我们称它们为阶段a、b、c、d、e、f、g、h、i。然而,问题是,这个计数需要累积,即,从它在组中的出现开始,从组开始。假设一个组7184是在2018-05-01 10:00:00开始的,7174是在2018-04-30 12:00:00开始的,那么两个组中出现的一个特定url将从各自组的开始累积其度量值,即7184中的生命周期阶段将是从7184开始累积度量值,即。,2018-05-01 10:00:00,其在7174中的生命周期阶段将是从7174开始的公制长度的累积,即2018-04-30 12:00:00。
因此,对于所提供的组列表,类似这样的操作将有所帮助:根据其度量\ lg生命周期阶段计算其他度量聚合,并比较按生命周期阶段划分的组性能。
+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| timestamp | A_aggregate_metric | B _aggregate_metric | C_aggregate_metric | D_aggregate_metric | E_aggregate_metric | F_aggregate_metric | G_aggregate_metric | H_aggregate_metric | I_aggregate_metric |
+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| 2018-05-01 14:30:00 | -0.3673 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.3673 | 0.5276 | 0.5276 | 0.5276 |
| 2018-05-01 14:00:00 | 4.2187 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | 4.2187 | 0.5276 | 0.5276 | 0.5276 |
| 2018-05-01 13:30:00 | -2.3875 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -2.3875 | 4.2187 | 4.2187 | 4.2187 |
| 2018-05-01 13:00:00 | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 | 5.449066667 | -4.4194 | 5.449066667 | 5.449066667 | 5.449066667 |
| 2018-05-01 10:30:00 | -6.4021 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -6.4021 | 4.2187 | 4.2187 | 4.2187 |
| 2018-05-01 12:00:00 | -1.7971 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -1.7971 | 0.5276 | 0.5276 | 0.5276 |
| 2018-05-01 10:30:00 | 0.5276 | 7.985716667 | 7.985716667 | 7.985716667 | 7.985716667 | 0.5276 | 7.985716667 | 7.985716667 | 7.985716667 |
| 2018-05-01 11:00:00 | -0.4941 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.4941 | 4.2187 | 4.2187 | 4.2187 |
| 2018-05-01 10:30:00 | -0.6783 | 0.5276 | 0.5276 | 0.5276 | 0.5276 | -0.6783 | 0.5276 | 0.5276 | 0.5276 |
| 2018-05-01 10:00:00 | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 | 9.522366667 | -0.1293 | 9.522366667 | 9.522366667 | 9.522366667 |
| 2018-05-01 9:30:00 | -0.2703 | 4.2187 | 4.2187 | 4.2187 | 4.2187 | -0.2703 | 4.2187 | 4.2187 | 4.2187 |
+---------------------+--------------------+---------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
如果您需要数据上下文,假设有三个度量:度量1:以美元表示的收入度量2:以美元表示的成本度量lg:以千为单位的流量计数
ps:mysql优于python这样做是可取的,因为其中的一些步骤将用于创建自定义视图,这样就可以经常查看这些视图并对其进行进一步的分析。
非常感谢,真的
暂无答案!
目前还没有任何答案,快来回答吧!