INSERT INTO TABLE test_dev_db.test_1 VALUES
('A','B',124,1),
('A','B',123,2),
('B','C',133,1),
('G','G',231,1);
Assume the following data has been loaded from the file:
INSERT INTO TABLE test_dev_db.test_2 VALUES
('A','B',222,1),
('K','D',228,1),
('G','G',241,1);
Here is the query:
WITH CTE AS (
SELECT col1,col2,value,version FROM test_dev_db.test_1
UNION
SELECT col1,col2,value,version FROM test_dev_db.test_2
)
insert overwrite table test_dev_db.test_1
SELECT a.col1, a.col2, a.value,
       row_number() over (partition by a.col1, a.col2 order by a.value) as new_version
FROM CTE a;
hive> select * from test_dev_db.test_1;
OK
A B 123 1
A B 124 2
A B 222 3
B C 133 1
G G 231 1
G G 241 2
K D 228 1
13 Answers
6yoyoihd1#
| col1 | col2 | value | ts | new_version |
wb1gzix02#
After merging:
ukqbszuj3#
+----+----+-----+---------------------+---+
| a  | b  | 777 | 2019-01-01 00:00:00 | 1 |
| k  | d  | 228 | 2019-01-01 00:00:00 | 1 |
| g  | g  | 241 | 2019-01-01 00:00:00 | 1 |
+----+----+-----+---------------------+---+
jei2mxaa5#
+----+----+-----+---------------------+---+
| b  | c  | 133 | 2018-01-03 00:00:00 | 1 |
| k  | d  | 228 | 2019-01-01 00:00:00 | 1 |
| a  | b  | 999 | 2018-01-01 00:00:00 | 1 |
| a  | b  | 888 | 2018-01-02 00:00:00 | 2 |
| a  | b  | 777 | 2019-01-01 00:00:00 | 3 |
| g  | g  | 231 | 2018-01-01 00:00:00 | 1 |
| g  | g  | 241 | 2019-01-01 00:00:00 | 2 |
+----+----+-----+---------------------+---+
kq0g1dla7#
| col1 | col2 | value | ts | version |
vlf7wbxs8#
| col1 | col2 | value | ts | version |
b1uwtaje11#
+----+----+-----+---------------------+---+
| a  | b  | 999 | 2018-01-01 00:00:00 | 1 |
| a  | b  | 888 | 2018-01-02 00:00:00 | 2 |
| b  | c  | 133 | 2018-01-03 00:00:00 | 1 |
| g  | g  | 231 | 2018-01-01 00:00:00 | 1 |
+----+----+-----+---------------------+---+
rfbsl7qr12#
Existing main Hive table:
Assume the following data has been loaded from the file:
Here is the query:
For Spark: create DataFrames that read from the file and from the Hive table, merge them, and save the result back to Hive (see the sketch below).
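A minimal PySpark sketch of that flow, assuming a Hive-enabled Spark session. The input path /data/incoming/test_2.csv, the csv options, and the target table test_dev_db.test_1_merged are illustrative assumptions; test_dev_db.test_1 and the column names come from the question.

from pyspark.sql import SparkSession, Window, functions as F

spark = (SparkSession.builder
         .appName("merge_file_and_hive")
         .enableHiveSupport()
         .getOrCreate())

# Existing main Hive table
main_df = spark.table("test_dev_db.test_1")

# Incremental data from the file (hypothetical path; same schema as the table)
file_df = (spark.read
           .option("inferSchema", "true")
           .csv("/data/incoming/test_2.csv")
           .toDF("col1", "col2", "value", "version"))

# Merge both sources and recompute the version within each (col1, col2) group
w = Window.partitionBy("col1", "col2").orderBy("value")
merged = (main_df.unionByName(file_df)
          .withColumn("version", F.row_number().over(w)))

# Save back to Hive; writing to a separate table avoids overwriting a table
# that is still being read in the same job
merged.write.mode("overwrite").saveAsTable("test_dev_db.test_1_merged")

Recomputing row_number() after the union keeps the version numbers contiguous within each (col1, col2) group regardless of which source a row came from.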
fhg3lkii13#
We do not receive a version from the external system, but if we need one for comparison it will always be 1 (see the sketch below).
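A small sketch of that idea, reusing the Spark session and the hypothetical file path from the sketch above: the external file carries no version column, so a constant 1 is attached to the incoming rows before they are compared or merged with the Hive table.

from pyspark.sql import functions as F

# External file without a version column (hypothetical path and column names)
raw_df = (spark.read
          .option("inferSchema", "true")
          .csv("/data/incoming/test_2.csv")
          .toDF("col1", "col2", "value"))

# Incoming rows always get version = 1 so they can be compared against
# the versions already stored in the Hive table
incoming_df = raw_df.withColumn("version", F.lit(1))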