Apache Spark: copy data from multiple columns of a source Hive table into a single column, in separate rows, of a target Hive table

Posted by 6ss1mwsb on 2021-06-24 in Hive

I need to copy data from one Hive source table to a target table. Below is the source table structure with sample data:

source_table
Userid  Name    Phone1   Phone2  Phone3  Address1   Address2    Address3
123     Jitu    123456   987654  111111  DELHI      GURGAON     NOIDA       
234     Mark    123456   987654  111111  UK         USA         IND

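When copying from source to target, Phone1, Phone2 and Phone3 must end up in a single column of the target table, each paired with its corresponding Address1, Address2 or Address3 in a second column. This is how the data in the target table should look: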

Target_table
Userid  Name    Phone_no    Address
123     Jitu    123456      DELHI
123     Jitu    987654      GURGAON
123     Jitu    111111      NOIDA
234     Mark    123456      UK
234     Mark    987654      USA
234     Mark    111111      IND

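I know the simplest way is to use HiveQL or Spark DataFrames and insert into the target table several times, once for each phone/address column pair in the source table.
Is there another, more efficient way to achieve this?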
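To make the requirement concrete: the reshaping is a per-row flat-map, so one pass over the source can in principle emit all three target rows per user, instead of one insert per column pair. The following is a minimal plain-Scala sketch on in-memory data, assuming Phone<i> is always paired with Address<i>; `SourceRow`, `TargetRow` and `Unpivot.unpivot` are hypothetical names, not part of Hive or Spark.

```scala
// Hypothetical in-memory model of the two tables (fields mirror the question's columns).
final case class SourceRow(
    userid: Int,
    name: String,
    phones: Seq[String],   // Phone1, Phone2, Phone3
    addresses: Seq[String] // Address1, Address2, Address3
)
final case class TargetRow(userid: Int, name: String, phoneNo: String, address: String)

object Unpivot {
  // Each source row yields one target row per (phone, address) pair,
  // pairing the i-th phone with the i-th address by position.
  def unpivot(rows: Seq[SourceRow]): Seq[TargetRow] =
    for {
      r <- rows
      (phone, address) <- r.phones.zip(r.addresses)
    } yield TargetRow(r.userid, r.name, phone, address)

  def main(args: Array[String]): Unit = {
    val source = Seq(
      SourceRow(123, "Jitu", Seq("123456", "987654", "111111"), Seq("DELHI", "GURGAON", "NOIDA")),
      SourceRow(234, "Mark", Seq("123456", "987654", "111111"), Seq("UK", "USA", "IND"))
    )
    // Prints the six rows of the target table, one per line.
    Unpivot.unpivot(source).foreach(println)
  }
}
```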


Answer #1 (qc6wkl3g)

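In case you are also interested in a Hive solution: chaining several lateral views over arrays produces a Cartesian product of their rows. With posexplode, which also returns each element's position, you can keep only the rows whose positions match and get the desired result: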

select Userid,Name,phone,address
from source_table
lateral view posexplode(array(Phone1,Phone2,Phone3))  valphone as x,phone
lateral view posexplode(array(Address1,Address2,Address3)) valaddress as t,address
where x=t
;

hive> set hive.cli.print.header=true;

userid  name    phone   address
123     Jitu    123456  DELHI
123     Jitu    987654  GURGAON
123     Jitu    111111  NOIDA
234     Mark    123456  UK
234     Mark    987654  USA
234     Mark    111111  IND
Time taken: 2.759 seconds, Fetched: 6 row(s)
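The role of `where x=t` can be seen in a small plain-Scala model of the query above for one user. It assumes, as Hive does, that `posexplode` numbers array elements from 0; `PosexplodeModel` and its methods are hypothetical helpers that only mimic the query's semantics, not Hive code.

```scala
object PosexplodeModel {
  // Mimics posexplode on an array: (position, value) pairs, positions starting at 0.
  def posexplode[A](xs: Seq[A]): Seq[(Int, A)] =
    xs.zipWithIndex.map { case (v, i) => (i, v) }

  // Two lateral views behave like a cross join: every phone meets every address.
  def crossJoin[A, B](left: Seq[A], right: Seq[B]): Seq[(Int, A, Int, B)] =
    for {
      (x, a) <- posexplode(left)
      (t, b) <- posexplode(right)
    } yield (x, a, t, b)

  // The WHERE x = t predicate keeps only the position-aligned pairs.
  def aligned[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] =
    crossJoin(left, right).collect { case (x, a, t, b) if x == t => (a, b) }

  def main(args: Array[String]): Unit = {
    val phones    = Seq("123456", "987654", "111111")
    val addresses = Seq("DELHI", "GURGAON", "NOIDA")
    println(crossJoin(phones, addresses).size) // 9
    println(aligned(phones, addresses))        // List((123456,DELHI), (987654,GURGAON), (111111,NOIDA))
  }
}
```

With three columns each user generates 3 × 3 = 9 intermediate rows, of which the filter keeps 3.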

Answer #2 (wgx48brx)

You can select from the original DataFrame once per column index and then combine the selected DataFrames into one with `union`:

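// Assumes a SparkSession named `spark` (as provided by spark-shell)
import org.apache.spark.sql.functions.col
import spark.implicits._ // enables toDF and the $"..." column syntax

val df = Seq(
  (123, "Jitu", "123456", "987654", "111111", "DELHI", "GURGAON", "NOIDA"),
  (234, "Mark", "123456", "987654", "111111", "UK", "USA", "IND")
).toDF(
  "Userid", "Name", "Phone1", "Phone2", "Phone3", "Address1", "Address2", "Address3"
)

// One projection per index i: (Userid, Name, Phone<i>, Address<i>)
val columnIndexes = Seq(1, 2, 3)
val onlyOneIndexDfs = columnIndexes.map(idx =>
  df.select(
    $"Userid",
    $"Name",
    col(s"Phone$idx").alias("Phone_no"),
    col(s"Address$idx").alias("Address")))

// union matches columns by position; every projection has the same column order
val result = onlyOneIndexDfs.reduce(_ union _)
result.show(false)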

Output:

+------+----+--------+-------+
|Userid|Name|Phone_no|Address|
+------+----+--------+-------+
|123   |Jitu|123456  |DELHI  |
|123   |Jitu|111111  |NOIDA  |
|123   |Jitu|987654  |GURGAON|
|234   |Mark|123456  |UK     |
|234   |Mark|987654  |USA    |
|234   |Mark|111111  |IND    |
+------+----+--------+-------+
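Because each `select` above is a separate projection of `df`, the `union` may read the source table once per index unless `df` is cached. A single-pass alternative, not taken from the answers above, is Spark SQL's built-in `stack` generator (Hive also provides a `stack` table-generating function), which turns each input row into several output rows. This is a sketch assuming a local `SparkSession` and the question's column names; `StackUnpivot` is a hypothetical name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StackUnpivot {
  // Emits one output row per (PhoneN, AddressN) pair in a single projection.
  def unpivot(df: DataFrame): DataFrame =
    df.selectExpr(
      "Userid",
      "Name",
      // stack(3, ...) splits the six values into 3 rows of 2 columns each
      "stack(3, Phone1, Address1, Phone2, Address2, Phone3, Address3) as (Phone_no, Address)"
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration only
      .appName("stack-unpivot")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (123, "Jitu", "123456", "987654", "111111", "DELHI", "GURGAON", "NOIDA"),
      (234, "Mark", "123456", "987654", "111111", "UK", "USA", "IND")
    ).toDF("Userid", "Name", "Phone1", "Phone2", "Phone3", "Address1", "Address2", "Address3")

    unpivot(df).show(false)
    spark.stop()
  }
}
```

`stack` expects the values that land in the same output column (Phone1, Phone2, Phone3, and likewise the addresses) to have compatible types; here they are all strings.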
