hadoop—在hive中将json数据从一个表插入到另一个表

zbdgwd5y 于 2021-05-29 发布在 Hadoop

关注(0)|答案(2)|浏览(460)

我想根据数据上的关键字段将json数据从一个表插入到其他表中。
我的数据是这样的
{“rtype”：{“ver”：“1”，“os”：“ms”，“type”：“ns”，“vehicle”：“mh-3412”，“mod”：{“version”：[{“”：{“xyz”：“123.dfer”，“founder”：“3.0”，“ght”：“florida”，“fashion”：“fg45”，“cdc”：“new”，“dof”：“yes”，“ts”：“2000-04-01t00:00.171z”}}}}
{“rtype”：{“ver”：“1”，“os”：“ms”，“type”：“ns”，“vehicle”：“mh-3412”，“mod”：{“version”：[{“gap”：{“xvy”：“123.dfer”，“fah”：“3.0”，“ght”：“florida”，“fashion”：“fg45”，“cdc”：“new”，“dof”：“yes”，“ts”：“2000-04-01t00:00.171z”}}}}
{“rtype”：{“ver”：“1”，“os”：“ms”，“type”：“ns”，“vehicle”：“mh-3412”，“mod”：{“version”：[{“box”：{“vog”：“123.dfer”，“fah”：“3.0”，“fax”：“florida”，“fashion”：“fg45”，“cdc”：“new”，“dof”：“yes”，“ts”：“2000-04-01t00:00.171z”}}}}
这里基于版本，不管是“box”还是“gap”还是“”，我想将特定json行上的字段填充到另一个表中
例如：如果版本是“gap”，则填充一个表中的特定行如果是“box”，则填充到另一个表中。。。我是说所有的一排盒子。。。
如何使用hive实现这一点。请帮忙。
注意：我的json数据在一个表中作为string类型的列

hadoop Hive JSON

来源：https://stackoverflow.com/questions/43383544/inserting-json-data-from-one-table-to-another-tables-in-hive

2条答案

按热度按时间

clj7thdc1#

演示

create table src (myjson string);

insert into src values
    ('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
   ,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
   ,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
;

create table trg_abc (myjson string);
create table trg_gap (myjson string);
create table trg_box (myjson string);

from src
insert into trg_abc select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].ABC') is not null
insert into trg_gap select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].GAP') is not null
insert into trg_box select myjson where get_json_object(myjson,'$.Rtype.MOD.Version[0].BOX') is not null
;

赞(0）回复(0）举报 2021-05-29

omjgkv6w2#

首先，您需要将数据作为json存储在配置单元表中：
我假设您的配置单元表是外部的（通常是-请检查它） SHOW CREATE TABLE your_table ).
例如，如果是这样，整个数据集就位于某个hdfs/s3路径中 s3a://your_bucket/your_jsons_location/ 下载json-udf-1.3.7-jar-with-dependencies.jar并运行 ADD JARS s3a://your_bucket/lib/json-udf-1.3.7-jar-with-dependencies.jar; 然后必须为每个json模式创建一个专用的json表：

CREATE EXTERNAL TABLE boxes
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<BOX:struct<VOG:string,FAH:string,FAX:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_bucket/your_jsons_location/';

CREATE EXTERNAL TABLE gaps
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<GAP:struct<XVY:string,FAH:string,GHT:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_bucket/your_jsons_location/';

CREATE EXTERNAL TABLE abcs
(Rtype struct<ver:string,os:string,type:string,vehicle:string,MOD:struct<Version:array<struct<ABC:struct<XYZ:string,founder:string,GHT:string,fashion:string,cdc:string,dof:string,ts:string>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location 's3a://your_jsons_location/';

现在如果你要跑：

SELECT * FROM boxes;
SELECT * FROM gaps;
SELECT * FROM abcs;

您将看到每个表只正确地解析了匹配的json（根据create语句中指定的模式）。每个表中不匹配的为空。
要筛选出不相关的记录，请运行： SELECT * FROM s WHERE Rtype.mod.version[0]. IS NOT NULL; 注意：这个完整的解释假设json存储在hive表的外部（我特别使用了s3，但也可以是hdfs）

赞(0）回复(0）举报 2021-05-29

我来回答

hadoop—在hive中将json数据从一个表插入到另一个表

2条答案

相关问题

热门标签

最新问答