I have a column called uf that contains an array of JSON objects. Here is a sample:
[
{"type": "browserId", "name": "", "value": "unknown"},
{"type": "campaign", "name": "", "value": "om_227dec0082a5"},
{"type": "custom", "name": "2351350529", "value": "10148"},
{"type": "custom", "name": "9501713387", "value": "true"},
{"type": "custom", "name": "9517735577", "value": "true"},
{"type": "custom", "name": "9507402548", "value": "true"},
{"type": "custom", "name": "9733902068", "value": "true"}
]
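I'm trying to unnest it into child records, but for some reason I can't find the right way to unnest it. Then I noticed that my whole array is wrapped inside another JSON object.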
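To make "unnest into child records" concrete, here is a minimal Python sketch. It assumes the uf value reaches your code as a JSON string, and it uses event_uuid (a column in the sample file further down) as the parent key. unnest_uf is a hypothetical helper. It parses the string once and yields one (event_uuid, type, name, value) row per array element, which is the shape an unnest over this column is expected to produce.

```python
import json
from typing import Iterator, Tuple

# Three elements taken from the sample row further down in the question.
SAMPLE_UF = (
    '[{"type": "browserId", "name": "", "value": "unknown"}, '
    '{"type": "device", "name": "", "value": "mobile"}, '
    '{"type": "offset", "name": "", "value": "-60"}]'
)


def unnest_uf(event_uuid: str, uf: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one (event_uuid, type, name, value) child row per element of uf.

    Hypothetical helper: the JSON string is parsed once, and each array
    element becomes a separate record linked to its parent by event_uuid.
    """
    for element in json.loads(uf):
        yield (event_uuid, element["type"], element["name"], element["value"])


for child in unnest_uf("4926a5f1-bbb5-4553-9d0b-b26f773fa0f4", SAMPLE_UF):
    print(child)
```

Carrying the parent key on every child row is what lets the unnested records be joined back to the original event.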
Here is where I am: I tried a simple SELECT, and the result is:
[{type=[{"type": "browserId", "name": "", "value": "ff"}, name=null, value=null}]
The column is defined as follows:
{
"Name": "uf",
"Type": "array<struct<type:string,name:string,value:string>>"
}
Is the definition incorrect, and is that why my whole array ends up wrapped inside another JSON array?
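-- EDIT: Here is an example from my CSV file (tab-separated). For the past two days I have been trying to figure out whether something in the JSON prevents Glue from recognizing the column as a JSON array. However, I created a new column containing a simple JSON array, and Glue correctly assigned it the type array<struct
But after querying it, I got exactly the same problem as above.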
timestamp project_id campaign_id experiment_id variation_id layer_holdback audience_names end_user_id uuid session_id snippet_revision user_ip user_agent user_engine user_engine_version referer global_holdback event_type event_name uf active_views event_features event_metrics event_uuid
1570326511 74971132 11089500404 11097730080 11078120202 false [] oeu1535997971348r0.4399811351004357 AUTO 6540 5.91.170.0 Mozilla/5.0 (Linux; Android 7.0; SAMSUNG SM-G925F Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/9.2 Chrome/67.0.3396.87 Mobile Safari/537.36 js 0.128.0 https://www.zavamed.com/uk/account/ false view_activated 10832783364 [{"type": "browserId", "name": "", "value": "unknown"}, {"type": "device", "name": "", "value": "mobile"}, {"type": "device_type", "name": "", "value": "phone"}, {"type": "referrer", "name": "", "value": "https:\/\/www.google.co.uk\/"}, {"type": "source_type", "name": "", "value": "campaign"}, {"type": "currentTimestamp", "name": "", "value": "-1631518596"}, {"type": "offset", "name": "", "value": "-60"}] [] [] [] 4926a5f1-bbb5-4553-9d0b-b26f773fa0f4
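To check whether the data itself matches the declared type, the sketch below reads the sample row with Python's csv module and parses the uf cell with json.loads. It assumes the file is truly tab-separated and, for brevity, keeps only the last seven columns of the sample. read_uf and STRUCT_FIELDS are hypothetical names. The sketch illustrates two points. First, in a TSV file uf is just a string until something parses it as JSON. Second, every parsed element has exactly the three string fields of struct<type:string,name:string,value:string>.

```python
import csv
import io
import json

# Trimmed copy of the sample above: only the last seven columns are kept
# (an assumption for brevity); the real file has more tab-separated fields.
HEADER = ["event_type", "event_name", "uf", "active_views",
          "event_features", "event_metrics", "event_uuid"]
UF_CELL = (
    '[{"type": "browserId", "name": "", "value": "unknown"}, '
    '{"type": "device", "name": "", "value": "mobile"}, '
    '{"type": "device_type", "name": "", "value": "phone"}, '
    '{"type": "referrer", "name": "", "value": "https:\\/\\/www.google.co.uk\\/"}, '
    '{"type": "source_type", "name": "", "value": "campaign"}, '
    '{"type": "currentTimestamp", "name": "", "value": "-1631518596"}, '
    '{"type": "offset", "name": "", "value": "-60"}]'
)
ROW = ["view_activated", "10832783364", UF_CELL, "[]", "[]", "[]",
       "4926a5f1-bbb5-4553-9d0b-b26f773fa0f4"]
SAMPLE_TSV = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

# Field names of array<struct<type:string,name:string,value:string>>.
STRUCT_FIELDS = ("type", "name", "value")


def read_uf(tsv_text):
    """Return (raw uf string, parsed list) for the first data row."""
    # QUOTE_NONE: double quotes inside the JSON are ordinary characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    row = next(reader)
    raw = row["uf"]          # plain text: TSV has no native array type
    items = json.loads(raw)  # only now does it become a list of dicts
    for item in items:
        # Each element must carry exactly the three declared string fields.
        if tuple(item) != STRUCT_FIELDS or not all(
                isinstance(v, str) for v in item.values()):
            raise ValueError(f"element does not match the struct: {item!r}")
    return raw, items


raw, items = read_uf(SAMPLE_TSV)
print(type(raw).__name__, len(items), items[3]["value"])
# prints: str 7 https://www.google.co.uk/
```

If this check passes on your real file, the array<struct<...>> definition describes the data correctly. That suggests the problem lies in how the text file is deserialized rather than in the type itself.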
1 Answer
I uploaded a sample CSV file to S3 with the content you provided, then ran a Glue crawler on it. Here is the table schema I ended up with:
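As you can see, the column uf was detected as a string, which does not surprise me. To unnest this column, I had to manually cast it to the correct type, ARRAY(JSON):
Result: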
Then I wanted to create an Athena view in which the column uf is cast correctly. However, I got the following error:
Invalid column type for column uf: Unsupported Hive type: json
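My guess is that the schema of the column uf is either too complex for the Glue crawler to recognize correctly, or simply not supported by the plain-text storage format the table uses, i.e. 'org.apache.hadoop.mapred.TextInputFormat' or 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'.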
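One plausible explanation for the result in the question ([{type=[...], name=null, value=null}]) lies in how a delimited text SerDe splits nested types. The sketch below is a simplified model, not Hive source code. It assumes the table is read by Hive's LazySimpleSerDe with tab as the field delimiter and Hive's default nested separators: '\x02' between collection items, and '\x03' between the fields of a struct nested in that collection. The JSON contains neither character, so the whole cell becomes a single array element. That element's first struct field, type, receives the entire string, and name and value are missing (null). split_nested is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Simplified model (an assumption) of how a delimiter-based text SerDe fills
# an array<struct<type,name,value>> column from one cell.
COLLECTION_SEP = "\x02"  # Hive default separator between array items
STRUCT_SEP = "\x03"      # Hive default separator at the next nesting level
STRUCT_FIELDS = ("type", "name", "value")


def split_nested(cell: str) -> List[Dict[str, Optional[str]]]:
    """Split one cell into array elements and struct fields by delimiters only."""
    rows = []
    for element in cell.split(COLLECTION_SEP):
        parts: List[Optional[str]] = list(element.split(STRUCT_SEP))
        # Missing trailing fields come back as None (SQL NULL).
        parts += [None] * (len(STRUCT_FIELDS) - len(parts))
        rows.append(dict(zip(STRUCT_FIELDS, parts)))
    return rows


cell = '[{"type": "browserId", "name": "", "value": "unknown"}]'
print(split_nested(cell))
# [{'type': '[{"type": "browserId", "name": "", "value": "unknown"}]', 'name': None, 'value': None}]
```

Under this model, the declared type is not wrong as such. The text format simply has no notion of JSON, so the values are only split on control characters. That is consistent with keeping uf as a string column and parsing it at query time.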