Hive: how to handle multiple quotes in a SerDe

acruukt9, posted 2021-05-31 in Hadoop

I have a source CSV file with data like this:
"201814","39","0598824","yellow jacket trap w","piege guep.jau,ouest","act","7/20/2016","c/e"
","05","st","n","15","2484","985.3999999999999998","43.66","3762.36","53.05","n","5.83","7.9900","0000","0000","0000","3.82","3.8181","7162","sterling international","d","12","yjtd-db12-w-","12","32","0","0","0","0","3.68","0","3.8181","7162","sterling
To load the data, I am using the CREATE statement and SerDe below:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "|",
   "quoteChar"     = '\"',
   "escapeChar"    = '\\')

The problem is that any data in the file that comes after a "\" ends up as NULL.
Can you tell me how to handle this?
My full DDL:

CREATE EXTERNAL TABLE
    excess_inventory
    (
        whole_record string,
        yyyyww string,
        excess_wks_num string,
        product_num string,
        eng_desc string,
        fr_desc string,
        status string,
        corp_status_change_date string,
        whse_region string,
        whse_id string,
        channel_cd string,
        eap_ind string,
        fwos string,
        non_alloc_qty string,
        excess_qty string,
        excess_cube string,
        excess_inventory_dollars string,
        monthly_storage_cost string,
        deal_600 string,
        go_ind string,
        next_5_deals string,
        reg_adlr string,
        reg_retail string,
        r52_best_promo_adlr string,
        r52_best_promo_retail string,
        landed_cost string,
        corp_cost string,
        vendor_num string,
        vendor_nm string,
        vendor_origin string,
        vendor_moq string,
        vendor_part_num string,
        vendor_lead_tm string,
        total_lead_tm string,
        ingate_qty string,
        on_order_qty string,
        dealer_restriction_cd string,
        quote_cost string,
        casting_charge string,
        action_cd string,
        action_yyyyww string,
        action_qty string,
        sugg_adlr string,
        comments string,
        create_yyyyww string,
        user_nm string,
        batch_ts timestamp
) 
PARTITIONED BY (partition_batch_ts bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "|",
   "quoteChar"     = '\"',
   "escapeChar"    = '\\') 
 STORED AS TEXTFILE
LOCATION
'db/excess_inventory/table'
TBLPROPERTIES('skip.header.line.count'='1','serialization.null.format'='');

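Also, please let me know whether "separatorChar" = "|" means that the data is stored pipe-delimited in HDFS, or whether we have to specify the delimiter that the source file uses.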
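
On the separatorChar question, a minimal sketch may help. For an external table, the SerDe properties describe how Hive parses the files already under LOCATION, so they should match the delimiter in the source file. The sample rows above are comma-delimited, so the sketch uses "," rather than "|". The table name excess_inventory_demo, the shortened column list, and the path are hypothetical and only there for illustration. The sketch addresses the separator only; it does not claim to fix the NULL values after a backslash.

```sql
-- Hypothetical, shortened table: only the first three columns of the real DDL are shown.
CREATE EXTERNAL TABLE excess_inventory_demo (
    yyyyww string,
    excess_wks_num string,
    product_num string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",   -- assumption: the source file is comma-delimited, as in the sample rows
   "quoteChar"     = "\"",  -- fields are wrapped in double quotes
   "escapeChar"    = "\\"   -- a backslash escapes the character that follows it
)
STORED AS TEXTFILE
LOCATION 'db/excess_inventory_demo/table';  -- placeholder path
```

With these settings a comma inside a quoted field, such as "piege guep.jau,ouest", stays within one column. Because the backslash is the escape character, a backslash directly before a closing quote may be read as escaping that quote, which would be consistent with the behaviour described in the question.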
