配置单元中只加载字符串定义的列,即带有int和double的列为null

ix0qys7i  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(384)

配置单元中只加载字符串定义的列,即带有int和double的列为null

create table命令

create table A(
id STRING,
member_id STRING,
loan_amnt DOUBLE,   
funded_amnt DOUBLE,
`funded_amnt_inv` DOUBLE,
`term` STRING,
`int_rate`  STRING, 
`installment` DOUBLE,   
`grade` STRING, 
`sub_grade` STRING, 
`emp_title` STRING, 
`emp_length` STRING,    
`home_ownership` STRING,    
`nnual_inc` INT,
`verification_status` STRING,
`issue_d` STRING,
`loan_status` STRING,
`pymnt_plan`    STRING,
`url`   STRING,
`desc`  STRING,
`purpose`   STRING,
`title` STRING,
`zip_code` STRING,  
`addr_state`    STRING,
`dti`   DOUBLE,
`delinq_2yrs`   INT,
`earliest_cr_line` STRING,  
`inq_last_6mths`    STRING,
`mths_since_last_delinq`    STRING,
`mths_since_last_record`    STRING,
`open_acc`  INT,
`pub_rec`   INT,
`revol_bal` INT,
`revol_util`    STRING,    
`total_acc` INT,    
`initial_list_status`   STRING,    
`out_prncp` DOUBLE,    
`out_prncp_inv` DOUBLE,
`total_pymnt`   DOUBLE,
`total_pymnt_inv`   DOUBLE,
`total_rec_prncp`   DOUBLE,
`total_rec_int` DOUBLE,
`total_rec_late_fee`    DOUBLE,
`recoveries`    DOUBLE,
`collection_recovery_fee`   DOUBLE,
`last_pymnt_d`  STRING,
`last_pymnt_amnt`   DOUBLE,
`next_pymnt_d`  STRING,
`last_credit_pull_d`    STRING,
`collections_12_mths_ex_med`    INT,
`mths_since_last_major_derog`   STRING,
`policy_code`   STRING,
`application_type`  STRING,
`annual_inc_joint`  STRING,
`dti_joint` STRING,
`verification_status_joint` STRING, 
`acc_now_delinq`    STRING,
`tot_coll_amt`  STRING,
`tot_cur_bal`   STRING,
`open_acc_6m`   STRING,
`open_il_6m`    STRING,
`open_il_12m`   STRING,
`open_il_24m`   STRING,
`mths_since_rcnt_il` STRING,    
`total_bal_il`   STRING,
`il_util`    STRING,
`open_rv_12m ` STRING,  
`open_rv_24m`  STRING,  
`max_bal_bc`  STRING,   
`all_util`   STRING,    
`total_credit_rv`   STRING,
`inq_fi`  STRING,   
`total_fi_tl`    STRING,
`inq_last_12m`  STRING
)  

ROW FORMAT delimited
fields terminated by ','

STORED AS TEXTFILE;

将数据加载到表a中

load data local inpath '/home/cloudera/Desktop/Project-3/1/LoanStats3a.txt' into table A;

选择数据

hive> SELECT * FROM A LIMIT 1;

输出

“1077501”“1296599”null“36个月”“10.65%”null“b”“b2”“”“10年以上”“租金”null“已核实”“2011年12月”“已付清”“n”https://www.lendingclub.com/browse/loandetail.action?loan_id=1077501“”2011年12月22日添加的借款人>我需要升级我的业务技术。
“信用卡”“计算机”“860xx”“az”“null”“1985年1月”“1”“”“”“null”“83.7%”“null”“f”“null”“2015年1月”“null”“2015年12月”“null”“1”“个人”
"" "" "" "0" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

sg24os4d

sg24os4d1#

我找到了解决办法:-
创建表stat2(id字符串、member\u id int、loan\u amnt float、funded\u amnt float、funded\u amnt\u inv float、term字符串、int\u rate字符串、分期付款float、grade字符串、sub\u grade字符串、emp\u title字符串、emp\u length字符串、home\u ownership字符串、annual\u inc float、verification\u status字符串、issue\d date、loan\u status字符串、pymnt\u plan字符串、,url字符串,descp字符串,purpose字符串,title字符串,zip代码字符串,addr\u state字符串,dti float,delinq\u 2yrs float,earlish\u cr\u line字符串,inq\u last\u 6mths float,mths\u since\u last\u delinq float,mths\u since\u last\u record float,open\u acc float,pub\u rec float,revol\u bal float,revol\u util字符串,total\u acc float,initial\u list\u status字符串,out\u prncp float,out\u prncp\u inv float,total\u pymnt float,total\u pymnt\u inv float,total\u rec\u prncp float,total\u rec\u int float,total\u rec\u late fee float,recoveries float,collection\u recovery\u fee float,last\u pymnt\d string,last\u pymnt\amnt float,next\u pymnt\d string,last\u credit\u pull\d string,collections\u 12个月前\u med float,自上一个季度以来的月度主要季度浮动、政策代码浮动、应用程序类型字符串、年度公司联合浮动、dti联合浮动、验证状态联合浮动、会计现退浮动、合计金额浮动、合计当前余额浮动、未结账户6m浮动、未结账户6m浮动、未结账户12m浮动、未结账户24m浮动、未结账户24m浮动、月度余额浮动、合计余额浮动,il\ U util float、open\ U rv\ U 12m float、open\ U rv\ U 24m float、max\ U bal\ U bc float、all\ U util float、total\ U rev\ U hi\ U lim float、inq\ U fi float、total\ U cu\ U tl float、inq\ U last\ U 12m float)
行格式serde'org.apache.hadoop.hive.serde2.opencsvserde'和serdeproperty(
“separatorchar”=“,”,
“quotechar”=“\”“
)
存储为textfile tblproperty(“skip.header.line.count”=“2”,“skip.footer.line.count”=“4”);

7rfyedvj

7rfyedvj2#

您的csv似乎在各个字段周围包含引号。周围的引号不受配置单元支持,因此它们成为字段的一部分。对于字符串字段,引号将成为字符串的一部分。对于数字字段,引号使字段成为无效数字,从而导致空值。
请参阅csv serde以获取支持csv文件中引号的serde。

相关问题