数据插入问题

s5a0g9ez  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(300)

所以我在向hql代码中添加csv文件并在hdfs上运行它时遇到了这个问题。我发现当插入数据时,它在分区部分得到空值,一些列得到删除,我尝试了许多不同的方法来插入数据,但我仍然得到这个奇怪的符号和丢失的列,就像它不能读取csv文件,这里是pic输入图像描述这里和这里的代码`

CREATE database covid_db;

use covid_db;

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=500;
set hive.exec.max.dynamic.partitions.pernode=500;

CREATE TABLE IF NOT EXISTS covid_db.covid_staging 
(
 Country                            STRING,
 Total_Cases                        DOUBLE,
 New_Cases                          DOUBLE,
 Total_Deaths                       DOUBLE,
 New_Deaths                         DOUBLE,
 Total_Recovered                    DOUBLE,
 Active_Cases                       DOUBLE,
 Serious                            DOUBLE,
 Tot_Cases                          DOUBLE,
 Deaths                             DOUBLE,
 Total_Tests                        DOUBLE,
 Tests                              DOUBLE,
 CASES_per_Test                     DOUBLE,
 Death_in_Closed_Cases              STRING,
 Rank_by_Testing_rate               DOUBLE,
 Rank_by_Death_rate                 DOUBLE,
 Rank_by_Cases_rate                 DOUBLE,
 Rank_by_Death_of_Closed_Cases      DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_LZ'
tblproperties ("skip.header.line.count"="1", "serialization.null.format" = "''");

CREATE EXTERNAL TABLE IF NOT EXISTS covid_db.covid_ds_partitioned 
(
 Country                            STRING,
 Total_Cases                        DOUBLE,
 New_Cases                          DOUBLE,
 Total_Deaths                       DOUBLE,
 New_Deaths                         DOUBLE,
 Total_Recovered                    DOUBLE,
 Active_Cases                       DOUBLE,
 Serious                            DOUBLE,
 Tot_Cases                          DOUBLE,
 Deaths                             DOUBLE,
 Total_Tests                        DOUBLE,
 Tests                              DOUBLE,
 CASES_per_Test                     DOUBLE,
 Death_in_Closed_Cases              STRING,
 Rank_by_Testing_rate               DOUBLE,
 Rank_by_Death_rate                 DOUBLE,
 Rank_by_Cases_rate                 DOUBLE,
 Rank_by_Death_of_Closed_Cases      DOUBLE
)
PARTITIONED BY (COUNTRY_NAME STRING)
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_HDFS_PARTITIONED';

FROM
covid_db.covid_staging
INSERT INTO TABLE covid_db.covid_ds_partitioned PARTITION(COUNTRY_NAME)
SELECT *,Country WHERE Country is not null;

CREATE EXTERNAL TABLE covid_db.covid_final_output 
(
 TOP_DEATH                          STRING,
 TOP_TEST                           STRING
)
PARTITIONED BY (COUNTRY_NAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED by ','
STORED as TEXTFILE
LOCATION '/user/cloudera/ds/COVID_FINAL_OUTPUT';

`

mwg9r5ms

mwg9r5ms1#

第一:您正在检查文件内容,分区列不存储在文件中,而是存储在元数据中。动态创建的分区还有key=value格式的目录。所以,您在文件中看到的最后一列不是分区列,而是按已关闭案例的死亡排序。
第二:您没有在第二个表ddl中指定分隔符以及空格式。默认分隔符为“\001”(ctrl-a)。您可以指定分隔符,例如tab(\t)和所需的null:

ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
NULL DEFINED AS ''
STORED AS TEXTFILE;

但如果您希望能够区分空字符串和空字符串,最好不要重新定义空格式。

相关问题