HDFS: I copied data from a source table to a partitioned table with Hive without errors, but I can't find the data in the partitioned table

nkhmeac6, posted 2022-12-09 in HDFS

First, I use Hive to load data from a local file into an unpartitioned table.
Creating the raw data file:

stephen@stephen-VirtualBox:~/Workspace$ cat> static_source_demo.txt
11,test,2300,admin,c1 
12,test2,2220,IT,c2 
21,test3,2342,admin,c1 
34,test5,2422,admin,c2 
35,test6,2411,admin1,c1

Creating the unpartitioned table:

hive> CREATE TABLE employee_source_demo ( eid int, name string, 
>     salary string, destination string,city string) 
>     ROW FORMAT DELIMITED 
>     FIELDS TERMINATED BY ',';
OK
Time taken: 0.153 seconds
hive>

Then I load the data from the file into the source table:

hive> load data local inpath '/home/stephen/Workspace/static_source_demo.txt' into table employee_source_demo; 
Loading data to table zipcodes.employee_source_demo
OK
Time taken: 0.773 seconds

I confirm that the data is in the table:

hive> SELECT * FROM employee_source_demo;
OK
11  test    2300    admin   c1 
12  test2   2220    IT  c2 
21  test3   2342    admin   c1 
34  test5   2422    admin   c2 
35  test6   2411    admin1  c1 
Time taken: 0.228 seconds, Fetched: 5 row(s)
hive>

Now I create the partitioned table in the same database:

hive> CREATE TABLE  employee_part1 ( eid int, name String, 
    > salary String, destination String) PARTITIONED by (city string) 
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY ','; 
OK
Time taken: 0.151 seconds
hive>
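For reference, one quick way to confirm that city was registered as a partition column rather than a regular column is to inspect the table definition (a minimal check; DESCRIBE FORMATTED is standard HiveQL, and the exact output layout depends on the Hive version):

-- show the regular columns and the partition information section for the new table
DESCRIBE FORMATTED employee_part1;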

After this, I try to insert the data into the new table, taking the partition into account:

hive> INSERT INTO TABLE employee_part1  PARTITION (city='c1') SELECT eid, name, salary, 
destination FROM employee_source_demo WHERE city='c1';
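
As an aside, Hive can also route rows to partitions automatically with dynamic partitioning, instead of one static INSERT per city value. A minimal sketch, assuming the session settings below need to be switched on first (these property names are standard Hive configuration, but the defaults vary between installations):

-- allow dynamic partitions in this session; nonstrict mode lets every partition column be dynamic
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- the partition column (city) must come last in the SELECT list
INSERT INTO TABLE employee_part1 PARTITION (city)
SELECT eid, name, salary, destination, city
FROM employee_source_demo;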

I think everything went smoothly. Below are the messages I got during execution:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stephen_20220114230858_14789266-c13d-4e53-b411-474ac5bcbde7
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642123763239_0009, Tracking URL = http://stephen-VirtualBox:8088/proxy/application_1642123763239_0009/
Kill Command = /home/stephen/opt/hadoop-2.7.3/bin/hadoop job  -kill job_1642123763239_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-01-14 23:09:11,139 Stage-1 map = 0%,  reduce = 0%
2022-01-14 23:09:22,019 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.23 sec
MapReduce Total cumulative CPU time: 2 seconds 230 msec
Ended Job = job_1642123763239_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/zipcodes.db/employee_part1/city=c1/.hive-staging_hive_2022-01-14_23-08-58_343_2569185367107439586-1/-ext-10000
Loading data to table zipcodes.employee_part1 partition (city=c1)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.23 sec   HDFS Read: 5027 HDFS Write: 57 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
Time taken: 25.388 seconds

I don't see an error. So I checked the partitioned table as shown below. There is nothing in it.

hive> SELECT * FROM employee_part1;
OK
Time taken: 0.34 seconds
hive>

I also checked the Hive warehouse. The file seems to be there, but it contains no data:

hive> !hadoop fs -ls /user/hive/warehouse/zipcodes.db/employee_part1/city=c1;
Found 1 items
-rwxrwxr-x   1 stephen supergroup          0 2022-01-14 23:09 /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0
hive> !hadoop fs -cat /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0;
hive>
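The metastore side can be checked in the same way; whether the partition is actually registered, and how many rows each partition holds, is visible with plain HiveQL (no special configuration assumed):

-- list the partitions the metastore knows about for this table
SHOW PARTITIONS employee_part1;

-- count rows per partition value, reading through the table itself
SELECT city, COUNT(*) FROM employee_part1 GROUP BY city;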

I would appreciate any solution; I don't know what I did wrong.


bvjveswy1#

Please try the following statement:

MSCK REPAIR TABLE employee_part1;
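
A minimal sketch of that suggestion, with a follow-up check: MSCK REPAIR TABLE re-registers partition directories that exist under the table's HDFS location but are missing from the metastore, and the verification queries afterwards are ordinary HiveQL:

-- sync the metastore with the partition directories present on HDFS
MSCK REPAIR TABLE employee_part1;

-- verify that the partition is now visible and re-query the data
SHOW PARTITIONS employee_part1;
SELECT * FROM employee_part1 WHERE city='c1';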
