First, I use Hive to load data from a local file into an unpartitioned table.
Creating the raw data file:
stephen@stephen-VirtualBox:~/Workspace$ cat> static_source_demo.txt
11,test,2300,admin,c1
12,test2,2220,IT,c2
21,test3,2342,admin,c1
34,test5,2422,admin,c2
35,test6,2411,admin1,c1
Creating the unpartitioned table:
hive> CREATE TABLE employee_source_demo ( eid int, name string,
> salary string, destination string,city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.153 seconds
hive>
Then I load the data from the file into the source table:
hive> load data local inpath '/home/stephen/Workspace/static_source_demo.txt' into table employee_source_demo;
Loading data to table zipcodes.employee_source_demo
OK
Time taken: 0.773 seconds
I confirm that the data is in the table:
hive> SELECT * FROM employee_source_demo;
OK
11 test 2300 admin c1
12 test2 2220 IT c2
21 test3 2342 admin c1
34 test5 2422 admin c2
35 test6 2411 admin1 c1
Time taken: 0.228 seconds, Fetched: 5 row(s)
hive>
Now I create the partitioned table in the same database:
hive> CREATE TABLE employee_part1 ( eid int, name String,
> salary String, destination String) PARTITIONED by (city string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.151 seconds
hive>
After that, I try to insert the data into the new table, taking the partition into account:
hive> INSERT INTO TABLE employee_part1 PARTITION (city='c1') SELECT eid, name, salary,
    > destination FROM employee_source_demo WHERE city='c1';
I thought everything went smoothly. Below are the messages I got during the build/execution:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stephen_20220114230858_14789266-c13d-4e53-b411-474ac5bcbde7
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642123763239_0009, Tracking URL = http://stephen-VirtualBox:8088/proxy/application_1642123763239_0009/
Kill Command = /home/stephen/opt/hadoop-2.7.3/bin/hadoop job -kill job_1642123763239_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-01-14 23:09:11,139 Stage-1 map = 0%, reduce = 0%
2022-01-14 23:09:22,019 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
MapReduce Total cumulative CPU time: 2 seconds 230 msec
Ended Job = job_1642123763239_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/zipcodes.db/employee_part1/city=c1/.hive-staging_hive_2022-01-14_23-08-58_343_2569185367107439586-1/-ext-10000
Loading data to table zipcodes.employee_part1 partition (city=c1)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.23 sec HDFS Read: 5027 HDFS Write: 57 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 230 msec
OK
Time taken: 25.388 seconds
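A quick sanity check of the partition metadata at this point could look like the following (just a sketch of the checks, assuming the table lives in the zipcodes database; output not shown):
-- list the partitions Hive has registered for employee_part1
SHOW PARTITIONS employee_part1;
-- show details (including the HDFS location) of the city='c1' partition
DESCRIBE FORMATTED employee_part1 PARTITION (city='c1');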
I don't think there was any error. So I checked the partitioned table as shown below. There is nothing in it.
hive> SELECT * FROM employee_part1;
OK
Time taken: 0.34 seconds
hive>
I also checked the Hive warehouse. The file seems to be there, but it contains no data:
hive> !hadoop fs -ls /user/hive/warehouse/zipcodes.db/employee_part1/city=c1;
Found 1 items
-rwxrwxr-x 1 stephen supergroup 0 2022-01-14 23:09 /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0
hive> !hadoop fs -cat /user/hive/warehouse/zipcodes.db/employee_part1/city=c1/000000_0;
hive>
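One way to narrow down whether the SELECT side returned any rows at all would be a check like this (a sketch only; I have not verified that this is the cause):
-- count the source rows that actually match the partition filter
SELECT count(*) FROM employee_source_demo WHERE city='c1';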
I would appreciate any solution; I don't know what I did wrong.
1 Answer
Please try the following statement:
MSCK REPAIR TABLE employee_part1;
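For reference, a minimal sketch of how this suggestion could be applied and then verified (statement and partition value taken from the question; output omitted):
-- re-sync the metastore with the partition directories already on HDFS
MSCK REPAIR TABLE employee_part1;
-- confirm the partition is registered, then query it again
SHOW PARTITIONS employee_part1;
SELECT * FROM employee_part1 WHERE city='c1';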