1. A UDF can be used directly in a select statement to format the query results before they are output.
2. When writing a UDF, note the following:
a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
b) It must define and implement an evaluate method.
3. Steps
a) Package the program into a jar and copy it to the target machine;
b) Enter the Hive CLI and add the jar:
hive> add jar /hiveUDF.jar;
c) Create a temporary function:
hive> create temporary function getType as 'com.edu.hive.udf.HiveUDF';
d) Run HQL queries:
select *,getType(booktype) from book;
select id,name,getType(booktype) from book;
e) Drop the temporary function:
hive> drop temporary function getType;
Note: a UDF can only perform one-in-one-out operations; for many-in-one-out, implement a UDAF instead.
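The note above can be made concrete with built-in functions: a UDF produces one output value per input row, while a UDAF (such as the built-in count) collapses many rows into one value per group. A quick illustration against the book table built below:

```sql
-- UDF: one row in, one value out (per row)
select upper(name) from book;
-- UDAF: many rows in, one value out (per group)
select booktype, count(*) from book group by booktype;
```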
1. Create a partitioned book table, partitioned by bookType
Create the table:
hive> create table book (id bigint,name string) partitioned by (bookType string)
row format delimited fields terminated by '\t';
OK
Time taken: 0.134 seconds
Load the data:
vim book.wlwx
1 zhaohuanwansui
2 douluodalu
3 doupochangqiong
4 qindi
5 jiushen
6 binhuomochu
7 shanlaingshishen
8 guangzhizi
9 tunshixinkong
10 shenmu
11 qlqshi
hive> load data local inpath '/book.wlwx' into table book partition (bookType='wangluowenxue');
Copying data from file:/book.wlwx
Copying file: file:/book.wlwx
Loading data to table default.book partition (booktype=wangluowenxue)
Partition default.book{booktype=wangluowenxue} stats: [numFiles=1, numRows=0, totalSize=148, rawDataSize=0]
OK
Time taken: 0.702 seconds
vim book.jy
1 yuwen
2 shuxue
3 yingyu
4 shehui
5 jisuanji
6 makeshi
7 shenwu
hive> load data local inpath '/book.jy' into table book partition (bookType='jiaoyu');
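If both loads succeeded, each file should now sit under its own partition. A quick sanity check (exact output formatting may vary by Hive version):

```sql
hive> show partitions book;
booktype=jiaoyu
booktype=wangluowenxue
```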
2. Java code
package com.edu.hive.udf;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class HiveUDF extends UDF {

    // Maps a partition key (bookType) to a human-readable category name.
    private static final Map<String, String> map = new HashMap<>();
    static {
        map.put("wangluowenxue", "网络文学");
        map.put("jiaoyu", "课本");
    }

    // Reused output object so a new Text is not allocated for every row.
    private final Text text = new Text();

    public Text evaluate(Text bookType) {
        if (bookType == null) {
            return null; // propagate SQL NULL instead of throwing an NPE
        }
        String value = map.get(bookType.toString());
        if (value == null) {
            value = "其他"; // fall back to "other" for unknown types
        }
        text.set(value);
        return text;
    }
}
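The heart of evaluate is just a map lookup with a default. That logic can be exercised as plain Java, with no Hive on the classpath; the class and method names below are illustrative, not part of any Hive API:

```java
import java.util.HashMap;
import java.util.Map;

public class TypeLookupDemo {

    // Same key-to-label table as the UDF above.
    private static final Map<String, String> MAP = new HashMap<>();
    static {
        MAP.put("wangluowenxue", "网络文学");
        MAP.put("jiaoyu", "课本");
    }

    // Mirrors evaluate(): known keys map to their label, anything else to "其他".
    static String getType(String bookType) {
        return MAP.getOrDefault(bookType, "其他");
    }

    public static void main(String[] args) {
        System.out.println(getType("jiaoyu"));        // 课本
        System.out.println(getType("wangluowenxue")); // 网络文学
        System.out.println(getType("unknown"));       // 其他
    }
}
```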
3. Package the jar and upload it to the virtual machine
4. Add the jar (run inside the Hive CLI):
hive> add jar /hiveUDF.jar;
Added /hiveUDF.jar to class path
Added resource: /hiveUDF.jar
5. Create the temporary function:
hive> create temporary function getType as 'com.edu.hive.udf.HiveUDF';
OK
Time taken: 0.081 seconds
6. Run the HQL queries:
select *,getType(booktype) from book;
hive> select *,getType(booktype) from book;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1532966988105_0007, Tracking URL = http://hadoop03:8088/proxy/application_1532966988105_0007/
Kill Command = /softWare/hadoop-2.2.0/bin/hadoop job -kill job_1532966988105_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-30 14:39:28,422 Stage-1 map = 0%, reduce = 0%
2018-07-30 14:39:36,079 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1532966988105_0007
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.4 sec HDFS Read: 510 HDFS Write: 608 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
1 yuwen jiaoyu 课本
2 shuxue jiaoyu 课本
3 yingyu jiaoyu 课本
4 shehui jiaoyu 课本
5 jisuanji jiaoyu 课本
6 makeshi jiaoyu 课本
7 shenwu jiaoyu 课本
1 zhaohuanwansui wangluowenxue 网络文学
2 douluodalu wangluowenxue 网络文学
3 doupochangqiong wangluowenxue 网络文学
4 qindi wangluowenxue 网络文学
5 jiushen wangluowenxue 网络文学
6 binhuomochu wangluowenxue 网络文学
7 shanlaingshishen wangluowenxue 网络文学
8 guangzhizi wangluowenxue 网络文学
9 tunshixinkong wangluowenxue 网络文学
10 shenmu wangluowenxue 网络文学
11 qlqshi wangluowenxue 网络文学
Time taken: 56.565 seconds, Fetched: 18 row(s)
select id,name,getType(booktype) from book;
hive> select id,name,getType(booktype) from book;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1532966988105_0008, Tracking URL = http://hadoop03:8088/proxy/application_1532966988105_0008/
Kill Command = /softWare/hadoop-2.2.0/bin/hadoop job -kill job_1532966988105_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-30 14:43:28,955 Stage-1 map = 0%, reduce = 0%
2018-07-30 14:43:36,627 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
MapReduce Total cumulative CPU time: 1 seconds 390 msec
Ended Job = job_1532966988105_0008
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.39 sec HDFS Read: 510 HDFS Write: 405 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 390 msec
OK
1 yuwen 课本
2 shuxue 课本
3 yingyu 课本
4 shehui 课本
5 jisuanji 课本
6 makeshi 课本
7 shenwu 课本
1 zhaohuanwansui 网络文学
2 douluodalu 网络文学
3 doupochangqiong 网络文学
4 qindi 网络文学
5 jiushen 网络文学
6 binhuomochu 网络文学
7 shanlaingshishen 网络文学
8 guangzhizi 网络文学
9 tunshixinkong 网络文学
10 shenmu 网络文学
11 qlqshi 网络文学
Time taken: 36.267 seconds, Fetched: 18 row(s)
7. Save the query results to HDFS
create table result row format delimited fields terminated by '\t'
as select id,name,getType(booktype) from book;
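CREATE TABLE ... AS SELECT writes the result under the Hive warehouse directory. To write to an explicit HDFS path instead, INSERT OVERWRITE DIRECTORY can be used (the output path below is illustrative; by default the fields are separated by the ^A control character):

```sql
insert overwrite directory '/output/book_result'
select id, name, getType(booktype) from book;
```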
8. Drop the temporary function:
hive> drop temporary function getType;
OK
Time taken: 0.018 seconds
hive> select id,name,getType(name) from book;
FAILED: SemanticException [Error 10011]: Line 1:15 Invalid function 'getType'
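As the FAILED line shows, a temporary function is gone once dropped (and it also disappears when the session ends). Newer Hive versions (0.13+) additionally support permanent functions backed by a jar on HDFS; the path below is illustrative:

```sql
hive> create function getType as 'com.edu.hive.udf.HiveUDF' using jar 'hdfs:///jars/hiveUDF.jar';
```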