运行python udf的配置单元twitter表在关闭操作符时出现配置单元运行时错误

okxuctiv 于 2021-06-26 发布在 Hive

关注(0)|答案(0)|浏览(280)

我正在尝试在hive中运行python自定义项，以便对flume捕获的twitter数据进行情绪分析。
我的twitter表代码：

CREATE EXTERNAL TABLE tweets (
  id bigint, 
  created_at string,
  source STRING,
   favorited BOOLEAN,
   retweeted_status STRUCT<
     text:STRING,
     user:STRUCT<screen_name:STRING,name:STRING>,
     retweet_count:INT>,
   entities STRUCT<
     urls:ARRAY<STRUCT<expanded_url:STRING>>,
     user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
     hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang string,
  retweet_count int,
  text string,
  user STRUCT<
     screen_name:STRING,
     name:STRING,
     friends_count:INT,
     followers_count:INT,
     statuses_count:INT,
     verified:BOOLEAN,
     utc_offset:INT,
     time_zone:STRING>
       )
PARTITIONED BY (datehour int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION  'hdfs://192.168.0.73:8020/user/flume/tweets'

我的python代码：

import hashlib
import sys
for line in sys.stdin:
    line = line.strip()
    (lang, text) = line.split('\t')
    positive = set(["love", "good", "great", "happy", "cool", "best", "awesome", "nice", "helpful", "enjoyed"])
    negative = set(["hate", "bad", "stupid", "terrible", "unhappy"])
    words = text.split()
    word_count = len(words)
    positive_matches = [1 for word in words if word in positive]
    negative_matches = [-1 for word in words if word in negative]
    st = sum(positive_matches) + sum(negative_matches)
    if st > 0:
        print ('\t'.join([lang, text, 'positive', str(word_count)]))
    elif st < 0:
        print ('\t'.join([lang, text, 'negative', str(word_count)]))
    else:
        print ('\t'.join([lang, text, 'neutral', str(word_count)]))

最后是我的Hive查询：

ADD JAR /tmp/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar;
ADD FILE /tmp/my_py_udf.py;
SELECT
TRANSFORM (lang, text)
USING 'python my_py_udf.py'
AS  (lang, text, sentiment, word_count)
FROM tweets

通过此查询，我在关闭运算符时出错。
如果在python udf中仅使用一个变量，则查询将成功运行，前提是：

text = line.replace('\n',' ')

它可能来自分裂（'\t'）中的序列吗？
有人能帮忙吗？在过去的10天里，我对这件事很讨厌。。。

Hive python user-defined-functions twitter

来源：https://stackoverflow.com/questions/45442232/hive-twitter-table-running-python-udf-gives-hive-runtime-error-while-closing-ope

暂无答案！

目前还没有任何答案，快来回答吧！

我来回答

运行python udf的配置单元twitter表在关闭操作符时出现配置单元运行时错误

暂无答案！

相关问题

热门标签

最新问答